[jira] [Commented] (SPARK-44795) CodeGenCache should be ClassLoader specific

2023-08-15 Thread GridGain Integration (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754852#comment-17754852
 ] 

GridGain Integration commented on SPARK-44795:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42508

> CodeGenCache should be ClassLoader specific
> ---
>
> Key: SPARK-44795
> URL: https://issues.apache.org/jira/browse/SPARK-44795
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Blocker
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44824) There is content overlap in `ammoniteOut` used in ReplE2ESuite.

2023-08-15 Thread Yang Jie (Jira)
Yang Jie created SPARK-44824:


 Summary: There is content overlap in `ammoniteOut` used in 
ReplE2ESuite.
 Key: SPARK-44824
 URL: https://issues.apache.org/jira/browse/SPARK-44824
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Tests
Affects Versions: 3.5.0, 4.0.0
Reporter: Yang Jie









[jira] [Created] (SPARK-44823) Update black to 23.7.0 and fix erroneous check

2023-08-15 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44823:
---

 Summary: Update black to 23.7.0 and fix erroneous check
 Key: SPARK-44823
 URL: https://issues.apache.org/jira/browse/SPARK-44823
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Resolved] (SPARK-44809) Remove unused custom metrics for RocksDB state store provider

2023-08-15 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-44809.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42491
[https://github.com/apache/spark/pull/42491]

> Remove unused custom metrics for RocksDB state store provider
> -
>
> Key: SPARK-44809
> URL: https://issues.apache.org/jira/browse/SPARK-44809
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-44564) Refine the documents with LLM

2023-08-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44564:
--
Attachment: docstr_prompt_only.py

> Refine the documents with LLM
> -
>
> Key: SPARK-44564
> URL: https://issues.apache.org/jira/browse/SPARK-44564
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
> Attachments: docstr_prompt_only.py
>
>
> Let's first focus on the documentation of *PySpark DataFrame APIs*.
> *1*, Choose a subset of DF APIs.
> Since the review bandwidth is limited, we recommend each PR contain at least 
> 5 APIs;
> *2*, For each API, copy-paste the function (including the function signature 
> and docstring) into an LLM, and ask it to refine the docstring with a prompt 
> (e.g. the attached prompt); you can of course use/design your own prompt.
> For prompt engineering, you can refer to these [Best 
> practices|https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api]
>  
> *3*, Note that the LLM is not 100% reliable; the generated docstring may 
> still contain mistakes, e.g.
> * The example code cannot run
> * The example results are incorrect
> * The example code doesn't reflect the example title
> * The description uses the wrong version, or adds a 'Raises' section for a 
> non-existent exception
> * The lint can be broken
> * ...
> We need to fix them before sending a PR.
> We can try different prompts, choose the good parts, and combine them into 
> the new docstring.
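One class of mistakes listed above ("the example code cannot run", "the example results are incorrect") can be caught mechanically before review by running the generated examples through Python's standard doctest module. A minimal sketch, with a helper name of our own choosing:

```python
import doctest

def count_doctest_failures(docstring: str) -> int:
    """Run the interactive examples embedded in a candidate docstring
    and return how many produced output different from what is shown."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, globs={}, name="candidate",
                              filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Suppress the failure report; we only want the count here.
    runner.run(test, out=lambda s: None)
    return runner.failures
```

Running this over each LLM-generated docstring flags broken or incorrect examples before the PR is opened; lint and wording issues still need a human pass.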






[jira] [Updated] (SPARK-44564) Refine the documents with LLM

2023-08-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44564:
--
Attachment: (was: docstr_prompt.py)

> Refine the documents with LLM
> -
>
> Key: SPARK-44564
> URL: https://issues.apache.org/jira/browse/SPARK-44564
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Let's first focus on the documentation of *PySpark DataFrame APIs*.
> *1*, Choose a subset of DF APIs.
> Since the review bandwidth is limited, we recommend each PR contain at least 
> 5 APIs;
> *2*, For each API, copy-paste the function (including the function signature 
> and docstring) into an LLM, and ask it to refine the docstring with a prompt 
> (e.g. the attached prompt); you can of course use/design your own prompt.
> For prompt engineering, you can refer to these [Best 
> practices|https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api]
>  
> *3*, Note that the LLM is not 100% reliable; the generated docstring may 
> still contain mistakes, e.g.
> * The example code cannot run
> * The example results are incorrect
> * The example code doesn't reflect the example title
> * The description uses the wrong version, or adds a 'Raises' section for a 
> non-existent exception
> * The lint can be broken
> * ...
> We need to fix them before sending a PR.
> We can try different prompts, choose the good parts, and combine them into 
> the new docstring.






[jira] [Created] (SPARK-44822) Make Python UDTFs by default non-deterministic

2023-08-15 Thread Allison Wang (Jira)
Allison Wang created SPARK-44822:


 Summary: Make Python UDTFs by default non-deterministic
 Key: SPARK-44822
 URL: https://issues.apache.org/jira/browse/SPARK-44822
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Allison Wang


Change the default determinism of Python UDTFs to false.






[jira] [Created] (SPARK-44821) Upgrade `kubernetes-client` to 6.8.1

2023-08-15 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44821:
-

 Summary: Upgrade `kubernetes-client` to 6.8.1
 Key: SPARK-44821
 URL: https://issues.apache.org/jira/browse/SPARK-44821
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-44820) Switch languages consistently across docs for all code snippets

2023-08-15 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-44820:
-
Description: 
When a user chooses a different language for a code snippet, all code snippets 
on that page should switch to the chosen language. This was the behavior in, 
for example, the Spark 2.0 docs: 
[https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html]

But it is broken in later docs, for example the Spark 3.4.1 docs: 
[https://spark.apache.org/docs/latest/quick-start.html]

We should fix this regression and possibly add test cases to prevent 
future regressions.

  was:
When a user chooses a different language for a code snippet, all code snippets 
on that page should switch to the chosen language. This was the behavior in, 
for example, the Spark 2.0 docs: 
[https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html]

But it is broken in later docs, for example the Spark 3.4.1 docs: 
[https://spark.apache.org/docs/latest/quick-start.html]

We should fix this regression.


> Switch languages consistently across docs for all code snippets
> ---
>
> Key: SPARK-44820
> URL: https://issues.apache.org/jira/browse/SPARK-44820
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Allison Wang
>Priority: Major
>
> When a user chooses a different language for a code snippet, all code 
> snippets on that page should switch to the chosen language. This was the 
> behavior in, for example, the Spark 2.0 docs: 
> [https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html]
> But it is broken in later docs, for example the Spark 3.4.1 docs: 
> [https://spark.apache.org/docs/latest/quick-start.html]
> We should fix this regression and possibly add test cases to prevent 
> future regressions.






[jira] [Updated] (SPARK-44819) Make Python the first language in all Spark code snippet

2023-08-15 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-44819:
-
Attachment: Screenshot 2023-08-15 at 11.59.11.png

> Make Python the first language in all Spark code snippet
> 
>
> Key: SPARK-44819
> URL: https://issues.apache.org/jira/browse/SPARK-44819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
> Attachments: Screenshot 2023-08-15 at 11.59.11.png
>
>
> Currently, the first and default language for all code snippets is Scala. We 
> should make Python the first language for all the code snippets.
>  
>  
>  
>  






[jira] [Updated] (SPARK-44819) Make Python the first language in all Spark code snippet

2023-08-15 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-44819:
-
Description: 
Currently, the first and default language for all code snippets is Scala. For 
instance: https://spark.apache.org/docs/latest/quick-start.html

We should make Python the first language for all the code snippets.

 

 

  was:
Currently, the first and default language for all code snippets is Scala. We 
should make Python the first language for all the code snippets.

 

 

 

 


> Make Python the first language in all Spark code snippet
> 
>
> Key: SPARK-44819
> URL: https://issues.apache.org/jira/browse/SPARK-44819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
> Attachments: Screenshot 2023-08-15 at 11.59.11.png
>
>
> Currently, the first and default language for all code snippets is Scala. For 
> instance: https://spark.apache.org/docs/latest/quick-start.html
> We should make Python the first language for all the code snippets.
>  
>  






[jira] [Updated] (SPARK-44819) Make Python the first language in all Spark code snippet

2023-08-15 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-44819:
-
Description: 
Currently, the first and default language for all code snippets is Scala. We 
should make Python the first language for all the code snippets.

 

 

 

 

  was:
Currently, the first and default language for all code snippets is Scala. We 
should make Python the first language for all the code snippets.

!image-2023-08-15-11-51-57-683.png|width=658,height=188!

 

 

 

 


> Make Python the first language in all Spark code snippet
> 
>
> Key: SPARK-44819
> URL: https://issues.apache.org/jira/browse/SPARK-44819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
> Attachments: Screenshot 2023-08-15 at 11.59.11.png
>
>
> Currently, the first and default language for all code snippets is Scala. We 
> should make Python the first language for all the code snippets.
>  
>  
>  
>  






[jira] [Created] (SPARK-44820) Switch languages consistently across docs for all code snippets

2023-08-15 Thread Allison Wang (Jira)
Allison Wang created SPARK-44820:


 Summary: Switch languages consistently across docs for all code 
snippets
 Key: SPARK-44820
 URL: https://issues.apache.org/jira/browse/SPARK-44820
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.4.1, 3.5.0
Reporter: Allison Wang


When a user chooses a different language for a code snippet, all code snippets 
on that page should switch to the chosen language. This was the behavior in, 
for example, the Spark 2.0 docs: 
[https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html]

But it is broken in later docs, for example the Spark 3.4.1 docs: 
[https://spark.apache.org/docs/latest/quick-start.html]

We should fix this regression.






[jira] [Created] (SPARK-44819) Make Python the first language in all Spark code snippet

2023-08-15 Thread Allison Wang (Jira)
Allison Wang created SPARK-44819:


 Summary: Make Python the first language in all Spark code snippet
 Key: SPARK-44819
 URL: https://issues.apache.org/jira/browse/SPARK-44819
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Allison Wang


Currently, the first and default language for all code snippets is Scala. We 
should make Python the first language for all the code snippets.

!image-2023-08-15-11-51-57-683.png|width=658,height=188!

 

 

 

 






[jira] [Commented] (SPARK-44818) Fix race for pending interrupt issued before taskThread is initialized

2023-08-15 Thread Anish Shrigondekar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754729#comment-17754729
 ] 

Anish Shrigondekar commented on SPARK-44818:


PR here - [https://github.com/apache/spark/pull/42504]

 

cc - [~kabhwan] , [~jiangxb1987] 

> Fix race for pending interrupt issued before taskThread is initialized
> --
>
> Key: SPARK-44818
> URL: https://issues.apache.org/jira/browse/SPARK-44818
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Fix race for pending interrupt issued before taskThread is initialized






[jira] [Created] (SPARK-44818) Fix race for pending interrupt issued before taskThread is initialized

2023-08-15 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-44818:
--

 Summary: Fix race for pending interrupt issued before taskThread 
is initialized
 Key: SPARK-44818
 URL: https://issues.apache.org/jira/browse/SPARK-44818
 Project: Spark
  Issue Type: Task
  Components: Spark Core, Structured Streaming
Affects Versions: 3.5.1
Reporter: Anish Shrigondekar


Fix race for pending interrupt issued before taskThread is initialized
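The race described above can be sketched as follows: an interrupt (kill) request that arrives before the task thread is registered must be parked and delivered at registration time, rather than silently dropped. The class and method names below are illustrative, not Spark's actual code:

```python
import threading

class TaskRunner:
    """Sketch of deferring an interrupt that races ahead of
    task-thread registration (illustrative names only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._task_thread = None
        self._pending_interrupt = False
        self.interrupted = False  # observable effect for the sketch

    def kill(self):
        with self._lock:
            if self._task_thread is None:
                # Too early: remember the request instead of losing it.
                self._pending_interrupt = True
            else:
                self._deliver()

    def register(self, thread):
        with self._lock:
            self._task_thread = thread
            if self._pending_interrupt:
                # Deliver the interrupt that arrived before registration.
                self._pending_interrupt = False
                self._deliver()

    def _deliver(self):
        self.interrupted = True
```

Holding one lock across both paths makes the "check thread, then act" sequence atomic, which is the essence of the fix.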






[jira] [Resolved] (SPARK-42664) Support bloomFilter for DataFrameStatFunctions

2023-08-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42664.
---
Fix Version/s: 3.5.0
 Assignee: Yang Jie
   Resolution: Fixed

> Support bloomFilter for DataFrameStatFunctions
> --
>
> Key: SPARK-42664
> URL: https://issues.apache.org/jira/browse/SPARK-42664
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-44794) Propagate ArtifactSet to stream execution thread

2023-08-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44794.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Propagate ArtifactSet to stream execution thread
> 
>
> Key: SPARK-44794
> URL: https://issues.apache.org/jira/browse/SPARK-44794
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Blocker
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-44803) Replace `publish` with `publishOrSkip` in SparkBuild to eliminate warnings

2023-08-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44803.
---
Fix Version/s: 4.0.0
 Assignee: BingKun Pan
   Resolution: Fixed

> Replace `publish` with `publishOrSkip` in SparkBuild to eliminate warnings
> --
>
> Key: SPARK-44803
> URL: https://issues.apache.org/jira/browse/SPARK-44803
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-44124) Upgrade AWS SDK to v2

2023-08-15 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754698#comment-17754698
 ] 

Steve Loughran commented on SPARK-44124:


Will need to make sure any classloaders set up to pass down com.amazonaws to 
children (e.g. the Hive classloader) now also pass down software.amazon.
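The concern above can be illustrated with a minimal sketch of the prefix check such classloaders typically perform; the helper name and prefix list are assumptions for illustration, not Spark's or Hive's actual configuration:

```python
# Package prefixes that must always resolve through the parent classloader.
# After the v2 migration, the new SDK prefix is needed alongside the old one.
AWS_SDK_PREFIXES = ("com.amazonaws.", "software.amazon.")

def is_parent_shared(class_name: str, shared_prefixes=AWS_SDK_PREFIXES) -> bool:
    """Return True when a class name should be loaded by the parent
    rather than the child (e.g. Hive) classloader."""
    return class_name.startswith(tuple(shared_prefixes))
```

If only the old `com.amazonaws.` prefix were listed, v2 SDK classes under `software.amazon.awssdk` would be loaded separately per child, breaking cross-classloader sharing.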

> Upgrade AWS SDK to v2
> -
>
> Key: SPARK-44124
> URL: https://issues.apache.org/jira/browse/SPARK-44124
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-44817) Incremental Stats Collection

2023-08-15 Thread Rakesh Raushan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754694#comment-17754694
 ] 

Rakesh Raushan commented on SPARK-44817:


[~cloud_fan] [~gurwls223] [~maxgekk] What are your thoughts on this?
If this looks promising, I can work on raising a PR for this.

> Incremental Stats Collection
> 
>
> Key: SPARK-44817
> URL: https://issues.apache.org/jira/browse/SPARK-44817
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rakesh Raushan
>Priority: Major
>
> Spark's Cost-Based Optimizer depends on table and column statistics.
> After every execution of a DML query, table and column stats are invalidated 
> if auto-update of stats collection is not turned on. To keep stats updated we 
> need to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very 
> expensive. It is not feasible to run this command after every DML query.
> Instead, we can incrementally update the stats during each DML query run 
> itself. This way our table and column stats stay fresh at all times 
> and CBO benefits can be applied. Initially, we can update only table-level 
> stats and gradually start updating column-level stats as well.
> *Pros:*
> 1. Optimizes queries over tables that are updated frequently.
> 2. Saves compute cycles by removing the dependency on `ANALYZE TABLE COMPUTE 
> STATISTICS` for updating stats.
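The incremental update proposed above amounts to folding the write metrics a DML command already reports into the existing table-level stats. A minimal sketch; the types and names are illustrative, not Spark's actual catalog API:

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    """Table-level statistics as tracked by a CBO (illustrative)."""
    row_count: int
    size_in_bytes: int

def apply_insert_metrics(stats: TableStats, rows_written: int,
                         bytes_written: int) -> TableStats:
    # Fold the metrics the DML command already produced into the existing
    # stats, instead of rescanning the whole table with
    # ANALYZE TABLE COMPUTE STATISTICS.
    return TableStats(stats.row_count + rows_written,
                      stats.size_in_bytes + bytes_written)
```

Table-level counters compose this way cheaply; column-level stats (NDV, histograms) are harder to maintain incrementally, which is why the proposal defers them.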






[jira] [Updated] (SPARK-44817) Incremental Stats Collection

2023-08-15 Thread Rakesh Raushan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Raushan updated SPARK-44817:
---
Description: 
Spark's Cost-Based Optimizer depends on table and column statistics.

After every execution of a DML query, table and column stats are invalidated if 
auto-update of stats collection is not turned on. To keep stats updated we need 
to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very expensive. 
It is not feasible to run this command after every DML query.

Instead, we can incrementally update the stats during each DML query run 
itself. This way our table and column stats stay fresh at all times and 
CBO benefits can be applied. Initially, we can update only table-level stats 
and gradually start updating column-level stats as well.

*Pros:*

1. Optimizes queries over tables that are updated frequently.
2. Saves compute cycles by removing the dependency on `ANALYZE TABLE COMPUTE 
STATISTICS` for updating stats.


  was:
Spark's Cost-Based Optimizer depends on table and column statistics.

After every execution of a DML query, table and column stats are invalidated if 
auto-update of stats collection is not turned on. To keep stats updated we need 
to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very expensive. 
It is not feasible to run this command after every DML query.

Instead, we can incrementally update the stats during each DML query run 
itself. This way our table and column stats stay fresh at all times and 
CBO benefits can be applied.

*Pros:*

1. Optimizes queries over tables that are updated frequently.
2. Saves compute cycles by removing the dependency on `ANALYZE TABLE COMPUTE 
STATISTICS` for updating stats.



> Incremental Stats Collection
> 
>
> Key: SPARK-44817
> URL: https://issues.apache.org/jira/browse/SPARK-44817
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rakesh Raushan
>Priority: Major
>
> Spark's Cost-Based Optimizer depends on table and column statistics.
> After every execution of a DML query, table and column stats are invalidated 
> if auto-update of stats collection is not turned on. To keep stats updated we 
> need to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very 
> expensive. It is not feasible to run this command after every DML query.
> Instead, we can incrementally update the stats during each DML query run 
> itself. This way our table and column stats stay fresh at all times 
> and CBO benefits can be applied. Initially, we can update only table-level 
> stats and gradually start updating column-level stats as well.
> *Pros:*
> 1. Optimizes queries over tables that are updated frequently.
> 2. Saves compute cycles by removing the dependency on `ANALYZE TABLE COMPUTE 
> STATISTICS` for updating stats.






[jira] [Created] (SPARK-44817) Incremental Stats Collection

2023-08-15 Thread Rakesh Raushan (Jira)
Rakesh Raushan created SPARK-44817:
--

 Summary: Incremental Stats Collection
 Key: SPARK-44817
 URL: https://issues.apache.org/jira/browse/SPARK-44817
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Rakesh Raushan


Spark's Cost-Based Optimizer depends on table and column statistics.

After every execution of a DML query, table and column stats are invalidated if 
auto-update of stats collection is not turned on. To keep stats updated we need 
to run the `ANALYZE TABLE COMPUTE STATISTICS` command, which is very expensive. 
It is not feasible to run this command after every DML query.

Instead, we can incrementally update the stats during each DML query run 
itself. This way our table and column stats stay fresh at all times and 
CBO benefits can be applied.

*Pros:*

1. Optimizes queries over tables that are updated frequently.
2. Saves compute cycles by removing the dependency on `ANALYZE TABLE COMPUTE 
STATISTICS` for updating stats.







[jira] [Commented] (SPARK-44806) Separate connect-client-jvm-internal

2023-08-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754684#comment-17754684
 ] 

Hudson commented on SPARK-44806:


User 'juliuszsompolski' has created a pull request for this issue:
https://github.com/apache/spark/pull/42501

> Separate connect-client-jvm-internal
> 
>
> Key: SPARK-44806
> URL: https://issues.apache.org/jira/browse/SPARK-44806
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>







[jira] [Created] (SPARK-44816) Cryptic error message when UDF associated class is not found

2023-08-15 Thread Niranjan Jayakar (Jira)
Niranjan Jayakar created SPARK-44816:


 Summary: Cryptic error message when UDF associated class is not 
found
 Key: SPARK-44816
 URL: https://issues.apache.org/jira/browse/SPARK-44816
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Niranjan Jayakar


When a Dataset API that either requires or is modeled as a UDF is used, the 
class defining the UDF/function should first be uploaded to the service using 
the `addArtifact()` API.

When this is not done, an error is thrown. However, the error message is 
cryptic and does not make the problem clear.

Improve this error message to make it clear that an expected class was not 
found.






[jira] [Updated] (SPARK-44803) Replace `publish` with `publishOrSkip` in SparkBuild to eliminate warnings

2023-08-15 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-44803:

Summary: Replace `publish` with `publishOrSkip` in SparkBuild to eliminate 
warnings  (was: Replace `publishOrSkip` with `publish` in SparkBuild to 
eliminate warnings)

> Replace `publish` with `publishOrSkip` in SparkBuild to eliminate warnings
> --
>
> Key: SPARK-44803
> URL: https://issues.apache.org/jira/browse/SPARK-44803
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-44815) Cache Schema of DF

2023-08-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-44815:


 Summary: Cache Schema of DF
 Key: SPARK-44815
 URL: https://issues.apache.org/jira/browse/SPARK-44815
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Created] (SPARK-44814) Test to trigger protobuf 4.23.3 crash

2023-08-15 Thread Martin Grund (Jira)
Martin Grund created SPARK-44814:


 Summary: Test to trigger protobuf 4.23.3 crash
 Key: SPARK-44814
 URL: https://issues.apache.org/jira/browse/SPARK-44814
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund









[jira] [Assigned] (SPARK-44718) High On-heap memory usage is detected while doing parquet-file reading with Off-Heap memory mode enabled on spark

2023-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-44718:
---

Assignee: Zamil Majdy

> High On-heap memory usage is detected while doing parquet-file reading with 
> Off-Heap memory mode enabled on spark
> -
>
> Key: SPARK-44718
> URL: https://issues.apache.org/jira/browse/SPARK-44718
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1
>Reporter: Zamil Majdy
>Assignee: Zamil Majdy
>Priority: Major
> Fix For: 4.0.0
>
>
> I see high on-heap memory usage during parquet file reading when off-heap 
> memory mode is enabled. This happens because the memory mode for the 
> vectorized reader's column vectors is configured by a separate flag, whose 
> default value is always On-Heap.
> Conf to reproduce the issue:
> {{spark.memory.offHeap.size 100}}
> {{spark.memory.offHeap.enabled true}}
> Enabling these configurations alone will not switch the memory mode used by 
> the vectorized parquet reader to Off-Heap.
>  
> Proposed PR: https://github.com/apache/spark/pull/42394
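
The description above can be sketched as a `spark-defaults.conf` fragment. The extra flag shown is an assumption based on the report (a column-vector-specific off-heap switch), not a setting confirmed by this thread:

```properties
# Off-heap execution memory alone (the configuration that reproduces the issue):
spark.memory.offHeap.enabled            true
spark.memory.offHeap.size               100

# Assumed separate flag so that the vectorized parquet reader's
# column vectors also allocate off-heap:
spark.sql.columnVector.offheap.enabled  true
```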






[jira] [Resolved] (SPARK-44718) High On-heap memory usage is detected while doing parquet-file reading with Off-Heap memory mode enabled on spark

2023-08-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44718.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42394
[https://github.com/apache/spark/pull/42394]

> High On-heap memory usage is detected while doing parquet-file reading with 
> Off-Heap memory mode enabled on spark
> -
>
> Key: SPARK-44718
> URL: https://issues.apache.org/jira/browse/SPARK-44718
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1
>Reporter: Zamil Majdy
>Priority: Major
> Fix For: 4.0.0
>
>
> I see high on-heap memory usage during parquet file reading when off-heap 
> memory mode is enabled. This happens because the memory mode for the 
> vectorized reader's column vectors is configured by a separate flag, whose 
> default value is always On-Heap.
> Conf to reproduce the issue:
> {{spark.memory.offHeap.size 100}}
> {{spark.memory.offHeap.enabled true}}
> Enabling these configurations alone will not switch the memory mode used by 
> the vectorized parquet reader to Off-Heap.
>  
> Proposed PR: https://github.com/apache/spark/pull/42394






[jira] [Updated] (SPARK-44564) Refine the documents with LLM

2023-08-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44564:
--
Description: 
Let's first focus on the Documents of *PySpark DataFrame APIs*.

*1*, Choose a subset of DF APIs
Since the review bandwidth is limited, we recommend each PR contain at least 5 
APIs;

*2*, For each API, copy-paste the function (including function signature, doc 
string) to an LLM model, and ask it to refine it with a prompt (e.g. the 
attached prompt); you can of course use/design your own prompt.
For prompt engineering, you can refer to this [Best 
practices|https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api]
 

*3*, Note that the LLM is not 100% reliable; the generated doc string may still 
contain some mistakes, e.g.
* The example code cannot run
* The example results are incorrect
* The example code doesn't reflect the example title
* The description uses the wrong version, or adds a 'Raise' section for a 
non-existent exception
* The lint can be broken
* ...

We need to fix them before sending a PR.

We can try different prompts, choose the good parts, and combine them into the 
new doc string.

  was:
Let's first focus on the Documents of *PySpark DataFrame APIs*.

*1*, Choose a subset of DF APIs
Since the review bandwidth is limited, we recommend each PR contain at least 5 
APIs;

*2*, For each API, copy-paste the function (including function signature, doc 
string) to an LLM model, and ask it to refine it with a prompt (e.g. the 
attached prompt); you can of course use/design your own prompt.
For prompt engineering, you can refer to this [Best 
practices|https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api]
 

It is highly recommended to leverage *GPT-4* instead of GPT-3.5, since the 
former generates better results. 

*3*, Note that the LLM is not 100% reliable; the generated doc string may still 
contain some mistakes, e.g.
* The example code cannot run
* The example results are incorrect
* The example code doesn't reflect the example title
* The description uses the wrong version, or adds a 'Raise' section for a 
non-existent exception
* The lint can be broken
* ...

We need to fix them before sending a PR.

We can try different prompts, choose the good parts, and combine them into the 
new doc string.


> Refine the documents with LLM
> -
>
> Key: SPARK-44564
> URL: https://issues.apache.org/jira/browse/SPARK-44564
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
> Attachments: docstr_prompt.py
>
>
> Let's first focus on the Documents of *PySpark DataFrame APIs*.
> *1*, Choose a subset of DF APIs
> Since the review bandwidth is limited, we recommend each PR contain at least 
> 5 APIs;
> *2*, For each API, copy-paste the function (including function signature, doc 
> string) to an LLM model, and ask it to refine it with a prompt (e.g. the 
> attached prompt); you can of course use/design your own prompt.
> For prompt engineering, you can refer to this [Best 
> practices|https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api]
>  
> *3*, Note that the LLM is not 100% reliable; the generated doc string may 
> still contain some mistakes, e.g.
> * The example code cannot run
> * The example results are incorrect
> * The example code doesn't reflect the example title
> * The description uses the wrong version, or adds a 'Raise' section for a 
> non-existent exception
> * The lint can be broken
> * ...
> We need to fix them before sending a PR.
> We can try different prompts, choose the good parts, and combine them into 
> the new doc string.






[jira] [Commented] (SPARK-44782) Adjust Pull Request Template to incorporate the ASF Generative Tooling Guidance recommendations

2023-08-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754516#comment-17754516
 ] 

ASF GitHub Bot commented on SPARK-44782:


User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/42469

> Adjust Pull Request Template to incorporate the ASF Generative Tooling 
> Guidance recommendations
> ---
>
> Key: SPARK-44782
> URL: https://issues.apache.org/jira/browse/SPARK-44782
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.2, 3.4.1
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> The recently released [ASF Generative Tooling 
> Guidance|https://www.apache.org/legal/generative-tooling.html] recommends 
> keeping track of the generative AI tools used to author patches:
> ??When providing contributions authored using generative AI tooling, a 
> recommended practice is for contributors to indicate the tooling used to 
> create the contribution. This should be included as a token in the source 
> control commit message, for example including the phrase “Generated-by: ”. 
> This allows for future release tooling to be considered that pulls this 
> content into a machine parsable Tooling-Provenance file.??
> We should adjust PR template accordingly.






[jira] [Commented] (SPARK-44806) Separate connect-client-jvm-internal

2023-08-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754500#comment-17754500
 ] 

ASF GitHub Bot commented on SPARK-44806:


User 'juliuszsompolski' has created a pull request for this issue:
https://github.com/apache/spark/pull/42441

> Separate connect-client-jvm-internal
> 
>
> Key: SPARK-44806
> URL: https://issues.apache.org/jira/browse/SPARK-44806
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>







[jira] [Commented] (SPARK-44782) Adjust Pull Request Template to incorporate the ASF Generative Tooling Guidance recommendations

2023-08-15 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754482#comment-17754482
 ] 

Maciej Szymkiewicz commented on SPARK-44782:


Created a pull request for this issue:
https://github.com/apache/spark/pull/42469

> Adjust Pull Request Template to incorporate the ASF Generative Tooling 
> Guidance recommendations
> ---
>
> Key: SPARK-44782
> URL: https://issues.apache.org/jira/browse/SPARK-44782
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.2, 3.4.1
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> The recently released [ASF Generative Tooling 
> Guidance|https://www.apache.org/legal/generative-tooling.html] recommends 
> keeping track of the generative AI tools used to author patches:
> ??When providing contributions authored using generative AI tooling, a 
> recommended practice is for contributors to indicate the tooling used to 
> create the contribution. This should be included as a token in the source 
> control commit message, for example including the phrase “Generated-by: ”. 
> This allows for future release tooling to be considered that pulls this 
> content into a machine parsable Tooling-Provenance file.??
> We should adjust PR template accordingly.






[jira] [Created] (SPARK-44813) The JIRA Python misses our assignee when it searches user again

2023-08-15 Thread Kent Yao (Jira)
Kent Yao created SPARK-44813:


 Summary: The JIRA Python misses our assignee when it searches user 
again
 Key: SPARK-44813
 URL: https://issues.apache.org/jira/browse/SPARK-44813
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Kent Yao


{code:python}
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)

response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot 
be assigned issues."}}
{code}
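
A minimal sketch of the kind of user-matching guard the report suggests is missing: prefer an exact username match over the first fuzzy search hit. `pick_assignee` and the sample data are hypothetical illustrations, not the actual JIRA client API:

```python
def pick_assignee(matches, query):
    """Pick a user from JIRA user-search results, preferring an exact
    username match over the first fuzzy hit (which is how a search for
    'yao' could otherwise resolve to an unrelated user like 'airhot')."""
    for user in matches:
        if user["name"] == query:
            return user
    # Fall back to the first result, or None if the search was empty.
    return matches[0] if matches else None

# Hypothetical search results for the query "yao":
results = [{"name": "airhot"}, {"name": "yao"}]
print(pick_assignee(results, "yao")["name"])  # -> yao
```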
 

 

 






[jira] [Assigned] (SPARK-44801) SQL Page does not capture failed queries in analyzer

2023-08-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44801:


Assignee: Kent Yao  (was: Kent Yao 2)

> SQL Page does not capture failed queries in analyzer 
> -
>
> Key: SPARK-44801
> URL: https://issues.apache.org/jira/browse/SPARK-44801
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>







[jira] [Assigned] (SPARK-44801) SQL Page does not capture failed queries in analyzer

2023-08-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44801:


Assignee: (was: Kent Yao 2)

> SQL Page does not capture failed queries in analyzer 
> -
>
> Key: SPARK-44801
> URL: https://issues.apache.org/jira/browse/SPARK-44801
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: Kent Yao
>Priority: Major
>







[jira] [Assigned] (SPARK-44801) SQL Page does not capture failed queries in analyzer

2023-08-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44801:


Assignee: Kent Yao 2

> SQL Page does not capture failed queries in analyzer 
> -
>
> Key: SPARK-44801
> URL: https://issues.apache.org/jira/browse/SPARK-44801
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao 2
>Priority: Major
>







[jira] [Commented] (SPARK-44782) Adjust Pull Request Template to incorporate the ASF Generative Tooling Guidance recommendations

2023-08-15 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754429#comment-17754429
 ] 

Xiao Li commented on SPARK-44782:
-

+1 We should update the PR template. 

> Adjust Pull Request Template to incorporate the ASF Generative Tooling 
> Guidance recommendations
> ---
>
> Key: SPARK-44782
> URL: https://issues.apache.org/jira/browse/SPARK-44782
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.2, 3.4.1
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> The recently released [ASF Generative Tooling 
> Guidance|https://www.apache.org/legal/generative-tooling.html] recommends 
> keeping track of the generative AI tools used to author patches:
> ??When providing contributions authored using generative AI tooling, a 
> recommended practice is for contributors to indicate the tooling used to 
> create the contribution. This should be included as a token in the source 
> control commit message, for example including the phrase “Generated-by: ”. 
> This allows for future release tooling to be considered that pulls this 
> content into a machine parsable Tooling-Provenance file.??
> We should adjust PR template accordingly.


