[jira] [Created] (SPARK-41271) Parameterized SQL

2022-11-26 Thread Max Gekk (Jira)
Max Gekk created SPARK-41271:


 Summary: Parameterized SQL
 Key: SPARK-41271
 URL: https://issues.apache.org/jira/browse/SPARK-41271
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


Enhance the Spark SQL API with support for parameterized SQL statements to 
improve security and reusability. Application developers will be able to write 
SQL with parameter markers whose values will be passed separately from the SQL 
code and interpreted as literals. This will help prevent SQL injection attacks 
for applications that generate SQL based on a user’s selections, which is often 
done via a user interface.
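
To make the idea concrete, here is a minimal toy sketch in plain Scala. It is not Spark's implementation or API: the names `ParamSqlSketch`, `Param`, `bind`, and `filter` are all hypothetical, and the "query" is a hand-built expression tree rather than parsed SQL. It only shows why a value that is passed separately and bound as a literal cannot change the structure of the statement.

```scala
object ParamSqlSketch {
  // A tiny expression tree standing in for a parsed SQL predicate (hypothetical model).
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Param(name: String) extends Expr // a parameter marker, e.g. :name
  final case class Lit(value: Any) extends Expr

  // The only predicate this toy supports: left = right.
  final case class EqualTo(left: Expr, right: Expr)

  // Bind values to markers. Each value becomes a literal node; it is never parsed as SQL.
  def bind(pred: EqualTo, args: Map[String, Any]): EqualTo = {
    def sub(e: Expr): Expr = e match {
      case Param(n) =>
        Lit(args.getOrElse(n, throw new NoSuchElementException(s"unbound parameter: $n")))
      case other => other
    }
    EqualTo(sub(pred.left), sub(pred.right))
  }

  // Evaluate one side of the predicate against a row.
  private def eval(e: Expr, row: Map[String, Any]): Any = e match {
    case Col(n)   => row(n)
    case Lit(v)   => v
    case Param(n) => throw new IllegalStateException(s"unbound parameter: $n")
  }

  // Keep the rows for which both sides of the predicate are equal.
  def filter(rows: Seq[Map[String, Any]], pred: EqualTo): Seq[Map[String, Any]] =
    rows.filter(row => eval(pred.left, row) == eval(pred.right, row))

  val users: Seq[Map[String, Any]] =
    Seq(Map("name" -> "alice"), Map("name" -> "bob"))

  // Stands in for: SELECT * FROM users WHERE name = :name
  val template: EqualTo = EqualTo(Col("name"), Param("name"))

  // An ordinary value matches exactly one row.
  val honest = filter(users, bind(template, Map("name" -> "alice")))

  // An injection-style value is compared as one literal string, so it matches nothing.
  val attack = filter(users, bind(template, Map("name" -> "alice' OR '1'='1")))
}
```

If the same value were concatenated into the SQL text instead, the quote characters would become part of the statement and the `OR` clause would match every row.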



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41271) Parameterized SQL

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41271:


Assignee: Apache Spark  (was: Max Gekk)

> Parameterized SQL
> -
>
> Key: SPARK-41271
> URL: https://issues.apache.org/jira/browse/SPARK-41271
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Enhance the Spark SQL API with support for parameterized SQL statements to 
> improve security and reusability. Application developers will be able to 
> write SQL with parameter markers whose values will be passed separately from 
> the SQL code and interpreted as literals. This will help prevent SQL 
> injection attacks for applications that generate SQL based on a user’s 
> selections, which is often done via a user interface.






[jira] [Commented] (SPARK-41271) Parameterized SQL

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638836#comment-17638836
 ] 

Apache Spark commented on SPARK-41271:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38712

> Parameterized SQL
> -
>
> Key: SPARK-41271
> URL: https://issues.apache.org/jira/browse/SPARK-41271
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Enhance the Spark SQL API with support for parameterized SQL statements to 
> improve security and reusability. Application developers will be able to 
> write SQL with parameter markers whose values will be passed separately from 
> the SQL code and interpreted as literals. This will help prevent SQL 
> injection attacks for applications that generate SQL based on a user’s 
> selections, which is often done via a user interface.






[jira] [Assigned] (SPARK-41271) Parameterized SQL

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41271:


Assignee: Max Gekk  (was: Apache Spark)

> Parameterized SQL
> -
>
> Key: SPARK-41271
> URL: https://issues.apache.org/jira/browse/SPARK-41271
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Enhance the Spark SQL API with support for parameterized SQL statements to 
> improve security and reusability. Application developers will be able to 
> write SQL with parameter markers whose values will be passed separately from 
> the SQL code and interpreted as literals. This will help prevent SQL 
> injection attacks for applications that generate SQL based on a user’s 
> selections, which is often done via a user interface.






[jira] [Created] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019

2022-11-26 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-41272:
---

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2019
 Key: SPARK-41272
 URL: https://issues.apache.org/jira/browse/SPARK-41272
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: BingKun Pan









[jira] [Assigned] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41272:


Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2019
> 
>
> Key: SPARK-41272
> URL: https://issues.apache.org/jira/browse/SPARK-41272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638876#comment-17638876
 ] 

Apache Spark commented on SPARK-41272:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38808

> Assign a name to the error class _LEGACY_ERROR_TEMP_2019
> 
>
> Key: SPARK-41272
> URL: https://issues.apache.org/jira/browse/SPARK-41272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638877#comment-17638877
 ] 

Apache Spark commented on SPARK-41272:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38808

> Assign a name to the error class _LEGACY_ERROR_TEMP_2019
> 
>
> Key: SPARK-41272
> URL: https://issues.apache.org/jira/browse/SPARK-41272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41272:


Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_2019
> 
>
> Key: SPARK-41272
> URL: https://issues.apache.org/jira/browse/SPARK-41272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-41273) Update plugins to latest versions

2022-11-26 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-41273:
---

 Summary: Update plugins to latest versions
 Key: SPARK-41273
 URL: https://issues.apache.org/jira/browse/SPARK-41273
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-41273) Update plugins to latest versions

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639021#comment-17639021
 ] 

Apache Spark commented on SPARK-41273:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38809

> Update plugins to latest versions
> -
>
> Key: SPARK-41273
> URL: https://issues.apache.org/jira/browse/SPARK-41273
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-41273) Update plugins to latest versions

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41273:


Assignee: Apache Spark

> Update plugins to latest versions
> -
>
> Key: SPARK-41273
> URL: https://issues.apache.org/jira/browse/SPARK-41273
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-41273) Update plugins to latest versions

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41273:


Assignee: (was: Apache Spark)

> Update plugins to latest versions
> -
>
> Key: SPARK-41273
> URL: https://issues.apache.org/jira/browse/SPARK-41273
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41273) Update plugins to latest versions

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639022#comment-17639022
 ] 

Apache Spark commented on SPARK-41273:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38809

> Update plugins to latest versions
> -
>
> Key: SPARK-41273
> URL: https://issues.apache.org/jira/browse/SPARK-41273
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0

2022-11-26 Thread Ted Yu (Jira)
Ted Yu created SPARK-41274:
--

 Summary: Bump Kubernetes Client Version to 6.2.0
 Key: SPARK-41274
 URL: https://issues.apache.org/jira/browse/SPARK-41274
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Ted Yu


Bump Kubernetes Client Version to 6.2.0






[jira] [Resolved] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0

2022-11-26 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-41274.

Resolution: Duplicate

This is a duplicate of commit 02a2242a45062755bf7e20805958d5bdf1f5ed74.

> Bump Kubernetes Client Version to 6.2.0
> ---
>
> Key: SPARK-41274
> URL: https://issues.apache.org/jira/browse/SPARK-41274
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Bump Kubernetes Client Version to 6.2.0






[jira] [Updated] (SPARK-40872) Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40872:
-
Fix Version/s: 3.3.2

> Fallback to original shuffle block when a push-merged shuffle chunk is 
> zero-size
> 
>
> Key: SPARK-40872
> URL: https://issues.apache.org/jira/browse/SPARK-40872
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.3.0, 3.2.2
>Reporter: gaoyajun02
>Assignee: gaoyajun02
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> A large number of shuffle tests in our cluster show that fetches from bad 
> nodes with chunk corruption sometimes return zero-size shuffleChunks. In 
> this case, we can fall back to the original shuffle blocks.






[jira] [Assigned] (SPARK-41261) applyInPandasWithState can produce incorrect key value in user function for timed out state

2022-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41261:


Assignee: Jungtaek Lim

> applyInPandasWithState can produce incorrect key value in user function for 
> timed out state
> ---
>
> Key: SPARK-41261
> URL: https://issues.apache.org/jira/browse/SPARK-41261
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> We observed that the user function receives an incorrect key for timed-out 
> state. After root cause analysis, we found that this can happen when the 
> grouping key columns are not placed sequentially as the leading columns.






[jira] [Resolved] (SPARK-41261) applyInPandasWithState can produce incorrect key value in user function for timed out state

2022-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41261.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38798
[https://github.com/apache/spark/pull/38798]

> applyInPandasWithState can produce incorrect key value in user function for 
> timed out state
> ---
>
> Key: SPARK-41261
> URL: https://issues.apache.org/jira/browse/SPARK-41261
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.4.0
>
>
> We observed that the user function receives an incorrect key for timed-out 
> state. After root cause analysis, we found that this can happen when the 
> grouping key columns are not placed sequentially as the leading columns.






[jira] [Resolved] (SPARK-41267) Add unpivot / melt to SparkR

2022-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41267.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38804
[https://github.com/apache/spark/pull/38804]

> Add unpivot / melt to SparkR
> 
>
> Key: SPARK-41267
> URL: https://issues.apache.org/jira/browse/SPARK-41267
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.4.0
>
>
> Unpivot / melt operations have been implemented for Scala {{Dataset}} and 
> core Python {{DataFrame}}, but are missing from SparkR. We should add 
> these to achieve feature parity.






[jira] [Assigned] (SPARK-41267) Add unpivot / melt to SparkR

2022-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41267:


Assignee: Maciej Szymkiewicz

> Add unpivot / melt to SparkR
> 
>
> Key: SPARK-41267
> URL: https://issues.apache.org/jira/browse/SPARK-41267
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SQL
>Affects Versions: 3.4.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
>
> Unpivot / melt operations have been implemented for Scala {{Dataset}} and 
> core Python {{DataFrame}}, but are missing from SparkR. We should add 
> these to achieve feature parity.






[jira] [Created] (SPARK-41275) Upgrade pickle to 1.3

2022-11-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-41275:


 Summary: Upgrade pickle to 1.3
 Key: SPARK-41275
 URL: https://issues.apache.org/jira/browse/SPARK-41275
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Assigned] (SPARK-41275) Upgrade pickle to 1.3

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41275:


Assignee: (was: Apache Spark)

> Upgrade pickle to 1.3
> -
>
> Key: SPARK-41275
> URL: https://issues.apache.org/jira/browse/SPARK-41275
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-41275) Upgrade pickle to 1.3

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639577#comment-17639577
 ] 

Apache Spark commented on SPARK-41275:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38810

> Upgrade pickle to 1.3
> -
>
> Key: SPARK-41275
> URL: https://issues.apache.org/jira/browse/SPARK-41275
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41275) Upgrade pickle to 1.3

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41275:


Assignee: Apache Spark

> Upgrade pickle to 1.3
> -
>
> Key: SPARK-41275
> URL: https://issues.apache.org/jira/browse/SPARK-41275
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-41275) Upgrade pickle to 1.3

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639578#comment-17639578
 ] 

Apache Spark commented on SPARK-41275:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38810

> Upgrade pickle to 1.3
> -
>
> Key: SPARK-41275
> URL: https://issues.apache.org/jira/browse/SPARK-41275
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-41276) Optimize constructor use of `StructType`

2022-11-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-41276:


 Summary: Optimize constructor use of `StructType`
 Key: SPARK-41276
 URL: https://issues.apache.org/jira/browse/SPARK-41276
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, SQL
Affects Versions: 3.4.0
Reporter: Yang Jie


There are two main ways to construct `StructType`:

- Primary constructor

```scala
case class StructType(fields: Array[StructField])
```

- Constructor that takes a `Seq` as input

```scala
def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
```

Both construction methods are widely used in Spark, but the latter 
requires an additional collection conversion.

This PR changes the following three scenarios to use the primary constructor, 
saving one collection conversion:

1. Where the input `Seq` is created manually, create an `Array` manually 
instead, for example:

https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63

2. Where `toSeq` was added to create the input for compatibility with 
Scala 2.13, call `toArray` directly instead, for example:

https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113

3. Where the input is already an `Array`, remove the redundant `toSeq`, 
for example:

https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
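
The three scenarios above can be sketched with simplified stand-ins. This is a hedged illustration, not the PR's actual diff: `StructTypeSketch` is a hypothetical wrapper, and the `StructField` and `StructType` defined inside it are reduced models of Spark's classes (the real ones have more members, and `dataType` is not a `String`). Each `before` value goes through the `Seq` factory and its `toArray` conversion; each `after` value calls the primary constructor directly.

```scala
object StructTypeSketch {
  // Simplified stand-ins for Spark's classes (assumption: reduced to what the example needs).
  final case class StructField(name: String, dataType: String)

  // Primary constructor: stores the given array as-is.
  final case class StructType(fields: Array[StructField])

  object StructType {
    // Seq-based factory: converts to an array before delegating to the primary constructor.
    def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
  }

  val id = StructField("id", "long")
  val name = StructField("name", "string")

  // Scenario 1: build an Array literal instead of a Seq literal.
  val before1 = StructType(Seq(id, name))
  val after1 = StructType(Array(id, name))

  // Scenario 2: a collection that called `toSeq` can call `toArray` directly.
  val buffer = scala.collection.mutable.ArrayBuffer(id, name)
  val before2 = StructType(buffer.toSeq)
  val after2 = StructType(buffer.toArray)

  // Scenario 3: the input is already an Array, so `toSeq` is redundant.
  val existing: Array[StructField] = Array(id, name)
  val before3 = StructType(existing.toSeq)
  val after3 = StructType(existing)
}
```

In every pair the resulting fields are the same; only the intermediate conversion differs. In scenario 3, `after3.fields` is the very array that was passed in.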






[jira] [Commented] (SPARK-41276) Optimize constructor use of `StructType`

2022-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639586#comment-17639586
 ] 

Apache Spark commented on SPARK-41276:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38811

> Optimize constructor use of `StructType`
> 
>
> Key: SPARK-41276
> URL: https://issues.apache.org/jira/browse/SPARK-41276
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are two main ways to construct `StructType`:
> - Primary constructor
> ```scala
> case class StructType(fields: Array[StructField])
> ```
> - Constructor that takes a `Seq` as input
> ```scala
> def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
> ```
> Both construction methods are widely used in Spark, but the latter 
> requires an additional collection conversion.
> This PR changes the following three scenarios to use the primary 
> constructor, saving one collection conversion:
> 1. Where the input `Seq` is created manually, create an `Array` manually 
> instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
> 2. Where `toSeq` was added to create the input for compatibility with 
> Scala 2.13, call `toArray` directly instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
> 3. Where the input is already an `Array`, remove the redundant `toSeq`, 
> for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592






[jira] [Assigned] (SPARK-41276) Optimize constructor use of `StructType`

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41276:


Assignee: Apache Spark

> Optimize constructor use of `StructType`
> 
>
> Key: SPARK-41276
> URL: https://issues.apache.org/jira/browse/SPARK-41276
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> There are two main ways to construct `StructType`:
> - Primary constructor
> ```scala
> case class StructType(fields: Array[StructField])
> ```
> - Constructor that takes a `Seq` as input
> ```scala
> def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
> ```
> Both construction methods are widely used in Spark, but the latter 
> requires an additional collection conversion.
> This PR changes the following three scenarios to use the primary 
> constructor, saving one collection conversion:
> 1. Where the input `Seq` is created manually, create an `Array` manually 
> instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
> 2. Where `toSeq` was added to create the input for compatibility with 
> Scala 2.13, call `toArray` directly instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
> 3. Where the input is already an `Array`, remove the redundant `toSeq`, 
> for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592






[jira] [Assigned] (SPARK-41276) Optimize constructor use of `StructType`

2022-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41276:


Assignee: (was: Apache Spark)

> Optimize constructor use of `StructType`
> 
>
> Key: SPARK-41276
> URL: https://issues.apache.org/jira/browse/SPARK-41276
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are two main ways to construct `StructType`:
> - Primary constructor
> ```scala
> case class StructType(fields: Array[StructField])
> ```
> - Constructor that takes a `Seq` as input
> ```scala
> def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
> ```
> Both construction methods are widely used in Spark, but the latter 
> requires an additional collection conversion.
> This PR changes the following three scenarios to use the primary 
> constructor, saving one collection conversion:
> 1. Where the input `Seq` is created manually, create an `Array` manually 
> instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
> 2. Where `toSeq` was added to create the input for compatibility with 
> Scala 2.13, call `toArray` directly instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
> 3. Where the input is already an `Array`, remove the redundant `toSeq`, 
> for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592


