[jira] [Created] (SPARK-41271) Parameterized SQL
Max Gekk created SPARK-41271: Summary: Parameterized SQL Key: SPARK-41271 URL: https://issues.apache.org/jira/browse/SPARK-41271 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assignee: Max Gekk Enhance the Spark SQL API with support for parameterized SQL statements to improve security and reusability. Application developers will be able to write SQL with parameter markers whose values will be passed separately from the SQL code and interpreted as literals. This will help prevent SQL injection attacks for applications that generate SQL based on a user’s selections, which is often done via a user interface. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
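The injection-prevention argument above can be sketched in a few lines. This is not the Spark API proposed in this ticket; it is a minimal illustration of parameter markers using Python's stdlib sqlite3, showing why values passed separately from the SQL text are always treated as literals rather than as SQL code.

```python
# Illustration only (sqlite3, not Spark): parameter markers vs. string
# interpolation when SQL is built from a user's selection.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

# A hostile "selection" that breaks out of a concatenated string literal.
user_input = "alice' OR '1'='1"

# Unsafe: the input is interpolated into the SQL text and becomes SQL code.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % user_input
).fetchall()

# Safe: the ? marker binds the value separately, so it stays a literal.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(unsafe)  # both rows: the injection widened the WHERE predicate
print(safe)    # no rows: the whole hostile string is matched literally
```

The same separation of SQL text and values is what parameterized `sql()` calls would bring to applications that generate Spark SQL from UI selections.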
[jira] [Assigned] (SPARK-41271) Parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-41271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41271: Assignee: Apache Spark (was: Max Gekk)
[jira] [Commented] (SPARK-41271) Parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-41271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638836#comment-17638836 ] Apache Spark commented on SPARK-41271: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38712
[jira] [Assigned] (SPARK-41271) Parameterized SQL
[ https://issues.apache.org/jira/browse/SPARK-41271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41271: Assignee: Max Gekk (was: Apache Spark)
[jira] [Created] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019
BingKun Pan created SPARK-41272: --- Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2019 Key: SPARK-41272 URL: https://issues.apache.org/jira/browse/SPARK-41272 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: BingKun Pan
[jira] [Assigned] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019
[ https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41272: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019
[ https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638876#comment-17638876 ] Apache Spark commented on SPARK-41272: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38808
[jira] [Assigned] (SPARK-41272) Assign a name to the error class _LEGACY_ERROR_TEMP_2019
[ https://issues.apache.org/jira/browse/SPARK-41272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41272: Assignee: Apache Spark
[jira] [Created] (SPARK-41273) Update plugins to latest versions
BingKun Pan created SPARK-41273: --- Summary: Update plugins to latest versions Key: SPARK-41273 URL: https://issues.apache.org/jira/browse/SPARK-41273 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: BingKun Pan
[jira] [Commented] (SPARK-41273) Update plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639021#comment-17639021 ] Apache Spark commented on SPARK-41273: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38809
[jira] [Assigned] (SPARK-41273) Update plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41273: Assignee: Apache Spark
[jira] [Assigned] (SPARK-41273) Update plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-41273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41273: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0
Ted Yu created SPARK-41274: -- Summary: Bump Kubernetes Client Version to 6.2.0 Key: SPARK-41274 URL: https://issues.apache.org/jira/browse/SPARK-41274 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.4.0 Reporter: Ted Yu
[jira] [Resolved] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0
[ https://issues.apache.org/jira/browse/SPARK-41274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-41274. Resolution: Duplicate This is a duplicate of commit 02a2242a45062755bf7e20805958d5bdf1f5ed74
[jira] [Updated] (SPARK-40872) Fallback to original shuffle block when a push-merged shuffle chunk is zero-size
[ https://issues.apache.org/jira/browse/SPARK-40872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40872: - Fix Version/s: 3.3.2 > Fallback to original shuffle block when a push-merged shuffle chunk is zero-size > > Key: SPARK-40872 > URL: https://issues.apache.org/jira/browse/SPARK-40872 > Project: Spark > Issue Type: Sub-task > Components: Shuffle > Affects Versions: 3.3.0, 3.2.2 > Reporter: gaoyajun02 > Assignee: gaoyajun02 > Priority: Major > Fix For: 3.4.0, 3.3.2 > > A large number of shuffle tests in our cluster show that bad nodes with chunk corruption can return zero-size shuffle chunks. In this case, we can fall back to the original shuffle blocks.
[jira] [Assigned] (SPARK-41261) applyInPandasWithState can produce incorrect key value in user function for timed out state
[ https://issues.apache.org/jira/browse/SPARK-41261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41261: Assignee: Jungtaek Lim > applyInPandasWithState can produce incorrect key value in user function for timed out state > > Key: SPARK-41261 > URL: https://issues.apache.org/jira/browse/SPARK-41261 > Project: Spark > Issue Type: Bug > Components: Structured Streaming > Affects Versions: 3.4.0 > Reporter: Jungtaek Lim > Assignee: Jungtaek Lim > Priority: Major > > We observed that the user function receives an incorrect key for timed-out state. After root-cause analysis, we found this can happen when the grouping key columns are not placed sequentially at the start of the schema.
[jira] [Resolved] (SPARK-41261) applyInPandasWithState can produce incorrect key value in user function for timed out state
[ https://issues.apache.org/jira/browse/SPARK-41261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41261. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38798 [https://github.com/apache/spark/pull/38798]
[jira] [Resolved] (SPARK-41267) Add unpivot / melt to SparkR
[ https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41267. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38804 [https://github.com/apache/spark/pull/38804] > Add unpivot / melt to SparkR > > Key: SPARK-41267 > URL: https://issues.apache.org/jira/browse/SPARK-41267 > Project: Spark > Issue Type: Improvement > Components: R, SQL > Affects Versions: 3.4.0 > Reporter: Maciej Szymkiewicz > Assignee: Maciej Szymkiewicz > Priority: Major > Fix For: 3.4.0 > > Unpivot / melt operations have been implemented for the Scala {{Dataset}} and core Python {{DataFrame}}, but are missing from SparkR. We should add these to achieve feature parity.
[jira] [Assigned] (SPARK-41267) Add unpivot / melt to SparkR
[ https://issues.apache.org/jira/browse/SPARK-41267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41267: Assignee: Maciej Szymkiewicz
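For readers unfamiliar with the operation being ported to SparkR, unpivot/melt turns wide rows (one column per measurement) into long rows of (id, variable, value) triples. The following is a pure-Python sketch of those semantics only; it is not the SparkR, Scala, or pandas API, and the function and column names are illustrative.

```python
# Sketch of unpivot/melt semantics: each value column in a wide row
# becomes its own long row carrying the id columns plus (variable, value).
def melt(rows, id_cols, value_cols, var_name="variable", value_name="value"):
    out = []
    for row in rows:
        for col in value_cols:
            melted = {c: row[c] for c in id_cols}  # keep identifier columns
            melted[var_name] = col                 # which column this was
            melted[value_name] = row[col]          # its value
            out.append(melted)
    return out

wide = [{"id": 1, "q1": 10, "q2": 20},
        {"id": 2, "q1": 30, "q2": 40}]
long = melt(wide, id_cols=["id"], value_cols=["q1", "q2"])
print(long[0])  # {'id': 1, 'variable': 'q1', 'value': 10}
```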
[jira] [Created] (SPARK-41275) Upgrade pickle to 1.3
Yang Jie created SPARK-41275: Summary: Upgrade pickle to 1.3 Key: SPARK-41275 URL: https://issues.apache.org/jira/browse/SPARK-41275 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie
[jira] [Assigned] (SPARK-41275) Upgrade pickle to 1.3
[ https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41275: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-41275) Upgrade pickle to 1.3
[ https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639577#comment-17639577 ] Apache Spark commented on SPARK-41275: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38810
[jira] [Assigned] (SPARK-41275) Upgrade pickle to 1.3
[ https://issues.apache.org/jira/browse/SPARK-41275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41275: Assignee: Apache Spark
[jira] [Created] (SPARK-41276) Optimize constructor use of `StructType`
Yang Jie created SPARK-41276: Summary: Optimize constructor use of `StructType` Key: SPARK-41276 URL: https://issues.apache.org/jira/browse/SPARK-41276 Project: Spark Issue Type: Improvement Components: MLlib, SQL Affects Versions: 3.4.0 Reporter: Yang Jie

There are two main ways to construct `StructType`:

- Primary constructor

```scala
case class StructType(fields: Array[StructField])
```

- Constructor taking a `Seq` as input

```scala
def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
```

Both construction methods are widely used in Spark, but the latter requires an additional collection conversion. This PR changes the following 3 scenarios to use the primary constructor, saving one collection conversion each:

1. For code that manually creates a `Seq` input, manually create an `Array` instead, for example: https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
2. Where `toSeq` was added to the input for Scala 2.13 compatibility, call `toArray` directly instead, for example: https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
3. Where the input is already an `Array`, remove the redundant `toSeq`, for example: https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
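The cost the ticket above is eliminating can be seen in a language-neutral analogue. This Python sketch (illustration only, not Spark code; the class and method names are hypothetical) mirrors the two paths: a primary constructor that stores the backing collection as-is, and a `Seq`-style helper that must convert first.

```python
# Analogue of StructType's two construction paths.
class StructType:
    def __init__(self, fields):
        # Primary constructor: stores the given collection directly, no copy.
        self.fields = fields

    @classmethod
    def from_seq(cls, fields):
        # Seq-style constructor: materializes/copies the input first,
        # the extra collection conversion the PR avoids.
        return cls(list(fields))

fields = ["a", "b"]
direct = StructType(fields)                    # no conversion
converted = StructType.from_seq(iter(fields))  # one extra pass over the input

print(direct.fields is fields)  # True: the primary path reuses the collection
```

Calling the primary constructor with the right collection type up front therefore removes one allocation and traversal per schema construction.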
[jira] [Commented] (SPARK-41276) Optimize constructor use of `StructType`
[ https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639586#comment-17639586 ] Apache Spark commented on SPARK-41276: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38811
[jira] [Assigned] (SPARK-41276) Optimize constructor use of `StructType`
[ https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41276: Assignee: Apache Spark
[jira] [Assigned] (SPARK-41276) Optimize constructor use of `StructType`
[ https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41276: Assignee: (was: Apache Spark)