[jira] [Assigned] (SPARK-42870) Move `toCatalystValue` to connect-common

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42870:


Assignee: Apache Spark

> Move `toCatalystValue` to connect-common
> 
>
> Key: SPARK-42870
> URL: https://issues.apache.org/jira/browse/SPARK-42870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42868) Support eliminate sorts in AQE Optimizer

2023-03-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702545#comment-17702545
 ] 

Apache Spark commented on SPARK-42868:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40484

> Support eliminate sorts in AQE Optimizer
> 
>
> Key: SPARK-42868
> URL: https://issues.apache.org/jira/browse/SPARK-42868
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42868) Support eliminate sorts in AQE Optimizer

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42868:


Assignee: Apache Spark

> Support eliminate sorts in AQE Optimizer
> 
>
> Key: SPARK-42868
> URL: https://issues.apache.org/jira/browse/SPARK-42868
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42868) Support eliminate sorts in AQE Optimizer

2023-03-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42868:


Assignee: (was: Apache Spark)

> Support eliminate sorts in AQE Optimizer
> 
>
> Key: SPARK-42868
> URL: https://issues.apache.org/jira/browse/SPARK-42868
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702453#comment-17702453
 ] 

Apache Spark commented on SPARK-42809:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40482

> Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
> --
>
> Key: SPARK-42809
> URL: https://issues.apache.org/jira/browse/SPARK-42809
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-42827) Support `functions#array_prepend`

2023-03-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42827:


Assignee: (was: Apache Spark)

> Support `functions#array_prepend`
> -
>
> Key: SPARK-42827
> URL: https://issues.apache.org/jira/browse/SPARK-42827
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> Wait for SPARK-41233
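> A usage sketch of the expected Scala API (hypothetical until this lands;
> the signature is assumed to mirror {{array_append}}):
> {code:java}
> import org.apache.spark.sql.functions._
> 
> // Hypothetical once functions#array_prepend is added: prepend one element
> // to every array in the column.
> val df = spark.range(1).select(array(lit(2), lit(3)).as("arr"))
> df.select(array_prepend(col("arr"), lit(1))).show()
> // expected result: [1, 2, 3]
> {code}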






[jira] [Assigned] (SPARK-42827) Support `functions#array_prepend`

2023-03-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42827:


Assignee: Apache Spark

> Support `functions#array_prepend`
> -
>
> Key: SPARK-42827
> URL: https://issues.apache.org/jira/browse/SPARK-42827
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> Wait for SPARK-41233






[jira] [Commented] (SPARK-42827) Support `functions#array_prepend`

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702398#comment-17702398
 ] 

Apache Spark commented on SPARK-42827:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40481

> Support `functions#array_prepend`
> -
>
> Key: SPARK-42827
> URL: https://issues.apache.org/jira/browse/SPARK-42827
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> Wait for SPARK-41233






[jira] [Commented] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702396#comment-17702396
 ] 

Apache Spark commented on SPARK-42508:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40480

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702351#comment-17702351
 ] 

Apache Spark commented on SPARK-42779:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40478

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.5.0
>
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.
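> A minimal sketch of how a V2 write could surface the hint (assumption: the
> {{RequiresDistributionAndOrdering}} interface gains an
> {{advisoryPartitionSizeInBytes()}} method that falls back to the session
> default, alongside the existing {{requiredNumPartitions()}}):
> {code:java}
> import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
> import org.apache.spark.sql.connector.expressions.{Expression, Expressions, SortOrder}
> import org.apache.spark.sql.connector.write.{RequiresDistributionAndOrdering, Write}
> 
> class CompactingWrite extends Write with RequiresDistributionAndOrdering {
>   // Cluster rows by the "bucket" column before they are written out.
>   override def requiredDistribution(): Distribution =
>     Distributions.clustered(Array[Expression](Expressions.identity("bucket")))
> 
>   // No sort order required within partitions.
>   override def requiredOrdering(): Array[SortOrder] = Array.empty
> 
>   // Hypothetical hint: target ~256 MB shuffle partitions instead of the
>   // 64 MB session default, to compensate for columnar compression.
>   override def advisoryPartitionSizeInBytes(): Long = 256L * 1024 * 1024
> }
> {code}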






[jira] [Assigned] (SPARK-42805) 'Conflicting attributes' exception is thrown when joining checkpointed dataframe

2023-03-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42805:


Assignee: (was: Apache Spark)

> 'Conflicting attributes' exception is thrown when joining checkpointed 
> dataframe
> 
>
> Key: SPARK-42805
> URL: https://issues.apache.org/jira/browse/SPARK-42805
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.2
>Reporter: Maciej Smolenski
>Priority: Major
>
> Performing a join using a checkpointed dataframe leads to an error in the 
> prepared execution plan because column ids/names in the plan are not unique.
> This issue can be reproduced with this simple code (fails on 3.3.2, succeeds 
> on 3.1.2):
> {code:java}
> import spark.implicits._
> spark.sparkContext.setCheckpointDir("file:///tmp/cdir")
> val df = spark.range(10).toDF("id")
> val cdf = df.checkpoint()
> cdf.join(df) // org.apache.spark.sql.AnalysisException thrown on 3.3.2  {code}
>  
> The failure message is:
> {noformat}
> org.apache.spark.sql.AnalysisException:
> Failure when resolving conflicting references in Join:
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))Conflicting attributes: id#2L
> ;
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:540)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:97)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
>   at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3887)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:920)
>   ... 49 elided
> {noformat}






[jira] [Commented] (SPARK-42805) 'Conflicting attributes' exception is thrown when joining checkpointed dataframe

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702306#comment-17702306
 ] 

Apache Spark commented on SPARK-42805:
--

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/40477

> 'Conflicting attributes' exception is thrown when joining checkpointed 
> dataframe
> 
>
> Key: SPARK-42805
> URL: https://issues.apache.org/jira/browse/SPARK-42805
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.2
>Reporter: Maciej Smolenski
>Priority: Major
>
> Performing a join using a checkpointed dataframe leads to an error in the 
> prepared execution plan because column ids/names in the plan are not unique.
> This issue can be reproduced with this simple code (fails on 3.3.2, succeeds 
> on 3.1.2):
> {code:java}
> import spark.implicits._
> spark.sparkContext.setCheckpointDir("file:///tmp/cdir")
> val df = spark.range(10).toDF("id")
> val cdf = df.checkpoint()
> cdf.join(df) // org.apache.spark.sql.AnalysisException thrown on 3.3.2  {code}
>  
> The failure message is:
> {noformat}
> org.apache.spark.sql.AnalysisException:
> Failure when resolving conflicting references in Join:
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))Conflicting attributes: id#2L
> ;
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:540)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:97)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
>   at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3887)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:920)
>   ... 49 elided
> {noformat}






[jira] [Commented] (SPARK-42805) 'Conflicting attributes' exception is thrown when joining checkpointed dataframe

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702305#comment-17702305
 ] 

Apache Spark commented on SPARK-42805:
--

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/40477

> 'Conflicting attributes' exception is thrown when joining checkpointed 
> dataframe
> 
>
> Key: SPARK-42805
> URL: https://issues.apache.org/jira/browse/SPARK-42805
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.2
>Reporter: Maciej Smolenski
>Priority: Major
>
> Performing a join using a checkpointed dataframe leads to an error in the 
> prepared execution plan because column ids/names in the plan are not unique.
> This issue can be reproduced with this simple code (fails on 3.3.2, succeeds 
> on 3.1.2):
> {code:java}
> import spark.implicits._
> spark.sparkContext.setCheckpointDir("file:///tmp/cdir")
> val df = spark.range(10).toDF("id")
> val cdf = df.checkpoint()
> cdf.join(df) // org.apache.spark.sql.AnalysisException thrown on 3.3.2  {code}
>  
> The failure message is:
> {noformat}
> org.apache.spark.sql.AnalysisException:
> Failure when resolving conflicting references in Join:
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))Conflicting attributes: id#2L
> ;
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:540)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:97)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
>   at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3887)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:920)
>   ... 49 elided
> {noformat}






[jira] [Assigned] (SPARK-42805) 'Conflicting attributes' exception is thrown when joining checkpointed dataframe

2023-03-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42805:


Assignee: Apache Spark

> 'Conflicting attributes' exception is thrown when joining checkpointed 
> dataframe
> 
>
> Key: SPARK-42805
> URL: https://issues.apache.org/jira/browse/SPARK-42805
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.2
>Reporter: Maciej Smolenski
>Assignee: Apache Spark
>Priority: Major
>
> Performing a join using a checkpointed dataframe leads to an error in the 
> prepared execution plan because column ids/names in the plan are not unique.
> This issue can be reproduced with this simple code (fails on 3.3.2, succeeds 
> on 3.1.2):
> {code:java}
> import spark.implicits._
> spark.sparkContext.setCheckpointDir("file:///tmp/cdir")
> val df = spark.range(10).toDF("id")
> val cdf = df.checkpoint()
> cdf.join(df) // org.apache.spark.sql.AnalysisException thrown on 3.3.2  {code}
>  
> The failure message is:
> {noformat}
> org.apache.spark.sql.AnalysisException:
> Failure when resolving conflicting references in Join:
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))Conflicting attributes: id#2L
> ;
> 'Join Inner
> :- LogicalRDD [id#2L], false
> +- Project [id#0L AS id#2L]
>    +- Range (0, 10, step=1, splits=Some(16))  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:540)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:102)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:97)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
>   at 
> org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
>   at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3887)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:920)
>   ... 49 elided
> {noformat}






[jira] [Assigned] (SPARK-42853) Update the Spark Doc to match the new website style

2023-03-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42853:


Assignee: (was: Apache Spark)

> Update the Spark Doc to match the new website style
> ---
>
> Key: SPARK-42853
> URL: https://issues.apache.org/jira/browse/SPARK-42853
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Assigned] (SPARK-42853) Update the Spark Doc to match the new website style

2023-03-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42853:


Assignee: Apache Spark

> Update the Spark Doc to match the new website style
> ---
>
> Key: SPARK-42853
> URL: https://issues.apache.org/jira/browse/SPARK-42853
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42853) Update the Spark Doc to match the new website style

2023-03-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702243#comment-17702243
 ] 

Apache Spark commented on SPARK-42853:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40269

> Update the Spark Doc to match the new website style
> ---
>
> Key: SPARK-42853
> URL: https://issues.apache.org/jira/browse/SPARK-42853
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Commented] (SPARK-42852) Revert NamedLambdaVariable related changes from EquivalentExpressions

2023-03-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702139#comment-17702139
 ] 

Apache Spark commented on SPARK-42852:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40475

> Revert NamedLambdaVariable related changes from EquivalentExpressions
> -
>
> Key: SPARK-42852
> URL: https://issues.apache.org/jira/browse/SPARK-42852
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Peter Toth
>Priority: Major
>
> See discussion 
> https://github.com/apache/spark/pull/40473#issuecomment-1474848224






[jira] [Assigned] (SPARK-42852) Revert NamedLambdaVariable related changes from EquivalentExpressions

2023-03-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42852:


Assignee: Apache Spark

> Revert NamedLambdaVariable related changes from EquivalentExpressions
> -
>
> Key: SPARK-42852
> URL: https://issues.apache.org/jira/browse/SPARK-42852
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Peter Toth
>Assignee: Apache Spark
>Priority: Major
>
> See discussion 
> https://github.com/apache/spark/pull/40473#issuecomment-1474848224






[jira] [Assigned] (SPARK-42852) Revert NamedLambdaVariable related changes from EquivalentExpressions

2023-03-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42852:


Assignee: (was: Apache Spark)

> Revert NamedLambdaVariable related changes from EquivalentExpressions
> -
>
> Key: SPARK-42852
> URL: https://issues.apache.org/jira/browse/SPARK-42852
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Peter Toth
>Priority: Major
>
> See discussion 
> https://github.com/apache/spark/pull/40473#issuecomment-1474848224






[jira] [Commented] (SPARK-42849) Session variables

2023-03-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702069#comment-17702069
 ] 

Apache Spark commented on SPARK-42849:
--

User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/40474

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ]
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name
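> A usage sketch following the proposed grammar (hypothetical; the final
> syntax is subject to the design discussion):
> {code:java}
> // Hypothetical usage of the proposed statements, driven through SQL:
> spark.sql("CREATE TEMPORARY VARIABLE var1 INT DEFAULT 10")
> spark.sql("SET var1 = var1 + 5")       // engine-controlled, type-checked
> spark.sql("SELECT var1 * 2").show()    // 30
> spark.sql("DROP VARIABLE IF EXISTS var1")
> {code}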






[jira] [Assigned] (SPARK-42849) Session variables

2023-03-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42849:


Assignee: Apache Spark

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Assignee: Apache Spark
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ]
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name






[jira] [Commented] (SPARK-42849) Session variables

2023-03-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702068#comment-17702068
 ] 

Apache Spark commented on SPARK-42849:
--

User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/40474

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ]
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name






[jira] [Assigned] (SPARK-42849) Session variables

2023-03-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42849:


Assignee: (was: Apache Spark)

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> Provide a type-safe, engine controlled session variable:
> CREATE [ OR REPLACE } TEMPORARY VARIABLE [ IF NOT EXISTS ]var_name  [ type ][ 
> DEFAULT expresion ]
> SET {  variable = expression | ( variable [, ...] ) = ( subquery | expression 
> [, ...] )
> DROP VARIABLE  [ IF EXISTS ]variable_name






[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42851:


Assignee: Apache Spark

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point -- {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they would inconsistently 
> get {{None}} because the guard fails.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added.
> (or other alternative refactorings to fuse the guard into various methods to 
> make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} 
> used to (potentially incorrectly) allow more expressions to get CSE'd, making 
> it more restrictive may cause performance regressions (for the cases that 
> happened to work).
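> A minimal sketch of option 1 (assuming the current method shapes;
> {{updateExprInMap}} stands in for whatever helper {{addExpr()}} actually
> delegates to):
> {code:java}
> // Guard the "add" entry point the same way addExprTree()/getExprState()
> // are guarded, so the "add" and "get" paths agree on which expressions
> // participate in CSE.
> def addExpr(expr: Expression): Boolean = {
>   if (supportedExpression(expr)) {
>     updateExprInMap(expr, map)  // hypothetical helper: records the expression
>   } else {
>     false
>   }
> }
> {code}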
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{max(transform(array(id), x -> x))}} is an {{AggregateExpression}} 
> that was (potentially unsafely) recognized by {{addExpr()}} as a common 
> subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> 

[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42851:


Assignee: (was: Apache Spark)

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point -- {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they would inconsistently 
> get {{None}} because the guard fails.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added.
> (or other alternative refactorings to fuse the guard into various methods to 
> make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} 
> used to (potentially incorrectly) allow more expressions to get CSE'd, making 
> it more restrictive may cause performance regressions (for the cases that 
> happened to work).
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{max(transform(array(id), x -> x))}} is an {{AggregateExpression}} 
> that was (potentially unsafely) recognized by {{addExpr()}} as a common 
> subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:517)
>   at 

[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702033#comment-17702033
 ] 

Apache Spark commented on SPARK-42851:
--

User 'rednaxelafx' has created a pull request for this issue:
https://github.com/apache/spark/pull/40473

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point -- {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they would inconsistently 
> get {{None}} because the guard fails.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added.
> (or other alternative refactorings to fuse the guard into various methods to 
> make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} 
> used to (potentially incorrectly) allow more expressions to get CSE'd, making 
> it more restrictive may cause performance regressions (for the cases that 
> happened to work).
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{max(transform(array(id), x -> x))}} is an {{AggregateExpression}} 
> that was (potentially unsafely) recognized by {{addExpr()}} as a common 
> subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> 

[jira] [Assigned] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42247:


Assignee: (was: Apache Spark)

> Standardize `returnType` property of UserDefinedFunction
> 
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> There are checks 






[jira] [Commented] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702022#comment-17702022
 ] 

Apache Spark commented on SPARK-42247:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40472

> Standardize `returnType` property of UserDefinedFunction
> 
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> There are checks 






[jira] [Assigned] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42247:


Assignee: Apache Spark

> Standardize `returnType` property of UserDefinedFunction
> 
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> There are checks 






[jira] [Assigned] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42850:


Assignee: Apache Spark  (was: Gengliang Wang)

> Remove duplicated rule CombineFilters in Optimizer
> --
>
> Key: SPARK-42850
> URL: https://issues.apache.org/jira/browse/SPARK-42850
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
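> Context sketch (background, not from the ticket): {{CombineFilters}} merges
> adjacent {{Filter}} nodes into a single predicate, so registering the rule
> twice in the Optimizer's rule batches is redundant. For example:
> {code:java}
> import spark.implicits._
> 
> val df = spark.range(10).toDF("id")
> // The optimized plan shows one Filter ((id > 1) AND (id < 5)), not two:
> df.filter($"id" > 1).filter($"id" < 5).explain(true)
> {code}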







[jira] [Assigned] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42850:


Assignee: Gengliang Wang  (was: Apache Spark)

> Remove duplicated rule CombineFilters in Optimizer
> --
>
> Key: SPARK-42850
> URL: https://issues.apache.org/jira/browse/SPARK-42850
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>







[jira] [Commented] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702021#comment-17702021
 ] 

Apache Spark commented on SPARK-42850:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40471

> Remove duplicated rule CombineFilters in Optimizer
> --
>
> Key: SPARK-42850
> URL: https://issues.apache.org/jira/browse/SPARK-42850
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>







[jira] [Commented] (SPARK-41843) Implement SparkSession.udf

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702006#comment-17702006
 ] 

Apache Spark commented on SPARK-41843:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}
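
A minimal sketch of the usage the failing doctest expects once SparkSession.udf is implemented for Spark Connect, assuming a working `spark` session:

{code:python}
from pyspark.sql.functions import call_udf
from pyspark.sql.types import IntegerType

# Register a Python UDF through the spark.udf accessor...
_ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())

# ...after which the registered name is usable from SQL and call_udf:
spark.sql("SELECT intX2(21)").show()                   # 42
spark.range(3).select(call_udf("intX2", "id")).show()  # 0, 2, 4
{code}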



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702003#comment-17702003
 ] 

Apache Spark commented on SPARK-41818:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in 
> 
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}
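
For reference, a minimal sketch of the call path the doctest exercises, assuming a Connect session whose default catalog supports managed tables:

{code:python}
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v"])

# Persist the DataFrame as a managed table; this is the command that
# currently fails through Spark Connect with the ClassNotFoundException
# shown above.
df.write.saveAsTable("tblA")

spark.read.table("tblA").show()
spark.sql("DROP TABLE tblA")
{code}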



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41843) Implement SparkSession.udf

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702004#comment-17702004
 ] 

Apache Spark commented on SPARK-41843:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41843) Implement SparkSession.udf

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702005#comment-17702005
 ] 

Apache Spark commented on SPARK-41843:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702002#comment-17702002
 ] 

Apache Spark commented on SPARK-41818:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in 
> 
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701934#comment-17701934
 ] 

Apache Spark commented on SPARK-42848:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40469

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
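
A minimal sketch of the API being added to the Spark Connect Python client; registerTempTable is the legacy, deprecated spelling of createOrReplaceTempView. An active `spark` session is assumed:

{code:python}
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v"])

# Deprecated alias kept for compatibility with older code paths;
# equivalent to df.createOrReplaceTempView("people"):
df.registerTempTable("people")

spark.sql("SELECT * FROM people WHERE id = 1").show()
{code}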




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701933#comment-17701933
 ] 

Apache Spark commented on SPARK-42848:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40469

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42848:


Assignee: Apache Spark

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42848:


Assignee: (was: Apache Spark)

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701903#comment-17701903
 ] 

Apache Spark commented on SPARK-42833:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/40465

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce duplicated
> code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42584:


Assignee: Apache Spark

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> We currently display the structure of the proto in both the regular and
> extended versions of explain. We should display a more compact SQL-like
> string for the regular version.
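
As a rough illustration only: the compact, SQL-like form requested here is analogous to what PySpark already prints for a Column, as opposed to a dump of the underlying proto structure. The exact output of the Scala client's Column.explain may differ:

{code:python}
from pyspark.sql.functions import col

c = (col("a") + 1).alias("b")

# A compact, SQL-like rendering of the expression,
# e.g. Column<'(a + 1) AS b'>:
print(c)
{code}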



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42584:


Assignee: (was: Apache Spark)

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> We currently display the structure of the proto in both the regular and
> extended versions of explain. We should display a more compact SQL-like
> string for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701690#comment-17701690
 ] 

Apache Spark commented on SPARK-42584:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40467

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> We currently display the structure of the proto in both the regular and
> extended versions of explain. We should display a more compact SQL-like
> string for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701623#comment-17701623
 ] 

Apache Spark commented on SPARK-42835:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40466

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42835:


Assignee: (was: Apache Spark)

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701621#comment-17701621
 ] 

Apache Spark commented on SPARK-42835:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40466

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42835:


Assignee: Apache Spark

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42833:


Assignee: Apache Spark

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Apache Spark
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce duplicated
> code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42833:


Assignee: (was: Apache Spark)

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce duplicated
> code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42557) Add Broadcast to functions

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701527#comment-17701527
 ] 

Apache Spark commented on SPARK-42557:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40463

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.1
>
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261
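
A sketch of the behavior being ported, using the long-standing PySpark equivalent (an active `spark` session is assumed):

{code:python}
from pyspark.sql.functions import broadcast

large = spark.range(1_000_000).withColumnRenamed("id", "key")
small = spark.createDataFrame([(0, "x"), (1, "y")], ["key", "v"])

# Marking the small side with broadcast() plants a hint in the plan,
# steering the planner toward a BroadcastHashJoin:
large.join(broadcast(small), "key").explain()
{code}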



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42557) Add Broadcast to functions

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701526#comment-17701526
 ] 

Apache Spark commented on SPARK-42557:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40463

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.1
>
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42832) Remove repartition if it is the child of LocalLimit

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701478#comment-17701478
 ] 

Apache Spark commented on SPARK-42832:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40462

> Remove repartition if it is the child of LocalLimit
> ---
>
> Key: SPARK-42832
> URL: https://issues.apache.org/jira/browse/SPARK-42832
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
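
A sketch of the pattern this rule would target, assuming an active `spark` session; the shuffle introduced by repartition() only rearranges rows that the limit immediately discards, so it can be dropped:

{code:python}
df = spark.range(1000).repartition(10)

# The Exchange produced by repartition() adds no value under a limit;
# the proposed rule would remove it from the optimized plan:
df.limit(5).explain(extended=True)
{code}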




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42832) Remove repartition if it is the child of LocalLimit

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42832:


Assignee: (was: Apache Spark)

> Remove repartition if it is the child of LocalLimit
> ---
>
> Key: SPARK-42832
> URL: https://issues.apache.org/jira/browse/SPARK-42832
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42832) Remove repartition if it is the child of LocalLimit

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42832:


Assignee: Apache Spark

> Remove repartition if it is the child of LocalLimit
> ---
>
> Key: SPARK-42832
> URL: https://issues.apache.org/jira/browse/SPARK-42832
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42831) Show result expressions in AggregateExec

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701471#comment-17701471
 ] 

Apache Spark commented on SPARK-42831:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/40461

> Show result expressions in AggregateExec
> 
>
> Key: SPARK-42831
> URL: https://issues.apache.org/jira/browse/SPARK-42831
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Priority: Minor
>
> If the result expressions in AggregateExec are not empty, we should display
> them; otherwise the plan is confusing because some important expressions do
> not show up in the DAG.
> For example, the plan displayed for the query *SELECT sum(p) from
> values(cast(23.4 as decimal(7,2))) t(p)* was incomplete because the result
> expression *MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2* was not
> displayed.
> Before 
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> output=[sum#5L])
>  +- LocalTableScan [p#0]
> {code}
> After
> {code:java}
> == Physical Plan == 
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> results=[MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=38]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> results=[sum#13L], output=[sum#13L])
>  +- LocalTableScan [p#0]
> {code}
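
The before/after plans above can be reproduced directly from the query in the description, assuming an active `spark` session:

{code:python}
spark.sql(
    "SELECT sum(p) from values(cast(23.4 as decimal(7,2))) t(p)"
).explain()
{code}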



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42831) Show result expressions in AggregateExec

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42831:


Assignee: (was: Apache Spark)

> Show result expressions in AggregateExec
> 
>
> Key: SPARK-42831
> URL: https://issues.apache.org/jira/browse/SPARK-42831
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Priority: Minor
>
> If the result expressions in AggregateExec are not empty, we should display
> them; otherwise the plan is confusing because some important expressions do
> not show up in the DAG.
> For example, the plan displayed for the query *SELECT sum(p) from
> values(cast(23.4 as decimal(7,2))) t(p)* was incomplete because the result
> expression *MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2* was not
> displayed.
> Before 
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> output=[sum#5L])
>  +- LocalTableScan [p#0]
> {code}
> After
> {code:java}
> == Physical Plan == 
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> results=[MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=38]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> results=[sum#13L], output=[sum#13L])
>  +- LocalTableScan [p#0]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42831) Show result expressions in AggregateExec

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701470#comment-17701470
 ] 

Apache Spark commented on SPARK-42831:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/40461

> Show result expressions in AggregateExec
> 
>
> Key: SPARK-42831
> URL: https://issues.apache.org/jira/browse/SPARK-42831
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Priority: Minor
>
> If the result expressions in AggregateExec are not empty, we should display
> them; otherwise the plan is confusing because some important expressions do
> not show up in the DAG.
> For example, the plan displayed for the query *SELECT sum(p) from
> values(cast(23.4 as decimal(7,2))) t(p)* was incomplete because the result
> expression *MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2* was not
> displayed.
> Before 
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> output=[sum#5L])
>  +- LocalTableScan [p#0]
> {code}
> After
> {code:java}
> == Physical Plan == 
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> results=[MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=38]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> results=[sum#13L], output=[sum#13L])
>  +- LocalTableScan [p#0]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42831) Show result expressions in AggregateExec

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42831:


Assignee: Apache Spark

> Show result expressions in AggregateExec
> 
>
> Key: SPARK-42831
> URL: https://issues.apache.org/jira/browse/SPARK-42831
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Minor
>
> If the result expressions in AggregateExec are not empty, we should display
> them; otherwise the plan is confusing because some important expressions do
> not show up in the DAG.
> For example, the plan displayed for the query *SELECT sum(p) from
> values(cast(23.4 as decimal(7,2))) t(p)* was incomplete because the result
> expression *MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2* was not
> displayed.
> Before 
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> output=[sum#5L])
>  +- LocalTableScan [p#0]
> {code}
> After
> {code:java}
> == Physical Plan == 
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[], functions=[sum(UnscaledValue(p#0))], 
> results=[MakeDecimal(sum(UnscaledValue(p#0))#1L,17,2) AS sum(p)#2], 
> output=[sum(p)#2])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=38]
>   +- HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(p#0))], 
> results=[sum#13L], output=[sum#13L])
>  +- LocalTableScan [p#0]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42828) PySpark type hint returns Any for methods on GroupedData

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42828:


Assignee: Apache Spark

> PySpark type hint returns Any for methods on GroupedData
> 
>
> Key: SPARK-42828
> URL: https://issues.apache.org/jira/browse/SPARK-42828
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Joe Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Since upgrading to PySpark 3.3.x, type hints for
> {code:java}
> df.groupBy(...).count(){code}
> are now returning Any instead of DataFrame, causing type inference issues 
> downstream. This used to be correctly typed prior to 3.3.x.
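
A small sketch of the regression together with a common workaround until the stubs are fixed:

{code:python}
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "a")], ["id", "k"])

# Under the 3.3.x stubs this is inferred as Any, which silently disables
# type checking for everything derived from it:
counts = df.groupBy("k").count()

# Workaround: pin the expected type explicitly.
counts_typed: DataFrame = df.groupBy("k").count()
counts_typed.show()
{code}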



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42828) PySpark type hint returns Any for methods on GroupedData

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42828:


Assignee: (was: Apache Spark)

> PySpark type hint returns Any for methods on GroupedData
> 
>
> Key: SPARK-42828
> URL: https://issues.apache.org/jira/browse/SPARK-42828
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Joe Wang
>Priority: Minor
>
> Since upgrading to PySpark 3.3.x, type hints for
> {code:java}
> df.groupBy(...).count(){code}
> are now returning Any instead of DataFrame, causing type inference issues 
> downstream. This used to be correctly typed prior to 3.3.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42828) PySpark type hint returns Any for methods on GroupedData

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701345#comment-17701345
 ] 

Apache Spark commented on SPARK-42828:
--

User 'j03wang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40460

> PySpark type hint returns Any for methods on GroupedData
> 
>
> Key: SPARK-42828
> URL: https://issues.apache.org/jira/browse/SPARK-42828
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Joe Wang
>Priority: Minor
>
> Since upgrading to PySpark 3.3.x, type hints for
> {code:java}
> df.groupBy(...).count(){code}
> are now returning Any instead of DataFrame, causing type inference issues 
> downstream. This used to be correctly typed prior to 3.3.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42826) Add migration note for API changes

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42826:


Assignee: (was: Apache Spark)

> Add migration note for API changes
> --
>
> Key: SPARK-42826
> URL: https://issues.apache.org/jira/browse/SPARK-42826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We deprecated & removed some APIs in
> https://issues.apache.org/jira/browse/SPARK-42593 to follow pandas.
> We should mention this in the migration guide.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42826) Add migration note for API changes

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42826:


Assignee: Apache Spark

> Add migration note for API changes
> --
>
> Key: SPARK-42826
> URL: https://issues.apache.org/jira/browse/SPARK-42826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We deprecated & removed some APIs in
> https://issues.apache.org/jira/browse/SPARK-42593 to follow pandas.
> We should mention this in the migration guide.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42826) Add migration note for API changes

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701295#comment-17701295
 ] 

Apache Spark commented on SPARK-42826:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40459

> Add migration note for API changes
> --
>
> Key: SPARK-42826
> URL: https://issues.apache.org/jira/browse/SPARK-42826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We deprecated & removed some APIs in
> https://issues.apache.org/jira/browse/SPARK-42593 to follow pandas.
> We should mention this in the migration guide.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42824) Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42824:


Assignee: Apache Spark

> Provide a clear error message for unsupported JVM attributes.
> -
>
> Key: SPARK-42824
> URL: https://issues.apache.org/jira/browse/SPARK-42824
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> There are attributes, such as "_jvm", that were accessible in PySpark but 
> cannot be accessed in Spark Connect. We need to display appropriate error 
> messages for these cases.
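
A hypothetical illustration of the situation; the sc://localhost endpoint is an assumption, and the exact error raised today may vary:

{code:python}
from pyspark.sql import SparkSession

# A Connect session has no py4j gateway behind it.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

try:
    gateway = spark._jvm  # JVM-backed attribute that Connect cannot serve
except AttributeError as e:
    # The ticket proposes replacing a bare failure like this with a
    # clear, dedicated error message.
    print(e)
{code}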



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42824) Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42824:


Assignee: (was: Apache Spark)

> Provide a clear error message for unsupported JVM attributes.
> -
>
> Key: SPARK-42824
> URL: https://issues.apache.org/jira/browse/SPARK-42824
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There are attributes, such as "_jvm", that were accessible in PySpark but 
> cannot be accessed in Spark Connect. We need to display appropriate error 
> messages for these cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42824) Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701220#comment-17701220
 ] 

Apache Spark commented on SPARK-42824:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40458

> Provide a clear error message for unsupported JVM attributes.
> -
>
> Key: SPARK-42824
> URL: https://issues.apache.org/jira/browse/SPARK-42824
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There are attributes, such as "_jvm", that were accessible in PySpark but 
> cannot be accessed in Spark Connect. We need to display appropriate error 
> messages for these cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41233) High-order function: array_prepend

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41233:


Assignee: (was: Apache Spark)

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
> 1, about the data type validation:
> In Snowflake’s array_append, array_prepend and array_insert functions, the 
> element data type does not need to match the data type of the existing 
> elements in the array.
> While in Spark, we want to leverage the same data type validation as 
> array_remove.
> 2, about the NULL handling
> Currently, SparkSQL, SnowSQL and PostgreSQL deal with NULL values in
> different ways.
> The existing functions array_contains, array_position and array_remove in
> SparkSQL handle NULL in this way: if the input array and/or element is NULL,
> they return NULL. However, array_prepend should break with this behavior.
> We should implement the NULL handling in array_prepend in this way:
> 2.1, if the array is NULL, return NULL;
> 2.2, if the array is not NULL and the element is NULL, prepend the NULL value
> onto the array (see the sketch below).
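
A sketch of the proposed semantics in SQL form; since array_prepend did not yet exist in Spark when this was filed, the results in the comments are the intended behavior rather than observed output (an active `spark` session is assumed):

{code:python}
# Rule 2.1: a NULL array yields NULL.
spark.sql("SELECT array_prepend(CAST(NULL AS ARRAY<INT>), 1)").show()
# expected: NULL

# Rule 2.2: a non-NULL array with a NULL element prepends the NULL.
spark.sql("SELECT array_prepend(array(1, 2), CAST(NULL AS INT))").show()
# expected: [NULL, 1, 2]
{code}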



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41233) High-order function: array_prepend

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41233:


Assignee: Apache Spark

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>
> refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
> 1, about the data type validation:
> In Snowflake’s array_append, array_prepend and array_insert functions, the 
> element data type does not need to match the data type of the existing 
> elements in the array.
> While in Spark, we want to leverage the same data type validation as 
> array_remove.
> 2, about the NULL handling
> Currently, SparkSQL, SnowSQL and PostgreSQL deal with NULL values in
> different ways.
> The existing functions array_contains, array_position and array_remove in
> SparkSQL handle NULL in this way: if the input array and/or element is NULL,
> they return NULL. However, array_prepend should break with this behavior.
> We should implement the NULL handling in array_prepend in this way:
> 2.1, if the array is NULL, return NULL;
> 2.2, if the array is not NULL and the element is NULL, prepend the NULL value
> onto the array.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42823) spark-sql shell supports multipart namespaces for initialization

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42823:


Assignee: Apache Spark

> spark-sql shell supports multipart namespaces for initialization
> 
>
> Key: SPARK-42823
> URL: https://issues.apache.org/jira/browse/SPARK-42823
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42823) spark-sql shell supports multipart namespaces for initialization

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42823:


Assignee: (was: Apache Spark)

> spark-sql shell supports multipart namespaces for initialization
> 
>
> Key: SPARK-42823
> URL: https://issues.apache.org/jira/browse/SPARK-42823
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42823) spark-sql shell supports multipart namespaces for initialization

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701078#comment-17701078
 ] 

Apache Spark commented on SPARK-42823:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/40457

> spark-sql shell supports multipart namespaces for initialization
> 
>
> Key: SPARK-42823
> URL: https://issues.apache.org/jira/browse/SPARK-42823
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42720) Refactor the withSequenceColumn

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701051#comment-17701051
 ] 

Apache Spark commented on SPARK-42720:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40456

> Refactor the withSequenceColumn
> ---
>
> Key: SPARK-42720
> URL: https://issues.apache.org/jira/browse/SPARK-42720
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42720) Refactor the withSequenceColumn

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42720:


Assignee: (was: Apache Spark)

> Refactor the withSequenceColumn
> ---
>
> Key: SPARK-42720
> URL: https://issues.apache.org/jira/browse/SPARK-42720
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42720) Refactor the withSequenceColumn

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42720:


Assignee: Apache Spark

> Refactor the withSequenceColumn
> ---
>
> Key: SPARK-42720
> URL: https://issues.apache.org/jira/browse/SPARK-42720
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42720) Refactor the withSequenceColumn

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701050#comment-17701050
 ] 

Apache Spark commented on SPARK-42720:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40456

> Refactor the withSequenceColumn
> ---
>
> Key: SPARK-42720
> URL: https://issues.apache.org/jira/browse/SPARK-42720
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42819) Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700994#comment-17700994
 ] 

Apache Spark commented on SPARK-42819:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40455

> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
> ---
>
> Key: SPARK-42819
> URL: https://issues.apache.org/jira/browse/SPARK-42819
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
>  
> We need these settings in order to control memory tuning for RocksDB. We
> already expose a setting for the blockCache size; however, these two settings
> are missing. This change proposes to add them.
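
A hedged sketch of how such settings are surfaced, modeled on the existing blockCacheSizeMB option; the two new key names below follow that naming convention but are assumptions, not confirmed names from the merged change:

{code:python}
# Use the RocksDB state store provider for streaming state.
spark.conf.set(
    "spark.sql.streaming.stateStore.providerClass",
    "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")

# Existing knob for the RocksDB block cache:
spark.conf.set("spark.sql.streaming.stateStore.rocksdb.blockCacheSizeMB", "64")

# Assumed names for the two knobs this ticket adds: per-memtable size and
# the maximum number of memtables held in memory at once.
spark.conf.set("spark.sql.streaming.stateStore.rocksdb.writeBufferSizeMB", "64")
spark.conf.set("spark.sql.streaming.stateStore.rocksdb.maxWriteBufferNumber", "3")
{code}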



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42819) Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42819:


Assignee: Apache Spark

> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
> ---
>
> Key: SPARK-42819
> URL: https://issues.apache.org/jira/browse/SPARK-42819
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Assignee: Apache Spark
>Priority: Major
>
> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
>  
> We need these settings in order to control memory tuning for RocksDB. We
> already expose a setting for the blockCache size; however, these two settings
> are missing. This change proposes to add them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42819) Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42819:


Assignee: (was: Apache Spark)

> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
> ---
>
> Key: SPARK-42819
> URL: https://issues.apache.org/jira/browse/SPARK-42819
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
>  
> We need these settings in order to control memory tuning for RocksDB. We
> already expose a setting for the blockCache size; however, these two settings
> are missing. This change proposes to add them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42819) Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700993#comment-17700993
 ] 

Apache Spark commented on SPARK-42819:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40455

> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
> ---
>
> Key: SPARK-42819
> URL: https://issues.apache.org/jira/browse/SPARK-42819
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support for setting max_write_buffer_number and write_buffer_size for 
> RocksDB used in streaming
>  
> We need these settings in order to control memory tuning for RocksDB. We
> already expose a setting for the blockCache size; however, these two settings
> are missing. This change proposes to add them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700940#comment-17700940
 ] 

Apache Spark commented on SPARK-42821:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40454

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700939#comment-17700939
 ] 

Apache Spark commented on SPARK-42821:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40454

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42821:


Assignee: Apache Spark

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42821:


Assignee: (was: Apache Spark)

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42820) Update ORC to 1.8.3

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700934#comment-17700934
 ] 

Apache Spark commented on SPARK-42820:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40453

> Update ORC to 1.8.3
> ---
>
> Key: SPARK-42820
> URL: https://issues.apache.org/jira/browse/SPARK-42820
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42820) Update ORC to 1.8.3

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42820:


Assignee: (was: Apache Spark)

> Update ORC to 1.8.3
> ---
>
> Key: SPARK-42820
> URL: https://issues.apache.org/jira/browse/SPARK-42820
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42820) Update ORC to 1.8.3

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42820:


Assignee: Apache Spark

> Update ORC to 1.8.3
> ---
>
> Key: SPARK-42820
> URL: https://issues.apache.org/jira/browse/SPARK-42820
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42820) Update ORC to 1.8.3

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700933#comment-17700933
 ] 

Apache Spark commented on SPARK-42820:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40453

> Update ORC to 1.8.3
> ---
>
> Key: SPARK-42820
> URL: https://issues.apache.org/jira/browse/SPARK-42820
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42818) Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700925#comment-17700925
 ] 

Apache Spark commented on SPARK-42818:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40451

> Implement DataFrameReader/Writer.jdbc
> -
>
> Key: SPARK-42818
> URL: https://issues.apache.org/jira/browse/SPARK-42818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>
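
For context, the classic DataFrameReader/Writer.jdbc surface this ticket 
brings to Spark Connect looks roughly like this; the H2 URL, table names, 
and credentials below are placeholders for illustration only:

{code}
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Placeholder connection properties for the example.
val props = new Properties()
props.setProperty("user", "sa")
props.setProperty("password", "")

// Read a table over JDBC ...
val people = spark.read.jdbc("jdbc:h2:mem:testdb", "people", props)

// ... and write the result back out to another table.
people.write.mode("append").jdbc("jdbc:h2:mem:testdb", "people_copy", props)
{code}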




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42818) Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42818:


Assignee: Apache Spark

> Implement DataFrameReader/Writer.jdbc
> -
>
> Key: SPARK-42818
> URL: https://issues.apache.org/jira/browse/SPARK-42818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42818) Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42818:


Assignee: (was: Apache Spark)

> Implement DataFrameReader/Writer.jdbc
> -
>
> Key: SPARK-42818
> URL: https://issues.apache.org/jira/browse/SPARK-42818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42818) Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700877#comment-17700877
 ] 

Apache Spark commented on SPARK-42818:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40450

> Implement DataFrameReader/Writer.jdbc
> -
>
> Key: SPARK-42818
> URL: https://issues.apache.org/jira/browse/SPARK-42818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42791) Create golden file test framework for analysis

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42791:


Assignee: (was: Apache Spark)

> Create golden file test framework for analysis
> --
>
> Key: SPARK-42791
> URL: https://issues.apache.org/jira/browse/SPARK-42791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
> Here we track the work to add new golden file test support for the Spark 
> analyzer. Each golden file can contain a list of SQL queries followed by the 
> string representations of their analyzed logical plans.
>  
> This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping 
> after analysis and listing analyzed plans as the results instead of fully 
> executing queries end-to-end. As another example, ZetaSQL has analyzer-based 
> golden file testing like this as well [2].
>  
> This way, any changes to analysis will show up as test diffs, which are easy 
> to spot in review and also easy to update automatically. This could help the 
> community maintain the quality of Apache Spark's query analysis.
>  
> [1] 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala]
>  
> [2] 
> [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test].
>  
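
As a sketch of the idea, the analyzed-plan string for each query can be 
captured via the queryExecution.analyzed hook and recorded as the expected 
output; the file layout itself is an assumption modeled on SQLQueryTestSuite, 
not the final framework:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()

val query = "SELECT id, id + 1 AS next FROM range(3) LIMIT 2"

// Stop after analysis: the analyzed logical plan, not the query result,
// becomes the expected output checked into the golden file.
val analyzed = spark.sql(query).queryExecution.analyzed
println(analyzed.toString)
{code}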



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42791) Create golden file test framework for analysis

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700809#comment-17700809
 ] 

Apache Spark commented on SPARK-42791:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/40449

> Create golden file test framework for analysis
> --
>
> Key: SPARK-42791
> URL: https://issues.apache.org/jira/browse/SPARK-42791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
> Here we track the work to add new golden file test support for the Spark 
> analyzer. Each golden file can contain a list of SQL queries followed by the 
> string representations of their analyzed logical plans.
>  
> This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping 
> after analysis and listing analyzed plans as the results instead of fully 
> executing queries end-to-end. As another example, ZetaSQL has analyzer-based 
> golden file testing like this as well [2].
>  
> This way, any changes to analysis will show up as test diffs, which are easy 
> to spot in review and also easy to update automatically. This could help the 
> community maintain the quality of Apache Spark's query analysis.
>  
> [1] 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala]
>  
> [2] 
> [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test].
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42791) Create golden file test framework for analysis

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42791:


Assignee: Apache Spark

> Create golden file test framework for analysis
> --
>
> Key: SPARK-42791
> URL: https://issues.apache.org/jira/browse/SPARK-42791
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>
> Here we track the work to add new golden file test support for the Spark 
> analyzer. Each golden file can contain a list of SQL queries followed by the 
> string representations of their analyzed logical plans.
>  
> This can be similar to Spark's existing `SQLQueryTestSuite` [1], but stopping 
> after analysis and listing analyzed plans as the results instead of fully 
> executing queries end-to-end. As another example, ZetaSQL has analyzer-based 
> golden file testing like this as well [2].
>  
> This way, any changes to analysis will show up as test diffs, which are easy 
> to spot in review and also easy to update automatically. This could help the 
> community maintain the quality of Apache Spark's query analysis.
>  
> [1] 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala]
>  
> [2] 
> [https://github.com/google/zetasql/blob/master/zetasql/analyzer/testdata/limit.test].
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42817) Spark driver logs are filled with Initializing service data for shuffle service using name

2023-03-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700808#comment-17700808
 ] 

Apache Spark commented on SPARK-42817:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/40448

> Spark driver logs are filled with Initializing service data for shuffle 
> service using name
> --
>
> Key: SPARK-42817
> URL: https://issues.apache.org/jira/browse/SPARK-42817
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Major
>
> With SPARK-34828, we added the ability to make the shuffle service name 
> configurable and we added a log 
> [here|https://github.com/apache/spark/blob/8860f69455e5a722626194c4797b4b42cccd4510/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L118]
>  that will log the shuffle service name. However, this log line is printed 
> in the driver logs whenever a new executor is launched, polluting the log. 
> {code}
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> {code}
> We can just log this once in the driver.
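
A minimal sketch of the proposed fix, assuming a simple once-per-JVM guard 
(the object and method names here are hypothetical, not the actual patch):

{code}
import java.util.concurrent.atomic.AtomicBoolean

object ShuffleServiceNameLogger {
  private val logged = new AtomicBoolean(false)

  // compareAndSet is atomic, so concurrent executor launches on the driver
  // emit the message at most once.
  def logOnce(log: String => Unit, serviceName: String): Unit = {
    if (logged.compareAndSet(false, true)) {
      log(s"Initializing service data for shuffle service using name '$serviceName'")
    }
  }
}
{code}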



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42817) Spark driver logs are filled with Initializing service data for shuffle service using name

2023-03-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42817:


Assignee: (was: Apache Spark)

> Spark driver logs are filled with Initializing service data for shuffle 
> service using name
> --
>
> Key: SPARK-42817
> URL: https://issues.apache.org/jira/browse/SPARK-42817
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Major
>
> With SPARK-34828, we added the ability to make the shuffle service name 
> configurable and we added a log 
> [here|https://github.com/apache/spark/blob/8860f69455e5a722626194c4797b4b42cccd4510/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L118]
>  that will log the shuffle service name. However, this log line is printed 
> in the driver logs whenever a new executor is launched, polluting the log. 
> {code}
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for 
> shuffle service using name 'spark_shuffle_311'
> {code}
> We can just log this once in the driver.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


