[jira] [Commented] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373258#comment-17373258 ] Apache Spark commented on SPARK-35972: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33183 > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.cataly
[jira] [Commented] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373255#comment-17373255 ] Apache Spark commented on SPARK-35972: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33183 > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.cataly
[jira] [Assigned] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35972: Assignee: (was: Apache Spark) > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformD
[jira] [Assigned] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35972: Assignee: Apache Spark > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.
[jira] [Updated] (SPARK-35978) Support new keyword TIMESTAMP_LTZ
[ https://issues.apache.org/jira/browse/SPARK-35978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35978: --- Affects Version/s: (was: 3.3.0) > Support new keyword TIMESTAMP_LTZ > - > > Key: SPARK-35978 > URL: https://issues.apache.org/jira/browse/SPARK-35978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > Support new keyword TIMESTAMP_LTZ, which can be used for: > * timestamp with local time zone data type in DDL > * timestamp with local time zone data type in Cast clause. > * timestamp with local time zone data type literal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35978) Support new keyword TIMESTAMP_LTZ
[ https://issues.apache.org/jira/browse/SPARK-35978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35978: --- Affects Version/s: 3.3.0 > Support new keyword TIMESTAMP_LTZ > - > > Key: SPARK-35978 > URL: https://issues.apache.org/jira/browse/SPARK-35978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Gengliang Wang >Priority: Major > > Support new keyword TIMESTAMP_LTZ, which can be used for: > * timestamp with local time zone data type in DDL > * timestamp with local time zone data type in Cast clause. > * timestamp with local time zone data type literal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35955. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33177 [https://github.com/apache/spark/pull/33177] > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Assignee: Karen Feng >Priority: Major > Fix For: 3.2.0 > > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35955: -- Assignee: Karen Feng > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Assignee: Karen Feng >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35897: -- Assignee: Rahul Shivu Mahadev > Support user defined initial state with flatMapGroupsWithState in Structured > Streaming > -- > > Key: SPARK-35897 > URL: https://issues.apache.org/jira/browse/SPARK-35897 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Rahul Shivu Mahadev >Assignee: Rahul Shivu Mahadev >Priority: Major > Fix For: 3.2.0 > > > Structured Streaming supports arbitrary stateful processing using > mapGroupsWithState and flatMapGroupWithState operators. The state is created > by processing the data that comes in with every batch. This API improvement > will allow users to specify an initial state which is applied at the time of > executing the first batch. > > h2. Proposed new APIs (Scala) > > > def mapGroupsWithState[S: Encoder, U: Encoder]( > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] > > def flatMapGroupsWithState[S: Encoder, U: Encoder]( > outputMode: OutputMode, > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => Iterator[U]) > > h2. Proposed new APIs (Java) > > def mapGroupsWithState[S, U]( > func: MapGroupsWithStateFunction[K, V, S, U], > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > def flatMapGroupsWithState[S, U]( > func: FlatMapGroupsWithStateFunction[K, V, S, U], > outputMode: OutputMode, > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > > > h2. Example Usage > > val initialState: Dataset[(String, RunningCount)] = Seq( > ("a", new RunningCount(1)), > ("b", new RunningCount(1)) > ).toDS() > > val inputData = MemoryStream[String] > val result = > inputData.toDS() > .groupByKey(x => x) > .mapGroupsWithState(initialState, timeoutConf)(stateFunc) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35897. Resolution: Fixed Issue resolved by pull request 33093 [https://github.com/apache/spark/pull/33093] > Support user defined initial state with flatMapGroupsWithState in Structured > Streaming > -- > > Key: SPARK-35897 > URL: https://issues.apache.org/jira/browse/SPARK-35897 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Rahul Shivu Mahadev >Assignee: Rahul Shivu Mahadev >Priority: Major > Fix For: 3.2.0 > > > Structured Streaming supports arbitrary stateful processing using > mapGroupsWithState and flatMapGroupWithState operators. The state is created > by processing the data that comes in with every batch. This API improvement > will allow users to specify an initial state which is applied at the time of > executing the first batch. > > h2. Proposed new APIs (Scala) > > > def mapGroupsWithState[S: Encoder, U: Encoder]( > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] > > def flatMapGroupsWithState[S: Encoder, U: Encoder]( > outputMode: OutputMode, > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => Iterator[U]) > > h2. Proposed new APIs (Java) > > def mapGroupsWithState[S, U]( > func: MapGroupsWithStateFunction[K, V, S, U], > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > def flatMapGroupsWithState[S, U]( > func: FlatMapGroupsWithStateFunction[K, V, S, U], > outputMode: OutputMode, > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > > > h2. Example Usage > > val initialState: Dataset[(String, RunningCount)] = Seq( > ("a", new RunningCount(1)), > ("b", new RunningCount(1)) > ).toDS() > > val inputData = MemoryStream[String] > val result = > inputData.toDS() > .groupByKey(x => x) > .mapGroupsWithState(initialState, timeoutConf)(stateFunc) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose
[ https://issues.apache.org/jira/browse/SPARK-35984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35984: Assignee: (was: Apache Spark) > Add a config to force using ShuffledHashJoin for test purpose > - > > Key: SPARK-35984 > URL: https://issues.apache.org/jira/browse/SPARK-35984 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > In the join.sql, we want to cover all 3 join types. but the problem is > currently the `spark.sql.join.preferSortMergeJoin = false` couldn't guarantee > all the joins will use ShuffledHashJoin, so we need another config to force > using hash join in the testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose
[ https://issues.apache.org/jira/browse/SPARK-35984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373193#comment-17373193 ] Apache Spark commented on SPARK-35984: -- User 'linhongliu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/33182 > Add a config to force using ShuffledHashJoin for test purpose > - > > Key: SPARK-35984 > URL: https://issues.apache.org/jira/browse/SPARK-35984 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > In the join.sql, we want to cover all 3 join types. but the problem is > currently the `spark.sql.join.preferSortMergeJoin = false` couldn't guarantee > all the joins will use ShuffledHashJoin, so we need another config to force > using hash join in the testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose
[ https://issues.apache.org/jira/browse/SPARK-35984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35984: Assignee: Apache Spark > Add a config to force using ShuffledHashJoin for test purpose > - > > Key: SPARK-35984 > URL: https://issues.apache.org/jira/browse/SPARK-35984 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Apache Spark >Priority: Major > > In the join.sql, we want to cover all 3 join types. but the problem is > currently the `spark.sql.join.preferSortMergeJoin = false` couldn't guarantee > all the joins will use ShuffledHashJoin, so we need another config to force > using hash join in the testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35982: Assignee: Kousuke Saruta (was: Apache Spark) > Allow from_json/to_json for map types where value types are year-month > intervals > > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, from_json and to_json are doesn't support map types > where value types are year-month interval types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35982: Assignee: Apache Spark (was: Kousuke Saruta) > Allow from_json/to_json for map types where value types are year-month > intervals > > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > In the current master, from_json and to_json are doesn't support map types > where value types are year-month interval types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373164#comment-17373164 ] Apache Spark commented on SPARK-35982: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33181 > Allow from_json/to_json for map types where value types are year-month > intervals > > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, from_json and to_json are doesn't support map types > where value types are year-month interval types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose
Linhong Liu created SPARK-35984: --- Summary: Add a config to force using ShuffledHashJoin for test purpose Key: SPARK-35984 URL: https://issues.apache.org/jira/browse/SPARK-35984 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu In the join.sql, we want to cover all 3 join types. but the problem is currently the `spark.sql.join.preferSortMergeJoin = false` couldn't guarantee all the joins will use ShuffledHashJoin, so we need another config to force using hash join in the testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35983) Allow from_json/to_json for map types where value types are day-time intervals
[ https://issues.apache.org/jira/browse/SPARK-35983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35983: --- Description: In the current master, from_json and to_json are doesn't support map types where value types are day-time interval types. (was: In the current master, an exception is thrown if we specify day-time interval types as map value type.) > Allow from_json/to_json for map types where value types are day-time intervals > -- > > Key: SPARK-35983 > URL: https://issues.apache.org/jira/browse/SPARK-35983 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, from_json and to_json are doesn't support map types > where value types are day-time interval types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35982: --- Description: In the current master, from_json and to_json are doesn't support map types where value types are year-month interval types. (was: In the current master, an exception is thrown if we specify year-month interval types as map value type.) > Allow from_json/to_json for map types where value types are year-month > intervals > > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, from_json and to_json are doesn't support map types > where value types are year-month interval types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35982) Allow from_json/to_json for map types where value types are year-month intervals
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35982: --- Summary: Allow from_json/to_json for map types where value types are year-month intervals (was: Allow year-month intervals as map value types) > Allow from_json/to_json for map types where value types are year-month > intervals > > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, an exception is thrown if we specify year-month > interval types as map value type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35983) Allow from_json/to_json for map types where value types are day-time intervals
[ https://issues.apache.org/jira/browse/SPARK-35983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35983: --- Summary: Allow from_json/to_json for map types where value types are day-time intervals (was: Allow day-time intervals as map value types) > Allow from_json/to_json for map types where value types are day-time intervals > -- > > Key: SPARK-35983 > URL: https://issues.apache.org/jira/browse/SPARK-35983 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, an exception is thrown if we specify day-time interval > types as map value type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35983) Allow day-time intervals as map value types
Kousuke Saruta created SPARK-35983: -- Summary: Allow day-time intervals as map value types Key: SPARK-35983 URL: https://issues.apache.org/jira/browse/SPARK-35983 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the current master, an exception is thrown if we specify day-time interval types as map value type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35982) Allow year-month intervals as map value types
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35982: --- Summary: Allow year-month intervals as map value types (was: Allow year-month intervals as map key types) > Allow year-month intervals as map value types > - > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, an exception is thrown if we specify year-month > interval types as map key type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35982) Allow year-month intervals as map value types
[ https://issues.apache.org/jira/browse/SPARK-35982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35982: --- Description: In the current master, an exception is thrown if we specify year-month interval types as map value type. (was: In the current master, an exception is thrown if we specify year-month interval types as map key type.) > Allow year-month intervals as map value types > - > > Key: SPARK-35982 > URL: https://issues.apache.org/jira/browse/SPARK-35982 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > In the current master, an exception is thrown if we specify year-month > interval types as map value type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35982) Allow year-month intervals as map key types
Kousuke Saruta created SPARK-35982: -- Summary: Allow year-month intervals as map key types Key: SPARK-35982 URL: https://issues.apache.org/jira/browse/SPARK-35982 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the current master, an exception is thrown if we specify year-month interval types as map key type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32899) Support submit application with user-defined cluster manager
[ https://issues.apache.org/jira/browse/SPARK-32899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyang Liu updated SPARK-32899: - Description: We have supported users to define the customed cluster manager with `ExternalClusterManager` trait. However, we can not submit the application with `SparkSubmit`. This patch adds the support to submit applications with user-defined cluster manager. Add design doc: https://docs.google.com/document/d/1-Sn4Zh-l0SCqH7DQ0esdukS68ptSolK4lStj7MZUqJo/edit?usp=sharing was:We have supported users to define the customed cluster manager with `ExternalClusterManager` trait. However, we can not submit the application with `SparkSubmit`. This patch adds the support to submit applications with user-defined cluster manager. > Support submit application with user-defined cluster manager > > > Key: SPARK-32899 > URL: https://issues.apache.org/jira/browse/SPARK-32899 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Xianyang Liu >Priority: Major > > We have supported users to define the customed cluster manager with > `ExternalClusterManager` trait. However, we can not submit the application > with `SparkSubmit`. This patch adds the support to submit applications with > user-defined cluster manager. > > Add design doc: > https://docs.google.com/document/d/1-Sn4Zh-l0SCqH7DQ0esdukS68ptSolK4lStj7MZUqJo/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35339) Improve unit tests for data-type-based basic operations
[ https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-35339. --- Fix Version/s: 3.2.0 Assignee: Xinrong Meng Resolution: Fixed Issue resolved by pull request 33095 https://github.com/apache/spark/pull/33095 > Improve unit tests for data-type-based basic operations > --- > > Key: SPARK-35339 > URL: https://issues.apache.org/jira/browse/SPARK-35339 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > Unit tests for arithmetic operations are scattered in the codebase: > * pyspark/pandas/tests/test_ops_on_diff_frames.py > * pyspark/pandas/tests/test_dataframe.py > * pyspark/pandas/tests/test_series.py > * (Upcoming) pyspark/pandas/tests/data_type_ops/ > We wanted to consolidate them. > The code would be cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35825) Increase the heap and stack size for Maven build
[ https://issues.apache.org/jira/browse/SPARK-35825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373114#comment-17373114 ] Apache Spark commented on SPARK-35825: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33180 > Increase the heap and stack size for Maven build > > > Key: SPARK-35825 > URL: https://issues.apache.org/jira/browse/SPARK-35825 > Project: Spark > Issue Type: Task > Components: Project Infra, Tests >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > The jenkins jobs are unstable due to the stackoverflow errors: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/ > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2274/ > We should increase memory configuration for Maven build. > Stack size: 64MB => 128MB > Initial heap size: 1024MB => 2048MB > Maximum heap size: 1024MB => 2048MB > The SBT builds are ok so let's keep the current configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35779) Support dynamic filtering for v2 tables
[ https://issues.apache.org/jira/browse/SPARK-35779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-35779: --- Assignee: Anton Okolnychyi > Support dynamic filtering for v2 tables > --- > > Key: SPARK-35779 > URL: https://issues.apache.org/jira/browse/SPARK-35779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > We need to support dynamic filtering for v2 tables. > Design doc: > https://docs.google.com/document/d/1RfFn2e9o_1uHJ8jFGsSakp-BZMizX1uRrJSybMe2a6M -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35779) Support dynamic filtering for v2 tables
[ https://issues.apache.org/jira/browse/SPARK-35779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-35779. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32921 [https://github.com/apache/spark/pull/32921] > Support dynamic filtering for v2 tables > --- > > Key: SPARK-35779 > URL: https://issues.apache.org/jira/browse/SPARK-35779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.2.0 > > > We need to support dynamic filtering for v2 tables. > Design doc: > https://docs.google.com/document/d/1RfFn2e9o_1uHJ8jFGsSakp-BZMizX1uRrJSybMe2a6M -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373079#comment-17373079 ] Apache Spark commented on SPARK-35981: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33179 > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > In some environment, the precision could be different in {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35981: Assignee: (was: Apache Spark) > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > In some environment, the precision could be different in {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35981: Assignee: Apache Spark > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > In some environment, the precision could be different in {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373078#comment-17373078 ] Apache Spark commented on SPARK-35981: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33179 > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > In some environment, the precision could be different in {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-35976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35976: - Description: Currently, `astype` method for ExtensionDtype in pandas API on Spark is not consistent with pandas. For example, [https://github.com/apache/spark/pull/33095#discussion_r661704734.] [https://github.com/apache/spark/pull/33095#discussion_r662623005.] We ought to fill in the gap. was: Currently, `astype` method for ExtensionDtype in pandas API on Spark is not consistent with pandas. For example, [https://github.com/apache/spark/pull/33095#discussion_r661704734.] We ought to fill in the gap. > Adjust `astype` method for ExtensionDtype in pandas API on Spark > > > Key: SPARK-35976 > URL: https://issues.apache.org/jira/browse/SPARK-35976 > Project: Spark > Issue Type: Story > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Currently, `astype` method for ExtensionDtype in pandas API on Spark is not > consistent with pandas. For example, > [https://github.com/apache/spark/pull/33095#discussion_r661704734.] > [https://github.com/apache/spark/pull/33095#discussion_r662623005.] > > We ought to fill in the gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
Takuya Ueshin created SPARK-35981: - Summary: Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision Key: SPARK-35981 URL: https://issues.apache.org/jira/browse/SPARK-35981 Project: Spark Issue Type: Test Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin In some environment, the precision could be different in {{DataFrame.corr}} function. We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread
[ https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35980: Assignee: (was: Apache Spark) > ThreadAudit test helper should log whether a thread is a Daemon thread > -- > > Key: SPARK-35980 > URL: https://issues.apache.org/jira/browse/SPARK-35980 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.2 >Reporter: Tim Armstrong >Priority: Major > > It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned > whether the threads were daemon threads or not, since leaked non-daemon > threads are more likely to be unintentional than leaked daemon threads. > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread
[ https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373069#comment-17373069 ] Apache Spark commented on SPARK-35980: -- User 'timarmstrong' has created a pull request for this issue: https://github.com/apache/spark/pull/33178 > ThreadAudit test helper should log whether a thread is a Daemon thread > -- > > Key: SPARK-35980 > URL: https://issues.apache.org/jira/browse/SPARK-35980 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.2 >Reporter: Tim Armstrong >Priority: Major > > It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned > whether the threads were daemon threads or not, since leaked non-daemon > threads are more likely to be unintentional than leaked daemon threads. > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread
[ https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35980: Assignee: Apache Spark > ThreadAudit test helper should log whether a thread is a Daemon thread > -- > > Key: SPARK-35980 > URL: https://issues.apache.org/jira/browse/SPARK-35980 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.2 >Reporter: Tim Armstrong >Assignee: Apache Spark >Priority: Major > > It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned > whether the threads were daemon threads or not, since leaked non-daemon > threads are more likely to be unintentional than leaked daemon threads. > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread
[ https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373068#comment-17373068 ] Apache Spark commented on SPARK-35980: -- User 'timarmstrong' has created a pull request for this issue: https://github.com/apache/spark/pull/33178 > ThreadAudit test helper should log whether a thread is a Daemon thread > -- > > Key: SPARK-35980 > URL: https://issues.apache.org/jira/browse/SPARK-35980 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.2 >Reporter: Tim Armstrong >Priority: Major > > It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned > whether the threads were daemon threads or not, since leaked non-daemon > threads are more likely to be unintentional than leaked daemon threads. > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread
[ https://issues.apache.org/jira/browse/SPARK-35980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373065#comment-17373065 ] Tim Armstrong commented on SPARK-35980: --- I plan to contribute a fix for this. > ThreadAudit test helper should log whether a thread is a Daemon thread > -- > > Key: SPARK-35980 > URL: https://issues.apache.org/jira/browse/SPARK-35980 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.2 >Reporter: Tim Armstrong >Priority: Major > > It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned > whether the threads were daemon threads or not, since leaked non-daemon > threads are more likely to be unintentional than leaked daemon threads. > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35980) ThreadAudit test helper should log whether a thread is a Daemon thread
Tim Armstrong created SPARK-35980: - Summary: ThreadAudit test helper should log whether a thread is a Daemon thread Key: SPARK-35980 URL: https://issues.apache.org/jira/browse/SPARK-35980 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.1.2 Reporter: Tim Armstrong It would be helpful if the POSSIBLE THREAD LEAK IN SUITE error mentioned whether the threads were daemon threads or not, since leaked non-daemon threads are more likely to be unintentional than leaked daemon threads. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ThreadAudit.scala#L113 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35955: Assignee: Apache Spark > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Assignee: Apache Spark >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373039#comment-17373039 ] Apache Spark commented on SPARK-35955: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/33177 > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35955: Assignee: (was: Apache Spark) > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35975. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33176 [https://github.com/apache/spark/pull/33176] > New configuration spark.sql.timestampType for the default timestamp type > > > Key: SPARK-35975 > URL: https://issues.apache.org/jira/browse/SPARK-35975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Add a new configuration `spark.sql.timestampType`, which configures the > default timestamp type of Spark SQL, including SQL DDL and Cast clause. > Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME > ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP > WITH LOCAL TIME ZONE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35855) Unify reuse map data structures in non-AQE and AQE rules
[ https://issues.apache.org/jira/browse/SPARK-35855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373025#comment-17373025 ] Apache Spark commented on SPARK-35855: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/33177 > Unify reuse map data structures in non-AQE and AQE rules > > > Key: SPARK-35855 > URL: https://issues.apache.org/jira/browse/SPARK-35855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Minor > Fix For: 3.2.0 > > > We can unify reuse map data structures in non-AQE and AQE rules > (`ReuseExchangeAndSubquery`, `ReuseAdaptiveSubquery`) to a simple > `Map[, ]`. > Please find discussion here: > [https://github.com/apache/spark/pull/28885#discussion_r655073897] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35855) Unify reuse map data structures in non-AQE and AQE rules
[ https://issues.apache.org/jira/browse/SPARK-35855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373023#comment-17373023 ] Apache Spark commented on SPARK-35855: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/33177 > Unify reuse map data structures in non-AQE and AQE rules > > > Key: SPARK-35855 > URL: https://issues.apache.org/jira/browse/SPARK-35855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Minor > Fix For: 3.2.0 > > > We can unify reuse map data structures in non-AQE and AQE rules > (`ReuseExchangeAndSubquery`, `ReuseAdaptiveSubquery`) to a simple > `Map[, ]`. > Please find discussion here: > [https://github.com/apache/spark/pull/28885#discussion_r655073897] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS
[ https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35974. --- Resolution: Cannot Reproduce Could you try to use Apache Spark 3.1.2, please, [~toopt4], because Apache Spark 2.4 is EOL. It seems that the log shows `spark-2.3.4-bin-hadoop2.7` and the affected version is 2.4.6. Both are too old. > Spark submit REST cluster/standalone mode - launching an s3a jar with STS > - > > Key: SPARK-35974 > URL: https://issues.apache.org/jira/browse/SPARK-35974 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.6 >Reporter: t oo >Priority: Major > > {code:java} > /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master > spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf > spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf > spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf > spark.hadoop.fs.s3a.secret.key='redact2' --conf > spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf > spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf > spark.hadoop.fs.s3a.session.token='redact3' --conf > spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf > spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf > spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider > --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 > -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf > spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 > -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' > --total-executor-cores 4 --executor-cores 2 --executor-memory 2g > --driver-memory 1g --name lin1 --deploy-mode cluster --conf > spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku > s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml > {code} > running the above command give below stack trace: > > {code:java} > Exception from the cluster:\njava.nio.file.AccessDeniedException: > s3a://mybuc/metorikku_2.11.jar: getFileStatus on > s3a://mybuc/metorikku_2.11.jar: > com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon > S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended > Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\ > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542) > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) > org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463) > org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030) > org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747) > org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723) > org.apache.spark.util.Utils$.fetchFile(Utils.scala:509) > org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155) > org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173) > org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code} > all the ec2s in the spark cluster only have access to s3 via STS tokens. The > jar itself reads csvs from s3 using the tokens, and everything works if > either 1. i change the commandline to point to local jars on the ec2 OR 2. > use port 7077/client mode instead of cluster mode. But it seems the jar > itself can't be launched off s3, as if the tokens are not being picked up > properly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS
[ https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372995#comment-17372995 ] Dongjoon Hyun commented on SPARK-35974: --- Free free to reopen this with the updated information with Spark 3. > Spark submit REST cluster/standalone mode - launching an s3a jar with STS > - > > Key: SPARK-35974 > URL: https://issues.apache.org/jira/browse/SPARK-35974 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.6 >Reporter: t oo >Priority: Major > > {code:java} > /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master > spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf > spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf > spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf > spark.hadoop.fs.s3a.secret.key='redact2' --conf > spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf > spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf > spark.hadoop.fs.s3a.session.token='redact3' --conf > spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf > spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf > spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider > --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 > -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf > spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 > -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' > --total-executor-cores 4 --executor-cores 2 --executor-memory 2g > --driver-memory 1g --name lin1 --deploy-mode cluster --conf > spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku > s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml > {code} > running the above command give below stack trace: > > {code:java} > Exception from the cluster:\njava.nio.file.AccessDeniedException: > s3a://mybuc/metorikku_2.11.jar: getFileStatus on > s3a://mybuc/metorikku_2.11.jar: > com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon > S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended > Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\ > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542) > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) > org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463) > org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030) > org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747) > org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723) > org.apache.spark.util.Utils$.fetchFile(Utils.scala:509) > org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155) > org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173) > org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code} > all the ec2s in the spark cluster only have access to s3 via STS tokens. The > jar itself reads csvs from s3 using the tokens, and everything works if > either 1. i change the commandline to point to local jars on the ec2 OR 2. > use port 7077/client mode instead of cluster mode. But it seems the jar > itself can't be launched off s3, as if the tokens are not being picked up > properly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372980#comment-17372980 ] Dongjoon Hyun edited comment on SPARK-35972 at 7/1/21, 6:06 PM: Hi, [~angerszhu]. Could you make this as a BUG? Also, could you provide more detail? was (Author: dongjoon): Hi, [~angerszhu]. Could you make this as a BUG? > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spa
[jira] [Commented] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372980#comment-17372980 ] Dongjoon Hyun commented on SPARK-35972: --- Hi, [~angerszhu]. Could you make this as a BUG? > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) >
[jira] [Created] (SPARK-35979) Return different timestamp literals based on the default timestamp type
Gengliang Wang created SPARK-35979: -- Summary: Return different timestamp literals based on the default timestamp type Key: SPARK-35979 URL: https://issues.apache.org/jira/browse/SPARK-35979 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang For the timestamp literal, it should have following behavior. 1. When spark.sql.timestampType is TIMESTAMP_NTZ: if there is no time zone part, return timestamp without time zone literal; otherwise, return timestamp with local time zone literal 2. When spark.sql.timestampType is TIMESTAMP_LTZ: return timestamp with local time zone literal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35978) Support new keyword TIMESTAMP_LTZ
Gengliang Wang created SPARK-35978: -- Summary: Support new keyword TIMESTAMP_LTZ Key: SPARK-35978 URL: https://issues.apache.org/jira/browse/SPARK-35978 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Support new keyword TIMESTAMP_LTZ, which can be used for: * timestamp with local time zone data type in DDL * timestamp with local time zone data type in Cast clause. * timestamp with local time zone data type literal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35977) Support new keyword TIMESTAMP_NTZ
Gengliang Wang created SPARK-35977: -- Summary: Support new keyword TIMESTAMP_NTZ Key: SPARK-35977 URL: https://issues.apache.org/jira/browse/SPARK-35977 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Support new keyword TIMESTAMP_NTZ, which can be used for: * timestamp without time zone data type in DDL * timestamp without time zone data type in Cast clause. * timestamp without time zone data type literal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35756) unionByName should support nested struct also
[ https://issues.apache.org/jira/browse/SPARK-35756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35756. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32972 [https://github.com/apache/spark/pull/32972] > unionByName should support nested struct also > - > > Key: SPARK-35756 > URL: https://issues.apache.org/jira/browse/SPARK-35756 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Wassim Almaaoui >Assignee: Saurabh Chawla >Priority: Major > Fix For: 3.2.0 > > > It would be cool if `unionByName` supports also nested struct. I don't kwon > if it's the expected behaviour already or not so I am not sure if its a bug > or an improvement proposal. > {code:java} > case class Struct1(c1: Int, c2: Int) > case class Struct2(c2: Int, c1: Int) > val ds1 = Seq((1, Struct1(1,2))).toDS > val ds2 = Seq((1, Struct2(1,2))).toDS > ds1.unionByName(ds2.as[(Int,Struct1)]) {code} > gives > {code:java} > org.apache.spark.sql.AnalysisException: Union can only be performed on tables > with the compatible column types. struct <> > struct at the second column of the second table; 'Union false, > false :- LocalRelation [_1#38, _2#39] +- LocalRelation _1#45, _2#46 > {code} > The code documentation of the function `unionByName` says `Note that > allowMissingColumns supports nested column in struct types` but doesn't say > if the function itself supports the nested column ordering or not. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35756) unionByName should support nested struct also
[ https://issues.apache.org/jira/browse/SPARK-35756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35756: --- Assignee: Saurabh Chawla > unionByName should support nested struct also > - > > Key: SPARK-35756 > URL: https://issues.apache.org/jira/browse/SPARK-35756 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Wassim Almaaoui >Assignee: Saurabh Chawla >Priority: Major > > It would be cool if `unionByName` supports also nested struct. I don't kwon > if it's the expected behaviour already or not so I am not sure if its a bug > or an improvement proposal. > {code:java} > case class Struct1(c1: Int, c2: Int) > case class Struct2(c2: Int, c1: Int) > val ds1 = Seq((1, Struct1(1,2))).toDS > val ds2 = Seq((1, Struct2(1,2))).toDS > ds1.unionByName(ds2.as[(Int,Struct1)]) {code} > gives > {code:java} > org.apache.spark.sql.AnalysisException: Union can only be performed on tables > with the compatible column types. struct <> > struct at the second column of the second table; 'Union false, > false :- LocalRelation [_1#38, _2#39] +- LocalRelation _1#45, _2#46 > {code} > The code documentation of the function `unionByName` says `Note that > allowMissingColumns supports nested column in struct types` but doesn't say > if the function itself supports the nested column ordering or not. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark
Xinrong Meng created SPARK-35976: Summary: Adjust `astype` method for ExtensionDtype in pandas API on Spark Key: SPARK-35976 URL: https://issues.apache.org/jira/browse/SPARK-35976 Project: Spark Issue Type: Story Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng Currently, `astype` method for ExtensionDtype in pandas API on Spark is not consistent with pandas. For example, [https://github.com/apache/spark/pull/33095#discussion_r661704734.] We ought to fill in the gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35975: Assignee: Apache Spark (was: Gengliang Wang) > New configuration spark.sql.timestampType for the default timestamp type > > > Key: SPARK-35975 > URL: https://issues.apache.org/jira/browse/SPARK-35975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Add a new configuration `spark.sql.timestampType`, which configures the > default timestamp type of Spark SQL, including SQL DDL and Cast clause. > Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME > ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP > WITH LOCAL TIME ZONE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35975: Assignee: Gengliang Wang (was: Apache Spark) > New configuration spark.sql.timestampType for the default timestamp type > > > Key: SPARK-35975 > URL: https://issues.apache.org/jira/browse/SPARK-35975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Add a new configuration `spark.sql.timestampType`, which configures the > default timestamp type of Spark SQL, including SQL DDL and Cast clause. > Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME > ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP > WITH LOCAL TIME ZONE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type
[ https://issues.apache.org/jira/browse/SPARK-35975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372914#comment-17372914 ] Apache Spark commented on SPARK-35975: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33176 > New configuration spark.sql.timestampType for the default timestamp type > > > Key: SPARK-35975 > URL: https://issues.apache.org/jira/browse/SPARK-35975 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Add a new configuration `spark.sql.timestampType`, which configures the > default timestamp type of Spark SQL, including SQL DDL and Cast clause. > Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME > ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP > WITH LOCAL TIME ZONE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35975) New configuration spark.sql.timestampType for the default timestamp type
Gengliang Wang created SPARK-35975: -- Summary: New configuration spark.sql.timestampType for the default timestamp type Key: SPARK-35975 URL: https://issues.apache.org/jira/browse/SPARK-35975 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Add a new configuration `spark.sql.timestampType`, which configures the default timestamp type of Spark SQL, including SQL DDL and Cast clause. Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP WITH LOCAL TIME ZONE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372910#comment-17372910 ] Karen Feng commented on SPARK-35955: I have changes almost ready locally, will open PR soon.[~dc-heros], what is the state of your work? > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing
[ https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35968: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Improvement) > Make sure partitions are not too small in AQE partition coalescing > -- > > Key: SPARK-35968 > URL: https://issues.apache.org/jira/browse/SPARK-35968 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names
[ https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35969. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33171 [https://github.com/apache/spark/pull/33171] > Make the pod prefix more readable and tallied with K8S DNS Label Names > -- > > Key: SPARK-35969 > URL: https://issues.apache.org/jira/browse/SPARK-35969 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > > By default, the executor pod prefix is generated by the app name. It handles > characters that match [^a-z0-9\\-] differently. The '.' and all whitespaces > will be converted to '-', but other ones to empty string. Especially, > characters like '_', '|' are commonly used as a word separator in many > languages. > According to the K8S DNS Label Names, see > [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names,] > we can convert all special characters to `-`. > > {code:scala} > scala> "time.is%the¥most$valuable_——thing,it's about > time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-") > res9: String = time-is-the-most-valuable-thing-it-s-about-time- > scala> "time.is%the¥most$valuable_——thing,it's about > time.".replaceAll("\\s+", "-").replaceAll("\\.", > "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-") > res10: String = time-isthemostvaluablethingits-about-time- > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names
[ https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35969: - Assignee: Kent Yao > Make the pod prefix more readable and tallied with K8S DNS Label Names > -- > > Key: SPARK-35969 > URL: https://issues.apache.org/jira/browse/SPARK-35969 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > By default, the executor pod prefix is generated by the app name. It handles > characters that match [^a-z0-9\\-] differently. The '.' and all whitespaces > will be converted to '-', but other ones to empty string. Especially, > characters like '_', '|' are commonly used as a word separator in many > languages. > According to the K8S DNS Label Names, see > [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names,] > we can convert all special characters to `-`. > > {code:scala} > scala> "time.is%the¥most$valuable_——thing,it's about > time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-") > res9: String = time-is-the-most-valuable-thing-it-s-about-time- > scala> "time.is%the¥most$valuable_——thing,it's about > time.".replaceAll("\\s+", "-").replaceAll("\\.", > "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-") > res10: String = time-isthemostvaluablethingits-about-time- > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS
t oo created SPARK-35974: Summary: Spark submit REST cluster/standalone mode - launching an s3a jar with STS Key: SPARK-35974 URL: https://issues.apache.org/jira/browse/SPARK-35974 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.6 Reporter: t oo {code:java} /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf spark.hadoop.fs.s3a.secret.key='redact2' --conf spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf spark.hadoop.fs.s3a.session.token='redact3' --conf spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --total-executor-cores 4 --executor-cores 2 --executor-memory 2g --driver-memory 1g --name lin1 --deploy-mode cluster --conf spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml {code} running the above command give below stack trace: {code:java} Exception from the cluster:\njava.nio.file.AccessDeniedException: s3a://mybuc/metorikku_2.11.jar: getFileStatus on s3a://mybuc/metorikku_2.11.jar: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\ org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542) org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463) org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030) org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747) org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723) org.apache.spark.util.Utils$.fetchFile(Utils.scala:509) org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155) org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173) org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code} all the ec2s in the spark cluster only have access to s3 via STS tokens. The jar itself reads csvs from s3 using the tokens, and everything works if either 1. i change the commandline to point to local jars on the ec2 OR 2. use port 7077/client mode instead of cluster mode. But it seems the jar itself can't be launched off s3, as if the tokens are not being picked up properly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS
[ https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35973: Assignee: Apache Spark > DataSourceV2: Support SHOW CATALOGS > --- > > Key: SPARK-35973 > URL: https://issues.apache.org/jira/browse/SPARK-35973 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > > Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list > the catalogs and corresponding default-namespace info will be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS
[ https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35973: Assignee: (was: Apache Spark) > DataSourceV2: Support SHOW CATALOGS > --- > > Key: SPARK-35973 > URL: https://issues.apache.org/jira/browse/SPARK-35973 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list > the catalogs and corresponding default-namespace info will be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS
[ https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372800#comment-17372800 ] Apache Spark commented on SPARK-35973: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/33175 > DataSourceV2: Support SHOW CATALOGS > --- > > Key: SPARK-35973 > URL: https://issues.apache.org/jira/browse/SPARK-35973 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list > the catalogs and corresponding default-namespace info will be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS
[ https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PengLei updated SPARK-35973: Description: Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list the catalogs and corresponding default-namespace info will be useful. (was: Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list the catalogs/default-namespace info will be useful.) > DataSourceV2: Support SHOW CATALOGS > --- > > Key: SPARK-35973 > URL: https://issues.apache.org/jira/browse/SPARK-35973 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list > the catalogs and corresponding default-namespace info will be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS
[ https://issues.apache.org/jira/browse/SPARK-35973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372797#comment-17372797 ] PengLei commented on SPARK-35973: - I am woking on this. After 3.2 released > DataSourceV2: Support SHOW CATALOGS > --- > > Key: SPARK-35973 > URL: https://issues.apache.org/jira/browse/SPARK-35973 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list > the catalogs/default-namespace info will be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35973) DataSourceV2: Support SHOW CATALOGS
PengLei created SPARK-35973: --- Summary: DataSourceV2: Support SHOW CATALOGS Key: SPARK-35973 URL: https://issues.apache.org/jira/browse/SPARK-35973 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: PengLei Datasource V2 can support multiple catalogs. Having "SHOW CATALOGS" to list the catalogs/default-namespace info will be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"
[ https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35971. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33173 [https://github.com/apache/spark/pull/33173] > Rename the type name of TimestampNTZType as "timestamp_ntz" > --- > > Key: SPARK-35971 > URL: https://issues.apache.org/jira/browse/SPARK-35971 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Rename the type name string of TimestampNTZType from "timestamp without time > zone" to "timestamp_ntz". > This is to make the column header shorter and simpler. > Snowflake and Flink uses similar approach: > https://docs.snowflake.com/en/sql-reference/data-types-datetime.html > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35686) Avoid using auto generated alias when creating view
[ https://issues.apache.org/jira/browse/SPARK-35686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35686. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32832 [https://github.com/apache/spark/pull/32832] > Avoid using auto generated alias when creating view > --- > > Key: SPARK-35686 > URL: https://issues.apache.org/jira/browse/SPARK-35686 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > If the user creates a view in 2.4 and reads it in 3.2, there will be an > incompatible schema issue. the root cause is that we changed the alias auto > generation rule after 2.4. To avoid this happening again, we should let the > user explicitly specifying the column names -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372647#comment-17372647 ] angerszhu commented on SPARK-35972: --- We meet a case that it can analyze/optimize and generate sparkplan well but when running, executor will throw exception like above in desc. looks like child's output loss data @ > NestColumnPruning cause execute loss output > --- > > Key: SPARK-35972 > URL: https://issues.apache.org/jira/browse/SPARK-35972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: angerszhu >Priority: Major > > {code:java} > Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most > recent failure: Lost task 47.3 in stage 1.0 (TID 328) > (ip-idata-server.shopee.io executor 3): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: _gen_alias_788#788 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapC
[jira] [Updated] (SPARK-35972) NestColumnPruning cause execute loss output
[ https://issues.apache.org/jira/browse/SPARK-35972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-35972: -- Description: {code:java} Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most recent failure: Lost task 47.3 in stage 1.0 (TID 328) (ip-idata-server.shopee.io executor 3): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_788#788 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.immutable.List.map(List.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.immutable.List.map(List.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:386) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.Traversa
[jira] [Created] (SPARK-35972) NestColumnPruning cause execute loss output
angerszhu created SPARK-35972: - Summary: NestColumnPruning cause execute loss output Key: SPARK-35972 URL: https://issues.apache.org/jira/browse/SPARK-35972 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2 Reporter: angerszhu {code:java} Job aborted due to stage failure: Task 47 in stage 1.0 failed 4 times, most recent failure: Lost task 47.3 in stage 1.0 (TID 328) (ip-10-130-163-200.idata-server.shopee.io executor 3): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_788#788 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.immutable.List.map(List.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:377) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:438) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.immutable.List.map(List.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:438) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:323) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:386) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:4
[jira] [Resolved] (SPARK-35966) Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings
[ https://issues.apache.org/jira/browse/SPARK-35966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-35966. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33169 [https://github.com/apache/spark/pull/33169] > Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings > --- > > Key: SPARK-35966 > URL: https://issues.apache.org/jira/browse/SPARK-35966 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.2.0 > > > see HIVE-17952 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35966) Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings
[ https://issues.apache.org/jira/browse/SPARK-35966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-35966: Assignee: Kent Yao > Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings > --- > > Key: SPARK-35966 > URL: https://issues.apache.org/jira/browse/SPARK-35966 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > > see HIVE-17952 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35965) Add documentation for ORC nested column vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-35965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35965: Assignee: Cheng Su > Add documentation for ORC nested column vectorized reader > - > > Key: SPARK-35965 > URL: https://issues.apache.org/jira/browse/SPARK-35965 > Project: Spark > Issue Type: Documentation > Components: docs, SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Trivial > > In https://issues.apache.org/jira/browse/SPARK-34862, we added support for > ORC nested column vectorized reader, and it is disabled by default for now. > So we would like to add the user-facing documentation for it, and user can > opt-in to use it if they want. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35965) Add documentation for ORC nested column vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-35965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35965. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33168 [https://github.com/apache/spark/pull/33168] > Add documentation for ORC nested column vectorized reader > - > > Key: SPARK-35965 > URL: https://issues.apache.org/jira/browse/SPARK-35965 > Project: Spark > Issue Type: Documentation > Components: docs, SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Trivial > Fix For: 3.2.0 > > > In https://issues.apache.org/jira/browse/SPARK-34862, we added support for > ORC nested column vectorized reader, and it is disabled by default for now. > So we would like to add the user-facing documentation for it, and user can > opt-in to use it if they want. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35685) Prompt recreating the View when there is a schema incompatible change
[ https://issues.apache.org/jira/browse/SPARK-35685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35685. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32831 [https://github.com/apache/spark/pull/32831] > Prompt recreating the View when there is a schema incompatible change > - > > Key: SPARK-35685 > URL: https://issues.apache.org/jira/browse/SPARK-35685 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.0 > > > Prompt recreating the View when there is a schema incompatible change. > Something like: > "there is an incompatible schema change and the column couldn't be resolved. > Please consider to recreate the view to fix this: CREATE OR REPLACE VIEW v AS > xxx" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35685) Prompt recreating the View when there is a schema incompatible change
[ https://issues.apache.org/jira/browse/SPARK-35685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35685: --- Assignee: Linhong Liu > Prompt recreating the View when there is a schema incompatible change > - > > Key: SPARK-35685 > URL: https://issues.apache.org/jira/browse/SPARK-35685 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > > Prompt recreating the View when there is a schema incompatible change. > Something like: > "there is an incompatible schema change and the column couldn't be resolved. > Please consider to recreate the view to fix this: CREATE OR REPLACE VIEW v AS > xxx" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35618) Resolve star expressions in subquery
[ https://issues.apache.org/jira/browse/SPARK-35618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35618: --- Assignee: Allison Wang > Resolve star expressions in subquery > > > Key: SPARK-35618 > URL: https://issues.apache.org/jira/browse/SPARK-35618 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Currently, Spark does not resolve star expressions in subqueries correctly. > It can only resolve the star expressions using the inner query attributes. > For example: > {{CREATE VIEW t(a) AS VALUES (1), (2);}} > {{SELECT * FROM t WHERE a in (SELECT t.*)}} > {{SELECT * FROM t, LATERAL (SELECT t.*)}} > {{org.apache.spark.sql.AnalysisException: cannot resolve 't.*' given input > columns '';}} > Instead, we should try to resolve star expressions in subquery first using > the inner attributes and then using the outer query attributes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35618) Resolve star expressions in subquery
[ https://issues.apache.org/jira/browse/SPARK-35618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35618. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32787 [https://github.com/apache/spark/pull/32787] > Resolve star expressions in subquery > > > Key: SPARK-35618 > URL: https://issues.apache.org/jira/browse/SPARK-35618 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.2.0 > > > Currently, Spark does not resolve star expressions in subqueries correctly. > It can only resolve the star expressions using the inner query attributes. > For example: > {{CREATE VIEW t(a) AS VALUES (1), (2);}} > {{SELECT * FROM t WHERE a in (SELECT t.*)}} > {{SELECT * FROM t, LATERAL (SELECT t.*)}} > {{org.apache.spark.sql.AnalysisException: cannot resolve 't.*' given input > columns '';}} > Instead, we should try to resolve star expressions in subquery first using > the inner attributes and then using the outer query attributes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35955) Fix decimal overflow issues for Average
[ https://issues.apache.org/jira/browse/SPARK-35955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372566#comment-17372566 ] dgd_contributor commented on SPARK-35955: - I will raise a pull request soon > Fix decimal overflow issues for Average > --- > > Key: SPARK-35955 > URL: https://issues.apache.org/jira/browse/SPARK-35955 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karen Feng >Priority: Major > > Fix decimal overflow issues for decimal average in ANSI mode. Linked to > SPARK-32018 and SPARK-28067, which address decimal sum. > Repro: > > {code:java} > import org.apache.spark.sql.functions._ > spark.conf.set("spark.sql.ansi.enabled", true) > val df = Seq( > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 1), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2), > (BigDecimal("1000"), 2)).toDF("decNum", "intNum") > val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, > "intNum").agg(mean("decNum")) > df2.show(40,false) > {code} > > Should throw an exception (as sum overflows), but instead returns: > > {code:java} > +---+ > |avg(decNum)| > +---+ > |null | > +---+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35721) Path level discover for python unittests
[ https://issues.apache.org/jira/browse/SPARK-35721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372560#comment-17372560 ] Apache Spark commented on SPARK-35721: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/33174 > Path level discover for python unittests > > > Key: SPARK-35721 > URL: https://issues.apache.org/jira/browse/SPARK-35721 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Major > > Now we need to specify the python test cases by manually when we add a new > testcase. Sometime, we forgot to add the testcase to module list, the > testcase would not be executed. > Such as: > * pyspark-core pyspark.tests.test_pin_thread > Thus we need some auto-discover way to find all testcase rather than > specified every case by manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35721) Path level discover for python unittests
[ https://issues.apache.org/jira/browse/SPARK-35721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372559#comment-17372559 ] Apache Spark commented on SPARK-35721: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/33174 > Path level discover for python unittests > > > Key: SPARK-35721 > URL: https://issues.apache.org/jira/browse/SPARK-35721 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Major > > Now we need to specify the python test cases by manually when we add a new > testcase. Sometime, we forgot to add the testcase to module list, the > testcase would not be executed. > Such as: > * pyspark-core pyspark.tests.test_pin_thread > Thus we need some auto-discover way to find all testcase rather than > specified every case by manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"
[ https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372555#comment-17372555 ] Apache Spark commented on SPARK-35971: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33173 > Rename the type name of TimestampNTZType as "timestamp_ntz" > --- > > Key: SPARK-35971 > URL: https://issues.apache.org/jira/browse/SPARK-35971 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Rename the type name string of TimestampNTZType from "timestamp without time > zone" to "timestamp_ntz". > This is to make the column header shorter and simpler. > Snowflake and Flink uses similar approach: > https://docs.snowflake.com/en/sql-reference/data-types-datetime.html > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"
[ https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35971: Assignee: Apache Spark (was: Gengliang Wang) > Rename the type name of TimestampNTZType as "timestamp_ntz" > --- > > Key: SPARK-35971 > URL: https://issues.apache.org/jira/browse/SPARK-35971 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Rename the type name string of TimestampNTZType from "timestamp without time > zone" to "timestamp_ntz". > This is to make the column header shorter and simpler. > Snowflake and Flink uses similar approach: > https://docs.snowflake.com/en/sql-reference/data-types-datetime.html > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"
[ https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35971: Assignee: Gengliang Wang (was: Apache Spark) > Rename the type name of TimestampNTZType as "timestamp_ntz" > --- > > Key: SPARK-35971 > URL: https://issues.apache.org/jira/browse/SPARK-35971 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Rename the type name string of TimestampNTZType from "timestamp without time > zone" to "timestamp_ntz". > This is to make the column header shorter and simpler. > Snowflake and Flink uses similar approach: > https://docs.snowflake.com/en/sql-reference/data-types-datetime.html > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"
[ https://issues.apache.org/jira/browse/SPARK-35971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372553#comment-17372553 ] Apache Spark commented on SPARK-35971: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33173 > Rename the type name of TimestampNTZType as "timestamp_ntz" > --- > > Key: SPARK-35971 > URL: https://issues.apache.org/jira/browse/SPARK-35971 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Rename the type name string of TimestampNTZType from "timestamp without time > zone" to "timestamp_ntz". > This is to make the column header shorter and simpler. > Snowflake and Flink uses similar approach: > https://docs.snowflake.com/en/sql-reference/data-types-datetime.html > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing
[ https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35968: Assignee: Apache Spark > Make sure partitions are not too small in AQE partition coalescing > -- > > Key: SPARK-35968 > URL: https://issues.apache.org/jira/browse/SPARK-35968 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing
[ https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35968: Assignee: (was: Apache Spark) > Make sure partitions are not too small in AQE partition coalescing > -- > > Key: SPARK-35968 > URL: https://issues.apache.org/jira/browse/SPARK-35968 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35968) Make sure partitions are not too small in AQE partition coalescing
[ https://issues.apache.org/jira/browse/SPARK-35968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372552#comment-17372552 ] Apache Spark commented on SPARK-35968: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33172 > Make sure partitions are not too small in AQE partition coalescing > -- > > Key: SPARK-35968 > URL: https://issues.apache.org/jira/browse/SPARK-35968 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35971) Rename the type name of TimestampNTZType as "timestamp_ntz"
Gengliang Wang created SPARK-35971: -- Summary: Rename the type name of TimestampNTZType as "timestamp_ntz" Key: SPARK-35971 URL: https://issues.apache.org/jira/browse/SPARK-35971 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Rename the type name string of TimestampNTZType from "timestamp without time zone" to "timestamp_ntz". This is to make the column header shorter and simpler. Snowflake and Flink uses similar approach: https://docs.snowflake.com/en/sql-reference/data-types-datetime.html https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35963) Rename TimestampWithoutTZType to TimestampNTZType
[ https://issues.apache.org/jira/browse/SPARK-35963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35963. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33167 [https://github.com/apache/spark/pull/33167] > Rename TimestampWithoutTZType to TimestampNTZType > - > > Key: SPARK-35963 > URL: https://issues.apache.org/jira/browse/SPARK-35963 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > The time name of `TimestampWithoutTZType` is verbose. Rename it as > `TimestampNTZType` so that > 1. it is easier to read and type. > 2. As we have the function to_timestamp_ntz, this makes the names consistent. > 3. We will introduce a new SQL configuration `spark.sql.timestampType` for > the default timestamp type. The configuration values can be "TIMESTMAP_NTZ" > or "TIMESTMAP_LTZ" for simplicity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35969) Make the pod prefix more readable and tallied with K8S DNS Label Names
[ https://issues.apache.org/jira/browse/SPARK-35969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35969: Assignee: Apache Spark > Make the pod prefix more readable and tallied with K8S DNS Label Names > -- > > Key: SPARK-35969 > URL: https://issues.apache.org/jira/browse/SPARK-35969 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > By default, the executor pod prefix is generated by the app name. It handles > characters that match [^a-z0-9\\-] differently. The '.' and all whitespaces > will be converted to '-', but other ones to empty string. Especially, > characters like '_', '|' are commonly used as a word separator in many > languages. > According to the K8S DNS Label Names, see > [https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names,] > we can convert all special characters to `-`. > > {code:scala} > scala> "time.is%the¥most$valuable_——thing,it's about > time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-") > res9: String = time-is-the-most-valuable-thing-it-s-about-time- > scala> "time.is%the¥most$valuable_——thing,it's about > time.".replaceAll("\\s+", "-").replaceAll("\\.", > "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-") > res10: String = time-isthemostvaluablethingits-about-time- > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35970) Allow predicate for pyspark.sql.functions.array_sort
Ramanan Subramanian created SPARK-35970: --- Summary: Allow predicate for pyspark.sql.functions.array_sort Key: SPARK-35970 URL: https://issues.apache.org/jira/browse/SPARK-35970 Project: Spark Issue Type: Wish Components: SQL Affects Versions: 3.1.2 Reporter: Ramanan Subramanian Currently, both the Python API and the Scala API for the SQL function `array_sort` do not take a predicate boolean function/lambda expression as a second argument. Hence, we have to resort to `expression` or `selectExpression` and use the DSL for the predicate function. It would be nice to allow this, just like all the higher-order functions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org