[jira] [Resolved] (SPARK-40323) Update ORC to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-40323.
-----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 37787
[https://github.com/apache/spark/pull/37787]

> Update ORC to 1.8.0
> -------------------
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Priority: Major
> Fix For: 3.4.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40323) Update ORC to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-40323:
-------------------------------------
Assignee: William Hyun

> Update ORC to 1.8.0
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Assignee: William Hyun
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-40323) Update ORC to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40323:
------------------------------------
Assignee: (was: Apache Spark)

> Update ORC to 1.8.0
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Priority: Major
[jira] [Assigned] (SPARK-40323) Update ORC to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40323:
------------------------------------
Assignee: Apache Spark

> Update ORC to 1.8.0
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-40323) Update ORC to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1763#comment-1763 ]

Apache Spark commented on SPARK-40323:
--------------------------------------
User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37787

> Update ORC to 1.8.0
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Priority: Major
[jira] [Created] (SPARK-40323) Update ORC to 1.8.0
William Hyun created SPARK-40323:
---------------------------------
Summary: Update ORC to 1.8.0
Key: SPARK-40323
URL: https://issues.apache.org/jira/browse/SPARK-40323
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 3.4.0
Reporter: William Hyun
[jira] [Assigned] (SPARK-40308) str_to_map should accept non-foldable delimiter arguments
[ https://issues.apache.org/jira/browse/SPARK-40308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-40308:
--------------------------------
Assignee: Bruce Robbins

> str_to_map should accept non-foldable delimiter arguments
> ---------------------------------------------------------
> Key: SPARK-40308
> URL: https://issues.apache.org/jira/browse/SPARK-40308
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Bruce Robbins
> Assignee: Bruce Robbins
> Priority: Minor
>
> Currently, str_to_map requires the delimiter arguments to be foldable
> expressions. For example, the following doesn't work in Spark SQL:
> {noformat}
> drop table if exists maptbl;
> create table maptbl as select ',' as del1, ':' as del2, 'a:1,b:2,c:3' as str;
> insert into table maptbl select '%' as del1, '-' as del2, 'a-1%b-2%c-3' as str;
> select str, str_to_map(str, del1, del2) from maptbl;
> {noformat}
> You get the following error:
> {noformat}
> str_to_map's delimiters must be foldable.; line 1 pos 12;
> {noformat}
> However, the above example SQL statements do work in Hive 2.3.9. There, you get:
> {noformat}
> +--------------+----------------------------+
> |     str      |            _c1             |
> +--------------+----------------------------+
> | a:1,b:2,c:3  | {"a":"1","b":"2","c":"3"}  |
> | a-1%b-2%c-3  | {"a":"1","b":"2","c":"3"}  |
> +--------------+----------------------------+
> 2 rows selected (0.13 seconds)
> {noformat}
> It's unlikely that an input table would have the needed delimiters in
> columns. The use-case is more likely to be something like this, where the
> delimiters are determined based on some other value:
> {noformat}
> select
>   str,
>   str_to_map(str, ',', if(region = 0, ':', '#')) as m
> from
>   maptbl2;
> {noformat}
[jira] [Resolved] (SPARK-40308) str_to_map should accept non-foldable delimiter arguments
[ https://issues.apache.org/jira/browse/SPARK-40308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-40308.
------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 37763
[https://github.com/apache/spark/pull/37763]

> str_to_map should accept non-foldable delimiter arguments
> Key: SPARK-40308
> URL: https://issues.apache.org/jira/browse/SPARK-40308
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Bruce Robbins
> Assignee: Bruce Robbins
> Priority: Minor
> Fix For: 3.4.0
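The semantics SPARK-40308 asks for are easy to state outside of Spark: once the delimiters may vary per row, the split logic itself is unchanged. The following is a minimal Python sketch of str_to_map with per-row delimiters (not Spark code; the function and its null handling are illustrative assumptions, not the engine's implementation):

```python
def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Split `text` into pairs on `pair_delim`, then split each pair
    into key and value on the first `kv_delim`, mirroring the SQL
    str_to_map function's basic behavior."""
    result = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        result[key] = value
    return result

# The issue's use case: the key/value delimiter is chosen per row from
# another column (a region flag), rather than being a constant literal.
rows = [("a:1,b:2,c:3", 0), ("a#1,b#2,c#3", 1)]
maps = [str_to_map(s, ",", ":" if region == 0 else "#") for s, region in rows]
# Both rows decode to {"a": "1", "b": "2", "c": "3"}
```

This is exactly what the foldability check used to forbid: the third argument depends on row data, so it cannot be evaluated once at plan time.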
[jira] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316 ]

Sachit deleted comment on SPARK-40316:
--------------------------------------
was (Author: JIRAUSER287754):
Hi [~srowen], yes, I have tried that as well. Please see what the UDF gives when we put it in spark-shell.

The earlier version gives *false* for nullability:

SparkUserDefinedFunction($Lambda$5089/955282812@57e5546c, ArrayType(LongType, *false*), List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])), Some(class[value[0]: array]), None, true, true)

After I handled nullability so that it can return null:

SparkUserDefinedFunction($Lambda$5088/1085757601@7fd002e3, ArrayType(LongType, *true*), List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])), Some(class[value[0]: array]), None, true, true)

> Upgrading to Spark 3 is giving NullPointerException
> ---------------------------------------------------
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.2
> Reporter: Sachit
> Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>
> java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, array, true], Some(interface scala.collection.Seq))
>     at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599984#comment-17599984 ]

Apache Spark commented on SPARK-40142:
--------------------------------------
User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37786

> Make pyspark.sql.functions examples self-contained
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599983#comment-17599983 ]

Apache Spark commented on SPARK-40142:
--------------------------------------
User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37786

> Make pyspark.sql.functions examples self-contained
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.4.0
[jira] [Updated] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid a missing attribute when a complex expression is used
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hgs updated SPARK-40288:
------------------------
Description (the previous description wrapped the failing query in an extra "insert overwrite table miss_expr"; it is otherwise identical):

{noformat}
--table
create table miss_expr(id int, name string, age double) stored as textfile;

--data
insert overwrite table miss_expr
values (1,'ox',1.0), (1,'oox',2.0), (2,'ox',3.0), (2,'xxo',4.0);

--failure sql
select id, name, nage as n from (
  select id, name, if(age > 3, 100, 200) as nage
  from miss_expr
  group by id, name, age
) group by id, name, nage;
{noformat}

--error stack
{noformat}
Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2, name#3, if ((age#4 > 3.0)) 100 else 200#12]
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
{noformat}

> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
> Reporter: hgs
> Priority: Minor
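For readers unfamiliar with the rule named in the title, the core idea of `PullOutGroupingExpressions` can be sketched in a few lines of Python. This is a hypothetical simplification, not Spark's Catalyst implementation: grouping expressions and projections are modeled as plain strings, and the `_groupingexpr_N` alias names are illustrative. The point is that replacing a complex grouping expression with a named alias lets later rules resolve the alias instead of re-binding the expression's (possibly pruned) input attributes such as `age#4`:

```python
def pull_out_grouping_exprs(select_exprs, group_by):
    """Replace each complex grouping expression with a named alias so
    downstream operators reference the alias, not the raw inputs."""
    aliases = {}
    new_group_by = []
    for expr in group_by:
        if expr.isidentifier():      # plain column reference: keep as-is
            new_group_by.append(expr)
        else:                        # complex expression: pull out an alias
            alias = aliases.setdefault(expr, f"_groupingexpr_{len(aliases)}")
            new_group_by.append(alias)
    # Rewrite the projection to use the same aliases.
    new_select = [aliases.get(e, e) for e in select_exprs]
    return new_select, new_group_by, aliases

sel, grp, al = pull_out_grouping_exprs(
    ["id", "name", "if(age > 3, 100, 200)"],
    ["id", "name", "if(age > 3, 100, 200)"],
)
# sel == ["id", "name", "_groupingexpr_0"]
```

After this rewrite, removing a redundant outer aggregate cannot lose `age`, because nothing above the inner aggregate refers to `age` directly anymore.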
[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid a missing attribute when a complex expression is used
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599975#comment-17599975 ]

Apache Spark commented on SPARK-40288:
--------------------------------------
User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37785

> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: hgs
> Priority: Minor
[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid a missing attribute when a complex expression is used
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599974#comment-17599974 ]

Apache Spark commented on SPARK-40288:
--------------------------------------
User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37785

> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: hgs
> Priority: Minor
[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid a missing attribute when a complex expression is used
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599972#comment-17599972 ]

Apache Spark commented on SPARK-40288:
--------------------------------------
User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37784

> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: hgs
> Priority: Minor
[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid a missing attribute when a complex expression is used
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599971#comment-17599971 ]

Apache Spark commented on SPARK-40288:
--------------------------------------
User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37784

> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: hgs
> Priority: Minor
[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599961#comment-17599961 ]

Sachit commented on SPARK-40316:
--------------------------------
Hi [~srowen], yes, I have tried that as well. Please see what the UDF gives when we put it in spark-shell.

The earlier version gives *false* for nullability:

SparkUserDefinedFunction($Lambda$5089/955282812@57e5546c, ArrayType(LongType, *false*), List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])), Some(class[value[0]: array]), None, true, true)

After I handled nullability so that it can return null:

SparkUserDefinedFunction($Lambda$5088/1085757601@7fd002e3, ArrayType(LongType, *true*), List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])), Some(class[value[0]: array]), None, true, true)

> Upgrading to Spark 3 is giving NullPointerException
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.2
> Reporter: Sachit
> Priority: Major
>
> Getting the below error while upgrading to Spark 3:
> java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
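The nullability flag in the two SparkUserDefinedFunction dumps above is the crux: a null element reaching a column declared with containsNull=false fails at decode time, while declaring the element type nullable lets it pass through. A minimal Python sketch of that check (a deliberate simplification; `decode_long_array` is an illustrative name, not a Spark API):

```python
def decode_long_array(values, contains_null):
    """Mimic the encoder check behind the reported NPE: a None element
    is only legal if the array type was declared with containsNull=True,
    i.e. ArrayType(LongType, true) rather than ArrayType(LongType, false)."""
    if not contains_null:
        for v in values:
            if v is None:
                raise ValueError("Null value appeared in non-nullable field")
    return list(values)

# ArrayType(LongType, false): a UDF result of [1, None] is rejected.
# ArrayType(LongType, true): the same result passes through unchanged.
ok = decode_long_array([1, None], contains_null=True)
```

The same trade-off exists on the Scala side: declaring `Seq[Option[Long]]` (or a nullable `java.lang.Long`) instead of `Seq[Long]` makes the encoder tolerate null elements, which matches the fix Sachit describes.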
[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599960#comment-17599960 ] Sean R. Owen commented on SPARK-40316: -- It's possible this was handled differently in earlier Spark/Scala differently (results in 0s?) but it still points to an error in your UDF. Why not pursue that? > Upgrading to Spark 3 is giving NullPointerException > --- > > Key: SPARK-40316 > URL: https://issues.apache.org/jira/browse/SPARK-40316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Sachit >Priority: Major > > Getting below error while upgrading to Spark3 > > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException: Null value appeared in non-nullable field: > - array element class: "scala.Long" > - root class: "scala.collection.Seq" > If the schema is inferred from a Scala tuple/case class, or a Java bean, > please try to use scala.Option[_] or other nullable types (e.g. > java.lang.Integer instead of int/scala.Int). 
[jira] [Resolved] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40316. -- Resolution: Not A Problem > Upgrading to Spark 3 is giving NullPointerException > --- > > Key: SPARK-40316 > URL: https://issues.apache.org/jira/browse/SPARK-40316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Sachit >Priority: Major > > Getting below error while upgrading to Spark3 > > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException: Null value appeared in non-nullable field: > - array element class: "scala.Long" > - root class: "scala.collection.Seq" > If the schema is inferred from a Scala tuple/case class, or a Java bean, > please try to use scala.Option[_] or other nullable types (e.g. > java.lang.Integer instead of int/scala.Int). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599931#comment-17599931 ] Sachit edited comment on SPARK-40316 at 9/3/22 4:01 PM: Hi [~srowen] Yes, but this was working in Spark 2.4. Were there any changes? After moving to Spark 3 it fails on the same dataset. was (Author: JIRAUSER287754): Yes , but this was working in Spark 2.4 , are there any changes as post putting Spark3 it is failing(On same dataset) > Upgrading to Spark 3 is giving NullPointerException > --- > > Key: SPARK-40316 > URL: https://issues.apache.org/jira/browse/SPARK-40316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Sachit >Priority: Major > > Getting below error while upgrading to Spark3 > > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException: Null value appeared in non-nullable field: > - array element class: "scala.Long" > - root class: "scala.collection.Seq" > If the schema is inferred from a Scala tuple/case class, or a Java bean, > please try to use scala.Option[_] or other nullable types (e.g. > java.lang.Integer instead of int/scala.Int). 
[jira] (SPARK-39996) Upgrade postgresql to 42.5.0
[ https://issues.apache.org/jira/browse/SPARK-39996 ] Bjørn Jørgensen deleted comment on SPARK-39996: - was (Author: bjornjorgensen): [GA tests failed|https://github.com/bjornjorgensen/spark/runs/7705423158?check_suite_focus=true] > Upgrade postgresql to 42.5.0 > > > Key: SPARK-39996 > URL: https://issues.apache.org/jira/browse/SPARK-39996 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Security > - fix: > [CVE-2022-31197|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-31197] > Fixes SQL generated in PgResultSet.refresh() to escape column identifiers so > as to prevent SQL injection. > - Previously, the column names for both key and data columns in the table > were copied as-is into the generated > SQL. This allowed a malicious table with column names that include > statement terminator to be parsed and > executed as multiple separate commands. > - Also adds a new test class ResultSetRefreshTest to verify this change. > - Reported by [Sho Kato](https://github.com/kato-sho) > [Release > note|https://github.com/pgjdbc/pgjdbc/commit/bd91c4cc76cdfc1ffd0322be80c85ddfe08a38c2] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
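The CVE description above follows the standard mitigation for identifier injection: treat column names as data, wrap them in double quotes, and double any embedded quotes before interpolating them into generated SQL. A minimal sketch of that idea (hypothetical `quoteIdentifier` helper, not pgjdbc's actual code):

```scala
// Illustrates the mitigation behind CVE-2022-31197: wrap a column name in
// double quotes and double any embedded quotes, so a name containing a
// statement terminator cannot break out of the identifier position.
// (Hypothetical helper, not pgjdbc's actual implementation.)
def quoteIdentifier(name: String): String =
  "\"" + name.replace("\"", "\"\"") + "\""

// A malicious "column name" that tries to smuggle in a second statement:
val malicious = "id\"; DROP TABLE t; --"
val sql = s"SELECT ${quoteIdentifier(malicious)} FROM t"

println(sql)  // SELECT "id""; DROP TABLE t; --" FROM t
```

With the quoting in place, the payload becomes one (odd-looking) quoted identifier instead of being parsed as multiple separate commands.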
[jira] [Created] (SPARK-40322) Fix all dead links
Yuming Wang created SPARK-40322: --- Summary: Fix all dead links Key: SPARK-40322 URL: https://issues.apache.org/jira/browse/SPARK-40322 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.4.0 Reporter: Yuming Wang https://www.deadlinkchecker.com/website-dead-link-checker.asp ||Status||URL||Source link text|| |-1 Not found: The server name or address could not be resolved|[http://engineering.ooyala.com/blog/using-parquet-and-scrooge-spark]|[Using Parquet and Scrooge with Spark|https://spark.apache.org/documentation.html]| |-1 Not found: The server name or address could not be resolved|[http://blinkdb.org/]|[BlinkDB|https://spark.apache.org/third-party-projects.html]| |404 Not Found|[https://github.com/AyasdiOpenSource/df]|[DF|https://spark.apache.org/third-party-projects.html]| |-1 Timeout|[https://atp.io/]|[atp|https://spark.apache.org/powered-by.html]| |-1 Not found: The server name or address could not be resolved|[http://www.sehir.edu.tr/en/]|[Istanbul Sehir University|https://spark.apache.org/powered-by.html]| |404 Not Found|[http://nsn.com/]|[Nokia Solutions and Networks|https://spark.apache.org/powered-by.html]| |-1 Not found: The server name or address could not be resolved|[http://www.nubetech.co/]|[Nube Technologies|https://spark.apache.org/powered-by.html]| |-1 Timeout|[http://ooyala.com/]|[Ooyala, Inc.|https://spark.apache.org/powered-by.html]| |-1 Not found: The server name or address could not be resolved|[http://engineering.ooyala.com/blog/fast-spark-queries-memory-datasets]|[Spark for Fast Queries|https://spark.apache.org/powered-by.html]| |-1 Not found: The server name or address could not be resolved|[http://www.sisa.samsung.com/]|[Samsung Research America|https://spark.apache.org/powered-by.html]| |-1 Timeout|[https://checker.apache.org/projs/spark.html]|[https://checker.apache.org/projs/spark.html|https://spark.apache.org/release-process.html]| |404 Not Found|[https://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|[AMP 
Camp 2 [302 from http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|https://spark.apache.org/documentation.html]| |404 Not Found|[https://ampcamp.berkeley.edu/agenda-2012/]|[AMP Camp 1 [302 from http://ampcamp.berkeley.edu/agenda-2012/]|https://spark.apache.org/documentation.html]| |404 Not Found|[https://ampcamp.berkeley.edu/4/]|[AMP Camp 4 [302 from http://ampcamp.berkeley.edu/4/]|https://spark.apache.org/documentation.html]| |404 Not Found|[https://ampcamp.berkeley.edu/3/]|[AMP Camp 3 [302 from http://ampcamp.berkeley.edu/3/]|https://spark.apache.org/documentation.html]| |500 Internal Server Error|[https://www.packtpub.com/product/spark-cookbook/9781783987061]|[Spark Cookbook [301 from https://www.packtpub.com/big-data-and-business-intelligence/spark-cookbook]|https://spark.apache.org/documentation.html]| |500 Internal Server Error|[https://www.packtpub.com/product/apache-spark-graph-processing/9781784391805]|[Apache Spark Graph Processing [301 from https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing]|https://spark.apache.org/documentation.html]| |500 Internal Server Error|[https://prevalentdesignevents.com/sparksummit/eu17/]|[register|https://spark.apache.org/news/]| |500 Internal Server Error|[https://prevalentdesignevents.com/sparksummit/ss17/?_ga=1.211902866.780052874.1433437196]|[register|https://spark.apache.org/news/]| |500 Internal Server Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/registration.aspx?source=header]|[register|https://spark.apache.org/news/]| |500 Internal Server Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/speaker/]|[Spark Summit Europe|https://spark.apache.org/news/]| |-1 Timeout|[http://strataconf.com/strata2013]|[Strata|https://spark.apache.org/news/]| |-1 Not found: The server name or address could not be resolved|[http://blog.quantifind.com/posts/spark-unit-test/]|[Unit testing with Spark|https://spark.apache.org/news/]| |-1 Not found: The server 
name or address could not be resolved|[http://blog.quantifind.com/posts/logging-post/]|[Configuring Spark's logs|https://spark.apache.org/news/]| |-1 Timeout|[http://strata.oreilly.com/2012/08/seven-reasons-why-i-like-spark.html]|[Spark|https://spark.apache.org/news/]| |-1 Timeout|[http://strata.oreilly.com/2012/11/shark-real-time-queries-and-analytics-for-big-data.html]|[Shark|https://spark.apache.org/news/]| |-1 Timeout|[http://strata.oreilly.com/2012/10/spark-0-6-improves-performance-and-accessibility.html]|[Spark 0.6 release|https://spark.apache.org/news/]| |404 Not Found|[http://data-informed.com/spark-an-open-source-engine-for-iterative-data-mining/]|[DataInformed|https://spark.apache.org/news/]| |-1 Timeout|[http://strataconf.com/strata2013/public/schedule/detail/27438]|[introduction to Spark,
[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599931#comment-17599931 ] Sachit commented on SPARK-40316: Yes, but this was working in Spark 2.4. Were there any changes? After moving to Spark 3 it fails on the same dataset. > Upgrading to Spark 3 is giving NullPointerException > --- > > Key: SPARK-40316 > URL: https://issues.apache.org/jira/browse/SPARK-40316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Sachit >Priority: Major > > Getting below error while upgrading to Spark3 > > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException: Null value appeared in non-nullable field: > - array element class: "scala.Long" > - root class: "scala.collection.Seq" > If the schema is inferred from a Scala tuple/case class, or a Java bean, > please try to use scala.Option[_] or other nullable types (e.g. > java.lang.Integer instead of int/scala.Int). 
[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599930#comment-17599930 ] Sean R. Owen commented on SPARK-40316: -- This says your UDF returns a Seq containing null, but the signature says it's going to be a Seq of primitive longs which can't be null > Upgrading to Spark 3 is giving NullPointerException > --- > > Key: SPARK-40316 > URL: https://issues.apache.org/jira/browse/SPARK-40316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: Sachit >Priority: Major > > Getting below error while upgrading to Spark3 > > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException: Null value appeared in non-nullable field: > - array element class: "scala.Long" > - root class: "scala.collection.Seq" > If the schema is inferred from a Scala tuple/case class, or a Java bean, > please try to use scala.Option[_] or other nullable types (e.g. > java.lang.Integer instead of int/scala.Int). 
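The diagnosis in the comments above can be sketched in plain Scala, outside Spark: `scala.Long` is a primitive and cannot represent null, so a null element coming back from a UDF has to be modeled explicitly, for example with `Option`, as the error message itself suggests. A minimal illustration with hypothetical data (not the reporter's actual UDF):

```scala
// A sketch of the problem behind SPARK-40316: a boxed sequence that
// contains null, as a UDF effectively returning Seq[Long] might produce.
// (Hypothetical data, not the reporter's actual UDF.)
val raw: Seq[java.lang.Long] = Seq(1L, null, 3L)

// Spark's deserializer for Seq[scala.Long] must unbox every element;
// the null element is what triggers the NullPointerException in the report.
// Declaring the element type as Option[Long] makes the null representable:
val safe: Seq[Option[Long]] = raw.map(v => Option(v).map(_.longValue))

println(safe)  // List(Some(1), None, Some(3))
```

Spark 2.4 may have silently turned such nulls into 0s, but the null in the UDF output is the underlying issue either way.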
[jira] [Commented] (SPARK-40321) Upgrade rocksdbjni to 7.5.3
[ https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599919#comment-17599919 ] Apache Spark commented on SPARK-40321: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/37783 > Upgrade rocksdbjni to 7.5.3 > --- > > Key: SPARK-40321 > URL: https://issues.apache.org/jira/browse/SPARK-40321 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/facebook/rocksdb/releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40321) Upgrade rocksdbjni to 7.5.3
[ https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40321: Assignee: (was: Apache Spark) > Upgrade rocksdbjni to 7.5.3 > --- > > Key: SPARK-40321 > URL: https://issues.apache.org/jira/browse/SPARK-40321 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/facebook/rocksdb/releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40321) Upgrade rocksdbjni to 7.5.3
[ https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599918#comment-17599918 ] Apache Spark commented on SPARK-40321: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/37783 > Upgrade rocksdbjni to 7.5.3 > --- > > Key: SPARK-40321 > URL: https://issues.apache.org/jira/browse/SPARK-40321 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/facebook/rocksdb/releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40321) Upgrade rocksdbjni to 7.5.3
[ https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40321: Assignee: Apache Spark > Upgrade rocksdbjni to 7.5.3 > --- > > Key: SPARK-40321 > URL: https://issues.apache.org/jira/browse/SPARK-40321 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/facebook/rocksdb/releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40321) Upgrade rocksdbjni to 7.5.3
Yang Jie created SPARK-40321: Summary: Upgrade rocksdbjni to 7.5.3 Key: SPARK-40321 URL: https://issues.apache.org/jira/browse/SPARK-40321 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie https://github.com/facebook/rocksdb/releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599912#comment-17599912 ] Apache Spark commented on SPARK-40288: -- User 'hgs19921112' has created a pull request for this issue: https://github.com/apache/spark/pull/37782 > After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be > applied to avoid missing attributes when using complex expressions. > -- > > Key: SPARK-40288 > URL: https://issues.apache.org/jira/browse/SPARK-40288 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 > Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0 >Reporter: hgs >Priority: Minor > > {{--table}} > {{create table miss_expr(id int, name string, age double) stored as textfile}} > {{--data}} > {{insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)}} > {{--failure sql}} > {{insert overwrite table miss_expr}} > {{select id, name, nage as n from (}} > {{select id, name, if(age>3,100,200) as nage from miss_expr group by id, name, age}} > {{) group by id, name, nage}} > --error stack > {{Caused by: java.lang.IllegalStateException: Couldn't find age#4 in > [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]}} > {{at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)}} > {{at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)}} > {{at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.
[ https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599911#comment-17599911 ] Apache Spark commented on SPARK-40288: -- User 'hgs19921112' has created a pull request for this issue: https://github.com/apache/spark/pull/37782 > After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be > applied to avoid missing attributes when using complex expressions. > -- > > Key: SPARK-40288 > URL: https://issues.apache.org/jira/browse/SPARK-40288 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 > Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0 >Reporter: hgs >Priority: Minor > > {{--table}} > {{create table miss_expr(id int, name string, age double) stored as textfile}} > {{--data}} > {{insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)}} > {{--failure sql}} > {{insert overwrite table miss_expr}} > {{select id, name, nage as n from (}} > {{select id, name, if(age>3,100,200) as nage from miss_expr group by id, name, age}} > {{) group by id, name, nage}} > --error stack > {{Caused by: java.lang.IllegalStateException: Couldn't find age#4 in > [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]}} > {{at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)}} > {{at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)}} > {{at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33861) Simplify conditional in predicate
[ https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599868#comment-17599868 ] Yuming Wang edited comment on SPARK-33861 at 9/3/22 8:54 AM: - Note that only 3.2.0, 3.2.1, 3.2.2 and 3.3.0 include this optimization. We reverted it via [https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec] was (Author: q79969786): Note that only 3.2.0, 3.2.1 and 3.3.0 include this optimization. We recovered it via https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec > Simplify conditional in predicate > - > > Key: SPARK-33861 > URL: https://issues.apache.org/jira/browse/SPARK-33861 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > The use case is: > {noformat} > spark.sql("create table t1 using parquet as select id as a, id as b from > range(10)") > spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > > 5").explain() > {noformat} > Before this pr: > {noformat} > == Physical Plan == > *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END > +- *(1) ColumnarToRow >+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: > [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., > PartitionFilters: [], PushedFilters: [], ReadSchema: > struct > {noformat} > After this pr: > {noformat} > == Physical Plan == > *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND > ((b#4L + 10) > 5)) > +- *(1) ColumnarToRow >+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: > [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: > Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., > PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), > GreaterThan(a,2)], ReadSchema: struct > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
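The before/after plans quoted above can be sanity-checked on plain Scala collections: for non-null a and b, `CASE WHEN a > 2 THEN b + 10 END > 5` evaluates to NULL whenever a <= 2, and a NULL predicate filters the row out, so it keeps exactly the rows kept by `(a > 2) AND (b + 10 > 5)`. A small sketch modeling the ten-row table from the example (an illustration of the rewrite's semantics, not Spark's actual Catalyst rule):

```scala
// Models `select id as a, id as b from range(10)` as tuples.
val rows = (0L until 10L).map(id => (id, id))

// Original filter: CASE WHEN a > 2 THEN b + 10 END > 5.
// When a <= 2 the CASE yields NULL, and NULL > 5 drops the row,
// which on non-null data behaves like returning false:
val viaCaseWhen = rows.filter { case (a, b) => if (a > 2) b + 10 > 5 else false }

// Simplified filter from SPARK-33861: (a > 2) AND (b + 10 > 5),
// whose conjuncts can be pushed down as the "After" plan shows:
val viaConjunction = rows.filter { case (a, b) => a > 2 && b + 10 > 5 }

assert(viaCaseWhen == viaConjunction)
println(viaCaseWhen.map(_._1))  // Vector(3, 4, 5, 6, 7, 8, 9)
```

The benefit visible in the plans is exactly this: the conjunctive form yields `PushedFilters: [IsNotNull(a), IsNotNull(b), GreaterThan(a,2)]`, while the CASE WHEN form pushes nothing.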
[jira] [Resolved] (SPARK-33861) Simplify conditional in predicate
[ https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-33861.
---------------------------------
    Resolution: Won't Fix

Note that only 3.2.0, 3.2.1 and 3.3.0 include this optimization. We reverted it via https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec

> Simplify conditional in predicate
> ---------------------------------
>
>                 Key: SPARK-33861
>                 URL: https://issues.apache.org/jira/browse/SPARK-33861
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>    +- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>    +- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), GreaterThan(a,2)], ReadSchema: struct
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
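The rewrite this ticket tracks can be illustrated outside of Spark. Below is a minimal Python sketch of the core idea, not Spark's actual Catalyst rule (the class and function names here are made up for illustration): inside a filter, `CASE WHEN c THEN e END > v` with no ELSE branch can become `c AND (e > v)`, because the implicit ELSE is NULL and NULL never satisfies a filter predicate.

```python
# Minimal sketch (NOT Spark's Catalyst code) of the idea behind simplifying a
# conditional in a filter predicate: since a CASE with no ELSE yields NULL,
# and NULL never passes a filter, the WHEN condition must hold.
from dataclasses import dataclass


@dataclass
class CaseWhenNoElse:
    """CASE WHEN cond THEN value END (no ELSE, so the other branch is NULL)."""
    cond: str
    value: str


@dataclass
class Gt:
    """left > right comparison node."""
    left: object
    right: object


def simplify_filter(pred):
    """Rewrite Gt(CaseWhenNoElse(c, e), v) into the string '(c) AND ((e) > v)'."""
    if isinstance(pred, Gt) and isinstance(pred.left, CaseWhenNoElse):
        case = pred.left
        return f"({case.cond}) AND (({case.value}) > {pred.right})"
    return pred  # anything else is left unchanged


print(simplify_filter(Gt(CaseWhenNoElse("a > 2", "b + 10"), 5)))
# -> (a > 2) AND ((b + 10) > 5)
```

This is exactly the shape of the plan change quoted above: the CASE expression in the Filter becomes a conjunction, which can then be partially pushed down (PushedFilters gains `GreaterThan(a,2)`).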
[jira] [Reopened] (SPARK-33861) Simplify conditional in predicate
[ https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reopened SPARK-33861:
---------------------------------
    Assignee: (was: Yuming Wang)
[jira] [Updated] (SPARK-33861) Simplify conditional in predicate
[ https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33861:
--------------------------------
    Fix Version/s:     (was: 3.2.0)
[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599862#comment-17599862 ]

Sachit commented on SPARK-40316:
--------------------------------

Hello [~srowen], please let me know if there is any suggestion. Thanks!

> Upgrading to Spark 3 is giving NullPointerException
> ---------------------------------------------------
>
>                 Key: SPARK-40316
>                 URL: https://issues.apache.org/jira/browse/SPARK-40316
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.2
>            Reporter: Sachit
>            Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>
> java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, array, true], Some(interface scala.collection.Seq))
>     at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
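The exception above is raised while decoding a NULL into Scala's non-nullable `Long`; the error message itself suggests the fix, i.e. using `Option[Long]` (or a boxed `java.lang.Long`) as the element type. A rough Python analogy of the two decoding behaviours, with hypothetical helper names that are not part of Spark:

```python
# Hypothetical sketch (not Spark code) of why decoding fails: a non-nullable
# element type rejects NULLs at decode time, while an Option-like (nullable)
# element type lets them through.
from typing import List, Optional


def decode_non_nullable(values: List[Optional[int]]) -> List[int]:
    """Mirrors the assertnotnull(...) step in the quoted plan."""
    out = []
    for v in values:
        if v is None:
            raise RuntimeError("Null value appeared in non-nullable field")
        out.append(v)
    return out


def decode_nullable(values: List[Optional[int]]) -> List[Optional[int]]:
    """Option[Long]-style decoding: None simply passes through."""
    return list(values)


try:
    decode_non_nullable([1, None, 3])   # fails, like Seq[Long] with a NULL element
except RuntimeError as e:
    print(e)
print(decode_nullable([1, None, 3]))    # succeeds, like Seq[Option[Long]]
```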
[jira] [Resolved] (SPARK-40033) Nested schema pruning support through element_at
[ https://issues.apache.org/jira/browse/SPARK-40033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh resolved SPARK-40033.
---------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37463
[https://github.com/apache/spark/pull/37463]

> Nested schema pruning support through element_at
> ------------------------------------------------
>
>                 Key: SPARK-40033
>                 URL: https://issues.apache.org/jira/browse/SPARK-40033
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Priority: Major
>             Fix For: 3.4.0
>
> The semantics of element_at are similar to those of GetArrayItem and GetMapValue, so we can support nested schema pruning when the inner data type is a struct.
> For example:
> For a column schema: `c: array>`
> With the query: `SELECT element_at(c, 1).s1`
> The final pruned schema should be `c: array>`
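The pruning described in the ticket can be sketched with plain dictionaries. The schema in the quoted text is truncated in this archive, so the sketch below assumes a hypothetical `array<struct<s1, s2>>` schema, and `prune` is an illustrative function, not Spark's schema-pruning API:

```python
# Illustrative sketch (hypothetical schema and helper, not Spark internals):
# prune an array-of-struct schema down to the single field reached through
# element_at(c, 1).s1, so the scan only has to read that field.
def prune(schema, accessed_field):
    """Keep only `accessed_field` inside an array<struct<...>> schema."""
    assert schema["type"] == "array" and schema["element"]["type"] == "struct"
    fields = schema["element"]["fields"]
    return {
        "type": "array",
        "element": {
            "type": "struct",
            "fields": {accessed_field: fields[accessed_field]},
        },
    }


full = {
    "type": "array",
    "element": {"type": "struct", "fields": {"s1": "long", "s2": "string"}},
}
pruned = prune(full, "s1")  # models SELECT element_at(c, 1).s1
print(pruned)
```

Because `element_at` only selects one element, the element struct can be narrowed the same way GetArrayItem and GetMapValue already allow.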
[jira] [Assigned] (SPARK-40033) Nested schema pruning support through element_at
[ https://issues.apache.org/jira/browse/SPARK-40033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh reassigned SPARK-40033:
-----------------------------------
    Assignee: XiDuo You