[jira] [Resolved] (SPARK-40323) Update ORC to 1.8.0

2022-09-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40323.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37787
[https://github.com/apache/spark/pull/37787]

> Update ORC to 1.8.0
> ---
>
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40323) Update ORC to 1.8.0

2022-09-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40323:
-

Assignee: William Hyun

> Update ORC to 1.8.0
> ---
>
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40323) Update ORC to 1.8.0

2022-09-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40323:


Assignee: (was: Apache Spark)

> Update ORC to 1.8.0
> ---
>
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-40323) Update ORC to 1.8.0

2022-09-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40323:


Assignee: Apache Spark

> Update ORC to 1.8.0
> ---
>
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-40323) Update ORC to 1.8.0

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1763#comment-1763
 ] 

Apache Spark commented on SPARK-40323:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37787

> Update ORC to 1.8.0
> ---
>
> Key: SPARK-40323
> URL: https://issues.apache.org/jira/browse/SPARK-40323
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
>







[jira] [Created] (SPARK-40323) Update ORC to 1.8.0

2022-09-03 Thread William Hyun (Jira)
William Hyun created SPARK-40323:


 Summary: Update ORC to 1.8.0
 Key: SPARK-40323
 URL: https://issues.apache.org/jira/browse/SPARK-40323
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.4.0
Reporter: William Hyun









[jira] [Assigned] (SPARK-40308) str_to_map should accept non-foldable delimiter arguments

2022-09-03 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40308:


Assignee: Bruce Robbins

> str_to_map should accept non-foldable delimiter arguments
> -
>
> Key: SPARK-40308
> URL: https://issues.apache.org/jira/browse/SPARK-40308
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
>
> Currently, str_to_map requires the delimiter arguments to be foldable 
> expressions. For example, the following doesn't work in Spark SQL:
> {noformat}
> drop table if exists maptbl;
> create table maptbl as select ',' as del1, ':' as del2, 'a:1,b:2,c:3' as str;
> insert into table maptbl select '%' as del1, '-' as del2, 'a-1%b-2%c-3' as 
> str;
> select str, str_to_map(str, del1, del2) from maptbl;
> {noformat}
> You get the following error:
> {noformat}
> str_to_map's delimiters must be foldable.; line 1 pos 12;
> {noformat}
> However, the above example SQL statements do work in Hive 2.3.9. There, you 
> get:
> {noformat}
> +--------------+----------------------------+
> | str          | _c1                        |
> +--------------+----------------------------+
> | a:1,b:2,c:3  | {"a":"1","b":"2","c":"3"}  |
> | a-1%b-2%c-3  | {"a":"1","b":"2","c":"3"}  |
> +--------------+----------------------------+
> 2 rows selected (0.13 seconds)
> {noformat}
> It's unlikely that an input table would have the needed delimiters in 
> columns. The use-case is more likely to be something like this, where the 
> delimiters are determined based on some other value:
> {noformat}
> select
>   str,
>   str_to_map(str, ',', if(region = 0, ':', '#')) as m
> from
>   maptbl2;
> {noformat}
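
A minimal spark-shell sketch of the use-case above once non-foldable
delimiters are accepted (Spark 3.4.0, the fix version). The table maptbl2 and
its columns follow the description; the sample rows here are hypothetical.
{noformat}
// Hypothetical stand-in data for maptbl2.
Seq(("a:1,b:2", 0), ("a#1,b#2", 1)).toDF("str", "region")
  .createOrReplaceTempView("maptbl2")

// The pair delimiter is a literal ',', but the key/value delimiter is a
// per-row (non-foldable) expression derived from the region column.
spark.sql(
  """select str, str_to_map(str, ',', if(region = 0, ':', '#')) as m
    |from maptbl2""".stripMargin).show(truncate = false)
{noformat}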






[jira] [Resolved] (SPARK-40308) str_to_map should accept non-foldable delimiter arguments

2022-09-03 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40308.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37763
[https://github.com/apache/spark/pull/37763]

> str_to_map should accept non-foldable delimiter arguments
> -
>
> Key: SPARK-40308
> URL: https://issues.apache.org/jira/browse/SPARK-40308
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently, str_to_map requires the delimiter arguments to be foldable 
> expressions. For example, the following doesn't work in Spark SQL:
> {noformat}
> drop table if exists maptbl;
> create table maptbl as select ',' as del1, ':' as del2, 'a:1,b:2,c:3' as str;
> insert into table maptbl select '%' as del1, '-' as del2, 'a-1%b-2%c-3' as 
> str;
> select str, str_to_map(str, del1, del2) from maptbl;
> {noformat}
> You get the following error:
> {noformat}
> str_to_map's delimiters must be foldable.; line 1 pos 12;
> {noformat}
> However, the above example SQL statements do work in Hive 2.3.9. There, you 
> get:
> {noformat}
> +--------------+----------------------------+
> | str          | _c1                        |
> +--------------+----------------------------+
> | a:1,b:2,c:3  | {"a":"1","b":"2","c":"3"}  |
> | a-1%b-2%c-3  | {"a":"1","b":"2","c":"3"}  |
> +--------------+----------------------------+
> 2 rows selected (0.13 seconds)
> {noformat}
> It's unlikely that an input table would have the needed delimiters in 
> columns. The use-case is more likely to be something like this, where the 
> delimiters are determined based on some other value:
> {noformat}
> select
>   str,
>   str_to_map(str, ',', if(region = 0, ':', '#')) as m
> from
>   maptbl2;
> {noformat}






[jira] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sachit (Jira)


[ https://issues.apache.org/jira/browse/SPARK-40316 ]


Sachit deleted comment on SPARK-40316:


was (Author: JIRAUSER287754):
Hi [~srowen],

Yes, I have tried that as well. Please see what the UDF gives when we put it 
in spark-shell.

The earlier version gives false for nullability:

SparkUserDefinedFunction($Lambda$5089/955282812@57e5546c,ArrayType(LongType,false),List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])),Some(class[value[0]: array]),None,true,true)

I have handled nullability so that it can return null:

SparkUserDefinedFunction($Lambda$5088/1085757601@7fd002e3,ArrayType(LongType,true),List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])),Some(class[value[0]: array]),None,true,true)

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599984#comment-17599984
 ] 

Apache Spark commented on SPARK-40142:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37786

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599983#comment-17599983
 ] 

Apache Spark commented on SPARK-40142:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37786

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.

2022-09-03 Thread hgs (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hgs updated SPARK-40288:

Description: 
--table
create table miss_expr(id int, name string, age double) stored as textfile
--data
insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
--failure sql

select id, name, nage as n from (
select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
) group by id, name, nage

--error stack
Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)

  was:
--table
create table miss_expr(id int, name string, age double) stored as textfile
--data
insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
--failure sql
insert overwrite table miss_expr
select id, name, nage as n from (
select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
) group by id, name, nage

--error stack
Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)


> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when using complex expressions.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> select id, name, nage as n from (
> select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> --error stack
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
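
A spark-shell sketch of one possible mitigation until this is fixed, assuming
the failure is indeed triggered by the RemoveRedundantAggregates optimizer
rule as the title suggests; excluding the rule via
spark.sql.optimizer.excludedRules is an escape hatch, not a fix.
{noformat}
// Exclude the optimizer rule suspected of dropping the complex grouping
// expression, then re-run the failing query.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAggregates")

spark.sql("""
  select id, name, nage as n from (
    select id, name, if(age > 3, 100, 200) as nage
    from miss_expr group by id, name, age
  ) group by id, name, nage
""").show()
{noformat}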






[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599975#comment-17599975
 ] 

Apache Spark commented on SPARK-40288:
--

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37785

> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when using complex expressions.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> insert overwrite table miss_expr
> select id, name, nage as n from (
> select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> --error stack
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)






[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599974#comment-17599974
 ] 

Apache Spark commented on SPARK-40288:
--

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37785

> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when using complex expressions.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> insert overwrite table miss_expr
> select id, name, nage as n from (
> select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> --error stack
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)






[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599972#comment-17599972
 ] 

Apache Spark commented on SPARK-40288:
--

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37784

> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when using complex expressions.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> insert overwrite table miss_expr
> select id, name, nage as n from (
> select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> --error stack
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)






[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when using complex expressions.

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599971#comment-17599971
 ] 

Apache Spark commented on SPARK-40288:
--

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37784

> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when using complex expressions.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> insert overwrite table miss_expr
> select id, name, nage as n from (
> select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> --error stack
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)






[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599961#comment-17599961
 ] 

Sachit commented on SPARK-40316:


Hi [~srowen],

Yes, I have tried that as well. Please see what the UDF gives when we put it 
in spark-shell.

The earlier version gives false for nullability:

SparkUserDefinedFunction($Lambda$5089/955282812@57e5546c,ArrayType(LongType,false),List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])),Some(class[value[0]: array]),None,true,true)

I have handled nullability so that it can return null:

SparkUserDefinedFunction($Lambda$5088/1085757601@7fd002e3,ArrayType(LongType,true),List(Some(class[value[0]: array]), Some(class[value[0]: array]), Some(class[value[0]: int]), Some(class[value[0]: int])),Some(class[value[0]: array]),None,true,true)

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599960#comment-17599960
 ] 

Sean R. Owen commented on SPARK-40316:
--

It's possible this was handled differently in earlier Spark/Scala (resulting 
in 0s?), but it still points to an error in your UDF. Why not pursue that?

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] [Resolved] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40316.
--
Resolution: Not A Problem

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] [Comment Edited] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599931#comment-17599931
 ] 

Sachit edited comment on SPARK-40316 at 9/3/22 4:01 PM:


Hi [~srowen],

Yes, but this was working in Spark 2.4. Were there any changes, since after 
moving to Spark 3 it fails (on the same dataset)?


was (Author: JIRAUSER287754):
Yes, but this was working in Spark 2.4. Were there any changes, since after 
moving to Spark 3 it fails (on the same dataset)?

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] (SPARK-39996) Upgrade postgresql to 42.5.0

2022-09-03 Thread Jira


[ https://issues.apache.org/jira/browse/SPARK-39996 ]


Bjørn Jørgensen deleted comment on SPARK-39996:
-

was (Author: bjornjorgensen):
[GA tests failed|https://github.com/bjornjorgensen/spark/runs/7705423158?check_suite_focus=true]

> Upgrade postgresql to 42.5.0
> 
>
> Key: SPARK-39996
> URL: https://issues.apache.org/jira/browse/SPARK-39996
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Security
> - fix: 
> [CVE-2022-31197|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-31197]
>  Fixes SQL generated in PgResultSet.refresh() to escape column identifiers so 
> as to prevent SQL injection.
>   - Previously, the column names for both key and data columns in the table 
> were copied as-is into the generated
>   SQL. This allowed a malicious table with column names that include a 
> statement terminator to be parsed and
>   executed as multiple separate commands.
>   - Also adds a new test class ResultSetRefreshTest to verify this change.
>   - Reported by [Sho Kato](https://github.com/kato-sho)
> [Release 
> note|https://github.com/pgjdbc/pgjdbc/commit/bd91c4cc76cdfc1ffd0322be80c85ddfe08a38c2]
>  
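
A toy Scala illustration of the class of bug described above, not pgjdbc's
actual code: interpolating an identifier verbatim lets a crafted column name
smuggle in a second statement, while quoting it (and doubling embedded
quotes) keeps it a single identifier.
{noformat}
// Hypothetical helper: quote a SQL identifier instead of pasting it raw.
def quoteIdent(name: String): String =
  "\"" + name.replace("\"", "\"\"") + "\""

val evil   = "id; DROP TABLE users; --"
val unsafe = s"SELECT $evil FROM t"               // parses as two statements
val safe   = s"SELECT ${quoteIdent(evil)} FROM t" // one quoted identifier
{noformat}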






[jira] [Created] (SPARK-40322) Fix all dead links

2022-09-03 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-40322:
---

 Summary: Fix all dead links
 Key: SPARK-40322
 URL: https://issues.apache.org/jira/browse/SPARK-40322
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Yuming Wang


 

https://www.deadlinkchecker.com/website-dead-link-checker.asp

||Status||URL||Source link text||
|-1 Not found: The server name or address could not be resolved|[http://engineering.ooyala.com/blog/using-parquet-and-scrooge-spark]|[Using Parquet and Scrooge with Spark|https://spark.apache.org/documentation.html]|
|-1 Not found: The server name or address could not be resolved|[http://blinkdb.org/]|[BlinkDB|https://spark.apache.org/third-party-projects.html]|
|404 Not Found|[https://github.com/AyasdiOpenSource/df]|[DF|https://spark.apache.org/third-party-projects.html]|
|-1 Timeout|[https://atp.io/]|[atp|https://spark.apache.org/powered-by.html]|
|-1 Not found: The server name or address could not be resolved|[http://www.sehir.edu.tr/en/]|[Istanbul Sehir University|https://spark.apache.org/powered-by.html]|
|404 Not Found|[http://nsn.com/]|[Nokia Solutions and Networks|https://spark.apache.org/powered-by.html]|
|-1 Not found: The server name or address could not be resolved|[http://www.nubetech.co/]|[Nube Technologies|https://spark.apache.org/powered-by.html]|
|-1 Timeout|[http://ooyala.com/]|[Ooyala, Inc.|https://spark.apache.org/powered-by.html]|
|-1 Not found: The server name or address could not be resolved|[http://engineering.ooyala.com/blog/fast-spark-queries-memory-datasets]|[Spark for Fast Queries|https://spark.apache.org/powered-by.html]|
|-1 Not found: The server name or address could not be resolved|[http://www.sisa.samsung.com/]|[Samsung Research America|https://spark.apache.org/powered-by.html]|
|-1 Timeout|[https://checker.apache.org/projs/spark.html]|[https://checker.apache.org/projs/spark.html|https://spark.apache.org/release-process.html]|
|404 Not Found|[https://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|[AMP Camp 2 [302 from http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|https://spark.apache.org/documentation.html]|
|404 Not Found|[https://ampcamp.berkeley.edu/agenda-2012/]|[AMP Camp 1 [302 from http://ampcamp.berkeley.edu/agenda-2012/]|https://spark.apache.org/documentation.html]|
|404 Not Found|[https://ampcamp.berkeley.edu/4/]|[AMP Camp 4 [302 from http://ampcamp.berkeley.edu/4/]|https://spark.apache.org/documentation.html]|
|404 Not Found|[https://ampcamp.berkeley.edu/3/]|[AMP Camp 3 [302 from http://ampcamp.berkeley.edu/3/]|https://spark.apache.org/documentation.html]|
|500 Internal Server Error|[https://www.packtpub.com/product/spark-cookbook/9781783987061]|[Spark Cookbook [301 from https://www.packtpub.com/big-data-and-business-intelligence/spark-cookbook]|https://spark.apache.org/documentation.html]|
|500 Internal Server Error|[https://www.packtpub.com/product/apache-spark-graph-processing/9781784391805]|[Apache Spark Graph Processing [301 from https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing]|https://spark.apache.org/documentation.html]|
|500 Internal Server Error|[https://prevalentdesignevents.com/sparksummit/eu17/]|[register|https://spark.apache.org/news/]|
|500 Internal Server Error|[https://prevalentdesignevents.com/sparksummit/ss17/?_ga=1.211902866.780052874.1433437196]|[register|https://spark.apache.org/news/]|
|500 Internal Server Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/registration.aspx?source=header]|[register|https://spark.apache.org/news/]|
|500 Internal Server Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/speaker/]|[Spark Summit Europe|https://spark.apache.org/news/]|
|-1 Timeout|[http://strataconf.com/strata2013]|[Strata|https://spark.apache.org/news/]|
|-1 Not found: The server name or address could not be resolved|[http://blog.quantifind.com/posts/spark-unit-test/]|[Unit testing with Spark|https://spark.apache.org/news/]|
|-1 Not found: The server name or address could not be resolved|[http://blog.quantifind.com/posts/logging-post/]|[Configuring Spark's logs|https://spark.apache.org/news/]|
|-1 Timeout|[http://strata.oreilly.com/2012/08/seven-reasons-why-i-like-spark.html]|[Spark|https://spark.apache.org/news/]|
|-1 Timeout|[http://strata.oreilly.com/2012/11/shark-real-time-queries-and-analytics-for-big-data.html]|[Shark|https://spark.apache.org/news/]|
|-1 Timeout|[http://strata.oreilly.com/2012/10/spark-0-6-improves-performance-and-accessibility.html]|[Spark 0.6 release|https://spark.apache.org/news/]|
|404 Not Found|[http://data-informed.com/spark-an-open-source-engine-for-iterative-data-mining/]|[DataInformed|https://spark.apache.org/news/]|
|-1 Timeout|[http://strataconf.com/strata2013/public/schedule/detail/27438]|[introduction to Spark, 
[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599931#comment-17599931
 ] 

Sachit commented on SPARK-40316:


Yes, but this was working in Spark 2.4. Were there any changes, since after 
moving to Spark 3 it fails (on the same dataset)?

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599930#comment-17599930
 ] 

Sean R. Owen commented on SPARK-40316:
--

This says your UDF returns a Seq containing null, but the signature says it's 
going to be a Seq of primitive longs, which can't be null.
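
A minimal sketch of the point made here, with a hypothetical UDF over an
array<bigint> column: declaring the elements as scala.Long asserts they are
non-nullable, while a boxed (or Option) element type tolerates nulls.
{noformat}
import org.apache.spark.sql.functions.udf

// Fails while decoding the input if the array contains a null element,
// because Seq[Long] declares non-nullable elements.
val strictUdf = udf((xs: Seq[Long]) => xs.sum)

// Nullable element type: java.lang.Long accepts nulls, so the UDF can
// decide how to handle them (here: drop them before summing).
val lenientUdf = udf((xs: Seq[java.lang.Long]) =>
  xs.filter(_ != null).map(_.longValue).sum)
{noformat}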

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the below error while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)






[jira] [Commented] (SPARK-40321) Upgrade rocksdbjni to 7.5.3

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599919#comment-17599919
 ] 

Apache Spark commented on SPARK-40321:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37783

> Upgrade rocksdbjni to 7.5.3
> ---
>
> Key: SPARK-40321
> URL: https://issues.apache.org/jira/browse/SPARK-40321
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/facebook/rocksdb/releases






[jira] [Assigned] (SPARK-40321) Upgrade rocksdbjni to 7.5.3

2022-09-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40321:


Assignee: (was: Apache Spark)

> Upgrade rocksdbjni to 7.5.3
> ---
>
> Key: SPARK-40321
> URL: https://issues.apache.org/jira/browse/SPARK-40321
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/facebook/rocksdb/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40321) Upgrade rocksdbjni to 7.5.3

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599918#comment-17599918
 ] 

Apache Spark commented on SPARK-40321:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37783

> Upgrade rocksdbjni to 7.5.3
> ---
>
> Key: SPARK-40321
> URL: https://issues.apache.org/jira/browse/SPARK-40321
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/facebook/rocksdb/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40321) Upgrade rocksdbjni to 7.5.3

2022-09-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40321:


Assignee: Apache Spark

> Upgrade rocksdbjni to 7.5.3
> ---
>
> Key: SPARK-40321
> URL: https://issues.apache.org/jira/browse/SPARK-40321
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/facebook/rocksdb/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40321) Upgrade rocksdbjni to 7.5.3

2022-09-03 Thread Yang Jie (Jira)
Yang Jie created SPARK-40321:


 Summary: Upgrade rocksdbjni to 7.5.3
 Key: SPARK-40321
 URL: https://issues.apache.org/jira/browse/SPARK-40321
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/facebook/rocksdb/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when complex expressions are used.

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599912#comment-17599912
 ] 

Apache Spark commented on SPARK-40288:
--

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37782

> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when complex expressions are used.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> {noformat}
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values 
> (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> insert overwrite table miss_expr
> select id, name, nage as n from (
>   select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> {noformat}
> --error stack
> {noformat}
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in 
> [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
> {noformat}
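
For anyone hitting this before a fix lands, a minimal sketch of a possible 
workaround: since the title points at `RemoveRedundantAggregates`, excluding 
that rule from the optimizer may avoid the missing attribute. This is an 
untested assumption on an affected 3.2.x/3.3.0 build; `miss_expr` is the table 
from the reproduction above.

{noformat}
// a minimal sketch, assuming spark-shell on an affected version (3.2.x / 3.3.0)
// and that RemoveRedundantAggregates is indeed the rule at fault (untested assumption)
spark.conf.set("spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAggregates")

// re-run the failing statement from the reproduction above
spark.sql("""
  INSERT OVERWRITE TABLE miss_expr
  SELECT id, name, nage AS n FROM (
    SELECT id, name, if(age > 3, 100, 200) AS nage
    FROM miss_expr GROUP BY id, name, age
  ) GROUP BY id, name, nage
""")
{noformat}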



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40288) After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be applied to avoid missing attributes when complex expressions are used.

2022-09-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599911#comment-17599911
 ] 

Apache Spark commented on SPARK-40288:
--

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/37782

> After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should be 
> applied to avoid missing attributes when complex expressions are used.
> --
>
> Key: SPARK-40288
> URL: https://issues.apache.org/jira/browse/SPARK-40288
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
> Environment: spark 3.2.0 spark 3.2.2 spark 3.3.0
>Reporter: hgs
>Priority: Minor
>
> {noformat}
> --table
> create table miss_expr(id int, name string, age double) stored as textfile
> --data
> insert overwrite table miss_expr values 
> (1,'ox',1.0),(1,'oox',2.0),(2,'ox',3.0),(2,'xxo',4.0)
> --failure sql
> insert overwrite table miss_expr
> select id, name, nage as n from (
>   select id, name, if(age > 3, 100, 200) as nage from miss_expr group by id, name, age
> ) group by id, name, nage
> {noformat}
> --error stack
> {noformat}
> Caused by: java.lang.IllegalStateException: Couldn't find age#4 in 
> [id#2,name#3,if ((age#4 > 3.0)) 100 else 200#12]
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
> at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33861) Simplify conditional in predicate

2022-09-03 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599868#comment-17599868
 ] 

Yuming Wang edited comment on SPARK-33861 at 9/3/22 8:54 AM:
-

Note that only 3.2.0, 3.2.1, 3.2.2 and 3.3.0 include this optimization. We 
reverted it via 
[https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec]


was (Author: q79969786):
Note that only 3.2.0, 3.2.1 and 3.3.0 include this optimization. We reverted 
it via 
https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<a:bigint,b:bigint>
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}
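
The rewrite is sound because a filter drops rows whose predicate evaluates to 
NULL as well as false: `CASE WHEN c THEN p END` yields NULL whenever `c` is 
false, so as a filter it is equivalent to `c AND p`. A minimal sketch checking 
the equivalence in spark-shell (the value names are mine, not from the issue):

{noformat}
// a minimal sketch, assuming spark-shell; under filter semantics a NULL
// predicate drops the row just like false, so the two forms keep the same rows
val t = spark.range(10).selectExpr("id AS a", "id AS b")
val caseForm   = t.where("CASE WHEN a > 2 THEN b + 10 END > 5")
val simpleForm = t.where("a > 2 AND b + 10 > 5")
assert(caseForm.collect().sameElements(simpleForm.collect()))
simpleForm.explain()  // flattened conjuncts are eligible for source pushdown
{noformat}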



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33861) Simplify conditional in predicate

2022-09-03 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-33861.
-
Resolution: Won't Fix

Note that only 3.2.0, 3.2.1 and 3.3.0 include this optimization. We reverted 
it via 
https://github.com/apache/spark/commit/43cbdc6ec9dbcf9ebe0b48e14852cec4af18b4ec

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<a:bigint,b:bigint>
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-33861) Simplify conditional in predicate

2022-09-03 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reopened SPARK-33861:
-
  Assignee: (was: Yuming Wang)

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<a:bigint,b:bigint>
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33861) Simplify conditional in predicate

2022-09-03 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33861:

Fix Version/s: (was: 3.2.0)

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<a:bigint,b:bigint>
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40316) Upgrading to Spark 3 is giving NullPointerException

2022-09-03 Thread Sachit (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599862#comment-17599862
 ] 

Sachit commented on SPARK-40316:


Hello [~srowen],
Please let me know if you have any suggestions. Thanks!

> Upgrading to Spark 3 is giving NullPointerException
> ---
>
> Key: SPARK-40316
> URL: https://issues.apache.org/jira/browse/SPARK-40316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: Sachit
>Priority: Major
>
> Getting the error below while upgrading to Spark 3:
>  
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> - array element class: "scala.Long"
> - root class: "scala.collection.Seq"
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> mapobjects(lambdavariable(MapObject, LongType, true, -1), 
> assertnotnull(lambdavariable(MapObject, LongType, true, -1)), input[0, 
> array<bigint>, true], Some(interface scala.collection.Seq))
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1047)
>     at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
>     at 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$scalaConverter$2(ScalaUDF.scala:164)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
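
The error text itself suggests the usual remedy: declare nullable element 
types so the decoder can represent the NULL. Below is a hypothetical minimal 
reproduction and fix (the UDF names and the array literal are mine, not from 
the report), sketched for spark-shell on 3.x:

{noformat}
// a minimal sketch, assuming spark-shell on Spark 3.x
import org.apache.spark.sql.functions.{col, udf}

val df = spark.sql("SELECT array(1L, CAST(NULL AS BIGINT)) AS c")

// Throws the NullPointerException above: scala.Long cannot hold the NULL element.
val badUdf = udf((xs: Seq[Long]) => xs.sum)

// Works: java.lang.Long (or Option[Long]) is nullable, as the message advises.
val goodUdf = udf((xs: Seq[java.lang.Long]) => xs.filter(_ != null).map(_.longValue).sum)

df.select(goodUdf(col("c"))).show()
// df.select(badUdf(col("c"))).show()  // fails at decode time
{noformat}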



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40033) Nested schema pruning support through element_at

2022-09-03 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-40033.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37463
[https://github.com/apache/spark/pull/37463]

> Nested schema pruning support through element_at
> 
>
> Key: SPARK-40033
> URL: https://issues.apache.org/jira/browse/SPARK-40033
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> The semantics of element_at are similar to GetArrayItem and GetMapValue, so 
> we can support nested schema pruning if the inner data type is a struct.
> For example:
> For a column schema: `c: array<struct<s1: int, s2: int>>`
> With the query: `SELECT element_at(c, 1).s1`
> The final pruned schema should be `c: array<struct<s1: int>>`
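
A minimal sketch for observing the pruning once this lands; the path, field 
names, and types below are illustrative assumptions, not from the issue:

{noformat}
// a minimal sketch, assuming spark-shell on 3.4.0+
val data = spark.range(1).selectExpr("array(named_struct('s1', 1, 's2', 2)) AS c")
data.write.mode("overwrite").parquet("/tmp/spark40033")

val q = spark.read.parquet("/tmp/spark40033").selectExpr("element_at(c, 1).s1")
q.explain()  // ReadSchema should show c: array<struct<s1:int>> once pruning applies
{noformat}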



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40033) Nested schema pruning support through element_at

2022-09-03 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-40033:
---

Assignee: XiDuo You

> Nested schema pruning support through element_at
> 
>
> Key: SPARK-40033
> URL: https://issues.apache.org/jira/browse/SPARK-40033
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> The semantics of element_at are similar to GetArrayItem and GetMapValue, so 
> we can support nested schema pruning if the inner data type is a struct.
> For example:
> For a column schema: `c: array<struct<s1: int, s2: int>>`
> With the query: `SELECT element_at(c, 1).s1`
> The final pruned schema should be `c: array<struct<s1: int>>`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org