[jira] [Comment Edited] (SPARK-34651) Improve ZSTD support

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306785#comment-17306785
 ] 

Dongjoon Hyun edited comment on SPARK-34651 at 3/23/21, 5:57 AM:
-

Yes, Apache Spark's ZStd codec is that one.

Even with Apache Spark 3.1.1 and Hadoop 3.2.0, it's not easy for users to use 
Hadoop's ZStandardCodec, because `libhadoop` is not built with the zstd library 
by default.
{code}
scala> spark.version
res0: String = 3.1.1

scala> Seq("text").toDF.write.option("compression", 
"org.apache.hadoop.io.compress.ZStandardCodec").text("/tmp/zstd")
21/03/22 22:56:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: native zStandard library not available: this 
version of libhadoop was built without zstd support.
{code}
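
For comparison, the ZSTD support discussed below does not depend on libhadoop at all, 
because Parquet/ORC ship their own ZSTD bindings. A minimal sketch (assuming Spark 3.2+, 
per the rest of this thread; the output paths are illustrative only):
{code}
// Columnar sources use their own ZSTD implementation, so no native Hadoop zstd is needed.
scala> Seq("text").toDF.write.option("compression", "zstd").parquet("/tmp/zstd_parquet")
scala> Seq("text").toDF.write.option("compression", "zstd").orc("/tmp/zstd_orc")
{code}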


was (Author: dongjoon):
Yes, Apache Spark's ZStd codec is that one.

Even with Apache Spark 3.1.1 with Hadoop 3.2.0, it's not easy for users to use 
Hadoop's ZStandardCodec because `libhadoop` is not built with zstd library by 
default.

> Improve ZSTD support
> 
>
> Key: SPARK-34651
> URL: https://issues.apache.org/jira/browse/SPARK-34651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34651) Improve ZSTD support

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306785#comment-17306785
 ] 

Dongjoon Hyun commented on SPARK-34651:
---

Yes, Apache Spark's ZStd codec is that one.

Even with Apache Spark 3.1.1 and Hadoop 3.2.0, it's not easy for users to use 
Hadoop's ZStandardCodec, because `libhadoop` is not built with the zstd library 
by default.

> Improve ZSTD support
> 
>
> Key: SPARK-34651
> URL: https://issues.apache.org/jira/browse/SPARK-34651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34832) ExternalAppendOnlyUnsafeRowArrayBenchmark can't run with spark-submit

2021-03-22 Thread Yang Jie (Jira)
Yang Jie created SPARK-34832:


 Summary: ExternalAppendOnlyUnsafeRowArrayBenchmark can't run with 
spark-submit
 Key: SPARK-34832
 URL: https://issues.apache.org/jira/browse/SPARK-34832
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1, 3.2.0
Reporter: Yang Jie


The following exception appears when running 
ExternalAppendOnlyUnsafeRowArrayBenchmark with the command:
{code:java}
bin/spark-submit --class 
org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark --jars 
spark-core_2.12-3.2.0-SNAPSHOT-tests.jar 
spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar 
{code}
The resulting exception:
{code:java}
Exception in thread "main" java.lang.IllegalStateException: SparkContext should only be created and accessed on the driver.
  at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$assertOnDriver(SparkContext.scala:2679)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:89)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
  at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.withFakeTaskContext(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:53)
  at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.testAgainstRawArrayBuffer(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:120)
  at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.$anonfun$runBenchmarkSuite$1(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:190)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:40)
  at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.runBenchmarkSuite(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:187)
  at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:58)
  at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark.main(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
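
A possible workaround sketch, assuming the sbt runMain path that other Spark benchmarks 
document in their headers (not verified against this particular failure):
{code}
build/sbt "sql/test:runMain org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark"
{code}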
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34651) Improve ZSTD support

2021-03-22 Thread Stanislav Savulchik (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306782#comment-17306782
 ] 

Stanislav Savulchik commented on SPARK-34651:
-

[~dongjoon]

bq. Well, it's a different codec technically. Apache Spark has its own 
ZStandardCodec class instead of Apache Hadoop ZStandardCodec. 

I believe you are referring to [org.apache.spark.io.ZStdCompressionCodec 
|https://github.com/apache/spark/blob/1f6089b165181472e581ed0466694d28ed9b8de8/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala#L213]
 that is used for spark internal use cases like compressing shuffled data, 
isn't it?

bq. BTW, please note that Apache Hadoop ZStandardCodec exists only at Hadoop 
2.9+ (HADOOP-13578) and Apache Spark still supports Hadoop 2.7 distribution.

I guess it is the reason why org.apache.hadoop.io.compress.ZStandardCodec is 
not on the 
[list|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CompressionCodecs.scala#L29].
 

Thank you!
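
For completeness, a minimal sketch of pointing Spark's internal (shuffle/broadcast/spill) 
compression at that class via the standard configuration key; shown only as an 
illustration, not as something recommended in this thread:
{code}
// "zstd" maps to org.apache.spark.io.ZStdCompressionCodec for Spark-internal compression.
import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.io.compression.codec", "zstd")
{code}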

> Improve ZSTD support
> 
>
> Key: SPARK-34651
> URL: https://issues.apache.org/jira/browse/SPARK-34651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306772#comment-17306772
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 5:02 AM:
---

[~dongjoon], I have tried with Spark 3.1.1 as well and found that there is a 
difference in the Parsed Logical Plan.

 

Spark 3.1.1 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == 'CreateViewStatement [temp1_33], select 20E2, 
'Project [unresolvedalias(2000.0, None)], false, false, LocalTempView{code}
Spark 2.4 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == CreateViewCommand `temp1_33`, select 20E2, false, 
false, LocalTempView    +- 'Project [unresolvedalias(2.0E+3, None)]       +- 
OneRowRelation {code}
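
For reference, a minimal spark-shell sketch of how such plan dumps can be produced on each 
branch (assuming the default `spark` session; the statement is the one quoted in this issue):
{code}
// Prints the Parsed/Analyzed/Optimized/Physical plans compared above.
scala> spark.sql("explain extended create temporary view temp1_33 as select 20E2").show(false)
{code}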


was (Author: ankitraj):
[~dongjoon], I have tried with Spar 3.1.1 also and found there difference 
during creation of Parsed Logical Plan

 

Spar 3.1.1 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == 'CreateViewStatement [temp1_33], select 20E2, 
'Project [unresolvedalias(2000.0, None)], false, false, LocalTempView{code}
Spar 2.4 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == CreateViewCommand `temp1_33`, select 20E2, false, 
false, LocalTempView    +- 'Project [unresolvedalias(2.0E+3, None)]       +- 
OneRowRelation {code}

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306772#comment-17306772
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 4:54 AM:
---

[~dongjoon], I have tried with Spark 3.1.1 as well and found a difference in 
the Parsed Logical Plan.

 

Spark 3.1.1 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == 'CreateViewStatement [temp1_33], select 20E2, 
'Project [unresolvedalias(2000.0, None)], false, false, LocalTempView{code}
Spark 2.4 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == CreateViewCommand `temp1_33`, select 20E2, false, 
false, LocalTempView    +- 'Project [unresolvedalias(2.0E+3, None)]       +- 
OneRowRelation {code}


was (Author: ankitraj):
[~dongjoon], I have tried with Spar 3.1.1 also and found there difference 
during creation of Parsed Logical Plan

 

Spar 3.1.1 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == 'CreateViewStatement [temp1_33], select 20E2, 
'Project [unresolvedalias(2000.0, None)], false, false, LocalTempView{code}
Spar 2.4 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == CreateViewCommand `temp1_33`, select 20E2, false, 
false, LocalTempView    +- 'Project [unresolvedalias(2.0E+3, None)]       +- 
OneRowRelation {code}

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306772#comment-17306772
 ] 

Ankit Raj Boudh commented on SPARK-34673:
-

[~dongjoon], I have tried with Spark 3.1.1 as well and found a difference in 
the Parsed Logical Plan.

 

Spark 3.1.1 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == 'CreateViewStatement [temp1_33], select 20E2, 
'Project [unresolvedalias(2000.0, None)], false, false, LocalTempView{code}
Spark 2.4 Parsed Logical Plan: 
{code:java}
== Parsed Logical Plan == CreateViewCommand `temp1_33`, select 20E2, false, 
false, LocalTempView    +- 'Project [unresolvedalias(2.0E+3, None)]       +- 
OneRowRelation {code}

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 4:50 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

 
{code:java}
== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select 20E2, 'Project 
[unresolvedalias(2000.0, None)], false, false, LocalTempView
== Analyzed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
 
{code}
 

 

Spark branch-2.4 plans for the same query (create temporary view temp1_33 as 
select *20E2*;):

 

 
{code:java}
== Parsed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Analyzed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Optimized Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Physical Plan ==
Execute CreateViewCommand
   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
         +- 'Project [unresolvedalias(2.0E+3, None)]
            +- OneRowRelation
{code}
 

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*:

 
{code:java}
== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select 20E2, 'Project 
[unresolvedalias(2000.0, None)], false, false, LocalTempView
== Analyzed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
 
{code}
 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*:

 

 
{code:java}
== Parsed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(*2.0E+3*, None)]
      +- OneRowRelation
 
== Analyzed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Optimized Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Physical Plan ==
Execute CreateViewCommand
   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
         +- 'Project [unresolvedalias(2.0E+3, None)]
            +- OneRowRelation
{code}
 

 

 

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 4:45 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

 
{code:java}
== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select 20E2, 'Project 
[unresolvedalias(2000.0, None)], false, false, LocalTempView
== Analyzed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation
 
{code}
 

 

Spark branch-2.4 plans for the same query (create temporary view temp1_33 as 
select *20E2*;):

 

 
{code:java}
== Parsed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(*2.0E+3*, None)]
      +- OneRowRelation
 
== Analyzed Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Optimized Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
   +- 'Project [unresolvedalias(2.0E+3, None)]
      +- OneRowRelation
 
== Physical Plan ==
Execute CreateViewCommand
   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
         +- 'Project [unresolvedalias(2.0E+3, None)]
            +- OneRowRelation
{code}
 

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*:

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(2000.0, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*:

 

== Parsed Logical Plan ==

CreateViewCommand `temp1_33`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 4:44 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(2000.0, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark branch-2.4 plans for the same query (create temporary view temp1_33 as 
select *20E2*;):

 

== Parsed Logical Plan ==

CreateViewCommand `temp1_33`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*:

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*:

 

== Parsed Logical Plan ==

CreateViewCommand `temp1_33`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-

[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 4:43 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark branch-2.4 plans for the same query (create temporary view temp1_33 as 
select *20E2*;):

 

== Parsed Logical Plan ==

CreateViewCommand `temp1_33`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*:

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*:

 

== Parsed Logical Plan ==

CreateViewCommand `temp1_33`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-

[jira] [Commented] (SPARK-34651) Improve ZSTD support

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306763#comment-17306763
 ] 

Dongjoon Hyun commented on SPARK-34651:
---

BTW, please note that Apache Hadoop ZStandardCodec exists only at Hadoop 2.9+ 
(HADOOP-13578) and Apache Spark still supports Hadoop 2.7 distribution.
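
As a side note, one way to check whether a given Hadoop installation actually ships 
native zstd support is the standard checknative tool (mentioned here only as a pointer, 
not taken from this thread):
{code}
hadoop checknative -a
{code}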

> Improve ZSTD support
> 
>
> Key: SPARK-34651
> URL: https://issues.apache.org/jira/browse/SPARK-34651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306736#comment-17306736
 ] 

Dongjoon Hyun commented on SPARK-34673:
---

Thank you for reporting, [~chetdb] and [~Ankitraj].

However, could you try with Apache Spark 3.1.1?

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34651) Improve ZSTD support

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306731#comment-17306731
 ] 

Dongjoon Hyun commented on SPARK-34651:
---

Well, it's a different codec technically. Apache Spark has its own 
ZStandardCodec class instead of Apache Hadoop ZStandardCodec. Apache Parquet 
and Avro also have their own codecs for ZSTD. For now, Apache Spark 3.2 will be 
the first release to support ZSTD for ORC/Parquet/Avro, [~savulchik]. If you 
need ZSTD for text files, yes, please file a Jira.

> Improve ZSTD support
> 
>
> Key: SPARK-34651
> URL: https://issues.apache.org/jira/browse/SPARK-34651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306726#comment-17306726
 ] 

Dongjoon Hyun commented on SPARK-18105:
---

[~devaraj], do you have a reproducer? BTW, there is a known issue, SPARK-34790, 
in the latest Apache Spark when IO encryption and AQE are enabled together.
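
For clarity, that configuration combination corresponds to the standard settings below 
(a sketch for illustration only, not a reproducer):
{code}
// IO encryption plus adaptive query execution, the combination tracked in SPARK-34790.
import org.apache.spark.SparkConf
val conf = new SparkConf()
  .set("spark.io.encryption.enabled", "true")
  .set("spark.sql.adaptive.enabled", "true")
{code}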

 

> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Davies Liu
>Priority: Major
>
> When lz4 is used to compress the shuffle files, it may fail to decompress it 
> as "stream is corrupt"
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34831) spark2.3 can't add column in carbondata table

2021-03-22 Thread Carefree (Jira)
Carefree created SPARK-34831:


 Summary: spark2.3 can't add column in carbondata table
 Key: SPARK-34831
 URL: https://issues.apache.org/jira/browse/SPARK-34831
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 3.1.0
 Environment: spark2.3   carbondata1.5.3    cdh5.16
Reporter: Carefree


When I try to add a column to a CarbonData table with Spark 2.3, an error occurs. 
Here are the details:
{code:java}
// ALTER ADD COLUMNS does not support datasource table with type 
org.apache.spark.sql.CarbonSource
  at 
org.apache.spark.sql.execution.command.AlterTableAddColumnsCommand.verifyAlterTableAddColumn(tables.scala:242)
  at 
org.apache.spark.sql.execution.command.AlterTableAddColumnsCommand.run(tables.scala:194)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
{code}
So I hope this can be supported in the latest version. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 2:38 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark branch-2.4 plans for the same query (create temporary view temp1_33 as 
select *20E2*;):

 

== Parsed Logical Plan ==

CreateViewCommand `temp1_33`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*:

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*:

 

== Parsed Logical Plan ==

CreateViewCommand `kajal_1`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `kajal_1`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `kajal_1`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `kajal_1`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To 

[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 2:36 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

== Parsed Logical Plan ==
 'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
 CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
 Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark branch-2.4 plans for the same query (create temporary view temp1_33 as 
select *20E2*;):

 

== Parsed Logical Plan ==

CreateViewCommand `kajal_1`, select *20E2*, false, false, LocalTempView

   +- 'Project [unresolvedalias(*2.0E+3*, None)]

      +- OneRowRelation

 

== Analyzed Logical Plan ==

CreateViewCommand `kajal_1`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Optimized Logical Plan ==

CreateViewCommand `kajal_1`, select 20E2, false, false, LocalTempView

   +- 'Project [unresolvedalias(2.0E+3, None)]

      +- OneRowRelation

 

== Physical Plan ==

Execute CreateViewCommand

   +- CreateViewCommand `kajal_1`, select 20E2, false, false, LocalTempView

         +- 'Project [unresolvedalias(2.0E+3, None)]

            +- OneRowRelation

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

== Parsed Logical Plan ==
'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*;):

 

 

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh edited comment on SPARK-34673 at 3/23/21, 2:35 AM:
---

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

Spark-master branch plans for this query (create temporary view temp1_33 as 
select *20E2*;):

== Parsed Logical Plan ==
'CreateViewStatement [temp1_33], select *20E2*, 'Project 
[unresolvedalias(*2000.0*, None)], false, false, LocalTempView

== Analyzed Logical Plan ==

CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Optimized Logical Plan ==
CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

== Physical Plan ==
Execute CreateViewCommand
 +- CreateViewCommand `temp1_33`, select 20E2, false, false, LocalTempView
 +- 'Project [unresolvedalias(2000.0, None)]
 +- OneRowRelation

 

 

Spark-2.4 branch plans for this query (create temporary view temp1_33 as select 
*20E2*;):

 

 


was (Author: ankitraj):
I can see the difference b/w plan which is creating in spark master branch and 
spark branch-2.4

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34673) Select queries fail with Error: java.lang.IllegalArgumentException: Error: name expected at the position 10 of 'decimal(2,-2)' but '-' is found. (state=,code=0)

2021-03-22 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306703#comment-17306703
 ] 

Ankit Raj Boudh commented on SPARK-34673:
-

I can see the difference between the plans produced by the Spark master branch and 
branch-2.4.

> Select queries fail  with Error: java.lang.IllegalArgumentException: Error: 
> name expected at the position 10 of 'decimal(2,-2)' but '-' is found. 
> (state=,code=0)
> -
>
> Key: SPARK-34673
> URL: https://issues.apache.org/jira/browse/SPARK-34673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Attachments: Screenshot 2021-03-10 at 8.47.00 PM.png, Screenshot 
> 2021-03-19 at 1.33.54 PM.png
>
>
> Temporary views are created
> Select filter queries are executed on the Temporary views.
>  
> [Actual Issue] : - Select queries fail with Error: 
> java.lang.IllegalArgumentException: Error: name expected at the position 10 
> of 'decimal(2,-2)' but '-' is found. (state=,code=0)
>  
> [Expected Result] :- Select queries should be success on Temporary views.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34674) Spark app on k8s doesn't terminate without call to sparkContext.stop() method

2021-03-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306692#comment-17306692
 ] 

Dongjoon Hyun commented on SPARK-34674:
---

What makes you think the following, [~Kotlov]? I don't think Apache Spark 
has that kind of contract.
{quote}As far as I know, Spark uses the ShutdownHook to stop SparkContext 
anyway before exiting the JVM
{quote}
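For reference, a minimal sketch of explicitly stopping the session in a 
try/finally, which avoids relying on any shutdown-hook behaviour (the object and 
app names are only illustrative):
{code:java}
import org.apache.spark.sql.SparkSession

object MyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("my-app").getOrCreate()
    try {
      // ... job logic ...
      spark.range(10).count()
    } finally {
      // Stop explicitly so non-daemon threads (e.g. the k8s client's OkHttp
      // threads mentioned above) cannot keep the driver JVM alive after main()
      // returns.
      spark.stop()
    }
  }
}
{code}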

> Spark app on k8s doesn't terminate without call to sparkContext.stop() method
> -
>
> Key: SPARK-34674
> URL: https://issues.apache.org/jira/browse/SPARK-34674
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Sergey Kotlov
>Priority: Major
>
> Hello!
>  I have run into a problem: if I don't call the method sparkContext.stop() 
> explicitly, then the Spark driver process doesn't terminate even after its main 
> method has completed. This behaviour is different from Spark on YARN, where 
> manually stopping the sparkContext is not required.
>  It looks like the problem is in the use of non-daemon threads, which prevent 
> the driver JVM process from terminating.
>  At least I see two non-daemon threads if I don't call sparkContext.stop():
> {code:java}
> Thread[OkHttp kubernetes.default.svc,5,main]
> Thread[OkHttp kubernetes.default.svc Writer,5,main]
> {code}
> Could you please tell me whether it is possible to solve this problem?
> Docker image from the official release of spark-3.1.1 hadoop3.2 is used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34799) Return User-defined types from Pandas UDF

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34799:


Assignee: Apache Spark

> Return User-defined types from Pandas UDF
> -
>
> Key: SPARK-34799
> URL: https://issues.apache.org/jira/browse/SPARK-34799
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>
> Because Pandas UDFs use pyarrow to pass data, they do not currently 
> support UserDefinedTypes, unlike normal Python UDFs.
> For example:
> {code:python}
> class BoxType(UserDefinedType):
> @classmethod
> def sqlType(cls) -> StructType:
> return StructType(
> fields=[
> StructField("xmin", DoubleType(), False),
> StructField("ymin", DoubleType(), False),
> StructField("xmax", DoubleType(), False),
> StructField("ymax", DoubleType(), False),
> ]
> )
> @pandas_udf(
>  returnType=StructType([StructField("boxes", ArrayType(Box()))])
> )
> def pandas_pf(s: pd.DataFrame) -> pd.DataFrame:
>yield s
> {code}
> The logs show
> {code}
> try:
> to_arrow_type(self._returnType_placeholder)
> except TypeError:
> >   raise NotImplementedError(
> "Invalid return type with scalar Pandas UDFs: %s is "
> E   NotImplementedError: Invalid return type with scalar 
> Pandas UDFs: StructType(List(StructField(boxes,ArrayType(Box,true),true))) is 
> not supported
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34799) Return User-defined types from Pandas UDF

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34799:


Assignee: (was: Apache Spark)

> Return User-defined types from Pandas UDF
> -
>
> Key: SPARK-34799
> URL: https://issues.apache.org/jira/browse/SPARK-34799
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Darcy Shen
>Priority: Major
>
> Because Pandas UDFs use pyarrow to pass data, they do not currently 
> support UserDefinedTypes, unlike normal Python UDFs.
> For example:
> {code:python}
> class BoxType(UserDefinedType):
> @classmethod
> def sqlType(cls) -> StructType:
> return StructType(
> fields=[
> StructField("xmin", DoubleType(), False),
> StructField("ymin", DoubleType(), False),
> StructField("xmax", DoubleType(), False),
> StructField("ymax", DoubleType(), False),
> ]
> )
> @pandas_udf(
>  returnType=StructType([StructField("boxes", ArrayType(Box()))])
> )
> def pandas_pf(s: pd.DataFrame) -> pd.DataFrame:
>yield s
> {code}
> The logs show
> {code}
> try:
> to_arrow_type(self._returnType_placeholder)
> except TypeError:
> >   raise NotImplementedError(
> "Invalid return type with scalar Pandas UDFs: %s is "
> E   NotImplementedError: Invalid return type with scalar 
> Pandas UDFs: StructType(List(StructField(boxes,ArrayType(Box,true),true))) is 
> not supported
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34771) Support UDT for Pandas with Arrow Optimization

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34771:


Assignee: (was: Apache Spark)

> Support UDT for Pandas with Arrow Optimization
> --
>
> Key: SPARK-34771
> URL: https://issues.apache.org/jira/browse/SPARK-34771
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Darcy Shen
>Priority: Major
>
> {code:python}
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> from pyspark.testing.sqlutils  import ExamplePoint
> import pandas as pd
> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 
> 2)])})
> df = spark.createDataFrame(pdf)
> df.toPandas()
> {code}
> with `spark.sql.execution.arrow.enabled` = false, the above snippet works 
> fine without WARNINGS.
> with `spark.sql.execution.arrow.enabled` = true, the above snippet works fine 
> with WARNINGS. Because of Unsupported type in conversion, the Arrow 
> optimization is actually turned off. 
> Detailed steps to reproduce:
> {code:python}
> $ bin/pyspark
> Python 3.8.8 (default, Feb 24 2021, 13:46:16)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 21/03/17 23:13:27 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
>   /_/
> Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
> Spark context Web UI available at http://172.30.0.226:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1615994008526).
> SparkSession available as 'spark'.
> >>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> 21/03/17 23:13:31 WARN SQLConf: The SQL config 
> 'spark.sql.execution.arrow.enabled' has been deprecated in Spark v3.0 and may 
> be removed in the future. Use 'spark.sql.execution.arrow.pyspark.enabled' 
> instead of it.
> >>> from pyspark.testing.sqlutils  import ExamplePoint
> >>> import pandas as pd
> >>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), 
> >>> ExamplePoint(2, 2)])})
> >>> df = spark.createDataFrame(pdf)
> /Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:332: 
> UserWarning: createDataFrame attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed 
> by the reason below:
>   Could not convert (1,1) with type ExamplePoint: did not recognize Python 
> value type when inferring an Arrow data type
> Attempting non-optimization as 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
>   warnings.warn(msg)
> >>>
> >>> df.show()
> +--+
> | point|
> +--+
> |(0.0, 0.0)|
> |(0.0, 0.0)|
> +--+
> >>> df.schema
> StructType(List(StructField(point,ExamplePointUDT,true)))
> >>> df.toPandas()
> /Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:87: 
> UserWarning: toPandas attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed 
> by the reason below:
>   Unsupported type in conversion to Arrow: ExamplePointUDT
> Attempting non-optimization as 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
>   warnings.warn(msg)
>point
> 0  (0.0,0.0)
> 1  (0.0,0.0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34830) Some UDF calls inside transform are broken

2021-03-22 Thread Daniel Solow (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306678#comment-17306678
 ] 

Daniel Solow commented on SPARK-34830:
--

I think it's the same thing as yours -- it's overwriting earlier array results 
with later results, but it only keeps a number of characters based on the 
overwritten value. So in the example I showed, the first result is "abc" and 
the second result is "defg" -- when it overwrites "abc" with "defg" it keeps 
only the first 3 characters "def" because "abc" is three characters. In your 
example you're using fixed-length types so this effect isn't evident.

The overwritten strings are also null-terminated (there's actually a \x00 byte 
at the end).
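
For what it's worth, a UDF-free rewrite using only built-in column functions may 
sidestep the issue, since it never goes through the UDF evaluation path inside 
transform. A rough sketch, assuming a spark-shell session and that the lookup map 
is small enough to inline as a literal:
{code:java}
import org.apache.spark.sql.{functions => f}
import spark.implicits._   // spark is the active SparkSession in spark-shell

val M = Map("a" -> "abc", "b" -> "defg")
val df = Seq(Tuple1(Seq("a", "b"))).toDF("arr")

// Inline the map as a literal column and look values up with element_at,
// so no UDF is evaluated inside transform.
df.select(f.transform($"arr", i => f.element_at(f.typedLit(M), i)).as("arr"))
  .show(false)
{code}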

> Some UDF calls inside transform are broken
> --
>
> Key: SPARK-34830
> URL: https://issues.apache.org/jira/browse/SPARK-34830
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Daniel Solow
>Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
> Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +---+
> |arr|
> +---+
> |[abc, defg]|
> +---+
> {code}
> However it actually shows:
> {code:java}
> +---+
> |arr|
> +---+
> |[def, defg]|
> +---+
> {code}
> It's also broken for SQL (even without DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
> This is a big problem because it breaks existing code/UDFs. I noticed this 
> because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
> actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34771) Support UDT for Pandas with Arrow Optimization

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306679#comment-17306679
 ] 

Apache Spark commented on SPARK-34771:
--

User 'eddyxu' has created a pull request for this issue:
https://github.com/apache/spark/pull/31735

> Support UDT for Pandas with Arrow Optimization
> --
>
> Key: SPARK-34771
> URL: https://issues.apache.org/jira/browse/SPARK-34771
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Darcy Shen
>Priority: Major
>
> {code:python}
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> from pyspark.testing.sqlutils  import ExamplePoint
> import pandas as pd
> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 
> 2)])})
> df = spark.createDataFrame(pdf)
> df.toPandas()
> {code}
> with `spark.sql.execution.arrow.enabled` = false, the above snippet works 
> fine without WARNINGS.
> with `spark.sql.execution.arrow.enabled` = true, the above snippet works fine 
> with WARNINGS. Because of Unsupported type in conversion, the Arrow 
> optimization is actually turned off. 
> Detailed steps to reproduce:
> {code:python}
> $ bin/pyspark
> Python 3.8.8 (default, Feb 24 2021, 13:46:16)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 21/03/17 23:13:27 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
>   /_/
> Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
> Spark context Web UI available at http://172.30.0.226:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1615994008526).
> SparkSession available as 'spark'.
> >>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> 21/03/17 23:13:31 WARN SQLConf: The SQL config 
> 'spark.sql.execution.arrow.enabled' has been deprecated in Spark v3.0 and may 
> be removed in the future. Use 'spark.sql.execution.arrow.pyspark.enabled' 
> instead of it.
> >>> from pyspark.testing.sqlutils  import ExamplePoint
> >>> import pandas as pd
> >>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), 
> >>> ExamplePoint(2, 2)])})
> >>> df = spark.createDataFrame(pdf)
> /Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:332: 
> UserWarning: createDataFrame attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed 
> by the reason below:
>   Could not convert (1,1) with type ExamplePoint: did not recognize Python 
> value type when inferring an Arrow data type
> Attempting non-optimization as 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
>   warnings.warn(msg)
> >>>
> >>> df.show()
> +--+
> | point|
> +--+
> |(0.0, 0.0)|
> |(0.0, 0.0)|
> +--+
> >>> df.schema
> StructType(List(StructField(point,ExamplePointUDT,true)))
> >>> df.toPandas()
> /Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:87: 
> UserWarning: toPandas attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed 
> by the reason below:
>   Unsupported type in conversion to Arrow: ExamplePointUDT
> Attempting non-optimization as 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
>   warnings.warn(msg)
>point
> 0  (0.0,0.0)
> 1  (0.0,0.0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34771) Support UDT for Pandas with Arrow Optimization

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34771:


Assignee: Apache Spark

> Support UDT for Pandas with Arrow Optimization
> --
>
> Key: SPARK-34771
> URL: https://issues.apache.org/jira/browse/SPARK-34771
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>
> {code:python}
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> from pyspark.testing.sqlutils  import ExamplePoint
> import pandas as pd
> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 
> 2)])})
> df = spark.createDataFrame(pdf)
> df.toPandas()
> {code}
> with `spark.sql.execution.arrow.enabled` = false, the above snippet works 
> fine without WARNINGS.
> with `spark.sql.execution.arrow.enabled` = true, the above snippet works fine 
> with WARNINGS. Because of Unsupported type in conversion, the Arrow 
> optimization is actually turned off. 
> Detailed steps to reproduce:
> {code:python}
> $ bin/pyspark
> Python 3.8.8 (default, Feb 24 2021, 13:46:16)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 21/03/17 23:13:27 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
>   /_/
> Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
> Spark context Web UI available at http://172.30.0.226:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1615994008526).
> SparkSession available as 'spark'.
> >>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> 21/03/17 23:13:31 WARN SQLConf: The SQL config 
> 'spark.sql.execution.arrow.enabled' has been deprecated in Spark v3.0 and may 
> be removed in the future. Use 'spark.sql.execution.arrow.pyspark.enabled' 
> instead of it.
> >>> from pyspark.testing.sqlutils  import ExamplePoint
> >>> import pandas as pd
> >>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), 
> >>> ExamplePoint(2, 2)])})
> >>> df = spark.createDataFrame(pdf)
> /Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:332: 
> UserWarning: createDataFrame attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed 
> by the reason below:
>   Could not convert (1,1) with type ExamplePoint: did not recognize Python 
> value type when inferring an Arrow data type
> Attempting non-optimization as 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
>   warnings.warn(msg)
> >>>
> >>> df.show()
> +--+
> | point|
> +--+
> |(0.0, 0.0)|
> |(0.0, 0.0)|
> +--+
> >>> df.schema
> StructType(List(StructField(point,ExamplePointUDT,true)))
> >>> df.toPandas()
> /Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:87: 
> UserWarning: toPandas attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed 
> by the reason below:
>   Unsupported type in conversion to Arrow: ExamplePointUDT
> Attempting non-optimization as 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
>   warnings.warn(msg)
>point
> 0  (0.0,0.0)
> 1  (0.0,0.0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10816) EventTime based sessionization (session window)

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306675#comment-17306675
 ] 

Apache Spark commented on SPARK-10816:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31937

> EventTime based sessionization (session window)
> ---
>
> Key: SPARK-10816
> URL: https://issues.apache.org/jira/browse/SPARK-10816
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Reporter: Reynold Xin
>Priority: Major
> Attachments: SPARK-10816 Support session window natively.pdf, Session 
> Window Support For Structure Streaming.pdf
>
>
> Currently structured streaming supports two kinds of windows: tumbling windows 
> and sliding windows. Another useful window function is the session window, 
> which is not supported by SS.
> Unlike time windows (tumbling and sliding windows), a session window doesn't 
> have a static window begin and end time. Session window creation depends on a 
> defined session gap, which can be static or dynamic.
> For a static session gap, events that fall within a certain period of time (the 
> gap) of one another are grouped into a session window. A session window closes 
> when it does not receive events for the gap. For a dynamic gap, the gap can 
> change from event to event.
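
Until native session windows are available, a rough sessionization sketch with 
flatMapGroupsWithState is possible today; the event schema, the socket source and 
the fixed 30-minute gap below are all assumptions made only to keep the sketch 
self-contained:
{code:java}
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupStateTimeout, OutputMode}

case class Event(user: String, ts: Timestamp)
case class SessionInfo(startMs: Long, endMs: Long)
case class SessionOutput(user: String, startMs: Long, durationMs: Long)

val spark = SparkSession.builder.appName("sessionization-sketch").getOrCreate()
import spark.implicits._

val gapMs = 30 * 60 * 1000L  // assumed static 30-minute session gap

// Assumed input: "user,yyyy-MM-dd HH:mm:ss" lines from a local socket.
val events = spark.readStream
  .format("socket").option("host", "localhost").option("port", 9999).load()
  .as[String]
  .map { line => val Array(u, t) = line.split(","); Event(u, Timestamp.valueOf(t)) }

val sessions = events
  .withWatermark("ts", "10 minutes")
  .groupByKey(_.user)
  .flatMapGroupsWithState[SessionInfo, SessionOutput](
      OutputMode.Append, GroupStateTimeout.EventTimeTimeout) {
    (user, evts, state) =>
      if (state.hasTimedOut) {
        // No event arrived within the gap: close and emit the session.
        val s = state.get
        state.remove()
        Iterator(SessionOutput(user, s.startMs, s.endMs - s.startMs))
      } else {
        val times = evts.map(_.ts.getTime).toSeq
        val old = state.getOption.getOrElse(SessionInfo(times.min, times.max))
        val merged = SessionInfo(math.min(old.startMs, times.min),
                                 math.max(old.endMs, times.max))
        state.update(merged)
        // The session closes gapMs after the latest event seen so far.
        state.setTimeoutTimestamp(merged.endMs + gapMs)
        Iterator.empty
      }
  }

sessions.writeStream.outputMode("append").format("console").start()
{code}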



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10816) EventTime based sessionization (session window)

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306676#comment-17306676
 ] 

Apache Spark commented on SPARK-10816:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31937

> EventTime based sessionization (session window)
> ---
>
> Key: SPARK-10816
> URL: https://issues.apache.org/jira/browse/SPARK-10816
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Reporter: Reynold Xin
>Priority: Major
> Attachments: SPARK-10816 Support session window natively.pdf, Session 
> Window Support For Structure Streaming.pdf
>
>
> Currently structured streaming supports two kinds of windows: tumbling windows 
> and sliding windows. Another useful window function is the session window, 
> which is not supported by SS.
> Unlike time windows (tumbling and sliding windows), a session window doesn't 
> have a static window begin and end time. Session window creation depends on a 
> defined session gap, which can be static or dynamic.
> For a static session gap, events that fall within a certain period of time (the 
> gap) of one another are grouped into a session window. A session window closes 
> when it does not receive events for the gap. For a dynamic gap, the gap can 
> change from event to event.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34830) Some UDF calls inside transform are broken

2021-03-22 Thread Pavel Chernikov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1730#comment-1730
 ] 

Pavel Chernikov commented on SPARK-34830:
-

[~dsolow1], sure. I believe that it might be something different, as here you 
get {{"def"}} out of nowhere, basically.

> Some UDF calls inside transform are broken
> --
>
> Key: SPARK-34830
> URL: https://issues.apache.org/jira/browse/SPARK-34830
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Daniel Solow
>Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
> Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +---+
> |arr|
> +---+
> |[abc, defg]|
> +---+
> {code}
> However it actually shows:
> {code:java}
> +---+
> |arr|
> +---+
> |[def, defg]|
> +---+
> {code}
> It's also broken for SQL (even without DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
> This is a big problem because it breaks existing code/UDFs. I noticed this 
> because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
> actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34829) transform_values return identical values while operating on complex types

2021-03-22 Thread Pavel Chernikov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306665#comment-17306665
 ] 

Pavel Chernikov commented on SPARK-34829:
-

On the contrary, this will work as expected:
{code:java}
case class Bar(i: Int)
def square(barC: Column): Column = {
  val iC = barC.getField("i")
  struct((iC * iC).as("i"))
}
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--+--+
|map   |map_square|
+--+--+
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {1}, 2 -> {4}, 3 -> {9}}|
+--+--+{code}
and this as well:
{code:java}
case class Foo(s: String)
def reverse(fooC: Column): Column = 
  org.apache.spark.sql.functions.reverse(fooC.getField("s"))
val df = Seq(Map(1 -> Foo("abc"), 2 -> Foo("klm"), 3 -> Foo("xyz"))).toDF("map")
df.withColumn("map_reverse", transform_values(col("map"), (_, v) => 
reverse(v))).show(truncate = false)
++--+
|map |map_reverse   |
++--+
|{1 -> {abc}, 2 -> {klm}, 3 -> {xyz}}|{1 -> cba, 2 -> mlk, 3 -> zyx}|
++--+{code}

> transform_values return identical values while operating on complex types
> -
>
> Key: SPARK-34829
> URL: https://issues.apache.org/jira/browse/SPARK-34829
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Pavel Chernikov
>Priority: Major
>
> If map values are {{StructType}} s then behavior of {{transform_values}} is 
> inconsistent (it may return identical values). To be more precise, it looks 
> like it returns identical values if the return type is {{AnyRef}}.
> Consider following examples:
> {code:java}
> case class Bar(i: Int)
> val square = udf((b: Bar) => b.i * b.i)
> val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
> df.withColumn("map_square", transform_values(col("map"), (_, v) => 
> square(v))).show(truncate = false)
> +--++
> |map                           |map_square              |
> +--++
> |{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> 1, 2 -> 4, 3 -> 9}|
> +--++
> {code}
> vs 
> {code:java}
> case class Bar(i: Int)
> case class BarSquare(i: Int)
> val square = udf((b: Bar) => BarSquare(b.i * b.i))
> val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
> df.withColumn("map_square", transform_values(col("map"), (_, v) => 
> square(v))).show(truncate = false)
> +--+--+
> |map                           |map_square                    |
> +--+--+
> |{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {9}, 2 -> {9}, 3 -> {9}}|
> +--+--+
> {code}
> or even just this one
> {code:java}
> case class Foo(s: String)
> val reverse = udf((f: Foo) => f.s.reverse)
> val df = Seq(Map(1 -> Foo("abc"), 2 -> Foo("klm"), 3 -> 
> Foo("xyz"))).toDF("map")
> df.withColumn("map_reverse", transform_values(col("map"), (_, v) => 
> reverse(v))).show(truncate = false)
> ++--+
> |map |map_reverse   |
> ++--+
> |{1 -> {abc}, 2 -> {klm}, 3 -> {xyz}}|{1 -> zyx, 2 -> zyx, 3 -> zyx}|
> ++--+
> {code}
> After playing with 
> {{org.apache.spark.sql.catalyst.expressions.TransformValues}} it looks like 
> something wrong is happening while executing this line:
> {code:java}
> resultValues.update(i, functionForEval.eval(inputRow)){code}
> To be more precise , it's all about {{functionForEval.eval(inputRow)}} , 
> because if you do something like this:
> {code:java}
> println(s"RESULTS PRIOR TO EVALUATION - $resultValues")
> val resultValue = functionForEval.eval(inputRow)
> println(s"RESULT - $resultValue")
> println(s"RESULTS PRIOR TO UPDATE - $resultValues")
> resultValues.update(i, resultValue)
> println(s"RESULTS AFTER UPDATE - $resultValues"){code}
> You'll see in the logs, something like:
> {code:java}
> RESULTS PRIOR TO EVALUATION - [null,null,null] 
> RESULT - [0,1] 
> 

[jira] [Commented] (SPARK-34830) Some UDF calls inside transform are broken

2021-03-22 Thread Daniel Solow (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306661#comment-17306661
 ] 

Daniel Solow commented on SPARK-34830:
--

[~ChernikovP] Seems like it's probably the same thing. I'll leave this up on 
the off-chance it's something different.

> Some UDF calls inside transform are broken
> --
>
> Key: SPARK-34830
> URL: https://issues.apache.org/jira/browse/SPARK-34830
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Daniel Solow
>Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
> Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +---+
> |arr|
> +---+
> |[abc, defg]|
> +---+
> {code}
> However it actually shows:
> {code:java}
> +---+
> |arr|
> +---+
> |[def, defg]|
> +---+
> {code}
> It's also broken for SQL (even without DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
> This is a big problem because it breaks existing code/UDFs. I noticed this 
> because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
> actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34830) Some UDF calls inside transform are broken

2021-03-22 Thread Pavel Chernikov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306655#comment-17306655
 ] 

Pavel Chernikov commented on SPARK-34830:
-

[~dsolow1], I've recently bumped into another issue with UDF calls and 
{{transform}} functionality. See, 
https://issues.apache.org/jira/browse/SPARK-34829

> Some UDF calls inside transform are broken
> --
>
> Key: SPARK-34830
> URL: https://issues.apache.org/jira/browse/SPARK-34830
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Daniel Solow
>Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
> Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +---+
> |arr|
> +---+
> |[abc, defg]|
> +---+
> {code}
> However it actually shows:
> {code:java}
> +---+
> |arr|
> +---+
> |[def, defg]|
> +---+
> {code}
> It's also broken for SQL (even without DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
> This is a big problem because it breaks existing code/UDFs. I noticed this 
> because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
> actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34830) Some UDF calls inside transform are broken

2021-03-22 Thread Daniel Solow (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Solow updated SPARK-34830:
-
Description: 
Let's say I want to create a UDF to do a simple lookup on a string:

{code:java}
import org.apache.spark.sql.{functions => f}
val M = Map("a" -> "abc", "b" -> "defg")
val BM = spark.sparkContext.broadcast(M)
val LOOKUP = f.udf((s: String) => BM.value.get(s))
{code}

Now if I have the following dataframe:

{code:java}
val df = Seq(
Tuple1(Seq("a", "b"))
).toDF("arr")
{code}

and I want to run this UDF over each element in the array, I can do:

{code:java}
df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
{code}

This should show:

{code:java}
+---+
|arr|
+---+
|[abc, defg]|
+---+
{code}
However it actually shows:

{code:java}
+---+
|arr|
+---+
|[def, defg]|
+---+
{code}

It's also broken for SQL (even without DSL). This gives the same result:

{code:java}
spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
{code}


Note that "def" is not even in the map I'm using.

This is a big problem because it breaks existing code/UDFs. I noticed this 
because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
actually producing broken data.

  was:
Let's say I want to create a UDF to do a simple lookup on a string:

{code:java}
import org.apache.spark.sql.{functions => f}
val M = Map("a" -> "abc", "b" -> "defg")
val BM = spark.sparkContext.broadcast(M)
val LOOKUP = f.udf((s: String) => BM.value.get(s))
{code}

Now if I have the following dataframe:

{code:java}
val df = Seq(
Tuple1(Seq("a", "b"))
).toDF("arr")
{code}

and I want to run this UDF over each element in the array, I can do:

{code:java}
df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
{code}

This should show:

{code:java}
+---+
|arr|
+---+
|[abc, defg]|
+---+
{code}
However it actually shows:

{code:java}
+---+
|arr|
+---+
|[def, defg]|
+---+
{code}

Note that "def" is not even in the map I'm using.

This is a big problem because it breaks existing code/UDFs. I noticed this 
because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
actually producing broken data.


> Some UDF calls inside transform are broken
> --
>
> Key: SPARK-34830
> URL: https://issues.apache.org/jira/browse/SPARK-34830
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Daniel Solow
>Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
> Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +---+
> |arr|
> +---+
> |[abc, defg]|
> +---+
> {code}
> However it actually shows:
> {code:java}
> +---+
> |arr|
> +---+
> |[def, defg]|
> +---+
> {code}
> It's also broken for SQL (even without DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
> This is a big problem because it breaks existing code/UDFs. I noticed this 
> because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
> actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34830) Some UDF calls inside transform are broken

2021-03-22 Thread Daniel Solow (Jira)
Daniel Solow created SPARK-34830:


 Summary: Some UDF calls inside transform are broken
 Key: SPARK-34830
 URL: https://issues.apache.org/jira/browse/SPARK-34830
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Daniel Solow


Let's say I want to create a UDF to do a simple lookup on a string:

{code:java}
import org.apache.spark.sql.{functions => f}
val M = Map("a" -> "abc", "b" -> "defg")
val BM = spark.sparkContext.broadcast(M)
val LOOKUP = f.udf((s: String) => BM.value.get(s))
{code}

Now if I have the following dataframe:

{code:java}
val df = Seq(
Tuple1(Seq("a", "b"))
).toDF("arr")
{code}

and I want to run this UDF over each element in the array, I can do:

{code:java}
df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
{code}

This should show:

{code:java}
+---+
|arr|
+---+
|[abc, defg]|
+---+
{code}
However it actually shows:

{code:java}
+---+
|arr|
+---+
|[def, defg]|
+---+
{code}

Note that "def" is not even in the map I'm using.

This is a big problem because it breaks existing code/UDFs. I noticed this 
because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was 
actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34829) transform_values return identical values while operating on complex types

2021-03-22 Thread Pavel Chernikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Chernikov updated SPARK-34829:

Description: 
If map values are {{StructType}} s then behavior of {{transform_values}} is 
inconsistent (it may return identical values). To be more precise, it looks 
like it returns identical values if the return type is {{AnyRef}}.

Consider following examples:
{code:java}
case class Bar(i: Int)
val square = udf((b: Bar) => b.i * b.i)
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--++
|map                           |map_square              |
+--++
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> 1, 2 -> 4, 3 -> 9}|
+--++
{code}
vs 
{code:java}
case class Bar(i: Int)
case class BarSquare(i: Int)
val square = udf((b: Bar) => BarSquare(b.i * b.i))
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--+--+
|map                           |map_square                    |
+--+--+
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {9}, 2 -> {9}, 3 -> {9}}|
+--+--+
{code}
or even just this one
{code:java}
case class Foo(s: String)
val reverse = udf((f: Foo) => f.s.reverse)
val df = Seq(Map(1 -> Foo("abc"), 2 -> Foo("klm"), 3 -> Foo("xyz"))).toDF("map")
df.withColumn("map_reverse", transform_values(col("map"), (_, v) => 
reverse(v))).show(truncate = false)
++--+
|map |map_reverse   |
++--+
|{1 -> {abc}, 2 -> {klm}, 3 -> {xyz}}|{1 -> zyx, 2 -> zyx, 3 -> zyx}|
++--+
{code}
After playing with 
{{org.apache.spark.sql.catalyst.expressions.TransformValues}} it looks like 
something wrong is happening while executing this line:
{code:java}
resultValues.update(i, functionForEval.eval(inputRow)){code}
To be more precise , it's all about {{functionForEval.eval(inputRow)}} , 
because if you do something like this:
{code:java}
println(s"RESULTS PRIOR TO EVALUATION - $resultValues")
val resultValue = functionForEval.eval(inputRow)
println(s"RESULT - $resultValue")
println(s"RESULTS PRIOR TO UPDATE - $resultValues")
resultValues.update(i, resultValue)
println(s"RESULTS AFTER UPDATE - $resultValues"){code}
You'll see in the logs, something like:
{code:java}
RESULTS PRIOR TO EVALUATION - [null,null,null] 
RESULT - [0,1] 
RESULTS PRIOR TO UPDATE - [null,null,null]
RESULTS AFTER UPDATE - [[0,1],null,null]
--
RESULTS PRIOR TO EVALUATION - [[0,1],null,null] 
RESULT - [0,4]
RESULTS PRIOR TO UPDATE - [[0,4],null,null] 
RESULTS  AFTER UPDATE - [[0,4],[0,4],null]
--
RESULTS PRIOR TO EVALUATION - [[0,4],[0,4],null] 
RESULT - [0,9]
RESULTS PRIOR TO UPDATE - [[0,9],[0,9],null]
RESULTS  AFTER UPDATE - [[0,9],[0,9],[0,9]
{code}
 

  was:
If map values are {{StructType}} s then behavior of {{transform_values}} is 
inconsistent (it may return identical values). To be more precise, it looks 
like it returns identical values if the return type is {{AnyRef}}.

Consider following examples:
{code:java}
case class Bar(i: Int)
val square = udf((b: Bar) => b.i * b.i)
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--++
|map                           |map_square              |
+--++
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> 1, 2 -> 4, 3 -> 9}|
+--++
{code}
vs 
{code:java}
case class Bar(i: Int)
case class BarSquare(i: Int)
val square = udf((b: Bar) => BarSquare(b.i * b.i))
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--+--+
|map                           |map_square                    |
+--+--+
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {9}, 2 -> {9}, 3 -> {9}}|
+--+--+
{code}
or even just this one
{code:java}
case class Foo(s: String)
val reverse = udf((f: Foo) => f.s.reverse)
val df = Seq(Map(1 -> 

[jira] [Updated] (SPARK-34829) transform_values return identical values while operating on complex types

2021-03-22 Thread Pavel Chernikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Chernikov updated SPARK-34829:

Description: 
If map values are {{StructType}} s then behavior of {{transform_values}} is 
inconsistent (it may return identical values). To be more precise, it looks 
like it returns identical values if the return type is {{AnyRef}}.

Consider following examples:
{code:java}
case class Bar(i: Int)
val square = udf((b: Bar) => b.i * b.i)
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--++
|map                           |map_square              |
+--++
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> 1, 2 -> 4, 3 -> 9}|
+--++
{code}
vs 
{code:java}
case class Bar(i: Int)
case class BarSquare(i: Int)
val square = udf((b: Bar) => BarSquare(b.i * b.i))
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--+--+
|map                           |map_square                    |
+--+--+
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {9}, 2 -> {9}, 3 -> {9}}|
+--+--+
{code}
or even just this one
{code:java}
case class Foo(s: String)
val reverse = udf((f: Foo) => f.s.reverse)
val df = Seq(Map(1 -> Foo("abc"), 2 -> Foo("klm"), 3 -> Foo("xyz"))).toDF("map")
df.withColumn("map_reverse", transform_values(col("map"), (_, v) => 
reverse(v))).show(truncate = false)
++--+
|map |map_reverse   |
++--+
|{1 -> {abc}, 2 -> {klm}, 3 -> {xyz}}|{1 -> zyx, 2 -> zyx, 3 -> zyx}|
++--+
{code}
After playing with 
{{org.apache.spark.sql.catalyst.expressions.TransformValues}} it looks like 
something wrong is happening while executing this line:
{code:java}
resultValues.update(i, functionForEval.eval(inputRow)){code}
To be more precise , it's all about {{functionForEval.eval(inputRow)}} , 
because if you do something like this:

 
{code:java}
println(s"RESULTS PRIOR TO EVALUATION - $resultValues")
val resultValue = functionForEval.eval(inputRow)
println(s"RESULT - $resultValue")
println(s"RESULTS PRIOR TO UPDATE - $resultValues")
resultValues.update(i, resultValue)
println(s"RESULTS AFTER UPDATE - $resultValues"){code}
You'll see in the logs, something like:
{code:java}
RESULTS PRIOR TO EVALUATION - [null,null,null] 
RESULT - [0,1] 
RESULTS PRIOR TO UPDATE - [null,null,null]
RESULTS AFTER UPDATE - [[0,1],null,null]
--
RESULTS PRIOR TO EVALUATION - [[0,1],null,null] 
RESULT - [0,4]
RESULTS PRIOR TO UPDATE - [[0,4],null,null] 
RESULTS  AFTER UPDATE - [[0,4],[0,4],null]
--
RESULTS PRIOR TO EVALUATION - [[0,4],[0,4],null] 
RESULT - [0,9]
RESULTS PRIOR TO UPDATE - [[0,9],[0,9],null]
RESULTS  AFTER UPDATE - [[0,9],[0,9],[0,9]
{code}
 

  was:
If map values are {{StructType}} s then behavior of {{transform_values}} is 
inconsistent (it may return identical values). To be more precise, it looks 
like it returns identical values if the return type is {{AnyRef}}.

Consider following examples:

 
{code:java}
case class Bar(i: Int)
val square = udf((b: Bar) => b.i * b.i)
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--++
|map                           |map_square              |
+--++
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> 1, 2 -> 4, 3 -> 9}|
+--++
{code}
vs 
{code:java}
case class Bar(i: Int)
case class BarSquare(i: Int)
val square = udf((b: Bar) => BarSquare(b.i * b.i))
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--+--+
|map                           |map_square                    |
+--+--+
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {9}, 2 -> {9}, 3 -> {9}}|
+--+--+
{code}
and even just

 
{code:java}
case class Foo(s: String)
val reverse = udf((f: Foo) => f.s.reverse)
val df = Seq(Map(1 -> 

[jira] [Created] (SPARK-34829) transform_values return identical values while operating on complex types

2021-03-22 Thread Pavel Chernikov (Jira)
Pavel Chernikov created SPARK-34829:
---

 Summary: transform_values return identical values while operating 
on complex types
 Key: SPARK-34829
 URL: https://issues.apache.org/jira/browse/SPARK-34829
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Pavel Chernikov


If map values are {{StructType}} s then behavior of {{transform_values}} is 
inconsistent (it may return identical values). To be more precise, it looks 
like it returns identical values if the return type is {{AnyRef}}.

Consider following examples:

 
{code:java}
case class Bar(i: Int)
val square = udf((b: Bar) => b.i * b.i)
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--++
|map                           |map_square              |
+--++
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> 1, 2 -> 4, 3 -> 9}|
+--++
{code}
vs 
{code:java}
case class Bar(i: Int)
case class BarSquare(i: Int)
val square = udf((b: Bar) => BarSquare(b.i * b.i))
val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => 
square(v))).show(truncate = false)
+--+--+
|map                           |map_square                    |
+--+--+
|{1 -> {1}, 2 -> {2}, 3 -> {3}}|{1 -> {9}, 2 -> {9}, 3 -> {9}}|
+--+--+
{code}
and even just

 
{code:java}
case class Foo(s: String)
val reverse = udf((f: Foo) => f.s.reverse)
val df = Seq(Map(1 -> Foo("abc"), 2 -> Foo("klm"), 3 -> Foo("xyz"))).toDF("map")
df.withColumn("map_reverse", transform_values(col("map"), (_, v) => 
reverse(v))).show(truncate = false)
++--+
|map |map_reverse   |
++--+
|{1 -> {abc}, 2 -> {klm}, 3 -> {xyz}}|{1 -> zyx, 2 -> zyx, 3 -> zyx}|
++--+

{code}
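
For comparison, rewriting the transformation with plain Column expressions instead 
of a UDF appears to produce the expected per-entry results (the same observation is 
made elsewhere in this thread); a sketch assuming the Bar case class above and an 
active spark-shell session:
{code:java}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, struct, transform_values}
import spark.implicits._   // spark-shell / active SparkSession assumed

case class Bar(i: Int)

// Square the nested field using Column expressions; no UDF is evaluated.
def square(barC: Column): Column = {
  val iC = barC.getField("i")
  struct((iC * iC).as("i"))
}

val df = Seq(Map(1 -> Bar(1), 2 -> Bar(2), 3 -> Bar(3))).toDF("map")
df.withColumn("map_square", transform_values(col("map"), (_, v) => square(v)))
  .show(truncate = false)
// expected map_square: {1 -> {1}, 2 -> {4}, 3 -> {9}}
{code}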



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34828:


Assignee: (was: Apache Spark)

> YARN Shuffle Service: Support configurability of aux service name and 
> service-specific config overrides
> ---
>
> Key: SPARK-34828
> URL: https://issues.apache.org/jira/browse/SPARK-34828
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> In some cases it may be desirable to run multiple instances of the Spark 
> Shuffle Service which are using different versions of Spark. This can be 
> helpful, for example, when running a YARN cluster with a mixed workload of 
> applications running multiple Spark versions, since a given version of the 
> shuffle service is not always compatible with other versions of Spark. (See 
> SPARK-27780 for more detail on this)
> YARN versions since 2.9.0 support the ability to run shuffle services within 
> an isolated classloader (see YARN-4577), meaning multiple Spark versions can 
> coexist within a single NodeManager.
> To support this from the Spark side, we need to make two enhancements:
> * Make the name of the shuffle service configurable. Currently it is 
> hard-coded to be {{spark_shuffle}} on both the client and server side. The 
> server-side name is not actually used anywhere, as it is the value within the 
> {{yarn.nodemanager.aux-services}} which is considered by the NodeManager to 
> be the definitive name. However, if you change this in the configs, the 
> hard-coded name within the client will no longer match. So, this needs to be 
> configurable.
> * Add a way to separately configure the two shuffle service instances. Since 
> the configurations such as the port number are taken from the NodeManager 
> config, they will both try to use the same port, which obviously won't work. 
> So, we need to provide a way to selectively configure the two shuffle service 
> instances. I will go into details on my proposal for how to achieve this 
> within the PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34828:


Assignee: Apache Spark

> YARN Shuffle Service: Support configurability of aux service name and 
> service-specific config overrides
> ---
>
> Key: SPARK-34828
> URL: https://issues.apache.org/jira/browse/SPARK-34828
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Assignee: Apache Spark
>Priority: Major
>
> In some cases it may be desirable to run multiple instances of the Spark 
> Shuffle Service which are using different versions of Spark. This can be 
> helpful, for example, when running a YARN cluster with a mixed workload of 
> applications running multiple Spark versions, since a given version of the 
> shuffle service is not always compatible with other versions of Spark. (See 
> SPARK-27780 for more detail on this)
> YARN versions since 2.9.0 support the ability to run shuffle services within 
> an isolated classloader (see YARN-4577), meaning multiple Spark versions can 
> coexist within a single NodeManager.
> To support this from the Spark side, we need to make two enhancements:
> * Make the name of the shuffle service configurable. Currently it is 
> hard-coded to be {{spark_shuffle}} on both the client and server side. The 
> server-side name is not actually used anywhere, as it is the value within the 
> {{yarn.nodemanager.aux-services}} which is considered by the NodeManager to 
> be the definitive name. However, if you change this in the configs, the 
> hard-coded name within the client will no longer match. So, this needs to be 
> configurable.
> * Add a way to separately configure the two shuffle service instances. Since 
> the configurations such as the port number are taken from the NodeManager 
> config, they will both try to use the same port, which obviously won't work. 
> So, we need to provide a way to selectively configure the two shuffle service 
> instances. I will go into details on my proposal for how to achieve this 
> within the PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306638#comment-17306638
 ] 

Apache Spark commented on SPARK-34828:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/31936

> YARN Shuffle Service: Support configurability of aux service name and 
> service-specific config overrides
> ---
>
> Key: SPARK-34828
> URL: https://issues.apache.org/jira/browse/SPARK-34828
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> In some cases it may be desirable to run multiple instances of the Spark 
> Shuffle Service which are using different versions of Spark. This can be 
> helpful, for example, when running a YARN cluster with a mixed workload of 
> applications running multiple Spark versions, since a given version of the 
> shuffle service is not always compatible with other versions of Spark. (See 
> SPARK-27780 for more detail on this)
> YARN versions since 2.9.0 support the ability to run shuffle services within 
> an isolated classloader (see YARN-4577), meaning multiple Spark versions can 
> coexist within a single NodeManager.
> To support this from the Spark side, we need to make two enhancements:
> * Make the name of the shuffle service configurable. Currently it is 
> hard-coded to be {{spark_shuffle}} on both the client and server side. The 
> server-side name is not actually used anywhere, as it is the value within the 
> {{yarn.nodemanager.aux-services}} which is considered by the NodeManager to 
> be the definitive name. However, if you change this in the configs, the 
> hard-coded name within the client will no longer match. So, this needs to be 
> configurable.
> * Add a way to separately configure the two shuffle service instances. Since 
> the configurations such as the port number are taken from the NodeManager 
> config, they will both try to use the same port, which obviously won't work. 
> So, we need to provide a way to selectively configure the two shuffle service 
> instances. I will go into details on my proposal for how to achieve this 
> within the PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides

2021-03-22 Thread Erik Krogen (Jira)
Erik Krogen created SPARK-34828:
---

 Summary: YARN Shuffle Service: Support configurability of aux 
service name and service-specific config overrides
 Key: SPARK-34828
 URL: https://issues.apache.org/jira/browse/SPARK-34828
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, YARN
Affects Versions: 3.1.1
Reporter: Erik Krogen


In some cases it may be desirable to run multiple instances of the Spark 
Shuffle Service which are using different versions of Spark. This can be 
helpful, for example, when running a YARN cluster with a mixed workload of 
applications running multiple Spark versions, since a given version of the 
shuffle service is not always compatible with other versions of Spark. (See 
SPARK-27780 for more detail on this)

YARN versions since 2.9.0 support the ability to run shuffle services within an 
isolated classloader (see YARN-4577), meaning multiple Spark versions can 
coexist within a single NodeManager.

To support this from the Spark side, we need to make two enhancements:

* Make the name of the shuffle service configurable. Currently it is hard-coded 
to be {{spark_shuffle}} on both the client and server side. The server-side 
name is not actually used anywhere, as it is the value within the 
{{yarn.nodemanager.aux-services}} which the NodeManager considers to be the 
definitive name. However, if you change this in the configs, the hard-coded 
name within the client will no longer match. So, this needs to be configurable.
* Add a way to separately configure the two shuffle service instances. Since 
the configurations such as the port number are taken from the NodeManager 
config, they will both try to use the same port, which obviously won't work. 
So, we need to provide a way to selectively configure the two shuffle service 
instances. I will go into details on my proposal for how to achieve this within 
the PR.
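
As a rough illustration of the setup this would enable, the NodeManager could 
declare two isolated instances and each application would then pick the one 
matching its Spark version. The second service name and the client-side 
property below are assumptions made for the sake of the example, not the final 
names from the PR:

{noformat}
# yarn-site.xml (NodeManager side): two aux services, each with its own
# isolated classpath per YARN-4577
yarn.nodemanager.aux-services = spark_shuffle,spark_shuffle_3_2
yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark_shuffle.classpath = /opt/shuffle-service-3.0/*
yarn.nodemanager.aux-services.spark_shuffle_3_2.class = org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark_shuffle_3_2.classpath = /opt/shuffle-service-3.2/*

# Spark application side: select the matching instance (the property name is a
# placeholder for whatever the PR introduces)
spark.shuffle.service.enabled = true
spark.shuffle.service.name = spark_shuffle_3_2
{noformat}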



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34789) Introduce Jetty based construct for integration tests where HTTP(S) is used

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306611#comment-17306611
 ] 

Apache Spark commented on SPARK-34789:
--

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/31935

> Introduce Jetty based construct for integration tests where HTTP(S) is used
> ---
>
> Key: SPARK-34789
> URL: https://issues.apache.org/jira/browse/SPARK-34789
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> This came up during 
> https://github.com/apache/spark/pull/31877#discussion_r596831803.
> Short summary: we have some tests where HTTP(S) is used to access files. The 
> current solution uses GitHub URLs like 
> "https://raw.githubusercontent.com/apache/spark/master/data/mllib/pagerank_data.txt".
> This connects two Spark versions in an unhealthy way: it ties the "master" 
> branch, which is a moving part, to the committed test code, which is 
> non-moving (as it might even be released).
> So a test running for an earlier version of Spark expects something 
> (filename, content, path) from a later release, and, what is worse, when the 
> moving version changes the earlier test will break. 
> The idea is to introduce a method like:
> {noformat}
> withHttpServer(files) {
> }
> {noformat}
> Which uses a Jetty ResourceHandler to serve the listed files (or directories 
> / or just the root where it is started from) and stops the server in the 
> finally.
>  
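
For illustration, a minimal sketch of such a helper (the {{withHttpServer}} 
name comes from the idea above; the wiring below assumes Jetty's 
{{Server}}/{{ResourceHandler}} API that Spark already ships and is not the 
final implementation):

{code:java}
import java.net.URI
import java.nio.file.{Path, Paths}

import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.handler.ResourceHandler

// Serve `resourceBase` over HTTP on an ephemeral port, run `body` against the
// server's base URI, and always stop the server in the finally block.
def withHttpServer(resourceBase: Path = Paths.get("."))(body: URI => Unit): Unit = {
  val handler = new ResourceHandler()
  handler.setResourceBase(resourceBase.toString)
  val server = new Server(0) // port 0 = pick any free port
  server.setHandler(handler)
  server.start()
  try {
    body(server.getURI)
  } finally {
    server.stop()
  }
}

// Usage in a test (sketch): the data file is resolved against the locally
// served root instead of a hard-coded raw.githubusercontent.com URL.
withHttpServer(Paths.get("data/mllib")) { base =>
  val url = s"$base/pagerank_data.txt"
  // ... point the code under test at `url` ...
}
{code}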



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34789) Introduce Jetty based construct for integration tests where HTTP(S) is used

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34789:


Assignee: (was: Apache Spark)

> Introduce Jetty based construct for integration tests where HTTP(S) is used
> ---
>
> Key: SPARK-34789
> URL: https://issues.apache.org/jira/browse/SPARK-34789
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> This came up during 
> https://github.com/apache/spark/pull/31877#discussion_r596831803.
> Short summary: we have some tests where HTTP(S) is used to access files. The 
> current solution uses GitHub URLs like 
> "https://raw.githubusercontent.com/apache/spark/master/data/mllib/pagerank_data.txt".
> This connects two Spark versions in an unhealthy way: it ties the "master" 
> branch, which is a moving part, to the committed test code, which is 
> non-moving (as it might even be released).
> So a test running for an earlier version of Spark expects something 
> (filename, content, path) from a later release, and, what is worse, when the 
> moving version changes the earlier test will break. 
> The idea is to introduce a method like:
> {noformat}
> withHttpServer(files) {
> }
> {noformat}
> Which uses a Jetty ResourceHandler to serve the listed files (or directories 
> / or just the root where it is started from) and stops the server in the 
> finally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34789) Introduce Jetty based construct for integration tests where HTTP(S) is used

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34789:


Assignee: Apache Spark

> Introduce Jetty based construct for integration tests where HTTP(S) is used
> ---
>
> Key: SPARK-34789
> URL: https://issues.apache.org/jira/browse/SPARK-34789
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Attila Zsolt Piros
>Assignee: Apache Spark
>Priority: Major
>
> This came up during 
> https://github.com/apache/spark/pull/31877#discussion_r596831803.
> Short summary: we have some tests where HTTP(S) is used to access files. The 
> current solution uses GitHub URLs like 
> "https://raw.githubusercontent.com/apache/spark/master/data/mllib/pagerank_data.txt".
> This connects two Spark versions in an unhealthy way: it ties the "master" 
> branch, which is a moving part, to the committed test code, which is 
> non-moving (as it might even be released).
> So a test running for an earlier version of Spark expects something 
> (filename, content, path) from a later release, and, what is worse, when the 
> moving version changes the earlier test will break. 
> The idea is to introduce a method like:
> {noformat}
> withHttpServer(files) {
> }
> {noformat}
> Which uses a Jetty ResourceHandler to serve the listed files (or directories 
> / or just the root where it is started from) and stops the server in the 
> finally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32916) Add support for external shuffle service in YARN deployment mode to leverage push-based shuffle

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306598#comment-17306598
 ] 

Apache Spark commented on SPARK-32916:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/31934

> Add support for external shuffle service in YARN deployment mode to leverage 
> push-based shuffle
> ---
>
> Key: SPARK-32916
> URL: https://issues.apache.org/jira/browse/SPARK-32916
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.1.0
>
>
> Integration needed to bootstrap external shuffle service in YARN deployment 
> mode. Properly create the necessary dirs and initialize the relevant 
> server-side components in the RPC layer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34827:
--
Summary: Support fetching shuffle blocks in batch with i/o encryption  
(was: Support IO Encryption in SQL Adaptive Query Execution)

> Support fetching shuffle blocks in batch with i/o encryption
> 
>
> Key: SPARK-34827
> URL: https://issues.apache.org/jira/browse/SPARK-34827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34827) Support IO Encryption in SQL Adaptive Query Execution

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34827:
--
Target Version/s: 3.2.0

> Support IO Encryption in SQL Adaptive Query Execution
> -
>
> Key: SPARK-34827
> URL: https://issues.apache.org/jira/browse/SPARK-34827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34827) Support IO Encryption in SQL Adaptive Query Execution

2021-03-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-34827:
-

 Summary: Support IO Encryption in SQL Adaptive Query Execution
 Key: SPARK-34827
 URL: https://issues.apache.org/jira/browse/SPARK-34827
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34790) Fail in fetch shuffle blocks in batch when i/o encryption is enabled.

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34790:
--
Affects Version/s: 3.1.0

> Fail in fetch shuffle blocks in batch when i/o encryption is enabled.
> -
>
> Key: SPARK-34790
> URL: https://issues.apache.org/jira/browse/SPARK-34790
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1
>Reporter: hezuojiao
>Assignee: hezuojiao
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
>
> When spark.io.encryption.enabled=true is set, lots of test cases in 
> AdaptiveQueryExecSuite fail. Fetching shuffle blocks in batch is 
> incompatible with I/O encryption.
> For example:
> After setting spark.io.encryption.enabled=true, run the following test case, 
> which is in AdaptiveQueryExecSuite:
>  
> {code:java}
>   test("SPARK-33494: Do not use local shuffle reader for repartition") {
> withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
>   val df = spark.table("testData").repartition('key)
>   df.collect()
>   // local shuffle reader breaks partitioning and shouldn't be used for 
> repartition operation
>   // which is specified by users.
>   checkNumLocalShuffleReaders(df.queryExecution.executedPlan, 
> numShufflesWithoutLocalReader = 1)
> }
>   }
> {code}
>  
> I got the following error message:
> {code:java}
> 14:05:52.638 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 2.0 (TID 3) (11.240.37.88 executor driver): 
> FetchFailed(BlockManagerId(driver, 11.240.37.88, 63574, None), shuffleId=0, 
> mapIndex=0, mapId=0, reduceId=2, message=14:05:52.638 WARN 
> org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 3) 
> (11.240.37.88 executor driver): FetchFailed(BlockManagerId(driver, 
> 11.240.37.88, 63574, None), shuffleId=0, mapIndex=0, mapId=0, reduceId=2, 
> message=org.apache.spark.shuffle.FetchFailedException: Stream is corrupted at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:772)
>  at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:845)
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
> java.io.BufferedInputStream.read(BufferedInputStream.java:265) at 
> java.io.DataInputStream.readInt(DataInputStream.java:387) at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.readSize(UnsafeRowSerializer.scala:113)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:129)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:494) at 
> scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) 
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
>  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) 
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:131) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: java.io.IOException: 
> Stream is corrupted at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:200) at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:226) at 
> net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157) at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:841)
>  ... 25 more
> )
> {code}
>  
>  
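
Until the fix is picked up, a sketch of a way to keep AQE and I/O encryption 
enabled while sidestepping the clash is to disable the batched fetch 
explicitly (this only avoids the problem by turning that optimization off; it 
is not part of the fix itself):

{code:java}
import org.apache.spark.sql.SparkSession

// Both encryption and AQE stay on; only the batched fetch of contiguous
// shuffle blocks, which is what conflicts with I/O encryption, is disabled.
val spark = SparkSession.builder()
  .master("local[2]")
  .config("spark.io.encryption.enabled", "true")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.fetchShuffleBlocksInBatch", "false")
  .getOrCreate()
{code}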



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, 

[jira] [Assigned] (SPARK-34790) Fail in fetch shuffle blocks in batch when i/o encryption is enabled.

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34790:
-

Assignee: hezuojiao

> Fail in fetch shuffle blocks in batch when i/o encryption is enabled.
> -
>
> Key: SPARK-34790
> URL: https://issues.apache.org/jira/browse/SPARK-34790
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: hezuojiao
>Assignee: hezuojiao
>Priority: Critical
>
> When spark.io.encryption.enabled=true is set, lots of test cases in 
> AdaptiveQueryExecSuite fail. Fetching shuffle blocks in batch is 
> incompatible with I/O encryption.
> For example:
> After setting spark.io.encryption.enabled=true, run the following test case, 
> which is in AdaptiveQueryExecSuite:
>  
> {code:java}
>   test("SPARK-33494: Do not use local shuffle reader for repartition") {
> withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
>   val df = spark.table("testData").repartition('key)
>   df.collect()
>   // local shuffle reader breaks partitioning and shouldn't be used for 
> repartition operation
>   // which is specified by users.
>   checkNumLocalShuffleReaders(df.queryExecution.executedPlan, 
> numShufflesWithoutLocalReader = 1)
> }
>   }
> {code}
>  
> I got the following error message:
> {code:java}
> 14:05:52.638 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 2.0 (TID 3) (11.240.37.88 executor driver): 
> FetchFailed(BlockManagerId(driver, 11.240.37.88, 63574, None), shuffleId=0, 
> mapIndex=0, mapId=0, reduceId=2, message=14:05:52.638 WARN 
> org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 3) 
> (11.240.37.88 executor driver): FetchFailed(BlockManagerId(driver, 
> 11.240.37.88, 63574, None), shuffleId=0, mapIndex=0, mapId=0, reduceId=2, 
> message=org.apache.spark.shuffle.FetchFailedException: Stream is corrupted at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:772)
>  at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:845)
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
> java.io.BufferedInputStream.read(BufferedInputStream.java:265) at 
> java.io.DataInputStream.readInt(DataInputStream.java:387) at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.readSize(UnsafeRowSerializer.scala:113)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:129)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:494) at 
> scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) 
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
>  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) 
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:131) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: java.io.IOException: 
> Stream is corrupted at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:200) at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:226) at 
> net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157) at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:841)
>  ... 25 more
> )
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For 

[jira] [Resolved] (SPARK-34790) Fail in fetch shuffle blocks in batch when i/o encryption is enabled.

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34790.
---
Fix Version/s: 3.1.2
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 31898
[https://github.com/apache/spark/pull/31898]

> Fail in fetch shuffle blocks in batch when i/o encryption is enabled.
> -
>
> Key: SPARK-34790
> URL: https://issues.apache.org/jira/browse/SPARK-34790
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.1
>Reporter: hezuojiao
>Assignee: hezuojiao
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
>
> When spark.io.encryption.enabled=true is set, lots of test cases in 
> AdaptiveQueryExecSuite fail. Fetching shuffle blocks in batch is 
> incompatible with I/O encryption.
> For example:
> After setting spark.io.encryption.enabled=true, run the following test case, 
> which is in AdaptiveQueryExecSuite:
>  
> {code:java}
>   test("SPARK-33494: Do not use local shuffle reader for repartition") {
> withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
>   val df = spark.table("testData").repartition('key)
>   df.collect()
>   // local shuffle reader breaks partitioning and shouldn't be used for 
> repartition operation
>   // which is specified by users.
>   checkNumLocalShuffleReaders(df.queryExecution.executedPlan, 
> numShufflesWithoutLocalReader = 1)
> }
>   }
> {code}
>  
> I got the following error message:
> {code:java}
> 14:05:52.638 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 2.0 (TID 3) (11.240.37.88 executor driver): 
> FetchFailed(BlockManagerId(driver, 11.240.37.88, 63574, None), shuffleId=0, 
> mapIndex=0, mapId=0, reduceId=2, message=14:05:52.638 WARN 
> org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 3) 
> (11.240.37.88 executor driver): FetchFailed(BlockManagerId(driver, 
> 11.240.37.88, 63574, None), shuffleId=0, mapIndex=0, mapId=0, reduceId=2, 
> message=org.apache.spark.shuffle.FetchFailedException: Stream is corrupted at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:772)
>  at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:845)
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
> java.io.BufferedInputStream.read(BufferedInputStream.java:265) at 
> java.io.DataInputStream.readInt(DataInputStream.java:387) at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.readSize(UnsafeRowSerializer.scala:113)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:129)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:494) at 
> scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) 
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
>  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) 
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:131) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: java.io.IOException: 
> Stream is corrupted at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:200) at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:226) at 
> net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157) at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:841)
>  ... 25 more
> )
> {code}
>  
>  



--
This message was sent by 

[jira] [Assigned] (SPARK-34707) Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34707:


Assignee: Apache Spark

> Code-gen broadcast nested loop join (left outer/right outer)
> 
>
> Key: SPARK-34707
> URL: https://issues.apache.org/jira/browse/SPARK-34707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> We saw 1x run-time improvement for code-gen broadcast nested loop inner join 
> (https://issues.apache.org/jira/browse/SPARK-34620 ). Similarly let's add 
> code-gen for left outer (build right side), and right outer (build left side) 
> as well here.
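
For reference, a sketch of a query shape that exercises this code path 
(made-up data; a broadcast hint plus a non-equi condition is what keeps the 
planner on BroadcastNestedLoopJoinExec instead of a hash or sort-merge join):

{code:java}
import org.apache.spark.sql.functions.broadcast

val left  = spark.range(0, 1000000).toDF("a")
val right = spark.range(0, 100).toDF("b")

// Left outer join with the broadcast right side, i.e. the BuildRight case above.
val joined = left.join(broadcast(right), left("a") > right("b"), "left_outer")
joined.explain() // should show BroadcastNestedLoopJoin ... BuildRight, LeftOuter
{code}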



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34707) Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34707:


Assignee: (was: Apache Spark)

> Code-gen broadcast nested loop join (left outer/right outer)
> 
>
> Key: SPARK-34707
> URL: https://issues.apache.org/jira/browse/SPARK-34707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Priority: Minor
>
> We saw 1x run-time improvement for code-gen broadcast nested loop inner join 
> (https://issues.apache.org/jira/browse/SPARK-34620 ). Similarly let's add 
> code-gen for left outer (build right side), and right outer (build left side) 
> as well here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34707) Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306520#comment-17306520
 ] 

Apache Spark commented on SPARK-34707:
--

User 'linzebing' has created a pull request for this issue:
https://github.com/apache/spark/pull/31931

> Code-gen broadcast nested loop join (left outer/right outer)
> 
>
> Key: SPARK-34707
> URL: https://issues.apache.org/jira/browse/SPARK-34707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Priority: Minor
>
> We saw 1x run-time improvement for code-gen broadcast nested loop inner join 
> (https://issues.apache.org/jira/browse/SPARK-34620 ). Similarly let's add 
> code-gen for left outer (build right side), and right outer (build left side) 
> as well here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34707) Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread Zebing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306519#comment-17306519
 ] 

Zebing Lin commented on SPARK-34707:


Created PR https://github.com/apache/spark/pull/31931

> Code-gen broadcast nested loop join (left outer/right outer)
> 
>
> Key: SPARK-34707
> URL: https://issues.apache.org/jira/browse/SPARK-34707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Priority: Minor
>
> We saw 1x run-time improvement for code-gen broadcast nested loop inner join 
> (https://issues.apache.org/jira/browse/SPARK-34620 ). Similarly let's add 
> code-gen for left outer (build right side), and right outer (build left side) 
> as well here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34701) Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306505#comment-17306505
 ] 

Apache Spark commented on SPARK-34701:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31933

> Remove analyzing temp view again in CreateViewCommand
> -
>
> Key: SPARK-34701
> URL: https://issues.apache.org/jira/browse/SPARK-34701
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Priority: Major
>
> Remove analyzing temp view again in CreateViewCommand. This can be done once 
> all the callers pass the analyzed plan to CreateViewCommand.
> Reference:
> https://github.com/apache/spark/pull/31652/files#r58959
> https://github.com/apache/spark/pull/31273/files#r581592786



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34701) Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34701:


Assignee: Apache Spark

> Remove analyzing temp view again in CreateViewCommand
> -
>
> Key: SPARK-34701
> URL: https://issues.apache.org/jira/browse/SPARK-34701
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>
> Remove analyzing temp view again in CreateViewCommand. This can be done once 
> all the callers pass the analyzed plan to CreateViewCommand.
> Reference:
> https://github.com/apache/spark/pull/31652/files#r58959
> https://github.com/apache/spark/pull/31273/files#r581592786



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34701) Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306504#comment-17306504
 ] 

Apache Spark commented on SPARK-34701:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31933

> Remove analyzing temp view again in CreateViewCommand
> -
>
> Key: SPARK-34701
> URL: https://issues.apache.org/jira/browse/SPARK-34701
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Priority: Major
>
> Remove analyzing temp view again in CreateViewCommand. This can be done once 
> all the callers pass the analyzed plan to CreateViewCommand.
> Reference:
> https://github.com/apache/spark/pull/31652/files#r58959
> https://github.com/apache/spark/pull/31273/files#r581592786



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34701) Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34701:


Assignee: (was: Apache Spark)

> Remove analyzing temp view again in CreateViewCommand
> -
>
> Key: SPARK-34701
> URL: https://issues.apache.org/jira/browse/SPARK-34701
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Priority: Major
>
> Remove analyzing temp view again in CreateViewCommand. This can be done once 
> all the callers pass the analyzed plan to CreateViewCommand.
> Reference:
> https://github.com/apache/spark/pull/31652/files#r58959
> https://github.com/apache/spark/pull/31273/files#r581592786



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle

2021-03-22 Thread Venkata krishnan Sowrirajan (Jira)
Venkata krishnan Sowrirajan created SPARK-34826:
---

 Summary: Adaptive fetch of shuffle mergers for Push based shuffle
 Key: SPARK-34826
 URL: https://issues.apache.org/jira/browse/SPARK-34826
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Affects Versions: 3.1.0
Reporter: Venkata krishnan Sowrirajan


Currently the shuffle mergers are set during the creation of ShuffleMapStage. 
For the initial set of stages, not enough executors may have been added yet, 
which can leave too few shuffle mergers set when the shuffle map stage is 
created. This task is to handle the issue of a low merge ratio for the initial 
stages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34706) Broadcast nested loop join improvement

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34706:


Assignee: Apache Spark

> Broadcast nested loop join improvement
> --
>
> Key: SPARK-34706
> URL: https://issues.apache.org/jira/browse/SPARK-34706
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> The umbrella Jira to track overall progress of broadcast nested loop join 
> (`BroadcastNestedLoopJoinExec`) improvement. See individual sub-tasks for 
> details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34706) Broadcast nested loop join improvement

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34706:


Assignee: (was: Apache Spark)

> Broadcast nested loop join improvement
> --
>
> Key: SPARK-34706
> URL: https://issues.apache.org/jira/browse/SPARK-34706
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Priority: Minor
>
> The umbrella Jira to track overall progress of broadcast nested loop join 
> (`BroadcastNestedLoopJoinExec`) improvement. See individual sub-tasks for 
> details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34706) Broadcast nested loop join improvement

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306475#comment-17306475
 ] 

Apache Spark commented on SPARK-34706:
--

User 'linzebing' has created a pull request for this issue:
https://github.com/apache/spark/pull/31931

> Broadcast nested loop join improvement
> --
>
> Key: SPARK-34706
> URL: https://issues.apache.org/jira/browse/SPARK-34706
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Priority: Minor
>
> The umbrella Jira to track overall progress of broadcast nested loop join 
> (`BroadcastNestedLoopJoinExec`) improvement. See individual sub-tasks for 
> details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34820) K8s Integration test failed (due to libldap installation failed)

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34820:
--
Component/s: (was: Build)
 Tests

> K8s Integration test failed (due to libldap installation failed)
> 
>
> Key: SPARK-34820
> URL: https://issues.apache.org/jira/browse/SPARK-34820
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, SparkR, Tests
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> Err:20
> [http://security.debian.org/debian-security]
> buster/updates/main amd64 libldap-common all 2.4.47+dfsg-3+deb10u4 404 Not 
> Found [IP: 151.101.194.132 80] Err:21
> [http://security.debian.org/debian-security]
> buster/updates/main amd64 libldap-2.4-2 amd64 2.4.47+dfsg-3+deb10u4 404 Not 
> Found [IP: 151.101.194.132 80] E: Failed to fetch
> [http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-common_2.4.47+dfsg-3+deb10u4_all.deb]
> 404 Not Found [IP: 151.101.194.132 80] E: Failed to fetch
> [http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-2.4-2_2.4.47+dfsg-3+deb10u4_amd64.deb]
> 404 Not Found [IP: 151.101.194.132 80]
> [1] 
> http://mail-archives.apache.org/mod_mbox/spark-dev/202103.mbox/%3CCAGFcPdZY_TZ-qD7_SLvN5%2B1jjt9cp4GyTwxbXHbVHnD-stLSqw%40mail.gmail.com%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34820) K8s Integration test failed (due to libldap installation failed)

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34820.
---
Fix Version/s: 3.1.2
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 31923
[https://github.com/apache/spark/pull/31923]

> K8s Integration test failed (due to libldap installation failed)
> 
>
> Key: SPARK-34820
> URL: https://issues.apache.org/jira/browse/SPARK-34820
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes, SparkR
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> Err:20
> [http://security.debian.org/debian-security]
> buster/updates/main amd64 libldap-common all 2.4.47+dfsg-3+deb10u4 404 Not 
> Found [IP: 151.101.194.132 80] Err:21
> [http://security.debian.org/debian-security]
> buster/updates/main amd64 libldap-2.4-2 amd64 2.4.47+dfsg-3+deb10u4 404 Not 
> Found [IP: 151.101.194.132 80] E: Failed to fetch
> [http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-common_2.4.47+dfsg-3+deb10u4_all.deb]
> 404 Not Found [IP: 151.101.194.132 80] E: Failed to fetch
> [http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-2.4-2_2.4.47+dfsg-3+deb10u4_amd64.deb]
> 404 Not Found [IP: 151.101.194.132 80]
> [1] 
> http://mail-archives.apache.org/mod_mbox/spark-dev/202103.mbox/%3CCAGFcPdZY_TZ-qD7_SLvN5%2B1jjt9cp4GyTwxbXHbVHnD-stLSqw%40mail.gmail.com%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34820) K8s Integration test failed (due to libldap installation failed)

2021-03-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34820:
-

Assignee: Yikun Jiang

> K8s Integration test failed (due to libldap installation failed)
> 
>
> Key: SPARK-34820
> URL: https://issues.apache.org/jira/browse/SPARK-34820
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes, SparkR
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> Err:20
> [http://security.debian.org/debian-security]
> buster/updates/main amd64 libldap-common all 2.4.47+dfsg-3+deb10u4 404 Not 
> Found [IP: 151.101.194.132 80] Err:21
> [http://security.debian.org/debian-security]
> buster/updates/main amd64 libldap-2.4-2 amd64 2.4.47+dfsg-3+deb10u4 404 Not 
> Found [IP: 151.101.194.132 80] E: Failed to fetch
> [http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-common_2.4.47+dfsg-3+deb10u4_all.deb]
> 404 Not Found [IP: 151.101.194.132 80] E: Failed to fetch
> [http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-2.4-2_2.4.47+dfsg-3+deb10u4_amd64.deb]
> 404 Not Found [IP: 151.101.194.132 80]
> [1] 
> http://mail-archives.apache.org/mod_mbox/spark-dev/202103.mbox/%3CCAGFcPdZY_TZ-qD7_SLvN5%2B1jjt9cp4GyTwxbXHbVHnD-stLSqw%40mail.gmail.com%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34726) Fix collectToPython timeouts

2021-03-22 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-34726:

Description: 
One of our customers frequently encounters "serve-DataFrame" 
java.net.SocketTimeoutException: Accept timed out errors in PySpark because 
Dataset.collectToPython() in Spark 2.4 does the following:


# Collects the results
# Opens up a socket server that is then listening to the connection from 
Python side
# Runs the event listeners as part of withAction on the same thread as 
SPARK-25680 is not available in Spark 2.4
# Returns the address of the socket server to Python
# The Python side connects to the socket server and fetches the data

As the customer has a custom, long-running event listener, the time between 2. 
and 5. is frequently longer than the default connection timeout, and increasing 
the connect timeout is not a good solution as we don't know how long the 
listeners can take to run.
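
A sketch of the kind of listener that widens the window between steps 2. and 
5. (the 20-second sleep is only an illustrative stand-in for the customer's 
long-running listener; any delay beyond the default accept timeout reproduces 
the symptom):

{code:java}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Registered on the driver; in Spark 2.4 this runs on the same thread as the
// collect, i.e. after the socket server is opened but before its address is
// returned to Python.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    Thread.sleep(20000) // e.g. pushing lineage or metrics to an external system
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
})
// A subsequent df.toPandas()/collect() from PySpark can now hit
// "java.net.SocketTimeoutException: Accept timed out" on the Python side.
{code}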


> Fix collectToPython timeouts
> 
>
> Key: SPARK-34726
> URL: https://issues.apache.org/jira/browse/SPARK-34726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 2.4.8
>
>
> One of our customers frequently encounters "serve-DataFrame" 
> java.net.SocketTimeoutException: Accept timed out errors in PySpark because 
> Dataset.collectToPython() in Spark 2.4 does the following:
> # Collects the results
> # Opens up a socket server that is then listening to the connection from 
> Python side
> # Runs the event listeners as part of withAction on the same thread as 
> SPARK-25680 is not available in Spark 2.4
> # Returns the address of the socket server to Python
> # The Python side connects to the socket server and fetches the data
> As the customer has a custom, long-running event listener, the time between 2. 
> and 5. is frequently longer than the default connection timeout, and 
> increasing the connect timeout is not a good solution as we don't know how 
> long the listeners can take to run.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34726) Fix collectToPython timeouts

2021-03-22 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-34726:
---

Assignee: Peter Toth

> Fix collectToPython timeouts
> 
>
> Key: SPARK-34726
> URL: https://issues.apache.org/jira/browse/SPARK-34726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 2.4.8
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34726) Fix collectToPython timeouts

2021-03-22 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-34726.
-
Fix Version/s: 2.4.8
   Resolution: Fixed

Issue resolved by pull request 31818
[https://github.com/apache/spark/pull/31818]

> Fix collectToPython timeouts
> 
>
> Key: SPARK-34726
> URL: https://issues.apache.org/jira/browse/SPARK-34726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7
>Reporter: Peter Toth
>Priority: Major
> Fix For: 2.4.8
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34719) fail if the view query has duplicated column names

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306338#comment-17306338
 ] 

Apache Spark commented on SPARK-34719:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/31930

> fail if the view query has duplicated column names
> --
>
> Key: SPARK-34719
> URL: https://issues.apache.org/jira/browse/SPARK-34719
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0, 3.1.1
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.1.2, 3.0.3
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34719) fail if the view query has duplicated column names

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306339#comment-17306339
 ] 

Apache Spark commented on SPARK-34719:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/31930

> fail if the view query has duplicated column names
> --
>
> Key: SPARK-34719
> URL: https://issues.apache.org/jira/browse/SPARK-34719
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0, 3.1.1
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.1.2, 3.0.3
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34825) pyspark.sql.function.lit is treating '1' the same as 1

2021-03-22 Thread yu peng (Jira)
yu peng created SPARK-34825:
---

 Summary: pyspark.sql.function.lit is treating '1' the same as 1
 Key: SPARK-34825
 URL: https://issues.apache.org/jira/browse/SPARK-34825
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.4
Reporter: yu peng


In [10]: import pyspark.sql.functions as F

In [11]: df.withColumn('x', F.lit('1')==F.lit(1)).show()
+---+---+---------+----+
|  a|  b|        c|   x|
+---+---+---------+----+
|  1|  2|    x > y|true|
|  0|  4|a > b > c|true|
+---+---+---------+----+
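
The same result is visible outside PySpark, since it comes from Spark's 
implicit type coercion between StringType and IntegerType rather than from 
{{lit}} itself; a sketch in the Scala API, with an explicit cast to make the 
intended comparison unambiguous:

{code:java}
import org.apache.spark.sql.functions.lit

val df = Seq((1, 2), (0, 4)).toDF("a", "b")

// '1' == 1 is resolved through an implicit cast, so x is true for every row.
df.withColumn("x", lit("1") === lit(1)).show()

// Casting explicitly states which comparison is meant (here: string vs string).
df.withColumn("x", lit("1") === lit(1).cast("string")).show()
{code}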
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34812) RowNumberLike and RankLike should not be nullable

2021-03-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34812.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31924
[https://github.com/apache/spark/pull/31924]

> RowNumberLike and RankLike should not be nullable
> -
>
> Key: SPARK-34812
> URL: https://issues.apache.org/jira/browse/SPARK-34812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34812) RowNumberLike and RankLike should not be nullable

2021-03-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34812:
---

Assignee: Tanel Kiis

> RowNumberLike and RankLike should not be nullable
> -
>
> Key: SPARK-34812
> URL: https://issues.apache.org/jira/browse/SPARK-34812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34803) Util methods requiring certain versions of Pandas & PyArrow don't pass through the raised ImportError

2021-03-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34803.
--
Fix Version/s: 3.1.2
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 31902
[https://github.com/apache/spark/pull/31902]

> Util methods requiring certain versions of Pandas & PyArrow don't pass 
> through the raised ImportError
> -
>
> Key: SPARK-34803
> URL: https://issues.apache.org/jira/browse/SPARK-34803
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.1
>Reporter: John Hany
>Assignee: John Hany
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> When checking that we can import either {{pandas}} or {{pyarrow}}, we 
> catch any {{ImportError}} and raise an error declaring the minimum version 
> of the respective package that's required to be in the Python environment.
> We don't however, pass the {{ImportError}} that might have been thrown by the 
> package itself. Take {{pandas}} as an example, when we call {{import 
> pandas}}, pandas itself might be in the environment, but can throw an 
> {{ImportError}} 
> [https://github.com/pandas-dev/pandas/blob/0.24.x/pandas/compat/__init__.py#L438]
>  if another package it requires isn't there. This error wouldn't be passed 
> through and we'd end up getting a misleading error message that states that 
> {{pandas}} isn't in the environment, while in fact it is but something else 
> makes us unable to import it.
> I believe this can be improved by chaining the exceptions and am happy to 
> provide said contribution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34803) Util methods requiring certain versions of Pandas & PyArrow don't pass through the raised ImportError

2021-03-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-34803:


Assignee: John Hany

> Util methods requiring certain versions of Pandas & PyArrow don't pass 
> through the raised ImportError
> -
>
> Key: SPARK-34803
> URL: https://issues.apache.org/jira/browse/SPARK-34803
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.1
>Reporter: John Hany
>Assignee: John Hany
>Priority: Major
>
> When checking that we can import either {{pandas}} or {{pyarrow}}, we 
> catch any {{ImportError}} and raise an error declaring the minimum version 
> of the respective package that's required to be in the Python environment.
> We don't however, pass the {{ImportError}} that might have been thrown by the 
> package itself. Take {{pandas}} as an example, when we call {{import 
> pandas}}, pandas itself might be in the environment, but can throw an 
> {{ImportError}} 
> [https://github.com/pandas-dev/pandas/blob/0.24.x/pandas/compat/__init__.py#L438]
>  if another package it requires isn't there. This error wouldn't be passed 
> through and we'd end up getting a misleading error message that states that 
> {{pandas}} isn't in the environment, while in fact it is but something else 
> makes us unable to import it.
> I believe this can be improved by chaining the exceptions and am happy to 
> provide said contribution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34824) Multiply year-month interval by numeric

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306197#comment-17306197
 ] 

Apache Spark commented on SPARK-34824:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31929

> Multiply year-month interval by numeric
> ---
>
> Key: SPARK-34824
> URL: https://issues.apache.org/jira/browse/SPARK-34824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Support the multiply op over year-month interval by numeric types including:
> # ByteType
> # ShortType
> # IntegerType
> # LongType
> # FloatType
> # DoubleType
> # DecimalType
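
A sketch of the target behaviour once the op is supported (the ANSI year-month 
interval literal syntax below is assumed to be available in the same 3.2.0 
line; the exact result rendering may differ):

{code:java}
// 1 year 6 months = 18 months; multiplying by an integral or fractional
// numeric should scale the month count.
spark.sql("SELECT INTERVAL '1-6' YEAR TO MONTH * 2").show()
// expected: 3 years, i.e. INTERVAL '3-0' YEAR TO MONTH

spark.sql("SELECT INTERVAL '1-6' YEAR TO MONTH * 1.5").show()
// expected: 2 years 3 months, i.e. INTERVAL '2-3' YEAR TO MONTH
{code}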



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34824) Multiply year-month interval by numeric

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34824:


Assignee: Max Gekk  (was: Apache Spark)

> Multiply year-month interval by numeric
> ---
>
> Key: SPARK-34824
> URL: https://issues.apache.org/jira/browse/SPARK-34824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Support multiplying a year-month interval by numeric types, including:
> # ByteType
> # ShortType
> # IntegerType
> # LongType
> # FloatType
> # DoubleType
> # DecimalType



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34824) Multiply year-month interval by numeric

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34824:


Assignee: Apache Spark  (was: Max Gekk)

> Multiply year-month interval by numeric
> ---
>
> Key: SPARK-34824
> URL: https://issues.apache.org/jira/browse/SPARK-34824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Support multiplying a year-month interval by numeric types, including:
> # ByteType
> # ShortType
> # IntegerType
> # LongType
> # FloatType
> # DoubleType
> # DecimalType



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306147#comment-17306147
 ] 

Apache Spark commented on SPARK-33925:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/31928

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.2.0
>
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306145#comment-17306145
 ] 

Apache Spark commented on SPARK-33925:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/31928

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.2.0
>
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34824) Multiply year-month interval by numeric

2021-03-22 Thread Max Gekk (Jira)
Max Gekk created SPARK-34824:


 Summary: Multiply year-month interval by numeric
 Key: SPARK-34824
 URL: https://issues.apache.org/jira/browse/SPARK-34824
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Max Gekk
Assignee: Max Gekk


Support multiplying a year-month interval by numeric types, including:
# ByteType
# ShortType
# IntegerType
# LongType
# FloatType
# DoubleType
# DecimalType



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34778) Upgrade to Avro 1.10.2

2021-03-22 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-34778.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31866
[https://github.com/apache/spark/pull/31866]

> Upgrade to Avro 1.10.2
> --
>
> Key: SPARK-34778
> URL: https://issues.apache.org/jira/browse/SPARK-34778
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34801) java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadPartition

2021-03-22 Thread Peter Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306093#comment-17306093
 ] 

Peter Toth commented on SPARK-34801:


Yes it is. Please use CDS 3 (Cloudera Distribution of Spark 3) on supported CDP 
versions.

> java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition
> --
>
> Key: SPARK-34801
> URL: https://issues.apache.org/jira/browse/SPARK-34801
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2
> Environment: HDP3.1.4.0-315  spark 3.0.2
>Reporter: zhaojk
>Priority: Major
>
> Using spark-sql to run this SQL: insert overwrite table zry.zjk1 
> partition(etl_dt=2) select * from zry.zry;
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(org.apache.hadoop.fs.Path,
>  org.apache.hadoop.hive.ql.metadata.Table, java.util.Map, 
> org.apache.hadoop.hive.ql.plan.LoadTableDesc$LoadFileType, boolean, boolean, 
> boolean, boolean, boolean, java.lang.Long, int, boolean)
>  at java.lang.Class.getMethod(Class.java:1786)
>  at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:177)
>  at 
> org.apache.spark.sql.hive.client.Shim_v3_0.loadPartitionMethod$lzycompute(HiveShim.scala:1289)
>  at 
> org.apache.spark.sql.hive.client.Shim_v3_0.loadPartitionMethod(HiveShim.scala:1274)
>  at 
> org.apache.spark.sql.hive.client.Shim_v3_0.loadPartition(HiveShim.scala:1337)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadPartition$1(HiveClientImpl.scala:881)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:295)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:228)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:227)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:277)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.loadPartition(HiveClientImpl.scala:871)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$loadPartition$1(HiveExternalCatalog.scala:915)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadPartition(HiveExternalCatalog.scala:894)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadPartition(ExternalCatalogWithListener.scala:179)
>  at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:318)
>  at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:102)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
>  at org.apache.spark.sql.Dataset.(Dataset.scala:229)
>  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
>  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
>  at 

[jira] [Resolved] (SPARK-34823) hive-common should exclusion hadoop related jars

2021-03-22 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-34823.
---
Resolution: Not A Problem

> hive-common should exclusion hadoop related jars
> 
>
> Key: SPARK-34823
> URL: https://issues.apache.org/jira/browse/SPARK-34823
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> hive-common should exclude the Hadoop-related jars, for example:
> {code:xml}
>   <scope>${hive.common.scope}</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>org.apache.hadoop</groupId>
>       <artifactId>hadoop-auth</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.hadoop</groupId>
>       <artifactId>hadoop-annotations</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.hadoop</groupId>
>       <artifactId>hadoop-common</artifactId>
>     </exclusion>
>   </exclusions>
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34800) Use fine-grained lock in SessionCatalog.tableExists

2021-03-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34800:
---

Assignee: rongchuan.jin

> Use fine-grained lock in SessionCatalog.tableExists
> ---
>
> Key: SPARK-34800
> URL: https://issues.apache.org/jira/browse/SPARK-34800
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: rongchuan.jin
>Assignee: rongchuan.jin
>Priority: Major
> Fix For: 3.2.0
>
>
> We have modified the underlying Hive metastore so that each Hive database is 
> placed in its own shard for performance. However, we found that the 
> synchronized block limits concurrency, and we would like to fix it.
> A related jstack trace looks like the following:
> {code:java}
> "http-nio-7070-exec-257" #19961734 daemon prio=5 os_prio=0 
> tid=0x7f45f4ce1000 nid=0x1a85e6 waiting for monitor entry 
> [0x7f45949df000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:415)
> - waiting to lock <0x00011d983d90> (a 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:185)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:430)
> at org.apache.spark.sql.DdlOperation$.getTableDesc(SourceFile:123)
> at org.apache.spark.sql.DdlOperation.getTableDesc(SourceFile)
> ...
> {code}
> We fixed it as discussed on the dev mailing list: 
> [http://mail-archives.apache.org/mod_mbox/spark-dev/202103.mbox/browser]
>  
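
As a generic illustration of the fine-grained locking idea (a sketch only, not the actual Spark change): one lock per database lets lookups against different databases proceed concurrently instead of serializing on a single catalog-wide monitor.
{code:python}
import threading
from collections import defaultdict


class PerDatabaseLocks:
    """Sketch: replace one coarse catalog lock with one lock per database."""

    def __init__(self):
        self._guard = threading.Lock()             # protects the lock map itself
        self._locks = defaultdict(threading.Lock)  # one lock per database name

    def lock_for(self, db):
        with self._guard:
            return self._locks[db]

    def table_exists(self, db, table, external_catalog):
        # Only calls that touch the same database serialize with each other.
        with self.lock_for(db):
            return external_catalog.table_exists(db, table)
{code}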



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34800) Use fine-grained lock in SessionCatalog.tableExists

2021-03-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34800.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31891
[https://github.com/apache/spark/pull/31891]

> Use fine-grained lock in SessionCatalog.tableExists
> ---
>
> Key: SPARK-34800
> URL: https://issues.apache.org/jira/browse/SPARK-34800
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: rongchuan.jin
>Priority: Major
> Fix For: 3.2.0
>
>
> We have modified the underlying Hive metastore so that each Hive database is 
> placed in its own shard for performance. However, we found that the 
> synchronized block limits concurrency, and we would like to fix it.
> A related jstack trace looks like the following:
> {code:java}
> "http-nio-7070-exec-257" #19961734 daemon prio=5 os_prio=0 
> tid=0x7f45f4ce1000 nid=0x1a85e6 waiting for monitor entry 
> [0x7f45949df000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:415)
> - waiting to lock <0x00011d983d90> (a 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:185)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:430)
> at org.apache.spark.sql.DdlOperation$.getTableDesc(SourceFile:123)
> at org.apache.spark.sql.DdlOperation.getTableDesc(SourceFile)
> ...
> {code}
> We fixed it as discussed on the dev mailing list: 
> [http://mail-archives.apache.org/mod_mbox/spark-dev/202103.mbox/browser]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34822) Update plan stability golden files even if only explain differs

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34822:


Assignee: Apache Spark

> Update plan stability golden files even if only explain differs
> ---
>
> Key: SPARK-34822
> URL: https://issues.apache.org/jira/browse/SPARK-34822
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Apache Spark
>Priority: Major
>
> PlanStabilitySuite updates the golden files only if simplified.txt has 
> changed. In some situations only explain.txt will change and the golden files 
> are not updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34822) Update plan stability golden files even if only explain differs

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306019#comment-17306019
 ] 

Apache Spark commented on SPARK-34822:
--

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31927

> Update plan stability golden files even if only explain differs
> ---
>
> Key: SPARK-34822
> URL: https://issues.apache.org/jira/browse/SPARK-34822
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> PlanStabilitySuite updates the golden files only if simplified.txt has 
> changed. In some situations only explain.txt will change and the golden files 
> are not updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34822) Update plan stability golden files even if only explain differs

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34822:


Assignee: (was: Apache Spark)

> Update plan stability golden files even if only explain differs
> ---
>
> Key: SPARK-34822
> URL: https://issues.apache.org/jira/browse/SPARK-34822
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> PlanStabilitySuite updates the golden files only if simplified.txt has 
> changed. In some situations only explain.txt will change and the golden files 
> are not updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34822) Update plan stability golden files even if only explain differs

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306021#comment-17306021
 ] 

Apache Spark commented on SPARK-34822:
--

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31927

> Update plan stability golden files even if only explain differs
> ---
>
> Key: SPARK-34822
> URL: https://issues.apache.org/jira/browse/SPARK-34822
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> PlanStabilitySuite updates the golden files only if simplified.txt has 
> changed. In some situations only explain.txt will change and the golden files 
> are not updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34823) hive-common should exclusion hadoop related jars

2021-03-22 Thread angerszhu (Jira)
angerszhu created SPARK-34823:
-

 Summary: hive-common should exclusion hadoop related jars
 Key: SPARK-34823
 URL: https://issues.apache.org/jira/browse/SPARK-34823
 Project: Spark
  Issue Type: Improvement
  Components: Build, SQL
Affects Versions: 3.2.0
Reporter: angerszhu


hive-common should exclude the Hadoop-related jars, for example:
{code:xml}
  <scope>${hive.common.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
  </exclusions>

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34775) Push down limit through window when partitionSpec is not empty

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306018#comment-17306018
 ] 

Apache Spark commented on SPARK-34775:
--

User 'leoluan2009' has created a pull request for this issue:
https://github.com/apache/spark/pull/31926

> Push down limit through window when partitionSpec is not empty
> --
>
> Key: SPARK-34775
> URL: https://issues.apache.org/jira/browse/SPARK-34775
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> For example:
> {code:sql}
> SELECT *, ROW_NUMBER() OVER(PARTITION BY a ORDER BY b) AS rn FROM t LIMIT 10 
> ==>
> SELECT *, ROW_NUMBER() OVER(PARTITION BY a ORDER BY b) AS rn FROM (SELECT * 
> FROM t ORDER BY a, b LIMIT 10) tmp
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34775) Push down limit through window when partitionSpec is not empty

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306017#comment-17306017
 ] 

Apache Spark commented on SPARK-34775:
--

User 'leoluan2009' has created a pull request for this issue:
https://github.com/apache/spark/pull/31926

> Push down limit through window when partitionSpec is not empty
> --
>
> Key: SPARK-34775
> URL: https://issues.apache.org/jira/browse/SPARK-34775
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> For example:
> {code:sql}
> SELECT *, ROW_NUMBER() OVER(PARTITION BY a ORDER BY b) AS rn FROM t LIMIT 10 
> ==>
> SELECT *, ROW_NUMBER() OVER(PARTITION BY a ORDER BY b) AS rn FROM (SELECT * 
> FROM t ORDER BY a, b LIMIT 10) tmp
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34804) registerFunction shouldnt be logging warning message for same function being re-registered

2021-03-22 Thread Sumeet Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305998#comment-17305998
 ] 

Sumeet Sharma commented on SPARK-34804:
---

I need some pointers on how to check the Seq[Expression] => Expression 
builders for equality. Any comments are appreciated.

> registerFunction shouldnt be logging warning message for same function being 
> re-registered
> --
>
> Key: SPARK-34804
> URL: https://issues.apache.org/jira/browse/SPARK-34804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Sumeet Sharma
>Priority: Minor
>
> {code:java}
> test("function registry warning") {
>   implicit val ss = spark
>   import ss.implicits._
>   val dd = Seq(1, 2).toDF("a")
>   val dd1 = udf((i: Int) => i * 2)
>   (1 to 4).foreach { _ =>
>     dd.sparkSession.udf.register("function", dd1)
>     Thread.sleep(1000)
>   }
>   dd.withColumn("aa", expr("function(a)")).show(10)
> }
> {code}
> logs:
> 21/03/19 22:39:39 WARN SparkSession$Builder : Using an existing SparkSession; 
> some spark core configurations may not take effect.
> 21/03/19 22:39:39 WARN SparkSession$Builder : Using an existing SparkSession; 
> the static sql configurations will not take effect.
>  21/03/19 22:39:39 WARN SparkSession$Builder : Using an existing 
> SparkSession; some spark core configurations may not take effect.
>  21/03/19 22:39:43 WARN SimpleFunctionRegistry : The function function 
> replaced a previously registered function.
>  21/03/19 22:39:44 WARN SimpleFunctionRegistry : The function function 
> replaced a previously registered function.
>  21/03/19 22:39:45 WARN SimpleFunctionRegistry : The function function 
> replaced a previously registered function.
> +---+---+
> |  a| aa|
> +---+---+
> |  1|  2|
> |  2|  4|
> +---+---+
> Basically in the FunctionRegistry implementation
> {code:java}
> override def registerFunction(
>     name: FunctionIdentifier,
>     info: ExpressionInfo,
>     builder: FunctionBuilder): Unit = synchronized {
>   val normalizedName = normalizeFuncName(name)
>   val newFunction = (info, builder)
>   functionBuilders.put(normalizedName, newFunction) match {
>     case Some(previousFunction) if previousFunction != newFunction =>
>       logWarning(s"The function $normalizedName replaced a previously registered function.")
>     case _ =>
>   }
> }
> {code}
> The *previousFunction != newFunction* equality comparison is incorrect: both 
> *info* and *builder* are freshly created objects on every call to register, so 
> the tuple never compares equal to the previous entry and the warning is logged 
> even when the very same function is re-registered.
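
The pitfall is easy to reproduce outside Spark, since function objects compare by identity rather than by structure (a Python illustration of the same behaviour, not the Scala code in question):
{code:python}
def make_builder():
    # Each call returns a brand-new function object, just as each call to
    # udf.register creates a new FunctionBuilder closure.
    return lambda i: i * 2

print(make_builder() == make_builder())  # False: functions compare by identity
builder = make_builder()
print(builder == builder)                # True only because it is the same object
{code}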



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34822) Update plan stability golden files even if only explain differs

2021-03-22 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34822:
--

 Summary: Update plan stability golden files even if only explain 
differs
 Key: SPARK-34822
 URL: https://issues.apache.org/jira/browse/SPARK-34822
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Tanel Kiis


PlanStabilitySuite updates the golden files only if simplified.txt has changed. 
In some situations only explain.txt will change and the golden files are not 
updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34812) RowNumberLike and RankLike should not be nullable

2021-03-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305987#comment-17305987
 ] 

Apache Spark commented on SPARK-34812:
--

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31924

> RowNumberLike and RankLike should not be nullable
> -
>
> Key: SPARK-34812
> URL: https://issues.apache.org/jira/browse/SPARK-34812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34812) RowNumberLike and RankLike should not be nullable

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34812:


Assignee: Apache Spark

> RowNumberLike and RankLike should not be nullable
> -
>
> Key: SPARK-34812
> URL: https://issues.apache.org/jira/browse/SPARK-34812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34812) RowNumberLike and RankLike should not be nullable

2021-03-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34812:


Assignee: (was: Apache Spark)

> RowNumberLike and RankLike should not be nullable
> -
>
> Key: SPARK-34812
> URL: https://issues.apache.org/jira/browse/SPARK-34812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



  1   2   >