[jira] [Commented] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
[ https://issues.apache.org/jira/browse/SPARK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331160#comment-17331160 ]

Apache Spark commented on SPARK-35210:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/32318

> Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
> ---------------------------------------------------------
>
>                 Key: SPARK-35210
>                 URL: https://issues.apache.org/jira/browse/SPARK-35210
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.7, 3.0.2, 3.1.1, 3.2.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Blocker
>
> SPARK-34988 upgraded Jetty to 9.4.39 for CVE-2021-28165.
> But after that upgrade, Jetty 9.4.40 was released to fix the ERR_CONNECTION_RESET issue
> (https://github.com/eclipse/jetty.project/issues/6152).
> This issue seems to affect Jetty 9.4.39 when the POST method is used with SSL.
> For Spark, job submission using REST and the Thrift Server over HTTPS can be affected.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
[ https://issues.apache.org/jira/browse/SPARK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35210:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)
[jira] [Assigned] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
[ https://issues.apache.org/jira/browse/SPARK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35210:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)
[jira] [Created] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
Kousuke Saruta created SPARK-35210:
--------------------------------------

             Summary: Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
                 Key: SPARK-35210
                 URL: https://issues.apache.org/jira/browse/SPARK-35210
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta

SPARK-34988 upgraded Jetty to 9.4.39 for CVE-2021-28165.
But after that upgrade, Jetty 9.4.40 was released to fix the ERR_CONNECTION_RESET issue
(https://github.com/eclipse/jetty.project/issues/6152).
This issue seems to affect Jetty 9.4.39 when the POST method is used with SSL.
For Spark, job submission using REST and the Thrift Server over HTTPS can be affected.
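For downstream projects that want the fix before a patched Spark release ships, the Jetty version can be overridden at build time. A hedged sketch for a Maven project that pulls in Spark: the artifact list is illustrative (not exhaustive for everything Spark uses), and the exact timestamp suffix of the release string should be verified against Maven Central.

{code:xml}
<!-- Sketch: pin Jetty 9.4.40 via dependencyManagement in a downstream
     Maven project. Verify the exact version string on Maven Central;
     the v-suffix below is the expected 9.4.40 release tag. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>9.4.40.v20210413</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-servlet</artifactId>
      <version>9.4.40.v20210413</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}

Note that the official Spark distribution shades Jetty under org.sparkproject.jetty, so this override only helps builds that compile Spark (or its servlet stack) from source rather than consuming the prebuilt binaries.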
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331157#comment-17331157 ]

Dongjoon Hyun commented on SPARK-35196:
---------------------------------------

Yeah, sorry for the negative opinion. There was a previous report of that non-working situation. IIRC, there is a documentation commit that adds a warning about it.

> DataFrameWriter.text support zstd compression
> ---------------------------------------------
>
>                 Key: SPARK-35196
>                 URL: https://issues.apache.org/jira/browse/SPARK-35196
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Leonard Lausen
>            Priority: Major
>
> http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html
> specifies that only the following compression codecs are supported: `none,
> bzip2, gzip, lz4, snappy and deflate`.
> However, the RDD API supports zstd compression if users pass the
> 'org.apache.hadoop.io.compress.ZStandardCodec' compressor to the
> saveAsTextFile method.
> Please also expose zstd in the DataFrameWriter.
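The RDD-level workaround referenced in the description can be sketched as follows. This is a hedged sketch, not a tested recipe: it assumes an already-running SparkSession bound to the name `spark`, a Hadoop build (2.9.0+) with native zstd support on the executors, and the paths are hypothetical placeholders.

{code:python}
# Sketch: write zstd-compressed text via the RDD API, since
# DataFrameWriter.text does not accept a 'zstd' short name.
# Assumes an existing SparkSession `spark`; paths are placeholders.
df = spark.read.text("s3://some-input-path")

(df.rdd
   .map(lambda row: row.value)          # unwrap the single 'value' column
   .saveAsTextFile(
       "s3://some-output-path",
       compressionCodecClass="org.apache.hadoop.io.compress.ZStandardCodec"))
{code}

The trade-off is that dropping to the RDD API bypasses the DataFrame writer's options (mode, partitioning), which is why exposing a proper codec alias was requested.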
[jira] [Commented] (SPARK-35199) Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-35199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331155#comment-17331155 ]

Dongjoon Hyun commented on SPARK-35199:
---------------------------------------

According to the error logs, it seems that you are mixing ZSTD JNI versions. That doesn't work. We experienced many API incompatibilities with ZSTD JNI. That's the reason we recently upgraded Parquet/Avro/Kafka. For the ZSTD JNI incompatibility issues, please see https://github.com/luben/zstd-jni/issues?q=is%3Aissue+ .

{code}
Decompression error: Version not supported at
{code}

> Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-35199
>                 URL: https://issues.apache.org/jira/browse/SPARK-35199
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark
>    Affects Versions: 3.0.1
>            Reporter: Leonard Lausen
>            Priority: Major
>
> In Spark 3.0.1, tasks fail with the default value of
> {{spark.shuffle.mapStatus.compression.codec=zstd}}, but work without problems
> when the value is changed to {{spark.shuffle.mapStatus.compression.codec=lz4}}.
>
> Example backtrace:
>
> {code:java}
> java.io.IOException: Decompression error: Version not supported
>     at com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:164)
>     at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:120)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2781)
>     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2797)
>     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3274)
>     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934)
>     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396)
>     at org.apache.spark.MapOutputTracker$.deserializeObject$1(MapOutputTracker.scala:954)
>     at org.apache.spark.MapOutputTracker$.deserializeMapStatuses(MapOutputTracker.scala:964)
>     at org.apache.spark.MapOutputTrackerWorker.$anonfun$getStatuses$2(MapOutputTracker.scala:856)
>     at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
>     at org.apache.spark.MapOutputTrackerWorker.getStatuses(MapOutputTracker.scala:851)
>     at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:808)
>     at org.apache.spark.shuffle.sort.SortShuffleManager.getReader(SortShuffleManager.scala:128)
>     at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:185)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:127)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>
> Example code to reproduce the issue:
> {code:java}
> import pyspark.sql.functions as F
> df = spark.read.text("s3://my-bucket-with-300GB-compressed-text-files")
> df_rand = df.orderBy(F.rand(1))
> df_rand.write.text('s3://shuffled-output'){code}
> See
> https://stackoverflow.com/questions/64876463/spark-3-0-1-tasks-are-failing-when-using-zstd-compression-codec
> for another report of this issue and a workaround.
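The workaround named in the description (reverting the map-status codec to lz4) is a one-line configuration change. A minimal sketch, assuming it is applied cluster-wide via spark-defaults.conf; it can equally be passed per job with `--conf`:

{code}
# spark-defaults.conf -- work around SPARK-35199 on Spark 3.0.x by
# avoiding zstd for MapStatus (de)serialization in the shuffle tracker.
spark.shuffle.mapStatus.compression.codec  lz4
{code}

Equivalently at submit time: `spark-submit --conf spark.shuffle.mapStatus.compression.codec=lz4 ...`. This only changes how shuffle map statuses are compressed; shuffle data and output compression codecs are configured separately.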
[jira] [Commented] (SPARK-35199) Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-35199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331154#comment-17331154 ]

Dongjoon Hyun commented on SPARK-35199:
---------------------------------------

Well, I'd recommend using ZSTD with Apache Spark 3.2+. Many issues are fixed via SPARK-34651. BTW, could you provide a reproducible example, [~lausen]? We cannot access your bucket.
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331153#comment-17331153 ]

Hyukjin Kwon commented on SPARK-35196:
--------------------------------------

I see. Thanks, Dongjoon, for the clarification!
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331152#comment-17331152 ]

Dongjoon Hyun commented on SPARK-35196:
---------------------------------------

In addition, even with Hadoop 3.1, the official Apache Spark distribution fails when you try to use `org.apache.hadoop.io.compress.ZStandardCodec`.
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331150#comment-17331150 ]

Dongjoon Hyun commented on SPARK-35196:
---------------------------------------

Hi, [~lausen] and [~hyukjin.kwon]. We still haven't dropped Hadoop 2.7 support, and `org.apache.hadoop.io.compress.ZStandardCodec` was only added in Apache Hadoop 2.9.0. We may add a note about the limitation, but I'm -1 on adding an alias.
[jira] [Commented] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331144#comment-17331144 ]

Apache Spark commented on SPARK-33195:
--------------------------------------

User 'mdianjun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32317

> stages/stage UI page fails to load when spark reverse proxy is enabled
> ----------------------------------------------------------------------
>
>                 Key: SPARK-33195
>                 URL: https://issues.apache.org/jira/browse/SPARK-33195
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.1
>            Reporter: Liran
>            Priority: Major
>
> I think we have the same issue reported in SPARK-32467, reproduced with
> reverse proxy redirects; I'm getting the exact same error in the Spark UI.
> Page URL:
> {code:java}
> http://:8080/proxy/app-20201020143315-0005/stages/stage/?id=7&attempt=0{code}
> The URL above fails to load; looking at the network tab, this request fails:
> {code:java}
> http://:8080/proxy/app-20201020143315-0005/api/v1/applications/app-20201020143315-0005/stages/7/0/taskTable?draw=1&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=asc&start=0&length=20&search%5Bvalue%5D=&search%5Bregex%5D=false&numTasks=1&columnIndexToSort=0&columnNameToSort=Index&_=1603206039549
> {code}
> Server error stack trace:
> {code:java}
> /api/v1/applications/app-20201020113310-0004/stages/7/0/taskTable
> javax.servlet.ServletException: java.lang.NullPointerException
>     at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>     at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>     at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>     at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>     at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>     at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>     at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>     at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>     at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>     at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>     at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>     at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>     at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>     at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>     at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>     at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>     at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>     at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>     at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>     at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
>     at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>     at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>     at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
>     at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175)
>     at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140)
>     at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:107)
>     at ...
> {code}
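For context, the reverse-proxy mode under which this reproduces is enabled through the master's UI configuration. A hedged fragment (the proxy URL is a hypothetical placeholder; set it only if the UI sits behind an external front end):

{code}
# spark-defaults.conf -- proxy worker and application UIs through the master UI
spark.ui.reverseProxy     true
# Optional base URL when an external proxy fronts the master (placeholder):
spark.ui.reverseProxyUrl  http://example-master:8080/proxy
{code}

With this enabled, stage pages are served under /proxy/<app-id>/..., which is the URL shape shown in the report above.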
[jira] [Assigned] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33195:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33195:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331143#comment-17331143 ] Apache Spark commented on SPARK-33195: -- User 'mdianjun' has created a pull request for this issue: https://github.com/apache/spark/pull/32317 > stages/stage UI page fails to load when spark reverse proxy is enabled > -- > > Key: SPARK-33195 > URL: https://issues.apache.org/jira/browse/SPARK-33195 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.1 >Reporter: Liran >Priority: Major > > I think we have the same issue reported in SPARK-32467, reproduced with > reverse proxy redirects, I'm getting the exact same error in spark UI. > Url page: > {code:java} > http://:8080/proxy/app-20201020143315-0005/stages/stage/?id=7&attempt=0{code} > The url above fails to load, looking at the network tab - this request fails: > {code:java} > http://:8080/proxy/app-20201020143315-0005/api/v1/applications/app-20201020143315-0005/stages/7/0/taskTable?draw=1&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=asc&start=0&length=20&search%5Bvalue%5D=&search%5Bregex%5D=false&numTasks=1&columnIndexToSort=0&columnNameToSort=Index&_=1603206039549 > {code} > Server error stack trace: > {code:java} > /api/v1/applications/app-20201020113310-0004/stages/7/0/taskTable/api/v1/applications/app-20201020113310-0004/stages/7/0/taskTablejavax.servlet.ServletException: > java.lang.NullPointerException at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) > at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) > at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) > at > org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) > at > org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.sparkproject.jetty.server.Server.handle(Server.java:505) at > org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) at > org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) at > org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) > at > 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) > at > org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) > at > org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698) > at > org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804) > at java.lang.Thread.run(Thread.java:748)Caused by: > java.lang.NullPointerException at > org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175) > at > org.apache.spark.status.api.v1.BaseAppResource.
[jira] [Commented] (SPARK-35168) mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-35168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331132#comment-17331132 ] Kent Yao commented on SPARK-35168: -- Thanks [~dongjoon] > mapred.reduce.tasks should be shuffle.partitions not > adaptive.coalescePartitions.initialPartitionNum > > > Key: SPARK-35168 > URL: https://issues.apache.org/jira/browse/SPARK-35168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:java} > spark-sql> set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1; > spark.sql.adaptive.coalescePartitions.initialPartitionNum 1 > Time taken: 2.18 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:27:11 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.03 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.shuffle.partitions; > spark.sql.shuffle.partitions 200 > Time taken: 0.024 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks=2; > 21/04/21 14:31:52 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, automatically converted to spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 2 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:31:55 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35209) CLONE - CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
Kadir Selçuk created SPARK-35209: Summary: CLONE - CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes Key: SPARK-35209 URL: https://issues.apache.org/jira/browse/SPARK-35209 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Kadir Selçuk Assignee: Wenchen Fan Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331117#comment-17331117 ] Kadir Selçuk commented on SPARK-35204: -- Sorunları çözmek > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token
[ https://issues.apache.org/jira/browse/SPARK-35160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331114#comment-17331114 ] Manu Zhang commented on SPARK-35160: [~hyukjin.kwon], Thanks for reminder. I've added my proposal and I did ask about it on mailing list. It will be great if you know the reasoning behind it or you may forward to someone who knows. > Spark application submitted despite failing to get Hive delegation token > > > Key: SPARK-35160 > URL: https://issues.apache.org/jira/browse/SPARK-35160 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.1.1 >Reporter: Manu Zhang >Priority: Major > > Currently, when running on YARN and failing to get Hive delegation token, a > Spark SQL application will still be submitted. Eventually, the application > will fail on connecting to Hive metastore without a valid delegation token. > Is there any reason for this design ? > cc [~jerryshao] who originally implemented this in > https://issues.apache.org/jira/browse/SPARK-14743 > I'd propose to fail immediately like HadoopFSDelegationTokenProvider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token
[ https://issues.apache.org/jira/browse/SPARK-35160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-35160: --- Description: Currently, when running on YARN and failing to get Hive delegation token, a Spark SQL application will still be submitted. Eventually, the application will fail on connecting to Hive metastore without a valid delegation token. Is there any reason for this design ? cc [~jerryshao] who originally implemented this in https://issues.apache.org/jira/browse/SPARK-14743 I'd propose to fail immediately like HadoopFSDelegationTokenProvider. was: Currently, when running on YARN and failing to get Hive delegation token, a Spark SQL application will still be submitted. Eventually, the application will fail on connecting to Hive metastore without a valid delegation token. Is there any reason for this design ? cc [~jerryshao] who originally implemented this in https://issues.apache.org/jira/browse/SPARK-14743 > Spark application submitted despite failing to get Hive delegation token > > > Key: SPARK-35160 > URL: https://issues.apache.org/jira/browse/SPARK-35160 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.1.1 >Reporter: Manu Zhang >Priority: Major > > Currently, when running on YARN and failing to get Hive delegation token, a > Spark SQL application will still be submitted. Eventually, the application > will fail on connecting to Hive metastore without a valid delegation token. > Is there any reason for this design ? > cc [~jerryshao] who originally implemented this in > https://issues.apache.org/jira/browse/SPARK-14743 > I'd propose to fail immediately like HadoopFSDelegationTokenProvider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28247) Flaky test: "query without test harness" in ContinuousSuite
[ https://issues.apache.org/jira/browse/SPARK-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331103#comment-17331103 ] Apache Spark commented on SPARK-28247: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/32316 > Flaky test: "query without test harness" in ContinuousSuite > --- > > Key: SPARK-28247 > URL: https://issues.apache.org/jira/browse/SPARK-28247 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > This test has failed a few times in some PRs, as well as easy to reproduce > locally. Example of a failure: > {noformat} > [info] - query without test harness *** FAILED *** (2 seconds, 931 > milliseconds) > [info] scala.Predef.Set.apply[Int](0, 1, 2, > 3).map[org.apache.spark.sql.Row, > scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => > org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) > was false > (ContinuousSuite.scala:226){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35208) Add docs for LATERAL subqueries
Allison Wang created SPARK-35208: Summary: Add docs for LATERAL subqueries Key: SPARK-35208 URL: https://issues.apache.org/jira/browse/SPARK-35208 Project: Spark Issue Type: Task Components: docs Affects Versions: 3.2.0 Reporter: Allison Wang Add documentation for LATERAL subqueries. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331096#comment-17331096 ] Wei Xue commented on SPARK-35133: - Totally understand. But turning off AQE is sometimes part of the debugging process, too, in order to isolate the problem. > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35207) hash() and other hash builtins do not normalize negative zero
[ https://issues.apache.org/jira/browse/SPARK-35207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated SPARK-35207: -- Description: I would generally expect that {{x = y => hash(x) = hash(y)}}. However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. was: I would generally expect that x = y => hash(x) = hash(y). However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. 
> hash() and other hash builtins do not normalize negative zero > - > > Key: SPARK-35207 > URL: https://issues.apache.org/jira/browse/SPARK-35207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Tim Armstrong >Priority: Major > Labels: correctness > > I would generally expect that {{x = y => hash(x) = hash(y)}}. However +-0 > hash to different values for floating point types. > {noformat} > scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as > double))").show > +-+--+ > |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| > +-+--+ > | -1670924195|-853646085| > +-+--+ > scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as > double)").show > ++ > |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| > ++ > |true| > ++ > {noformat} > I'm not sure how likely this is to cause issues in practice, since only a > limited number of calculations can produce -0 and joining or aggregating with > floating point keys is a bad practice as a general rule, but I think it would > be safer if we normalised -0.0 to +0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35207) hash() and other hash builtins do not normalize negative zero
[ https://issues.apache.org/jira/browse/SPARK-35207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated SPARK-35207: -- Description: I would generally expect that {{x = y => hash( x ) = hash( y )}}. However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. was: I would generally expect that {{x = y => hash(x) = hash(y)}}. However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. 
> hash() and other hash builtins do not normalize negative zero > - > > Key: SPARK-35207 > URL: https://issues.apache.org/jira/browse/SPARK-35207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Tim Armstrong >Priority: Major > Labels: correctness > > I would generally expect that {{x = y => hash( x ) = hash( y )}}. However +-0 > hash to different values for floating point types. > {noformat} > scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as > double))").show > +-+--+ > |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| > +-+--+ > | -1670924195|-853646085| > +-+--+ > scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as > double)").show > ++ > |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| > ++ > |true| > ++ > {noformat} > I'm not sure how likely this is to cause issues in practice, since only a > limited number of calculations can produce -0 and joining or aggregating with > floating point keys is a bad practice as a general rule, but I think it would > be safer if we normalised -0.0 to +0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35207) hash() and other hash builtins do not normalize negative zero
Tim Armstrong created SPARK-35207: - Summary: hash() and other hash builtins do not normalize negative zero Key: SPARK-35207 URL: https://issues.apache.org/jira/browse/SPARK-35207 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Tim Armstrong I would generally expect that x = y => hash(x) = hash(y). However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35168) mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-35168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331059#comment-17331059 ] Dongjoon Hyun commented on SPARK-35168: --- Thank you, [~Qin Yao]. I converted this into a subtask of SPARK-33828 in order to give more visibility. > mapred.reduce.tasks should be shuffle.partitions not > adaptive.coalescePartitions.initialPartitionNum > > > Key: SPARK-35168 > URL: https://issues.apache.org/jira/browse/SPARK-35168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:java} > spark-sql> set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1; > spark.sql.adaptive.coalescePartitions.initialPartitionNum 1 > Time taken: 2.18 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:27:11 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.03 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.shuffle.partitions; > spark.sql.shuffle.partitions 200 > Time taken: 0.024 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks=2; > 21/04/21 14:31:52 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, automatically converted to spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 2 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:31:55 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35168) mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-35168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35168: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > mapred.reduce.tasks should be shuffle.partitions not > adaptive.coalescePartitions.initialPartitionNum > > > Key: SPARK-35168 > URL: https://issues.apache.org/jira/browse/SPARK-35168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:java} > spark-sql> set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1; > spark.sql.adaptive.coalescePartitions.initialPartitionNum 1 > Time taken: 2.18 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:27:11 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.03 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.shuffle.partitions; > spark.sql.shuffle.partitions 200 > Time taken: 0.024 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks=2; > 21/04/21 14:31:52 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, automatically converted to spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 2 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:31:55 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34297) Add metrics for data loss and offset out range for KafkaMicroBatchStream
[ https://issues.apache.org/jira/browse/SPARK-34297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-34297. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31398 [https://github.com/apache/spark/pull/31398] > Add metrics for data loss and offset out range for KafkaMicroBatchStream > > > Key: SPARK-34297 > URL: https://issues.apache.org/jira/browse/SPARK-34297 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > When testing SS, I found it is hard to track data loss of SS reading from > Kafka. The micro scan node has only one metric, number of output rows. Users > have no idea how many times offsets to fetch are out of Kafak now, how many > times data loss happens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331011#comment-17331011 ] Cheng Su commented on SPARK-35133: -- btw just to provide more context, I am running into this in reality when trying to debug code-gen for some queries in unit test. So I guess others can run into this issue as well. I will spend one afternoon or so to figure out if there's a clean fix. Thanks. > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331003#comment-17331003 ] L. C. Hsieh commented on SPARK-35156: - > Can you just use a later version? it may be a problem that was fixed. > Or some issue in how you package your app, like, having it include > incompatible K8S classes. Master is okay. I guess the fix was not intentional, or it was not backported, because branch-3.1 has the issue. Since branch-3.0 is okay too, maybe some change between 3.0 and master causes it. The exception is seen locally when running spark-submit. You can also see in the above stack trace that the exception is thrown early in SparkSubmit. Master and branch-3.0 are both unaffected by the issue, so it does not seem related to how the app is packaged. It would be good if anyone else could test this too, in case I did something incorrect during my tests. > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.0 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. 
Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: 
java.lang.ClassNotFoundException: > com.fa
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330999#comment-17330999 ] L. C. Hsieh commented on SPARK-35156: - > Do you mean Branch-3.0 is affected alone? Sorry for the typo. Only branch-3.1 is affected. > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.0 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at 
org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.fasterxml.jackson.dataformat.yaml.YAMLFactory > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 19 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-35156: Description: Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S cluster. Master, branch-3.0 are okay. Branch-3.1 is affected. How to reproduce: 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) 2. Run spark-submit to submit to K8S cluster 3. Get the following exception {code:java} 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) at io.fabric8.kubernetes.client.Config.(Config.java:230) at io.fabric8.kubernetes.client.Config.(Config.java:224) at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) at org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.dataformat.yaml.YAMLFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 19 more {code} was: Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S cluster. Master, branch-3.1 are okay. Branch-3.1 is affected. How to reproduce: 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) 2. Run spark-submit to submit to K8S cluster 3. Get the following exception {code:java} 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330982#comment-17330982 ] Cheng Su commented on SPARK-35133: -- Whenever developers/users want to debug the generated code for a query in the spark-shell or spark-sql command line, they have to disable AQE explicitly. After debugging, they have to re-enable AQE to run queries or do other work. I feel it's kind of inconvenient for debugging. > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35013) Spark allows to set spark.driver.cores=0
[ https://issues.apache.org/jira/browse/SPARK-35013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-35013: - Issue Type: Improvement (was: Bug) > Spark allows to set spark.driver.cores=0 > > > Key: SPARK-35013 > URL: https://issues.apache.org/jira/browse/SPARK-35013 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.7, 3.1.1 >Reporter: Oleg Lypkan >Priority: Minor > > I found an inconsistency in [validation logic of Spark submit arguments > |https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L248-L258]that > allows *spark.driver.cores* value to be set to 0 but requires > *spark.driver.memory,* *spark.executor.cores, spark.executor.memory* to be > positive numbers: > {quote}Exception in thread "main" org.apache.spark.SparkException: Driver > memory must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor cores > must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor memory > must be a positive number > {quote} > I would like to understand if there is a reason for this inconsistency in the > validation logic or it is a bug? > Thank you -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35013) Spark allows to set spark.driver.cores=0
[ https://issues.apache.org/jira/browse/SPARK-35013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330978#comment-17330978 ] Sean R. Owen commented on SPARK-35013: -- I can't think of a reason to allow 0 cores. Feel free to open a PR. > Spark allows to set spark.driver.cores=0 > > > Key: SPARK-35013 > URL: https://issues.apache.org/jira/browse/SPARK-35013 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7, 3.1.1 >Reporter: Oleg Lypkan >Priority: Minor > > I found an inconsistency in [validation logic of Spark submit arguments > |https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L248-L258]that > allows *spark.driver.cores* value to be set to 0 but requires > *spark.driver.memory,* *spark.executor.cores, spark.executor.memory* to be > positive numbers: > {quote}Exception in thread "main" org.apache.spark.SparkException: Driver > memory must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor cores > must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor memory > must be a positive number > {quote} > I would like to understand if there is a reason for this inconsistency in the > validation logic or it is a bug? > Thank you -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
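The inconsistency discussed above can be shown with a small self-contained sketch (the helper name `requirePositive` is hypothetical, not Spark's actual code; the real check lives in `SparkSubmitArguments`): applying the same positive-number rule to all four settings would reject `spark.driver.cores=0` with the same kind of error the other three already produce.

```java
// Hypothetical sketch of uniform spark-submit argument validation.
// The messages mirror the SparkException texts quoted in the report.
public class SubmitArgCheck {
    static void requirePositive(String name, long value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be a positive number");
        }
    }

    public static void main(String[] args) {
        requirePositive("Driver memory", 1024);  // passes, as today
        requirePositive("Executor cores", 4);    // passes, as today
        try {
            requirePositive("Driver cores", 0);  // rejected, closing the gap
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // prints "Driver cores must be a positive number"
        }
    }
}
```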
[jira] [Commented] (SPARK-35027) Close the inputStream in FileAppender when writing the logs failure
[ https://issues.apache.org/jira/browse/SPARK-35027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330976#comment-17330976 ] Sean R. Owen commented on SPARK-35027: -- Are you sure? stop() is called on an error on these FileAppenders. > Close the inputStream in FileAppender when writing the logs failure > --- > > Key: SPARK-35027 > URL: https://issues.apache.org/jira/browse/SPARK-35027 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Jack Hu >Priority: Major > > In a Spark cluster, the ExecutorRunner uses FileAppender to redirect the > stdout/stderr of executors to files. When writing fails for some reason (e.g. > the disk is full), the FileAppender only closes its stream to the file but > leaves the pipe's stdout/stderr open, so subsequent write operations on the > executor side may hang. > Do we need to close the inputStream in FileAppender? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
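The fix being suggested in the report above can be sketched in a few lines (an assumed shape for illustration; Spark's actual `FileAppender` differs): close the reading end of the pipe in a `finally` block, so an I/O failure while writing the log file (e.g. disk full) cannot leave the child process blocked writing into a pipe nobody drains.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative appender loop: copy the process's stdout/stderr (in) into a
// log file (out). The finally block closes *both* streams, so even if
// out.write throws, the pipe's read end is released rather than left open.
public class AppendSketch {
    static long appendStream(InputStream in, OutputStream out) throws IOException {
        long copied = 0;
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                copied += n;
            }
        } finally {
            // Close the input even when the write side failed; nest the
            // closes so a failure in in.close() still closes out.
            try { in.close(); } finally { out.close(); }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = appendStream(new ByteArrayInputStream("log line\n".getBytes()), out);
        System.out.println(n); // prints 9
    }
}
```

Whether this is needed on top of `stop()` (per Sean's comment) depends on whether `stop()` is reliably reached on the write-failure path.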
[jira] [Resolved] (SPARK-35046) Wrong memory allocation on standalone mode cluster
[ https://issues.apache.org/jira/browse/SPARK-35046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35046. -- Resolution: Invalid > Wrong memory allocation on standalone mode cluster > -- > > Key: SPARK-35046 > URL: https://issues.apache.org/jira/browse/SPARK-35046 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 3.0.1 >Reporter: Mohamadreza Rostami >Priority: Major > > I see a bug in executor memory allocation in standalone clusters, but I > can't find which part of the Spark code causes this problem. That's why I > decided to raise this issue here. > Assume you have 3 workers, each with 10 CPU cores and 10 gigabytes of memory. > Also assume you have 2 Spark jobs running on this cluster, with their configs set > as below: > - > job-1: > executor-memory: 5g > executor-CPU: 4 > max-cores: 8 > -- > job-2: > executor-memory: 6g > executor-CPU: 4 > max-cores: 8 > -- > In this situation, we expect that if we submit both jobs, the first job > submitted gets 2 executors, each with 4 CPU cores and 5g of memory, and the > second job gets only one executor, on the third worker, with 4 CPU cores and > 6g of memory, because workers 1 and 2 don't have enough memory left to accept > the second job. But surprisingly, we see that the first or second worker > creates an executor for job-2, and that worker's memory consumption goes > beyond what was allocated to it, taking 11g of memory from the operating > system. > Is this behavior normal? I think this can cause undefined behavior > in the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
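The behavior the reporter expected amounts to a simple admission rule, sketched below (a hedged illustration; Spark's real standalone scheduling logic in the Master is more involved, and the method name `canLaunchExecutor` here is just descriptive): a worker should launch an executor only if it has both enough free cores and enough free memory.

```java
// Illustrative admission check for a standalone worker. With 10g total and a
// 4-core/5g executor already running, 5g remains free, so a 6g executor
// should be rejected on that worker.
public class WorkerAdmission {
    static boolean canLaunchExecutor(int freeCores, long freeMemMb,
                                     int execCores, long execMemMb) {
        return freeCores >= execCores && freeMemMb >= execMemMb;
    }

    public static void main(String[] args) {
        // Worker 1 after job-1's executor: 6 cores and 5g free; job-2 asks 4 cores/6g.
        System.out.println(canLaunchExecutor(6, 5 * 1024, 4, 6 * 1024));   // false
        // The idle third worker: 10 cores and 10g free.
        System.out.println(canLaunchExecutor(10, 10 * 1024, 4, 6 * 1024)); // true
    }
}
```

The report describes a worker apparently violating the memory half of this rule (11g consumed on a 10g worker), which is why the reporter asks whether it is normal.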
[jira] [Resolved] (SPARK-35054) Getting Critical Vulnerability CVE-2021-20231 on spark 3.0.0 branch
[ https://issues.apache.org/jira/browse/SPARK-35054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35054. -- Resolution: Invalid There's no info about what these are or if they even affect Spark. > Getting Critical Vulnerability CVE-2021-20231 on spark 3.0.0 branch > --- > > Key: SPARK-35054 > URL: https://issues.apache.org/jira/browse/SPARK-35054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shashank Jain >Priority: Major > > Currently, while running a Trivy scan on the Spark build, we are getting the > following critical vulnerabilities: > CVE-2021-20231 > CVE-2021-20232 > How can we fix these vulnerabilities in the spark 3.0.0 branch? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330971#comment-17330971 ] Sean R. Owen commented on SPARK-35156: -- Can you just use a later version? it may be a problem that was fixed. Or some issue in how you packager your app, like, having it include incompatible K8S classes. > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.1 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.fasterxml.jackson.dataformat.yaml.YAMLFactory > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 19 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For addi
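One quick way to narrow down a `NoClassDefFoundError` like the one above is to probe the classpath for the class named in the stack trace. This is a generic diagnostic sketch, not a Spark API; run with the spark-submit classpath, it shows whether `jackson-dataformat-yaml` made it into the jars produced by the sbt build.

```java
// Probe whether a class is resolvable on the current classpath without
// initializing it. A 'false' for the YAMLFactory class would confirm the
// dependency is missing from the assembled jars rather than broken at runtime.
public class ClasspathProbe {
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(onClasspath("com.fasterxml.jackson.dataformat.yaml.YAMLFactory"));
        System.out.println(onClasspath("java.lang.String")); // true
    }
}
```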
[jira] [Resolved] (SPARK-35193) Scala/Java compatibility issue Re: how to use externalResource in java transformer from Scala Transformer?
[ https://issues.apache.org/jira/browse/SPARK-35193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35193. -- Resolution: Invalid I think this should be a question to the user@ list - I don't see reason to believe it's a Spark issue. There are several things that could be wrong, like, ExternalResourceParam not extending Param or not having the right name, etc. > Scala/Java compatibility issue Re: how to use externalResource in java > transformer from Scala Transformer? > -- > > Key: SPARK-35193 > URL: https://issues.apache.org/jira/browse/SPARK-35193 > Project: Spark > Issue Type: Bug > Components: Java API, ML >Affects Versions: 3.1.1 >Reporter: Arthur >Priority: Major > > I am trying to make a custom transformer use an externalResource, as it > requires a large table to do the transformation. I'm not super familiar with > scala syntax, but from snippets found on the internet I think I've made a > proper java implementation. I am running into the following error: > Exception in thread "main" java.lang.IllegalArgumentException: requirement > failed: Param HardMatchDetector_d95b8f699114__externalResource does not > belong to HardMatchDetector_d95b8f699114. 
> at scala.Predef$.require(Predef.scala:281) > at org.apache.spark.ml.param.Params.shouldOwn(params.scala:851) > at org.apache.spark.ml.param.Params.set(params.scala:727) > at org.apache.spark.ml.param.Params.set$(params.scala:726) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at HardMatchDetector.setResource(HardMatchDetector.java:45) > > Code as follows: > {code:java} > public class HardMatchDetector extends Transformer implements > DefaultParamsWritable, DefaultParamsReadable, Serializable { > public String inputColumn = "value"; > public String outputColumn = "hardMatches"; > private ExternalResourceParam resourceParam = new > ExternalResourceParam(this, "externalResource", "external resource, parquet > file with 2 columns, one names and one wordcount");; > private String uid; > public HardMatchDetector setResource(final ExternalResource value) > { return (HardMatchDetector)this.set(this.resourceParam, value); } > public HardMatchDetector setResource(final String path) > { return this.setResource(new ExternalResource(path, ReadAs.TEXT(), new > HashMap())); } > @Override > public String uid() > { return getUid(); } > private String getUid() { > if (uid == null) > { uid = Identifiable$.MODULE$.randomUID("HardMatchDetector"); } > return uid; > } > @Override > public Dataset transform(final Dataset dataset) > { return dataset; } > @Override > public StructType transformSchema(StructType schema) > { return schema.add(DataTypes.createStructField(outputColumn, > DataTypes.StringType, true)); } > @Override > public Transformer copy(ParamMap extra) > { return new HardMatchDetector(); } > } > public class HardMatcherTest extends AbstractSparkTest > { @Test > public void test() > { > var hardMatcher = new HardMatchDetector().setResource(pathName); } > } > {code} > > -- This 
message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
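The "does not belong to" failure above comes from an ownership check: Spark ML only lets a stage `set` a `Param` that the stage itself exposes, and it discovers params by reflecting over the stage's public members. The sketch below is a simplified model of that rule (class and method names are illustrative, not Spark's actual ones), which is consistent with Sean's point that the param may not be declared in a way Spark can find.

```java
import java.util.List;

// Simplified model of the ownership check behind Params.shouldOwn: the exact
// Param instance must be among those the stage exposes. A param kept only in
// a private field with no public accessor is never discovered, so setting it
// fails even though the uids printed in the error look identical.
public class ParamOwnership {
    static final class Param {
        final String parentUid; final String name;
        Param(String parentUid, String name) { this.parentUid = parentUid; this.name = name; }
        @Override public String toString() { return parentUid + "__" + name; }
    }

    static void shouldOwn(String stageUid, List<Param> exposedParams, Param p) {
        boolean owned = exposedParams.stream().anyMatch(q -> q == p)
            && stageUid.equals(p.parentUid);
        if (!owned) {
            throw new IllegalArgumentException("Param " + p + " does not belong to " + stageUid);
        }
    }

    public static void main(String[] args) {
        Param exposed = new Param("stage_1", "externalResource");
        shouldOwn("stage_1", List.of(exposed), exposed);          // ok: instance is exposed
        Param hidden = new Param("stage_1", "externalResource");  // distinct, undiscovered instance
        try {
            shouldOwn("stage_1", List.of(exposed), hidden);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same-looking uids, still rejected
        }
    }
}
```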
[jira] [Resolved] (SPARK-34430) Update index.md with a pyspark hint to avoid java.nio.DirectByteBuffer.(long, int) not available
[ https://issues.apache.org/jira/browse/SPARK-34430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-34430. -- Fix Version/s: (was: 3.0.0) Target Version/s: (was: 3.0.0) Resolution: Won't Fix > Update index.md with a pyspark hint to avoid java.nio.DirectByteBuffer.(long, > int) not available > > > Key: SPARK-34430 > URL: https://issues.apache.org/jira/browse/SPARK-34430 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Marco van der Linden >Priority: Trivial > Labels: pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > It took us a while to figure out how to fix this with pyspark; this might save a > few people a few hours... > > The documentation only vaguely describes how to fix the issue by setting a > parameter, without an actual working example. > The given PySpark example should hold enough information to set this in other > scenarios as well. > > > Kept the change to the docs as small as possible. > h3. What changes were proposed in this pull request? > doc update, see title > h3. Why are the changes needed? > save people time figuring out how to resolve it > h3. Does this PR introduce _any_ user-facing change? > no > h3. How was this patch tested? > no code changes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35197) Accumulators Explore Page on Spark UI on History Server
[ https://issues.apache.org/jira/browse/SPARK-35197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frida Montserrat Pulido Padilla updated SPARK-35197: Summary: Accumulators Explore Page on Spark UI on History Server (was: Accumulators Explore Page on Spark UI in History Server) > Accumulators Explore Page on Spark UI on History Server > --- > > Key: SPARK-35197 > URL: https://issues.apache.org/jira/browse/SPARK-35197 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Web UI >Affects Versions: 2.4.4 >Reporter: Frida Montserrat Pulido Padilla >Priority: Minor > Labels: accumulators, ui > Fix For: 2.4.4 > > > Proposal for an *Accumulators Explore Page* on the *SparkUI*: the > accumulator-specific information will be located under a new tab that has an > overview page, with links to see more details about the accumulators by a > particular name or stage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330920#comment-17330920 ] Wei Xue commented on SPARK-35133: - I'm not against fixing it. But just wondering is it even worth the trouble? > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34458) Spark-hive: apache hive dependency with CVEs
[ https://issues.apache.org/jira/browse/SPARK-34458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330873#comment-17330873 ] Bhupesh edited comment on SPARK-34458 at 4/23/21, 4:48 PM: --- I am going to work on it was (Author: bdhiman84): I found that, this is already upgraded twice. Following are the git link of change. * * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] > Spark-hive: apache hive dependency with CVEs > > > Key: SPARK-34458 > URL: https://issues.apache.org/jira/browse/SPARK-34458 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Gang Liang >Priority: Major > > Apache hive version 2.3.7 used by spark-hive (version 3.0.1) has the > following CVEs, as reported by our security team. > CVE-2017-12625, CVE-2015-1772, CVE-2016-3083, CVE-2018-11777, CVE-2014-0228 > Please upgrade apache hive libraries to a higher version with no known > security risks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35204: Assignee: Wenchen Fan > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35204. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32312 [https://github.com/apache/spark/pull/32312] > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34458) Spark-hive: apache hive dependency with CVEs
[ https://issues.apache.org/jira/browse/SPARK-34458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330873#comment-17330873 ] Bhupesh edited comment on SPARK-34458 at 4/23/21, 3:44 PM: --- I found that, this is already upgraded twice. Following are the git link of change. * * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] was (Author: bdhiman84): I found that, this is already upgraded twice. Following are the git link of change. * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] > Spark-hive: apache hive dependency with CVEs > > > Key: SPARK-34458 > URL: https://issues.apache.org/jira/browse/SPARK-34458 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Gang Liang >Priority: Major > > Apache hive version 2.3.7 used by spark-hive (version 3.0.1) has the > following CVEs, as reported by our security team. > CVE-2017-12625, CVE-2015-1772, CVE-2016-3083, CVE-2018-11777, CVE-2014-0228 > Please upgrade apache hive libraries to a higher version with no known > security risks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34458) Spark-hive: apache hive dependency with CVEs
[ https://issues.apache.org/jira/browse/SPARK-34458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330873#comment-17330873 ] Bhupesh commented on SPARK-34458: - I found that this has already been upgraded twice. Following are the Git links of the changes. * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] > Spark-hive: apache hive dependency with CVEs > > > Key: SPARK-34458 > URL: https://issues.apache.org/jira/browse/SPARK-34458 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Gang Liang >Priority: Major > > Apache Hive version 2.3.7, used by spark-hive (version 3.0.1), has the > following CVEs, as reported by our security team. > CVE-2017-12625, CVE-2015-1772, CVE-2016-3083, CVE-2018-11777, CVE-2014-0228 > Please upgrade the Apache Hive libraries to a higher version with no known > security risks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35162) New SQL functions: TRY_ADD/TRY_DIVIDE
[ https://issues.apache.org/jira/browse/SPARK-35162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35162: --- Summary: New SQL functions: TRY_ADD/TRY_DIVIDE (was: New SQL functions: TRY_ADD/TRY_SUBTRACT/TRY_MULTIPLY/TRY_DIVIDE/TRY_DIV) > New SQL functions: TRY_ADD/TRY_DIVIDE > - > > Key: SPARK-35162 > URL: https://issues.apache.org/jira/browse/SPARK-35162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35161) Error-handling SQL functions
[ https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35161: --- Summary: Error-handling SQL functions (was: Safe version SQL functions) > Error-handling SQL functions > > > Key: SPARK-35161 > URL: https://issues.apache.org/jira/browse/SPARK-35161 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > Create new safe version SQL functions for existing SQL functions/operators, > which returns NULL if overflow/error occurs. So that: > 1. Users can manage to finish queries without interruptions in ANSI mode. > 2. Users can get NULLs instead of unreasonable results if overflow occurs > when ANSI mode is off. > For example, the behavior of the following SQL operations is unreasonable: > {code:java} > 2147483647 + 2 => -2147483647 > CAST(2147483648L AS INT) => -2147483648 > {code} > With the new safe version SQL functions: > {code:java} > TRY_ADD(2147483647, 2) => null > TRY_CAST(2147483648L AS INT) => null > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
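The semantics proposed above can be sketched in a few lines. This is an illustrative Python model, not Spark's implementation: the function names mirror the proposed SQL functions, and the explicit 32-bit range check is an assumption about how overflow detection would behave.

```python
# Hedged sketch of the TRY_* semantics: overflow yields NULL (None here)
# instead of a wrapped result or an error.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def try_add(a, b):
    """Model of TRY_ADD: None on 32-bit overflow, the sum otherwise."""
    result = a + b
    return result if INT_MIN <= result <= INT_MAX else None

def try_cast_int(value):
    """Model of TRY_CAST(... AS INT): None when the value does not fit."""
    return value if INT_MIN <= value <= INT_MAX else None

def wrapping_add(a, b):
    """The 'unreasonable' behavior: plain 32-bit two's-complement wraparound."""
    return (a + b + 2**31) % 2**32 - 2**31

assert wrapping_add(2147483647, 2) == -2147483647  # 2147483647 + 2 wraps
assert try_add(2147483647, 2) is None              # TRY_ADD(2147483647, 2) => null
assert try_cast_int(2147483648) is None            # TRY_CAST(2147483648L AS INT) => null
assert try_add(1, 2) == 3                          # in-range arithmetic is unchanged
```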
[jira] [Updated] (SPARK-35161) Error-handling SQL functions
[ https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35161: --- Description: Create new Error-handling version SQL functions for existing SQL functions/operators, which returns NULL if overflow/error occurs. So that: 1. Users can manage to finish queries without interruptions in ANSI mode. 2. Users can get NULLs instead of unreasonable results if overflow occurs when ANSI mode is off. For example, the behavior of the following SQL operations is unreasonable: {code:java} 2147483647 + 2 => -2147483647 CAST(2147483648L AS INT) => -2147483648 {code} With the new safe version SQL functions: {code:java} TRY_ADD(2147483647, 2) => null TRY_CAST(2147483648L AS INT) => null {code} was: Create new safe version SQL functions for existing SQL functions/operators, which returns NULL if overflow/error occurs. So that: 1. Users can manage to finish queries without interruptions in ANSI mode. 2. Users can get NULLs instead of unreasonable results if overflow occurs when ANSI mode is off. For example, the behavior of the following SQL operations is unreasonable: {code:java} 2147483647 + 2 => -2147483647 CAST(2147483648L AS INT) => -2147483648 {code} With the new safe version SQL functions: {code:java} TRY_ADD(2147483647, 2) => null TRY_CAST(2147483648L AS INT) => null {code} > Error-handling SQL functions > > > Key: SPARK-35161 > URL: https://issues.apache.org/jira/browse/SPARK-35161 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > Create new Error-handling version SQL functions for existing SQL > functions/operators, which returns NULL if overflow/error occurs. So that: > 1. Users can manage to finish queries without interruptions in ANSI mode. > 2. Users can get NULLs instead of unreasonable results if overflow occurs > when ANSI mode is off. 
> For example, the behavior of the following SQL operations is unreasonable: > {code:java} > 2147483647 + 2 => -2147483647 > CAST(2147483648L AS INT) => -2147483648 > {code} > With the new safe version SQL functions: > {code:java} > TRY_ADD(2147483647, 2) => null > TRY_CAST(2147483648L AS INT) => null > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330803#comment-17330803 ] Hyukjin Kwon commented on SPARK-35196: -- Yeah, I think it's all implemented properly. We should probably add the alias at [https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CompressionCodecs.scala#L30-L36], and fix the documentation in DataFrameWriter.scala, DataStreamWriter.scala, streaming.py and readwriter.py > DataFrameWriter.text support zstd compression > - > > Key: SPARK-35196 > URL: https://issues.apache.org/jira/browse/SPARK-35196 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Leonard Lausen >Priority: Major > > [http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html] > specifies that only the following compression codecs are supported: `none, > bzip2, gzip, lz4, snappy and deflate` > However, the RDD API supports compression with zstd if users specify the > 'org.apache.hadoop.io.compress.ZStandardCodec' compressor in the > saveAsTextFile method. > Please also expose zstd in the DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
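The kind of change discussed in the comment is a short-name-to-codec-class mapping plus an alias resolver. The sketch below is Python for illustration only (the real table lives in CompressionCodecs.scala); the Hadoop codec class names are real, but the function and the exact alias set are assumptions.

```python
# Illustrative short-name -> Hadoop codec class mapping, with the proposed
# "zstd" alias added. Not Spark's actual code.
CODEC_ALIASES = {
    "none": None,
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "deflate": "org.apache.hadoop.io.compress.DeflateCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "lz4": "org.apache.hadoop.io.compress.Lz4Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "zstd": "org.apache.hadoop.io.compress.ZStandardCodec",  # proposed addition
}

def resolve_codec(name):
    """Resolve a user-supplied short name, or pass through a fully-qualified
    codec class name unchanged."""
    key = name.lower()
    if key in CODEC_ALIASES:
        return CODEC_ALIASES[key]
    return name  # assume it is already a codec class name

assert resolve_codec("zstd") == "org.apache.hadoop.io.compress.ZStandardCodec"
assert resolve_codec("GZIP") == "org.apache.hadoop.io.compress.GzipCodec"
```

With such an alias in place, `df.write.option("compression", "zstd").text(...)` would resolve to the Hadoop ZStandardCodec the same way the existing short names do.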
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330795#comment-17330795 ] Leonard Lausen commented on SPARK-35196: Great. Adding the alias should be straightforward but a helpful addition. I found the Python interface at [https://github.com/apache/spark/blob/faa928cefc8c1c6d7771aacd2ae7670162346361/python/pyspark/sql/readwriter.py#L1300-L1301] Could you point out where the _jdf.write / _jwrite / _jwriter are implemented? I suspect the alias needs to be added there. > DataFrameWriter.text support zstd compression > - > > Key: SPARK-35196 > URL: https://issues.apache.org/jira/browse/SPARK-35196 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Leonard Lausen >Priority: Major > > [http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html] > specifies that only the following compression codecs are supported: `none, > bzip2, gzip, lz4, snappy and deflate` > However, RDD API supports compression with zstd if users specify > 'org.apache.hadoop.io.compress.ZStandardCodec' compressor in the > saveAsTextFile method. > Please also expose zstd in the DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330460#comment-17330460 ] Apache Spark commented on SPARK-35206: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/32315 > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35206: Assignee: Apache Spark > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330461#comment-17330461 ] Apache Spark commented on SPARK-35206: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/32315 > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35206: Assignee: (was: Apache Spark) > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
wuyi created SPARK-35206: Summary: Extract common get project path ability as function to SparkFunctionSuite Key: SPARK-35206 URL: https://issues.apache.org/jira/browse/SPARK-35206 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 3.2.0 Reporter: wuyi Spark sql has test suites to read resources when running tests. The way of getting the path of resources is commonly used in different suites. We can extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
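The refactor idea above (one shared helper that resolves the project root and builds resource paths, instead of each suite repeating the lookup) can be sketched as follows. This is an illustrative Python sketch, not the Scala helper the issue proposes; the `SPARK_HOME` environment-variable fallback is an assumption.

```python
# Hedged sketch of a shared test-resource path helper.
import os

def get_workspace_file_path(*parts):
    """Resolve a path relative to the project root.

    Prefers an explicit environment variable and falls back to the current
    working directory, mirroring how test suites often locate checked-in
    resource files.
    """
    root = os.environ.get("SPARK_HOME", os.getcwd())
    return os.path.join(root, *parts)

path = get_workspace_file_path("sql", "core", "src", "test", "resources")
assert path.endswith(os.path.join("sql", "core", "src", "test", "resources"))
```

Centralizing the lookup means a change to how the root is found (e.g. a new environment variable) touches one function rather than every suite.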
[jira] [Resolved] (SPARK-35201) Format empty grouping set exception in CUBE/ROLLUP
[ https://issues.apache.org/jira/browse/SPARK-35201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-35201. -- Fix Version/s: 3.2.0 Assignee: angerszhu Resolution: Fixed Resolved by https://github.com/apache/spark/pull/32307 > Format empty grouping set exception in CUBE/ROLLUP > -- > > Key: SPARK-35201 > URL: https://issues.apache.org/jira/browse/SPARK-35201 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Format empty grouping set exception in CUBE/ROLLUP -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35123) read partitioned parquet: my_col=NOW replaced by on read()
[ https://issues.apache.org/jira/browse/SPARK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35123. -- Resolution: Duplicate > read partitioned parquet: my_col=NOW replaced by on read() > - > > Key: SPARK-35123 > URL: https://issues.apache.org/jira/browse/SPARK-35123 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Killian >Priority: Major > > When reading a Parquet file partitioned on a column containing the value > "NOW", the value is interpreted as now() and replaced by the current time at > the moment the read() function is executed > {code:java} > // steps to reproduce > df = spark.createDataFrame(data=[("NOW",1), ("TEST", 2)], schema = ["col1", > "id"]) > df.write.partitionBy("col1").parquet("test/test.parquet") > >>> /home/test/test.parquet/col1=NOW > df_loaded = spark.read.option( > "basePath", > "test/test.parquet", > ).parquet("test/test.parquet/col1=*") > >>> > +---+--+ > |id |col1 | > +---+--+ > |2 |TEST | > |1 |2021-04-18 14:36:46.532273| > +---+--+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
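The hazard in the report can be sketched with a toy partition-value parser: a lenient cast that honors special datetime literals will turn the directory name `col1=NOW` into a timestamp. This is a hedged illustration of the mechanism only; the `SPECIAL_LITERALS` set, the function, and the boolean switch are assumptions, not Spark's actual inference code.

```python
# Toy model of partition-value inference with and without special
# datetime literals enabled.
from datetime import datetime

SPECIAL_LITERALS = {"now", "today", "epoch"}  # illustrative subset

def infer_partition_value(raw, allow_special_literals):
    """Return a timestamp for special literals when the lenient mode is on,
    otherwise keep the raw string."""
    if raw.lower() in SPECIAL_LITERALS:
        if allow_special_literals:
            return datetime.now()  # the string silently becomes a timestamp
        return raw                 # strict mode: keep the original value
    return raw

assert isinstance(infer_partition_value("NOW", True), datetime)  # the reported bug
assert infer_partition_value("NOW", False) == "NOW"              # the intended fix
assert infer_partition_value("TEST", True) == "TEST"             # ordinary values unaffected
```

The fix referenced in the linked PR is, in effect, the strict branch: disallow special datetime literals when casting strings during partition inference.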
[jira] [Commented] (SPARK-35123) read partitioned parquet: my_col=NOW replaced by on read()
[ https://issues.apache.org/jira/browse/SPARK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330391#comment-17330391 ] Max Gekk commented on SPARK-35123: -- The PR [https://github.com/apache/spark/pull/31549] should fix this particular case. [~salticidae] Can you reproduce the issue on the master? > read partitioned parquet: my_col=NOW replaced by on read() > - > > Key: SPARK-35123 > URL: https://issues.apache.org/jira/browse/SPARK-35123 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Killian >Priority: Major > > When reading parquet file partitioned with a column containing the value > "NOW", The value is interpreted as now() and replaced by the current time at > the moment of the read() funct is executed > {code:java} > // step to reproduce > df = spark.createDataFrame(data=[("NOW",1), ("TEST", 2)], schema = ["col1", > "id"]) > df.write.partitionBy("col1").parquet("test/test.parquet") > >>> /home/test/test.parquet/col1=NOW > df_loaded = spark.read.option( > "basePath", > "test/test.parquet", > ).parquet("test/test.parquet/col1=*") > >>> > +---+--+ > |id |col1 | > +---+--+ > |2 |TEST | > |1 |2021-04-18 14:36:46.532273| > +---+--+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330389#comment-17330389 ] Hyukjin Kwon commented on SPARK-35133: -- cc [~maryannxue] and [~Ngone51] FYI > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35123) read partitioned parquet: my_col=NOW replaced by on read()
[ https://issues.apache.org/jira/browse/SPARK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330390#comment-17330390 ] Hyukjin Kwon commented on SPARK-35123: -- [~maxgekk] FYI > read partitioned parquet: my_col=NOW replaced by on read() > - > > Key: SPARK-35123 > URL: https://issues.apache.org/jira/browse/SPARK-35123 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Killian >Priority: Major > > When reading parquet file partitioned with a column containing the value > "NOW", The value is interpreted as now() and replaced by the current time at > the moment of the read() funct is executed > {code:java} > // step to reproduce > df = spark.createDataFrame(data=[("NOW",1), ("TEST", 2)], schema = ["col1", > "id"]) > df.write.partitionBy("col1").parquet("test/test.parquet") > >>> /home/test/test.parquet/col1=NOW > df_loaded = spark.read.option( > "basePath", > "test/test.parquet", > ).parquet("test/test.parquet/col1=*") > >>> > +---+--+ > |id |col1 | > +---+--+ > |2 |TEST | > |1 |2021-04-18 14:36:46.532273| > +---+--+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35149) I am facing this issue regularly, how to fix this issue.
[ https://issues.apache.org/jira/browse/SPARK-35149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330388#comment-17330388 ] Hyukjin Kwon commented on SPARK-35149: -- For questions, please use Spark mailing list. > I am facing this issue regularly, how to fix this issue. > > > Key: SPARK-35149 > URL: https://issues.apache.org/jira/browse/SPARK-35149 > Project: Spark > Issue Type: Question > Components: Spark Submit >Affects Versions: 2.2.2 >Reporter: Eppa Rakesh >Priority: Critical > > 21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 > java.io.EOFException: Unexpected EOF while trying to read response from > server > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) > at > org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086) > 21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline > [DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK], > > DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK], > > DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]: > datanode > 0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK]) > is bad. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35149) I am facing this issue regularly, how to fix this issue.
[ https://issues.apache.org/jira/browse/SPARK-35149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35149. -- Resolution: Invalid > I am facing this issue regularly, how to fix this issue. > > > Key: SPARK-35149 > URL: https://issues.apache.org/jira/browse/SPARK-35149 > Project: Spark > Issue Type: Question > Components: Spark Submit >Affects Versions: 2.2.2 >Reporter: Eppa Rakesh >Priority: Critical > > 21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 > java.io.EOFException: Unexpected EOF while trying to read response from > server > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) > at > org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086) > 21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline > [DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK], > > DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK], > > DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]: > datanode > 0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK]) > is bad. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop
[ https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330387#comment-17330387 ] Hyukjin Kwon commented on SPARK-35154: -- {{RpcEndpoint}} isn't an API. > Rpc env not shutdown when shutdown method call by endpoint onStop > - > > Key: SPARK-35154 > URL: https://issues.apache.org/jira/browse/SPARK-35154 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark-3.x >Reporter: LIU >Priority: Minor > > When I run this code, the RPC thread hangs and does not close gracefully. > I think that when the RPC thread calls shutdown from the onStop method, it puts > MessageLoop.PoisonPill on the queue to make the threads in the RPC pool return > and stop. In Spark 3.x, this makes the other threads return and stop, but the > current thread, which called onStop, awaits termination of the very pool it > belongs to. As a result, the current thread never stops and the program hangs. > I'm not sure whether this needs to be improved or not. > > {code:java} > // code placeholder{code} > test("Rpc env not shutdown when shutdown method call by endpoint onStop") { > val rpcEndpoint = new RpcEndpoint { > override val rpcEnv: RpcEnv = env > override def onStop(): Unit = { > env.shutdown() > env.awaitTermination() > } > override def receiveAndReply(context: RpcCallContext): > PartialFunction[Any, Unit] = { > case m => context.reply(m) > } > } > env.setupEndpoint("test", rpcEndpoint) > rpcEndpoint.stop() > env.awaitTermination() > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
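The self-wait described in the report has a close analogue in any thread pool: a task that shuts the pool down and then waits for the pool to terminate is waiting on itself. The sketch below uses Python's `ThreadPoolExecutor` purely as an analogy for Spark's RPC message loop, not as its implementation; the safe pattern is to trigger shutdown without waiting from inside the pool and to await termination from a thread outside it.

```python
# Thread-pool analogy: shutting down from inside a worker must not wait.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def on_stop():
    # Spark analogue: calling awaitTermination() here (i.e. shutdown with
    # wait=True) would make this worker wait for its own pool to drain,
    # which includes itself. wait=False only signals shutdown and returns.
    pool.shutdown(wait=False)
    return "stopped"

future = pool.submit(on_stop)
# The outside thread is the right place to wait for completion.
assert future.result(timeout=5) == "stopped"
```

This mirrors the issue's conclusion: `env.shutdown()` plus `env.awaitTermination()` inside `onStop()` runs on a pool thread, so the await can never be satisfied; awaiting must happen from outside the pool.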
[jira] [Resolved] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop
[ https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35154. -- Resolution: Invalid > Rpc env not shutdown when shutdown method call by endpoint onStop > - > > Key: SPARK-35154 > URL: https://issues.apache.org/jira/browse/SPARK-35154 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark-3.x >Reporter: LIU >Priority: Minor > > when i use this code to work, Rpc thread hangs up and not close gracefully. > i think when rpc thread called shutdown on OnStop method, it will try to put > MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, > it will make others thread return & stop but current thread which call OnStop > method to await current pool to stop. it makes current thread not stop, and > pending program. > I'm not sure that needs to be improved or not? > > {code:java} > //代码占位符{code} > test("Rpc env not shutdown when shutdown method call by endpoint onStop") { > val rpcEndpoint = new RpcEndpoint { > override val rpcEnv: RpcEnv = env > override def onStop(): Unit = { > env.shutdown() > env.awaitTermination() > } > override def receiveAndReply(context: RpcCallContext): > PartialFunction[Any, Unit] = { > case m => context.reply(m) > } > } > env.setupEndpoint("test", rpcEndpoint) > rpcEndpoint.stop() > env.awaitTermination() > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330386#comment-17330386 ] Hyukjin Kwon commented on SPARK-35156: -- [~viirya] no big deal but: {quote} Master, branch-3.1 are okay. Branch-3.1 is affected {quote} Do you mean Branch-3.0 is affected alone? > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.1 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at 
org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.fasterxml.jackson.dataformat.yaml.YAMLFactory > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 19 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: i
[jira] [Commented] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token
[ https://issues.apache.org/jira/browse/SPARK-35160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330385#comment-17330385 ] Hyukjin Kwon commented on SPARK-35160: -- [~mauzhang] if this is a question, it would be better to ask it on the mailing list. If you file an issue, it would be very helpful to state what change you would propose. > Spark application submitted despite failing to get Hive delegation token > > > Key: SPARK-35160 > URL: https://issues.apache.org/jira/browse/SPARK-35160 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.1.1 >Reporter: Manu Zhang >Priority: Major > > Currently, when running on YARN and failing to get Hive delegation token, a > Spark SQL application will still be submitted. Eventually, the application > will fail when connecting to Hive metastore without a valid delegation token. > Is there any reason for this design? > cc [~jerryshao] who originally implemented this in > https://issues.apache.org/jira/browse/SPARK-14743 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35176) Raise TypeError in inappropriate type case rather than ValueError
[ https://issues.apache.org/jira/browse/SPARK-35176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330383#comment-17330383 ] Hyukjin Kwon commented on SPARK-35176: -- [~yikunkero] Please go ahead with a PR > Raise TypeError in inappropriate type case rather than ValueError > -- > > Key: SPARK-35176 > URL: https://issues.apache.org/jira/browse/SPARK-35176 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Minor > > There are many places that wrongly raise ValueError. > When an operation or function is applied to an object of inappropriate type, > we should use TypeError rather than ValueError, > such as: > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1137] > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1228] > > We should make these corrections at an appropriate time; note that doing so > will break code that catches the original ValueError. > > [1] https://docs.python.org/3/library/exceptions.html#TypeError -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
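To illustrate the distinction SPARK-35176 asks for, here is a minimal, hypothetical validator (not actual PySpark code; the function name is an illustration) that follows the proposed convention: a wrong *type* raises TypeError, while a bad *value* of the correct type raises ValueError.

```python
def validate_column_name(col):
    """Hypothetical validator illustrating the proposed convention:
    wrong type -> TypeError, bad value of the right type -> ValueError."""
    if not isinstance(col, str):
        # object of inappropriate type: TypeError, per the Python docs
        raise TypeError(f"col must be a str, got {type(col).__name__}")
    if not col:
        # right type, inappropriate value: ValueError
        raise ValueError("col must be a non-empty column name")
    return col
```

Callers that currently catch ValueError around type mistakes would break under this change, which is exactly the compatibility concern the issue raises.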
[jira] [Resolved] (SPARK-35184) Filtering a dataframe after groupBy and user-define-aggregate-function in Pyspark will cause java.lang.UnsupportedOperationException
[ https://issues.apache.org/jira/browse/SPARK-35184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35184. -- Resolution: Cannot Reproduce > Filtering a dataframe after groupBy and user-define-aggregate-function in > Pyspark will cause java.lang.UnsupportedOperationException > > > Key: SPARK-35184 > URL: https://issues.apache.org/jira/browse/SPARK-35184 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.4.0 >Reporter: Xiao Jin >Priority: Major > > I found some strange error when I'm coding Pyspark UDAF. After I call groupBy > function and agg function, I want to filter some data from remaining > dataframe, but it seems not work. My sample code is below. > {code:java} > >>> from pyspark.sql.functions import pandas_udf, PandasUDFType, col > >>> df = spark.createDataFrame( > ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], > ... ("id", "v")) > >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG) > ... def mean_udf(v): > ... return v.mean() > >>> df.groupby("id").agg(mean_udf(df['v']).alias("mean")).filter(col("mean") > >>> > 5).show() > {code} > The code above will cause exception printed below > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "/opt/spark/python/pyspark/sql/dataframe.py", line 378, in show > print(self._jdf.showString(n, 20, vertical)) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line > 1257, in __call__ > File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco > return f(*a, **kw) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line > 328, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o3717.showString. 
> : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, > tree: > Exchange hashpartitioning(id#1726L, 200) > +- *(1) Filter (mean_udf(v#1727) > 5.0) >+- Scan ExistingRDD[id#1726L,v#1727] > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391) > at > org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.AggregateInPandasExec.doExecute(AggregateInPandasExec.scala:80) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > at > org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339) > at > org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) > at > org.apache.spark.sql.Dataset.org$apache$spark$s
[jira] [Commented] (SPARK-35184) Filtering a dataframe after groupBy and user-define-aggregate-function in Pyspark will cause java.lang.UnsupportedOperationException
[ https://issues.apache.org/jira/browse/SPARK-35184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330381#comment-17330381 ] Hyukjin Kwon commented on SPARK-35184: -- Seems like it works in the latest master branch: {code:java} +---++ | id|mean| +---++ | 2| 6.0| +---++ {code} It would be great if we can identify and see if we can backport. > Filtering a dataframe after groupBy and user-define-aggregate-function in > Pyspark will cause java.lang.UnsupportedOperationException > > > Key: SPARK-35184 > URL: https://issues.apache.org/jira/browse/SPARK-35184 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.4.0 >Reporter: Xiao Jin >Priority: Major > > I found some strange error when I'm coding Pyspark UDAF. After I call groupBy > function and agg function, I want to filter some data from remaining > dataframe, but it seems not work. My sample code is below. > {code:java} > >>> from pyspark.sql.functions import pandas_udf, PandasUDFType, col > >>> df = spark.createDataFrame( > ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], > ... ("id", "v")) > >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG) > ... def mean_udf(v): > ... return v.mean() > >>> df.groupby("id").agg(mean_udf(df['v']).alias("mean")).filter(col("mean") > >>> > 5).show() > {code} > The code above will cause exception printed below > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "/opt/spark/python/pyspark/sql/dataframe.py", line 378, in show > print(self._jdf.showString(n, 20, vertical)) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line > 1257, in __call__ > File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco > return f(*a, **kw) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line > 328, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o3717.showString. 
> : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, > tree: > Exchange hashpartitioning(id#1726L, 200) > +- *(1) Filter (mean_udf(v#1727) > 5.0) >+- Scan ExistingRDD[id#1726L,v#1727] > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391) > at > org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.AggregateInPandasExec.doExecute(AggregateInPandasExec.scala:80) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > at > org.apache.spark.sql
[jira] [Assigned] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35169: Assignee: Apache Spark > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35169: Assignee: (was: Apache Spark) > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
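The wrapped result in the SPARK-35169 report can be reproduced outside Spark. The sketch below (plain Python, simulating Java's 32-bit two's-complement arithmetic) shows why Int.MinValue months divided by -1 silently comes back as the same minimal negative interval instead of overflowing:

```python
INT_MIN = -2**31  # java.lang.Integer.MIN_VALUE, the months field of the minimal YEAR TO MONTH interval

def wrap32(n):
    """Wrap an unbounded int into Java's 32-bit two's-complement range."""
    return (n + 2**31) % 2**32 - 2**31

q = wrap32(INT_MIN // -1)        # true quotient is 2**31, one past Int.MaxValue, so it wraps
years, months = divmod(-q, 12)   # decompose the wrapped months value back into years and months
```

Since `q` wraps back to Int.MinValue, the decomposition yields the reported INTERVAL '-178956970-8' YEAR TO MONTH; this is why the divide path must detect the (MinValue, -1) pair and raise an overflow error, as the issue requests.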
[jira] [Resolved] (SPARK-35190) all columns are read even if column pruning applies when spark3.0 read table written by spark2.2
[ https://issues.apache.org/jira/browse/SPARK-35190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35190. -- Resolution: Duplicate > all columns are read even if column pruning applies when spark3.0 read table > written by spark2.2 > > > Key: SPARK-35190 > URL: https://issues.apache.org/jira/browse/SPARK-35190 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark3.0 > set spark.sql.hive.convertMetastoreOrc=true (default value in spark3.0) > set spark.sql.orc.impl=native(default velue in spark3.0) >Reporter: xiaoli >Priority: Major > > Before I address this issue, let me talk about the issue background: The > current spark version we use is 2.2, and we plan to migrate to spark3.0 in > near future. Before migration, we test some query in both spark2.2 and > spark3.0 to check potential issue. The data source table of these query is > orc format written by spark2.2. > > I find that even if column pruning is applied, spark3.0’s native reader will > read all columns. > > Then I do remote debug. In OrcUtils.scala’s requestedColumnIds Method, it > will check whether field name is started with “_col”. In my case, field name > is started with “_col”, like “_col1”, “_col2”. So pruneCols is not done. The > code is below: > > if (orcFieldNames.forall(_.startsWith("_col"))) { > // This is a ORC file written by Hive, no field names in the physical > schema, assume the > // physical schema maps to the data scheme by index. > _assert_(orcFieldNames.length <= dataSchema.length, "The given data schema > " + > s"*$*{dataSchema.catalogString} has less fields than the actual ORC > physical schema, " + > "no idea which columns were dropped, fail to read.") > // for ORC file written by Hive, no field names > // in the physical schema, there is a need to send the > // entire dataSchema instead of required schema. 
> // So pruneCols is not done in this case > Some(requiredSchema.fieldNames.map { name => > val index = dataSchema.fieldIndex(name) > if (index < orcFieldNames.length) { > index > } else { > -1 > } > }, false) > > Although this code comment explains reason, I still do not understand. This > issue only happens in this case: spark3.0 uses native reader to read table > written by spark2.2. > > In other cases, there is no such issue. I do another 2 tests: > Test1: use spark3.0’s hive reader (running with > spark.sql.hive.convertMetastoreOrc=false and spark.sql.orc.impl=hive) to read > the same table, it only reads pruned columns. > Test2: use spark3.0 to write a table, then use spark3.0’s native reader to > read this new table, it only reads pruned columns. > > This issue I mentioned is a block we use native reader in spark3.0. Can > anyone know further reason or provide solutions? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35191) all columns are read even if column pruning applies when spark3.0 read table written by spark2.2
[ https://issues.apache.org/jira/browse/SPARK-35191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35191. -- Resolution: Duplicate > all columns are read even if column pruning applies when spark3.0 read table > written by spark2.2 > > > Key: SPARK-35191 > URL: https://issues.apache.org/jira/browse/SPARK-35191 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark3.0 > spark.sql.hive.convertMetastoreOrc=true(default value in spark3.0) > spark.sql.orc.impl=native(default value in spark3.0) >Reporter: xiaoli >Priority: Major > > Before I address this issue, let me talk about the issue background: The > current spark version we use is 2.2, and we plan to migrate to spark3.0 in > near future. Before migration, we test some query in both spark2.2 and > spark3.0 to check potential issue. The data source table of these query is > orc format written by spark2.2. > > I find that even if column pruning is applied, spark3.0’s native reader will > read all columns. > > Then I do remote debug. In OrcUtils.scala’s requestedColumnIds Method, it > will check whether field name is started with “_col”. In my case, field name > is started with “_col”, like “_col1”, “_col2”. So pruneCols is not done. The > code is below: > > if (orcFieldNames.forall(_.startsWith("_col"))) { > // This is a ORC file written by Hive, no field names in the physical > schema, assume the > // physical schema maps to the data scheme by index. > _assert_(orcFieldNames.length <= dataSchema.length, "The given data schema > " + > s"*$*{dataSchema.catalogString} has less fields than the actual ORC > physical schema, " + > "no idea which columns were dropped, fail to read.") > // for ORC file written by Hive, no field names > // in the physical schema, there is a need to send the > // entire dataSchema instead of required schema. 
> // So pruneCols is not done in this case > Some(requiredSchema.fieldNames.map { name => > val index = dataSchema.fieldIndex(name) > if (index < orcFieldNames.length) { > index > } else { > -1 > } > }, false) > > Although this code comment explains reason, I still do not understand. This > issue only happens in this case: spark3.0 uses native reader to read table > written by spark2.2. > > In other cases, there is no such issue. I do another 2 tests: > Test1: use spark3.0’s hive reader (running with > spark.sql.hive.convertMetastoreOrc=false and spark.sql.orc.impl=hive) to read > the same table, it only reads pruned columns. > Test2: use spark3.0 to write a table, then use spark3.0’s native reader to > read this new table, it only reads pruned columns. > > This issue I mentioned is a block we use native reader in spark3.0. Can > anyone know further reason or provide solutions? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
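The OrcUtils.requestedColumnIds branch quoted in the two reports above can be paraphrased in a small, self-contained sketch (plain Python standing in for the Scala; names and inputs are illustrative). It shows why, when the physical ORC field names are Hive-style _colN, required columns are mapped by position against the entire data schema rather than pruned by name:

```python
def requested_column_ids(orc_field_names, data_schema, required):
    """Sketch of the Hive-written-ORC branch: the physical schema carries no
    real field names, so each required column is resolved to its positional
    index in the full data schema (index-based mapping, no name-based pruning)."""
    if all(name.startswith("_col") for name in orc_field_names):
        assert len(orc_field_names) <= len(data_schema), "schema mismatch"
        return [data_schema.index(n) if data_schema.index(n) < len(orc_field_names) else -1
                for n in required]
    return None  # normal name-based pruning path, elided here

# A table whose physical names are _colN, as written via Hive conventions
ids = requested_column_ids(["_col0", "_col1", "_col2"], ["id", "name", "age"], ["age"])
```

Even though only "age" is required, it still has to be resolved against the whole data schema by index, which is consistent with the reporter's observation that pruning does not take effect for such files.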
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330379#comment-17330379 ] Hyukjin Kwon commented on SPARK-35196: -- It is supported: you can specify {{org.apache.hadoop.io.compress.ZStandardCodec}} as the compression option. However, I agree with adding a short name for ease of use. Are you interested in adding an alias? [~dongjoon] FYI > DataFrameWriter.text support zstd compression > - > > Key: SPARK-35196 > URL: https://issues.apache.org/jira/browse/SPARK-35196 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Leonard Lausen >Priority: Major > > [http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html] > specifies that only the following compression codecs are supported: `none, > bzip2, gzip, lz4, snappy and deflate` > However, the RDD API supports compression with zstd if users specify the > 'org.apache.hadoop.io.compress.ZStandardCodec' compressor in the > saveAsTextFile method. > Please also expose zstd in the DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
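As the comment notes, the codec can already be selected by its fully-qualified class name; the request is only for a "zstd" short name. A hedged PySpark fragment (assumes an existing DataFrame `df`, a writable output path, and Hadoop's zstd codec on the classpath; not runnable standalone):

```python
# Works today via the fully-qualified codec class; a "zstd" alias is the proposal
df.write.option(
    "compression",
    "org.apache.hadoop.io.compress.ZStandardCodec",
).text("/tmp/out-zstd")
```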
[jira] [Commented] (SPARK-35199) Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-35199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330377#comment-17330377 ] Hyukjin Kwon commented on SPARK-35199: -- cc [~dongjoon] FYI > Tasks are failing with zstd default of > spark.shuffle.mapStatus.compression.codec > > > Key: SPARK-35199 > URL: https://issues.apache.org/jira/browse/SPARK-35199 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.0.1 >Reporter: Leonard Lausen >Priority: Major > > In Spark 3.0.1, tasks fail with the default value of > {{spark.shuffle.mapStatus.compression.codec=zstd}}, but work without problem > when changing the value to {{spark.shuffle.mapStatus.compression.codec=lz4}}. > Exemplar backtrace: > > {code:java} > java.io.IOException: Decompression error: Version not supported at > com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:164) > at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:120) at > java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at > java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at > java.io.BufferedInputStream.read(BufferedInputStream.java:345) at > java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2781) > at > java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2797) > at > java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3274) > at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934) at > java.io.ObjectInputStream.(ObjectInputStream.java:396) at > org.apache.spark.MapOutputTracker$.deserializeObject$1(MapOutputTracker.scala:954) > at > org.apache.spark.MapOutputTracker$.deserializeMapStatuses(MapOutputTracker.scala:964) > at > org.apache.spark.MapOutputTrackerWorker.$anonfun$getStatuses$2(MapOutputTracker.scala:856) > at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64) at > 
org.apache.spark.MapOutputTrackerWorker.getStatuses(MapOutputTracker.scala:851) > at > org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:808) > at > org.apache.spark.shuffle.sort.SortShuffleManager.getReader(SortShuffleManager.scala:128) > at > org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:185) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at > org.apache.spark.scheduler.Task.run(Task.scala:127) at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > {{}} > Exemplar code to reproduce the issue > {code:java} > import pyspark.sql.functions as F > df = 
spark.read.text("s3://my-bucket-with-300GB-compressed-text-files") > df_rand = df.orderBy(F.rand(1)) > df_rand.write.text('s3://shuffled-output''){code} > See > [https://stackoverflow.com/questions/64876463/spark-3-0-1-tasks-are-failing-when-using-zstd-compression-codec] > for another report of this issue and workaround. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
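The workaround described in the report and the linked Stack Overflow thread is to switch the MapStatus codec back to lz4 when building the session. A hedged configuration sketch (requires a real Spark 3.0.1 deployment; not runnable standalone):

```python
from pyspark.sql import SparkSession

# spark.shuffle.mapStatus.compression.codec defaults to zstd in 3.0.1;
# setting it to lz4 avoids the reported decompression failure
spark = (
    SparkSession.builder
    .config("spark.shuffle.mapStatus.compression.codec", "lz4")
    .getOrCreate()
)
```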
[jira] [Commented] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330375#comment-17330375 ] Apache Spark commented on SPARK-35169: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32314 > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330374#comment-17330374 ] Apache Spark commented on SPARK-35169: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32314 > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35205) Simplify org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-35205: Summary: Simplify org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap. (was: Simplify operationType.getOperationType by using a hashMap.) > Simplify org.apache.hive.service.cli.OperationType.getOperationType by using > a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method in > `org.apache.hive.service.cli.OperationType`. > Introduce a *HashMap* to cache the existing enumeration values, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` is called in > OperationHandle's constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's worth improving the execution > efficiency of `OperationType.getOperationType` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35205) Refactor org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-35205: Summary: Refactor org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap. (was: Simplify org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap.) > Refactor org.apache.hive.service.cli.OperationType.getOperationType by using > a hashMap. > > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method in > `org.apache.hive.service.cli.OperationType`. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35205: Assignee: (was: Apache Spark) > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-35205: Description: Simplify the *getOperationType* method in `org.apache.hive.service.cli.OperationType`. Introduce a *HashMap* to cache the existing enumeration types, so as to avoid a linear search in a for loop. `*OperationType.getOperationType*` can be called in the OperationHandle constructor: {code:java} public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion protocol) { super(tOperationHandle.getOperationId()); this.opType = OperationType.getOperationType(tOperationHandle.getOperationType()); this.hasResultSet = tOperationHandle.isHasResultSet(); this.protocol = protocol; } {code} `*OperationHandle*` is widely used, so it's better to improve the execution efficiency of `OperationType.getOperationType`. was: Simplify the *getOperationType* method. Introduce a *HashMap* to cache the existing enumeration types, so as to avoid a linear search in a for loop. `*OperationType.getOperationType*` can be called in the OperationHandle constructor: {code:java} public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion protocol) { super(tOperationHandle.getOperationId()); this.opType = OperationType.getOperationType(tOperationHandle.getOperationType()); this.hasResultSet = tOperationHandle.isHasResultSet(); this.protocol = protocol; } {code} `*OperationHandle*` is widely used, so it's better to improve the execution efficiency of `OperationType.getOperationType`. > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method in > `org.apache.hive.service.cli.OperationType`. 
> Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330371#comment-17330371 ] Apache Spark commented on SPARK-35205: -- User 'kyoty' has created a pull request for this issue: https://github.com/apache/spark/pull/32313 > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35205: Assignee: Apache Spark > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Assignee: Apache Spark >Priority: Minor > > Simplify the *getOperationType* method. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
akiyamaneko created SPARK-35205: --- Summary: Simplify operationType.getOperationType by using a hashMap. Key: SPARK-35205 URL: https://issues.apache.org/jira/browse/SPARK-35205 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: akiyamaneko Simplify the *getOperationType* method. Introduce a *HashMap* to cache the existing enumeration types, so as to avoid a linear search in a for loop. `*OperationType.getOperationType*` can be called in the OperationHandle constructor: {code:java} public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion protocol) { super(tOperationHandle.getOperationId()); this.opType = OperationType.getOperationType(tOperationHandle.getOperationType()); this.hasResultSet = tOperationHandle.isHasResultSet(); this.protocol = protocol; } {code} `*OperationHandle*` is widely used, so it's better to improve the execution efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
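The change proposed above is the standard enum lookup-table idiom. A minimal, self-contained sketch of that idea in Java is below; the enum constants and numeric codes here are illustrative stand-ins, not the actual Hive Thrift definitions:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for org.apache.hive.service.cli.OperationType.
// The real enum wraps Thrift's TOperationType values; plain ints are used here
// so the example is self-contained.
enum OperationType {
    UNKNOWN_OPERATION(0),
    EXECUTE_STATEMENT(1),
    GET_TYPE_INFO(2),
    GET_CATALOGS(3);

    private final int tOperationType;

    // Built once when the enum class is initialized, giving O(1) lookups
    // instead of scanning values() in a for loop on every call.
    private static final Map<Integer, OperationType> BY_THRIFT_TYPE = new HashMap<>();
    static {
        for (OperationType type : values()) {
            BY_THRIFT_TYPE.put(type.tOperationType, type);
        }
    }

    OperationType(int tOperationType) {
        this.tOperationType = tOperationType;
    }

    public static OperationType getOperationType(int tOperationType) {
        // Fallback for unrecognized codes; the real fallback behavior may differ.
        return BY_THRIFT_TYPE.getOrDefault(tOperationType, UNKNOWN_OPERATION);
    }
}

public class OperationTypeLookupDemo {
    public static void main(String[] args) {
        System.out.println(OperationType.getOperationType(2));  // GET_TYPE_INFO
        System.out.println(OperationType.getOperationType(99)); // UNKNOWN_OPERATION
    }
}
```

Since every OperationHandle construction performs one lookup, moving the cost to a one-time static initializer is a reasonable trade-off.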
[jira] [Assigned] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35204: Assignee: Apache Spark > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35204: Assignee: (was: Apache Spark) > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330332#comment-17330332 ] Apache Spark commented on SPARK-35204: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/32312 > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35176) Raise TypeError in inappropriate type case rather than ValueError
[ https://issues.apache.org/jira/browse/SPARK-35176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329971#comment-17329971 ] Yikun Jiang edited comment on SPARK-35176 at 4/23/21, 9:14 AM: --- I wrote up a POC in [https://github.com/Yikun/annotation-type-checker/pull/4] to add a simple way to do input validation (a runtime type checker). was (Author: yikunkero): I wrote up a POC in [https://github.com/Yikun/annotation-type-checker/pull/4] to add a simple way to do runtime type checking. > Raise TypeError in inappropriate type case rather than ValueError > -- > > Key: SPARK-35176 > URL: https://issues.apache.org/jira/browse/SPARK-35176 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Minor > > There are many incorrect usages of ValueError. > When an operation or function is applied to an object of inappropriate type, > we should use TypeError rather than ValueError. > such as: > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1137] > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1228] > > We should make these corrections at the right time; note that doing so will > break existing code that catches the original ValueError. > > [1] https://docs.python.org/3/library/exceptions.html#TypeError -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330241#comment-17330241 ] angerszhu commented on SPARK-35169: --- It's Guava IntMath.divide's bug. > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below demonstrates the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-------------------------------------+ > |(i / -1)                             | > +-------------------------------------+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-------------------------------------+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---------------------------------------------------+ > |(i / -1)                                           | > +---------------------------------------------------+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---------------------------------------------------+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
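The arithmetic behind this ticket is the classic two's-complement edge case: Int.MinValue and Long.MinValue have no positive counterparts, so dividing by -1 silently wraps back to the same negative value instead of raising an error. A standalone Java demonstration (plain JVM semantics, not Spark's interval code; Math.negateExact is shown only as one JDK facility that detects the wraparound):

```java
public class IntervalOverflowDemo {
    public static void main(String[] args) {
        // Two's-complement int range is -2147483648 .. 2147483647, so negating
        // MIN_VALUE (equivalently, dividing it by -1) wraps back to MIN_VALUE.
        System.out.println(Integer.MIN_VALUE / -1); // -2147483648, not +2147483648
        System.out.println(Long.MIN_VALUE / -1L);   // -9223372036854775808

        // The Math.*Exact family detects the wraparound and throws instead of
        // returning a wrapped value:
        try {
            Math.negateExact(Long.MIN_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

This is why the `-178956970-8` YEAR TO MONTH result above comes out negative: the month count Int.MinValue divided by -1 wraps to Int.MinValue, and the ticket asks Spark to detect that case and throw an overflow exception instead.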
[jira] [Created] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
Wenchen Fan created SPARK-35204: --- Summary: CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes Key: SPARK-35204 URL: https://issues.apache.org/jira/browse/SPARK-35204 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330232#comment-17330232 ] Apache Spark commented on SPARK-35088: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/32311 > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35088: Assignee: Apache Spark > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35088: Assignee: (was: Apache Spark) > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330230#comment-17330230 ] Apache Spark commented on SPARK-35088: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/32311 > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35078) Migrate to transformWithPruning or resolveWithPruning for expression rules
[ https://issues.apache.org/jira/browse/SPARK-35078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35078: -- Assignee: Yingyi Bu > Migrate to transformWithPruning or resolveWithPruning for expression rules > -- > > Key: SPARK-35078 > URL: https://issues.apache.org/jira/browse/SPARK-35078 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 3.1.0 >Reporter: Yingyi Bu >Assignee: Yingyi Bu >Priority: Major > > E.g., rules in org/apache/spark/sql/catalyst/optimizer/expressions.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org