[jira] [Updated] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-09 Thread Brian Schaefer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Schaefer updated SPARK-38483:
---
Description: 
Having the name of a column as an attribute of PySpark {{Column}} class 
instances can enable some convenient patterns, for example:

Applying a function to a column and aliasing with the original name:
{code:python}
from pyspark.sql import functions as F

values = F.col("values")
# repeating the column name as an alias
distinct_values = F.array_distinct(values).alias("values")
# re-using the existing column name
distinct_values = F.array_distinct(values).alias(values._name){code}
Checking the column name inside a custom function and applying conditional 
logic on the name:
{code:python}
from pyspark.sql import Column

def custom_function(col: Column) -> Column:
    if col._name == "my_column":
        return col.astype("int")
    return col.astype("string"){code}
The proposal in this issue is to add a property {{Column._name}} that obtains 
the name or alias of a column in a similar way as currently done in the 
{{Column.__repr__}} method: 
[https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062].
 The choice of {{_name}} intentionally avoids collision with the existing 
{{Column.name}} method, which is an alias for {{{}Column.alias{}}}.
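A minimal sketch of such a property, assuming it reuses the same JVM call that 
{{Column.__repr__}} makes today; the parsing shown is illustrative only, not a 
final implementation:
{code:python}
from pyspark.sql.column import Column

@property
def _name(self) -> str:
    """Return the column name or alias parsed from the JVM column's string
    representation, e.g. 'array_distinct(values) AS values' -> 'values'."""
    expr = self._jc.toString()  # same JVM call used by Column.__repr__
    # Illustrative parsing only: take the alias after the last " AS ",
    # otherwise return the whole expression string.
    return expr.rsplit(" AS ", 1)[-1].strip("`")

# attach the sketch to the class for demonstration purposes
Column._name = _name{code}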
 Labels: starter  (was: )

> Column name or alias as an attribute of the PySpark Column class
> 
>
> Key: SPARK-38483
> URL: https://issues.apache.org/jira/browse/SPARK-38483
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Minor
>  Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class 
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:python}
> from pyspark.sql import functions as F
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional 
> logic on the name:
> {code:python}
> from pyspark.sql import Column
> def custom_function(col: Column) -> Column:
>     if col._name == "my_column":
>         return col.astype("int")
>     return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column._name}} that obtains 
> the name or alias of a column in a similar way as currently done in the 
> {{Column.__repr__}} method: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062].
>  The choice of {{_name}} intentionally avoids collision with the existing 
> {{Column.name}} method, which is an alias for {{{}Column.alias{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-09 Thread Brian Schaefer (Jira)
Brian Schaefer created SPARK-38483:
--

 Summary: Column name or alias as an attribute of the PySpark 
Column class
 Key: SPARK-38483
 URL: https://issues.apache.org/jira/browse/SPARK-38483
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.2.1
Reporter: Brian Schaefer






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36681) Fail to load Snappy codec

2022-03-09 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-36681.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35784
[https://github.com/apache/spark/pull/35784]

> Fail to load Snappy codec
> -
>
> Key: SPARK-36681
> URL: https://issues.apache.org/jira/browse/SPARK-36681
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> snappy-java, as a native library, should not be relocated in the Hadoop shaded 
> client libraries. Currently we use the Hadoop shaded client libraries in Spark. 
> If we try to use SnappyCodec to write a sequence file, we encounter the 
> following error:
> {code}
> [info]   Cause: java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native 
> Method)   
>   
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151)   
>   
>
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282)
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589)
>  
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629)
>  
> {code}
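For reference, a minimal PySpark repro sketch of the failing path (the output 
path and data are illustrative; this assumes a Spark build that uses the Hadoop 
shaded client):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()
sc = spark.sparkContext

# Writing a sequence file with SnappyCodec goes through the relocated snappy
# classes in the shaded client and hits the UnsatisfiedLinkError above
# before the fix.
sc.parallelize([(1, "a"), (2, "b")]).saveAsSequenceFile(
    "/tmp/snappy-seq",
    compressionCodecClass="org.apache.hadoop.io.compress.SnappyCodec",
){code}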



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36681) Fail to load Snappy codec

2022-03-09 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-36681:
---

Assignee: L. C. Hsieh

> Fail to load Snappy codec
> -
>
> Key: SPARK-36681
> URL: https://issues.apache.org/jira/browse/SPARK-36681
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> snappy-java, as a native library, should not be relocated in the Hadoop shaded 
> client libraries. Currently we use the Hadoop shaded client libraries in Spark. 
> If we try to use SnappyCodec to write a sequence file, we encounter the 
> following error:
> {code}
> [info]   Cause: java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native 
> Method)   
>   
> [info]   at 
> org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151)   
>   
>
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282)
> [info]   at 
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> [info]   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589)
>  
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605)
> [info]   at 
> org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629)
>  
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38354) Add hash probes metrics for shuffled hash join

2022-03-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-38354:
---

Assignee: Cheng Su

> Add hash probes metrics for shuffled hash join
> --
>
> Key: SPARK-38354
> URL: https://issues.apache.org/jira/browse/SPARK-38354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Trivial
>
> For `HashAggregate` there's a SQL metric to track the number of hash probes per 
> looked-up key. It would be better to add a similar metric for shuffled hash 
> join as well, to get some idea of hash probing performance.
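As context, a hedged PySpark sketch of a query whose shuffled hash join node 
would carry such a metric in the SQL tab of the Spark UI (the data and sizes 
are illustrative):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

big = spark.range(1_000_000).withColumnRenamed("id", "key")
small = spark.range(10_000).withColumnRenamed("id", "key")

# The shuffle_hash hint asks the planner for a shuffled hash join; the
# proposed "hash probes" metric would appear on that node alongside the
# existing build-side metrics.
big.join(small.hint("shuffle_hash"), "key").count(){code}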



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38354) Add hash probes metrics for shuffled hash join

2022-03-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-38354.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35686
[https://github.com/apache/spark/pull/35686]

> Add hash probes metrics for shuffled hash join
> --
>
> Key: SPARK-38354
> URL: https://issues.apache.org/jira/browse/SPARK-38354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Trivial
> Fix For: 3.3.0
>
>
> For `HashAggregate` there's a SQL metric to track the number of hash probes per 
> looked-up key. It would be better to add a similar metric for shuffled hash 
> join as well, to get some idea of hash probing performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37947) Cannot use _outer generators in a lateral view

2022-03-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37947:
---

Assignee: Bruce Robbins

> Cannot use _outer generators in a lateral view
> 
>
> Key: SPARK-37947
> URL: https://issues.apache.org/jira/browse/SPARK-37947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
>
> This works:
> {noformat}
> select * from values 1, 2 lateral view outer explode(array()) as b;
> {noformat}
> But this does not work:
> {noformat}
> select * from values 1, 2 lateral view explode_outer(array()) as b;
> {noformat}
> It produces the error:
> {noformat}
> Error in query: Column 'b' does not exist. Did you mean one of the following? 
> [col1]; line 1 pos 26;
> {noformat}
> Similarly, this works:
> {noformat}
> select * from values 1, 2
> lateral view outer inline(array(struct(1, 2, 3))) as b, c, d;
> {noformat}
> But this does not:
> {noformat}
> select * from values 1, 2
> lateral view inline_outer(array(struct(1, 2, 3))) as b, c, d;
> {noformat}
> It produces the error:
> {noformat}
> Error in query: Column 'b' does not exist. Did you mean one of the following? 
> [col1]; line 2 pos 0;
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37947) Cannot use _outer generators in a lateral view

2022-03-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37947.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35232
[https://github.com/apache/spark/pull/35232]

> Cannot use _outer generators in a lateral view
> 
>
> Key: SPARK-37947
> URL: https://issues.apache.org/jira/browse/SPARK-37947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
> Fix For: 3.3.0
>
>
> This works:
> {noformat}
> select * from values 1, 2 lateral view outer explode(array()) as b;
> {noformat}
> But this does not work:
> {noformat}
> select * from values 1, 2 lateral view explode_outer(array()) as b;
> {noformat}
> It produces the error:
> {noformat}
> Error in query: Column 'b' does not exist. Did you mean one of the following? 
> [col1]; line 1 pos 26;
> {noformat}
> Similarly, this works:
> {noformat}
> select * from values 1, 2
> lateral view outer inline(array(struct(1, 2, 3))) as b, c, d;
> {noformat}
> But this does not:
> {noformat}
> select * from values 1, 2
> lateral view inline_outer(array(struct(1, 2, 3))) as b, c, d;
> {noformat}
> It produces the error:
> {noformat}
> Error in query: Column 'b' does not exist. Did you mean one of the following? 
> [col1]; line 2 pos 0;
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37995) TPCDS 1TB q72 fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false

2022-03-09 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503595#comment-17503595
 ] 

Wenchen Fan commented on SPARK-37995:
-

Does this problem still exist in 3.2 and the master branch?

> TPCDS 1TB q72 fails when 
> spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false
> 
>
> Key: SPARK-37995
> URL: https://issues.apache.org/jira/browse/SPARK-37995
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Kapil Singh
>Priority: Major
> Attachments: full-stacktrace.txt
>
>
> TPCDS 1TB q72 fails in Spark 3.2 when 
> spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false. We 
> have been running with this config in 3.1 as well, and it worked fine in that 
> version. There this config used to add a DPP subquery to q72.
> Relevant stack trace
> {code:java}
> Error: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to 
> org.apache.spark.sql.execution.SparkPlan  at 
> scala.collection.immutable.List.map(List.scala:293)  at 
> org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at 
> org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> 
> 
> at 
> org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at 
> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.onUpdatePlan(AdaptiveSparkPlanExec.scala:708)
>   at 
> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$2(AdaptiveSparkPlanExec.scala:239)
>   at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) 
>  at scala.Option.foreach(Option.scala:407)  at 
> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:239)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)  at 
> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:226)
>   at 
> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:365)
>   at 
> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:338)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs

2022-03-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38482:


Assignee: (was: Apache Spark)

> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
> ---
>
> Key: SPARK-38482
> URL: https://issues.apache.org/jira/browse/SPARK-38482
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: yikf
>Priority: Minor
>
> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503570#comment-17503570
 ] 

Apache Spark commented on SPARK-38482:
--

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/35788

> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
> ---
>
> Key: SPARK-38482
> URL: https://issues.apache.org/jira/browse/SPARK-38482
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: yikf
>Priority: Minor
>
> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs

2022-03-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38482:


Assignee: Apache Spark

> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
> ---
>
> Key: SPARK-38482
> URL: https://issues.apache.org/jira/browse/SPARK-38482
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: yikf
>Assignee: Apache Spark
>Priority: Minor
>
> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38438) Can't update spark.jars.packages on existing global/default context

2022-03-09 Thread Rafal Wojdyla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502633#comment-17502633
 ] 

Rafal Wojdyla edited comment on SPARK-38438 at 3/9/22, 12:35 PM:
-

The workaround actually doesn't stop the existing JVM; it does stop most of the 
threads in the JVM (including the SparkContext-related ones and the py4j gateway), 
but it turns out the only (non-daemon) thread left is the `main` thread:

{noformat}
"main" #1 prio=5 os_prio=31 cpu=1381.53ms elapsed=67.25s tid=0x7fc478809000 
nid=0x2703 runnable  [0x7c094000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@11.0.9.1/Native Method)
at 
java.io.FileInputStream.read(java.base@11.0.9.1/FileInputStream.java:279)
at 
java.io.BufferedInputStream.fill(java.base@11.0.9.1/BufferedInputStream.java:252)
at 
java.io.BufferedInputStream.read(java.base@11.0.9.1/BufferedInputStream.java:271)
- locked <0x0007c1012ca0> (a java.io.BufferedInputStream)
at 
org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:68)
at 
org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.9.1/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.9.1/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.9.1/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@11.0.9.1/Method.java:566)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}

This is waiting on the python process to stop: 
https://github.com/apache/spark/blob/71991f75ff441e80a52cb71f66f46bfebdb05671/core/src/main/scala/org/apache/spark/api/python/PythonGatewayServer.scala#L68-L70

Would it make sense to just close stdin to trigger shutdown of the JVM? In that 
case the hard reset would be:

{code:python}
s.stop()
s._sc._gateway.shutdown()
s._sc._gateway.proc.stdin.close()
SparkContext._gateway = None
SparkContext._jvm = None
{code}

Edit: updated the issue description with the extra line.


was (Author: ravwojdyla):
The workaround actually doesn't stop the existing JVM, it does stop most of the 
threads in the JVM (including spark context related, and py4j gateway), turns 
out the only (non-daemon) thread left is the `main` thread:

{noformat}
"main" #1 prio=5 os_prio=31 cpu=1381.53ms elapsed=67.25s tid=0x7fc478809000 
nid=0x2703 runnable  [0x7c094000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@11.0.9.1/Native Method)
at 
java.io.FileInputStream.read(java.base@11.0.9.1/FileInputStream.java:279)
at 
java.io.BufferedInputStream.fill(java.base@11.0.9.1/BufferedInputStream.java:252)
at 
java.io.BufferedInputStream.read(java.base@11.0.9.1/BufferedInputStream.java:271)
- locked <0x0007c1012ca0> (a java.io.BufferedInputStream)
at 
org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:68)
at 
org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.9.1/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.9.1/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.9.1/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@11.0.9.1/Method.java:566)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at 

[jira] [Updated] (SPARK-38438) Can't update spark.jars.packages on existing global/default context

2022-03-09 Thread Rafal Wojdyla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafal Wojdyla updated SPARK-38438:
--
Description: 
Reproduction:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# later on we want to update jars.packages, here's e.g. spark-hats
s = (SparkSession.builder
 .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
 .getOrCreate())

# line below returns None, the config was not propagated:
s._sc._conf.get("spark.jars.packages")
{code}

Stopping the context doesn't help; in fact it's even more confusing, because 
the configuration is updated but has no effect:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

s.stop()

s = (SparkSession.builder
 .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
 .getOrCreate())

# now this line returns 'za.co.absa:spark-hats_2.12:0.2.2', but the context
# doesn't download the jar/package, as it would if there was no global context
# thus the extra package is unusable. It's not downloaded, or added to the
# classpath.
s._sc._conf.get("spark.jars.packages")
{code}

One workaround is to stop the context AND kill the JVM gateway, which seems to 
be a kind of hard reset:

{code:python}
from pyspark import SparkContext
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# Hard reset:
s.stop()
s._sc._gateway.shutdown()
s._sc._gateway.proc.stdin.close()
SparkContext._gateway = None
SparkContext._jvm = None

s = (SparkSession.builder
 .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
 .getOrCreate())

# Now we are guaranteed there's a new spark session, and packages
# are downloaded, added to the classpath etc.
{code}

  was:
Reproduction:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# later on we want to update jars.packages, here's e.g. spark-hats
s = (SparkSession.builder
 .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
 .getOrCreate())

# line below returns None, the config was not propagated:
s._sc._conf.get("spark.jars.packages")
{code}

Stopping the context doesn't help, in fact it's even more confusing, because 
the configuration is updated, but doesn't have an effect:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

s.stop()

s = (SparkSession.builder
 .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
 .getOrCreate())

# now this line returns 'za.co.absa:spark-hats_2.12:0.2.2', but the context
# doesn't download the jar/package, as it would if there was no global context
# thus the extra package is unusable. It's not downloaded, or added to the
# classpath.
s._sc._conf.get("spark.jars.packages")
{code}

One workaround is to stop the context AND kill the JVM gateway, which seems to 
be a kind of hard reset:

{code:python}
from pyspark import SparkContext
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# Hard reset:
s.stop()
s._sc._gateway.shutdown()
SparkContext._gateway = None
SparkContext._jvm = None

s = (SparkSession.builder
 .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
 .getOrCreate())

# Now we are guaranteed there's a new spark session, and packages
# are downloaded, added to the classpath etc.
{code}


> Can't update spark.jars.packages on existing global/default context
> ---
>
> Key: SPARK-38438
> URL: https://issues.apache.org/jira/browse/SPARK-38438
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.2.1
> Environment: py: 3.9
> spark: 3.2.1
>Reporter: Rafal Wojdyla
>Priority: Major
>
> Reproduction:
> {code:python}
> from pyspark.sql import SparkSession
> # default session:
> s = SparkSession.builder.getOrCreate()
> # later on we want to update jars.packages, here's e.g. spark-hats
> s = (SparkSession.builder
>  .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
>  .getOrCreate())
> # line below returns None, the config was not propagated:
> s._sc._conf.get("spark.jars.packages")
> {code}
> Stopping the context doesn't help; in fact it's even more confusing, because 
> the configuration is updated but has no effect:
> {code:python}
> from pyspark.sql import SparkSession
> # default session:
> s = SparkSession.builder.getOrCreate()
> s.stop()
> s = (SparkSession.builder
>  .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
>  .getOrCreate())
> # now this line returns 

[jira] [Created] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs

2022-03-09 Thread yikf (Jira)
yikf created SPARK-38482:


 Summary: Migrate legacy.keepCommandOutputSchema related to 
KeepLegacyOutputs
 Key: SPARK-38482
 URL: https://issues.apache.org/jira/browse/SPARK-38482
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: yikf


Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503540#comment-17503540
 ] 

Apache Spark commented on SPARK-38481:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35787

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ---
>
> Key: SPARK-38481
> URL: https://issues.apache.org/jira/browse/SPARK-38481
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark SQL can throw Java exceptions from the 
> timestampadd()/date_add()/dateadd() functions, for instance:
> {code:java}
> spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
> 22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 
> 100, timestamp'2022-03-09 01:02:03')]
> java.lang.ArithmeticException: long overflow
>   at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
> {code}
> That might confuse non-Scala/Java users. We need to wrap this kind of exception 
> in a Spark exception using an error class.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503538#comment-17503538
 ] 

Apache Spark commented on SPARK-38481:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35787

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ---
>
> Key: SPARK-38481
> URL: https://issues.apache.org/jira/browse/SPARK-38481
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark SQL can throw Java exceptions from the 
> timestampadd()/date_add()/dateadd() functions, for instance:
> {code:java}
> spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
> 22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 
> 100, timestamp'2022-03-09 01:02:03')]
> java.lang.ArithmeticException: long overflow
>   at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
> {code}
> That might confuse non-Scala/Java users. We need to wrap this kind of exception 
> in a Spark exception using an error class.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38481:


Assignee: Max Gekk  (was: Apache Spark)

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ---
>
> Key: SPARK-38481
> URL: https://issues.apache.org/jira/browse/SPARK-38481
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark SQL can throw Java exceptions from the 
> timestampadd()/date_add()/dateadd() functions, for instance:
> {code:java}
> spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
> 22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 
> 100, timestamp'2022-03-09 01:02:03')]
> java.lang.ArithmeticException: long overflow
>   at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
> {code}
> That might confuse non-Scala/Java users. We need to wrap this kind of exception 
> in a Spark exception using an error class.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38481:


Assignee: Apache Spark  (was: Max Gekk)

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ---
>
> Key: SPARK-38481
> URL: https://issues.apache.org/jira/browse/SPARK-38481
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark SQL can throw Java exceptions from the 
> timestampadd()/date_add()/dateadd() functions, for instance:
> {code:java}
> spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
> 22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 
> 100, timestamp'2022-03-09 01:02:03')]
> java.lang.ArithmeticException: long overflow
>   at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
> {code}
> That might confuse non-Scala/Java users. We need to wrap this kind of exception 
> in a Spark exception using an error class.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38187) Support resource reservation (Introduce minCPU/minMemory) with volcano implementations

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503531#comment-17503531
 ] 

Apache Spark commented on SPARK-38187:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35786

> Support resource reservation (Introduce minCPU/minMemory) with volcano 
> implementations
> --
>
> Key: SPARK-38187
> URL: https://issues.apache.org/jira/browse/SPARK-38187
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38187) Support resource reservation (Introduce minCPU/minMemory) with volcano implementations

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503530#comment-17503530
 ] 

Apache Spark commented on SPARK-38187:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35786

> Support resource reservation (Introduce minCPU/minMemory) with volcano 
> implementations
> --
>
> Key: SPARK-38187
> URL: https://issues.apache.org/jira/browse/SPARK-38187
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38213) support Metrics information report to kafkaSink.

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503529#comment-17503529
 ] 

Apache Spark commented on SPARK-38213:
--

User 'senthh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35785

> support Metrics information report to kafkaSink.
> 
>
> Key: SPARK-38213
> URL: https://issues.apache.org/jira/browse/SPARK-38213
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: YuanGuanhu
>Priority: Major
>
> Spark now supports ConsoleSink/CsvSink/GraphiteSink/JmxSink etc. Now we want to 
> report metrics information to Kafka; we can work to support this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38481:
-
Description: 
Currently, Spark SQL can throw Java exceptions from the 
timestampadd()/date_add()/dateadd() functions, for instance:

{code:java}
spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 
100, timestamp'2022-03-09 01:02:03')]
java.lang.ArithmeticException: long overflow
at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505)
 ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724)
 ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197)
 ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
{code}

That might confuse non-Scala/Java users. We need to wrap this kind of exception 
in a Spark exception using an error class.

  was:
Currently, Spark SQL can throw Java exceptions from the 
timestampadd()/date_add()/dateadd() functions, for instance:

{code:java}

{code}

That might confuse non-Scala/Java users. Need to wrap such kind of exception by 
Spark's exception using an error class.


> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ---
>
> Key: SPARK-38481
> URL: https://issues.apache.org/jira/browse/SPARK-38481
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark SQL can throw Java exceptions from the 
> timestampadd()/date_add()/dateadd() functions, for instance:
> {code:java}
> spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
> 22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 
> 100, timestamp'2022-03-09 01:02:03')]
> java.lang.ArithmeticException: long overflow
>   at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197)
>  ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
> {code}
> That might confuse non-Scala/Java users. We need to wrap this kind of exception 
> in a Spark exception using an error class.
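A hedged PySpark sketch of how the overflow surfaces to a Python user today 
(the query is the one from the log above):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Before the fix this fails with a bare java.lang.ArithmeticException
# ("long overflow") surfaced through Py4J; with the proposed change the
# user would instead see a Spark exception carrying an error class.
spark.sql(
    "select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03')"
).show(){code}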



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38455) Support driver/executor PodGroup templates

2022-03-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503527#comment-17503527
 ] 

Apache Spark commented on SPARK-38455:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35786

> Support driver/executor PodGroup templates
> --
>
> Key: SPARK-38455
> URL: https://issues.apache.org/jira/browse/SPARK-38455
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38481:
-
Description: 
Currently, Spark SQL can throw Java exceptions from the 
timestampadd()/date_add()/dateadd() functions, for instance:

{code:java}

{code}

That might confuse non-Scala/Java users. We need to wrap this kind of exception 
in a Spark exception using an error class.

  was:
Currently, Spark SQL can throw Java exceptions from the 
aes_encrypt()/aes_decrypt() functions, for instance:

{code:java}
java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch!
at 
org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93)
at 
org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.crypto.AEADBadTagException: Tag mismatch!
at 
com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620)
at 
com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116)
at 
com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853)
at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
at javax.crypto.Cipher.doFinal(Cipher.java:2226)
at 
org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87)
... 19 more
{code}

That might confuse non-Scala/Java users. Need to wrap such kind of exception by 
Spark's exception.


> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ---
>
> Key: SPARK-38481
> URL: https://issues.apache.org/jira/browse/SPARK-38481
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark SQL can throw Java exceptions from the 
> timestampadd()/date_add()/dateadd() functions, for instance:
> {code:java}
> {code}
> That might confuse non-Scala/Java users. We need to wrap this kind of exception 
> in a Spark exception using an error class.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception

2022-03-09 Thread Max Gekk (Jira)
Max Gekk created SPARK-38481:


 Summary: Substitute Java overflow exception from TIMESTAMPADD by 
Spark exception
 Key: SPARK-38481
 URL: https://issues.apache.org/jira/browse/SPARK-38481
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 3.3.0


Currently, Spark SQL can throw Java exceptions from the 
aes_encrypt()/aes_decrypt() functions, for instance:

{code:java}
java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch!
at 
org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93)
at 
org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.crypto.AEADBadTagException: Tag mismatch!
at 
com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620)
at 
com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116)
at 
com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853)
at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
at javax.crypto.Cipher.doFinal(Cipher.java:2226)
at 
org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87)
... 19 more
{code}

That might confuse non-Scala/Java users. We need to wrap this kind of exception 
in a Spark exception.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37421) Inline type hints for python/pyspark/mllib/evaluation.py

2022-03-09 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37421.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34680
[https://github.com/apache/spark/pull/34680]

> Inline type hints for python/pyspark/mllib/evaluation.py
> 
>
> Key: SPARK-37421
> URL: https://issues.apache.org/jira/browse/SPARK-37421
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>
> Inline type hints from python/pyspark/mllib/evaluation.pyi to 
> python/pyspark/mllib/evaluation.py
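For illustration, a hedged sketch of what the inlining looks like: a signature 
that previously lived only in the .pyi stub is annotated directly in the .py 
module (the class and member shown are for shape only, not the exact 
evaluation.py contents):
{code:python}
# Before: declared only in python/pyspark/mllib/evaluation.pyi
#     class RegressionMetrics(JavaModelWrapper):
#         @property
#         def explainedVariance(self) -> float: ...

# After: the annotation sits inline in python/pyspark/mllib/evaluation.py
from pyspark.mllib.common import JavaModelWrapper

class RegressionMetrics(JavaModelWrapper):
    @property
    def explainedVariance(self) -> float:
        """Explained variance regression score (shape of the inlined hint)."""
        return self.call("explainedVariance"){code}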



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37421) Inline type hints for python/pyspark/mllib/evaluation.py

2022-03-09 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-37421:
--

Assignee: dch nguyen

> Inline type hints for python/pyspark/mllib/evaluation.py
> 
>
> Key: SPARK-37421
> URL: https://issues.apache.org/jira/browse/SPARK-37421
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: dch nguyen
>Priority: Major
>
> Inline type hints from python/pyspark/mllib/evaluation.pyi to 
> python/pyspark/mllib/evaluation.py



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38437) Dynamic serialization of Java datetime objects to micros/days

2022-03-09 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-38437:


Assignee: Max Gekk

> Dynamic serialization of Java datetime objects to micros/days
> -
>
> Key: SPARK-38437
> URL: https://issues.apache.org/jira/browse/SPARK-38437
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Make the serializers to micros/days more tolerant of input Java objects, and 
> accept:
> - for timestamps: java.sql.Timestamp and java.time.Instant
> - for days: java.sql.Date and java.time.LocalDate
> This should make Spark SQL more reliable for users' and datasource inputs. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38437) Dynamic serialization of Java datetime objects to micros/days

2022-03-09 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-38437.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35756
[https://github.com/apache/spark/pull/35756]

> Dynamic serialization of Java datetime objects to micros/days
> -
>
> Key: SPARK-38437
> URL: https://issues.apache.org/jira/browse/SPARK-38437
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Make the serializers to micros/days more tolerant of input Java objects, and 
> accept:
> - for timestamps: java.sql.Timestamp and java.time.Instant
> - for days: java.sql.Date and java.time.LocalDate
> This should make Spark SQL more reliable for users' and datasource inputs. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


