[jira] [Updated] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Schaefer updated SPARK-38483:
-----------------------------------
    Description:

Having the name of a column as an attribute of PySpark {{Column}} class instances can enable some convenient patterns, for example:

Applying a function to a column and aliasing it with the original name:
{code:java}
values = F.col("values")

# repeating the column name as an alias
distinct_values = F.array_distinct(values).alias("values")

# re-using the existing column name
distinct_values = F.array_distinct(values).alias(values._name)
{code}
Checking the column name inside a custom function and applying conditional logic based on the name:
{code:java}
def custom_function(col: Column) -> Column:
    if col._name == "my_column":
        return col.astype("int")
    return col.astype("string")
{code}
The proposal in this issue is to add a property {{Column._name}} that obtains the name or alias of a column, similar to what is currently done in the {{Column.__repr__}} method: https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062. The choice of {{_name}} intentionally avoids a collision with the existing {{Column.name}} method, which is an alias for {{Column.alias}}.

    Labels: starter  (was: )

> Column name or alias as an attribute of the PySpark Column class
> -----------------------------------------------------------------
>
>                 Key: SPARK-38483
>                 URL: https://issues.apache.org/jira/browse/SPARK-38483
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Brian Schaefer
>            Priority: Minor
>              Labels: starter
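A rough user-side approximation of the proposed property is possible today by parsing the column's {{repr}}, which renders as {{Column<'expr'>}} (or {{Column<'expr AS alias'>}} once aliased). The sketch below only illustrates the idea under that assumption — the helper name and the regex are made up here, and this is not the implementation the issue proposes:

{code:python}
import re

from pyspark.sql import Column
from pyspark.sql import functions as F

def column_name(col: Column) -> str:
    """Hypothetical helper: best-effort name/alias extracted from repr().

    Assumes PySpark renders columns as Column<'expr'>, and as
    Column<'expr AS alias'> when an alias is set, so the text after the
    last ' AS ' (if any) is taken as the name.
    """
    expr = re.match(r"Column<'(.*)'>", repr(col)).group(1)
    return expr.rsplit(" AS ", 1)[-1]

values = F.col("values")
distinct_values = F.array_distinct(values).alias("values")
# column_name(values)          -> 'values'
# column_name(distinct_values) -> 'values'
{code}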
[jira] [Created] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
Brian Schaefer created SPARK-38483:
--------------------------------------

             Summary: Column name or alias as an attribute of the PySpark Column class
                 Key: SPARK-38483
                 URL: https://issues.apache.org/jira/browse/SPARK-38483
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
    Affects Versions: 3.2.1
            Reporter: Brian Schaefer
[jira] [Resolved] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh resolved SPARK-36681.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 35784
https://github.com/apache/spark/pull/35784

> Fail to load Snappy codec
> -------------------------
>
>                 Key: SPARK-36681
>                 URL: https://issues.apache.org/jira/browse/SPARK-36681
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>             Fix For: 3.3.0
>
> snappy-java is a native library and should not be relocated in the Hadoop shaded client libraries. Spark currently uses the Hadoop shaded client libraries, so trying to use SnappyCodec to write a sequence file fails with the following error:
> {code}
> [info] Cause: java.lang.UnsatisfiedLinkError: org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I
> [info]   at org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native Method)
> [info]   at org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151)
> [info]   at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282)
> [info]   at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210)
> [info]   at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> [info]   at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> [info]   at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589)
> [info]   at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605)
> [info]   at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629)
> {code}
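For reference, a hedged PySpark sketch of the kind of write that hits this code path — an RDD saved as a Snappy-compressed sequence file (the output path is a placeholder, and this is not a test taken from the issue):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([("k1", "v1"), ("k2", "v2")])

# With the Hadoop shaded client libraries on the classpath, this write is
# the kind of call that reaches the relocated SnappyNative class and fails
# with the UnsatisfiedLinkError above before the fix.
rdd.saveAsSequenceFile(
    "/tmp/snappy-seqfile",  # placeholder path
    compressionCodecClass="org.apache.hadoop.io.compress.SnappyCodec",
)
{code}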
[jira] [Assigned] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh reassigned SPARK-36681:
-----------------------------------
    Assignee: L. C. Hsieh

> Fail to load Snappy codec
> -------------------------
>
>                 Key: SPARK-36681
>                 URL: https://issues.apache.org/jira/browse/SPARK-36681
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
[jira] [Assigned] (SPARK-38354) Add hash probes metrics for shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-38354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-38354:
-----------------------------------
    Assignee: Cheng Su

> Add hash probes metrics for shuffled hash join
> ----------------------------------------------
>
>                 Key: SPARK-38354
>                 URL: https://issues.apache.org/jira/browse/SPARK-38354
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Cheng Su
>            Assignee: Cheng Su
>            Priority: Trivial
>
> For `HashAggregate` there is already a SQL metric that tracks the number of hash probes per looked-up key. It would be good to add a similar metric for shuffled hash join, to get some insight into hash probing performance.
[jira] [Resolved] (SPARK-38354) Add hash probes metrics for shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-38354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-38354.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 35686
https://github.com/apache/spark/pull/35686

> Add hash probes metrics for shuffled hash join
> ----------------------------------------------
>
>                 Key: SPARK-38354
>                 URL: https://issues.apache.org/jira/browse/SPARK-38354
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Cheng Su
>            Assignee: Cheng Su
>            Priority: Trivial
>             Fix For: 3.3.0
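A hedged sketch of how one might surface the new metric: force a shuffled hash join (Spark prefers sort-merge join by default) and inspect the join node in the SQL tab of the UI. The table sizes here are arbitrary, and the exact metric label is whatever the PR above adds:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Prefer shuffled hash join over sort-merge join, and disable broadcast
# so the join is not planned as a broadcast hash join instead.
spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

left = spark.range(0, 1_000_000).withColumnRenamed("id", "k")
right = spark.range(0, 1_000_000).withColumnRenamed("id", "k")

# Trigger execution; the ShuffledHashJoin node in the Spark UI's SQL tab
# should then report the hash-probe metric added by this ticket.
left.join(right, "k").count()
{code}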
[jira] [Assigned] (SPARK-37947) Cannot use _outer generators in a lateral view
[ https://issues.apache.org/jira/browse/SPARK-37947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-37947:
-----------------------------------
    Assignee: Bruce Robbins

> Cannot use _outer generators in a lateral view
> ----------------------------------------------
>
>                 Key: SPARK-37947
>                 URL: https://issues.apache.org/jira/browse/SPARK-37947
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Minor
>
> This works:
> {noformat}
> select * from values 1, 2 lateral view outer explode(array()) as b;
> {noformat}
> But this does not:
> {noformat}
> select * from values 1, 2 lateral view explode_outer(array()) as b;
> {noformat}
> It produces the error:
> {noformat}
> Error in query: Column 'b' does not exist. Did you mean one of the following? [col1]; line 1 pos 26;
> {noformat}
> Similarly, this works:
> {noformat}
> select * from values 1, 2
> lateral view outer inline(array(struct(1, 2, 3))) as b, c, d;
> {noformat}
> But this does not:
> {noformat}
> select * from values 1, 2
> lateral view inline_outer(array(struct(1, 2, 3))) as b, c, d;
> {noformat}
> It produces the error:
> {noformat}
> Error in query: Column 'b' does not exist. Did you mean one of the following? [col1]; line 2 pos 0;
> {noformat}
[jira] [Resolved] (SPARK-37947) Cannot use _outer generators in a lateral view
[ https://issues.apache.org/jira/browse/SPARK-37947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-37947.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 35232
https://github.com/apache/spark/pull/35232

> Cannot use _outer generators in a lateral view
> ----------------------------------------------
>
>                 Key: SPARK-37947
>                 URL: https://issues.apache.org/jira/browse/SPARK-37947
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Minor
>             Fix For: 3.3.0
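For anyone verifying the fix, the reported asymmetry can be checked from PySpark: {{explode_outer(e)}} in a lateral view should behave like {{LATERAL VIEW OUTER explode(e)}} and return a NULL row per input instead of failing analysis. A small sketch using the queries from the issue:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Worked before the fix: the OUTER keyword form.
spark.sql(
    "select * from values 1, 2 lateral view outer explode(array()) as b"
).show()

# Failed before the fix with "Column 'b' does not exist"; after the fix
# it should return the same rows (b = NULL for each input row).
spark.sql(
    "select * from values 1, 2 lateral view explode_outer(array()) as b"
).show()
{code}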
[jira] [Commented] (SPARK-37995) TPCDS 1TB q72 fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false
[ https://issues.apache.org/jira/browse/SPARK-37995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503595#comment-17503595 ]

Wenchen Fan commented on SPARK-37995:
-------------------------------------

Does this problem still exist in the 3.2 and master branches?

> TPCDS 1TB q72 fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37995
>                 URL: https://issues.apache.org/jira/browse/SPARK-37995
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Kapil Singh
>            Priority: Major
>         Attachments: full-stacktrace.txt
>
> TPCDS 1TB q72 fails in Spark 3.2 when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false. We have been running with this config in 3.1 as well, and it worked fine in that version, where it used to add a subquery DPP in q72.
> Relevant stack trace:
> {code:java}
> Error: java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.execution.SparkPlan
>   at scala.collection.immutable.List.map(List.scala:293)
>   at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
>   at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.onUpdatePlan(AdaptiveSparkPlanExec.scala:708)
>   at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$2(AdaptiveSparkPlanExec.scala:239)
>   at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23)
>   at scala.Option.foreach(Option.scala:407)
>   at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:239)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:226)
>   at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:365)
>   at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:338)
> {code}
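For reproduction, the relevant switch is set at session start. This is only a configuration sketch — the TPC-DS 1TB data and q72 itself are assumed to be available separately:

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Allow DPP filters that are not reused broadcasts; this is the
    # non-default setting under which the reporter sees q72 fail on 3.2.
    .config("spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly",
            "false")
    .getOrCreate()
)
# Running TPC-DS q72 against a 1TB dataset with this session is what
# triggered the ClassCastException in SparkPlanInfo.fromSparkPlan above.
{code}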
[jira] [Assigned] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
[ https://issues.apache.org/jira/browse/SPARK-38482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38482:
------------------------------------
    Assignee:  (was: Apache Spark)

> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
> -------------------------------------------------------------------
>
>                 Key: SPARK-38482
>                 URL: https://issues.apache.org/jira/browse/SPARK-38482
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: yikf
>            Priority: Minor
>
> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
[jira] [Commented] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
[ https://issues.apache.org/jira/browse/SPARK-38482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503570#comment-17503570 ]

Apache Spark commented on SPARK-38482:
--------------------------------------

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/35788

> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
> -------------------------------------------------------------------
>
>                 Key: SPARK-38482
>                 URL: https://issues.apache.org/jira/browse/SPARK-38482
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: yikf
>            Priority: Minor
[jira] [Assigned] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
[ https://issues.apache.org/jira/browse/SPARK-38482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38482:
------------------------------------
    Assignee: Apache Spark

> Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
> -------------------------------------------------------------------
>
>                 Key: SPARK-38482
>                 URL: https://issues.apache.org/jira/browse/SPARK-38482
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: yikf
>            Assignee: Apache Spark
>            Priority: Minor
[jira] [Comment Edited] (SPARK-38438) Can't update spark.jars.packages on existing global/default context
[ https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502633#comment-17502633 ]

Rafal Wojdyla edited comment on SPARK-38438 at 3/9/22, 12:35 PM:
-----------------------------------------------------------------

The workaround actually doesn't stop the existing JVM. It does stop most of the threads in the JVM (including the Spark-context-related ones and the py4j gateway); it turns out the only (non-daemon) thread left is the `main` thread:

{noformat}
"main" #1 prio=5 os_prio=31 cpu=1381.53ms elapsed=67.25s tid=0x7fc478809000 nid=0x2703 runnable [0x7c094000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(java.base@11.0.9.1/Native Method)
	at java.io.FileInputStream.read(java.base@11.0.9.1/FileInputStream.java:279)
	at java.io.BufferedInputStream.fill(java.base@11.0.9.1/BufferedInputStream.java:252)
	at java.io.BufferedInputStream.read(java.base@11.0.9.1/BufferedInputStream.java:271)
	- locked <0x0007c1012ca0> (a java.io.BufferedInputStream)
	at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:68)
	at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.9.1/Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.9.1/NativeMethodAccessorImpl.java:62)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.9.1/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@11.0.9.1/Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}

This thread is waiting on the Python process to stop:
https://github.com/apache/spark/blob/71991f75ff441e80a52cb71f66f46bfebdb05671/core/src/main/scala/org/apache/spark/api/python/PythonGatewayServer.scala#L68-L70

Would it make sense to just close stdin to trigger shutdown of the JVM? In that case the hard reset would be:

{code:python}
s.stop()
s._sc._gateway.shutdown()
s._sc._gateway.proc.stdin.close()
SparkContext._gateway = None
SparkContext._jvm = None
{code}

Edit: updated the issue description with the extra line.
was (Author: ravwojdyla):
The workaround actually doesn't stop the existing JVM. It does stop most of the threads in the JVM (including the Spark-context-related ones and the py4j gateway); it turns out the only (non-daemon) thread left is the `main` thread (same thread dump as above). The hard-reset snippet in this earlier revision did not yet include the `s._sc._gateway.proc.stdin.close()` line.
[jira] [Updated] (SPARK-38438) Can't update spark.jars.packages on existing global/default context
[ https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rafal Wojdyla updated SPARK-38438:
----------------------------------
    Description:

Reproduction:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# later on we want to update jars.packages, here e.g. spark-hats
s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# line below returns None, the config was not propagated:
s._sc._conf.get("spark.jars.packages")
{code}

Stopping the context doesn't help; in fact it's even more confusing, because the configuration is updated but has no effect:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()
s.stop()

s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# now this line returns 'za.co.absa:spark-hats_2.12:0.2.2', but the context
# doesn't download the jar/package, as it would if there were no global
# context, thus the extra package is unusable. It's not downloaded, nor
# added to the classpath.
s._sc._conf.get("spark.jars.packages")
{code}

One workaround is to stop the context AND kill the JVM gateway, which amounts to a kind of hard reset:

{code:python}
from pyspark import SparkContext
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# Hard reset:
s.stop()
s._sc._gateway.shutdown()
s._sc._gateway.proc.stdin.close()
SparkContext._gateway = None
SparkContext._jvm = None

s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# Now we are guaranteed there's a new spark session, and packages
# are downloaded, added to the classpath etc.
{code}

  was: the same description, except that the hard-reset snippet did not yet include the `s._sc._gateway.proc.stdin.close()` line.

> Can't update spark.jars.packages on existing global/default context
> --------------------------------------------------------------------
>
>                 Key: SPARK-38438
>                 URL: https://issues.apache.org/jira/browse/SPARK-38438
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 3.2.1
>         Environment: py: 3.9
>                      spark: 3.2.1
>            Reporter: Rafal Wojdyla
>            Priority: Major
[jira] [Created] (SPARK-38482) Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
yikf created SPARK-38482:
----------------------------

             Summary: Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
                 Key: SPARK-38482
                 URL: https://issues.apache.org/jira/browse/SPARK-38482
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: yikf

Migrate legacy.keepCommandOutputSchema related to KeepLegacyOutputs
[jira] [Commented] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
[ https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503540#comment-17503540 ]

Apache Spark commented on SPARK-38481:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35787

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38481
>                 URL: https://issues.apache.org/jira/browse/SPARK-38481
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Commented] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
[ https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503538#comment-17503538 ]

Apache Spark commented on SPARK-38481:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35787

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38481
>                 URL: https://issues.apache.org/jira/browse/SPARK-38481
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Assigned] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
[ https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38481:
------------------------------------
    Assignee: Max Gekk  (was: Apache Spark)

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38481
>                 URL: https://issues.apache.org/jira/browse/SPARK-38481
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Assigned] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
[ https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38481:
------------------------------------
    Assignee: Apache Spark  (was: Max Gekk)

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38481
>                 URL: https://issues.apache.org/jira/browse/SPARK-38481
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Apache Spark
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Commented] (SPARK-38187) Support resource reservation (Introduce minCPU/minMemory) with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503531#comment-17503531 ]

Apache Spark commented on SPARK-38187:
--------------------------------------

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35786

> Support resource reservation (Introduce minCPU/minMemory) with volcano implementations
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-38187
>                 URL: https://issues.apache.org/jira/browse/SPARK-38187
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 3.3.0
>            Reporter: Yikun Jiang
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Commented] (SPARK-38187) Support resource reservation (Introduce minCPU/minMemory) with volcano implementations
[ https://issues.apache.org/jira/browse/SPARK-38187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503530#comment-17503530 ]

Apache Spark commented on SPARK-38187:
--------------------------------------

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35786

> Support resource reservation (Introduce minCPU/minMemory) with volcano implementations
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-38187
>                 URL: https://issues.apache.org/jira/browse/SPARK-38187
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 3.3.0
>            Reporter: Yikun Jiang
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Commented] (SPARK-38213) support Metrics information report to kafkaSink.
[ https://issues.apache.org/jira/browse/SPARK-38213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503529#comment-17503529 ]

Apache Spark commented on SPARK-38213:
--------------------------------------

User 'senthh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35785

> support Metrics information report to kafkaSink.
> ------------------------------------------------
>
>                 Key: SPARK-38213
>                 URL: https://issues.apache.org/jira/browse/SPARK-38213
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: YuanGuanhu
>            Priority: Major
>
> Spark already supports ConsoleSink/CsvSink/GraphiteSink/JmxSink etc. We now want to report metrics information to Kafka, and can work to support this.
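For context, Spark's metrics sinks are wired up through conf/metrics.properties, and a Kafka sink would presumably follow the same pattern as the existing sinks. The class name and its options below are hypothetical placeholders pending the actual implementation:

{code}
# conf/metrics.properties — hypothetical KafkaSink wiring, following the
# pattern of the existing sinks (e.g. ConsoleSink, GraphiteSink).
*.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink
*.sink.kafka.broker=localhost:9092
*.sink.kafka.topic=spark-metrics
*.sink.kafka.period=10
*.sink.kafka.unit=seconds
{code}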
[jira] [Updated] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
[ https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-38481:
-----------------------------
    Description:

Currently, Spark SQL can throw Java exceptions from the timestampadd()/date_add()/dateadd() functions, for instance:

{code:java}
spark-sql> select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03');
22/03/09 14:47:15 ERROR SparkSQLDriver: Failed in [select timestampadd(YEAR, 100, timestamp'2022-03-09 01:02:03')]
java.lang.ArithmeticException: long overflow
	at java.lang.Math.multiplyExact(Math.java:892) ~[?:1.8.0_292]
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:505) ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAddMonths(DateTimeUtils.scala:724) ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1197) ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
{code}

That might confuse non-Scala/Java users. We need to wrap this kind of exception in a Spark exception with an error class.

  was: the same text, with an empty {code:java} block where the example now is.

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38481
>                 URL: https://issues.apache.org/jira/browse/SPARK-38481
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0
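As background on why instantToMicros can overflow: Spark stores timestamps as microseconds since the epoch in a signed 64-bit long, which caps out around the year 294000, so adding enough years pushes multiplyExact past Long.MaxValue. A hedged back-of-the-envelope check in Python:

{code:python}
# Long.MaxValue in microseconds, converted to years from the 1970 epoch.
LONG_MAX = 2**63 - 1
micros_per_year = 365.25 * 24 * 3600 * 1_000_000

print(1970 + LONG_MAX / micros_per_year)  # ≈ 294241 — any timestampadd
# result past this bound makes Math.multiplyExact throw "long overflow".
{code}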
[jira] [Commented] (SPARK-38455) Support driver/executor PodGroup templates
[ https://issues.apache.org/jira/browse/SPARK-38455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503527#comment-17503527 ]

Apache Spark commented on SPARK-38455:
--------------------------------------

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35786

> Support driver/executor PodGroup templates
> ------------------------------------------
>
>                 Key: SPARK-38455
>                 URL: https://issues.apache.org/jira/browse/SPARK-38455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.3.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Updated] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
[ https://issues.apache.org/jira/browse/SPARK-38481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-38481:
-----------------------------
    Description:

Currently, Spark SQL can throw Java exceptions from the timestampadd()/date_add()/dateadd() functions, for instance:

{code:java}
{code}

That might confuse non-Scala/Java users. We need to wrap this kind of exception in a Spark exception with an error class.

  was: the original AES-related description from the issue as created (quoted in full in the creation notification below).

> Substitute Java overflow exception from TIMESTAMPADD by Spark exception
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38481
>                 URL: https://issues.apache.org/jira/browse/SPARK-38481
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Created] (SPARK-38481) Substitute Java overflow exception from TIMESTAMPADD by Spark exception
Max Gekk created SPARK-38481:
--------------------------------

             Summary: Substitute Java overflow exception from TIMESTAMPADD by Spark exception
                 Key: SPARK-38481
                 URL: https://issues.apache.org/jira/browse/SPARK-38481
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Max Gekk
            Assignee: Max Gekk
             Fix For: 3.3.0

Currently, Spark SQL can throw Java exceptions from the aes_encrypt()/aes_decrypt() functions, for instance:

{code:java}
java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch!
	at org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93)
	at org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: javax.crypto.AEADBadTagException: Tag mismatch!
	at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620)
	at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116)
	at com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053)
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853)
	at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
	at javax.crypto.Cipher.doFinal(Cipher.java:2226)
	at org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87)
	... 19 more
{code}

That might confuse non-Scala/Java users. We need to wrap this kind of exception in a Spark exception.
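A hedged repro sketch of the raw Java error quoted above, using the aes_encrypt/aes_decrypt SQL functions added in Spark 3.3 (GCM is their default mode; the 16-byte keys here are arbitrary — decrypting with a different key fails the GCM tag check):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Encrypt with one key, then decrypt with another: the GCM auth tag check
# fails, and AEADBadTagException surfaces as a raw java.lang.RuntimeException.
spark.sql("""
    SELECT aes_decrypt(
        aes_encrypt('Spark', '0000111122223333'),
        'abcdefghijklmnop')
""").show()
{code}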
[jira] [Resolved] (SPARK-37421) Inline type hints for python/pyspark/mllib/evaluation.py
[ https://issues.apache.org/jira/browse/SPARK-37421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Szymkiewicz resolved SPARK-37421.
----------------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34680
https://github.com/apache/spark/pull/34680

> Inline type hints for python/pyspark/mllib/evaluation.py
> ---------------------------------------------------------
>
>                 Key: SPARK-37421
>                 URL: https://issues.apache.org/jira/browse/SPARK-37421
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib, PySpark
>    Affects Versions: 3.3.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: dch nguyen
>            Priority: Major
>             Fix For: 3.3.0
>
> Inline type hints from python/pyspark/mllib/evaluation.pyi to python/pyspark/mllib/evaluation.py
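To illustrate what inlining type hints means here — moving annotations from the separate .pyi stub into the .py module itself — a hedged before/after sketch using one class from the module, with bodies elided and simplified:

{code:python}
# Before: annotations live only in a separate stub, evaluation.pyi:
#     class RegressionMetrics:
#         def __init__(self, predictionAndObservations: RDD) -> None: ...
#         @property
#         def meanSquaredError(self) -> float: ...
#
# After: the same annotations are written inline in evaluation.py, so the
# stub file can be removed (string annotation avoids an import cycle here):
class RegressionMetrics:
    def __init__(self, predictionAndObservations: "RDD") -> None:
        self._rdd = predictionAndObservations

    @property
    def meanSquaredError(self) -> float:
        ...
{code}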
[jira] [Assigned] (SPARK-37421) Inline type hints for python/pyspark/mllib/evaluation.py
[ https://issues.apache.org/jira/browse/SPARK-37421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Szymkiewicz reassigned SPARK-37421:
------------------------------------------
    Assignee: dch nguyen

> Inline type hints for python/pyspark/mllib/evaluation.py
> ---------------------------------------------------------
>
>                 Key: SPARK-37421
>                 URL: https://issues.apache.org/jira/browse/SPARK-37421
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib, PySpark
>    Affects Versions: 3.3.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: dch nguyen
>            Priority: Major
[jira] [Assigned] (SPARK-38437) Dynamic serialization of Java datetime objects to micros/days
[ https://issues.apache.org/jira/browse/SPARK-38437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-38437:
--------------------------------
    Assignee: Max Gekk

> Dynamic serialization of Java datetime objects to micros/days
> -------------------------------------------------------------
>
>                 Key: SPARK-38437
>                 URL: https://issues.apache.org/jira/browse/SPARK-38437
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>
> Make the serializers to micros/days more tolerant of input Java objects, and accept:
> - for timestamps: java.sql.Timestamp and java.time.Instant
> - for days: java.sql.Date and java.time.LocalDate
> This should make Spark SQL more reliable against user and datasource inputs.
[jira] [Resolved] (SPARK-38437) Dynamic serialization of Java datetime objects to micros/days
[ https://issues.apache.org/jira/browse/SPARK-38437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-38437.
------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 35756
https://github.com/apache/spark/pull/35756

> Dynamic serialization of Java datetime objects to micros/days
> -------------------------------------------------------------
>
>                 Key: SPARK-38437
>                 URL: https://issues.apache.org/jira/browse/SPARK-38437
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0