[jira] [Commented] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
[ https://issues.apache.org/jira/browse/SPARK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331160#comment-17331160 ]

Apache Spark commented on SPARK-35210:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/32318

> Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
> ---------------------------------------------------------
>
>                 Key: SPARK-35210
>                 URL: https://issues.apache.org/jira/browse/SPARK-35210
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.7, 3.0.2, 3.1.1, 3.2.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Blocker
>
> SPARK-34988 upgraded Jetty to 9.4.39 for CVE-2021-28165.
> But after that upgrade, Jetty 9.4.40 was released to fix the ERR_CONNECTION_RESET issue
> (https://github.com/eclipse/jetty.project/issues/6152).
> This issue seems to affect Jetty 9.4.39 when the POST method is used with SSL.
> For Spark, job submission using REST and the Thrift Server over HTTPS can be affected.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
[ https://issues.apache.org/jira/browse/SPARK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35210:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)
[jira] [Assigned] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
[ https://issues.apache.org/jira/browse/SPARK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35210:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)
[jira] [Created] (SPARK-35210) Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
Kousuke Saruta created SPARK-35210:
--------------------------------------

             Summary: Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
                 Key: SPARK-35210
                 URL: https://issues.apache.org/jira/browse/SPARK-35210
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta

SPARK-34988 upgraded Jetty to 9.4.39 for CVE-2021-28165.
But after that upgrade, Jetty 9.4.40 was released to fix the ERR_CONNECTION_RESET issue
(https://github.com/eclipse/jetty.project/issues/6152).
This issue seems to affect Jetty 9.4.39 when the POST method is used with SSL.
For Spark, job submission using REST and the Thrift Server over HTTPS can be affected.
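For downstream projects that want the fix before a patched Spark release ships, the Jetty version can be overridden at build time. A hedged sketch for a Maven project that pulls in Spark: the artifact list is illustrative (not exhaustive for everything Spark uses), and the exact timestamp suffix of the release string should be verified against Maven Central.

{code:xml}
<!-- Sketch: pin Jetty 9.4.40 via dependencyManagement in a downstream
     Maven project. Verify the exact version string on Maven Central;
     the v-suffix below is the expected 9.4.40 release tag. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>9.4.40.v20210413</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-servlet</artifactId>
      <version>9.4.40.v20210413</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}

Note that the official Spark distribution shades Jetty under org.sparkproject.jetty, so this override only helps builds that compile Spark (or its servlet stack) from source rather than consuming the prebuilt binaries.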
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331157#comment-17331157 ]

Dongjoon Hyun commented on SPARK-35196:
---------------------------------------

Yeah, sorry for the negative opinion. There was a previous report of that non-working situation. IIRC, there is a documentation commit that adds a warning about it.

> DataFrameWriter.text support zstd compression
> ---------------------------------------------
>
>                 Key: SPARK-35196
>                 URL: https://issues.apache.org/jira/browse/SPARK-35196
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Leonard Lausen
>            Priority: Major
>
> http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html
> specifies that only the following compression codecs are supported: `none,
> bzip2, gzip, lz4, snappy and deflate`.
> However, the RDD API supports zstd compression if users pass the
> 'org.apache.hadoop.io.compress.ZStandardCodec' compressor to the
> saveAsTextFile method.
> Please also expose zstd in the DataFrameWriter.
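The RDD-level workaround referenced in the description can be sketched as follows. This is a hedged sketch, not a tested recipe: it assumes an already-running SparkSession bound to the name `spark`, a Hadoop build (2.9.0+) with native zstd support on the executors, and the paths are hypothetical placeholders.

{code:python}
# Sketch: write zstd-compressed text via the RDD API, since
# DataFrameWriter.text does not accept a 'zstd' short name.
# Assumes an existing SparkSession `spark`; paths are placeholders.
df = spark.read.text("s3://some-input-path")

(df.rdd
   .map(lambda row: row.value)          # unwrap the single 'value' column
   .saveAsTextFile(
       "s3://some-output-path",
       compressionCodecClass="org.apache.hadoop.io.compress.ZStandardCodec"))
{code}

The trade-off is that dropping to the RDD API bypasses the DataFrame writer's options (mode, partitioning), which is why exposing a proper codec alias was requested.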
[jira] [Commented] (SPARK-35199) Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-35199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331155#comment-17331155 ]

Dongjoon Hyun commented on SPARK-35199:
---------------------------------------

According to the error logs, it seems that you are mixing ZSTD JNI versions. That doesn't work. We experienced many API incompatibilities with ZSTD JNI. That's the reason we recently upgraded Parquet/Avro/Kafka. For the ZSTD JNI incompatibility issues, please see https://github.com/luben/zstd-jni/issues?q=is%3Aissue+ .

{code}
Decompression error: Version not supported at
{code}

> Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-35199
>                 URL: https://issues.apache.org/jira/browse/SPARK-35199
>             Project: Spark
>          Issue Type: Task
>          Components: PySpark
>    Affects Versions: 3.0.1
>            Reporter: Leonard Lausen
>            Priority: Major
>
> In Spark 3.0.1, tasks fail with the default value of
> {{spark.shuffle.mapStatus.compression.codec=zstd}}, but work without problems
> when the value is changed to {{spark.shuffle.mapStatus.compression.codec=lz4}}.
>
> Example backtrace:
>
> {code:java}
> java.io.IOException: Decompression error: Version not supported
>     at com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:164)
>     at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:120)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2781)
>     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2797)
>     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3274)
>     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934)
>     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396)
>     at org.apache.spark.MapOutputTracker$.deserializeObject$1(MapOutputTracker.scala:954)
>     at org.apache.spark.MapOutputTracker$.deserializeMapStatuses(MapOutputTracker.scala:964)
>     at org.apache.spark.MapOutputTrackerWorker.$anonfun$getStatuses$2(MapOutputTracker.scala:856)
>     at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
>     at org.apache.spark.MapOutputTrackerWorker.getStatuses(MapOutputTracker.scala:851)
>     at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:808)
>     at org.apache.spark.shuffle.sort.SortShuffleManager.getReader(SortShuffleManager.scala:128)
>     at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:185)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:127)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>
> Example code to reproduce the issue:
> {code:java}
> import pyspark.sql.functions as F
> df = spark.read.text("s3://my-bucket-with-300GB-compressed-text-files")
> df_rand = df.orderBy(F.rand(1))
> df_rand.write.text('s3://shuffled-output'){code}
> See
> https://stackoverflow.com/questions/64876463/spark-3-0-1-tasks-are-failing-when-using-zstd-compression-codec
> for another report of this issue and a workaround.
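The workaround named in the description (reverting the map-status codec to lz4) is a one-line configuration change. A minimal sketch, assuming it is applied cluster-wide via spark-defaults.conf; it can equally be passed per job with `--conf`:

{code}
# spark-defaults.conf -- work around SPARK-35199 on Spark 3.0.x by
# avoiding zstd for MapStatus (de)serialization in the shuffle tracker.
spark.shuffle.mapStatus.compression.codec  lz4
{code}

Equivalently at submit time: `spark-submit --conf spark.shuffle.mapStatus.compression.codec=lz4 ...`. This only changes how shuffle map statuses are compressed; shuffle data and output compression codecs are configured separately.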
[jira] [Commented] (SPARK-35199) Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-35199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331154#comment-17331154 ]

Dongjoon Hyun commented on SPARK-35199:
---------------------------------------

Well, I'd recommend using ZSTD with Apache Spark 3.2+. Many issues are fixed via SPARK-34651. BTW, could you provide a reproducible example, [~lausen]? We cannot access your bucket.
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331153#comment-17331153 ]

Hyukjin Kwon commented on SPARK-35196:
--------------------------------------

I see. Thanks, Dongjoon, for the clarification!
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331152#comment-17331152 ]

Dongjoon Hyun commented on SPARK-35196:
---------------------------------------

In addition, even with Hadoop 3.1, the official Apache Spark distribution fails when you try to use `org.apache.hadoop.io.compress.ZStandardCodec`.
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331150#comment-17331150 ]

Dongjoon Hyun commented on SPARK-35196:
---------------------------------------

Hi, [~lausen] and [~hyukjin.kwon]. We still haven't dropped Hadoop 2.7 support, and `org.apache.hadoop.io.compress.ZStandardCodec` was only added in Apache Hadoop 2.9.0. We may add a note about the limitation, but I'm -1 on adding an alias.
[jira] [Commented] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331144#comment-17331144 ]

Apache Spark commented on SPARK-33195:
--------------------------------------

User 'mdianjun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32317

> stages/stage UI page fails to load when spark reverse proxy is enabled
> ----------------------------------------------------------------------
>
>                 Key: SPARK-33195
>                 URL: https://issues.apache.org/jira/browse/SPARK-33195
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.1
>            Reporter: Liran
>            Priority: Major
>
> I think we have the same issue reported in SPARK-32467, reproduced with
> reverse proxy redirects; I'm getting the exact same error in the Spark UI.
> Page URL:
> {code:java}
> http://:8080/proxy/app-20201020143315-0005/stages/stage/?id=7&attempt=0{code}
> The URL above fails to load; looking at the network tab, this request fails:
> {code:java}
> http://:8080/proxy/app-20201020143315-0005/api/v1/applications/app-20201020143315-0005/stages/7/0/taskTable?draw=1&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=asc&start=0&length=20&search%5Bvalue%5D=&search%5Bregex%5D=false&numTasks=1&columnIndexToSort=0&columnNameToSort=Index&_=1603206039549
> {code}
> Server error stack trace:
> {code:java}
> /api/v1/applications/app-20201020113310-0004/stages/7/0/taskTable
> javax.servlet.ServletException: java.lang.NullPointerException
>     at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
>     at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>     at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
>     at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
>     at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>     at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
>     at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>     at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>     at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>     at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>     at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>     at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>     at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>     at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>     at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
>     at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>     at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.sparkproject.jetty.server.Server.handle(Server.java:505)
>     at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>     at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>     at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>     at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103)
>     at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>     at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>     at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>     at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
>     at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175)
>     at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140)
>     at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:107)
>     at ...
> {code}
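For context, the reverse-proxy mode under which this reproduces is enabled through the master's UI configuration. A hedged fragment (the proxy URL is a hypothetical placeholder; set it only if the UI sits behind an external front end):

{code}
# spark-defaults.conf -- proxy worker and application UIs through the master UI
spark.ui.reverseProxy     true
# Optional base URL when an external proxy fronts the master (placeholder):
spark.ui.reverseProxyUrl  http://example-master:8080/proxy
{code}

With this enabled, stage pages are served under /proxy/<app-id>/..., which is the URL shape shown in the report above.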
[jira] [Assigned] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33195:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33195:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-33195) stages/stage UI page fails to load when spark reverse proxy is enabled
[ https://issues.apache.org/jira/browse/SPARK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331143#comment-17331143 ] Apache Spark commented on SPARK-33195: -- User 'mdianjun' has created a pull request for this issue: https://github.com/apache/spark/pull/32317 > stages/stage UI page fails to load when spark reverse proxy is enabled > -- > > Key: SPARK-33195 > URL: https://issues.apache.org/jira/browse/SPARK-33195 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.1 >Reporter: Liran >Priority: Major > > I think we have the same issue reported in SPARK-32467, reproduced with > reverse proxy redirects, I'm getting the exact same error in spark UI. > Url page: > {code:java} > http://:8080/proxy/app-20201020143315-0005/stages/stage/?id=7&attempt=0{code} > The url above fails to load, looking at the network tab - this request fails: > {code:java} > http://:8080/proxy/app-20201020143315-0005/api/v1/applications/app-20201020143315-0005/stages/7/0/taskTable?draw=1&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=asc&start=0&length=20&search%5Bvalue%5D=&search%5Bregex%5D=false&numTasks=1&columnIndexToSort=0&columnNameToSort=Index&_=1603206039549 > {code} > Server error stack trace: > {code:java} > /api/v1/applications/app-20201020113310-0004/stages/7/0/taskTable/api/v1/applications/app-20201020113310-0004/stages/7/0/taskTablejavax.servlet.ServletException: > java.lang.NullPointerException at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) > at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) > at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) > at > org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) > at > org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.sparkproject.jetty.server.Server.handle(Server.java:505) at > org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) at > org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) at > org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) > at > 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) > at > org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) > at > org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) > at > org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698) > at > org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804) > at java.lang.Thread.run(Thread.java:748)Caused by: > java.lang.NullPointerException at > org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175) > at > org.apache.spark.status.api.v1.BaseAppResource.
[jira] [Commented] (SPARK-35168) mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-35168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331132#comment-17331132 ] Kent Yao commented on SPARK-35168: -- Thanks [~dongjoon] > mapred.reduce.tasks should be shuffle.partitions not > adaptive.coalescePartitions.initialPartitionNum > > > Key: SPARK-35168 > URL: https://issues.apache.org/jira/browse/SPARK-35168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:java} > spark-sql> set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1; > spark.sql.adaptive.coalescePartitions.initialPartitionNum 1 > Time taken: 2.18 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:27:11 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.03 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.shuffle.partitions; > spark.sql.shuffle.partitions 200 > Time taken: 0.024 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks=2; > 21/04/21 14:31:52 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, automatically converted to spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 2 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:31:55 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35209) CLONE - CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
Kadir Selçuk created SPARK-35209: Summary: CLONE - CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes Key: SPARK-35209 URL: https://issues.apache.org/jira/browse/SPARK-35209 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Kadir Selçuk Assignee: Wenchen Fan Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331117#comment-17331117 ] Kadir Selçuk commented on SPARK-35204: -- Sorunları çözmek > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token
[ https://issues.apache.org/jira/browse/SPARK-35160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331114#comment-17331114 ] Manu Zhang commented on SPARK-35160: [~hyukjin.kwon], Thanks for reminder. I've added my proposal and I did ask about it on mailing list. It will be great if you know the reasoning behind it or you may forward to someone who knows. > Spark application submitted despite failing to get Hive delegation token > > > Key: SPARK-35160 > URL: https://issues.apache.org/jira/browse/SPARK-35160 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.1.1 >Reporter: Manu Zhang >Priority: Major > > Currently, when running on YARN and failing to get Hive delegation token, a > Spark SQL application will still be submitted. Eventually, the application > will fail on connecting to Hive metastore without a valid delegation token. > Is there any reason for this design ? > cc [~jerryshao] who originally implemented this in > https://issues.apache.org/jira/browse/SPARK-14743 > I'd propose to fail immediately like HadoopFSDelegationTokenProvider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token
[ https://issues.apache.org/jira/browse/SPARK-35160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-35160: --- Description: Currently, when running on YARN and failing to get Hive delegation token, a Spark SQL application will still be submitted. Eventually, the application will fail on connecting to Hive metastore without a valid delegation token. Is there any reason for this design ? cc [~jerryshao] who originally implemented this in https://issues.apache.org/jira/browse/SPARK-14743 I'd propose to fail immediately like HadoopFSDelegationTokenProvider. was: Currently, when running on YARN and failing to get Hive delegation token, a Spark SQL application will still be submitted. Eventually, the application will fail on connecting to Hive metastore without a valid delegation token. Is there any reason for this design ? cc [~jerryshao] who originally implemented this in https://issues.apache.org/jira/browse/SPARK-14743 > Spark application submitted despite failing to get Hive delegation token > > > Key: SPARK-35160 > URL: https://issues.apache.org/jira/browse/SPARK-35160 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.1.1 >Reporter: Manu Zhang >Priority: Major > > Currently, when running on YARN and failing to get Hive delegation token, a > Spark SQL application will still be submitted. Eventually, the application > will fail on connecting to Hive metastore without a valid delegation token. > Is there any reason for this design ? > cc [~jerryshao] who originally implemented this in > https://issues.apache.org/jira/browse/SPARK-14743 > I'd propose to fail immediately like HadoopFSDelegationTokenProvider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28247) Flaky test: "query without test harness" in ContinuousSuite
[ https://issues.apache.org/jira/browse/SPARK-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331103#comment-17331103 ] Apache Spark commented on SPARK-28247: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/32316 > Flaky test: "query without test harness" in ContinuousSuite > --- > > Key: SPARK-28247 > URL: https://issues.apache.org/jira/browse/SPARK-28247 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > This test has failed a few times in some PRs, as well as easy to reproduce > locally. Example of a failure: > {noformat} > [info] - query without test harness *** FAILED *** (2 seconds, 931 > milliseconds) > [info] scala.Predef.Set.apply[Int](0, 1, 2, > 3).map[org.apache.spark.sql.Row, > scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => > org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) > was false > (ContinuousSuite.scala:226){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35208) Add docs for LATERAL subqueries
Allison Wang created SPARK-35208: Summary: Add docs for LATERAL subqueries Key: SPARK-35208 URL: https://issues.apache.org/jira/browse/SPARK-35208 Project: Spark Issue Type: Task Components: docs Affects Versions: 3.2.0 Reporter: Allison Wang Add documentation for LATERAL subqueries. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331096#comment-17331096 ] Wei Xue commented on SPARK-35133: - Totally understand. But turning off AQE is sometimes part of the debugging process, too, in order to isolate the problem. > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35207) hash() and other hash builtins do not normalize negative zero
[ https://issues.apache.org/jira/browse/SPARK-35207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated SPARK-35207: -- Description: I would generally expect that {{x = y => hash(x) = hash(y)}}. However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. was: I would generally expect that x = y => hash(x) = hash(y). However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. 
> hash() and other hash builtins do not normalize negative zero > - > > Key: SPARK-35207 > URL: https://issues.apache.org/jira/browse/SPARK-35207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Tim Armstrong >Priority: Major > Labels: correctness > > I would generally expect that {{x = y => hash(x) = hash(y)}}. However +-0 > hash to different values for floating point types. > {noformat} > scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as > double))").show > +-+--+ > |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| > +-+--+ > | -1670924195|-853646085| > +-+--+ > scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as > double)").show > ++ > |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| > ++ > |true| > ++ > {noformat} > I'm not sure how likely this is to cause issues in practice, since only a > limited number of calculations can produce -0 and joining or aggregating with > floating point keys is a bad practice as a general rule, but I think it would > be safer if we normalised -0.0 to +0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35207) hash() and other hash builtins do not normalize negative zero
[ https://issues.apache.org/jira/browse/SPARK-35207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated SPARK-35207: -- Description: I would generally expect that {{x = y => hash( x ) = hash( y )}}. However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. was: I would generally expect that {{x = y => hash(x) = hash(y)}}. However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. 
> hash() and other hash builtins do not normalize negative zero > - > > Key: SPARK-35207 > URL: https://issues.apache.org/jira/browse/SPARK-35207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Tim Armstrong >Priority: Major > Labels: correctness > > I would generally expect that {{x = y => hash( x ) = hash( y )}}. However +-0 > hash to different values for floating point types. > {noformat} > scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as > double))").show > +-+--+ > |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| > +-+--+ > | -1670924195|-853646085| > +-+--+ > scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as > double)").show > ++ > |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| > ++ > |true| > ++ > {noformat} > I'm not sure how likely this is to cause issues in practice, since only a > limited number of calculations can produce -0 and joining or aggregating with > floating point keys is a bad practice as a general rule, but I think it would > be safer if we normalised -0.0 to +0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35207) hash() and other hash builtins do not normalize negative zero
Tim Armstrong created SPARK-35207: - Summary: hash() and other hash builtins do not normalize negative zero Key: SPARK-35207 URL: https://issues.apache.org/jira/browse/SPARK-35207 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Tim Armstrong I would generally expect that x = y => hash(x) = hash(y). However +-0 hash to different values for floating point types. {noformat} scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show +-+--+ |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))| +-+--+ | -1670924195|-853646085| +-+--+ scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show ++ |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))| ++ |true| ++ {noformat} I'm not sure how likely this is to cause issues in practice, since only a limited number of calculations can produce -0 and joining or aggregating with floating point keys is a bad practice as a general rule, but I think it would be safer if we normalised -0.0 to +0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35168) mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-35168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331059#comment-17331059 ] Dongjoon Hyun commented on SPARK-35168: --- Thank you, [~Qin Yao]. I converted this into a subtask of SPARK-33828 in order to give more visibility. > mapred.reduce.tasks should be shuffle.partitions not > adaptive.coalescePartitions.initialPartitionNum > > > Key: SPARK-35168 > URL: https://issues.apache.org/jira/browse/SPARK-35168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:java} > spark-sql> set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1; > spark.sql.adaptive.coalescePartitions.initialPartitionNum 1 > Time taken: 2.18 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:27:11 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.03 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.shuffle.partitions; > spark.sql.shuffle.partitions 200 > Time taken: 0.024 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks=2; > 21/04/21 14:31:52 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, automatically converted to spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 2 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:31:55 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35168) mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-35168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35168: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > mapred.reduce.tasks should be shuffle.partitions not > adaptive.coalescePartitions.initialPartitionNum > > > Key: SPARK-35168 > URL: https://issues.apache.org/jira/browse/SPARK-35168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.1, 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:java} > spark-sql> set spark.sql.adaptive.coalescePartitions.initialPartitionNum=1; > spark.sql.adaptive.coalescePartitions.initialPartitionNum 1 > Time taken: 2.18 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:27:11 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.03 seconds, Fetched 1 row(s) > spark-sql> set spark.sql.shuffle.partitions; > spark.sql.shuffle.partitions 200 > Time taken: 0.024 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks=2; > 21/04/21 14:31:52 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, automatically converted to spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 2 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> set mapred.reduce.tasks; > 21/04/21 14:31:55 WARN SetCommand: Property mapred.reduce.tasks is > deprecated, showing spark.sql.shuffle.partitions instead. > spark.sql.shuffle.partitions 1 > Time taken: 0.017 seconds, Fetched 1 row(s) > spark-sql> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34297) Add metrics for data loss and offset out range for KafkaMicroBatchStream
[ https://issues.apache.org/jira/browse/SPARK-34297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-34297. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31398 [https://github.com/apache/spark/pull/31398] > Add metrics for data loss and offset out range for KafkaMicroBatchStream > > > Key: SPARK-34297 > URL: https://issues.apache.org/jira/browse/SPARK-34297 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > When testing SS, I found it is hard to track data loss of SS reading from > Kafka. The micro scan node has only one metric, number of output rows. Users > have no idea how many times offsets to fetch are out of Kafak now, how many > times data loss happens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331011#comment-17331011 ] Cheng Su commented on SPARK-35133: -- btw just to provide more context, I am running into this in reality when trying to debug code-gen for some queries in unit test. So I guess others can run into this issue as well. I will spend one afternoon or so to figure out if there's a clean fix. Thanks. > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331003#comment-17331003 ] L. C. Hsieh commented on SPARK-35156: - > Can you just use a later version? it may be a problem that was fixed. > Or some issue in how you package your app, like, having it include > incompatible K8S classes. Master is okay. I guess the fix was not intentional, or it was not backported, because branch-3.1 has the issue. Since branch-3.0 is okay too, maybe some change between 3.0 and master causes it. The exception is seen locally when running spark-submit. You can also see in the above stack trace that the exception is thrown early in SparkSubmit. Master and branch-3.0 are both unaffected by the issue, so it does not seem related to how the app is packaged. It would be good if anyone else could test this too, in case I did something incorrect during my tests. > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.0 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. 
Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: 
java.lang.ClassNotFoundException: > com.fa
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330999#comment-17330999 ] L. C. Hsieh commented on SPARK-35156: - > Do you mean Branch-3.0 is affected alone? Sorry for the typo. Only branch-3.1 is affected. > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.0 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at 
org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.fasterxml.jackson.dataformat.yaml.YAMLFactory > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 19 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-35156: Description: Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S cluster. Master, branch-3.0 are okay. Branch-3.1 is affected. How to reproduce: 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) 2. Run spark-submit to submit to K8S cluster 3. Get the following exception {code:java} 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) at io.fabric8.kubernetes.client.Config.(Config.java:230) at io.fabric8.kubernetes.client.Config.(Config.java:224) at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) at org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.dataformat.yaml.YAMLFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 19 more {code} was: Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S cluster. Master, branch-3.1 are okay. Branch-3.1 is affected. How to reproduce: 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) 2. Run spark-submit to submit to K8S cluster 3. Get the following exception {code:java} 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330982#comment-17330982 ] Cheng Su commented on SPARK-35133: -- Whenever developers/users want to debug the generated code for a query in the spark-shell or spark-sql command line, they have to disable AQE explicitly. After debugging, they have to re-enable AQE to run queries or do other work. I feel it's kind of inconvenient for debugging. > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35013) Spark allows to set spark.driver.cores=0
[ https://issues.apache.org/jira/browse/SPARK-35013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-35013: - Issue Type: Improvement (was: Bug) > Spark allows to set spark.driver.cores=0 > > > Key: SPARK-35013 > URL: https://issues.apache.org/jira/browse/SPARK-35013 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.7, 3.1.1 >Reporter: Oleg Lypkan >Priority: Minor > > I found an inconsistency in [validation logic of Spark submit arguments > |https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L248-L258]that > allows *spark.driver.cores* value to be set to 0 but requires > *spark.driver.memory,* *spark.executor.cores, spark.executor.memory* to be > positive numbers: > {quote}Exception in thread "main" org.apache.spark.SparkException: Driver > memory must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor cores > must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor memory > must be a positive number > {quote} > I would like to understand if there is a reason for this inconsistency in the > validation logic or it is a bug? > Thank you -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35013) Spark allows to set spark.driver.cores=0
[ https://issues.apache.org/jira/browse/SPARK-35013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330978#comment-17330978 ] Sean R. Owen commented on SPARK-35013: -- I can't think of a reason to allow 0 cores. Feel free to open a PR. > Spark allows to set spark.driver.cores=0 > > > Key: SPARK-35013 > URL: https://issues.apache.org/jira/browse/SPARK-35013 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7, 3.1.1 >Reporter: Oleg Lypkan >Priority: Minor > > I found an inconsistency in [validation logic of Spark submit arguments > |https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L248-L258]that > allows *spark.driver.cores* value to be set to 0 but requires > *spark.driver.memory,* *spark.executor.cores, spark.executor.memory* to be > positive numbers: > {quote}Exception in thread "main" org.apache.spark.SparkException: Driver > memory must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor cores > must be a positive number > Exception in thread "main" org.apache.spark.SparkException: Executor memory > must be a positive number > {quote} > I would like to understand if there is a reason for this inconsistency in the > validation logic or it is a bug? > Thank you -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
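The inconsistency discussed above can be shown with a small self-contained sketch (the helper name `requirePositive` is hypothetical, not Spark's actual code; the real check lives in `SparkSubmitArguments`): applying the same positive-number rule to all four settings would reject `spark.driver.cores=0` with the same kind of error the other three already produce.

```java
// Hypothetical sketch of uniform spark-submit argument validation.
// The messages mirror the SparkException texts quoted in the report.
public class SubmitArgCheck {
    static void requirePositive(String name, long value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be a positive number");
        }
    }

    public static void main(String[] args) {
        requirePositive("Driver memory", 1024);  // passes, as today
        requirePositive("Executor cores", 4);    // passes, as today
        try {
            requirePositive("Driver cores", 0);  // rejected, closing the gap
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // prints "Driver cores must be a positive number"
        }
    }
}
```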
[jira] [Commented] (SPARK-35027) Close the inputStream in FileAppender when writing the logs failure
[ https://issues.apache.org/jira/browse/SPARK-35027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330976#comment-17330976 ] Sean R. Owen commented on SPARK-35027: -- Are you sure? stop() is called on an error on these FileAppenders. > Close the inputStream in FileAppender when writing the logs failure > --- > > Key: SPARK-35027 > URL: https://issues.apache.org/jira/browse/SPARK-35027 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Jack Hu >Priority: Major > > In a Spark cluster, the ExecutorRunner uses FileAppender to redirect the > stdout/stderr of executors to files. When writing fails for some reason (e.g. > the disk is full), the FileAppender only closes its stream to the file but > leaves the pipe's stdout/stderr open, so subsequent write operations on the > executor side may hang. > Do we need to close the inputStream in FileAppender? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
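The fix being suggested in the report above can be sketched in a few lines (an assumed shape for illustration; Spark's actual `FileAppender` differs): close the reading end of the pipe in a `finally` block, so an I/O failure while writing the log file (e.g. disk full) cannot leave the child process blocked writing into a pipe nobody drains.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative appender loop: copy the process's stdout/stderr (in) into a
// log file (out). The finally block closes *both* streams, so even if
// out.write throws, the pipe's read end is released rather than left open.
public class AppendSketch {
    static long appendStream(InputStream in, OutputStream out) throws IOException {
        long copied = 0;
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                copied += n;
            }
        } finally {
            // Close the input even when the write side failed; nest the
            // closes so a failure in in.close() still closes out.
            try { in.close(); } finally { out.close(); }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = appendStream(new ByteArrayInputStream("log line\n".getBytes()), out);
        System.out.println(n); // prints 9
    }
}
```

Whether this is needed on top of `stop()` (per Sean's comment) depends on whether `stop()` is reliably reached on the write-failure path.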
[jira] [Resolved] (SPARK-35046) Wrong memory allocation on standalone mode cluster
[ https://issues.apache.org/jira/browse/SPARK-35046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35046. -- Resolution: Invalid > Wrong memory allocation on standalone mode cluster > -- > > Key: SPARK-35046 > URL: https://issues.apache.org/jira/browse/SPARK-35046 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 3.0.1 >Reporter: Mohamadreza Rostami >Priority: Major > > I see a bug in executor memory allocation in standalone clusters, but I > can't find which part of the Spark code causes this problem. That's why I > decided to raise this issue here. > Assume you have 3 workers, each with 10 CPU cores and 10 gigabytes of memory. > Also assume you have 2 Spark jobs running on this cluster, with their configs set > as below: > - > job-1: > executor-memory: 5g > executor-CPU: 4 > max-cores: 8 > -- > job-2: > executor-memory: 6g > executor-CPU: 4 > max-cores: 8 > -- > In this situation, we expect that if we submit both jobs, the first job > submitted gets 2 executors, each with 4 CPU cores and 5g of memory, and the > second job gets only one executor, on the third worker, with 4 CPU cores and > 6g of memory, because workers 1 and 2 don't have enough memory left to accept > the second job. But surprisingly, we see that the first or second worker > creates an executor for job-2, and that worker's memory consumption goes > beyond what was allocated to it, taking 11g of memory from the operating > system. > Is this behavior normal? I think this can cause undefined behavior > in the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
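The behavior the reporter expected amounts to a simple admission rule, sketched below (a hedged illustration; Spark's real standalone scheduling logic in the Master is more involved, and the method name `canLaunchExecutor` here is just descriptive): a worker should launch an executor only if it has both enough free cores and enough free memory.

```java
// Illustrative admission check for a standalone worker. With 10g total and a
// 4-core/5g executor already running, 5g remains free, so a 6g executor
// should be rejected on that worker.
public class WorkerAdmission {
    static boolean canLaunchExecutor(int freeCores, long freeMemMb,
                                     int execCores, long execMemMb) {
        return freeCores >= execCores && freeMemMb >= execMemMb;
    }

    public static void main(String[] args) {
        // Worker 1 after job-1's executor: 6 cores and 5g free; job-2 asks 4 cores/6g.
        System.out.println(canLaunchExecutor(6, 5 * 1024, 4, 6 * 1024));   // false
        // The idle third worker: 10 cores and 10g free.
        System.out.println(canLaunchExecutor(10, 10 * 1024, 4, 6 * 1024)); // true
    }
}
```

The report describes a worker apparently violating the memory half of this rule (11g consumed on a 10g worker), which is why the reporter asks whether it is normal.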
[jira] [Resolved] (SPARK-35054) Getting Critical Vulnerability CVE-2021-20231 on spark 3.0.0 branch
[ https://issues.apache.org/jira/browse/SPARK-35054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35054. -- Resolution: Invalid There's no info about what these are or if they even affect Spark. > Getting Critical Vulnerability CVE-2021-20231 on spark 3.0.0 branch > --- > > Key: SPARK-35054 > URL: https://issues.apache.org/jira/browse/SPARK-35054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shashank Jain >Priority: Major > > Currently, while running a Trivy scan on the Spark build, we are getting the > following critical vulnerabilities: > CVE-2021-20231 > CVE-2021-20232 > How can we fix these vulnerabilities in the spark 3.0.0 branch? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330971#comment-17330971 ] Sean R. Owen commented on SPARK-35156: -- Can you just use a later version? it may be a problem that was fixed. Or some issue in how you packager your app, like, having it include incompatible K8S classes. > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.1 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.fasterxml.jackson.dataformat.yaml.YAMLFactory > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 19 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For addi
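One quick way to narrow down a `NoClassDefFoundError` like the one above is to probe the classpath for the class named in the stack trace. This is a generic diagnostic sketch, not a Spark API; run with the spark-submit classpath, it shows whether `jackson-dataformat-yaml` made it into the jars produced by the sbt build.

```java
// Probe whether a class is resolvable on the current classpath without
// initializing it. A 'false' for the YAMLFactory class would confirm the
// dependency is missing from the assembled jars rather than broken at runtime.
public class ClasspathProbe {
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(onClasspath("com.fasterxml.jackson.dataformat.yaml.YAMLFactory"));
        System.out.println(onClasspath("java.lang.String")); // true
    }
}
```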
[jira] [Resolved] (SPARK-35193) Scala/Java compatibility issue Re: how to use externalResource in java transformer from Scala Transformer?
[ https://issues.apache.org/jira/browse/SPARK-35193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-35193. -- Resolution: Invalid I think this should be a question to the user@ list - I don't see reason to believe it's a Spark issue. There are several things that could be wrong, like, ExternalResourceParam not extending Param or not having the right name, etc. > Scala/Java compatibility issue Re: how to use externalResource in java > transformer from Scala Transformer? > -- > > Key: SPARK-35193 > URL: https://issues.apache.org/jira/browse/SPARK-35193 > Project: Spark > Issue Type: Bug > Components: Java API, ML >Affects Versions: 3.1.1 >Reporter: Arthur >Priority: Major > > I am trying to make a custom transformer use an externalResource, as it > requires a large table to do the transformation. I'm not super familiar with > scala syntax, but from snippets found on the internet I think I've made a > proper java implementation. I am running into the following error: > Exception in thread "main" java.lang.IllegalArgumentException: requirement > failed: Param HardMatchDetector_d95b8f699114__externalResource does not > belong to HardMatchDetector_d95b8f699114. 
> at scala.Predef$.require(Predef.scala:281) > at org.apache.spark.ml.param.Params.shouldOwn(params.scala:851) > at org.apache.spark.ml.param.Params.set(params.scala:727) > at org.apache.spark.ml.param.Params.set$(params.scala:726) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at HardMatchDetector.setResource(HardMatchDetector.java:45) > > Code as follows: > {code:java} > public class HardMatchDetector extends Transformer implements > DefaultParamsWritable, DefaultParamsReadable, Serializable { > public String inputColumn = "value"; > public String outputColumn = "hardMatches"; > private ExternalResourceParam resourceParam = new > ExternalResourceParam(this, "externalResource", "external resource, parquet > file with 2 columns, one names and one wordcount");; > private String uid; > public HardMatchDetector setResource(final ExternalResource value) > { return (HardMatchDetector)this.set(this.resourceParam, value); } > public HardMatchDetector setResource(final String path) > { return this.setResource(new ExternalResource(path, ReadAs.TEXT(), new > HashMap())); } > @Override > public String uid() > { return getUid(); } > private String getUid() { > if (uid == null) > { uid = Identifiable$.MODULE$.randomUID("HardMatchDetector"); } > return uid; > } > @Override > public Dataset transform(final Dataset dataset) > { return dataset; } > @Override > public StructType transformSchema(StructType schema) > { return schema.add(DataTypes.createStructField(outputColumn, > DataTypes.StringType, true)); } > @Override > public Transformer copy(ParamMap extra) > { return new HardMatchDetector(); } > } > public class HardMatcherTest extends AbstractSparkTest > { @Test > public void test() > { > var hardMatcher = new HardMatchDetector().setResource(pathName); } > } > {code} > > -- This 
message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
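The "does not belong to" failure above comes from an ownership check: Spark ML only lets a stage `set` a `Param` that the stage itself exposes, and it discovers params by reflecting over the stage's public members. The sketch below is a simplified model of that rule (class and method names are illustrative, not Spark's actual ones), which is consistent with Sean's point that the param may not be declared in a way Spark can find.

```java
import java.util.List;

// Simplified model of the ownership check behind Params.shouldOwn: the exact
// Param instance must be among those the stage exposes. A param kept only in
// a private field with no public accessor is never discovered, so setting it
// fails even though the uids printed in the error look identical.
public class ParamOwnership {
    static final class Param {
        final String parentUid; final String name;
        Param(String parentUid, String name) { this.parentUid = parentUid; this.name = name; }
        @Override public String toString() { return parentUid + "__" + name; }
    }

    static void shouldOwn(String stageUid, List<Param> exposedParams, Param p) {
        boolean owned = exposedParams.stream().anyMatch(q -> q == p)
            && stageUid.equals(p.parentUid);
        if (!owned) {
            throw new IllegalArgumentException("Param " + p + " does not belong to " + stageUid);
        }
    }

    public static void main(String[] args) {
        Param exposed = new Param("stage_1", "externalResource");
        shouldOwn("stage_1", List.of(exposed), exposed);          // ok: instance is exposed
        Param hidden = new Param("stage_1", "externalResource");  // distinct, undiscovered instance
        try {
            shouldOwn("stage_1", List.of(exposed), hidden);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same-looking uids, still rejected
        }
    }
}
```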
[jira] [Resolved] (SPARK-34430) Update index.md with a pyspark hint to avoid java.nio.DirectByteBuffer.(long, int) not available
[ https://issues.apache.org/jira/browse/SPARK-34430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-34430. -- Fix Version/s: (was: 3.0.0) Target Version/s: (was: 3.0.0) Resolution: Won't Fix > Update index.md with a pyspark hint to avoid java.nio.DirectByteBuffer.(long, > int) not available > > > Key: SPARK-34430 > URL: https://issues.apache.org/jira/browse/SPARK-34430 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Marco van der Linden >Priority: Trivial > Labels: pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > It took us a while to figure out how to fix this with pyspark; this might save a > few people a few hours... > > The documentation only vaguely describes how to fix the issue by setting a > parameter, without an actual working example. > The given PySpark example should hold enough information to set this in other > scenarios as well. > > > Kept the change to the docs as small as possible. > h3. What changes were proposed in this pull request? > doc update, see title > h3. Why are the changes needed? > save people time figuring out how to resolve it > h3. Does this PR introduce _any_ user-facing change? > no > h3. How was this patch tested? > no code changes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35197) Accumulators Explore Page on Spark UI on History Server
[ https://issues.apache.org/jira/browse/SPARK-35197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frida Montserrat Pulido Padilla updated SPARK-35197: Summary: Accumulators Explore Page on Spark UI on History Server (was: Accumulators Explore Page on Spark UI in History Server) > Accumulators Explore Page on Spark UI on History Server > --- > > Key: SPARK-35197 > URL: https://issues.apache.org/jira/browse/SPARK-35197 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Web UI >Affects Versions: 2.4.4 >Reporter: Frida Montserrat Pulido Padilla >Priority: Minor > Labels: accumulators, ui > Fix For: 2.4.4 > > > Proposal for an *Accumulators Explore Page* on the *SparkUI*: the > accumulator-specific information will be located under a new tab that has an > overview page, with links to see more details about the accumulators by a > particular name or stage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330920#comment-17330920 ] Wei Xue commented on SPARK-35133: - I'm not against fixing it. But just wondering is it even worth the trouble? > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34458) Spark-hive: apache hive dependency with CVEs
[ https://issues.apache.org/jira/browse/SPARK-34458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330873#comment-17330873 ] Bhupesh edited comment on SPARK-34458 at 4/23/21, 4:48 PM: --- I am going to work on it was (Author: bdhiman84): I found that, this is already upgraded twice. Following are the git link of change. * * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] > Spark-hive: apache hive dependency with CVEs > > > Key: SPARK-34458 > URL: https://issues.apache.org/jira/browse/SPARK-34458 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Gang Liang >Priority: Major > > Apache hive version 2.3.7 used by spark-hive (version 3.0.1) has the > following CVEs, as reported by our security team. > CVE-2017-12625, CVE-2015-1772, CVE-2016-3083, CVE-2018-11777, CVE-2014-0228 > Please upgrade apache hive libraries to a higher version with no known > security risks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35204: Assignee: Wenchen Fan > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35204. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32312 [https://github.com/apache/spark/pull/32312] > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34458) Spark-hive: apache hive dependency with CVEs
[ https://issues.apache.org/jira/browse/SPARK-34458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330873#comment-17330873 ] Bhupesh edited comment on SPARK-34458 at 4/23/21, 3:44 PM: --- I found that, this is already upgraded twice. Following are the git link of change. * * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] was (Author: bdhiman84): I found that, this is already upgraded twice. Following are the git link of change. * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] > Spark-hive: apache hive dependency with CVEs > > > Key: SPARK-34458 > URL: https://issues.apache.org/jira/browse/SPARK-34458 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Gang Liang >Priority: Major > > Apache hive version 2.3.7 used by spark-hive (version 3.0.1) has the > following CVEs, as reported by our security team. > CVE-2017-12625, CVE-2015-1772, CVE-2016-3083, CVE-2018-11777, CVE-2014-0228 > Please upgrade apache hive libraries to a higher version with no known > security risks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34458) Spark-hive: apache hive dependency with CVEs
[ https://issues.apache.org/jira/browse/SPARK-34458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330873#comment-17330873 ] Bhupesh commented on SPARK-34458: - I found that this has already been upgraded twice. Following are the Git links of the changes. * [https://github.pie.apple.com/blnu/apache-spark/commit/29e7d354a896fbf5a00e22da6554356aa0d4eb95] * [https://github.pie.apple.com/blnu/apache-spark/commit/181d326a98c07d6021f11d5eb85962360bd8406d] > Spark-hive: apache hive dependency with CVEs > > > Key: SPARK-34458 > URL: https://issues.apache.org/jira/browse/SPARK-34458 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Gang Liang >Priority: Major > > Apache Hive version 2.3.7, used by spark-hive (version 3.0.1), has the > following CVEs, as reported by our security team. > CVE-2017-12625, CVE-2015-1772, CVE-2016-3083, CVE-2018-11777, CVE-2014-0228 > Please upgrade the Apache Hive libraries to a higher version with no known > security risks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35162) New SQL functions: TRY_ADD/TRY_DIVIDE
[ https://issues.apache.org/jira/browse/SPARK-35162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35162: --- Summary: New SQL functions: TRY_ADD/TRY_DIVIDE (was: New SQL functions: TRY_ADD/TRY_SUBTRACT/TRY_MULTIPLY/TRY_DIVIDE/TRY_DIV) > New SQL functions: TRY_ADD/TRY_DIVIDE > - > > Key: SPARK-35162 > URL: https://issues.apache.org/jira/browse/SPARK-35162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35161) Error-handling SQL functions
[ https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35161: --- Summary: Error-handling SQL functions (was: Safe version SQL functions) > Error-handling SQL functions > > > Key: SPARK-35161 > URL: https://issues.apache.org/jira/browse/SPARK-35161 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > Create new safe version SQL functions for existing SQL functions/operators, > which returns NULL if overflow/error occurs. So that: > 1. Users can manage to finish queries without interruptions in ANSI mode. > 2. Users can get NULLs instead of unreasonable results if overflow occurs > when ANSI mode is off. > For example, the behavior of the following SQL operations is unreasonable: > {code:java} > 2147483647 + 2 => -2147483647 > CAST(2147483648L AS INT) => -2147483648 > {code} > With the new safe version SQL functions: > {code:java} > TRY_ADD(2147483647, 2) => null > TRY_CAST(2147483648L AS INT) => null > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
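The semantics proposed above can be sketched in a few lines. This is an illustrative Python model, not Spark's implementation: the function names mirror the proposed SQL functions, and the explicit 32-bit range check is an assumption about how overflow detection would behave.

```python
# Hedged sketch of the TRY_* semantics: overflow yields NULL (None here)
# instead of a wrapped result or an error.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def try_add(a, b):
    """Model of TRY_ADD: None on 32-bit overflow, the sum otherwise."""
    result = a + b
    return result if INT_MIN <= result <= INT_MAX else None

def try_cast_int(value):
    """Model of TRY_CAST(... AS INT): None when the value does not fit."""
    return value if INT_MIN <= value <= INT_MAX else None

def wrapping_add(a, b):
    """The 'unreasonable' behavior: plain 32-bit two's-complement wraparound."""
    return (a + b + 2**31) % 2**32 - 2**31

assert wrapping_add(2147483647, 2) == -2147483647  # 2147483647 + 2 wraps
assert try_add(2147483647, 2) is None              # TRY_ADD(2147483647, 2) => null
assert try_cast_int(2147483648) is None            # TRY_CAST(2147483648L AS INT) => null
assert try_add(1, 2) == 3                          # in-range arithmetic is unchanged
```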
[jira] [Updated] (SPARK-35161) Error-handling SQL functions
[ https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35161: --- Description: Create new Error-handling version SQL functions for existing SQL functions/operators, which returns NULL if overflow/error occurs. So that: 1. Users can manage to finish queries without interruptions in ANSI mode. 2. Users can get NULLs instead of unreasonable results if overflow occurs when ANSI mode is off. For example, the behavior of the following SQL operations is unreasonable: {code:java} 2147483647 + 2 => -2147483647 CAST(2147483648L AS INT) => -2147483648 {code} With the new safe version SQL functions: {code:java} TRY_ADD(2147483647, 2) => null TRY_CAST(2147483648L AS INT) => null {code} was: Create new safe version SQL functions for existing SQL functions/operators, which returns NULL if overflow/error occurs. So that: 1. Users can manage to finish queries without interruptions in ANSI mode. 2. Users can get NULLs instead of unreasonable results if overflow occurs when ANSI mode is off. For example, the behavior of the following SQL operations is unreasonable: {code:java} 2147483647 + 2 => -2147483647 CAST(2147483648L AS INT) => -2147483648 {code} With the new safe version SQL functions: {code:java} TRY_ADD(2147483647, 2) => null TRY_CAST(2147483648L AS INT) => null {code} > Error-handling SQL functions > > > Key: SPARK-35161 > URL: https://issues.apache.org/jira/browse/SPARK-35161 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > > Create new Error-handling version SQL functions for existing SQL > functions/operators, which returns NULL if overflow/error occurs. So that: > 1. Users can manage to finish queries without interruptions in ANSI mode. > 2. Users can get NULLs instead of unreasonable results if overflow occurs > when ANSI mode is off. 
> For example, the behavior of the following SQL operations is unreasonable: > {code:java} > 2147483647 + 2 => -2147483647 > CAST(2147483648L AS INT) => -2147483648 > {code} > With the new safe version SQL functions: > {code:java} > TRY_ADD(2147483647, 2) => null > TRY_CAST(2147483648L AS INT) => null > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330803#comment-17330803 ] Hyukjin Kwon commented on SPARK-35196: -- Yeah, I think it's all implemented properly. We should probably add the alias at [https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CompressionCodecs.scala#L30-L36], and fix the documentation in DataFrameWriter.scala, DataStreamWriter.scala, streaming.py and readwriter.py > DataFrameWriter.text support zstd compression > - > > Key: SPARK-35196 > URL: https://issues.apache.org/jira/browse/SPARK-35196 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Leonard Lausen >Priority: Major > > [http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html] > specifies that only the following compression codecs are supported: `none, > bzip2, gzip, lz4, snappy and deflate` > However, the RDD API supports compression with zstd if users specify the > 'org.apache.hadoop.io.compress.ZStandardCodec' compressor in the > saveAsTextFile method. > Please also expose zstd in the DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
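The kind of change discussed in the comment is a short-name-to-codec-class mapping plus an alias resolver. The sketch below is Python for illustration only (the real table lives in CompressionCodecs.scala); the Hadoop codec class names are real, but the function and the exact alias set are assumptions.

```python
# Illustrative short-name -> Hadoop codec class mapping, with the proposed
# "zstd" alias added. Not Spark's actual code.
CODEC_ALIASES = {
    "none": None,
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "deflate": "org.apache.hadoop.io.compress.DeflateCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "lz4": "org.apache.hadoop.io.compress.Lz4Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "zstd": "org.apache.hadoop.io.compress.ZStandardCodec",  # proposed addition
}

def resolve_codec(name):
    """Resolve a user-supplied short name, or pass through a fully-qualified
    codec class name unchanged."""
    key = name.lower()
    if key in CODEC_ALIASES:
        return CODEC_ALIASES[key]
    return name  # assume it is already a codec class name

assert resolve_codec("zstd") == "org.apache.hadoop.io.compress.ZStandardCodec"
assert resolve_codec("GZIP") == "org.apache.hadoop.io.compress.GzipCodec"
```

With such an alias in place, `df.write.option("compression", "zstd").text(...)` would resolve to the Hadoop ZStandardCodec the same way the existing short names do.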
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330795#comment-17330795 ] Leonard Lausen commented on SPARK-35196: Great. Adding the alias should be straightforward but a helpful addition. I found the Python interface at [https://github.com/apache/spark/blob/faa928cefc8c1c6d7771aacd2ae7670162346361/python/pyspark/sql/readwriter.py#L1300-L1301] Could you point out where the _jdf.write / _jwrite / _jwriter are implemented? I suspect the alias needs to be added there. > DataFrameWriter.text support zstd compression > - > > Key: SPARK-35196 > URL: https://issues.apache.org/jira/browse/SPARK-35196 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Leonard Lausen >Priority: Major > > [http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html] > specifies that only the following compression codecs are supported: `none, > bzip2, gzip, lz4, snappy and deflate` > However, RDD API supports compression with zstd if users specify > 'org.apache.hadoop.io.compress.ZStandardCodec' compressor in the > saveAsTextFile method. > Please also expose zstd in the DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330460#comment-17330460 ] Apache Spark commented on SPARK-35206: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/32315 > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35206: Assignee: Apache Spark > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330461#comment-17330461 ] Apache Spark commented on SPARK-35206: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/32315 > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
[ https://issues.apache.org/jira/browse/SPARK-35206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35206: Assignee: (was: Apache Spark) > Extract common get project path ability as function to SparkFunctionSuite > - > > Key: SPARK-35206 > URL: https://issues.apache.org/jira/browse/SPARK-35206 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Spark sql has test suites to read resources when running tests. The way of > getting the path of resources is commonly used in different suites. We can > extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35206) Extract common get project path ability as function to SparkFunctionSuite
wuyi created SPARK-35206: Summary: Extract common get project path ability as function to SparkFunctionSuite Key: SPARK-35206 URL: https://issues.apache.org/jira/browse/SPARK-35206 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 3.2.0 Reporter: wuyi Spark sql has test suites to read resources when running tests. The way of getting the path of resources is commonly used in different suites. We can extract them into a function to ease the maintenance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
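The refactor idea above (one shared helper that resolves the project root and builds resource paths, instead of each suite repeating the lookup) can be sketched as follows. This is an illustrative Python sketch, not the Scala helper the issue proposes; the `SPARK_HOME` environment-variable fallback is an assumption.

```python
# Hedged sketch of a shared test-resource path helper.
import os

def get_workspace_file_path(*parts):
    """Resolve a path relative to the project root.

    Prefers an explicit environment variable and falls back to the current
    working directory, mirroring how test suites often locate checked-in
    resource files.
    """
    root = os.environ.get("SPARK_HOME", os.getcwd())
    return os.path.join(root, *parts)

path = get_workspace_file_path("sql", "core", "src", "test", "resources")
assert path.endswith(os.path.join("sql", "core", "src", "test", "resources"))
```

Centralizing the lookup means a change to how the root is found (e.g. a new environment variable) touches one function rather than every suite.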
[jira] [Resolved] (SPARK-35201) Format empty grouping set exception in CUBE/ROLLUP
[ https://issues.apache.org/jira/browse/SPARK-35201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-35201. -- Fix Version/s: 3.2.0 Assignee: angerszhu Resolution: Fixed Resolved by https://github.com/apache/spark/pull/32307 > Format empty grouping set exception in CUBE/ROLLUP > -- > > Key: SPARK-35201 > URL: https://issues.apache.org/jira/browse/SPARK-35201 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Format empty grouping set exception in CUBE/ROLLUP -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35123) read partitioned parquet: my_col=NOW replaced by on read()
[ https://issues.apache.org/jira/browse/SPARK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35123. -- Resolution: Duplicate > read partitioned parquet: my_col=NOW replaced by on read() > - > > Key: SPARK-35123 > URL: https://issues.apache.org/jira/browse/SPARK-35123 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Killian >Priority: Major > > When reading a Parquet file partitioned on a column containing the value > "NOW", the value is interpreted as now() and replaced by the current time at > the moment the read() function is executed > {code:java} > // steps to reproduce > df = spark.createDataFrame(data=[("NOW",1), ("TEST", 2)], schema = ["col1", > "id"]) > df.write.partitionBy("col1").parquet("test/test.parquet") > >>> /home/test/test.parquet/col1=NOW > df_loaded = spark.read.option( > "basePath", > "test/test.parquet", > ).parquet("test/test.parquet/col1=*") > >>> > +---+--+ > |id |col1 | > +---+--+ > |2 |TEST | > |1 |2021-04-18 14:36:46.532273| > +---+--+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
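The hazard in the report can be sketched with a toy partition-value parser: a lenient cast that honors special datetime literals will turn the directory name `col1=NOW` into a timestamp. This is a hedged illustration of the mechanism only; the `SPECIAL_LITERALS` set, the function, and the boolean switch are assumptions, not Spark's actual inference code.

```python
# Toy model of partition-value inference with and without special
# datetime literals enabled.
from datetime import datetime

SPECIAL_LITERALS = {"now", "today", "epoch"}  # illustrative subset

def infer_partition_value(raw, allow_special_literals):
    """Return a timestamp for special literals when the lenient mode is on,
    otherwise keep the raw string."""
    if raw.lower() in SPECIAL_LITERALS:
        if allow_special_literals:
            return datetime.now()  # the string silently becomes a timestamp
        return raw                 # strict mode: keep the original value
    return raw

assert isinstance(infer_partition_value("NOW", True), datetime)  # the reported bug
assert infer_partition_value("NOW", False) == "NOW"              # the intended fix
assert infer_partition_value("TEST", True) == "TEST"             # ordinary values unaffected
```

The fix referenced in the linked PR is, in effect, the strict branch: disallow special datetime literals when casting strings during partition inference.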
[jira] [Commented] (SPARK-35123) read partitioned parquet: my_col=NOW replaced by on read()
[ https://issues.apache.org/jira/browse/SPARK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330391#comment-17330391 ] Max Gekk commented on SPARK-35123: -- The PR [https://github.com/apache/spark/pull/31549] should fix this particular case. [~salticidae] Can you reproduce the issue on the master? > read partitioned parquet: my_col=NOW replaced by on read() > - > > Key: SPARK-35123 > URL: https://issues.apache.org/jira/browse/SPARK-35123 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Killian >Priority: Major > > When reading parquet file partitioned with a column containing the value > "NOW", The value is interpreted as now() and replaced by the current time at > the moment of the read() funct is executed > {code:java} > // step to reproduce > df = spark.createDataFrame(data=[("NOW",1), ("TEST", 2)], schema = ["col1", > "id"]) > df.write.partitionBy("col1").parquet("test/test.parquet") > >>> /home/test/test.parquet/col1=NOW > df_loaded = spark.read.option( > "basePath", > "test/test.parquet", > ).parquet("test/test.parquet/col1=*") > >>> > +---+--+ > |id |col1 | > +---+--+ > |2 |TEST | > |1 |2021-04-18 14:36:46.532273| > +---+--+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35133) EXPLAIN CODEGEN does not work with AQE
[ https://issues.apache.org/jira/browse/SPARK-35133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330389#comment-17330389 ] Hyukjin Kwon commented on SPARK-35133: -- cc [~maryannxue] and [~Ngone51] FYI > EXPLAIN CODEGEN does not work with AQE > -- > > Key: SPARK-35133 > URL: https://issues.apache.org/jira/browse/SPARK-35133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Priority: Major > > `EXPLAIN CODEGEN ` (and Dataset.explain("codegen")) prints out the > generated code for each stage of plan. The current implementation is to match > `WholeStageCodegenExec` operator in query plan and prints out generated code > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118] > ). This does not work with AQE as we wrap the whole query plan inside > `AdaptiveSparkPlanExec` and do not run whole stage code-gen physical plan > rule eagerly (`CollapseCodegenStages`). This introduces unexpected behavior > change for EXPLAIN query (and Dataset.explain), as we enable AQE by default > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35123) read partitioned parquet: my_col=NOW replaced by on read()
[ https://issues.apache.org/jira/browse/SPARK-35123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330390#comment-17330390 ] Hyukjin Kwon commented on SPARK-35123: -- [~maxgekk] FYI > read partitioned parquet: my_col=NOW replaced by on read() > - > > Key: SPARK-35123 > URL: https://issues.apache.org/jira/browse/SPARK-35123 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Killian >Priority: Major > > When reading parquet file partitioned with a column containing the value > "NOW", The value is interpreted as now() and replaced by the current time at > the moment of the read() funct is executed > {code:java} > // step to reproduce > df = spark.createDataFrame(data=[("NOW",1), ("TEST", 2)], schema = ["col1", > "id"]) > df.write.partitionBy("col1").parquet("test/test.parquet") > >>> /home/test/test.parquet/col1=NOW > df_loaded = spark.read.option( > "basePath", > "test/test.parquet", > ).parquet("test/test.parquet/col1=*") > >>> > +---+--+ > |id |col1 | > +---+--+ > |2 |TEST | > |1 |2021-04-18 14:36:46.532273| > +---+--+{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35149) I am facing this issue regularly, how to fix this issue.
[ https://issues.apache.org/jira/browse/SPARK-35149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330388#comment-17330388 ] Hyukjin Kwon commented on SPARK-35149: -- For questions, please use Spark mailing list. > I am facing this issue regularly, how to fix this issue. > > > Key: SPARK-35149 > URL: https://issues.apache.org/jira/browse/SPARK-35149 > Project: Spark > Issue Type: Question > Components: Spark Submit >Affects Versions: 2.2.2 >Reporter: Eppa Rakesh >Priority: Critical > > 21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 > java.io.EOFException: Unexpected EOF while trying to read response from > server > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) > at > org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086) > 21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline > [DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK], > > DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK], > > DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]: > datanode > 0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK]) > is bad. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35149) I am facing this issue regularly, how to fix this issue.
[ https://issues.apache.org/jira/browse/SPARK-35149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35149. -- Resolution: Invalid > I am facing this issue regularly, how to fix this issue. > > > Key: SPARK-35149 > URL: https://issues.apache.org/jira/browse/SPARK-35149 > Project: Spark > Issue Type: Question > Components: Spark Submit >Affects Versions: 2.2.2 >Reporter: Eppa Rakesh >Priority: Critical > > 21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 > java.io.EOFException: Unexpected EOF while trying to read response from > server > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) > at > org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086) > 21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline > [DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK], > > DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK], > > DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]: > datanode > 0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK]) > is bad. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop
[ https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330387#comment-17330387 ] Hyukjin Kwon commented on SPARK-35154: -- {{RpcEndpoint}} isn't an API. > Rpc env not shutdown when shutdown method call by endpoint onStop > - > > Key: SPARK-35154 > URL: https://issues.apache.org/jira/browse/SPARK-35154 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark-3.x >Reporter: LIU >Priority: Minor > > When I run this code, the RPC thread hangs and does not close gracefully. > I think that when the RPC thread calls shutdown from the onStop method, it puts > MessageLoop.PoisonPill on the queue to make the threads in the RPC pool return > and stop. In Spark 3.x, this makes the other threads return and stop, but the > current thread, which called onStop, awaits termination of the very pool it > belongs to. As a result, the current thread never stops and the program hangs. > I'm not sure whether this needs to be improved or not. > > {code:java} > // code placeholder{code} > test("Rpc env not shutdown when shutdown method call by endpoint onStop") { > val rpcEndpoint = new RpcEndpoint { > override val rpcEnv: RpcEnv = env > override def onStop(): Unit = { > env.shutdown() > env.awaitTermination() > } > override def receiveAndReply(context: RpcCallContext): > PartialFunction[Any, Unit] = { > case m => context.reply(m) > } > } > env.setupEndpoint("test", rpcEndpoint) > rpcEndpoint.stop() > env.awaitTermination() > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
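The self-wait described in the report has a close analogue in any thread pool: a task that shuts the pool down and then waits for the pool to terminate is waiting on itself. The sketch below uses Python's `ThreadPoolExecutor` purely as an analogy for Spark's RPC message loop, not as its implementation; the safe pattern is to trigger shutdown without waiting from inside the pool and to await termination from a thread outside it.

```python
# Thread-pool analogy: shutting down from inside a worker must not wait.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def on_stop():
    # Spark analogue: calling awaitTermination() here (i.e. shutdown with
    # wait=True) would make this worker wait for its own pool to drain,
    # which includes itself. wait=False only signals shutdown and returns.
    pool.shutdown(wait=False)
    return "stopped"

future = pool.submit(on_stop)
# The outside thread is the right place to wait for completion.
assert future.result(timeout=5) == "stopped"
```

This mirrors the issue's conclusion: `env.shutdown()` plus `env.awaitTermination()` inside `onStop()` runs on a pool thread, so the await can never be satisfied; awaiting must happen from outside the pool.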
[jira] [Resolved] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop
[ https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35154. -- Resolution: Invalid > Rpc env not shutdown when shutdown method call by endpoint onStop > - > > Key: SPARK-35154 > URL: https://issues.apache.org/jira/browse/SPARK-35154 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark-3.x >Reporter: LIU >Priority: Minor > > when i use this code to work, Rpc thread hangs up and not close gracefully. > i think when rpc thread called shutdown on OnStop method, it will try to put > MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, > it will make others thread return & stop but current thread which call OnStop > method to await current pool to stop. it makes current thread not stop, and > pending program. > I'm not sure that needs to be improved or not? > > {code:java} > //代码占位符{code} > test("Rpc env not shutdown when shutdown method call by endpoint onStop") { > val rpcEndpoint = new RpcEndpoint { > override val rpcEnv: RpcEnv = env > override def onStop(): Unit = { > env.shutdown() > env.awaitTermination() > } > override def receiveAndReply(context: RpcCallContext): > PartialFunction[Any, Unit] = { > case m => context.reply(m) > } > } > env.setupEndpoint("test", rpcEndpoint) > rpcEndpoint.stop() > env.awaitTermination() > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit
[ https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330386#comment-17330386 ] Hyukjin Kwon commented on SPARK-35156: -- [~viirya] no big deal but: {quote} Master, branch-3.1 are okay. Branch-3.1 is affected {quote} Do you mean Branch-3.0 is affected alone? > Thrown java.lang.NoClassDefFoundError when using spark-submit > - > > Key: SPARK-35156 > URL: https://issues.apache.org/jira/browse/SPARK-35156 > Project: Spark > Issue Type: Bug > Components: Build, Kubernetes >Affects Versions: 3.1.1 >Reporter: L. C. Hsieh >Priority: Major > > Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S > cluster. > Master, branch-3.1 are okay. Branch-3.1 is affected. > How to reproduce: > 1. Using sbt to build Spark with Kubernetes (-Pkubernetes) > 2. Run spark-submit to submit to K8S cluster > 3. Get the following exception > {code:java} > 21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > > Exception in thread "main" java.lang.NoClassDefFoundError: > com/fasterxml/jackson/dataformat/yaml/YAMLFactory > > at > io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46) > > at > io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564) > > at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264) > > > at io.fabric8.kubernetes.client.Config.(Config.java:230) > > > at io.fabric8.kubernetes.client.Config.(Config.java:224) > > > at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259) > > at > org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207) > > at 
org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621) > > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207) > > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179) > > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > > > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.fasterxml.jackson.dataformat.yaml.YAMLFactory > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 19 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: i
[jira] [Commented] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token
[ https://issues.apache.org/jira/browse/SPARK-35160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330385#comment-17330385 ] Hyukjin Kwon commented on SPARK-35160: -- [~mauzhang] if this is a question, it would be better to ask it on the mailing list. If you file an issue, it would be very helpful to state what change you would propose. > Spark application submitted despite failing to get Hive delegation token > > > Key: SPARK-35160 > URL: https://issues.apache.org/jira/browse/SPARK-35160 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.1.1 >Reporter: Manu Zhang >Priority: Major > > Currently, when running on YARN and failing to get Hive delegation token, a > Spark SQL application will still be submitted. Eventually, the application > will fail when connecting to Hive metastore without a valid delegation token. > Is there any reason for this design? > cc [~jerryshao] who originally implemented this in > https://issues.apache.org/jira/browse/SPARK-14743 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35176) Raise TypeError in inappropriate type case rather than ValueError
[ https://issues.apache.org/jira/browse/SPARK-35176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330383#comment-17330383 ] Hyukjin Kwon commented on SPARK-35176: -- [~yikunkero] Please go ahead with a PR > Raise TypeError in inappropriate type case rather than ValueError > -- > > Key: SPARK-35176 > URL: https://issues.apache.org/jira/browse/SPARK-35176 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Minor > > There are many places that wrongly raise ValueError. > When an operation or function is applied to an object of inappropriate type, > we should use TypeError rather than ValueError, > such as: > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1137] > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1228] > > We should make these corrections at an appropriate time; note that doing so > will break code that catches the original ValueError. > > [1] https://docs.python.org/3/library/exceptions.html#TypeError -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
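To illustrate the distinction SPARK-35176 asks for, here is a minimal, hypothetical validator (not actual PySpark code; the function name is an illustration) that follows the proposed convention: a wrong *type* raises TypeError, while a bad *value* of the correct type raises ValueError.

```python
def validate_column_name(col):
    """Hypothetical validator illustrating the proposed convention:
    wrong type -> TypeError, bad value of the right type -> ValueError."""
    if not isinstance(col, str):
        # object of inappropriate type: TypeError, per the Python docs
        raise TypeError(f"col must be a str, got {type(col).__name__}")
    if not col:
        # right type, inappropriate value: ValueError
        raise ValueError("col must be a non-empty column name")
    return col
```

Callers that currently catch ValueError around type mistakes would break under this change, which is exactly the compatibility concern the issue raises.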
[jira] [Resolved] (SPARK-35184) Filtering a dataframe after groupBy and user-define-aggregate-function in Pyspark will cause java.lang.UnsupportedOperationException
[ https://issues.apache.org/jira/browse/SPARK-35184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35184. -- Resolution: Cannot Reproduce > Filtering a dataframe after groupBy and user-define-aggregate-function in > Pyspark will cause java.lang.UnsupportedOperationException > > > Key: SPARK-35184 > URL: https://issues.apache.org/jira/browse/SPARK-35184 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.4.0 >Reporter: Xiao Jin >Priority: Major > > I found some strange error when I'm coding Pyspark UDAF. After I call groupBy > function and agg function, I want to filter some data from remaining > dataframe, but it seems not work. My sample code is below. > {code:java} > >>> from pyspark.sql.functions import pandas_udf, PandasUDFType, col > >>> df = spark.createDataFrame( > ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], > ... ("id", "v")) > >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG) > ... def mean_udf(v): > ... return v.mean() > >>> df.groupby("id").agg(mean_udf(df['v']).alias("mean")).filter(col("mean") > >>> > 5).show() > {code} > The code above will cause exception printed below > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "/opt/spark/python/pyspark/sql/dataframe.py", line 378, in show > print(self._jdf.showString(n, 20, vertical)) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line > 1257, in __call__ > File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco > return f(*a, **kw) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line > 328, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o3717.showString. 
> : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, > tree: > Exchange hashpartitioning(id#1726L, 200) > +- *(1) Filter (mean_udf(v#1727) > 5.0) >+- Scan ExistingRDD[id#1726L,v#1727] > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391) > at > org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.AggregateInPandasExec.doExecute(AggregateInPandasExec.scala:80) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > at > org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339) > at > org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) > at > org.apache.spark.sql.Dataset.org$apache$spark$s
[jira] [Commented] (SPARK-35184) Filtering a dataframe after groupBy and user-define-aggregate-function in Pyspark will cause java.lang.UnsupportedOperationException
[ https://issues.apache.org/jira/browse/SPARK-35184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330381#comment-17330381 ] Hyukjin Kwon commented on SPARK-35184: -- Seems like it works in the latest master branch: {code:java} +---++ | id|mean| +---++ | 2| 6.0| +---++ {code} It would be great if we can identify and see if we can backport. > Filtering a dataframe after groupBy and user-define-aggregate-function in > Pyspark will cause java.lang.UnsupportedOperationException > > > Key: SPARK-35184 > URL: https://issues.apache.org/jira/browse/SPARK-35184 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.4.0 >Reporter: Xiao Jin >Priority: Major > > I found some strange error when I'm coding Pyspark UDAF. After I call groupBy > function and agg function, I want to filter some data from remaining > dataframe, but it seems not work. My sample code is below. > {code:java} > >>> from pyspark.sql.functions import pandas_udf, PandasUDFType, col > >>> df = spark.createDataFrame( > ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], > ... ("id", "v")) > >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG) > ... def mean_udf(v): > ... return v.mean() > >>> df.groupby("id").agg(mean_udf(df['v']).alias("mean")).filter(col("mean") > >>> > 5).show() > {code} > The code above will cause exception printed below > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "/opt/spark/python/pyspark/sql/dataframe.py", line 378, in show > print(self._jdf.showString(n, 20, vertical)) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line > 1257, in __call__ > File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco > return f(*a, **kw) > File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line > 328, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o3717.showString. 
> : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, > tree: > Exchange hashpartitioning(id#1726L, 200) > +- *(1) Filter (mean_udf(v#1727) > 5.0) >+- Scan ExistingRDD[id#1726L,v#1727] > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391) > at > org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.AggregateInPandasExec.doExecute(AggregateInPandasExec.scala:80) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > at > org.apache.spark.sql
[jira] [Assigned] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35169: Assignee: Apache Spark > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35169: Assignee: (was: Apache Spark) > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
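The wrapped result in the SPARK-35169 report can be reproduced outside Spark. The sketch below (plain Python, simulating Java's 32-bit two's-complement arithmetic) shows why Int.MinValue months divided by -1 silently comes back as the same minimal negative interval instead of overflowing:

```python
INT_MIN = -2**31  # java.lang.Integer.MIN_VALUE, the months field of the minimal YEAR TO MONTH interval

def wrap32(n):
    """Wrap an unbounded int into Java's 32-bit two's-complement range."""
    return (n + 2**31) % 2**32 - 2**31

q = wrap32(INT_MIN // -1)        # true quotient is 2**31, one past Int.MaxValue, so it wraps
years, months = divmod(-q, 12)   # decompose the wrapped months value back into years and months
```

Since `q` wraps back to Int.MinValue, the decomposition yields the reported INTERVAL '-178956970-8' YEAR TO MONTH; this is why the divide path must detect the (MinValue, -1) pair and raise an overflow error, as the issue requests.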
[jira] [Resolved] (SPARK-35190) all columns are read even if column pruning applies when spark3.0 read table written by spark2.2
[ https://issues.apache.org/jira/browse/SPARK-35190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35190. -- Resolution: Duplicate > all columns are read even if column pruning applies when spark3.0 read table > written by spark2.2 > > > Key: SPARK-35190 > URL: https://issues.apache.org/jira/browse/SPARK-35190 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark3.0 > set spark.sql.hive.convertMetastoreOrc=true (default value in spark3.0) > set spark.sql.orc.impl=native(default velue in spark3.0) >Reporter: xiaoli >Priority: Major > > Before I address this issue, let me talk about the issue background: The > current spark version we use is 2.2, and we plan to migrate to spark3.0 in > near future. Before migration, we test some query in both spark2.2 and > spark3.0 to check potential issue. The data source table of these query is > orc format written by spark2.2. > > I find that even if column pruning is applied, spark3.0’s native reader will > read all columns. > > Then I do remote debug. In OrcUtils.scala’s requestedColumnIds Method, it > will check whether field name is started with “_col”. In my case, field name > is started with “_col”, like “_col1”, “_col2”. So pruneCols is not done. The > code is below: > > if (orcFieldNames.forall(_.startsWith("_col"))) { > // This is a ORC file written by Hive, no field names in the physical > schema, assume the > // physical schema maps to the data scheme by index. > _assert_(orcFieldNames.length <= dataSchema.length, "The given data schema > " + > s"*$*{dataSchema.catalogString} has less fields than the actual ORC > physical schema, " + > "no idea which columns were dropped, fail to read.") > // for ORC file written by Hive, no field names > // in the physical schema, there is a need to send the > // entire dataSchema instead of required schema. 
> // So pruneCols is not done in this case > Some(requiredSchema.fieldNames.map { name => > val index = dataSchema.fieldIndex(name) > if (index < orcFieldNames.length) { > index > } else { > -1 > } > }, false) > > Although this code comment explains reason, I still do not understand. This > issue only happens in this case: spark3.0 uses native reader to read table > written by spark2.2. > > In other cases, there is no such issue. I do another 2 tests: > Test1: use spark3.0’s hive reader (running with > spark.sql.hive.convertMetastoreOrc=false and spark.sql.orc.impl=hive) to read > the same table, it only reads pruned columns. > Test2: use spark3.0 to write a table, then use spark3.0’s native reader to > read this new table, it only reads pruned columns. > > This issue I mentioned is a block we use native reader in spark3.0. Can > anyone know further reason or provide solutions? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35191) all columns are read even if column pruning applies when spark3.0 read table written by spark2.2
[ https://issues.apache.org/jira/browse/SPARK-35191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35191. -- Resolution: Duplicate > all columns are read even if column pruning applies when spark3.0 read table > written by spark2.2 > > > Key: SPARK-35191 > URL: https://issues.apache.org/jira/browse/SPARK-35191 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.0.0 > Environment: spark3.0 > spark.sql.hive.convertMetastoreOrc=true(default value in spark3.0) > spark.sql.orc.impl=native(default value in spark3.0) >Reporter: xiaoli >Priority: Major > > Before I address this issue, let me talk about the issue background: The > current spark version we use is 2.2, and we plan to migrate to spark3.0 in > near future. Before migration, we test some query in both spark2.2 and > spark3.0 to check potential issue. The data source table of these query is > orc format written by spark2.2. > > I find that even if column pruning is applied, spark3.0’s native reader will > read all columns. > > Then I do remote debug. In OrcUtils.scala’s requestedColumnIds Method, it > will check whether field name is started with “_col”. In my case, field name > is started with “_col”, like “_col1”, “_col2”. So pruneCols is not done. The > code is below: > > if (orcFieldNames.forall(_.startsWith("_col"))) { > // This is a ORC file written by Hive, no field names in the physical > schema, assume the > // physical schema maps to the data scheme by index. > _assert_(orcFieldNames.length <= dataSchema.length, "The given data schema > " + > s"*$*{dataSchema.catalogString} has less fields than the actual ORC > physical schema, " + > "no idea which columns were dropped, fail to read.") > // for ORC file written by Hive, no field names > // in the physical schema, there is a need to send the > // entire dataSchema instead of required schema. 
> // So pruneCols is not done in this case > Some(requiredSchema.fieldNames.map { name => > val index = dataSchema.fieldIndex(name) > if (index < orcFieldNames.length) { > index > } else { > -1 > } > }, false) > > Although this code comment explains reason, I still do not understand. This > issue only happens in this case: spark3.0 uses native reader to read table > written by spark2.2. > > In other cases, there is no such issue. I do another 2 tests: > Test1: use spark3.0’s hive reader (running with > spark.sql.hive.convertMetastoreOrc=false and spark.sql.orc.impl=hive) to read > the same table, it only reads pruned columns. > Test2: use spark3.0 to write a table, then use spark3.0’s native reader to > read this new table, it only reads pruned columns. > > This issue I mentioned is a block we use native reader in spark3.0. Can > anyone know further reason or provide solutions? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
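The OrcUtils.requestedColumnIds branch quoted in the two reports above can be paraphrased in a small, self-contained sketch (plain Python standing in for the Scala; names and inputs are illustrative). It shows why, when the physical ORC field names are Hive-style _colN, required columns are mapped by position against the entire data schema rather than pruned by name:

```python
def requested_column_ids(orc_field_names, data_schema, required):
    """Sketch of the Hive-written-ORC branch: the physical schema carries no
    real field names, so each required column is resolved to its positional
    index in the full data schema (index-based mapping, no name-based pruning)."""
    if all(name.startswith("_col") for name in orc_field_names):
        assert len(orc_field_names) <= len(data_schema), "schema mismatch"
        return [data_schema.index(n) if data_schema.index(n) < len(orc_field_names) else -1
                for n in required]
    return None  # normal name-based pruning path, elided here

# A table whose physical names are _colN, as written via Hive conventions
ids = requested_column_ids(["_col0", "_col1", "_col2"], ["id", "name", "age"], ["age"])
```

Even though only "age" is required, it still has to be resolved against the whole data schema by index, which is consistent with the reporter's observation that pruning does not take effect for such files.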
[jira] [Commented] (SPARK-35196) DataFrameWriter.text support zstd compression
[ https://issues.apache.org/jira/browse/SPARK-35196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330379#comment-17330379 ] Hyukjin Kwon commented on SPARK-35196: -- It is supported: you can specify {{org.apache.hadoop.io.compress.ZStandardCodec}} as the compression option. However, I agree with adding a short name for ease of use. Are you interested in adding an alias? [~dongjoon] FYI > DataFrameWriter.text support zstd compression > - > > Key: SPARK-35196 > URL: https://issues.apache.org/jira/browse/SPARK-35196 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Leonard Lausen >Priority: Major > > [http://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.text.html] > specifies that only the following compression codecs are supported: `none, > bzip2, gzip, lz4, snappy and deflate` > However, the RDD API supports compression with zstd if users specify the > 'org.apache.hadoop.io.compress.ZStandardCodec' compressor in the > saveAsTextFile method. > Please also expose zstd in the DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
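As the comment notes, the codec can already be selected by its fully-qualified class name; the request is only for a "zstd" short name. A hedged PySpark fragment (assumes an existing DataFrame `df`, a writable output path, and Hadoop's zstd codec on the classpath; not runnable standalone):

```python
# Works today via the fully-qualified codec class; a "zstd" alias is the proposal
df.write.option(
    "compression",
    "org.apache.hadoop.io.compress.ZStandardCodec",
).text("/tmp/out-zstd")
```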
[jira] [Commented] (SPARK-35199) Tasks are failing with zstd default of spark.shuffle.mapStatus.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-35199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330377#comment-17330377 ] Hyukjin Kwon commented on SPARK-35199: -- cc [~dongjoon] FYI > Tasks are failing with zstd default of > spark.shuffle.mapStatus.compression.codec > > > Key: SPARK-35199 > URL: https://issues.apache.org/jira/browse/SPARK-35199 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 3.0.1 >Reporter: Leonard Lausen >Priority: Major > > In Spark 3.0.1, tasks fail with the default value of > {{spark.shuffle.mapStatus.compression.codec=zstd}}, but work without problem > when changing the value to {{spark.shuffle.mapStatus.compression.codec=lz4}}. > Exemplar backtrace: > > {code:java} > java.io.IOException: Decompression error: Version not supported at > com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:164) > at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:120) at > java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at > java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at > java.io.BufferedInputStream.read(BufferedInputStream.java:345) at > java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2781) > at > java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2797) > at > java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3274) > at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934) at > java.io.ObjectInputStream.(ObjectInputStream.java:396) at > org.apache.spark.MapOutputTracker$.deserializeObject$1(MapOutputTracker.scala:954) > at > org.apache.spark.MapOutputTracker$.deserializeMapStatuses(MapOutputTracker.scala:964) > at > org.apache.spark.MapOutputTrackerWorker.$anonfun$getStatuses$2(MapOutputTracker.scala:856) > at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64) at > 
org.apache.spark.MapOutputTrackerWorker.getStatuses(MapOutputTracker.scala:851) > at > org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:808) > at > org.apache.spark.shuffle.sort.SortShuffleManager.getReader(SortShuffleManager.scala:128) > at > org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:185) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at > org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at > org.apache.spark.scheduler.Task.run(Task.scala:127) at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > {{}} > Exemplar code to reproduce the issue > {code:java} > import pyspark.sql.functions as F > df = 
spark.read.text("s3://my-bucket-with-300GB-compressed-text-files") > df_rand = df.orderBy(F.rand(1)) > df_rand.write.text('s3://shuffled-output''){code} > See > [https://stackoverflow.com/questions/64876463/spark-3-0-1-tasks-are-failing-when-using-zstd-compression-codec] > for another report of this issue and workaround. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
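The workaround described in the report and the linked Stack Overflow thread is to switch the MapStatus codec back to lz4 when building the session. A hedged configuration sketch (requires a real Spark 3.0.1 deployment; not runnable standalone):

```python
from pyspark.sql import SparkSession

# spark.shuffle.mapStatus.compression.codec defaults to zstd in 3.0.1;
# setting it to lz4 avoids the reported decompression failure
spark = (
    SparkSession.builder
    .config("spark.shuffle.mapStatus.compression.codec", "lz4")
    .getOrCreate()
)
```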
[jira] [Commented] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330375#comment-17330375 ] Apache Spark commented on SPARK-35169: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32314 > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330374#comment-17330374 ] Apache Spark commented on SPARK-35169: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32314 > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below portrays the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-+ > |(i / -1) | > +-+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---+ > |(i / -1) | > +---+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35205) Simplify org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-35205: Summary: Simplify org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap. (was: Simplify operationType.getOperationType by using a hashMap.) > Simplify org.apache.hive.service.cli.OperationType.getOperationType by using > a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method in > `org.apache.hive.service.cli.OperationType`. > Introduce a *HashMap* to cache the existing enumeration values, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` is called in > OperationHandle's constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's worth improving the execution > efficiency of `OperationType.getOperationType` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35205) Refactor org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-35205: Summary: Refactor org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap. (was: Simplify org.apache.hive.service.cli.OperationType.getOperationType by using a hashMap.) > Refactor org.apache.hive.service.cli.OperationType.getOperationType by using > a hashMap. > > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method in > `org.apache.hive.service.cli.OperationType`. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35205: Assignee: (was: Apache Spark) > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-35205: Description: Simplify the *getOperationType* method in `org.apache.hive.service.cli.OperationType`. Introduce a *HashMap* to cache the existing enumeration types, so as to avoid a linear search in a for loop. `*OperationType.getOperationType*` can be called in the OperationHandle constructor: {code:java} public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion protocol) { super(tOperationHandle.getOperationId()); this.opType = OperationType.getOperationType(tOperationHandle.getOperationType()); this.hasResultSet = tOperationHandle.isHasResultSet(); this.protocol = protocol; } {code} `*OperationHandle*` is widely used, so it's better to improve the execution efficiency of `OperationType.getOperationType`. was: Simplify the *getOperationType* method. Introduce a *HashMap* to cache the existing enumeration types, so as to avoid a linear search in a for loop. `*OperationType.getOperationType*` can be called in the OperationHandle constructor: {code:java} public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion protocol) { super(tOperationHandle.getOperationId()); this.opType = OperationType.getOperationType(tOperationHandle.getOperationType()); this.hasResultSet = tOperationHandle.isHasResultSet(); this.protocol = protocol; } {code} `*OperationHandle*` is widely used, so it's better to improve the execution efficiency of `OperationType.getOperationType`. > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method in > `org.apache.hive.service.cli.OperationType`. 
> Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330371#comment-17330371 ] Apache Spark commented on SPARK-35205: -- User 'kyoty' has created a pull request for this issue: https://github.com/apache/spark/pull/32313 > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Priority: Minor > > Simplify the *getOperationType* method. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
[ https://issues.apache.org/jira/browse/SPARK-35205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35205: Assignee: Apache Spark > Simplify operationType.getOperationType by using a hashMap. > --- > > Key: SPARK-35205 > URL: https://issues.apache.org/jira/browse/SPARK-35205 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: akiyamaneko >Assignee: Apache Spark >Priority: Minor > > Simplify the *getOperationType* method. > Introduce a *HashMap* to cache the existing enumeration types, so as to avoid > a linear search in a for loop. > `*OperationType.getOperationType*` can be called in the > OperationHandle constructor: > > {code:java} > > public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion > protocol) { > super(tOperationHandle.getOperationId()); > this.opType = > OperationType.getOperationType(tOperationHandle.getOperationType()); > this.hasResultSet = tOperationHandle.isHasResultSet(); > this.protocol = protocol; > } > > {code} > `*OperationHandle*` is widely used, so it's better to improve the execution > efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35205) Simplify operationType.getOperationType by using a hashMap.
akiyamaneko created SPARK-35205: --- Summary: Simplify operationType.getOperationType by using a hashMap. Key: SPARK-35205 URL: https://issues.apache.org/jira/browse/SPARK-35205 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: akiyamaneko Simplify the *getOperationType* method. Introduce a *HashMap* to cache the existing enumeration types, so as to avoid a linear search in a for loop. `*OperationType.getOperationType*` can be called in the OperationHandle constructor: {code:java} public OperationHandle(TOperationHandle tOperationHandle, TProtocolVersion protocol) { super(tOperationHandle.getOperationId()); this.opType = OperationType.getOperationType(tOperationHandle.getOperationType()); this.hasResultSet = tOperationHandle.isHasResultSet(); this.protocol = protocol; } {code} `*OperationHandle*` is widely used, so it's better to improve the execution efficiency of `OperationType.getOperationType`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
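The change proposed above is the standard enum lookup-table idiom. A minimal, self-contained sketch of that idea in Java is below; the enum constants and numeric codes here are illustrative stand-ins, not the actual Hive Thrift definitions:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for org.apache.hive.service.cli.OperationType.
// The real enum wraps Thrift's TOperationType values; plain ints are used here
// so the example is self-contained.
enum OperationType {
    UNKNOWN_OPERATION(0),
    EXECUTE_STATEMENT(1),
    GET_TYPE_INFO(2),
    GET_CATALOGS(3);

    private final int tOperationType;

    // Built once when the enum class is initialized, giving O(1) lookups
    // instead of scanning values() in a for loop on every call.
    private static final Map<Integer, OperationType> BY_THRIFT_TYPE = new HashMap<>();
    static {
        for (OperationType type : values()) {
            BY_THRIFT_TYPE.put(type.tOperationType, type);
        }
    }

    OperationType(int tOperationType) {
        this.tOperationType = tOperationType;
    }

    public static OperationType getOperationType(int tOperationType) {
        // Fallback for unrecognized codes; the real fallback behavior may differ.
        return BY_THRIFT_TYPE.getOrDefault(tOperationType, UNKNOWN_OPERATION);
    }
}

public class OperationTypeLookupDemo {
    public static void main(String[] args) {
        System.out.println(OperationType.getOperationType(2));  // GET_TYPE_INFO
        System.out.println(OperationType.getOperationType(99)); // UNKNOWN_OPERATION
    }
}
```

Since every OperationHandle construction performs one lookup, moving the cost to a one-time static initializer is a reasonable trade-off.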
[jira] [Assigned] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35204: Assignee: Apache Spark > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35204: Assignee: (was: Apache Spark) > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
[ https://issues.apache.org/jira/browse/SPARK-35204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330332#comment-17330332 ] Apache Spark commented on SPARK-35204: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/32312 > CatalystTypeConverters of date/timestamp should accept both the old and new > Java time classes > - > > Key: SPARK-35204 > URL: https://issues.apache.org/jira/browse/SPARK-35204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35176) Raise TypeError in inappropriate type case rather than ValueError
[ https://issues.apache.org/jira/browse/SPARK-35176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329971#comment-17329971 ] Yikun Jiang edited comment on SPARK-35176 at 4/23/21, 9:14 AM: --- I wrote up a POC in [https://github.com/Yikun/annotation-type-checker/pull/4] to add a simple way to do input validation (a runtime type checker). was (Author: yikunkero): I wrote up a POC in [https://github.com/Yikun/annotation-type-checker/pull/4] to add a simple way to do runtime type checking. > Raise TypeError in inappropriate type case rather than ValueError > -- > > Key: SPARK-35176 > URL: https://issues.apache.org/jira/browse/SPARK-35176 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Yikun Jiang >Priority: Minor > > There are many incorrect usages of ValueError. > When an operation or function is applied to an object of inappropriate type, > we should use TypeError rather than ValueError. > such as: > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1137] > [https://github.com/apache/spark/blob/355c39939d9e4c87ffc9538eb822a41cb2ff93fb/python/pyspark/sql/dataframe.py#L1228] > > We should make these corrections at the right time; note that doing so will > break existing code that catches the original ValueError. > > [1] https://docs.python.org/3/library/exceptions.html#TypeError -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35169) Wrong result of min ANSI interval division by -1
[ https://issues.apache.org/jira/browse/SPARK-35169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330241#comment-17330241 ] angerszhu commented on SPARK-35169: --- It's Guava IntMath.divide's bug. > Wrong result of min ANSI interval division by -1 > > > Key: SPARK-35169 > URL: https://issues.apache.org/jira/browse/SPARK-35169 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > The code below demonstrates the issue: > {code:scala} > scala> Seq(java.time.Period.ofMonths(Int.MinValue)).toDF("i").select($"i" / > -1).show(false) > +-------------------------------------+ > |(i / -1)                             | > +-------------------------------------+ > |INTERVAL '-178956970-8' YEAR TO MONTH| > +-------------------------------------+ > scala> Seq(java.time.Duration.of(Long.MinValue, > java.time.temporal.ChronoUnit.MICROS)).toDF("i").select($"i" / -1).show(false) > +---------------------------------------------------+ > |(i / -1)                                           | > +---------------------------------------------------+ > |INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND| > +---------------------------------------------------+ > {code} > The result cannot be a negative interval. Spark must throw an overflow > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
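The arithmetic behind this ticket is the classic two's-complement edge case: Int.MinValue and Long.MinValue have no positive counterparts, so dividing by -1 silently wraps back to the same negative value instead of raising an error. A standalone Java demonstration (plain JVM semantics, not Spark's interval code; Math.negateExact is shown only as one JDK facility that detects the wraparound):

```java
public class IntervalOverflowDemo {
    public static void main(String[] args) {
        // Two's-complement int range is -2147483648 .. 2147483647, so negating
        // MIN_VALUE (equivalently, dividing it by -1) wraps back to MIN_VALUE.
        System.out.println(Integer.MIN_VALUE / -1); // -2147483648, not +2147483648
        System.out.println(Long.MIN_VALUE / -1L);   // -9223372036854775808

        // The Math.*Exact family detects the wraparound and throws instead of
        // returning a wrapped value:
        try {
            Math.negateExact(Long.MIN_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

This is why the `-178956970-8` YEAR TO MONTH result above comes out negative: the month count Int.MinValue divided by -1 wraps to Int.MinValue, and the ticket asks Spark to detect that case and throw an overflow exception instead.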
[jira] [Created] (SPARK-35204) CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes
Wenchen Fan created SPARK-35204: --- Summary: CatalystTypeConverters of date/timestamp should accept both the old and new Java time classes Key: SPARK-35204 URL: https://issues.apache.org/jira/browse/SPARK-35204 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330232#comment-17330232 ] Apache Spark commented on SPARK-35088: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/32311 > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35088: Assignee: Apache Spark > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35088: Assignee: (was: Apache Spark) > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35088) Accept ANSI intervals by the Sequence expression
[ https://issues.apache.org/jira/browse/SPARK-35088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330230#comment-17330230 ] Apache Spark commented on SPARK-35088: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/32311 > Accept ANSI intervals by the Sequence expression > > > Key: SPARK-35088 > URL: https://issues.apache.org/jira/browse/SPARK-35088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Currently, the expression accepts only CalendarIntervalType as the step > expression. It should support ANSI intervals as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35078) Migrate to transformWithPruning or resolveWithPruning for expression rules
[ https://issues.apache.org/jira/browse/SPARK-35078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35078: -- Assignee: Yingyi Bu > Migrate to transformWithPruning or resolveWithPruning for expression rules > -- > > Key: SPARK-35078 > URL: https://issues.apache.org/jira/browse/SPARK-35078 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 3.1.0 >Reporter: Yingyi Bu >Assignee: Yingyi Bu >Priority: Major > > E.g., rules in org/apache/spark/sql/catalyst/optimizer/expressions.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org