[jira] [Commented] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data
[ https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388797#comment-17388797 ] Arghya Saha commented on SPARK-18105: - [~dongjoon] Can we please address this before the next release? This is a blocker for many. For example, we have migrated our entire ETL, thousands of jobs per day on a 2 PB data warehouse, from EMR to open-source Spark on K8s, and it runs stably except for this issue. I can provide any further info required for debugging. The hard part is that the error is not reproducible: out of thousands of Spark jobs only a few fail, and it is a different few on different days.

> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Davies Liu
> Priority: Major
>
> When lz4 is used to compress the shuffle files, decompression may fail with "Stream is corrupted":
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
> at org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
> at org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
> at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
> at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
> at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
> at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
> at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89
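Since the thread repeatedly asks for a workaround, here is a minimal sketch of one option sometimes tried while the root cause stays open: moving shuffle compression off lz4 entirely. The `spark.io.compression.codec` setting is standard Spark; picking `zstd` (or `snappy`) as the replacement is an assumption, not a fix confirmed in this thread.

{code:python}
# Minimal sketch, assuming a non-lz4 codec sidesteps the failing path.
# spark.io.compression.codec governs shuffle outputs, spills, and
# broadcast blocks; supported values include lz4 (default), lzf, snappy, zstd.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lz4-stream-corrupted-workaround")    # hypothetical app name
    .config("spark.io.compression.codec", "zstd")  # illustrative choice
    .getOrCreate()
)
{code}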
[jira] [Comment Edited] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data
[ https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388794#comment-17388794 ] Arghya Saha edited comment on SPARK-18105 at 7/28/21, 1:55 PM: --- I am also facing the same error and have raised a duplicate [JIRA|https://issues.apache.org/jira/browse/SPARK-36196]; I did not notice this issue before. Do we have any further info or a workaround? I am running Spark 3.1.1.

was (Author: arghya18): I am also facing the same error and have raised a duplicate JIRA; I did not notice this issue before. Do we have any further info or a workaround? I am running Spark 3.1.1.
[jira] [Commented] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data
[ https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388794#comment-17388794 ] Arghya Saha commented on SPARK-18105: - I am also facing the same error and have raised a duplicate JIRA; I did not notice this issue before. Do we have any further info or a workaround? I am running Spark 3.1.1.
[jira] [Updated] (SPARK-36196) Spark FetchFailedException Stream is corrupted Error
[ https://issues.apache.org/jira/browse/SPARK-36196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arghya Saha updated SPARK-36196: Affects Version/s: (was: 3.1.2)
[jira] [Updated] (SPARK-36196) Spark FetchFailedException Stream is corrupted Error
[ https://issues.apache.org/jira/browse/SPARK-36196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arghya Saha updated SPARK-36196: Component/s: PySpark, Kubernetes
[jira] [Created] (SPARK-36196) Spark FetchFailedException Stream is corrupted Error
Arghya Saha created SPARK-36196:
---

Summary: Spark FetchFailedException Stream is corrupted Error
Key: SPARK-36196
URL: https://issues.apache.org/jira/browse/SPARK-36196
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.1.2, 3.1.1
Environment: Spark on K8s
Reporter: Arghya Saha

I am running Spark on K8S. Thousands of jobs run every day, but a few fail each day (not the same jobs every time) with the exception below; they succeed on retry. I have read about this error in multiple JIRAs and saw it was resolved in Spark 3.0.0, but I am still getting it on a higher version.

{code:java}
org.apache.spark.shuffle.FetchFailedException: Stream is corrupted
org.apache.spark.shuffle.FetchFailedException: Stream is corrupted
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:770)
at org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:845)
at java.base/java.io.BufferedInputStream.fill(Unknown Source)
at java.base/java.io.BufferedInputStream.read1(Unknown Source)
at java.base/java.io.BufferedInputStream.read(Unknown Source)
at java.base/java.io.DataInputStream.read(Unknown Source)
at org.sparkproject.guava.io.ByteStreams.read(ByteStreams.java:899)
at org.sparkproject.guava.io.ByteStreams.readFully(ByteStreams.java:733)
at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:127)
at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.sort_addToSorter_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedBufferedToRowWithNullFreeJoinKey(SortMergeJoinExec.scala:817)
at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.<init>(SortMergeJoinExec.scala:687)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.$anonfun$doExecute$1(SortMergeJoinExec.scala:197)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Stream is corrupted
at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:250)
at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
at org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:841)
... 38 more
Caused by: net.jpountz.lz4.LZ4Exception: Error decoding offset 8785 of input buffer
at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:39)
at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:245)
... 40 more
{code}
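Given that the failed tasks succeed on retry, a hedged sketch of settings sometimes used against transient shuffle corruption: both configs below exist in Spark 3.x (`spark.shuffle.detectCorrupt` defaults to true; `spark.shuffle.detectCorrupt.useExtraMemory` defaults to false and makes Spark decompress fetched blocks eagerly so a corrupt block can be refetched instead of failing the task). Whether they mask this particular failure is an assumption, not something confirmed in the thread.

{code:python}
# Minimal sketch, assuming eager corruption detection lets Spark refetch
# a bad shuffle block rather than fail the task. Values are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shuffle-corruption-detection")                       # hypothetical app name
    .config("spark.shuffle.detectCorrupt", "true")                 # default: true
    .config("spark.shuffle.detectCorrupt.useExtraMemory", "true")  # default: false
    .getOrCreate()
)
{code}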
[jira] [Commented] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364371#comment-17364371 ] Arghya Saha commented on SPARK-35700: - Any update on this, please? This is a very common use case.

> spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
> ---
>
> Key: SPARK-35700
> URL: https://issues.apache.org/jira/browse/SPARK-35700
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark, Spark Core
> Affects Versions: 3.1.1
> Environment: Spark 3.1.1 on K8S
> Reporter: Arghya Saha
> Priority: Major
>
> We are not able to upgrade from Spark 2.4.x to Spark 3.1.1 because joins on a varchar column fail, which is unexpected: they work on Spark 3.0.0. We are trying to run on Spark 3.1.1 (MR 3.2) on K8s.
> Below is my use case: the tables are external Hive tables and the files are stored as ORC. We do have varchar columns, and when we try to perform a join on a varchar column we get the exception below.
> As I understand it, Spark 3.1.1 introduced the varchar data type, but it seems not well tested with ORC and lacks backward compatibility. I have even tried the config below, without luck:
> *spark.sql.legacy.charVarcharAsString: "true"*
> We do not get the error when *spark.sql.orc.filterPushdown=false*.
> Below is the code; here col1 is of type varchar(32) in Hive:
> {code:java}
> df = spark.sql("select a.col1, a.col2 from table1 a inner join table2 b on (a.col1 = b.col1 and a.col2 > b.col2)")
> df.write.format("orc").option("compression", "zlib").mode("Append").save("")
> {code}
> Below is the error:
> {code:java}
> Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor 5): java.lang.UnsupportedOperationException: DataType: varchar(32)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
> at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
> at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
> at scala.collection.immutable.List.flatMap(List.scala:355)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
> at scala.Option.foreach(Option.scala:407)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
> at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177)
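A minimal sketch of the workaround the reporter already identified: disabling ORC predicate pushdown for the session avoids the varchar code path in OrcFilters, at the cost of pushdown performance. The query mirrors the reporter's own code; the table and column names come from the issue.

{code:python}
# Sketch of the reporter's workaround: turn off ORC filter pushdown so
# OrcFilters never sees the varchar(32) column. Trades away pushdown speed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-pushdown-workaround").getOrCreate()
spark.conf.set("spark.sql.orc.filterPushdown", "false")  # default is true

df = spark.sql(
    "select a.col1, a.col2 from table1 a "
    "inner join table2 b on (a.col1 = b.col1 and a.col2 > b.col2)"
)
{code}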
[jira] [Updated] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arghya Saha updated SPARK-35700: Description: (see the issue description quoted in full above)
[jira] [Created] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar(12) data type
Arghya Saha created SPARK-35700:
---

Summary: spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar(12) data type
Key: SPARK-35700
URL: https://issues.apache.org/jira/browse/SPARK-35700
Project: Spark
Issue Type: Bug
Components: Kubernetes, PySpark, Spark Core
Affects Versions: 3.1.1
Environment: **Spark 3.1.1 on K8S**
Reporter: Arghya Saha
[jira] [Updated] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arghya Saha updated SPARK-35700: Environment: Spark 3.1.1 on K8S (was: **Spark 3.1.1 on K8S**)
> spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with
> varchar data type
> ---
>
> Key: SPARK-35700
> URL: https://issues.apache.org/jira/browse/SPARK-35700
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark, Spark Core
> Affects Versions: 3.1.1
> Environment: Spark 3.1.1 on K8S
> Reporter: Arghya Saha
> Priority: Major
>
> We are unable to upgrade from Spark 2.4.x to Spark 3.1.1 because joins on
> varchar columns fail unexpectedly; the same queries work on Spark 3.0.0.
> We are running Spark 3.1.1 (MR 3.2) on K8s.
> Below is my use case:
> The tables are external Hive tables whose files are stored as ORC. They
> contain varchar columns, and performing a join on a varchar column raises
> the exception below.
> As I understand it, Spark 3.1.1 introduced the varchar data type, but it
> does not appear to be well tested with ORC and is not backward compatible.
> I have even tried the config below, without luck:
> *spark.sql.legacy.charVarcharAsString: "true"*
> The error does not occur when *spark.sql.orc.filterPushdown=false*.
> Below is the code; here col1 is of type varchar(32) in Hive:
> {code:java}
> df = spark.sql("select a.col1, a.col2 from table1 a inner join table2 b "
>                "on (a.col1 = b.col1 and a.col2 > b.col2)")
> df.write.format("orc").option("compression",
> "zlib").mode("Append").save("")
> {code}
> Below is the error:
>
> {code:java}
> Job aborted due to stage failure: Task 43 in stage 5.0 failed 4 times, most
> recent failure: Lost task 43.3 in stage 5.0 (TID 524) (10.219.36.64 executor
> 5): java.lang.UnsupportedOperationException: DataType: varchar(32)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getPredicateLeafType(OrcFilters.scala:150)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.getType$1(OrcFilters.scala:222)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:266)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:132)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:135)
> at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
> at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
> at scala.collection.immutable.List.flatMap(List.scala:355)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:134)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:73)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
> at scala.Option.foreach(Option.scala:407)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
> at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.columnartorow_nextBatch_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177)
> {code}
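The workaround the report itself points to, disabling ORC predicate pushdown, can be applied per session. A minimal sketch, reusing the table and column names from the report; the output path is a hypothetical placeholder, and the trade-off is that ORC readers scan more data without pushdown:

{code:python}
# Hedged sketch of the reported workaround: with pushdown disabled, Spark
# never converts the varchar predicate into an ORC SearchArgument, so the
# OrcFilters code that throws UnsupportedOperationException is not reached.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.orc.filterPushdown", "false")

df = spark.sql(
    "select a.col1, a.col2 from table1 a "
    "inner join table2 b on (a.col1 = b.col1 and a.col2 > b.col2)"
)
df.write.format("orc").option("compression", "zlib") \
    .mode("Append").save("/tmp/out")  # hypothetical path
{code}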
[jira] [Updated] (SPARK-35700) spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type
[ https://issues.apache.org/jira/browse/SPARK-35700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arghya Saha updated SPARK-35700: Summary: spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar data type (was: spark.sql.orc.filterPushdown not working with Spark 3.1.1 for tables with varchar(12) data type)
[jira] [Comment Edited] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes
[ https://issues.apache.org/jira/browse/SPARK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345731#comment-17345731 ] Arghya Saha edited comment on SPARK-34645 at 5/16/21, 3:08 PM:
---
I am facing the same issue with Spark 3.1.1 + JDK 11. In which version is the fix expected?
was (Author: arghya18): I am facing the same issue Spark 3.1.1 + JDK 11. In which the fix is expected?
> [K8S] Driver pod stuck in Running state after job completes
> ---
>
> Key: SPARK-34645
> URL: https://issues.apache.org/jira/browse/SPARK-34645
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.2
> Environment: Kubernetes:
> {code:java}
> Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2",
> GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean",
> BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc",
> Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5",
> GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean",
> BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc",
> Platform:"linux/amd64"}
> {code}
> Reporter: Andy Grove
> Priority: Major
>
> I am running automated benchmarks in k8s, using spark-submit in cluster
> mode, so the driver runs in a pod.
> When running with Spark 3.0.1 and 3.1.1, everything works as expected and I
> see the Spark context being shut down after the job completes.
> However, when running with Spark 3.0.2 I do not see the context get shut
> down, and the driver pod is stuck in the Running state indefinitely.
> This is the output I see after job completion with 3.0.1 and 3.1.1; it does
> not appear with 3.0.2. With 3.0.2 there is no output at all after the job
> completes.
> {code:java}
> 2021-03-05 20:09:24,576 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 2021-03-05 20:09:24,592 INFO server.AbstractConnector: Stopped Spark@784499d0{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2021-03-05 20:09:24,594 INFO ui.SparkUI: Stopped Spark web UI at http://benchmark-runner-3e8a38780400e0d1-driver-svc.default.svc:4040
> 2021-03-05 20:09:24,599 INFO k8s.KubernetesClusterSchedulerBackend: Shutting down all executors
> 2021-03-05 20:09:24,600 INFO k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
> 2021-03-05 20:09:24,609 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
> 2021-03-05 20:09:24,719 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 2021-03-05 20:09:24,736 INFO memory.MemoryStore: MemoryStore cleared
> 2021-03-05 20:09:24,738 INFO storage.BlockManager: BlockManager stopped
> 2021-03-05 20:09:24,744 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
> 2021-03-05 20:09:24,752 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 2021-03-05 20:09:24,768 INFO spark.SparkContext: Successfully stopped SparkContext
> 2021-03-05 20:09:24,768 INFO util.ShutdownHookManager: Shutdown hook called
> 2021-03-05 20:09:24,769 INFO util.ShutdownHookManager: Deleting directory /var/data/spark-67fa44df-e86c-463a-a149-25d95817ff8e/spark-a5476c14-c103-4108-b733-961400485d8a
> 2021-03-05 20:09:24,772 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9d6261f5-4394-472b-9c9a-e22bde877814
> 2021-03-05 20:09:24,778 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
> 2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
> 2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
> {code}
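Until the regression is pinned down, a defensive pattern affected jobs sometimes adopt is to stop the context explicitly rather than relying on the JVM shutdown hook. A minimal sketch; the workload line and application name are placeholders, and whether this unblocks a stuck 3.0.2 driver pod depends on what is actually keeping the JVM alive:

{code:python}
# Hedged sketch: call stop() in a finally block so driver teardown does not
# depend on the shutdown hook firing at JVM exit.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("benchmark-runner").getOrCreate()
try:
    spark.range(1000).count()  # placeholder for the real benchmark workload
finally:
    spark.stop()  # shuts down executors and the driver-side services
{code}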
[jira] [Commented] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes
[ https://issues.apache.org/jira/browse/SPARK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345731#comment-17345731 ] Arghya Saha commented on SPARK-34645:
---
I am facing the same issue with Spark 3.1.1 + JDK 11. In which version is the fix expected?