[jira] [Commented] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171724#comment-17171724 ] Udit Mehrotra commented on SPARK-29767: --- The issue has been open for quite some time. Can someone please take a look at this? > Core dump happening on executors while doing simple union of Data Frames > > > Key: SPARK-29767 > URL: https://issues.apache.org/jira/browse/SPARK-29767 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core > Affects Versions: 2.4.4 > Environment: AWS EMR 5.27.0, Spark 2.4.4 > Reporter: Udit Mehrotra > Priority: Major > Attachments: coredump.zip, hs_err_pid13885.log, > part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet > > > Running a union operation on two DataFrames through both the Scala Spark shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134.
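For readers trying to follow along: the operation being reported is a plain DataFrame union run from the shell. A minimal sketch of that kind of reproduction, assuming the attached parquet file is one of the inputs (the path, and the use of a filtered copy as the second DataFrame, are placeholders; the real second input is not shown in this thread):
{code:scala}
// Sketch only: run in spark-shell on Spark 2.4.4 (the affected version), where `spark` is predefined.
// The path below is a placeholder for the attached sample file; the second
// DataFrame is just a stand-in, since the real one is not shown in the issue.
val df1 = spark.read.parquet("/tmp/part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet")
val df2 = df1.limit(10)

// Dataset.union resolves columns by position, so both sides need compatible schemas.
val unioned = df1.union(df2)

// The crash described above is reported from the executors while materializing
// the result, i.e. when an action such as this runs:
unioned.show()
{code}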
[jira] [Commented] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062898#comment-17062898 ] Udit Mehrotra commented on SPARK-29767: --- [~hyukjin.kwon] Can you take a look at it? There has been no activity on this for months now. I have provided the executor dump. Please let me know if there is any more information I can provide to help drive this. > Core dump happening on executors while doing simple union of Data Frames > > > Key: SPARK-29767 > URL: https://issues.apache.org/jira/browse/SPARK-29767 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core > Affects Versions: 2.4.4 > Environment: AWS EMR 5.27.0, Spark 2.4.4 > Reporter: Udit Mehrotra > Priority: Major > Attachments: coredump.zip, hs_err_pid13885.log, > part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet > > > Running a union operation on two DataFrames through both the Scala Spark shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134.
[jira] [Comment Edited] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968895#comment-16968895 ] Udit Mehrotra edited comment on SPARK-29767 at 11/7/19 3:41 AM: [~hyukjin.kwon] I was finally able to get the core dump of the crashing executors. Attached is *hs_err_pid13885.log*, the error report written along with the core dump. In it I notice the following trace:
{noformat}
RAX= [error occurred during error reporting (printing register info), id 0xb]Stack: [0x7fbe8850f000,0x7fbe8861], sp=0x7fbe8860dad0, free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xa9ae92]
J 4331 sun.misc.Unsafe.getLong(Ljava/lang/Object;J)J (0 bytes) @ 0x7fbea94ffabe [0x7fbea94ffa00+0xbe]
j org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J+5
j org.apache.spark.unsafe.bitset.BitSetMethods.isSet(Ljava/lang/Object;JI)Z+66
j org.apache.spark.sql.catalyst.expressions.UnsafeRow.isNullAt(I)Z+14
j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.fieldToString_0_2$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder;)V+160
j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_1$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V+76
j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;+25
j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Ljava/lang/Object;)Ljava/lang/Object;+5
j scala.collection.Iterator$$anon$11.next()Ljava/lang/Object;+13
j scala.collection.Iterator$$anon$10.next()Ljava/lang/Object;+22
j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Lscala/collection/Iterator;)Lscala/collection/Iterator;+78
j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Ljava/lang/Object;)Ljava/lang/Object;+5
j org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(Lorg/apache/spark/TaskContext;ILscala/collection/Iterator;)Lscala/collection/Iterator;+8
j org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+13
j org.apache.spark.rdd.MapPartitionsRDD.compute(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+27
j org.apache.spark.rdd.RDD.computeOrReadCheckpoint(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+26
j org.apache.spark.rdd.RDD.iterator(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+33
j org.apache.spark.rdd.MapPartitionsRDD.compute(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+24
j org.apache.spark.rdd.RDD.computeOrReadCheckpoint(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+26
j org.apache.spark.rdd.RDD.iterator(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator;+33
j org.apache.spark.scheduler.ResultTask.runTask(Lorg/apache/spark/TaskContext;)Ljava/lang/Object;+187
j org.apache.spark.scheduler.Task.run(JILorg/apache/spark/metrics/MetricsSystem;)Ljava/lang/Object;+210
j org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply()Ljava/lang/Object;+37
j org.apache.spark.util.Utils$.tryWithSafeFinally(Lscala/Function0;Lscala/Function0;)Ljava/lang/Object;+3
j org.apache.spark.executor.Executor$TaskRunner.run()V+383
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x680c5e]
V [libjvm.so+0x67e024]
V [libjvm.so+0x67e639]
V [libjvm.so+0x6c3d41]
V [libjvm.so+0xa77c22]
V [libjvm.so+0x8c3b12]
C [libpthread.so.0+0x7de5] start_thread+0xc5
{noformat}
Also attached the core dump file *coredump.zip*.
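The interesting part of that trace is the top few Java frames: UnsafeRow.isNullAt delegates to BitSetMethods.isSet, which reads a word of the row's null-tracking bitset via Platform.getLong / sun.misc.Unsafe.getLong, an unchecked raw-memory read. The sketch below (made-up buffer, not taken from the dump) illustrates that access path and why a row whose base object, offset, or size does not match its backing buffer kills the JVM with SIGSEGV, which the container then reports as exit code 134, instead of surfacing as a Java exception:
{code:scala}
// Sketch of the access path in the crashing frames; buffer contents are made up.
// UnsafeRow and the unsafe helpers are Spark-internal classes, available in spark-shell.
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

val numFields = 3
// UnsafeRow layout: one 8-byte null-bitset word per 64 fields, then 8 bytes per field.
// 1 bitset word + 3 fields = 32 bytes.
val buf = new Array[Byte](8 + 8 * numFields)
val row = new UnsafeRow(numFields)
row.pointTo(buf, buf.length)

// isNullAt(i) boils down to Platform.getLong(baseObject, baseOffset + bitsetWordOffset),
// with no bounds checking. With a correctly sized buffer this is fine:
println(row.isNullAt(0)) // false: the zeroed bitset word marks every field as non-null

// If the generated projection seen in the trace hands this code a row whose
// offset or size disagrees with the underlying buffer, the same Unsafe.getLong
// dereferences an invalid address and the JVM aborts (SIGSEGV, exit code 134)
// rather than throwing an exception.
{code}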
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Attachment: coredump.zip > Core dump happening on executors while doing simple union of Data Frames > > > Key: SPARK-29767 > URL: https://issues.apache.org/jira/browse/SPARK-29767 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core > Affects Versions: 2.4.4 > Environment: AWS EMR 5.27.0, Spark 2.4.4 > Reporter: Udit Mehrotra > Priority: Major > Attachments: coredump.zip, hs_err_pid13885.log, > part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet > > > Running a union operation on two DataFrames through both the Scala Spark shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134.
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Attachment: hs_err_pid13885.log > Core dump happening on executors while doing simple union of Data Frames > > > Key: SPARK-29767 > URL: https://issues.apache.org/jira/browse/SPARK-29767 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core > Affects Versions: 2.4.4 > Environment: AWS EMR 5.27.0, Spark 2.4.4 > Reporter: Udit Mehrotra > Priority: Major > Attachments: hs_err_pid13885.log, > part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet > > > Running a union operation on two DataFrames through both the Scala Spark shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134.
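The native frames in the hs_err report attached here sit inside a generated SpecificUnsafeProjection. Nothing in this thread proposes a fix, but a generic way to check whether generated code is involved is to rerun the failing union with whole-stage code generation turned off and compare the behaviour; a sketch, where spark.sql.codegen.wholeStage is a standard Spark SQL property and the path is a placeholder:
{code:scala}
// Generic diagnostic sketch, not a fix taken from this issue.
// Turning off whole-stage codegen changes how the plan is executed; if the
// crash disappears or moves, that points at the generated code path rather
// than at the input data itself. It does not disable every generated
// projection, so treat it only as a first-pass signal.
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// Placeholder path for the attached sample file; rerun the failing operation:
val df = spark.read.parquet("/tmp/part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet")
df.union(df).show()
{code}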
[jira] [Commented] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968757#comment-16968757 ] Udit Mehrotra commented on SPARK-29767: --- [~hyukjin.kwon] As I have mentioned in the description, and as you can see from the *stdout* logs, it fails to write the *core dump*. Any idea how I can get around it? > Core dump happening on executors while doing simple union of Data Frames > > > Key: SPARK-29767 > URL: https://issues.apache.org/jira/browse/SPARK-29767 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core > Affects Versions: 2.4.4 > Environment: AWS EMR 5.27.0, Spark 2.4.4 > Reporter: Udit Mehrotra > Priority: Major > Attachments: > part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet > > > Running a union operation on two DataFrames through both the Scala Spark shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134.
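Whether an OS core file is written at all is decided by the node (core-file ulimit and kernel core_pattern), not by Spark, so Spark settings alone cannot force it. What can be steered from the Spark side is where the JVM fatal-error report (the hs_err_pid*.log that was eventually attached above) lands. A sketch with an assumed path, using the standard -XX:ErrorFile HotSpot flag and the standard spark.executor.extraJavaOptions property, which has to be set before the application and its executors start:
{code:scala}
// Sketch with a placeholder path: keep the JVM fatal-error report somewhere that
// survives YARN container cleanup. An actual OS core dump additionally requires
// the node's core-file ulimit / core_pattern to allow it, which is outside Spark.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SPARK-29767-debug")
  .config("spark.executor.extraJavaOptions", "-XX:ErrorFile=/mnt/tmp/hs_err_pid%p.log")
  .getOrCreate()
{code}
Note that this property is a single string, so any executor JVM flags already configured in spark-defaults.conf (such as the GC options visible in the launch command above) would need to be repeated alongside it.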
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Description: Running a union operation on two DataFrames through both the Scala Spark shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134. The trace from the *Driver*: {noformat} Container exited with a non-zero exit code 134 . 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_77 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch. Container id: container_1572981097605_0021_01_77 Exit code: 134 Exception message: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderrStack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderr at org.apache.hadoop.util.Shell.runCommand(Shell.java:972) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.T
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Description: Running a union operation on two DataFrames through both Scala Spark Shell and PySpark, resulting in executor contains doing a *core dump* and existing with Exit code 134. The trace from the *Driver*: {noformat} Container exited with a non-zero exit code 134 . 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_77 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch. Container id: container_1572981097605_0021_01_77 Exit code: 134 Exception message: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderrStack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderr at org.apache.hadoop.util.Shell.runCommand(Shell.java:972) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.T
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Description: Running a union operation on two DataFrames through both Scala Spark Shell and PySpark, resulting in executor contains doing a *core dump* and existing with Exit code 134. The trace from the *Driver*: {noformat} Container exited with a non-zero exit code 134 . 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_77 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch. Container id: container_1572981097605_0021_01_77 Exit code: 134 Exception message: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderrStack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderr at org.apache.hadoop.util.Shell.runCommand(Shell.java:972) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.T
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Description: Running a union operation on two DataFrames through both Scala Spark Shell and PySpark, resulting in executor contains doing a *core dump* and existing with Exit code 134. The trace from the *Driver*: {noformat} Container exited with a non-zero exit code 134 . 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_77 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch. Container id: container_1572981097605_0021_01_77 Exit code: 134 Exception message: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderrStack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderr at org.apache.hadoop.util.Shell.runCommand(Shell.java:972) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.T
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Description: Running a union operation on two DataFrames through both Scala Spark Shell and PySpark, resulting in executor contains doing a *core dump* and existing with Exit code 134. The trace from the *Driver*: {noformat} Container exited with a non-zero exit code 134 . 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_77 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch. Container id: container_1572981097605_0021_01_77 Exit code: 134 Exception message: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderrStack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderr at org.apache.hadoop.util.Shell.runCommand(Shell.java:972) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.T
[jira] [Updated] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-29767: -- Attachment: part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet > Core dump happening on executors while doing simple union of Data Frames > > > Key: SPARK-29767 > URL: https://issues.apache.org/jira/browse/SPARK-29767 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 2.4.4 > Environment: AWS EMR 5.27.0, Spark 2.4.4 >Reporter: Udit Mehrotra >Priority: Major > Attachments: > part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet
[jira] [Created] (SPARK-29767) Core dump happening on executors while doing simple union of Data Frames
Udit Mehrotra created SPARK-29767: - Summary: Core dump happening on executors while doing simple union of Data Frames Key: SPARK-29767 URL: https://issues.apache.org/jira/browse/SPARK-29767 Project: Spark Issue Type: Bug Components: PySpark, Spark Core Affects Versions: 2.4.4 Environment: AWS EMR 5.27.0, Spark 2.4.4 Reporter: Udit Mehrotra Running a union operation on two DataFrames through both the Scala Spark Shell and PySpark results in executor containers doing a *core dump* and exiting with exit code 134. The trace from the *Driver*: {noformat} Container exited with a non-zero exit code 134 . 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_77 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch. Container id: container_1572981097605_0021_01_77 Exit code: 134 Exception message: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderrStack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' 
-Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_77/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_77/stderr at org.apache.hadoop.util.Shell.runCommand(Shell.java:972) at org.apache.hadoop.util.Shell.run(Shell.java:869) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodem
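For readers following the report, the sketch below shows the kind of union described above, written for the Scala Spark Shell (where {{spark}} is predefined). The input path and the use of the attached snappy parquet file as both inputs are assumptions for illustration; the exact data and schema from the report are not reproduced here.
{code:scala}
// Hypothetical reproduction sketch for spark-shell (Spark 2.4.4).
// The path below is an assumption; the attached
// part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet file
// would need to be placed somewhere readable by the cluster.
val df1 = spark.read.parquet("/tmp/part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet")
val df2 = spark.read.parquet("/tmp/part-0-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet")

// union() requires both sides to have the same schema; the executor-side
// crash in the report surfaces once an action forces evaluation.
val unioned = df1.union(df2)
unioned.count()
{code}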
[jira] [Updated] (SPARK-21494) Spark 2.2.0 AES encryption not working with External shuffle
[ https://issues.apache.org/jira/browse/SPARK-21494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-21494: -- Attachment: logs.zip > Spark 2.2.0 AES encryption not working with External shuffle > > > Key: SPARK-21494 > URL: https://issues.apache.org/jira/browse/SPARK-21494 > Project: Spark > Issue Type: Bug > Components: Block Manager, Shuffle >Affects Versions: 2.2.0 > Environment: AWS EMR >Reporter: Udit Mehrotra > Attachments: logs.zip > > > Spark’s new AES based authentication mechanism does not seem to work when > configured with external shuffle service on YARN. > Here is the stack trace for the error we see in the driver logs: > ERROR YarnScheduler: Lost executor 40 on ip-10-167-104-125.ec2.internal: > Unable to create executor due to Unable to register with external shuffle > server due to: java.lang.IllegalArgumentException: Authentication failed. > at > org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118) > at > org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) > at > org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) > at > org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) > at > org.spark_project.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) > at > org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) > at > org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) > at > org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) > at > org.spark_project.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) > > Here are the settings we are configuring in ‘spark-defaults’ and ‘yarn-site’: > spark.network.crypto.enabled true > spark.network.crypto.saslFallback false > spark.authenticate true > > Turning on DEBUG logs for class ‘org.apache.spark.network.crypto’ on both > Spark and YARN side is not giving much information either about why > authentication fails. The driver and node manager logs have been attached to > the JIRA. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21494) Spark 2.2.0 AES encryption not working with External shuffle
[ https://issues.apache.org/jira/browse/SPARK-21494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated SPARK-21494: -- Description: Spark’s new AES based authentication mechanism does not seem to work when configured with external shuffle service on YARN. Here is the stack trace for the error we see in the driver logs: ERROR YarnScheduler: Lost executor 40 on ip-10-167-104-125.ec2.internal: Unable to create executor due to Unable to register with external shuffle server due to: java.lang.IllegalArgumentException: Authentication failed. at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105) at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at org.spark_project.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) Here are the settings we are configuring in ‘spark-defaults’ and ‘yarn-site’: spark.network.crypto.enabled true spark.network.crypto.saslFallback false spark.authenticate true Turning on DEBUG logs for class ‘org.apache.spark.network.crypto’ on both Spark and YARN side is not giving much information either about why authentication fails. The driver and node manager logs have been attached to the JIRA. was: Spark’s new AES based authentication mechanism does not seem to work when configured with external shuffle service on YARN. Here is the stack trace for the error we see in the driver logs: ERROR YarnScheduler: Lost executor 40 on ip-10-167-104-125.ec2.internal: Unable to create executor due to Unable to register with external shuffle server due to: java.lang.IllegalArgumentException: Authentication failed. 
at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105) at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at org.spark_project.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) Here are the settings we are configuring in ‘spark-defaults’ and ‘yarn-site’: spark.network.crypto.enabled true spark.network.crypto.saslFallback false spark.authenticate true Turning on DEBUG logs for class ‘org.apache.spark.network.crypto’ on both Spark and YARN side is not giving much information either about why authentication fails. The driver and nod
[jira] [Created] (SPARK-21494) Spark 2.2.0 AES encryption not working with External shuffle
Udit Mehrotra created SPARK-21494: - Summary: Spark 2.2.0 AES encryption not working with External shuffle Key: SPARK-21494 URL: https://issues.apache.org/jira/browse/SPARK-21494 Project: Spark Issue Type: Bug Components: Block Manager, Shuffle Affects Versions: 2.2.0 Environment: AWS EMR Reporter: Udit Mehrotra Spark’s new AES based authentication mechanism does not seem to work when configured with external shuffle service on YARN. Here is the stack trace for the error we see in the driver logs: ERROR YarnScheduler: Lost executor 40 on ip-10-167-104-125.ec2.internal: Unable to create executor due to Unable to register with external shuffle server due to: java.lang.IllegalArgumentException: Authentication failed. at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105) at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at org.spark_project.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) Here are the settings we are configuring in ‘spark-defaults’ and ‘yarn-site’: spark.network.crypto.enabled true spark.network.crypto.saslFallback false spark.authenticate true Turning on DEBUG logs for class ‘org.apache.spark.network.crypto’ on both Spark and YARN side is not giving much information either about why authentication fails. The driver and node manager logs have been attached to the JIRA. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
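For context, the three settings quoted in this report are ordinary Spark configuration keys. The sketch below shows one way they might be set programmatically on the application side; this is an assumption for illustration, not part of the original report, which places them in spark-defaults and yarn-site, and the YARN NodeManager hosting the external shuffle service must be configured consistently for registration to succeed.
{code:scala}
// Sketch only: the report configures these via spark-defaults / yarn-site;
// here they are set on a SparkConf for illustration.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .set("spark.authenticate", "true")                 // from the report
  .set("spark.network.crypto.enabled", "true")       // from the report
  .set("spark.network.crypto.saslFallback", "false") // from the report
  .set("spark.shuffle.service.enabled", "true")      // assumed, since the external shuffle service is in use

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}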
[jira] [Created] (SPARK-20515) Issue with reading Hive ORC tables having char/varchar columns in Spark SQL
Udit Mehrotra created SPARK-20515: - Summary: Issue with reading Hive ORC tables having char/varchar columns in Spark SQL Key: SPARK-20515 URL: https://issues.apache.org/jira/browse/SPARK-20515 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.2 Environment: AWS EMR Cluster Reporter: Udit Mehrotra Reading from a Hive ORC table containing char/varchar columns fails in Spark SQL. This is caused by the fact that Spark SQL internally replaces the char/varchar columns with String data type. So, while reading from the table created in Hive which has varchar/char columns, it ends up using the wrong reader and causes a ClassCastException. Here is the exception: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveVarcharWritable cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41) at org.apache.spark.sql.hive.HiveInspectors$class.unwrap(HiveInspectors.scala:324) at org.apache.spark.sql.hive.HadoopTableReader$.unwrap(TableReader.scala:333) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:419) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:419) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:435) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:247) at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) While the issue has been fixed in Spark 2.1.1 and 2.2.0 with SPARK-19459, it still needs to be fixed Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
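The following is a hypothetical sketch of the shape of table and query that hits the reader mismatch described above. The table and column names are invented, and the report specifically concerns tables created and populated from Hive and then read back through Spark SQL.
{code:scala}
// Hypothetical table, created from the Hive CLI rather than from Spark:
//
//   CREATE TABLE orc_varchar_tbl (name VARCHAR(20)) STORED AS ORC;
//   INSERT INTO TABLE orc_varchar_tbl VALUES ('spark');
//
// Reading the same table back through Spark SQL's Hive table reader is
// where the reported HiveVarcharWritable -> Text ClassCastException appears.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
spark.sql("SELECT name FROM orc_varchar_tbl").show()
{code}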
[jira] [Created] (SPARK-20115) Fix DAGScheduler to recompute all the lost shuffle blocks when external shuffle service is unavailable
Udit Mehrotra created SPARK-20115:
--------------------------------------

Summary: Fix DAGScheduler to recompute all the lost shuffle blocks when external shuffle service is unavailable
Key: SPARK-20115
URL: https://issues.apache.org/jira/browse/SPARK-20115
Project: Spark
Issue Type: Bug
Components: Shuffle, Spark Core, YARN
Affects Versions: 2.1.0, 2.0.2
Environment: Spark on YARN with external shuffle service enabled, running on an AWS EMR cluster.
Reporter: Udit Mehrotra

Spark's DAGScheduler currently does not recompute all the lost shuffle blocks on a host when a FetchFailed exception occurs while fetching shuffle blocks from another executor with the external shuffle service enabled. Instead, it only recomputes the lost shuffle blocks computed by the executor for which the FetchFailed exception occurred. This works fine for the internal shuffle scenario, where executors serve their own shuffle blocks, and hence only the shuffle blocks of that executor should be considered lost.

However, when the external shuffle service is being used, a FetchFailed exception means that the external shuffle service running on that host has become unavailable. That in turn is sufficient to assume that all the shuffle blocks managed by the shuffle service on that host are lost. Therefore, recomputing only the shuffle blocks associated with the particular executor for which the FetchFailed exception occurred is not sufficient: we need to recompute all the shuffle blocks managed by that service, because there could be multiple executors running on that host.

Since not all the shuffle blocks on the host are recomputed, subsequent attempts of the reduce stage fail as well, because the newly scheduled tasks keep trying to reach the old location of the shuffle blocks (which were not recomputed) and keep throwing further FetchFailed exceptions. This ultimately causes the job to fail after the reduce stage has been retried 4 times.
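The requested behaviour can be summarised with a small sketch. This is purely illustrative and not the actual DAGScheduler code; the trait and method names below are hypothetical.

{code:scala}
// Illustrative sketch only; not Spark's internal API.
trait MapOutputRegistry {
  def unregisterOutputsOnHost(host: String, shuffleId: Int): Unit
  def unregisterOutputsOnExecutor(execId: String, shuffleId: Int): Unit
}

object FetchFailureHandling {
  def handleFetchFailure(registry: MapOutputRegistry,
                         host: String,
                         execId: String,
                         shuffleId: Int,
                         externalShuffleServiceEnabled: Boolean): Unit = {
    if (externalShuffleServiceEnabled) {
      // A fetch failure means the shuffle service on `host` is unreachable,
      // so the map outputs of every executor on that host must be recomputed.
      registry.unregisterOutputsOnHost(host, shuffleId)
    } else {
      // Internal shuffle: only the blocks served by the failed executor are lost.
      registry.unregisterOutputsOnExecutor(execId, shuffleId)
    }
  }
}
{code}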
[jira] [Updated] (SPARK-18756) Memory leak in Spark streaming
[ https://issues.apache.org/jira/browse/SPARK-18756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra updated SPARK-18756:
----------------------------------
Description:

We have a Spark streaming application that processes data from Kinesis. In our application we are observing a memory leak at the executors, with Netty buffers not being released properly when the Spark BlockManager tries to replicate the input blocks received from the Kinesis stream.

The leak occurs when we set the storage level to MEMORY_AND_DISK_2 for the Kinesis input blocks. However, if we change the storage level to MEMORY_AND_DISK, which avoids creating a replica, we no longer observe the leak. We were able to detect the leak and obtain the stack trace by running the executors with an additional JVM option: -Dio.netty.leakDetectionLevel=advanced.

Here is the stack trace of the leak:

16/12/06 22:30:12 ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 0
Created at:
io.netty.buffer.CompositeByteBuf.<init>(CompositeByteBuf.java:103)
io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:335)
io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:247)
org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:69)
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1182)
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:997)
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:702)
org.apache.spark.streaming.receiver.BlockManagerBasedBlockHandler.storeBlock(ReceivedBlockHandler.scala:80)
org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushAndReportBlock(ReceiverSupervisorImpl.scala:158)
org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushArrayBuffer(ReceiverSupervisorImpl.scala:129)
org.apache.spark.streaming.receiver.Receiver.store(Receiver.scala:133)
org.apache.spark.streaming.kinesis.KinesisReceiver.org$apache$spark$streaming$kinesis$KinesisReceiver$$storeBlockWithRanges(KinesisReceiver.scala:282)
org.apache.spark.streaming.kinesis.KinesisReceiver$GeneratedBlockHandler.onPushBlock(KinesisReceiver.scala:352)
org.apache.spark.streaming.receiver.BlockGenerator.pushBlock(BlockGenerator.scala:297)
org.apache.spark.streaming.receiver.BlockGenerator.org$apache$spark$streaming$receiver$BlockGenerator$$keepPushingBlocks(BlockGenerator.scala:269)
org.apache.spark.streaming.receiver.BlockGenerator$$anon$1.run(BlockGenerator.scala:110)

We also observe a continuous increase in off-heap memory usage at the executors. Any help would be appreciated.

> Memory leak in Spark streaming
> ------------------------------
>
> Key: SPARK-18756
> URL: https://issues.apache.org/jira/browse/SPARK-18756
> Project: Spark
> Issue Type: Bug
> Components: Block Manager, DStreams
> Affects Versions: 2.0.0, 2.0.1, 2.0.2
> Reporter: Udit Mehrotra
>
> We have a Spark streaming application, that processes data from Kinesis. In our application we are observing a memory leak at the Executors with Netty buffers not being released properly, when the Spark BlockManager tries to replicate the input blocks received from Kinesis stream. The leak occurs, when we set Storage Level as MEMORY_AND_DISK_2 for the Kinesis input blocks. However, if we change the Storage level to use MEMORY_AND_DISK, which avoids creating a replica, we do not observe the leak any more. We were able to detect the leak, and obtain the stack trace by running the executors with an additional JVM option: -Dio.netty.leakDetectionLevel=advanced.
> Here is the stack trace of the leak:
> 16/12/06 22:30:12 ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records: 0
> Created at:
> io.netty.buffer.CompositeByteBuf.<init>(CompositeByteBuf.java:103)
> io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:335)
> io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:247)
> org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:69)
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1182)
> org.apache.spark.storage.BlockManager$
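For context, a minimal sketch of the storage-level choice described above, assuming the receiver is created through KinesisUtils.createStream; the application name, stream name, endpoint, region and checkpoint interval below are placeholders, not values from this report.

{code:scala}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

def createKinesisStream(ssc: StreamingContext) =
  KinesisUtils.createStream(
    ssc,
    "my-app",                                  // placeholder Kinesis application name
    "my-stream",                               // placeholder stream name
    "https://kinesis.us-east-1.amazonaws.com", // placeholder endpoint
    "us-east-1",                               // placeholder region
    InitialPositionInStream.LATEST,
    Seconds(10),                               // placeholder checkpoint interval
    // MEMORY_AND_DISK_2 exercises the replication path where the leak shows up;
    // MEMORY_AND_DISK (no replica) avoided it in our testing.
    StorageLevel.MEMORY_AND_DISK)
{code}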
[jira] [Created] (SPARK-18756) Memory leak in Spark streaming
Udit Mehrotra created SPARK-18756:
--------------------------------------

Summary: Memory leak in Spark streaming
Key: SPARK-18756
URL: https://issues.apache.org/jira/browse/SPARK-18756
Project: Spark
Issue Type: Bug
Components: Block Manager, DStreams
Affects Versions: 2.0.2, 2.0.1, 2.0.0
Reporter: Udit Mehrotra
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680276#comment-15680276 ]

Udit Mehrotra commented on SPARK-17380:
---------------------------------------

The above leak was seen with Spark 2.0 running on EMR. I noticed that the code path causing the leak is the block replication code, so I switched from StorageLevel.MEMORY_AND_DISK_2 to StorageLevel.MEMORY_AND_DISK for the received Kinesis blocks. After switching, I no longer observe the above memory leak in the logs, but the application still freezes after 3-3.5 days: Spark Streaming stops processing records, and the input queue of records received from Kinesis keeps growing until the executor runs out of memory.

> Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-17380
> URL: https://issues.apache.org/jira/browse/SPARK-17380
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0
> Reporter: Xeto
> Attachments: exec_Leak_Hunter.zip, memory-after-freeze.png, memory.png
>
> Running Spark Streaming 2.0.0 on AWS EMR 5.0.0 consuming from Kinesis (125 shards).
> Used memory keeps growing all the time according to Ganglia.
> The application works properly for about 3.5 days till all free memory has been used.
> Then, micro batches start queuing up but none is served.
> Spark freezes. You can see in Ganglia that some memory is being freed but it doesn't help the job to recover.
> Is it a memory/resource leak?
> The job uses back pressure and Kryo.
> The code has a mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19); then storing to S3 using foreachRDD().
> Cluster size: 20 machines
> Spark configuration:
> spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.master yarn-cluster
> spark.executor.instances 19
> spark.executor.cores 7
> spark.executor.memory 7500M
> spark.driver.memory 7500M
> spark.default.parallelism 133
> spark.yarn.executor.memoryOverhead 2950
> spark.yarn.driver.memoryOverhead 2950
> spark.eventLog.enabled false
> spark.eventLog.dir hdfs:///spark-logs/
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680270#comment-15680270 ]

Udit Mehrotra commented on SPARK-17380:
---------------------------------------

We came across this memory leak in the executor logs by using the JVM option '-Dio.netty.leakDetectionLevel=advanced'. It seems like good evidence of a memory leak, and it tells us where the leaked buffer was created:

16/11/09 06:03:28 ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 0
Created at:
io.netty.buffer.CompositeByteBuf.<init>(CompositeByteBuf.java:103)
io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:335)
io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:247)
org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:69)
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1161)
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:976)
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:700)
org.apache.spark.streaming.receiver.BlockManagerBasedBlockHandler.storeBlock(ReceivedBlockHandler.scala:80)
org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushAndReportBlock(ReceiverSupervisorImpl.scala:158)
org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushArrayBuffer(ReceiverSupervisorImpl.scala:129)
org.apache.spark.streaming.receiver.Receiver.store(Receiver.scala:133)
org.apache.spark.streaming.kinesis.KinesisReceiver.org$apache$spark$streaming$kinesis$KinesisReceiver$$storeBlockWithRanges(KinesisReceiver.scala:282)
org.apache.spark.streaming.kinesis.KinesisReceiver$GeneratedBlockHandler.onPushBlock(KinesisReceiver.scala:352)
org.apache.spark.streaming.receiver.BlockGenerator.pushBlock(BlockGenerator.scala:297)
org.apache.spark.streaming.receiver.BlockGenerator.org$apache$spark$streaming$receiver$BlockGenerator$$keepPushingBlocks(BlockGenerator.scala:269)
org.apache.spark.streaming.receiver.BlockGenerator$$anon$1.run(BlockGenerator.scala:110)

Can we please have some action on this JIRA?

> Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-17380
> URL: https://issues.apache.org/jira/browse/SPARK-17380
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0
> Reporter: Xeto
> Attachments: exec_Leak_Hunter.zip, memory-after-freeze.png, memory.png
>
> Running Spark Streaming 2.0.0 on AWS EMR 5.0.0 consuming from Kinesis (125 shards).
> Used memory keeps growing all the time according to Ganglia.
> The application works properly for about 3.5 days till all free memory has been used.
> Then, micro batches start queuing up but none is served.
> Spark freezes. You can see in Ganglia that some memory is being freed but it doesn't help the job to recover.
> Is it a memory/resource leak?
> The job uses back pressure and Kryo.
> The code has a mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19); then storing to S3 using foreachRDD().
> Cluster size: 20 machines
> Spark configuration:
> spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.master yarn-cluster
> spark.executor.instances 19
> spark.executor.cores 7
> spark.executor.memory 7500M
> spark.driver.memory 7500M
> spark.default.parallelism 133
> spark.yarn.executor.memoryOverhead 2950
> spark.yarn.driver.memoryOverhead 2950
> spark.eventLog.enabled false
> spark.eventLog.dir hdfs:///spark-logs/
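A minimal sketch of how the leak-detection option mentioned in the comment above can be passed to the executors, assuming it is simply appended to the executor JVM options; setting it through spark-defaults or --conf on spark-submit works equally well. The GC flags mirror the quoted configuration; only the Netty system property is what surfaces the LEAK report.

{code:scala}
import org.apache.spark.SparkConf

// Append Netty's advanced leak detection to the executor JVM options.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+UseConcMarkSweepGC " +
    "-Dio.netty.leakDetectionLevel=advanced")
{code}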
[jira] [Created] (SPARK-17512) Specifying remote files for Python based Spark jobs in Yarn cluster mode not working
Udit Mehrotra created SPARK-17512:
--------------------------------------

Summary: Specifying remote files for Python based Spark jobs in Yarn cluster mode not working
Key: SPARK-17512
URL: https://issues.apache.org/jira/browse/SPARK-17512
Project: Spark
Issue Type: Bug
Components: PySpark, Spark Submit
Affects Versions: 2.0.0
Reporter: Udit Mehrotra

When I run a Python application and specify a remote path for the extra files to be included in the PYTHONPATH, using the '--py-files' option or the 'spark.submit.pyFiles' configuration in YARN cluster mode, I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Launching Python applications through spark-submit is currently only supported for local files: s3:///app.py
at org.apache.spark.deploy.PythonRunner$.formatPath(PythonRunner.scala:104)
at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.deploy.PythonRunner$.formatPaths(PythonRunner.scala:136)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$10.apply(SparkSubmit.scala:636)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$10.apply(SparkSubmit.scala:634)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:634)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Here are sample commands which throw this error in Spark 2.0 (sparkApp.py requires app.py):

spark-submit --deploy-mode cluster --py-files s3:///app.py s3:///sparkApp.py (works fine in 1.6)
spark-submit --deploy-mode cluster --conf spark.submit.pyFiles=s3:///app.py s3:///sparkApp1.py (not working in 1.6)

Both commands work fine if app.py is first downloaded locally and a local path is specified. Specifying a remote path used to work with the '--py-files' option in earlier versions of Spark, but not with the 'spark.submit.pyFiles' configuration option. Now it does not work either way.

The following diff shows the comment stating that non-local paths should work in YARN cluster mode, and that we specifically do a separate validation to fail if YARN client mode is used with remote paths:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L309

However, this code gets triggered at the end of each run, irrespective of whether we are using client or cluster mode, and internally validates that the paths are local:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L634

The above validation was not triggered in earlier versions of Spark when using the '--py-files' option, because we were not storing the arguments passed to '--py-files' in the 'spark.submit.pyFiles' configuration for YARN. However, the following code, newly added in 2.0, now stores it, and hence the validation gets triggered even if we specify files through the '--py-files' option:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L545

Also, the logic in the YARN client was changed to read values directly from the 'spark.submit.pyFiles' configuration instead of from '--py-files' (as it did earlier):

https://github.com/apache/spark/commit/8ba2b7f28fee39c4839e5ea125bd25f5091a3a1e#diff-b050df3f55b82065803d6e83453b9706R543

So it is now broken whether we use '--py-files' or 'spark.submit.pyFiles', as the validation gets triggered in both cases, irrespective of whether we use client or cluster mode with YARN.
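To make the failure mode concrete, here is an illustrative sketch of the kind of local-path validation described above. It is a simplification, not the actual SparkSubmit/PythonRunner code.

{code:scala}
import java.net.URI

// Simplified stand-in for the check that rejects remote .py files.
def requireLocalPyFile(path: String): Unit = {
  val scheme = Option(new URI(path).getScheme).getOrElse("file")
  require(scheme == "file" || scheme == "local",
    s"Launching Python applications through spark-submit is currently " +
    s"only supported for local files: $path")
}

// Because 2.0 copies '--py-files' into spark.submit.pyFiles and runs the check
// in both client and cluster mode, a remote path such as an s3:// URI fails here.
{code}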