Re: Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-31 Thread purna pradeep
More details on the executor pod that died abruptly, taken from the Spark driver pod logs:


2018-07-30 19:58:41 ERROR TaskSchedulerImpl:70 - Lost executor 3 on 10.*.*.*.*: Executor lost for unknown reasons.

2018-07-30 19:58:41 WARN  TaskSetManager:66 - Lost task 32.0 in stage 9.0 (TID 133, 10.10.147.6, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Executor lost for unknown reasons.

2018-07-30 19:58:41 WARN  KubernetesClusterSchedulerBackend:66 - Received delete event of executor pod accelerate-snowflake-test-5b6ba9d5495b3ae9a1358ae9c3f9a8c3-exec-3. Reason: null

2018-07-30 19:58:41 WARN  KubernetesClusterSchedulerBackend:347 - Executor with id 3 was not marked as disconnected, but the watch received an event of type DELETED for this executor. The executor may have failed to start in the first place and never registered with the driver.

2018-07-30 19:58:41 INFO  TaskSetManager:54 - Starting task 32.1 in stage 9.0 (TID 134, 10.*.*.*.*, executor 7, partition 32, ANY, 9262 bytes)

2018-07-30 19:58:42 INFO  ContextCleaner:54 - Cleaned accumulator 246

2018-07-30 19:58:42 INFO  ContextCleaner:54 - Cleaned accumulator 252

2018-07-30 19:58:42 INFO  ContextCleaner:54 - Cleaned accumulator 254

2018-07-30 19:58:42 INFO  BlockManagerInfo:54 - Removed broadcast_11_piece0 on spark-1532979165550-driver-svc.spark.svc:7079 in memory (size: 6.9 KB, free: 997.6 MB)

2018-07-30 19:58:42 INFO  BlockManagerInfo:54 - Removed broadcast_11_piece0 on 10.*.*.*.*:43815 on disk (size: 6.9 KB)

2018-07-30 19:58:42 WARN  TransportChannelHandler:78 - Exception in connection from /10.*.*.*.*:37578
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)

2018-07-30 19:58:42 ERROR TransportResponseHandler:154 - Still have 1 requests outstanding when connection from /10.*.*.*.*:37578 is closed

2018-07-30 19:58:42 INFO  KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Disabling executor 7.

2018-07-30 19:58:42 INFO  DAGScheduler:54 - Executor lost: 7 (epoch 1)

2018-07-30 19:58:42 WARN  BlockManagerMaster:87 - Failed to remove broadcast 11 with removeFromMaster = true - Connection reset by peer
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at
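
Taken together, the log lines above show the executor pod being deleted out from under the driver (the DELETED watch event with Reason: null) and the driver's RPC connection to it then dropping with "Connection reset by peer". On Kubernetes this pattern most often means something outside Spark killed the pod, typically the kubelet OOM-killing a container that exceeded its memory limit; the netty direct buffers visible in the trace (PooledUnsafeDirectByteBuf) are allocated off-heap and are a common way an executor outgrows its limit. A minimal mitigation sketch follows, raising the off-heap overhead allowance. All resource values, the API server address, class name, image, and jar path are illustrative placeholders, not values from this job, and note that some Spark 2.3 builds read spark.kubernetes.executor.memoryOverhead instead of the unified spark.executor.memoryOverhead:

    spark-submit \
      --master k8s://https://API_SERVER:6443 \
      --deploy-mode cluster \
      --class com.example.Main \
      --conf spark.executor.instances=8 \
      --conf spark.executor.memory=4g \
      --conf spark.executor.memoryOverhead=2g \
      --conf spark.kubernetes.container.image=IMAGE \
      local:///opt/spark/jars/app.jar

With the overhead raised, the executor pod's memory request/limit is sized as JVM heap plus overhead, which leaves headroom for off-heap allocations, so the kubelet is less likely to kill the pod mid-task.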

Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-30 Thread Mamillapalli, Purna Pradeep
Hello,

I'm getting the error below in the Spark driver pod logs: executor pods are being killed midway through the job, and the driver pod itself eventually terminates with the same intermittent error. This happens when I run multiple jobs in parallel.

I'm not able to see the executor logs because the executor pods are deleted once they are killed (a kubectl sketch for recovering some of that information follows the stack trace below).

org.apache.spark.SparkException: Job aborted due to stage failure: Task 23 in stage 36.0 failed 4 times, most recent failure: Lost task 23.3 in stage 36.0 (TID 1006, 10.10.125.119, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor lost for unknown reasons.
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
    ... 42 more
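
Even though the executor pods themselves are deleted, Kubernetes usually retains some evidence of why they died. A minimal sketch of where to look, assuming the job runs in a namespace called spark and using a placeholder pod name (both are assumptions, not values from this job):

    # Termination reason and last state; exit code 137 usually means SIGKILL/OOMKilled
    kubectl describe pod EXECUTOR_POD -n spark

    # Logs of the previous (terminated) container, if the pod object still exists
    kubectl logs EXECUTOR_POD -n spark --previous

    # Namespace events around the failure (OOM kills, evictions, node pressure)
    kubectl get events -n spark --sort-by=.metadata.creationTimestamp

kubectl logs --previous only works while the pod object is still present; once Spark has deleted the pod, the events listing is often the only trace left.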


Thanks,
Purna

