That error could mean different things; most of the time it means the JVM
crashed. If you are running on YARN, check the YARN logs or the stderr of your
Spark job to see if there are any more details about the cause.
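The YARN-side check suggested here can be done with the `yarn logs` CLI. A minimal sketch; the application and container IDs are placeholders you would copy from the error message or the ResourceManager UI:

```shell
# Fetch the aggregated logs for the whole application
# (replace <application_id> with the real ID, e.g. from the RM UI)
yarn logs -applicationId <application_id> > app.log

# Or narrow to just the failed container named in the error message
yarn logs -applicationId <application_id> -containerId <container_id>
```

Log aggregation must be enabled on the cluster for this to return anything after the application finishes.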
On Fri, 19 Nov 2021 at 15:25, Joris Billen wrote:
> Hi,
> we are seeing this error:
Hi,
we are seeing this error:
Job aborted due to stage failure: Task 0 in stage 1.0 failed 8... Reason:
Container from a bad node: container_xxx on host: dev-yyy Exit status: 134
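As background on the error above: on Linux, a container exit status above 128 conventionally means 128 + the signal number, so 134 is 128 + 6, i.e. the JVM was killed by SIGABRT (commonly an abort from the JVM itself, e.g. on a native out-of-memory condition). A small sketch of that decoding, assuming a POSIX platform:

```python
import signal

def decode_exit_status(status: int) -> str:
    """Decode a YARN container exit status.

    Statuses above 128 conventionally mean 128 + signal number
    on POSIX systems; anything else is a plain exit code.
    """
    if status > 128:
        return f"killed by {signal.Signals(status - 128).name}"
    return f"exited normally with code {status}"

# 134 - 128 = 6 = SIGABRT: the JVM aborted
print(decode_exit_status(134))  # → killed by SIGABRT
```

This is only the generic Unix convention; the actual reason for the abort still has to come from the container's stderr or the hs_err JVM crash file.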
This post suggests this has to do with blacklisted nodes:
I unfortunately haven't seen this directly. But some typical things I try
when debugging are as follows.
Do you see a corresponding error on the other side of that connection
(alpinenode7.alpinenow.local)? Or is that the same machine?
Also, do the driver logs show any longer stack trace and have
I'm seeing the following message in the log of an executor. Has anyone
seen this error? After this, the executor seems to lose the cache, and
besides that the whole thing slows down drastically - i.e. it gets
stuck in a reduce phase for 40+ minutes, whereas before it was
finishing reduces in 2~3 minutes.
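For the blacklisted-nodes angle mentioned earlier in the thread, a hedged sketch of the knobs people often try for exit-status-134 failures: extra off-heap headroom and node exclusion. The property names assume Spark 3.1+ (older releases use the `spark.blacklist.*` names instead); `your_job.py` is a placeholder:

```shell
spark-submit \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.excludeOnFailure.enabled=true \
  --conf spark.task.maxFailures=8 \
  your_job.py
```

Raising `memoryOverhead` targets the common case where the JVM aborts because the container's native memory headroom is too small; exclusion keeps retries off a node that repeatedly kills containers.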