RE: Memory problems and missing heartbeats

2016-02-16 Thread Ignacio Blasco
Hi Ximo. Regarding #1, you can try increasing the number of partitions
used for cogroup or reduce. AFAIK Spark needs enough memory to hold all
the data processed by a given partition in memory, so by increasing the
number of partitions you can reduce that load. We would probably need to
know more about your workflow to assess whether that is your case.

Nacho
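
Nacho's suggestion can be sketched concretely. A minimal example, assuming a
YARN deployment like Ximo's; the partition count (2000) and the jar name are
purely illustrative, not recommendations:

```shell
# Raise the default partition count used by cogroup/reduce when no
# explicit count is passed (value is illustrative only):
spark-submit \
  --master yarn \
  --conf spark.default.parallelism=2000 \
  your-app.jar   # hypothetical application jar

# Alternatively, pass the count at the shuffle itself (Scala):
#   rdd1.cogroup(rdd2, numPartitions = 2000)
#   counts.reduceByKey(_ + _, numPartitions = 2000)
```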
On 16 Feb 2016 at 4:58 PM, "JOAQUIN GUANTER GONZALBEZ" <
joaquin.guantergonzal...@telefonica.com> wrote:

> Thanks. I'll take a look at Graphite to see if that helps me out with my
> first problem.
>
> Ximo.
>
> 
>
> The information contained in this transmission is privileged and
> confidential information intended only for the use of the individual or
> entity named above. If the reader of this message is not the intended
> recipient, you are hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited. If you have received
> this transmission in error, do not read it. Please immediately reply to the
> sender that you have received this communication in error and then delete
> it.
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


RE: Memory problems and missing heartbeats

2016-02-16 Thread JOAQUIN GUANTER GONZALBEZ
Thanks. I'll take a look at Graphite to see if that helps me out with my first 
problem.

Ximo.

-Original message-
From: Arkadiusz Bicz [mailto:arkadiusz.b...@gmail.com]
Sent: Tuesday, 16 February 2016 16:06
To: Iulian Dragoș 
CC: JOAQUIN GUANTER GONZALBEZ ; 
user@spark.apache.org
Subject: Re: Memory problems and missing heartbeats






RE: Memory problems and missing heartbeats

2016-02-16 Thread JOAQUIN GUANTER GONZALBEZ
A GC pause fits nicely with what I’m seeing. Many thanks for the link!

Ximo
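
To confirm the GC-pause theory on the executors, GC logging can be turned on.
A hedged sketch, assuming pre-Java-9 HotSpot flag names (which match a
2016-era JDK); the jar name is a placeholder:

```shell
# Log every GC and every stop-the-world pause in the executor JVMs,
# then grep the executor stdout for long pauses around the missed
# heartbeats (standard HotSpot flags, pre-JDK 9):
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime" \
  your-app.jar   # hypothetical application jar
```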

From: Iulian Dragoș [mailto:iulian.dra...@typesafe.com]
Sent: Tuesday, 16 February 2016 15:14
To: JOAQUIN GUANTER GONZALBEZ 
CC: user@spark.apache.org
Subject: Re: Memory problems and missing heartbeats


Regarding your 2nd problem, my best guess is that you’re seeing GC pauses. It’s 
not unusual, given you’re using 40GB heaps. See for instance this blog post:
http://gridgain.blogspot.ch/2014/06/jdk-g1-garbage-collector-pauses-for.html

From conducting numerous tests, we have concluded that unless you are utilizing 
some off-heap technology (e.g. GridGain OffHeap), no Garbage Collector provided 
with JDK will render any kind of stable GC performance with heap sizes larger 
than 16GB. For example, on 50GB heaps we can often encounter up to 5 minute GC 
pauses, with average pauses of 2 to 4 seconds.

Not sure if Yarn can do this, but I would try to run with a smaller executor 
heap, and more executors per node.

iulian







Re: Memory problems and missing heartbeats

2016-02-16 Thread Arkadiusz Bicz
I had a problem similar to #2 when I used a lot of caching and then did
shuffling. It looks like when I cached too much there was not enough
space for other Spark tasks, and the job just hung.

You can try caching less and see if things improve. The executor logs
also help a lot (watch out for log messages about spills), and you can
monitor the jobs' JVMs through Spark monitoring
(http://spark.apache.org/docs/latest/monitoring.html) and Graphite and
Grafana.
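
The monitoring setup Arkadiusz mentions is wired up through Spark's metrics
system. A minimal sketch of a Graphite sink, assuming the sink class and
property names from the monitoring page linked above; host and port are
placeholders:

```properties
# conf/metrics.properties — ship Spark metrics to Graphite every
# 10 seconds (host/port are placeholders):
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```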




Re: Memory problems and missing heartbeats

2016-02-16 Thread Iulian Dragoș
Regarding your 2nd problem, my best guess is that you’re seeing GC pauses.
It’s not unusual, given you’re using 40GB heaps. See for instance this blog
post: http://gridgain.blogspot.ch/2014/06/jdk-g1-garbage-collector-pauses-for.html


From conducting numerous tests, we have concluded that unless you are
utilizing some off-heap technology (e.g. GridGain OffHeap), no Garbage
Collector provided with JDK will render any kind of stable GC performance
with heap sizes larger than 16GB. For example, on 50GB heaps we can often
encounter up to 5 minute GC pauses, with average pauses of 2 to 4 seconds.

Not sure if Yarn can do this, but I would try to run with a smaller
executor heap, and more executors per node.

iulian
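
Iulian's suggestion maps to the resource flags on YARN. A hedged sketch; all
numbers are illustrative (the right split depends on node size), and the jar
name is a placeholder:

```shell
# Same total memory as fewer 40GB executors, but split into 10GB heaps
# so each GC has far less to scan (numbers illustrative only):
spark-submit \
  --master yarn \
  --num-executors 16 \
  --executor-memory 10G \
  --executor-cores 4 \
  your-app.jar   # hypothetical application jar
```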


RE: Memory problems and missing heartbeats

2016-02-16 Thread JOAQUIN GUANTER GONZALBEZ
Bumping this thread in hopes that someone will answer.

Ximo

-Original message-
From: JOAQUIN GUANTER GONZALBEZ [mailto:joaquin.guantergonzal...@telefonica.com]
Sent: Monday, 15 February 2016 16:43
To: user@spark.apache.org
Subject: Memory problems and missing heartbeats






Memory problems and missing heartbeats

2016-02-15 Thread JOAQUIN GUANTER GONZALBEZ
Hello,

I am facing two different issues with Spark in my project that are driving me 
crazy. I am currently running on EMR (Spark 1.5.2 + YARN), using the 
"--executor-memory 40G" option.

Problem #1
=

Some of my processes get killed by YARN because the container is exceeding the 
physical memory YARN assigned it. I have been able to work around this issue by 
increasing the spark.yarn.executor.memoryOverhead parameter to 8G, but that 
doesn't seem like a good solution.
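
For context on why needing 8G is surprising, here is a back-of-the-envelope
sketch of how YARN sizes the executor container. It assumes the Spark 1.x
default for spark.yarn.executor.memoryOverhead of max(384 MB, 10% of executor
memory), which is worth double-checking against the 1.5.2 docs:

```python
# Assumption (Spark 1.x docs): the container YARN enforces is
# executor memory + memoryOverhead, and the overhead defaults to
# max(384 MB, 10% of executor memory). YARN kills the container
# when heap + off-heap usage exceeds that limit.

def default_overhead_mb(executor_memory_mb, factor=0.10, minimum_mb=384):
    """Default off-heap overhead YARN adds to the executor container."""
    return max(minimum_mb, int(executor_memory_mb * factor))

def container_limit_mb(executor_memory_mb, overhead_mb=None):
    """Physical-memory limit YARN enforces for the executor container."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb

# With --executor-memory 40G the default overhead would be ~4 GB,
# so setting it to 8G doubles the off-heap headroom.
print(default_overhead_mb(40 * 1024))                        # 4096
print(container_limit_mb(40 * 1024, overhead_mb=8 * 1024))   # 49152
```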

My understanding is that the JVM that will run my Spark process will get 40 GB 
of heap memory (-Xmx40G), and if there is memory pressure in the process then 
the GC should kick in to ensure that the heap never exceeds those 40 GB. My 
PermGen is set to 510MB, but that is a very long way from the 8GB I need to set 
as overhead. This seems to happen when I .cache() very big RDDs and I then 
perform operations that require shuffling (cogroup & co.).

- Who is using all that off heap memory?
- Are there any tools in the Spark ecosystem that might help me debug this?


Problem #2
=

Some tasks fail because the heartbeat didn't get back to the master in 120 
seconds. Again, I can more or less work around this by increasing the timeout 
to 5 minutes, but I don't feel this is addressing the real problem.

- Does the heartbeat have its own thread or would a long-running .map() block 
the heartbeat?
- What conditions would prevent the heartbeat from being sent?
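
For reference, the workaround described above expressed as configuration — a
sketch assuming Spark 1.5-era property names; the 300s value mirrors the
5-minute timeout mentioned, and the jar name is a placeholder:

```shell
# Give heartbeats more slack while the root cause is investigated
# (values illustrative; defaults are 120s and 10s respectively):
spark-submit \
  --conf spark.network.timeout=300s \
  --conf spark.executor.heartbeatInterval=20s \
  your-app.jar   # hypothetical application jar
```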

Many thanks in advance for any help with this,
Ximo.


