Does anybody use spark.rpc.io.mode=epoll?

2017-03-07 Thread Steven Ruppert
The epoll mode definitely exists in spark, but the official
documentation does not mention it, nor any of the other settings that
appear to be unofficially documented in:

https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-rpc-netty.adoc

I don't seem to have any particular performance problems with the
default NIO impl, but the "lower gc pressure" mentioned in the
official netty docs
https://github.com/netty/netty/wiki/Native-transports does seem
attractive.

However, the fact that it's not even documented gives me pause. Is it
deprecated, or perhaps just not useful? Do I have to stick the native
library jar into the spark classpath to use it?
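For anyone who wants to experiment anyway, a minimal sketch of turning it on via spark-defaults.conf might look like the following. Hedged: the key name and the EPOLL value come from the unofficial docs linked above and from reading the Netty RPC code, not from any official Spark documentation, so treat this as an assumption to verify against your Spark version.

```
# spark-defaults.conf -- experimental sketch, NOT an officially documented knob.
# spark.rpc.io.mode appears to be read by the Netty-based RPC transport;
# the values seem to be NIO (the default) and EPOLL. EPOLL only works on
# Linux and likely needs Netty's native epoll transport on the classpath.
spark.rpc.io.mode  EPOLL
```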

-- 
*CONFIDENTIALITY NOTICE: This email message, and any documents, files or 
previous e-mail messages attached to it is for the sole use of the intended 
recipient(s) and may contain confidential and privileged information. Any 
unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email 
and destroy all copies of the original message.*




Re: spark-shell running out of memory even with 6GB ?

2017-01-09 Thread Steven Ruppert
The spark-shell process alone shouldn't take up that much memory, at least
in my experience. Have you dumped the heap to see what's all in there? What
environment are you running spark in?

Doing stuff like RDD.collect() or .countByKey() will pull a potentially large
amount of data into the spark-shell heap. Another thing that can fill up the
driver heap (which also runs inside the spark-shell process) is running lots
of jobs: the logged SparkEvents stick around so the UI can render them. There
are some options under `spark.ui.retained*` to limit that if it's a problem.
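As a sketch of what tightening those limits looks like in spark-defaults.conf (the setting names are the documented `spark.ui.retained*` family; the values below are arbitrary examples, and the defaults are worth checking against your Spark version's configuration docs):

```
# spark-defaults.conf -- sketch; lowering these bounds how many completed
# jobs/stages/tasks the UI keeps SparkEvent data for in the driver heap.
spark.ui.retainedJobs    100
spark.ui.retainedStages  100
spark.ui.retainedTasks   1000
```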


On Mon, Jan 9, 2017 at 6:00 PM, Kevin Burton  wrote:

> We've had various OOM issues with spark and have been trying to track them
> down one by one.
>
> Now we have one in spark-shell which is super surprising.
>
> We currently allocate 6GB to spark shell, as confirmed via 'ps'
>
> Why the heck would the *shell* need that much memory?
>
> I'm going to try to give it more of course but would be nice to know if
> this is a legitimate memory constraint or there is a bug somewhere.
>
> PS: One thought I had was that it would be nice to have spark keep track
> of where an OOM was encountered, in what component.
>
> Kevin
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>
