SparkContext.getCallSite is in the top of profiler by memory allocation

2015-04-30 Thread Igor Petrov
Hello, we send a lot of small jobs to Spark (up to 500 per second). When profiling, I see Throwable.getStackTrace() at the top of the memory profiler; it is called from SparkContext.getCallSite, which makes it memory-consuming. We use the Java API. I tried to call SparkContext.setCallSite(-) before

Apache Spark Executor - number of threads

2015-03-17 Thread Igor Petrov
Hello, is it possible to set the number of threads in the Executor's pool? I see no such setting in the docs. The reason we want to try it: we want to see the performance impact of different levels of parallelism (one thread per CPU, two threads per CPU, N threads per CPU). Thank you

Tuning number of partitions per CPU

2015-02-13 Thread Igor Petrov
Hello, in the Spark programming guide (http://spark.apache.org/docs/1.2.0/programming-guide.html) there is a recommendation: "Typically you want 2-4 partitions for each CPU in your cluster." We have a Spark Master and two Spark workers, each with 18 cores and 18 GB of RAM. In our application we use
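
As a rough worked example of the quoted recommendation, the arithmetic for the cluster described in this message (two workers with 18 cores each) can be sketched as follows. The class and method names are hypothetical and used only for illustration. The sketch also assumes every core on both workers is available to the application, which the message does not confirm.

```java
// Hypothetical helper, not part of Spark: it only does the arithmetic
// behind the "2-4 partitions for each CPU" rule of thumb.
class PartitionEstimate {

    // Total cores in the cluster, assuming all workers have the same core count.
    static int totalCores(int workers, int coresPerWorker) {
        return workers * coresPerWorker;
    }

    // Partition count for a given number of partitions per core.
    static int partitions(int totalCores, int partitionsPerCore) {
        return totalCores * partitionsPerCore;
    }

    public static void main(String[] args) {
        int cores = totalCores(2, 18);      // two workers, 18 cores each
        int low = partitions(cores, 2);     // lower bound: 2 partitions per core
        int high = partitions(cores, 4);    // upper bound: 4 partitions per core
        System.out.println("cores=" + cores + " partitions=" + low + ".." + high);
    }
}
```

Under these assumptions the rule of thumb gives 36 cores and a range of 72 to 144 partitions. This is only a starting point for tuning, not a measured optimum.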