Re: spark classloader question

2016-07-07 Thread Prajwal Tuladhar
You can try playing with the experimental flags [1] `spark.executor.userClassPathFirst` and `spark.driver.userClassPathFirst`. But these can also potentially break other things (for example, dependencies that Spark itself needs at initialization getting overridden by the application's versions), so you will need to verify.
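A minimal sketch of submitting with both flags enabled (the class name and jar path are illustrative, not from the original thread):

```shell
# Prefer the application's jars over Spark's bundled ones on both
# the driver and the executors. Experimental flags -- verify your job
# still initializes correctly, since Spark's own dependencies can be
# shadowed by the app's versions.
spark-submit \
  --class com.example.MyApp \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-app-assembly.jar
```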

Re: Can I use log4j2.xml in my Apache Spark application

2016-06-22 Thread Prajwal Tuladhar
One way to integrate log4j2 would be to enable the flags `spark.executor.userClassPathFirst` and `spark.driver.userClassPathFirst` when submitting the application. This causes the application's class loader to be consulted first, so the application's log4j2 logging context gets initialized. But this can also potentially break other things.
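One hedged way to wire this up, beyond what the post states: ship a `log4j2.xml` alongside the job with `--files` and point the driver and executor JVMs at it via the standard log4j2 `log4j.configurationFile` system property. The jar name is a placeholder:

```shell
# Ship log4j2.xml to every container and tell log4j2 where to find it,
# with user-classpath-first so the app's log4j2 jars take precedence.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --files log4j2.xml \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=log4j2.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=log4j2.xml" \
  my-app-assembly.jar
```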

Re: Basic question. Access MongoDB data in Spark.

2016-06-13 Thread Prajwal Tuladhar
Maybe try opening an issue in their GH repo: https://github.com/Stratio/Spark-MongoDB

On Mon, Jun 13, 2016 at 4:10 PM, Umair Janjua wrote:
> Anybody knows the stratio's mailing list? I can't seem to find it. Cheers
>
> On Mon, Jun 13, 2016 at 6:02 PM, Ted Yu

Re: SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error

2016-05-04 Thread Prajwal Tuladhar
If you are running on a 64-bit JVM with less than 32G of heap, you might want to enable `-XX:+UseCompressedOops` [1]. And if your dataframe is somehow generating an array with more than 2^31-1 elements, you might have to rethink your options.

[1] https://spark.apache.org/docs/latest/tuning.html

On Wed, May 4,
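A sketch of passing the JVM flag through to both the driver and the executors (jar name and memory sizes are illustrative; note that on modern 64-bit HotSpot JVMs compressed oops are already on by default for heaps under roughly 32G, so this mainly matters if something disabled them):

```shell
# Use 32-bit compressed object pointers to cut pointer overhead,
# which only works while each JVM heap stays below ~32G.
spark-submit \
  --driver-java-options "-XX:+UseCompressedOops" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops" \
  --driver-memory 28g \
  --executor-memory 28g \
  my-app-assembly.jar
```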

spark job stage failures

2016-05-04 Thread Prajwal Tuladhar
Hi, I was wondering how Spark handles stage/task failures for a job. We are running a Spark job that batch-writes to ElasticSearch, and we are seeing one or two stage failures due to the ES cluster getting overloaded (expected, as we are testing with a single-node ES cluster). But I was assuming that
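For background on the question above: Spark retries a failed task up to `spark.task.maxFailures` times (default 4) before aborting the whole job, so transient ES overload can often be absorbed by raising the retry budget. A hedged sketch (the jar name is a placeholder, and the `es.*` setting assumes the elasticsearch-hadoop connector, which also has its own bulk-write retry knobs):

```shell
# Allow more task attempts before the job is aborted, and let the
# ES connector retry rejected bulk writes a few extra times.
spark-submit \
  --conf spark.task.maxFailures=8 \
  --conf es.batch.write.retry.count=5 \
  my-es-writer.jar
```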