Re: spark classloader question
You can try playing with the experimental flags [1] `spark.executor.userClassPathFirst` and `spark.driver.userClassPathFirst`. Be aware that they can also break other things (for example, dependencies that Spark itself needs at startup may get overridden by the versions in your application jars), so you will need to verify.

[1] https://spark.apache.org/docs/latest/configuration.html

On Thu, Jul 7, 2016 at 4:05 PM, Chen Song wrote:
> Sorry to spam people who are not interested. I would greatly appreciate it if
> anyone who is familiar with this could share some insights.
>
> On Wed, Jul 6, 2016 at 2:28 PM Chen Song wrote:
>
>> Hi,
>>
>> I ran into problems using class loaders in Spark. In my code (run within
>> executors), I explicitly load classes using the context class loader, as below.
>>
>> Thread.currentThread().getContextClassLoader()
>>
>> The jar containing the classes to be loaded is added via the --jars
>> option in spark-shell/spark-submit.
>>
>> I always get a class-not-found exception. However, it seems to work if
>> I compile these classes into the main jar for the job (the jar containing the
>> main job class).
>>
>> I know Spark implements its own class loaders in a particular way. Is
>> there a way to work around this? In other words, what is the proper way to
>> programmatically load classes from other jars added via --jars in Spark?

--
Cheers,
Praj
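A minimal sketch of combining those flags with a --jars class lookup; the flag names are the real Spark settings, while the jar path and `com.example.MyPlugin` class are placeholders for illustration only, so verify in your own environment:

```scala
// One possible submit command (paths, jar names, and classes are placeholders):
//   spark-submit \
//     --jars /path/to/plugins.jar \
//     --conf spark.driver.userClassPathFirst=true \
//     --conf spark.executor.userClassPathFirst=true \
//     --class com.example.Main app.jar

import org.apache.spark.{SparkConf, SparkContext}

object ClassLoaderCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("classloader-check"))

    // "com.example.MyPlugin" stands in for a class that lives only in the --jars jar.
    val loaded = sc.parallelize(Seq("com.example.MyPlugin")).map { name =>
      // Resolve through the executor's context class loader, which is the loader
      // that is supposed to see the jars distributed with the application.
      val cl = Thread.currentThread().getContextClassLoader
      Class.forName(name, true, cl).getName
    }.collect()

    loaded.foreach(println)
    sc.stop()
  }
}
```

If this still throws ClassNotFoundException, checking whether the class resolves on the driver but not on the executors usually narrows down which loader is missing the jar.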
Re: Can I use log4j2.xml in my Apache Spark application
One way to integrate log4j2 would be to enable the flags `spark.executor.userClassPathFirst` and `spark.driver.userClassPathFirst` when submitting the application. That makes the application's class loader take precedence, so the log4j2 logging context gets initialized instead of Spark's bundled log4j. But this can also break other things (for example, dependencies that Spark itself needs at startup may get overridden by the versions in your application jars), so you will need to verify. More info about those flags is at https://spark.apache.org/docs/latest/configuration.html

On Wed, Jun 22, 2016 at 7:11 AM, Charan Adabala wrote:
> Hi,
>
> We are trying to use log4j2.xml instead of log4j.properties in an Apache
> Spark application. We integrated log4j2.xml, but the problem is that the
> application's worker (executor) logs are not written, while the driver log
> is written without any problem. Can anyone suggest how to integrate
> log4j2.xml in an Apache Spark application so that both worker and driver
> logs are written successfully?
>
> Thanks in advance.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Can-I-use-log4j2-xml-in-my-Apache-Saprk-application-tp27205.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Cheers,
Praj
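A sketch of how that might look end to end, assuming YARN cluster mode (so --files lands the config in each container's working directory) and that the log4j2 jars are shipped alongside the application; file names, versions, and class names below are placeholders, not tested configuration:

```scala
// One possible submit command:
//   spark-submit \
//     --conf spark.driver.userClassPathFirst=true \
//     --conf spark.executor.userClassPathFirst=true \
//     --files log4j2.xml \
//     --conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=log4j2.xml" \
//     --conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=log4j2.xml" \
//     --jars log4j-api-2.x.jar,log4j-core-2.x.jar \
//     --class com.example.Main app.jar

import org.apache.logging.log4j.LogManager
import org.apache.spark.{SparkConf, SparkContext}

object Log4j2Check {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("log4j2-check"))

    // Driver-side message: follows log4j2.xml if the driver picked up log4j2.
    LogManager.getLogger("Log4j2Check").info("hello from the driver")

    sc.parallelize(1 to 4).foreach { i =>
      // Executor-side message: shows up in the worker/executor logs only if the
      // executors also initialized a log4j2 context from the shipped config.
      LogManager.getLogger("Log4j2Check").info(s"hello from an executor, element $i")
    }

    sc.stop()
  }
}
```

The userClassPathFirst flags matter here because Spark distributions of that era bundle log4j 1.x, which otherwise tends to win on the executor classpath.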
Re: Basic question. Access MongoDB data in Spark.
Maybe try opening an issue in their GitHub repo: https://github.com/Stratio/Spark-MongoDB

On Mon, Jun 13, 2016 at 4:10 PM, Umair Janjua wrote:
> Does anybody know Stratio's mailing list? I can't seem to find it. Cheers
>
> On Mon, Jun 13, 2016 at 6:02 PM, Ted Yu wrote:
>
>> Have you considered posting the question on Stratio's mailing list?
>>
>> You may get a faster response there.
>>
>> On Mon, Jun 13, 2016 at 8:09 AM, Umair Janjua wrote:
>>
>>> Hi guys,
>>>
>>> I have this super basic problem which I cannot figure out. Can somebody
>>> give me a hint?
>>>
>>> http://stackoverflow.com/questions/37793214/spark-mongdb-data-using-java
>>>
>>> Cheers

--
Cheers,
Praj
Re: SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error
If you are running on a 64-bit JVM with less than a 32G heap, you might want to enable -XX:+UseCompressedOops [1]. "Requested array size exceeds VM limit" means a single array larger than the JVM's cap of roughly 2^31-1 elements was requested, so if your dataframe operation is somehow allocating an array that large, you might have to rethink your approach; adding memory alone will not help.

[1] https://spark.apache.org/docs/latest/tuning.html

On Wed, May 4, 2016 at 9:44 PM, Bijay Kumar Pathak wrote:
> Hi,
>
> I am reading a parquet file of around 50+ GB which has 4013 partitions with
> 240 columns. Below is my configuration:
>
> driver: 20G memory with 4 cores
> executors: 45 executors with 15G memory and 4 cores
>
> I tried to read the data both with a DataFrame read and with a hive context
> using Hive SQL, but in both cases it throws the error below with no further
> description.
>
> hive_context.sql("select * from test.base_table where
> date='{0}'".format(part_dt))
> sqlcontext.read.parquet("/path/to/partion/")
>
> #
> # java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 16953"...
>
> What could be wrong here? I think increasing memory alone will not help in
> this case since it hit the array size limit.
>
> Thanks,
> Bijay

--
Cheers,
Praj
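A sketch of one way to pass the JVM flag through; the exact plumbing differs by cluster manager and deploy mode, and the path below is a placeholder, so treat this as illustrative only:

```scala
// One possible submit command:
//   spark-submit \
//     --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops" \
//     --driver-java-options "-XX:+UseCompressedOops" \
//     --class com.example.ReadParquet app.jar

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-parquet"))
    val sqlContext = new SQLContext(sc)

    // Placeholder path. Aggregating instead of collecting everything to the
    // driver avoids building a single huge array there.
    val df = sqlContext.read.parquet("/path/to/partition/")
    println(df.count())

    sc.stop()
  }
}
```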
spark job stage failures
Hi,

I was wondering how Spark handles stage/task failures for a job. We are running a Spark job that batch-writes to Elasticsearch, and we are seeing one or two stage failures because the ES cluster gets overloaded (expected, since we are testing with a single-node ES cluster). I assumed that when some of the batch writes to ES fail after a certain number of retries (10), the whole job would be aborted, but instead the Spark application is marked as finished even though a single job within it failed. How does Spark handle failures when a job or stage is marked as failed?

Thanks in advance.

--
Cheers,
Praj
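For context, a minimal sketch of the knobs that usually govern this behavior, assuming the writes go through the elasticsearch-hadoop (elasticsearch-spark) connector; the property names, defaults, and index naming should be verified against the connector and Spark versions in use:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._   // adds saveToEs() on RDDs

object EsBatchWrite {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-batch-write")
      .set("spark.task.maxFailures", "4")          // Spark retries a task this many times before failing the stage/job
      .set("es.nodes", "localhost:9200")           // single-node test cluster, as in the question
      .set("es.batch.write.retry.count", "10")     // connector-level retries on bulk rejections
      .set("es.batch.write.retry.wait", "30s")     // back off so an overloaded node can recover

    val sc = new SparkContext(conf)
    val docs = sc.parallelize(Seq(
      Map("id" -> 1, "msg" -> "hello"),
      Map("id" -> 2, "msg" -> "world")))

    // If the connector exhausts its retries it throws, the task fails, and after
    // spark.task.maxFailures attempts Spark aborts the stage and marks that job failed.
    docs.saveToEs("test-index/docs")               // placeholder index/type

    sc.stop()
  }
}
```

In short, a failed action surfaces as an exception on the driver; if the driver code catches or ignores that exception, the application can still end up marked as finished even though one of its jobs failed.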