Re: spark classloader question

2016-07-07 Thread Prajwal Tuladhar
You can try the experimental flags [1]
`spark.executor.userClassPathFirst`
and `spark.driver.userClassPathFirst`. But these can also potentially break
other things (for example, dependencies that the Spark master requires at
initialization being overridden by the Spark app), so you will need to verify.

[1] https://spark.apache.org/docs/latest/configuration.html
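
For reference, a minimal sketch of the two properties (in practice you would
normally pass them as `--conf` options to spark-submit so the driver JVM sees
them before it starts; the app name below is made up):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: the two experimental user-classpath-first properties.
    // The driver-side flag is documented as applying in cluster mode.
    // Verify the job still starts afterwards, because Spark's own
    // dependencies may now be shadowed by classes in your application jars.
    val conf = new SparkConf()
      .setAppName("user-classpath-first-sketch") // made-up name
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")
    val sc = new SparkContext(conf)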

On Thu, Jul 7, 2016 at 4:05 PM, Chen Song  wrote:

> Sorry to spam people who are not interested. Greatly appreciate it if
> anyone who is familiar with this can share some insights.
>
> On Wed, Jul 6, 2016 at 2:28 PM Chen Song  wrote:
>
>> Hi
>>
>> I ran into problems using the class loader in Spark. In my code (run within
>> an executor), I explicitly load classes using the context class loader, as below.
>>
>> Thread.currentThread().getContextClassLoader()
>>
>> The jar containing the classes to be loaded is added via the --jars
>> option in spark-shell/spark-submit.
>>
>> I always get a ClassNotFoundException. However, it seems to work if
>> I compile these classes into the main jar for the job (the jar containing the
>> main job class).
>>
>> I know Spark implements its own class loaders in a particular way. Is
>> there a way to work around this? In other words, what is the proper way to
>> programmatically load classes from other jars added via --jars in Spark?
>>
>>
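
For context, a minimal sketch of the loading pattern described in the quoted
mail; the class name is hypothetical:

    // Loading a class at runtime through the thread's context class loader,
    // as described above. "com.example.udf.MyFunction" is a made-up name.
    val loader = Thread.currentThread().getContextClassLoader
    val clazz = Class.forName("com.example.udf.MyFunction", true, loader)
    val instance = clazz.getDeclaredConstructor().newInstance()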


-- 
--
Cheers,
Praj


Re: Can I use log4j2.xml in my Apache Spark application

2016-06-22 Thread Prajwal Tuladhar
One way to integrate log4j2 would be to enable the flags
`spark.executor.userClassPathFirst` and `spark.driver.userClassPathFirst`
when submitting the application. This causes the application class loader
to be consulted first, so the log4j2 logging context gets initialized. But
this can also potentially break other things (for example, dependencies that
the Spark master requires at initialization being overridden by the Spark
app), so you will need to verify.

More info about those flags @
https://spark.apache.org/docs/latest/configuration.html
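
As a rough illustration (not a tested recipe): once log4j2 and your
log4j2.xml are picked up ahead of Spark's bundled logging, executor-side code
can log through the log4j2 API. The function, logger name, and RDD type here
are made up:

    import org.apache.logging.log4j.LogManager
    import org.apache.spark.rdd.RDD

    // Sketch only: log from worker (executor) code via log4j2.
    def writeWorkerLogs(rdd: RDD[String]): Unit = {
      rdd.foreachPartition { partition =>
        // Obtain the logger inside the closure so it is created on the
        // executor rather than serialized from the driver.
        val log = LogManager.getLogger("com.example.worker") // made-up name
        partition.foreach(record => log.info("processing record: " + record))
      }
    }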


On Wed, Jun 22, 2016 at 7:11 AM, Charan Adabala 
wrote:

> Hi,
>
> We are trying to use log4j2.xml instead of log4j.properties in an Apache
> Spark application. We integrated log4j2.xml, but the worker (executor) logs
> of the application are not being written, while the driver log is written
> without problems. Can anyone suggest how to integrate log4j2.xml in an Apache
> Spark application so that both worker and driver logs are written successfully?
>
> Thanks in advance..,
>
>


-- 
--
Cheers,
Praj


Re: Basic question. Access MongoDB data in Spark.

2016-06-13 Thread Prajwal Tuladhar
Maybe try opening an issue in their GitHub repo:
https://github.com/Stratio/Spark-MongoDB

On Mon, Jun 13, 2016 at 4:10 PM, Umair Janjua 
wrote:

> Anybody knows the stratio's mailing list? I cant seem to find it. Cheers
>
> On Mon, Jun 13, 2016 at 6:02 PM, Ted Yu  wrote:
>
>> Have you considered posting the question on stratio's mailing list ?
>>
>> You may get faster response there.
>>
>>
>> On Mon, Jun 13, 2016 at 8:09 AM, Umair Janjua 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I have this super basic problem which I cannot figure out. Can somebody
>>> give me a hint.
>>>
>>> http://stackoverflow.com/questions/37793214/spark-mongdb-data-using-java
>>>
>>> Cheers
>>>
>>
>>
>


-- 
--
Cheers,
Praj


Re: SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error

2016-05-04 Thread Prajwal Tuladhar
If you are running on a 64-bit JVM with less than 32 GB of heap, you might
want to enable -XX:+UseCompressedOops [1]. And if your dataframe is somehow
producing arrays with more than 2^31-1 elements, you might have to rethink
your options.

[1] https://spark.apache.org/docs/latest/tuning.html
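
A minimal sketch of passing the flag, mirroring the executor sizing from the
mail below (driver-side JVM options usually have to go on the spark-submit
command line, since the driver JVM is already running once SparkConf is read):

    import org.apache.spark.SparkConf

    // Sketch only: the poster's executor sizing plus the compressed-oops
    // flag suggested by the tuning guide.
    val conf = new SparkConf()
      .set("spark.executor.instances", "45")
      .set("spark.executor.memory", "15g")
      .set("spark.executor.cores", "4")
      .set("spark.executor.extraJavaOptions", "-XX:+UseCompressedOops")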

On Wed, May 4, 2016 at 9:44 PM, Bijay Kumar Pathak  wrote:

> Hi,
>
> I am reading a Parquet file of around 50+ GB, which has 4013 partitions and
> 240 columns. Below is my configuration:
>
> driver: 20 GB of memory with 4 cores
> executors: 45 executors, each with 15 GB of memory and 4 cores
>
> I tried to read the data both with the DataFrame reader and with the Hive
> context using Hive SQL, but in both cases it throws the error below with no
> further description.
>
> hive_context.sql("select * from test.base_table where date='{0}'".format(part_dt))
> sqlcontext.read.parquet("/path/to/partion/")
>
> #
> # java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 16953"...
>
>
> What could be wrong here? I think increasing memory alone will not help in
> this case, since it has hit the array size limit.
>
> Thanks,
> Bijay
>



-- 
--
Cheers,
Praj


spark job stage failures

2016-05-04 Thread Prajwal Tuladhar
Hi,

I was wondering how Spark handles stage/task failures for a job.

We are running a Spark job that batch-writes to Elasticsearch, and we are
seeing one or two stage failures due to the ES cluster getting overloaded
(expected, as we are testing with a single-node ES cluster). I was assuming
that when some of the batch writes to ES fail after a certain number of
retries (10), the whole job would be aborted, but we are seeing the Spark
job marked as finished even though a stage failed.
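
For reference, a sketch of the settings that usually decide whether a failed
write aborts the job. The elasticsearch-hadoop key is an assumption based on
its documentation, so double-check it against the connector version you run:

    import org.apache.spark.SparkConf

    // Sketch only: Spark fails a stage (and hence the job) once a task has
    // exhausted spark.task.maxFailures attempts; bulk-write retries inside
    // the connector are configured separately.
    val conf = new SparkConf()
      .set("spark.task.maxFailures", "4")       // Spark-level task attempts
      .set("es.batch.write.retry.count", "10")  // assumed es-hadoop setting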

How does Spark handle failures when a job or stage is marked as failed?

Thanks in advance.


-- 
--
Cheers,
Praj