Re: How to optimize group by query fired using hiveContext.sql?

2015-10-05 Thread Umesh Kacha
Hi, thanks. I usually see the following errors in the Spark logs, and I think they are why executors get lost. All of this happens because of a huge data shuffle, which I can't avoid. I don't know what to do; please guide me. 15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no

Re: How to optimize group by query fired using hiveContext.sql?

2015-10-04 Thread Alex Rovner
Can you at least copy paste the error(s) you are seeing when the job fails? Without the error message(s), it's hard to even suggest anything. *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Sat, Oct 3, 2015 at 9:50 AM, Umesh Kacha

How to optimize group by query fired using hiveContext.sql?

2015-10-03 Thread unk1102
Hi, I have a couple of Spark jobs that use a group by query fired from hiveContext.sql(). I know group by is evil, but I can't avoid it in my use case: I need to group by around 7-8 fields. Also, I am using df1.except(df2), which also seems to be a heavy operation and

Re: How to optimize group by query fired using hiveContext.sql?

2015-10-03 Thread Alex Rovner
This sounds like you need to increase YARN overhead settings with the "spark.yarn.executor.memoryOverhead" parameter. See http://spark.apache.org/docs/latest/running-on-yarn.html for more information on the setting. If that does not work for you, please provide the error messages and the command
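
As a rough illustration of the setting mentioned above, here is a minimal Python sketch, not taken from the thread. It assumes the default described in the Spark 1.4/1.5 YARN documentation (10% of executor memory with a 384 MB floor, value given in megabytes); older releases used a different factor, so `factor` is left as a parameter. The helper names `default_overhead_mb` and `overhead_conf_args` are hypothetical.

```python
def default_overhead_mb(executor_memory_mb, factor=0.10, floor_mb=384):
    """Estimate the default executor memory overhead in MB.

    Assumption: overhead = max(floor, factor * executor memory), as
    described in the Spark 1.4/1.5 running-on-YARN documentation.
    """
    return max(floor_mb, int(executor_memory_mb * factor))


def overhead_conf_args(overhead_mb):
    """Format an explicit overhead value as spark-submit --conf arguments."""
    if overhead_mb <= 0:
        raise ValueError("overhead must be a positive number of megabytes")
    return ["--conf", "spark.yarn.executor.memoryOverhead=%d" % overhead_mb]


if __name__ == "__main__":
    # Example: a 27 GB executor heap, as a starting point for tuning.
    print(default_overhead_mb(27 * 1024))
    print(" ".join(overhead_conf_args(4096)))
```

The value to pass is workload-dependent; the 4096 above is only a placeholder for experimentation, not a recommendation.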

Re: How to optimize group by query fired using hiveContext.sql?

2015-10-03 Thread Umesh Kacha
Hi Alex, thanks very much for the reply. Please read the following for more details about my problem: http://stackoverflow.com/questions/32317285/spark-executor-oom-issue-on-yarn Each of my containers has 8 cores and 30 GB max memory, so I am running in yarn-client mode with 40 executors, each with 27 GB and 2 cores. If
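
The numbers quoted above (a 30 GB container limit and 27 GB executors) can be checked with some simple arithmetic. The Python sketch below is an illustration under stated assumptions, not something from the thread: it treats the 30 GB figure as the largest container YARN will grant, assumes YARN sizes each executor container as heap plus memory overhead, and ignores rounding to the scheduler's allocation increment. The function names are hypothetical.

```python
def container_headroom_mb(container_mb, executor_memory_mb, overhead_mb):
    """Return the MB left in the container after heap plus overhead.

    A negative result means the request would not fit in the container.
    """
    return container_mb - (executor_memory_mb + overhead_mb)


def max_heap_mb(container_mb, overhead_mb):
    """Largest executor heap (MB) that still leaves room for the overhead."""
    return container_mb - overhead_mb


if __name__ == "__main__":
    container = 30 * 1024  # 30 GB container limit from the message above
    heap = 27 * 1024       # 27 GB executor memory from the message above

    # With an assumed ~10% overhead (2764 MB) the request fits, but only just.
    print(container_headroom_mb(container, heap, 2764))  # 308

    # Raising the overhead to 4 GB no longer fits alongside a 27 GB heap...
    print(container_headroom_mb(container, heap, 4096))  # -1024

    # ...so the heap would have to shrink to about 26 GB.
    print(max_heap_mb(container, 4096))                  # 26624
```

Under these assumptions, at most 3072 MB of overhead fits next to a 27 GB heap, so giving off-heap usage more room means lowering the executor memory rather than only raising the overhead.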

Re: How to optimize group by query fired using hiveContext.sql?

2015-10-03 Thread Alex Rovner
Can you send over your yarn logs along with the command you are using to submit your job? *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * * On Sat, Oct 3, 2015 at 9:07 AM, Umesh Kacha wrote: > Hi Alex thanks much for the reply.

Re: How to optimize group by query fired using hiveContext.sql?

2015-10-03 Thread Umesh Kacha
Hi, thanks. I can't share the YARN logs because of my company's privacy rules, but I have looked through them and found nothing except YARN killing the container because it exceeds the physical memory limit. I am using the following command line script. The above job launches around 1500