Re: the spark job is so slow - almost frozen

2016-07-21 Thread Gourav Sengupta
Andrew, you have pretty much consolidated my entire experience. Please give a presentation at a meetup on this, and send across the links :)

Regards,
Gourav

On Wed, Jul 20, 2016 at 4:35 AM, Andrew Ehrlich wrote:
> Try:
>
> - filtering down the data as soon as possible in

Re: the spark job is so slow - almost frozen

2016-07-20 Thread Zhiliang Zhu
Thanks a lot for your kind help.

On Wednesday, July 20, 2016 11:35 AM, Andrew Ehrlich wrote:
Try:
- filtering down the data as soon as possible in the job, dropping columns you don’t need.
- processing fewer partitions of the hive tables at a time
- caching

Re: the spark job is so slow - almost frozen

2016-07-19 Thread Andrew Ehrlich
Try:
- filtering down the data as soon as possible in the job, dropping columns you don’t need.
- processing fewer partitions of the hive tables at a time
- caching frequently accessed data, for example dimension tables, lookup tables, or other datasets that are repeatedly accessed
- using the
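The first three suggestions above can be sketched roughly as follows. This is only an illustration, not the code from the original job: the function names, table names, column names and the partition filter are all hypothetical, and `ctx` is assumed to be a HiveContext (or a SparkSession) whose `table()` call returns a DataFrame.

```python
def prune_early(df, columns, partition_filter):
    """Keep only the needed partitions and columns before any join or aggregation.

    The filter is applied first because the partition column it refers to
    may not be among the columns kept by the select.
    """
    return df.where(partition_filter).select(*columns)


def build_job(ctx, fact_table, dim_table, fact_cols, dim_cols, join_key, partition_filter):
    """Join one pruned slice of a large table to a cached lookup table.

    All table and column names passed in are placeholders chosen by the caller.
    """
    # Large table: read fewer partitions and fewer columns.
    facts = prune_early(ctx.table(fact_table), fact_cols, partition_filter)
    # Small, repeatedly used lookup table: trim it, then mark it for caching.
    dims = ctx.table(dim_table).select(*dim_cols).cache()
    # Join on a column name shared by both sides.
    return facts.join(dims, join_key)
```

A call might look like `build_job(hive_ctx, "sales", "stores", ["store_id", "amount"], ["store_id", "region"], "store_id", "dt = '2016-07-18'")`, where every name is made up for the example. Processing one partition filter per call is one way to work through the tables a slice at a time.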

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
Thanks a lot for your reply. In fact, we tried running the SQL on Kettle, Hive, and Spark with Hive (via HiveContext) respectively, and the job seems to freeze and never finishes running. Across the 6 tables, we need to read different columns from different tables for specific information, then do

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Chanh Le
Hi,
What about the network (bandwidth) between Hive and Spark? Did it run in Hive before you moved to Spark? Because it's complex, you can use something like the EXPLAIN command to show what is going on.

> On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu wrote:
>
> the
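One way to follow the EXPLAIN suggestion is sketched below. It is an assumption-laden example rather than anything from the thread: the helper names are invented, `ctx` is assumed to be a HiveContext (or SparkSession), and the query text is whatever the job already runs.

```python
def explain_statement(query, extended=True):
    """Prefix a SQL query with EXPLAIN so the engine returns its plan instead of running it."""
    # Drop surrounding whitespace and a trailing semicolon, if any.
    body = query.strip().rstrip(";")
    keyword = "EXPLAIN EXTENDED" if extended else "EXPLAIN"
    return keyword + " " + body


def show_plan(ctx, query):
    """Print the logical and physical plans Spark builds for the query."""
    # DataFrame.explain(True) prints the plans; building the DataFrame does not run the job.
    ctx.sql(query).explain(True)
```

The string from `explain_statement` can be pasted into the Hive CLI, while `show_plan` asks Spark for its own plan; comparing the two may show where the engines differ, for example in join order or in whether partitions are pruned.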

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
The SQL logic in the program is very complex, so I will not describe the detailed code here.

On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu wrote:
Hi All,
Here we have one application, it needs to extract different columns from 6 hive tables, and

the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
Hi All,

Here we have one application that needs to extract different columns from 6 Hive tables and then do some simple calculations. There are around 100,000 rows in each table. Finally, it needs to output another table or file (with a consistent column format). However, after lots of