control the number of reducers for groupby in data frame

2015-08-04 Thread Fang, Mike
Hi,

Does anyone know how I can control the number of reducers when we do an operation 
such as groupBy on a DataFrame?
I can set spark.sql.shuffle.partitions in SQL, but I'm not sure how to do it with the 
df.groupBy("XX") API.

Thanks,
Mike
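
A minimal sketch of the usual answer, assuming a Spark 1.3-era SQLContext named `sqlContext` and a DataFrame `df` with a column "XX" (the names are illustrative). The same spark.sql.shuffle.partitions setting that applies to SQL also governs the post-shuffle partition count (the "reducers") for the DataFrame groupBy API:

```scala
import org.apache.spark.sql.SQLContext

// Assumed: an existing SQLContext `sqlContext` and DataFrame `df`.
// groupBy on a DataFrame triggers a shuffle; the number of post-shuffle
// partitions is taken from spark.sql.shuffle.partitions, so setting it
// before the aggregation affects the DataFrame API too, not only SQL.
sqlContext.setConf("spark.sql.shuffle.partitions", "50")
val counts = df.groupBy("XX").count()

// If only the output partition count matters, the result can also be
// narrowed afterwards without another full shuffle:
val coalesced = counts.coalesce(10)
```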


questions on the "waiting batches" and "scheduling delay" in Streaming UI

2015-06-16 Thread Fang, Mike
Hi,

I have a Spark Streaming program that has been running for ~25 hours. When I check the 
Streaming UI tab, I see that "Waiting batches" is 144 but the "scheduling 
delay" is 0, which confuses me.
If "waiting batches" is 144, doesn't that mean many batches are waiting in the 
queue to be processed? If so, the scheduling delay should be high 
rather than 0. Am I missing anything?

Thanks,
Mike
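

One way to cross-check what the UI reports is to log the per-batch metrics directly. This is a hedged sketch using Spark Streaming's StreamingListener API (attach it to an existing StreamingContext with ssc.addStreamingListener; the class name is illustrative). Scheduling delay is the time a batch waits in the queue between submission and the start of processing, so a long queue together with a zero delay would indeed be surprising:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Illustrative listener: prints the scheduling and processing delay
// for each completed batch, which should match the Streaming UI tab.
class DelayLogger extends StreamingListener {
  override def onBatchCompleted(completed: StreamingListenerBatchCompleted): Unit = {
    val info = completed.batchInfo
    println(s"batch ${info.batchTime}: " +
      s"schedulingDelay=${info.schedulingDelay.getOrElse(-1L)} ms, " +
      s"processingDelay=${info.processingDelay.getOrElse(-1L)} ms")
  }
}
```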



Re: understanding on the "waiting batches" and "scheduling delay" in Streaming UI

2015-06-22 Thread Fang, Mike
Hi Das,

Thanks for your reply. Somehow I missed it earlier.
I am using Spark 1.3, and the data source is Kafka.
I'm not sure why the delay is 0 either. I'll run against 1.4 and share a screenshot.

Thanks,
Mike

From: Akhil Das <ak...@sigmoidanalytics.com>
Date: Thursday, June 18, 2015 at 6:05 PM
To: Mike Fang <chyfan...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: understanding on the "waiting batches" and "scheduling delay" in 
Streaming UI

Which version of Spark, and what is your data source? For some reason, your 
processing delay is exceeding the batch duration, and it's strange that you are 
not seeing any scheduling delay.

Thanks
Best Regards

On Thu, Jun 18, 2015 at 7:29 AM, Mike Fang <chyfan...@gmail.com> wrote:
Hi,

I have a Spark Streaming program that has been running for ~25 hours. When I check the 
Streaming UI tab, I see that "Waiting batches" is 144 but the "scheduling 
delay" is 0, which confuses me.
If "waiting batches" is 144, doesn't that mean many batches are waiting in the 
queue to be processed? If so, the scheduling delay should be high 
rather than 0. Am I missing anything?

Thanks,
Mike