control the number of reducers for groupby in data frame

2015-08-04 Thread Fang, Mike
Hi, Does anyone know how I could control the number of reducers when we do an operation such as groupBy for a data frame? I can set spark.sql.shuffle.partitions in SQL, but I am not sure how to do it in the df.groupBy(XX) API. Thanks, Mike
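
For context on the question above: DataFrame aggregations go through the same shuffle as SQL queries, so the spark.sql.shuffle.partitions setting should also apply to df.groupBy. A minimal sketch, assuming a Spark 1.x-style SQLContext running in local mode; the app name, the sample data, the column names and the value 8 are all made up for illustration, and newer Spark versions with adaptive execution may coalesce the resulting partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object GroupByPartitions {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("groupby-partitions").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Set the number of post-shuffle partitions ("reducers") used by
    // DataFrame aggregations and joins. 8 is an arbitrary example value.
    sqlContext.setConf("spark.sql.shuffle.partitions", "8")

    // Hypothetical sample data with illustrative column names.
    val df = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3))).toDF("key", "value")

    // groupBy triggers a shuffle that uses the setting above.
    val grouped = df.groupBy("key").sum("value")

    // Inspect how many partitions the aggregated result has.
    println(grouped.rdd.partitions.length)

    sc.stop()
  }
}
```

The setting is read when the query is planned, so it needs to be set before the groupBy is executed; it affects every later shuffle in the same SQLContext, not just this one aggregation.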

Re: understanding on the waiting batches and scheduling delay in Streaming UI

2015-06-22 Thread Fang, Mike
Hi Das, Thanks for your reply. Somehow I missed it. I am using Spark 1.3. The data source is Kafka. Yeah, I am not sure why the delay is 0. I'll run against 1.4 and give a screenshot. Thanks, Mike From: Akhil Das ak...@sigmoidanalytics.com Date: Thursday, June

questions on the waiting batches and scheduling delay in Streaming UI

2015-06-16 Thread Fang, Mike
Hi, I have a Spark Streaming program that has been running for ~25 hrs. When I checked the Streaming UI tab, I found that Waiting batches is 144, but the scheduling delay is 0. I am a bit confused. If Waiting batches is 144, does that mean many batches are waiting in the queue to be processed? If this is the