Re: Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
I'm using the Kafka direct stream API, but I can look into extending it to have this behaviour. Thanks! On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" wrote: > Are you using a custom input dstream? If so, you can make the `compute` > method return None to skip a

Re: Skip empty batches - spark streaming

2016-02-11 Thread Shixiong(Ryan) Zhu
Are you using a custom input DStream? If so, you can make the `compute` method return None to skip a batch. On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu wrote: > I was wondering if there is any way to skip batches with zero events > when streaming? > By skip I

Re: Skip empty batches - spark streaming

2016-02-11 Thread Shixiong(Ryan) Zhu
Yeah, DirectKafkaInputDStream always returns an RDD, even if it's empty. Feel free to send a PR to improve it. On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu wrote: > I'm using the Kafka direct stream API, but I can look into extending > it to have this behaviour > >

Re: Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
Yes, and as far as I recall the RDD also has (empty) partitions, which breaks the isEmpty call if the RDD has been transformed further down the line. I will have a look tomorrow at the office and see if I can collaborate. On 11 Feb 2016 9:14 p.m., "Shixiong(Ryan) Zhu" wrote: > Yeah,

Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
I was wondering if there is any way to skip batches with zero events when streaming. By skip I mean preventing the empty RDD from being created at all.

Re: Skip empty batches - spark streaming

2016-02-11 Thread Cody Koeninger
Please don't change the behavior of DirectKafkaInputDStream. Returning an empty RDD is (imho) the semantically correct thing to do, and some existing jobs depend on that behavior. If it's really an issue for you, you can either override DirectKafkaInputDStream, or just check isEmpty as the first
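The isEmpty check suggested above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: `skip_empty` and `FakeRDD` are hypothetical names, and `FakeRDD` is only a plain-Python stand-in so the snippet runs without a Spark installation. The one thing assumed about a real RDD is that it exposes `isEmpty()`, as PySpark RDDs do.

```python
from typing import Callable


def skip_empty(handler: Callable) -> Callable:
    """Wrap a per-batch handler so it only runs for non-empty batches.

    Hypothetical helper: the empty batch is still created upstream,
    it is simply ignored here.
    """
    def guarded(rdd):
        # Check isEmpty() first, on the RDD exactly as the stream delivers it,
        # before any transformation is applied.
        if rdd.isEmpty():
            return
        handler(rdd)
    return guarded


class FakeRDD:
    """Hypothetical stand-in for an RDD; not part of Spark."""

    def __init__(self, items):
        self._items = list(items)

    def isEmpty(self):
        return not self._items

    def collect(self):
        return list(self._items)


# Simulate three batches, two of them empty.
seen = []
on_batch = skip_empty(lambda rdd: seen.append(rdd.collect()))
for batch in ([], ["a", "b"], []):
    on_batch(FakeRDD(batch))

print(seen)  # [['a', 'b']]
```

With a real PySpark stream the wrapped handler would presumably be passed as `stream.foreachRDD(skip_empty(handler))`. Note that this does not stop the empty RDD from being created; it only avoids doing any work on it.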

Re: Skip empty batches - spark streaming

2016-02-11 Thread Andy Davidson

Re: Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
Thanks for clarifying, Cody. I will extend the current behaviour for my use case. If there is anything worth sharing, I'll run it through the list. Cheers. On 11 Feb 2016 9:47 p.m., "Cody Koeninger" wrote: > Please don't change the behavior of DirectKafkaInputDStream. >