Thanks for clarifying, Cody. I will extend the current behaviour for my use case. If there is anything worth sharing, I'll run it through the list.
Cheers

On 11 Feb 2016 9:47 p.m., "Cody Koeninger" <c...@koeninger.org> wrote:

> Please don't change the behavior of DirectKafkaInputDStream. Returning an
> empty rdd is (imho) the semantically correct thing to do, and some existing
> jobs depend on that behavior.
>
> If it's really an issue for you, you can either override
> DirectKafkaInputDStream, or just check isEmpty as the first thing you do
> with the rdd (before any transformations).
>
> In any recent version of Spark, isEmpty on a KafkaRDD is a driver-side-only
> operation that is basically free.
>
> On Thu, Feb 11, 2016 at 3:19 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>
>> Yes, and as far as I recall it also has (empty) partitions, which screws
>> up the isEmpty call if the rdd has been transformed down the line. I will
>> have a look tomorrow at the office and see if I can collaborate.
>>
>> On 11 Feb 2016 9:14 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com> wrote:
>>
>>> Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty.
>>> Feel free to send a PR to improve it.
>>>
>>> On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>>>
>>>> I'm using the Kafka direct stream api, but I can have a look at
>>>> extending it to have this behaviour.
>>>>
>>>> Thanks!
>>>>
>>>> On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com> wrote:
>>>>
>>>>> Are you using a custom input dstream? If so, you can make the
>>>>> `compute` method return None to skip a batch.
>>>>>
>>>>> On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>>>>>
>>>>>> I was wondering: is there any way to skip batches with zero events
>>>>>> when streaming? By skip I mean avoid the empty rdd from being
>>>>>> created at all.
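For the archives, Cody's recommendation in the thread above (check isEmpty before any transformations) might look roughly like the sketch below. It assumes a Spark 1.x direct stream setup; `kafkaParams`, `topics`, and `processBatch` are placeholders that would be defined elsewhere in a real job.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch only: ssc, kafkaParams, topics and processBatch are assumed
// to be defined by the surrounding application.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  // Per Cody's note: on recent Spark versions, isEmpty on a KafkaRDD is
  // resolved on the driver from the offset ranges, so this check is
  // essentially free. Call it BEFORE any transformations.
  if (!rdd.isEmpty()) {
    processBatch(rdd) // placeholder for the actual per-batch work
  }
}
```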
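Shixiong's earlier suggestion (a custom input dstream whose `compute` returns None) could be sketched as below. This is illustrative only, not a change to DirectKafkaInputDStream itself; `fetchEvents` is a hypothetical helper standing in for however the batch's data is pulled.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

// Sketch of a custom input DStream that skips empty batches entirely:
// returning None from compute means no RDD (and no job) for that batch.
class SkippingInputDStream(ssc: StreamingContext)
  extends InputDStream[String](ssc) {

  override def start(): Unit = {}
  override def stop(): Unit = {}

  override def compute(validTime: Time): Option[RDD[String]] = {
    // Hypothetical helper: fetch whatever arrived for this batch interval.
    val events: Seq[String] = fetchEvents(validTime)
    if (events.isEmpty) None
    else Some(ssc.sparkContext.parallelize(events))
  }

  private def fetchEvents(validTime: Time): Seq[String] = Seq.empty // stub
}
```

Note the trade-off raised in the thread: downstream code that relies on getting an RDD every batch interval (as some existing jobs do) would break under this behaviour, which is why the isEmpty check is the less invasive option.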