Thanks for clarifying, Cody. I will extend the current behaviour for my use
case. If there is anything worth sharing, I'll run it through the list.

Cheers
On 11 Feb 2016 9:47 p.m., "Cody Koeninger" <c...@koeninger.org> wrote:

> Please don't change the behavior of DirectKafkaInputDStream.
> Returning an empty rdd is (imho) the semantically correct thing to do, and
> some existing jobs depend on that behavior.
>
> If it's really an issue for you, you can either override
> DirectKafkaInputDStream, or just check isEmpty as the first thing you do
> with the rdd (before any transformations).
>
> In any recent version of spark, isEmpty on a KafkaRDD is a driver-side
> only operation that is basically free.
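The isEmpty guard described above might be sketched like this (a minimal illustration only; the broker address, topic name, and output path are placeholders, and this assumes the Spark 1.x direct Kafka API):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("skip-empty-batches")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder connection details; adjust for your cluster.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val topics = Set("events")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  // Check isEmpty on the raw KafkaRDD, *before* any transformations:
  // there it only inspects offset ranges on the driver, so it is
  // essentially free. After transformations the check may launch a job.
  if (!rdd.isEmpty()) {
    rdd.map(_._2).saveAsTextFile("/tmp/out") // placeholder action
  }
}

ssc.start()
ssc.awaitTermination()
```

The key point from the thread is the placement of the check: guard on the RDD the stream hands you, not on something downstream of a transformation.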
>
>
> On Thu, Feb 11, 2016 at 3:19 PM, Sebastian Piu <sebastian....@gmail.com>
> wrote:
>
>> Yes, and as far as I recall it also has partitions (empty), which screws
>> up the isEmpty call if the rdd has been transformed down the line. I will
>> have a look tomorrow at the office and see if I can contribute.
>> On 11 Feb 2016 9:14 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
>> wrote:
>>
>>> Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty.
>>> Feel free to send a PR to improve it.
>>>
>>> On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu <sebastian....@gmail.com>
>>> wrote:
>>>
>>>> I'm using the Kafka direct stream api but I can have a look on
>>>> extending it to have this behaviour
>>>>
>>>> Thanks!
>>>> On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
>>>> wrote:
>>>>
>>>>> Are you using a custom input dstream? If so, you can make the
>>>>> `compute` method return None to skip a batch.
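A custom input dstream along those lines might look like the following sketch (the class name and the `fetchBatch` callback are made up for illustration; any real source would poll Kafka or similar here):

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

// Hypothetical custom InputDStream: `fetchBatch` stands in for whatever
// the source returns for a batch interval. Returning None from compute
// means no RDD (and no job) is generated for that interval.
class SkippingInputDStream[T: ClassTag](
    ssc: StreamingContext,
    fetchBatch: Time => Seq[T]) extends InputDStream[T](ssc) {

  override def start(): Unit = {}
  override def stop(): Unit = {}

  override def compute(validTime: Time): Option[RDD[T]] = {
    val data = fetchBatch(validTime)
    if (data.isEmpty) None // skip the batch entirely
    else Some(ssc.sparkContext.parallelize(data))
  }
}
```

Note that, as Cody points out elsewhere in this thread, DirectKafkaInputDStream itself deliberately returns an empty RDD rather than None, so this pattern belongs in your own dstream rather than in a change to the stock one.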
>>>>>
>>>>> On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu <
>>>>> sebastian....@gmail.com> wrote:
>>>>>
>>>>>> I was wondering if there is any way to skip batches with zero
>>>>>> events when streaming?
>>>>>> By skip, I mean avoiding the empty rdd being created at all.
>>>>>>
>>>>>
>>>>>
>>>
>
