I'm using the Kafka direct stream API, but I can look into extending it
to have this behaviour.
Thanks!
On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu"
wrote:
> Are you using a custom input dstream? If so, you can make the `compute`
> method return None to skip a batch.
Are you using a custom input dstream? If so, you can make the `compute`
method return None to skip a batch.
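
For illustration, a minimal sketch of what such a custom input dstream might look like. The class name `SkippingQueueStream` and the in-memory queue it reads from are hypothetical stand-ins for a real source; the only point is that `compute` returns None when a batch has no events, so no RDD is produced for that interval. This assumes the `InputDStream` API as found in Spark 1.x/2.x and is not taken from the Kafka integration.

```scala
import scala.collection.mutable

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

// Hypothetical input dstream: each queue entry is the set of events for one batch.
class SkippingQueueStream(
    streamingContext: StreamingContext,
    queue: mutable.Queue[Seq[String]])
  extends InputDStream[String](streamingContext) {

  // Nothing to set up or tear down for an in-memory source.
  override def start(): Unit = {}
  override def stop(): Unit = {}

  override def compute(validTime: Time): Option[RDD[String]] = {
    val events: Seq[String] = queue.synchronized {
      if (queue.nonEmpty) queue.dequeue() else Seq.empty
    }
    // Returning None skips the batch: no RDD is created for this interval.
    if (events.isEmpty) None
    else Some(context.sparkContext.parallelize(events))
  }
}
```

Note that downstream operations never see the skipped interval, so anything that relies on one RDD per batch would behave differently with a stream like this.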
On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu
wrote:
> I was wondering if there is any way to skip batches with zero events
> when streaming?
> By skip I mean avoid the empty rdd from being created at all?
Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty. Feel
free to send a PR to improve it.
On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu
wrote:
> I'm using the Kafka direct stream API, but I can look into extending
> it to have this behaviour
>
>
Yes, and as far as I recall it also has (empty) partitions, which screws up
the isEmpty call if the RDD has been transformed down the line. I will have
a look tomorrow at the office and see if I can contribute.
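
One way to sidestep the transformed-RDD problem, sketched under assumptions: call isEmpty on the RDD exactly as the stream hands it over, before any transformation, and only then run the rest of the job. The object `SkipEmpty` and its two helpers are hypothetical names for illustration; `RDD.isEmpty()` and `DStream.foreachRDD` are the standard Spark calls (isEmpty is available from Spark 1.3 on). This does not stop the empty RDD from being created, it only avoids doing work on it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helpers, not part of Spark.
object SkipEmpty {

  // Runs `body` only when the RDD has at least one element.
  // Returns true if `body` was run, false if the batch was skipped.
  def ifNonEmpty[T](rdd: RDD[T])(body: RDD[T] => Unit): Boolean =
    if (rdd.isEmpty()) false
    else { body(rdd); true }

  // Applies the same check to every batch of a stream. The check happens on
  // the untransformed RDD; transformations belong inside `body`.
  def foreachNonEmpty[T](stream: DStream[T])(body: RDD[T] => Unit): Unit =
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) body(rdd)
    }
}
```

Usage would be along the lines of `SkipEmpty.foreachNonEmpty(stream) { rdd => rdd.map(...).saveAsTextFile(...) }`, where `stream` is whatever the direct stream API returned.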
On 11 Feb 2016 9:14 p.m., "Shixiong(Ryan) Zhu"
wrote:
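> Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty. Feel
> free to send a PR to improve it.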
I was wondering if there is any way to skip batches with zero events
when streaming?
By skip I mean prevent the empty RDD from being created at all.
Please don't change the behavior of DirectKafkaInputDStream.
Returning an empty rdd is (imho) the semantically correct thing to do, and
some existing jobs depend on that behavior.
If it's really an issue for you, you can either override
DirectKafkaInputDStream, or just check isEmpty as the first
n Piu <sebastian@gmail.com>
Reply-To: <sebastian@hotmail.com>
Date: Thursday, February 11, 2016 at 1:19 PM
To: "Shixiong (Ryan) Zhu" <shixi...@databricks.com>
Cc: Sebastian Piu <sebastian@hotmail.com>, "user @spark"
<user@spark.apache.or
Thanks for clarifying, Cody. I will extend the current behaviour for my use
case. If there is anything worth sharing, I'll run it past the list.
Cheers
On 11 Feb 2016 9:47 p.m., "Cody Koeninger" wrote:
> Please don't change the behavior of DirectKafkaInputDStream.
>