You can always call rdd.isEmpty().

Andy

    private static void save(JavaDStream<String> jsonRdd, String outputURI) {
        jsonRdd.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(JavaRDD<String> rdd, Time time) throws Exception {
                if (!rdd.isEmpty()) {
                    String dirPath = outputURI + "-" + time.milliseconds();
                    rdd.saveAsTextFile(dirPath);
                }
            }
        });
    }

From: Sebastian Piu <sebastian....@gmail.com>
Reply-To: <sebastian....@hotmail.com>
Date: Thursday, February 11, 2016 at 1:19 PM
To: "Shixiong (Ryan) Zhu" <shixi...@databricks.com>
Cc: Sebastian Piu <sebastian....@hotmail.com>, "user @spark" <user@spark.apache.org>
Subject: Re: Skip empty batches - spark streaming

> Yes, and as far as I recall it also has (empty) partitions, which screws up
> the isEmpty call if the RDD has been transformed down the line. I will have
> a look tomorrow at the office and see if I can collaborate.
>
> On 11 Feb 2016 9:14 p.m., "Shixiong (Ryan) Zhu" <shixi...@databricks.com> wrote:
>> Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty.
>> Feel free to send a PR to improve it.
>>
>> On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>>>
>>> I'm using the Kafka direct stream API, but I can have a look at extending
>>> it to have this behaviour.
>>>
>>> Thanks!
>>>
>>> On 11 Feb 2016 9:07 p.m., "Shixiong (Ryan) Zhu" <shixi...@databricks.com> wrote:
>>>> Are you using a custom input dstream? If so, you can make the `compute`
>>>> method return None to skip a batch.
>>>>
>>>> On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>>>>>
>>>>> I was wondering: is there any way to skip batches with zero events when
>>>>> streaming? By skip I mean avoid the empty RDD from being created at all?
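To make the "compute returns None" suggestion from the thread concrete without pulling in the Spark classpath, here is a minimal, self-contained Java sketch of the pattern. The class and method names below (ToyDStream, compute) are hypothetical stand-ins for Spark's InputDStream API, not the real signatures; the point is only that returning an empty Option for an empty interval means the scheduler never materialises an empty batch at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SkipEmptyBatches {
    // Hypothetical stand-in for a custom input dstream: compute() returns
    // Optional.empty() when the interval produced no events, mirroring how a
    // Spark DStream's compute(validTime) can return None to skip a batch.
    static class ToyDStream {
        private final List<List<String>> intervals;
        private int cursor = 0;

        ToyDStream(List<List<String>> intervals) {
            this.intervals = intervals;
        }

        Optional<List<String>> compute() {
            if (cursor >= intervals.size()) {
                return Optional.empty();
            }
            List<String> batch = intervals.get(cursor++);
            // Skip the batch entirely instead of handing back an empty RDD.
            return batch.isEmpty() ? Optional.empty() : Optional.of(batch);
        }
    }

    public static void main(String[] args) {
        List<List<String>> input = new ArrayList<>();
        input.add(List.of("a", "b"));
        input.add(List.of());          // empty interval: should be skipped
        input.add(List.of("c"));

        ToyDStream stream = new ToyDStream(input);
        int jobsRun = 0;
        for (int i = 0; i < input.size(); i++) {
            Optional<List<String>> batch = stream.compute();
            // The driver only runs a job when a batch actually exists.
            if (batch.isPresent()) {
                jobsRun++;
                System.out.println("processing batch of " + batch.get().size());
            }
        }
        System.out.println("jobs run: " + jobsRun);
    }
}
```

With three intervals, one of them empty, only two jobs run, which is exactly the behaviour asked for at the start of the thread.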