You can always call rdd.isEmpty().

Andy

    private static void save(JavaDStream<String> jsonRdd, String outputURI) {
        jsonRdd.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(JavaRDD<String> rdd, Time time) throws Exception {
                if (!rdd.isEmpty()) {
                    String dirPath = outputURI + "-" + time.milliseconds();
                    rdd.saveAsTextFile(dirPath);
                }
            }
        });
    }

From: Sebastian Piu <sebastian....@gmail.com>
Reply-To: <sebastian....@hotmail.com>
Date: Thursday, February 11, 2016 at 1:19 PM
To: "Shixiong (Ryan) Zhu" <shixi...@databricks.com>
Cc: Sebastian Piu <sebastian....@hotmail.com>, "user @spark" <user@spark.apache.org>
Subject: Re: Skip empty batches - spark streaming

> Yes, and as far as I recall it also has (empty) partitions, which screws up
> the isEmpty call if the RDD has been transformed down the line. I will have
> a look tomorrow at the office and see if I can collaborate.
>
> On 11 Feb 2016 9:14 p.m., "Shixiong (Ryan) Zhu" <shixi...@databricks.com> wrote:
>> Yeah, DirectKafkaInputDStream always returns an RDD even if it's empty.
>> Feel free to send a PR to improve it.
>>
>> On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>>>
>>> I'm using the Kafka direct stream API, but I can have a look at extending
>>> it to have this behaviour.
>>>
>>> Thanks!
>>>
>>> On 11 Feb 2016 9:07 p.m., "Shixiong (Ryan) Zhu" <shixi...@databricks.com> wrote:
>>>> Are you using a custom input dstream? If so, you can make the `compute`
>>>> method return None to skip a batch.
>>>>
>>>> On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>>>>>
>>>>> I was wondering: is there any way to skip batches with zero events when
>>>>> streaming? By skip I mean avoid the empty RDD from being created at all?
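To make the "compute returns None" suggestion from the thread concrete without pulling in the Spark classpath, here is a minimal, self-contained Java sketch of the pattern. The class and method names below (ToyDStream, compute) are hypothetical stand-ins for Spark's InputDStream API, not the real signatures; the point is only that returning an empty Option for an empty interval means the scheduler never materialises an empty batch at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SkipEmptyBatches {
    // Hypothetical stand-in for a custom input dstream: compute() returns
    // Optional.empty() when the interval produced no events, mirroring how a
    // Spark DStream's compute(validTime) can return None to skip a batch.
    static class ToyDStream {
        private final List<List<String>> intervals;
        private int cursor = 0;

        ToyDStream(List<List<String>> intervals) {
            this.intervals = intervals;
        }

        Optional<List<String>> compute() {
            if (cursor >= intervals.size()) {
                return Optional.empty();
            }
            List<String> batch = intervals.get(cursor++);
            // Skip the batch entirely instead of handing back an empty RDD.
            return batch.isEmpty() ? Optional.empty() : Optional.of(batch);
        }
    }

    public static void main(String[] args) {
        List<List<String>> input = new ArrayList<>();
        input.add(List.of("a", "b"));
        input.add(List.of());          // empty interval: should be skipped
        input.add(List.of("c"));

        ToyDStream stream = new ToyDStream(input);
        int jobsRun = 0;
        for (int i = 0; i < input.size(); i++) {
            Optional<List<String>> batch = stream.compute();
            // The driver only runs a job when a batch actually exists.
            if (batch.isPresent()) {
                jobsRun++;
                System.out.println("processing batch of " + batch.get().size());
            }
        }
        System.out.println("jobs run: " + jobsRun);
    }
}
```

With three intervals, one of them empty, only two jobs run, which is exactly the behaviour asked for at the start of the thread.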