Hi,

I don't think there is an NPE issue when using DStream.count(), even when no data is fed into Spark Streaming. I tested with Kafka in my local setup, and it works both with and without data being consumed.

Actually, you can see the details in ReceiverInputDStream: even when there is no data in a batch interval, it still generates an empty BlockRDD, so the map() and transform() inside the count() operator will never hit an NPE. The problem most likely lies in your custom InputDStream; you should make sure it returns an empty RDD even when no data has arrived, along the lines of the sketch below.
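For example, a minimal sketch of what compute() could look like (EmptySafeInputDStream and fetchBatch are hypothetical names, not part of your code or Spark's API):

import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

// Hypothetical custom input stream; fetchBatch() stands in for whatever
// source-specific read logic your real stream uses.
class EmptySafeInputDStream[T: ClassTag](_ssc: StreamingContext)
    extends InputDStream[T](_ssc) {

  override def start(): Unit = {}  // connect to your source here
  override def stop(): Unit = {}   // release resources here

  override def compute(validTime: Time): Option[RDD[T]] = {
    val records: Seq[T] = fetchBatch(validTime)
    if (records.isEmpty) {
      // Key point: return an empty RDD instead of null, so the
      // map()/transform() inside count() always have an RDD to work on.
      Some(context.sparkContext.makeRDD(Seq.empty[T]))
    } else {
      Some(context.sparkContext.makeRDD(records))
    }
  }

  // Hypothetical placeholder: replace with your source-specific logic.
  private def fetchBatch(time: Time): Seq[T] = Seq.empty
}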
Thanks,
Jerry

-----Original Message-----
From: anoldbrain [mailto:anoldbr...@gmail.com]
Sent: Wednesday, August 20, 2014 4:13 PM
To: u...@spark.incubator.apache.org
Subject: Re: NullPointerException from '.count.foreachRDD'

Looking at the source code of DStream.scala:

> /**
>  * Return a new DStream in which each RDD has a single element
>  * generated by counting each RDD of this DStream.
>  */
> def count(): DStream[Long] = {
>   this.map(_ => (null, 1L))
>       .transform(_.union(context.sparkContext.makeRDD(Seq((null, 0L)), 1)))
>       .reduceByKey(_ + _)
>       .map(_._2)
> }

transform is the line throwing the NullPointerException. Can anyone give some hints as to what would cause "_" to be null (it is indeed null)? This only happens when there is no data to process. When there is data, no NullPointerException is thrown and all the processing/counting proceeds correctly.

I am using my custom InputDStream. Could it be that this is the source of the problem?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-from-count-foreachRDD-tp2066p12461.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org