Re: Spark Streaming for Each RDD - Exception on Empty
I am using Spark 1.3.1, so I don't have the 1.4.0 isEmpty(). I guess I am curious about the right approach here. As I said in my original post, perhaps this isn't "bad", but the exceptions bother me at a programmer level... is that wrong? :)
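For now, on 1.3.1, I was thinking of wrapping the call and testing with take(1) first. Just a sketch; parseIfNonEmpty is a made-up wrapper around my existing parseRDD, and I am not sure what take(1) costs per empty batch:

    def parseIfNonEmpty(rdd):
        # take(1) returns an empty list when the batch has no records,
        # so we only fan out to foreachPartition when there is data
        if rdd.take(1):
            rdd.foreachPartition(parseRDD)

    mydstream.foreachRDD(parseIfNonEmpty)

On Fri, Jun 5, 2015 at 11:07 AM, Ted Yu wrote: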
Re: Spark Streaming for Each RDD - Exception on Empty
John:

Which Spark release are you using? As of 1.4.0, RDD has this method:

    def isEmpty(): Boolean = withScope {

FYI
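Once on 1.4.0, the guard would look something like this. A sketch against the sendPartition example from the streaming guide; sendIfNonEmpty is just an illustrative name for the wrapper:

    def sendIfNonEmpty(rdd):
        # isEmpty() per the 1.4.0 RDD API; skips empty batches entirely
        if not rdd.isEmpty():
            rdd.foreachPartition(sendPartition)

    dstream.foreachRDD(sendIfNonEmpty)

On Fri, Jun 5, 2015 at 9:01 AM, Evo Eftimov wrote: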
RE: Spark Streaming for Each RDD - Exception on Empty
The foreachPartition callback is provided with an Iterator by the Spark framework, so you can check iterator.hasNext() before doing any work (in Python, peek the iterator; see the sketch after the quoted message below).

Also check whether this is not some sort of Python Spark API bug. Python seems to be the foster child here; Scala and Java are the darlings.

From: John Omernik [mailto:j...@omernik.com]
Sent: Friday, June 5, 2015 4:08 PM
To: user
Subject: Spark Streaming for Each RDD - Exception on Empty

Is there a Pythonic/Sparkonic way to test for an empty RDD before using foreachRDD? Basically I am using the Python example from https://spark.apache.org/docs/latest/streaming-programming-guide.html to "put records somewhere". When I have data it works fine; when I don't, I get an exception. I am not sure about the performance implications of just throwing an exception every time there is no data, but can I just test before sending?

I did see one post mentioning using take(1) on the stream to test for data, but I am not sure where that goes in this example... in the lambda function, or somewhere else? Looking for pointers!

Thanks!

    mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))

Using this example code from the link above:

    def sendPartition(iter):
        connection = createNewConnection()
        for record in iter:
            connection.send(record)
        connection.close()

    dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
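A minimal Python version of that hasNext pattern, applied to the sendPartition example above. A sketch: next(iter, None) stands in for hasNext, and createNewConnection is the placeholder from the streaming guide:

    def sendPartition(iter):
        # peek at the first record; None means this partition is empty,
        # so we never open a connection for it
        first = next(iter, None)
        if first is None:
            return
        connection = createNewConnection()
        connection.send(first)
        # send the remaining records from the partition iterator
        for record in iter:
            connection.send(record)
        connection.close()

    dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))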