Re: Spark Streaming for Each RDD - Exception on Empty

2015-06-05 Thread John Omernik
I am using Spark 1.3.1, so I don't have the 1.4.0 isEmpty. I guess I am
curious about the right approach here. Like I said in my original post,
perhaps this isn't "bad", but the exceptions bother me from a programmer's
point of view... is that wrong? :)
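
For reference, here is the take(1) guard I am leaning toward on 1.3.1 (just a
sketch: parseRDD is my per-partition function from below, and take(1) on an
empty RDD should just return an empty list, at the cost of a small job per
batch):

def process(rdd):
    # take(1) returns [] when the RDD is empty, so the output
    # action is skipped for batches with no data
    if rdd.take(1):
        rdd.foreachPartition(parseRDD)

mydstream.foreachRDD(process)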



On Fri, Jun 5, 2015 at 11:07 AM, Ted Yu  wrote:

> John:
> Which Spark release are you using ?
> As of 1.4.0, RDD has this method:
>
>   def isEmpty(): Boolean = withScope {
>     partitions.length == 0 || take(1).length == 0
>   }
>
> FYI
>
> On Fri, Jun 5, 2015 at 9:01 AM, Evo Eftimov  wrote:
>
>> The foreachPartition callback is provided with an Iterator by the Spark
>> Framework – while iterator.hasNext() ……
>>
>>
>>
>> Also check whether this is not some sort of Python Spark API bug – Python
>> seems to be the foster child here – Scala and Java are the darlings
>>
>>
>>
>> From: John Omernik [mailto:j...@omernik.com]
>> Sent: Friday, June 5, 2015 4:08 PM
>> To: user
>> Subject: Spark Streaming for Each RDD - Exception on Empty
>>
>>
>>
>> Is there a pythonic/sparkonic way to test for an empty RDD before using
>> foreachRDD? Basically I am using the Python example at
>> https://spark.apache.org/docs/latest/streaming-programming-guide.html to
>> "put records somewhere". When I have data, it works fine; when I don't, I
>> get an exception. I am not sure about the performance implications of just
>> throwing an exception every time there is no data, but can I just test
>> before sending it?
>>
>>
>>
>> I did see one post mentioning using take(1) on the stream to test for
>> data, but I am not sure where that goes in this example... Is that in the
>> lambda function? Or somewhere else? Looking for pointers!
>>
>> Thanks!
>>
>> mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))
>>
>> Using this example code from the link above:
>>
>>
>>
>> def sendPartition(iter):
>>     connection = createNewConnection()
>>     for record in iter:
>>         connection.send(record)
>>     connection.close()
>>
>> dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
>>
>>
>


Re: Spark Streaming for Each RDD - Exception on Empty

2015-06-05 Thread Ted Yu
John:
Which Spark release are you using ?
As of 1.4.0, RDD has this method:

  def isEmpty(): Boolean = withScope {
    partitions.length == 0 || take(1).length == 0
  }

FYI
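
From Python, the equivalent guard would presumably be something like this (a
sketch, assuming isEmpty() is also exposed through the Python RDD API in that
release; sendPartition is the helper from the guide's example below):

def process(rdd):
    # isEmpty() lets us skip the output action entirely for empty batches
    if not rdd.isEmpty():
        rdd.foreachPartition(sendPartition)

dstream.foreachRDD(process)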

On Fri, Jun 5, 2015 at 9:01 AM, Evo Eftimov  wrote:

> The foreachPartition callback is provided with an Iterator by the Spark
> Framework – while iterator.hasNext() ……
>
>
>
> Also check whether this is not some sort of Python Spark API bug – Python
> seems to be the foster child here – Scala and Java are the darlings
>
>
>
> From: John Omernik [mailto:j...@omernik.com]
> Sent: Friday, June 5, 2015 4:08 PM
> To: user
> Subject: Spark Streaming for Each RDD - Exception on Empty
>
>
>
> Is there a pythonic/sparkonic way to test for an empty RDD before using
> foreachRDD? Basically I am using the Python example at
> https://spark.apache.org/docs/latest/streaming-programming-guide.html to
> "put records somewhere". When I have data, it works fine; when I don't, I
> get an exception. I am not sure about the performance implications of just
> throwing an exception every time there is no data, but can I just test
> before sending it?
>
>
>
> I did see one post mentioning using take(1) on the stream to test for
> data, but I am not sure where that goes in this example... Is that in the
> lambda function? Or somewhere else? Looking for pointers!
>
> Thanks!
>
> mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))
>
> Using this example code from the link above:
>
>
>
> def sendPartition(iter):
>     connection = createNewConnection()
>     for record in iter:
>         connection.send(record)
>     connection.close()
>
> dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
>
>


RE: Spark Streaming for Each RDD - Exception on Empty

2015-06-05 Thread Evo Eftimov
The foreachPartition callback is provided with an Iterator by the Spark
Framework – while iterator.hasNext() ……
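
For example, a minimal sketch of that guard (assuming the createNewConnection
helper from the guide's example; the connection is opened only once the first
record is known to exist, so empty partitions cost nothing):

def sendPartition(records):
    it = iter(records)
    try:
        first = next(it)      # probe for at least one record
    except StopIteration:
        return                # empty partition – nothing to send
    connection = createNewConnection()
    connection.send(first)
    for record in it:
        connection.send(record)
    connection.close()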

 

Also check whether this is not some sort of Python Spark API bug – Python seems 
to be the foster child here – Scala and Java are the darlings

 

From: John Omernik [mailto:j...@omernik.com] 
Sent: Friday, June 5, 2015 4:08 PM
To: user
Subject: Spark Streaming for Each RDD - Exception on Empty

 

Is there a pythonic/sparkonic way to test for an empty RDD before using
foreachRDD? Basically I am using the Python example at
https://spark.apache.org/docs/latest/streaming-programming-guide.html to "put
records somewhere". When I have data, it works fine; when I don't, I get an
exception. I am not sure about the performance implications of just throwing
an exception every time there is no data, but can I just test before sending
it?

 

I did see one post mentioning using take(1) on the stream to test for data,
but I am not sure where that goes in this example... Is that in the lambda
function? Or somewhere else? Looking for pointers!

Thanks!

mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))

Using this example code from the link above:

 

def sendPartition(iter):
    connection = createNewConnection()
    for record in iter:
        connection.send(record)
    connection.close()

dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))