Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2

2014-11-12 Thread Steve Reinhardt
I'm missing something simpler (I think).  That is, why do I need a Some instead 
of a Tuple2?  Because a Some might or might not be there, but a Tuple2 must 
always be there?  Or something like that?

From: Adrian Mocanu 
amoc...@verticalscope.com

You are correct; the filtering I’m talking about is done implicitly. You don’t 
have to do it yourself. Spark will do it for you and remove those entries from 
the state collection.

From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]

Adrian, do you know if this is documented somewhere? I was also under the 
impression that setting a key's value to None would cause the key to be 
discarded (without any explicit filtering on the user's part), but I cannot 
find any official documentation to that effect.

On Wed, Nov 12, 2014 at 2:43 PM, Adrian Mocanu 
amoc...@verticalscope.com wrote:
My understanding is that the reason you have an Option is so that you can filter 
out tuples when None is returned. This way your state data won't grow forever.

-Original Message-
From: spr

After comparing with previous code, I got it to work by making the return value 
a Some instead of a Tuple2.  Perhaps some day I will understand this.
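
A minimal sketch of the signature at issue, with assumed names (the app name, 
socket source, and evict-idle-keys rule are illustrative, not from this 
thread). updateStateByKey takes an update function of type 
(Seq[V], Option[S]) => Option[S]: returning Some(state) keeps the key, and 
returning None removes it from the state collection, which is the implicit 
filtering Adrian describes. A bare Tuple2 cannot express "remove this key," 
which is why the return value must be wrapped in a Some.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair-DStream ops on older Spark

    object StatefulCounts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StatefulCounts").setMaster("local[2]")
        val ssc  = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("/tmp/checkpoint")  // stateful operations require a checkpoint dir

        // Hypothetical input: "src dst" lines, so the key is itself a Tuple2.
        val pairs = ssc.socketTextStream("localhost", 9999).flatMap { line =>
          line.split(" ") match {
            case Array(src, dst) => Seq(((src, dst), 1))
            case _               => Seq.empty
          }
        }

        // Some(n) keeps the key with state n; None drops the key from state.
        val counts = pairs.updateStateByKey[Int] { (newValues: Seq[Int], state: Option[Int]) =>
          if (newValues.isEmpty) None  // no data this batch: discard the key
          else Some(state.getOrElse(0) + newValues.sum)
        }

        counts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

With this eviction rule, a key that receives no data in a batch disappears 
from the state on the next batch, so the state collection stays bounded.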



Re: how to blend a DStream and a broadcast variable?

2014-11-06 Thread Steve Reinhardt
Excellent. Is there an example of this somewhere?

Sent from my iPhone

 On Nov 6, 2014, at 1:43 AM, Sean Owen so...@cloudera.com wrote:
 
 Broadcast vars should work fine in Spark Streaming; note, however, that
 broadcast vars are immutable. If you have some info to cache which might
 change from batch to batch, you should be able to load it at the start of
 your 'foreachRDD' method or equivalent. That's simple and works, assuming
 your batch interval isn't so short and the data so big that loading it
 every time is a burden.
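
A minimal sketch of that per-batch reload, under assumed names (the file 
path, the DStream[String] element type, and the printing action are all 
illustrative):

    import org.apache.spark.streaming.dstream.DStream
    import scala.io.Source

    // Reload the filter set once per batch instead of using a broadcast var.
    def filterKnown(ds1: DStream[String], filterFile: String): Unit = {
      ds1.foreachRDD { rdd =>
        // Runs on the driver once per batch; cheap while the set stays small.
        val src = Source.fromFile(filterFile)
        val known: Set[String] = try src.getLines().toSet finally src.close()

        // 'known' is captured in the task closure and shipped with this
        // batch's tasks, so every batch sees the file's current contents.
        rdd.filter(known.contains).foreach(println)  // placeholder action
      }
    }

If the set grows too large to ship with every task, one variant is to 
re-broadcast it each batch: call sc.broadcast on the freshly loaded set 
inside foreachRDD and unpersist the previous broadcast, so each executor 
fetches it once per batch rather than once per task.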
 
 On Wed, Nov 5, 2014 at 11:30 PM, spr s...@yarcdata.com wrote:
 My use case has one large data stream (DS1) that obviously maps to a DStream.
 The processing of DS1 involves filtering it for any of a set of known
 values, which will change over time, though slowly by streaming standards.
 If the filter data were static, it seems to obviously map to a broadcast
 variable, but it's dynamic.  (And I don't think it works to implement it as
 a DStream, because the new values need to be copied redundantly to all
 executors, not partitioned among the executors.)
 
 Looking at the Spark and Spark Streaming documentation, I have two
 questions:
 
 1) There's no mention in the Spark Streaming Programming Guide of broadcast
 variables.  Do they coexist properly?
 
 2) Once I have a broadcast variable in place in the periodic function that
 Spark Streaming executes, how can I update its value?  Obviously I can't
 literally update the value of that broadcast variable, which is immutable,
 but how can I get a new version of the variable established in all the
 executors?
 
 (And the other ever-present implicit question...)
 
 3) Is there a better way to implement this?
 
 
 
 




Re: with Spark Streaming spark-submit, don't see output after ssc.start()

2014-11-03 Thread Steve Reinhardt

From: Tobias Pfeiffer t...@preferred.jp

Am I right that you are actually executing two different classes here?

Yes, I realized after I posted that I was calling two different classes, though 
they are in the same JAR.  I went back and tried it again with the same class 
in both cases, and it failed the same way.  I thought perhaps having two classes 
in one JAR was the issue, but commenting out one of the classes did not seem to 
make a difference.