Does reduceByKey only work properly for numeric keys?

2015-04-18 Thread SecondDatke
I'm trying to solve a word-count-like problem. The difference is that I need the count of a specific word within a specific timespan in a social message stream. My data is in the format (time, message), and I transformed it (flatMap, etc.) into a series of (time, word_id); the time is
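
To make the shape of the problem concrete, here is a minimal plain-Python sketch of counting composite (time, word_id) keys. It is not Spark code: `count_by_key` is a hypothetical stand-in for what mapping each pair to `(key, 1)` and then calling `reduceByKey` with addition would compute, and the sample records (with the time truncated to the hour) are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def count_by_key(pairs):
    """Count occurrences of each key (hypothetical stand-in for reduceByKey)."""
    counts = defaultdict(int)
    for key in pairs:
        # Each key is a (time, word_id) tuple; tuples are hashable as long as
        # every element is hashable, so the key need not be numeric.
        counts[key] += 1
    return dict(counts)


# Assumed sample data: (time, word_id) pairs, time truncated to the hour.
records = [
    (datetime(2015, 4, 18, 8), 42),
    (datetime(2015, 4, 18, 8), 42),
    (datetime(2015, 4, 18, 9), 42),
    (datetime(2015, 4, 18, 8), 7),
]

counts = count_by_key(records)
```

Here the two records that share the same hour and word_id collapse into a single key, while the other two keep separate counts.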

RE: Does reduceByKey only work properly for numeric keys?

2015-04-18 Thread SecondDatke
these datetime objects implement the notion of equality you'd expect? (This may be a dumb question; I'm thinking of the equivalent of equals() / hashCode() from the Java world.) On Sat, Apr 18, 2015 at 4:17 PM, SecondDatke lovejay-lovemu...@outlook.com wrote: I'm trying to solve a Word

RE: Does reduceByKey only work properly for numeric keys?

2015-04-18 Thread SecondDatke
()? What release of Spark are you using? Cheers On Sat, Apr 18, 2015 at 8:17 AM, SecondDatke lovejay-lovemu...@outlook.com wrote: I'm trying to solve a Word-Count like problem, the difference lies in that, I need the count of a specific word among a specific timespan in a social message stream

RE: Does reduceByKey only work properly for numeric keys?

2015-04-18 Thread SecondDatke
? To: lovejay-lovemu...@outlook.com CC: user@spark.apache.org Do these datetime objects implement the notion of equality you'd expect? (This may be a dumb question; I'm thinking of the equivalent of equals() / hashCode() from the Java world.) On Sat, Apr 18, 2015 at 4:17 PM, SecondDatke
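
The equality question can be checked locally in plain Python. The sketch below is only an illustration of the equals() / hashCode() analogy (`__eq__` / `__hash__` in Python); it does not diagnose the problem in this thread. `NaiveTime` is a hypothetical class, invented here to show what happens when a key type defines neither method.

```python
from datetime import datetime

# Two separately constructed datetime objects with the same value.
a = datetime(2015, 4, 18, 16, 17)
b = datetime(2015, 4, 18, 16, 17)

same_value = (a == b)             # value equality via __eq__
same_hash = (hash(a) == hash(b))  # equal objects must hash equally
distinct_objects = (a is not b)   # yet they are different objects


class NaiveTime:
    """Hypothetical key type that does NOT define __eq__ or __hash__."""

    def __init__(self, hour):
        self.hour = hour


x, y = NaiveTime(8), NaiveTime(8)

# Keys that compare and hash equal collapse into one dictionary entry;
# keys that fall back to identity-based equality do not.
merged = len({a: 1, b: 1})
unmerged = len({x: 1, y: 1})
```

Any grouping by key, whether in a local dict or in a distributed shuffle, depends on equal keys also producing equal hashes.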

How to submit job in a different user?

2015-04-09 Thread SecondDatke
Well, maybe this is a Linux configuration problem... I have a cluster that is about to be exposed to the public, and I want everyone who uses my cluster to have their own user account (without sudo permissions, etc.; e.g. 'guest') and to be able to submit tasks to Spark, which runs on Mesos, which is running with a different,

How to work with sparse data in Python?

2015-04-06 Thread SecondDatke
I'm trying to apply Spark to an NLP problem that I'm working on. I have nearly 4 million tweet texts, and I have converted them into word vectors. The data is pretty sparse because each message has just dozens of words, but the vocabulary has tens of thousands of words. These vectors should be loaded each
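
As a rough illustration of why a sparse representation suits this data, here is a minimal plain-Python sketch that stores only the nonzero entries of each word vector in a dict. The helper names `to_sparse` and `sparse_dot` and the sample word ids are hypothetical; this is not the poster's code and not a Spark API, just one simple way to picture the idea.

```python
def to_sparse(word_ids):
    """Build a sparse count vector {word_id: count} from a list of word ids."""
    vec = {}
    for word_id in word_ids:
        vec[word_id] = vec.get(word_id, 0) + 1
    return vec


def sparse_dot(u, v):
    """Dot product that only visits entries present in the smaller vector."""
    if len(u) > len(v):
        u, v = v, u
    return sum(value * v[index] for index, value in u.items() if index in v)


# Assumed sample messages: a handful of word ids out of a large vocabulary.
tweet_a = to_sparse([3, 17, 17, 20000])
tweet_b = to_sparse([17, 42, 20000])

dot = sparse_dot(tweet_a, tweet_b)
```

Each vector costs memory proportional to the number of distinct words in the message rather than to the vocabulary size.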