Invoking updateStateByKey twice on the same RDD

2015-02-12 Thread harsha
Can I invoke updateStateByKey twice on the same RDD? My requirement is as follows:
1. Get the event stream from Kafka.
2. Use updateStateByKey to aggregate and filter events based on timestamp.
3. Do some processing and save the results to Cassandra DB.
4. Use updateStateByKey to remove keys based on logout.
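Each updateStateByKey call returns a new state DStream, so it can be chained or applied twice. Each call keeps its own independent state, and stateful streams need a checkpoint directory. The sketch below is minimal and assumes PySpark's update-function contract: `update(new_values, last_state)`, where returning `None` drops the key. The `(timestamp, action)` event schema and the `WINDOW_START` cutoff are hypothetical. `run_batch` is a hypothetical local stand-in for one micro-batch, not a Spark API. Under this model, removing a key in the second pass does not remove it from the first pass's state.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]  # hypothetical event schema: (timestamp, action)
WINDOW_START = 100       # hypothetical cutoff for the timestamp filter


def aggregate_recent(new_events: List[Event], state: Optional[List[Event]]) -> List[Event]:
    """Pass 1: append events with timestamp >= WINDOW_START to the key's state."""
    kept = list(state or [])
    kept.extend(e for e in new_events if e[0] >= WINDOW_START)
    return kept


def drop_on_logout(new_values: List[List[Event]], state: Optional[List[Event]]) -> Optional[List[Event]]:
    """Pass 2: consumes pass 1's output; returning None removes the key."""
    latest = new_values[-1] if new_values else state
    if latest is None or any(action == "logout" for _, action in latest):
        return None
    return latest


def run_batch(update, state: Dict, batch: Dict) -> Dict:
    """Hypothetical local stand-in for one micro-batch of updateStateByKey:
    the update function runs for every known key, even with no new values."""
    result = {}
    for key in set(state) | set(batch):
        new_state = update(batch.get(key, []), state.get(key))
        if new_state is not None:
            result[key] = new_state
    return result


# In a real job (assumption, not executed here):
#   ssc.checkpoint("/tmp/checkpoints")   # hypothetical path; stateful streams need it
#   recent = events.updateStateByKey(aggregate_recent)
#   active = recent.updateStateByKey(drop_on_logout)
state1: Dict = {}
state2: Dict = {}
batches = [
    {"alice": [(90, "click"), (120, "login")], "bob": [(130, "login")]},
    {"bob": [(140, "logout")]},
]
for batch in batches:
    state1 = run_batch(aggregate_recent, state1, batch)
    # Pass 2 sees pass 1's full output: one aggregated list per key per batch.
    state2 = run_batch(drop_on_logout, state2, {k: [v] for k, v in state1.items()})

print(state1)  # bob is still tracked by pass 1
print(state2)  # bob was removed by pass 2 only
```

If logged-out keys should also leave the first state, the logout check would have to move into `aggregate_recent` itself.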

Adjacency List representation in Spark

2014-09-17 Thread Harsha HN
… in nature? Basically, we are trying to fit a HashMap (adjacency list) into a Spark RDD. Is there any way to do this other than GraphX? Thanks and Regards, Harsha

Adjacency List representation in Spark

2014-09-17 Thread Sree Harsha
… in nature? Basically, we are trying to fit a HashMap (adjacency list) into a Spark RDD. Is there any way to do this other than GraphX? Thanks and Regards, Harsha
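One alternative to GraphX is to store the adjacency list as a pair RDD of `(vertex, neighbors)` records built from an edge list with `groupByKey`. The sketch below is minimal and assumes PySpark. A local Spark context is not assumed, so `build_adjacency` and `out_degree` are hypothetical plain-Python stand-ins. Their steps mirror the RDD calls named in the comments. The vertex IDs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]  # hypothetical (source, destination) vertex IDs


def build_adjacency(edges: Iterable[Edge]) -> Dict[int, List[int]]:
    """Plain-Python equivalent of:
        sc.parallelize(edges).groupByKey().mapValues(sorted)
    i.e. one (vertex, neighbors) record per source vertex."""
    adjacency: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return {v: sorted(ns) for v, ns in adjacency.items()}


def out_degree(adjacency: Dict[int, List[int]]) -> Dict[int, int]:
    """Equivalent of adjacency_rdd.mapValues(len) on the pair RDD."""
    return {v: len(ns) for v, ns in adjacency.items()}


edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
adjacency = build_adjacency(edges)
print(adjacency)
print(out_degree(adjacency))
```

Each record holds only one vertex and its neighbor list. Per-vertex work therefore maps onto `mapValues`/`flatMapValues`. Following edges from one vertex to another requires a `join` back onto the same RDD.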

PairRDD's lookup method Performance

2014-09-18 Thread Harsha HN
… method to get O(1) lookup performance. http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-over-hashmap-td893.html How can this be done in Java? HashMap is not a supported return type for any of the overloaded mapPartitions methods. Thanks and Regards, Harsha
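A common workaround builds one hash map per partition with `mapPartitions`. That method expects a collection or iterator of results rather than a map, so the map is wrapped in a one-element list. In the Spark 1.x Java API, returning `Collections.singletonList(map)` as the `Iterable` should serve the same purpose. The sketch below is minimal and written in Python. It assumes keys are hash-partitioned, with Python's `hash()` standing in for Spark's partitioner. `build_partition_maps` and `lookup` are hypothetical local stand-ins for the PySpark calls noted in the comments.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def build_partition_maps(pairs: Iterable[Tuple[Hashable, object]],
                         num_partitions: int) -> List[Dict]:
    """Stand-in for:
        pair_rdd.partitionBy(num_partitions)
                .mapPartitions(lambda it: [dict(it)], preservesPartitioning=True)
                .cache()
    Each partition becomes a single dict (last value wins for duplicate keys)."""
    maps: List[Dict] = [{} for _ in range(num_partitions)]
    for key, value in pairs:
        maps[hash(key) % num_partitions][key] = value
    return maps


def lookup(maps: List[Dict], key: Hashable) -> Optional[object]:
    """Route the key to its partition by hash, then do an O(1) dict lookup."""
    return maps[hash(key) % len(maps)].get(key)


maps = build_partition_maps([("a", 1), ("b", 2), ("c", 3)], num_partitions=2)
print(lookup(maps, "b"))
print(lookup(maps, "z"))
```

Spark's own `lookup` on an RDD with a known partitioner already runs only on the partition that can hold the key. The per-partition map mainly replaces the linear scan inside that partition.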

Working on LZOP Files

2014-09-25 Thread Harsha HN
… decompressing it before processing. Thanks, Harsha

SPARK UI - Details post job processing

2014-09-25 Thread Harsha HN
Hi, The details laid out in the Spark UI for a job in progress are really interesting and very useful, but they vanish once the job is done. Is there a way to get the job details after processing? I am looking for the Spark UI data, not the standard input, output and error info. Thanks, Harsha
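Spark can keep UI data after an application finishes by writing event logs. The history server (`./sbin/start-history-server.sh`) then serves those logs. The sketch below is minimal and assumes the standard `spark.eventLog.enabled`, `spark.eventLog.dir` and `spark.history.fs.logDirectory` settings. The log directory path is hypothetical. The helper only renders the settings as `spark-defaults.conf` lines. With PySpark, the same pairs could be passed to `SparkConf().setAll(...)`.

```python
from typing import Dict

LOG_DIR = "hdfs:///spark-event-logs"  # hypothetical location; must exist beforehand

EVENT_LOG_SETTINGS: Dict[str, str] = {
    "spark.eventLog.enabled": "true",          # write UI events for each application
    "spark.eventLog.dir": LOG_DIR,             # where applications write the logs
    "spark.history.fs.logDirectory": LOG_DIR,  # where the history server reads them
}


def to_spark_defaults(settings: Dict[str, str]) -> str:
    """Render settings in spark-defaults.conf form: key and value separated by whitespace."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


print(to_spark_defaults(EVENT_LOG_SETTINGS))
```

Once the history server is running, its web UI listens on port 18080 by default. It lists completed applications with the same job and stage pages as the live UI.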

Re: LZO support in Spark 1.0.0 - nothing seems to work

2014-11-07 Thread Sree Harsha
@rogthefrog Were you able to figure out how to fix this issue? I have also tried all possible combinations, but no luck yet. Thanks, Harsha

Difference between MEMORY_ONLY and MEMORY_AND_DISK

2015-08-18 Thread Harsha HN
Hello Sparkers, I would like to understand the difference between these storage levels for the portion of an RDD that doesn't fit in memory. It seems that in both storage levels, whatever portion doesn't fit in memory will be spilled to disk. Is there any difference as such? Thanks, Harsha
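According to the Spark programming guide, the two levels differ precisely in how they handle the overflow. With MEMORY_ONLY, partitions that do not fit are not cached at all and are recomputed from lineage each time they are needed. MEMORY_AND_DISK writes those partitions to disk and reads them back instead. The sketch below is a toy model of that rule, not Spark code. The equal-size partitions, the capacity counted in partitions and the name `place_partitions` are all hypothetical.

```python
from typing import Dict, List


def place_partitions(num_partitions: int, memory_slots: int, level: str) -> Dict[str, List[int]]:
    """Toy model of where cached partitions end up under a storage level.
    Assumes every partition is the same size and memory holds `memory_slots` of them."""
    if level not in ("MEMORY_ONLY", "MEMORY_AND_DISK"):
        raise ValueError(f"unsupported level: {level}")
    in_memory = list(range(min(num_partitions, memory_slots)))
    overflow = list(range(len(in_memory), num_partitions))
    if level == "MEMORY_AND_DISK":
        # Overflow partitions are written to disk and read back on reuse.
        return {"memory": in_memory, "disk": overflow, "recompute": []}
    # MEMORY_ONLY: overflow partitions are simply not cached and get recomputed.
    return {"memory": in_memory, "disk": [], "recompute": overflow}


print(place_partitions(5, 3, "MEMORY_ONLY"))
print(place_partitions(5, 3, "MEMORY_AND_DISK"))
```

In the Scala API, `cache()` is shorthand for `persist(StorageLevel.MEMORY_ONLY)`. Calling `persist(StorageLevel.MEMORY_AND_DISK)` instead opts into spilling.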