Can I invoke updateStateByKey twice on the same RDD? My requirement is as
follows:
1. Get the event stream from Kafka
2. updateStateByKey to aggregate and filter events based on timestamp
3. Do some processing and save the results to Cassandra DB
4. updateStateByKey to remove keys based on logout
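For what it's worth, each updateStateByKey call returns a new DStream, so a second call on the result is legal. The sketch below is Spark-free and loose (both hypothetical update functions, update_aggregate and update_logout, are applied to the same raw batch, whereas in Spark the second call would see the first call's output), but it shows the two-pass aggregate-then-prune shape of steps 2 and 4:

```python
def update_aggregate(new_events, old_state):
    """First pass: fold the batch's new events into the per-key aggregate."""
    state = old_state or []
    return state + new_events

def update_logout(new_events, old_state):
    """Second pass: drop the key entirely once a logout event arrives."""
    if any(e.get("type") == "logout" for e in new_events):
        return None  # returning None removes the key from state
    return old_state

def apply_update(state, batch, update_fn):
    """Mimics updateStateByKey over one micro-batch; batch is {key: [events]}."""
    new_state = {}
    for k in set(state) | set(batch):
        result = update_fn(batch.get(k, []), state.get(k))
        if result is not None:
            new_state[k] = result
    return new_state

state = {}
batch = {"user1": [{"type": "click"}], "user2": [{"type": "logout"}]}
state = apply_update(state, batch, update_aggregate)  # step 2: aggregate
state = apply_update(state, batch, update_logout)     # step 4: prune on logout
# state is now {"user1": [{"type": "click"}]}; user2 was removed on logout
```

One caveat to check against the docs: updateStateByKey requires checkpointing to be enabled on the StreamingContext, and each call keeps its own state map.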
in nature?
Basically we are trying to fit a HashMap (adjacency list) into a Spark RDD.
Is there any way other than GraphX?
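One common non-GraphX encoding is a pair RDD of (vertex, neighbors) records built from the map; the trade-off is that per-key access becomes a lookup()/join rather than an O(1) get. A Spark-free sketch, with a plain Python list standing in for the RDD (in Spark: sc.parallelize(adjacency.items()) and rdd.lookup(v), ideally after partitionBy so a lookup only scans one partition):

```python
# Hypothetical adjacency map, as it would exist on the driver.
adjacency = {"a": ["b", "c"], "b": ["c"], "c": []}

# Flatten the HashMap into (vertex, neighbors) pairs, the shape a pair RDD expects.
pairs = list(adjacency.items())

# Without a hash index, a per-key read is a filter over the records,
# which is essentially what RDD.lookup() does.
def lookup(pairs, vertex):
    return [neighbors for v, neighbors in pairs if v == vertex]

result = lookup(pairs, "a")  # [["b", "c"]]
```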
Thanks and Regards,
Harsha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Adjacency-List-representation-in-Spark-tp1.html
method to get search
performance of O(1).
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-over-hashmap-td893.html
How can this be done in Java? HashMap is not a supported return type for
any overloaded version of the mapPartitions methods.
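The usual workaround, sketched below in Python for brevity: mapPartitions must return an iterator of records rather than a map, so build the HashMap locally inside the partition function and emit its entries as (key, value) pairs. In the Java API that would mean returning something like map.entrySet().iterator() from the partition function. The word-count-style example is hypothetical:

```python
def build_map_per_partition(records):
    """Runs once per partition: build a local map, then emit its entries."""
    local = {}
    for word in records:
        local[word] = local.get(word, 0) + 1
    return iter(local.items())  # emit entries, not the HashMap itself

partition = ["a", "b", "a"]
entries = sorted(build_map_per_partition(partition))  # [('a', 2), ('b', 1)]
```

Downstream you get an ordinary pair RDD of entries, which can then be reduced by key across partitions.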
Thanks and Regards,
Harsha
, decompressing it before processing.
Thanks,
Harsha
Hi,
The details laid out in the Spark UI for a job in progress are really
interesting and very useful.
But they vanish once the job is done.
Is there a way to get the job details after processing?
I am looking for the Spark UI data, not the standard input, output, and
error info.
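One mechanism worth checking against the docs for your Spark version is the event log plus the standalone history server, which re-serves the UI for completed applications. A sketch of the relevant spark-defaults.conf entries, with a placeholder log directory:

```
# spark-defaults.conf -- persist UI event data so it survives the application
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-events

# history-server side: read the same directory, then run
#   ./sbin/start-history-server.sh
spark.history.fs.logDirectory    hdfs:///spark-events
```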
Thanks,
Harsha
@rogthefrog
Were you able to figure out how to fix this issue?
I too have tried every combination possible, but no luck yet.
Thanks,
Harsha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/LZO-support-in-Spark-1-0-0-nothing-seems-to-work-tp14494p18349.html
Hello Sparkers,
I would like to understand the difference between these storage levels for
the portion of an RDD that doesn't fit in memory.
It seems that with both storage levels, whatever doesn't fit in memory is
spilled to disk. Is there any difference as such?
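Assuming the two levels in question are MEMORY_AND_DISK and MEMORY_AND_DISK_SER (they are not named above): both spill what doesn't fit to disk, but the _SER variant keeps the in-memory portion as serialized bytes, trading CPU on access for a smaller footprint, while the plain level keeps live objects. A rough, Spark-free analogy in Python:

```python
import pickle
import sys

data = list(range(10_000))

# Approximate in-memory size of the live objects (MEMORY_AND_DISK analogy).
deserialized_size = sys.getsizeof(data) + sum(sys.getsizeof(x) for x in data)

# Size of the same data as one serialized blob (MEMORY_AND_DISK_SER analogy).
serialized_size = len(pickle.dumps(data))

serialized_size < deserialized_size  # True: serialized form is more compact
```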
Thanks,
Harsha