Re: rdd.cache() is not faster?

2014-06-18 Thread Gaurav Jain
KTH talks about this: http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf Best - Gaurav Jain Master's Student, D-INFK ETH Zurich

Re: rdd.cache() is not faster?

2014-06-18 Thread Gaurav Jain
memory usage. Playing around with different Storage levels (MEMORY_ONLY_SER, for example) might also help. Best Gaurav Jain Master's Student, D-INFK ETH Zurich Email: jaing at student dot ethz dot ch

Accessing the per-key state maintained by updateStateByKey for transformation of JavaPairDStream

2014-06-16 Thread Gaurav Jain
Hello Spark Streaming experts, I have a use case where a bunch of log entries come in, say every 10 seconds (the batch interval). I create a JavaPairDStream[K,V] from these log entries. Now, there are two things I want to do with this JavaPairDStream: 1. Use key-dependent state (updated by

Using custom class as a key for groupByKey() or reduceByKey()

2014-06-15 Thread Gaurav Jain
I have a simple Java class as follows, that I want to use as a key while applying groupByKey or reduceByKey functions: private static class FlowId { public String dcxId; public String trxId; public String msgType;
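The snippet above is cut off before the actual question, so what follows is only a sketch of one usual prerequisite, not a diagnosis of the poster's problem. Spark's groupByKey and reduceByKey hash keys to partitions and compare them with equals, so a custom key class generally needs consistent equals and hashCode implementations and must be serializable. The example below shows the effect with a plain HashMap rather than Spark itself. The field names come from the post. The constructor, the outer class FlowIdDemo, and the helper sumByKey are hypothetical names introduced for illustration.

```java
import java.io.Serializable;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FlowIdDemo {

    // Key class with the three fields from the post. The constructor and
    // the final modifiers are assumptions made for this sketch.
    static final class FlowId implements Serializable {
        public final String dcxId;
        public final String trxId;
        public final String msgType;

        FlowId(String dcxId, String trxId, String msgType) {
            this.dcxId = dcxId;
            this.trxId = trxId;
            this.msgType = msgType;
        }

        // Two keys are equal when all three fields are equal.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FlowId)) return false;
            FlowId other = (FlowId) o;
            return Objects.equals(dcxId, other.dcxId)
                    && Objects.equals(trxId, other.trxId)
                    && Objects.equals(msgType, other.msgType);
        }

        // Must agree with equals: equal keys yield equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(dcxId, trxId, msgType);
        }
    }

    // Hypothetical local stand-in for reduceByKey with a sum function.
    static Map<FlowId, Integer> sumByKey(List<Map.Entry<FlowId, Integer>> records) {
        Map<FlowId, Integer> totals = new HashMap<FlowId, Integer>();
        for (Map.Entry<FlowId, Integer> record : records) {
            totals.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<FlowId, Integer>> records = new ArrayList<Map.Entry<FlowId, Integer>>();
        records.add(new SimpleEntry<FlowId, Integer>(new FlowId("d1", "t1", "REQ"), 1));
        records.add(new SimpleEntry<FlowId, Integer>(new FlowId("d1", "t1", "REQ"), 2));
        records.add(new SimpleEntry<FlowId, Integer>(new FlowId("d2", "t9", "RSP"), 5));

        Map<FlowId, Integer> totals = sumByKey(records);
        System.out.println(totals.size());                              // prints 2
        System.out.println(totals.get(new FlowId("d1", "t1", "REQ"))); // prints 3
    }
}
```

Without the equals and hashCode overrides, the two ("d1", "t1", "REQ") keys would be treated as distinct objects and would never be merged into one group.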

Calling JavaPairRDD.first after calling JavaPairRDD.groupByKey results in NullPointerException

2014-06-10 Thread Gaurav Jain
I am getting a strange NullPointerException when trying to list the first entry of a JavaPairRDD after calling groupByKey on it. Following is my code: JavaPairRDD<Tuple3<String, String, String>, List<String>> KeyToAppList = KeyToApp.distinct().groupByKey();