Difference between MEMORY_ONLY and MEMORY_AND_DISK
Hello Sparkers, I would like to understand the difference between these storage levels for the portion of an RDD that doesn't fit in memory. It seems that in both storage levels, whatever portion doesn't fit in memory will be spilled to disk. Is there any difference, then? Thanks, Harsha
Working on LZOP Files
Hi, Is anybody processing LZOP files in Spark? We have a huge volume of LZOP files in HDFS to process through Spark. The MapReduce framework automatically detects the file format and sends the decompressed version to the mappers. Is there any such support in Spark? As of now I am manually downloading and decompressing the files before processing. Thanks, Harsha
Spark UI - Details post job processing
Hi, The details laid out in the Spark UI for a job in progress are really interesting and very useful, but they vanish once the job is done. Is there a way to get job details after processing? I am looking for the Spark UI data, not the standard input, output, and error info. Thanks, Harsha
PairRDD's lookup method Performance
Hi All, My question is about improving the performance of a pair RDD's lookup method. I went through the link below, where Tathagata Das explains creating a hash map over each partition using the mapPartitions method to get O(1) search performance. http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-over-hashmap-td893.html How can this be done in Java? HashMap is not a supported return type for any overloaded version of the mapPartitions methods. Thanks and Regards, Harsha
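For illustration, here is a minimal sketch of the per-partition hash map idea in plain Java. It does not use Spark at all: partitions are simulated with in-memory lists, and the names PartitionLookupSketch, toSingleMap, buildIndex, and lookup are hypothetical. The point it tries to show is that a per-partition function does not have to return a HashMap directly; it can return an iterator (or collection) whose single element is the HashMap. Whether that maps cleanly onto the mapPartitions overloads in your Spark version is an assumption to verify against the Java API docs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class PartitionLookupSketch {

    // Shaped like a per-partition function: consumes the pairs of one
    // partition and returns an iterator with exactly one element, the map.
    static <K, V> Iterator<Map<K, V>> toSingleMap(Iterator<Map.Entry<K, V>> pairs) {
        Map<K, V> index = new HashMap<>();
        while (pairs.hasNext()) {
            Map.Entry<K, V> e = pairs.next();
            index.put(e.getKey(), e.getValue());
        }
        return Collections.<Map<K, V>>singletonList(index).iterator();
    }

    // Simulates hash partitioning by key, then builds one map per partition.
    static <K, V> List<Map<K, V>> buildIndex(List<Map.Entry<K, V>> pairs, int numPartitions) {
        List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> e : pairs) {
            int p = Math.floorMod(e.getKey().hashCode(), numPartitions);
            partitions.get(p).add(e);
        }
        List<Map<K, V>> maps = new ArrayList<>();
        for (List<Map.Entry<K, V>> part : partitions) {
            maps.add(toSingleMap(part.iterator()).next());
        }
        return maps;
    }

    // Picks the partition from the key's hash, then does one hash-map get.
    // Returns null when the key is absent.
    static <K, V> V lookup(List<Map<K, V>> partitionMaps, K key) {
        int p = Math.floorMod(key.hashCode(), partitionMaps.size());
        return partitionMaps.get(p).get(key);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>("b", 2));
        pairs.add(new SimpleEntry<>("c", 3));
        List<Map<String, Integer>> index = buildIndex(pairs, 4);
        System.out.println(lookup(index, "b"));
    }
}
```

The lookup is only cheap because the data was partitioned by the same hash that the lookup uses, so a single partition has to be consulted. Duplicate keys are collapsed here (last value wins); keeping all values per key would need a map of lists instead.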
Adjacency List representation in Spark
Hello, We are building an adjacency list to represent a graph. The vertices, edges, and weights have been extracted from HDFS files by a Spark job. We expect that the adjacency list (a hash map) could grow beyond 20 GB. How can we represent this as an RDD so that it is distributed? Basically, we are trying to fit a HashMap (the adjacency list) into a Spark RDD. Is there any way other than GraphX? Thanks and Regards, Harsha
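One way to picture a distributed adjacency list without GraphX is as keyed records: each record is a source vertex paired with its list of (neighbor, weight) entries, and records are spread across partitions by hashing the vertex id. The sketch below is plain Java with no Spark dependency, so the partitions are just in-memory maps; the names AdjacencySketch, Edge, Neighbor, buildPartitions, and neighborsOf are hypothetical, and the long vertex ids and double weights are assumptions about the data. In Spark this would roughly correspond to grouping a pair RDD of edges by source vertex, which is an assumption to check against your own pipeline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjacencySketch {

    // One weighted, directed edge as it might be parsed from an input line.
    static final class Edge {
        final long src;
        final long dst;
        final double weight;

        Edge(long src, long dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    // One entry in a vertex's adjacency row.
    static final class Neighbor {
        final long id;
        final double weight;

        Neighbor(long id, double weight) {
            this.id = id;
            this.weight = weight;
        }
    }

    // Groups edges by source vertex and places each vertex's row in the
    // partition chosen by hashing the vertex id. No single map holds the
    // whole graph; each partition holds only its share of the vertices.
    static List<Map<Long, List<Neighbor>>> buildPartitions(List<Edge> edges, int numPartitions) {
        List<Map<Long, List<Neighbor>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        for (Edge e : edges) {
            int p = Math.floorMod(Long.hashCode(e.src), numPartitions);
            partitions.get(p)
                      .computeIfAbsent(e.src, k -> new ArrayList<>())
                      .add(new Neighbor(e.dst, e.weight));
        }
        return partitions;
    }

    // Finds a vertex's neighbors by consulting only the partition that owns it.
    static List<Neighbor> neighborsOf(List<Map<Long, List<Neighbor>>> partitions, long vertex) {
        int p = Math.floorMod(Long.hashCode(vertex), partitions.size());
        return partitions.get(p).getOrDefault(vertex, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(1L, 2L, 0.5));
        edges.add(new Edge(1L, 3L, 1.5));
        edges.add(new Edge(2L, 3L, 2.0));
        List<Map<Long, List<Neighbor>>> parts = buildPartitions(edges, 3);
        for (Neighbor n : neighborsOf(parts, 1L)) {
            System.out.println("1 -> " + n.id + " (weight " + n.weight + ")");
        }
    }
}
```

Vertices that appear only as destinations get no row in this sketch, so a real job would have to decide whether to emit empty rows for them. A vertex with a very large number of neighbors would also make one row large, which is a known concern with this kind of grouping.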