Re: How to compute RDD[(String, Set[String])] that include large Set

Kevin Jung Mon, 19 Jan 2015 21:00:07 -0800

As far as I know, everything before the saveAsTextFile call is a
transformation, so it is evaluated lazily. The saveAsTextFile action then
runs all of those transformations, and that is when your Set[String] values
actually grow. If you have only a few keys, each set becomes very large, and
this easily causes an OOM when your executor memory and memory fraction
settings are too small for the computation.
If you only need the number of elements per key, you can use countByKey(),
or map() the RDD[(String, Set[String])] to an RDD[(String, Long)] after
creating the hoge RDD, so that reduceByKey only accumulates counts per key
instead of whole sets.
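
To make the counting idea concrete, here is a rough sketch. It assumes the
input is available as an RDD[(String, String)] of (key, value) pairs before
any sets are built; the names CountsPerKey, distinctCounts and pairs are
made up for illustration and are not from the original question. It drops
duplicate pairs with distinct() and then sums a 1L per pair with
reduceByKey, so no Set[String] is ever held in memory for a key. On Spark
1.2 and earlier you would probably also need
import org.apache.spark.SparkContext._ for reduceByKey and countByKey.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CountsPerKey {
  // Hypothetical helper: number of distinct values per key,
  // computed without materializing a Set[String] for any key.
  def distinctCounts(pairs: RDD[(String, String)]): RDD[(String, Long)] =
    pairs
      .distinct()                      // drop duplicate (key, value) pairs via a shuffle
      .map { case (k, _) => (k, 1L) }  // keep only the key and a unit count
      .reduceByKey(_ + _)              // sum counts per key; only Longs are combined

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("counts-per-key").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("a", "x"), ("a", "y"), ("a", "x"), ("b", "z")))

      // Distinct values per key: a has 2, b has 1.
      val distinctPerKey = distinctCounts(pairs).collect().toMap
      println(distinctPerKey)

      // countByKey() counts every record per key, duplicates included
      // (a has 3, b has 1), and returns the result to the driver as a Map.
      val recordsPerKey = pairs.countByKey()
      println(recordsPerKey)
    } finally {
      sc.stop()
    }
  }
}
```

Note that countByKey() brings the whole result back to the driver, so it
only fits when the number of keys is small, which seems to be the case
here. The reduceByKey version keeps the result as an RDD.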




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-compute-RDD-String-Set-String-that-include-large-Set-tp21248p21251.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

