My mapPartitions code, given below, outputs one record for each input record,
so the output has the same number of records as the input. I am collecting the
output into a ListBuffer, and that buffer grows too large for memory, causing
an OutOfMemoryError.

To be clearer, my per-partition logic is as follows:

    Iterator (iter1) -> processing -> ListBuffer (list1)

    iter1.size == list1.size
    list1 goes out of memory

I cannot change the partition size. Partitioning is based on the input key,
and all the records corresponding to a key need to go into the same partition.
Is there a workaround for this?
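
For context, the keyed partitioning is set up roughly along these lines (a
minimal sketch; the input RDD name and the partition count here are
placeholders, not the real values from my job):

    import org.apache.spark.HashPartitioner

    // Assumed setup: hash-partition by key so that every record for a given key
    // lands in the same partition. inputRDD and the count of 200 are placeholders.
    val iterateRDD = inputRDD.partitionBy(new HashPartitioner(200))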

The mapPartitions code in question:

    import scala.collection.mutable.ListBuffer

    val tempRDD = iterateRDD.mapPartitions(p => {
      // Every output record for the partition is buffered here before being
      // returned, which is where the memory blows up.
      val outputList = ListBuffer[String]()
      var minVal = 0L
      var prevKey = Long.MinValue // assumed sentinel that never appears as a real key
      while (p.hasNext) {
        val (key, value) = p.next()
        if (key != prevKey) {
          // first record of a new key: keep the smaller of key and value as the minimum
          if (value < key) {
            minVal = value
            outputList += minVal.toString + "\t" + key.toString
          } else {
            minVal = key
            outputList += minVal.toString + "\t" + value.toString
          }
        } else {
          // later records of the same key reuse the remembered minimum
          outputList += minVal.toString + "\t" + value.toString
        }
        prevKey = key
      }
      outputList.iterator
    })
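
The only workaround I can think of is to return a lazy iterator instead of
materializing the ListBuffer: each input record yields exactly one output
record, and the only state carried between records is prevKey/minVal, so the
same logic can be expressed with p.map over the partition iterator. A minimal
sketch (same names as above; the sentinel for prevKey is an assumption about
the key range):

    // Sketch: same per-record logic, but the output is produced lazily via
    // p.map, so no per-partition buffer is ever materialized.
    val tempRDD = iterateRDD.mapPartitions(p => {
      var minVal = 0L
      var prevKey = Long.MinValue // assumed sentinel that never appears as a real key
      p.map { case (key, value) =>
        val out =
          if (key != prevKey) {
            if (value < key) {
              minVal = value
              minVal.toString + "\t" + key.toString
            } else {
              minVal = key
              minVal.toString + "\t" + value.toString
            }
          } else {
            minVal.toString + "\t" + value.toString
          }
        prevKey = key
        out
      }
    })

Would relying on mutable state inside the mapped iterator like this be safe,
given that Spark consumes each partition's iterator only once and in order?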





