Quick update:
It is the filter step that triggers the error above, not the reduceByKey.

Why would a filter cause an out-of-memory error?

Here is my code:

val inputgsup = "hdfs://" + sparkmasterip + "/user/sense/datasets/gsup/binary/30/2014/11/0[1-9]/part*"

// Read the sequence files as raw key/value bytes
val gsupfile = sc.newAPIHadoopFile[BytesWritable, BytesWritable, SequenceFileAsBinaryInputFormat](inputgsup)

// Deserialize the keys and values, then flatten into 4-tuples
val gsup = gsupfile
  .map(x => (GsupHandler.DeserializeKey(x._1.getBytes), GsupHandler.DeserializeValue(x._2.getBytes)))
  .map(x => (x._1._1, x._1._2, x._2._1, x._2._2))

val gsup_results_geod = gsup.flatMap(x => doQueryGSUP(has_expo_criteria,
  has_fence_criteria, timerange_start_expo, timerange_end_expo,
  timerange_start_fence, timerange_end_fence, expo_pois, fence_pois, x))

// Bitwise-OR the flag bytes and sum the counts per key
val gsup_results_reduced = gsup_results_geod.reduceByKey((a, b) =>
  ((a._1.toShort | b._1.toShort).toByte, a._2 + b._2))

// This is the filter that triggers the error: keep only the pairs whose
// combined flag byte appears in criteria_filter.value
val gsup_results = gsup_results_reduced.filter(x =>
  criteria_filter.value contains x._2._1.toInt)
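
For what it's worth, my understanding is that filter, like reduceByKey, is only a lazy transformation, so nothing actually runs until a downstream action executes the whole lineage, shuffle included. Below is a stripped-down, self-contained sketch of the same pattern (made-up data, and a broadcast Set standing in for criteria_filter; this is not my actual job) that I would expect to behave the same way:

import org.apache.spark.{SparkConf, SparkContext}

object FilterAfterReduceSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("filter-after-reduce-sketch").setMaster("local[*]"))

    // Broadcast a small set of allowed criteria, analogous to criteria_filter
    val allowed = sc.broadcast(Set(1, 3, 5))

    // Made-up (key, (flagByte, count)) pairs standing in for gsup_results_geod
    val pairs = sc.parallelize(0 until 1000000)
      .map(i => (i % 1000, ((i % 7).toByte, 1L)))

    // Same shape as my job: bitwise-OR the flags and sum the counts per key ...
    val reduced = pairs.reduceByKey((a, b) =>
      ((a._1.toShort | b._1.toShort).toByte, a._2 + b._2))

    // ... then keep only the keys whose combined flag is in the broadcast set.
    // Both reduceByKey and filter are still lazy at this point.
    val filtered = reduced.filter(x => allowed.value contains x._2._1.toInt)

    // Only an action like this triggers the whole lineage, shuffle included.
    println(filtered.count())

    sc.stop()
  }
}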


