Not all of a machine's memory is available as Java heap space, so it may well be running out. Could you try repartitioning the data? To my knowledge, the job shouldn't fail as long as each individual partition fits into memory, even if the whole dataset does not.

To do that, replace

val train = parsedData.cache()

with

val train = parsedData.repartition(20).cache()
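
For context, here is a minimal sketch of how that line fits into a full
MLlib KMeans job. The input path, the parsing logic, the partition count
of 20, and the k/iteration values are all placeholders for illustration,
not values taken from your setup:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Parse each line into a dense vector (assumes space-separated numbers).
val parsedData = sc.textFile("hdfs:///path/to/data")
  .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))

// Split the RDD into more, smaller partitions so each one fits in the
// heap, then cache it, since KMeans makes repeated passes over the data.
val train = parsedData.repartition(20).cache()

val model = KMeans.train(train, 10, 20)  // k = 10, maxIterations = 20

A reasonable starting point for the partition count is a small multiple
of the total number of executor cores in your cluster.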


Best regards,
Simon



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-Input-Format-tp11654p11719.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
