It looks like the source of the problem is in the SqlNewHadoopRDD.compute method.



The Parquet file reader created there is intended to be closed at task
completion time.
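
Roughly, the lifecycle looks like the sketch below; this is not the actual
SqlNewHadoopRDD source, just an illustration of the pattern, and the names
FileBackedReader / readPartition and the buffer size are made up:

import org.apache.spark.TaskContext

// Hypothetical stand-in for the Parquet record reader created in compute().
class FileBackedReader(path: String) extends AutoCloseable {
  // Stands in for the large byte arrays reachable through parquet.bytes.BytesInput.
  private val buffer = new Array[Byte](4 * 1024 * 1024)
  def rows: Iterator[Array[Byte]] = Iterator.single(buffer)
  override def close(): Unit = { /* release the file handle and buffers here */ }
}

// The reader is not closed when its iterator is exhausted; a task-completion
// listener closes it only when the whole task finishes.
def readPartition(path: String, context: TaskContext): Iterator[Array[Byte]] = {
  val reader = new FileBackedReader(path)
  context.addTaskCompletionListener { _: TaskContext => reader.close() }
  reader.rows
}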
This reader holds many references to parquet.bytes.BytesInput objects, which
in turn reference large byte arrays (some of them several megabytes in size).
Since with CoalescedRDD a task completes only after processing a large number
of Parquet files, this leads to file handle exhaustion and memory overflow.
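
For what it's worth, a minimal way to reproduce this (the path and partition
count are made up, and a SQLContext is assumed to be in scope as sqlContext,
e.g. in spark-shell on Spark 1.4) is to coalesce a Parquet read that spans
many part files:

// Each coalesced task now walks many underlying Parquet files, and every
// reader it opens stays open until that task completes.
val df = sqlContext.read.parquet("/data/table_with_many_part_files")
df.coalesce(2).count()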

Unfortunately, I didn't find any way to force an earlier cleanup of the file readers.


