Hello, I've just run into an issue where a job reports "Managed memory leak detected" with Spark version 2.0.2:
—————————————————
2016-12-08 16:31:25,231 [Executor task launch worker-0] (TaskMemoryManager.java:381) WARN leak 46.2 MB memory from org.apache.spark.util.collection.ExternalAppendOnlyMap@22719fb8
2016-12-08 16:31:25,232 [Executor task launch worker-0] (Logging.scala:66) WARN Managed memory leak detected; size = 48442112 bytes, TID = 1
—————————————————

The program itself is very basic, and take() appears to be what triggers the issue.

Program: https://gist.github.com/kutt4n/87cfcd4e794b1865b6f880412dd80bbf
Debug log: https://gist.github.com/kutt4n/ba3cf8129999dced34ceadc588856edc

The comment at TaskMemoryManager.java:381 says it is normal to see leaked memory if one of the tasks failed. In this case, though, it is not apparent from the debug log which task failed, or why. When the TSV file is small, the issue does not occur. Here the input is a 21 MB clickstream dataset from Wikipedia, available at https://ndownloader.figshare.com/files/5036392

Where could I read up more about managed memory leaks? Any pointers on what might be causing this would be highly appreciated.

Thanks,
Appu