Hello,

I’ve just run into an issue where a job reports a "Managed memory
leak" with Spark version 2.0.2:

—————————————————
2016-12-08 16:31:25,231 [Executor task launch worker-0]
(TaskMemoryManager.java:381) WARN leak 46.2 MB memory from
org.apache.spark.util.collection.ExternalAppendOnlyMap@22719fb8
2016-12-08 16:31:25,232 [Executor task launch worker-0] (Logging.scala:66)
WARN Managed memory leak detected; size = 48442112 bytes, TID = 1
—————————————————


The program itself is very basic, and it looks like take() is causing the issue.

Program: https://gist.github.com/kutt4n/87cfcd4e794b1865b6f880412dd80bbf
Debug Log: https://gist.github.com/kutt4n/ba3cf8129999dced34ceadc588856edc
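For context, here is a rough sketch of the shape of the program (a hypothetical reconstruction, not the actual gist code; the file path, column layout, and aggregation are my assumptions). The spill-backed map in the warning, ExternalAppendOnlyMap, is allocated by shuffle aggregations such as reduceByKey, and take() stops consuming the shuffled iterator early, which is one way the warning can be triggered without any task actually failing:

```scala
// Hypothetical reconstruction of a minimal job along these lines.
import org.apache.spark.{SparkConf, SparkContext}

object ClickstreamTake {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("2016_02_clickstream.tsv") // hypothetical path
      .map(_.split("\t"))
      .map(cols => (cols(0), 1L))                       // hypothetical key column
      .reduceByKey(_ + _)                               // backed by ExternalAppendOnlyMap

    // take() consumes only part of the shuffled iterator, so the task can
    // finish while execution memory for the map is still held, producing
    // the "Managed memory leak detected" warning on task completion.
    counts.take(10).foreach(println)

    sc.stop()
  }
}
```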


The comment at TaskMemoryManager.java:381 says that leaked memory is
normal to see if one of the tasks failed. In this case, though, it is not
apparent from the debug log which task failed, or why.

When the TSV file is small the issue doesn’t occur. In this
particular case, the file is 21 MB of Wikipedia clickstream data,
available at https://ndownloader.figshare.com/files/5036392

Where could I read up more about managed memory leaks? Any pointers on
what might be the issue would be greatly appreciated.

thanks
appu
