You should see errors in the yarn-nodemanager and yarn-resourcemanager logs. Something like the below for a healthy container:
2016-05-29 00:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 29769 for container-id container_1464210869844_0061_01_000001: 372.6 MB of 4 GB physical memory used; 2.7 GB of 8.4 GB virtual memory used

It appears that you are running out of memory. Have you also checked with jps and jmonitor for SparkSubmit (the driver process) of the failing job? It will show you the resource usage, like memory/heap/CPU etc.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 29 May 2016 at 00:26, heri wijayanto <heri0...@gmail.com> wrote:

> I implement Spark with the join function for processing around 250 million
> rows of text.
>
> When I used just several hundred rows it could run, but when I use the
> large data, it fails.
>
> My Spark version is 1.6.1, running in yarn-cluster mode, and we have 5
> node computers.
>
> Thank you very much, Ted Yu
>
> On Sun, May 29, 2016 at 6:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Can you let us know your use case?
>>
>> When the join failed, what was the error (consider pastebin)?
>>
>> Which release of Spark are you using?
>>
>> Thanks
>>
>>> On May 28, 2016, at 3:27 PM, heri wijayanto <heri0...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>> I perform a join function in a loop, and it fails. I found a tutorial
>>> on the web; it says that I should use a broadcast variable, but that is
>>> not a good choice for doing it in a loop.
>>> I need your suggestions to address this problem. Thank you very much,
>>> and I am sorry, I am a beginner in Spark programming.
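[Follow-up] If the container is indeed hitting its memory limits, the usual first step on Spark 1.6/YARN is to raise the driver/executor memory and the YARN overhead at submit time. A sketch only; all of the values below are placeholders to tune against your 5-node cluster, and `your_job.py` is a hypothetical job name:

```shell
# Illustrative spark-submit for Spark 1.6 on YARN; numbers are placeholders,
# not recommendations.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 8g \
  --num-executors 5 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your_job.py

# To inspect the SparkSubmit driver process Mich mentions, on the
# submitting host:
jps -lm | grep SparkSubmit
```

`spark.yarn.executor.memoryOverhead` matters here because YARN kills containers on the *total* process-tree memory (as in the ContainersMonitorImpl log line above), not just the JVM heap.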