Thank you, Dr Mich Talebzadeh. I will capture the error messages, but my cluster is currently running another job. After it finishes, I will try your suggestions.
On Sun, May 29, 2016 at 7:55 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> You should have errors in the yarn-nodemanager and yarn-resourcemanager logs.
>
> Something like the below for a healthy container:
>
> 2016-05-29 00:50:50,496 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Memory usage of ProcessTree 29769 for container-id
> container_1464210869844_0061_01_000001: 372.6 MB of 4 GB physical memory
> used; 2.7 GB of 8.4 GB virtual memory used
>
> It appears that you are running out of memory. Have you also checked with
> jps and jmonitor for SparkSubmit (the driver process) for the failing job?
> It will show you the resource usage, like memory/heap/CPU etc.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 29 May 2016 at 00:26, heri wijayanto <heri0...@gmail.com> wrote:
>
>> I am running a Spark join over around 250 million rows of text.
>>
>> When I used only a few hundred rows it ran, but with the large data
>> set it fails.
>>
>> My Spark version is 1.6.1, running in yarn-cluster mode, and we have
>> 5 node computers.
>>
>> Thank you very much, Ted Yu
>>
>> On Sun, May 29, 2016 at 6:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Can you let us know your use case?
>>>
>>> When the join failed, what was the error (consider pastebin)?
>>>
>>> Which release of Spark are you using?
>>>
>>> Thanks
>>>
>>> > On May 28, 2016, at 3:27 PM, heri wijayanto <heri0...@gmail.com>
>>> wrote:
>>> >
>>> > Hi everyone,
>>> > I perform a join function in a loop, and it fails. I found a
>>> tutorial on the web that says I should use a broadcast variable, but
>>> that is not a good choice inside a loop.
>>> > I need your suggestion to address this problem, thank you very much.
>>> > And I am sorry, I am a beginner in Spark programming.
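For reference, a minimal sketch of the map-side "broadcast join" pattern discussed in this thread, written against the Spark 1.6 RDD API under the assumption that the smaller side fits in driver and executor memory. The paths, RDD names, and tab-separated input format here are hypothetical, not taken from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastJoinSketch"))

    // Large side: e.g. the ~250M rows of (key, value) text.
    // Assumes well-formed tab-separated lines (hypothetical format).
    val bigRdd = sc.textFile("hdfs:///path/to/big")
      .map { line =>
        val parts = line.split("\t", 2)
        (parts(0), parts(1))
      }

    // Small side: collected to the driver, so it must fit in memory there
    // and on each executor once broadcast.
    val smallMap: Map[String, String] = sc.textFile("hdfs:///path/to/small")
      .map { line =>
        val parts = line.split("\t", 2)
        (parts(0), parts(1))
      }
      .collectAsMap()
      .toMap

    val smallBc = sc.broadcast(smallMap)

    // Map-side join: no shuffle; each partition looks keys up in the
    // broadcast map locally and drops rows with no match.
    val joined = bigRdd.flatMap { case (k, v) =>
      smallBc.value.get(k).map(sv => (k, (v, sv)))
    }

    joined.saveAsTextFile("hdfs:///path/to/output")
    smallBc.unpersist()
    sc.stop()
  }
}
```

When the join runs inside a loop, calling smallBc.unpersist() between iterations frees stale broadcast copies on the executors, and persisting or checkpointing the intermediate RDD each iteration truncates the growing lineage. If neither side is small enough to broadcast, raising --executor-memory and --driver-memory on spark-submit, in line with the container memory figures in the log excerpt above, may be the more direct fix.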