2023-08-22 Thread heri wijayanto
unsubscribe


2023-08-09 Thread heri wijayanto
unsubscribe


2023-08-04 Thread heri wijayanto
Unsubscribe


2021-01-26 Thread heri wijayanto
unsubscribe


Re: join function in a loop

2016-05-28 Thread heri wijayanto
I am sorry, we cannot divide the data set and process it separately. Does
this mean that I am over-using Spark for my data size, since it spends such
a long time shuffling the data?

On Sun, May 29, 2016 at 8:53 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Heri:
> Is it possible to partition your data set so that the number of rows
> involved in join is under control ?
>
> Cheers
>
> On Sat, May 28, 2016 at 5:25 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> You are welcome
>>
>> You can also use the OS command /usr/bin/free to see how much free memory
>> you have on each node.
>>
>> You should also check the Spark GUI (the first job on the master node on
>> port 4040, the next on 4041, etc.) for the resource and Storage (memory
>> usage) of each SparkSubmit job.
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 29 May 2016 at 01:16, heri wijayanto <heri0...@gmail.com> wrote:
>>
>>> Thank you, Dr Mich Talebzadeh. I will capture the error messages, but
>>> my cluster is currently running another job. After it finishes, I will
>>> try your suggestions.
>>>
>>> On Sun, May 29, 2016 at 7:55 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> You should have errors in yarn-nodemanager and yarn-resourcemanager
>>>> logs.
>>>>
>>>> Something like the following for a healthy container:
>>>>
>>>> 2016-05-29 00:50:50,496 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>>>> Memory usage of ProcessTree 29769 for container-id
>>>> container_1464210869844_0061_01_01: 372.6 MB of 4 GB physical memory
>>>> used; 2.7 GB of 8.4 GB virtual memory used
>>>>
>>>> It appears that you are running out of memory. Have you also checked
>>>> with jps and jmonitor the SparkSubmit (driver) process for the failing
>>>> job? It will show you the resource usage, like memory/heap/CPU, etc.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 29 May 2016 at 00:26, heri wijayanto <heri0...@gmail.com> wrote:
>>>>
>>>>> I use Spark's join function to process around 250 million rows of
>>>>> text.
>>>>>
>>>>> When I used only a few hundred rows it ran fine, but with the large
>>>>> data set it fails.
>>>>>
>>>>> My Spark version is 1.6.1, running in yarn-cluster mode, and we have 5
>>>>> node computers.
>>>>>
>>>>> Thank you very much, Ted Yu
>>>>>
>>>>> On Sun, May 29, 2016 at 6:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>>> Can you let us know your case ?
>>>>>>
>>>>>> When the join failed, what was the error (consider pastebin) ?
>>>>>>
>>>>>> Which release of Spark are you using ?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> > On May 28, 2016, at 3:27 PM, heri wijayanto <heri0...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi everyone,
>>>>>> > I perform a join in a loop, and it fails. I found a tutorial on
>>>>>> the web that says I should use a broadcast variable, but that is not
>>>>>> a good choice inside a loop.
>>>>>> > I need your suggestions to address this problem, thank you very much.
>>>>>> > And I am sorry, I am a beginner in Spark programming.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
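Ted's suggestion above (keeping the number of rows involved in each join
under control) can be sketched outside Spark. The following is a hypothetical
plain-Python stand-in, not the poster's actual code: in Spark itself this
would be done by repartitioning or filtering the RDD before the join, but the
idea of bounding how many rows are "in flight" per join is the same.

```python
# Plain-Python sketch of a chunked (partitioned) join: process the
# large side in fixed-size chunks so each individual join step only
# touches a bounded number of rows. chunked_join, the sample data,
# and the chunk size are all invented for illustration.

def chunked_join(big_rows, small_table, chunk_size=2):
    """Join (key, value) rows against a dict, one chunk at a time."""
    results = []
    for start in range(0, len(big_rows), chunk_size):
        chunk = big_rows[start:start + chunk_size]
        # Only chunk_size rows participate in this join step.
        results.extend(
            (k, v, small_table[k]) for k, v in chunk if k in small_table
        )
    return results

big = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
small = {"a": "x", "b": "y"}
print(chunked_join(big, small))  # → [('a', 1, 'x'), ('b', 2, 'y'), ('a', 4, 'x')]
```

With 250 million rows, the analogous Spark move would be to split the input
into ranges (or repartition by key) and join each part separately, so no
single shuffle has to move the whole data set at once.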


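
The broadcast-variable approach raised in the original question replaces the
shuffle join with a map-side lookup: the small table is shipped to each
worker once, and the large data set is scanned a single time with no loop of
join() calls. The sketch below is plain Python standing in for Spark (in
PySpark the dict would be wrapped with sc.broadcast()); all names and data
are hypothetical.

```python
# Sketch of a broadcast (map-side) inner join: one pass over the big
# rows, a dictionary lookup per row, no shuffle and no join-in-a-loop.

def map_side_join(big_rows, small_table):
    """Emulate a broadcast inner join of (key, value) rows against a dict."""
    return [
        (key, value, small_table[key])
        for key, value in big_rows
        if key in small_table  # unmatched keys are dropped, as in an inner join
    ]

rows = [("a", 1), ("b", 2), ("z", 9)]
lookup = {"a": "apple", "b": "banana"}
print(map_side_join(rows, lookup))  # → [('a', 1, 'apple'), ('b', 2, 'banana')]
```

Note that this only helps when the small side fits in memory on every
executor, which is exactly the constraint the thread's memory discussion is
about; if both sides are large, partitioning the join (as Ted suggested) is
the safer route.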