I am using Spark's join function to process around 250 million rows of text.

It ran fine when I tested with only several hundred rows, but with the full
data set the join fails.

My Spark version is 1.6.1, running in yarn-cluster mode, and the cluster has
5 nodes.
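
Roughly, the join has the following shape (a simplified sketch only, not the
actual job; paths, keys, and field positions are placeholders):

    // Spark 1.6 RDD API: two text datasets keyed by the first tab-separated column.
    val left = sc.textFile("hdfs:///data/left")
      .map { line => val f = line.split("\t"); (f(0), f(1)) }
    val right = sc.textFile("hdfs:///data/right")
      .map { line => val f = line.split("\t"); (f(0), f(1)) }

    // Explicit partition count (second argument) so the shuffle is spread
    // across many tasks instead of a few oversized partitions.
    val joined = left.join(right, 400)
    joined.saveAsTextFile("hdfs:///data/joined")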

Thank you very much, Ted Yu

On Sun, May 29, 2016 at 6:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you let us know more about your use case?
>
> When the join failed, what was the error (consider using pastebin)?
>
> Which release of Spark are you using?
>
> Thanks
>
> > On May 28, 2016, at 3:27 PM, heri wijayanto <heri0...@gmail.com> wrote:
> >
> > Hi everyone,
> > I perform a join inside a loop, and it fails. I found a tutorial on the
> > web that says I should use a broadcast variable, but that does not seem
> > to be a good choice when the join runs in a loop (a sketch of that
> > approach is included below the quoted thread).
> > I need your suggestions on how to address this problem, thank you very much.
> > And I am sorry, I am a beginner in Spark programming.
>
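
For reference, the broadcast-variable approach that tutorials usually mean is
a map-side join: collect the smaller dataset once, broadcast it, and look keys
up in it instead of shuffling both sides. A rough sketch, assuming the smaller
side fits in memory (dataset paths and keys are placeholders):

    // Build a lookup map from the smaller dataset and broadcast it once.
    val small = sc.textFile("hdfs:///data/small")
      .map { line => val f = line.split("\t"); (f(0), f(1)) }
      .collectAsMap()
    val smallBc = sc.broadcast(small)

    // Join the large dataset against the broadcast map; no shuffle is needed.
    val big = sc.textFile("hdfs:///data/big")
      .map { line => val f = line.split("\t"); (f(0), f(1)) }
    val joined = big.flatMap { case (k, v) =>
      smallBc.value.get(k).map(w => (k, (v, w)))
    }

If the same small dataset is reused on every iteration of the loop,
broadcasting it once outside the loop avoids re-collecting and re-shipping it
each time; if it changes per iteration, broadcasting inside the loop still
works, but each broadcast should be unpersisted once it is no longer needed.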
