Probably not Oracle but Cloudera 🙂

Jan, I think your DataNodes might be overloaded. If you run executors alongside DataNodes, I'd suggest reducing `spark.executor.cores` so the DataNode process gets some resources.
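For example, something like this in `spark-defaults.conf` might help on nodes that co-host a DataNode (the values are illustrative placeholders, not recommendations; `spark.hadoop.*` entries are forwarded into the executors' Hadoop configuration):

```
# Leave a few cores free on each worker so the co-located DataNode
# is not starved, e.g. on 16-core nodes (tune for your hardware):
spark.executor.cores                      4

# Raise the HDFS client socket timeout (milliseconds) above the
# 120000 (2 min) currently in effect; 300000 = 5 min is a guess:
spark.hadoop.dfs.client.socket-timeout    300000
```

The same properties can also be passed per job via `--conf` on `spark-submit` instead of cluster-wide defaults.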
The other thing you can do is increase `dfs.client.socket-timeout` in hadoopConf; I see that it's set to 120000 (2 minutes) in your case right now.

On Thu, Nov 9, 2017 at 4:28 PM, Jan-Hendrik Zab <z...@l3s.de> wrote:
>
> Jörn Franke <jornfra...@gmail.com> writes:
>
> > Maybe contact Oracle support?
>
> Something like that would be the last option I guess, university money
> is usually hard to come by for such things.
>
> > Do you have maybe accidentally configured some firewall rules? Routing
> > issues? Maybe only one of the nodes...
>
> All systems are in the same /16, the nodes don't even have a firewall
> and the two masters allow everything from the nodes and masters via the
> infiniband devices.
>
> And as I said, mapred jobs work fine and I haven't seen one network
> problem so far except for these messages.
>
> Best,
> -jhz
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>