Probably not Oracle but Cloudera 🙂

Jan, I think your DataNodes might be overloaded. If you run executors alongside DataNodes, I'd suggest reducing `spark.executor.cores` so the DataNode process gets some resources.
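For example, something like this in `spark-defaults.conf` might help on nodes that co-host a DataNode (the values are illustrative placeholders, not recommendations; `spark.hadoop.*` entries are forwarded into the executors' Hadoop configuration):

```
# Leave a few cores free on each worker so the co-located DataNode
# is not starved, e.g. on 16-core nodes (tune for your hardware):
spark.executor.cores                      4

# Raise the HDFS client socket timeout (milliseconds) above the
# 120000 (2 min) currently in effect; 300000 = 5 min is a guess:
spark.hadoop.dfs.client.socket-timeout    300000
```

The same properties can also be passed per job via `--conf` on `spark-submit` instead of cluster-wide defaults.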
The other thing you can do is increase `dfs.client.socket-timeout` in hadoopConf; I see that it's set to 120000 (2 minutes) in your case right now.

On Thu, Nov 9, 2017 at 4:28 PM, Jan-Hendrik Zab <z...@l3s.de> wrote:
>
> Jörn Franke <jornfra...@gmail.com> writes:
>
> > Maybe contact Oracle support?
>
> Something like that would be the last option I guess, university money
> is usually hard to come by for such things.
>
> > Do you have maybe accidentally configured some firewall rules? Routing
> > issues? Maybe only one of the nodes...
>
> All systems are in the same /16, the nodes don't even have a firewall
> and the two masters allow everything from the nodes and masters via the
> infiniband devices.
>
> And as I said, mapred jobs work fine and I haven't seen one network
> problem so far except for these messages.
>
> Best,
> -jhz
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>