I thought about why it had worked for small input files.

Evidently, all input splits had been read by the (local) worker running on
the same machine as the master (otherwise the remote worker would have
raised a FileNotFoundException). Since the input splits were very small,
reading the first input split was presumably faster than the remote
worker's request to the JobManager for an input split assignment. The
local worker could therefore request the next input split before the
remote worker did, and so read the full file by itself.

Once the input was larger, reading the first split took longer than the
remote worker's request, so the remote worker got a split assigned and
then failed to read the file.
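
To illustrate the race, here is a minimal, self-contained sketch in plain
Java (not Flink's actual classes; SplitRaceDemo, requestSplit, and the
delay values are made up for illustration). Workers pull splits one at a
time from a shared queue; the remote worker pays a fixed request latency,
while the local worker pays only the per-split read time:

import java.util.ArrayDeque;
import java.util.Queue;

public class SplitRaceDemo {
    // "JobManager" side: hands out splits on request, null when exhausted.
    static final Queue<String> splits = new ArrayDeque<>();

    static synchronized String requestSplit() {
        return splits.poll();
    }

    static void runWorker(String name, long requestDelayMillis,
                          long readMillis, boolean hasLocalFile)
            throws InterruptedException {
        while (true) {
            // time for the request to reach the JobManager
            Thread.sleep(requestDelayMillis);
            String split = requestSplit();
            if (split == null) break; // no splits left
            if (!hasLocalFile) {
                // file only exists on the local worker's machine
                System.out.println(name + ": got " + split
                        + " -> FileNotFoundException");
                break;
            }
            Thread.sleep(readMillis); // time to read the split
            System.out.println(name + ": read " + split);
        }
    }

    public static void main(String[] args) throws Exception {
        long readMillis = Long.parseLong(args.length > 0 ? args[0] : "10");
        for (int i = 0; i < 4; i++) splits.add("split-" + i);

        Thread local = new Thread(() -> {
            try { runWorker("local", 0, readMillis, true); }
            catch (InterruptedException ignored) {}
        });
        Thread remote = new Thread(() -> {
            try { runWorker("remote", 50, readMillis, false); }
            catch (InterruptedException ignored) {}
        });
        local.start(); remote.start();
        local.join(); remote.join();
    }
}

With a small per-split read time (e.g. "java SplitRaceDemo 10") the local
worker drains the queue before the remote worker's first request arrives
after 50 ms, so the remote worker never gets a split. With a larger read
time (e.g. "java SplitRaceDemo 100") the remote worker is assigned a split
and hits the missing file, matching the behavior described above.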

2014-07-07 12:16 GMT+02:00 Stephan Ewen <se...@apache.org>:

> Hi!
>
> Okay, good to hear you solved the issue. HDFS is a good way to go in large
> setups, though shared filesystems / SANs that are mounted on all machines
> work as well (using Amazon EBS is an example of that).
>
> Stephan
>
