Dear community, I am facing a problem accessing data on S3 via Spark. My current configuration is the following:
- Spark 1.4.1
- Hadoop 2.7.1
- hadoop-aws-2.7.1
- Mesos 0.22.1

I am accessing the data using the s3a protocol, but the job just hangs. It runs through the whole data set, but systematically there is one task that never finishes. In the stderr I see quite a few timeout errors, but the application appears to recover from them; it simply keeps running forever without proceeding to the next stage. This is the stack trace of the errors the job recovers from:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
    at sun.security.ssl.InputRecord.read(InputRecord.java:509)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at com.amazonaws.util.ContentLengthValidationInputStream.read(ContentLengthValidationInputStream.java:77)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:164)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.readAByte(CBZip2InputStream.java:195)
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode(CBZip2InputStream.java:949)
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:506)
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.changeStateToProcessABlock(CBZip2InputStream.java:335)
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:425)
    at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream.read(BZip2Codec.java:485)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.mapreduce.lib.input.CompressedSplitLineReader.fillBuffer(CompressedSplitLineReader.java:130)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)

My gut feeling is that the job is "failing at failing": some tasks that should fail apparently never do, so Spark never sees a failure it could retry, and the job just hangs forever. Moreover, debugging the problem is really hard because there is no concrete error in the logs. Could you help me figure out what is happening and find a solution to this issue? Thank you! For completeness, I have put a stripped-down sketch of the read path and the configuration knobs I am wondering about below.
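This is roughly what the job does; the bucket, path, and action are placeholders, but as the stack trace shows the input is bzip2-compressed text files read through the s3a connector:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder job: the real one does more, but the read path is the same.
val conf = new SparkConf().setAppName("s3a-hang-repro")
val sc = new SparkContext(conf)

// bzip2-compressed text files on S3 via s3a
// ("my-bucket" and the path are placeholders).
val lines = sc.textFile("s3a://my-bucket/logs/*.bz2")

// Any action that forces a full scan shows the hang; count() is the
// simplest one.
println(lines.count())

sc.stop()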
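Also, are these the right knobs to tune? As far as I understand, hadoop-aws 2.7.1 exposes the S3 client's socket timeout and retry count, and Spark's speculative execution could at least relaunch a stuck task. The values below are just guesses on my side, not a verified fix:

import org.apache.spark.SparkConf

// Candidate settings; all values are guesses. As far as I understand,
// "spark.hadoop."-prefixed keys are copied into the Hadoop Configuration.
val conf = new SparkConf()
  // S3A client (hadoop-aws 2.7.1): socket read timeout (milliseconds,
  // I believe) and the number of retry attempts before giving up.
  .set("spark.hadoop.fs.s3a.connection.timeout", "50000")
  .set("spark.hadoop.fs.s3a.attempts.maximum", "20")
  // Spark side: speculatively relaunch tasks that run much longer than
  // their peers, so one stuck task cannot stall the stage forever.
  .set("spark.speculation", "true")
  .set("spark.speculation.multiplier", "2")
  .set("spark.speculation.quantile", "0.90")
  // Fail the job after this many task failures instead of hanging.
  .set("spark.task.maxFailures", "8")

Would any of these help, or is the underlying problem that the S3A input stream never surfaces the timeout as a task failure in the first place?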