[ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125524#comment-16125524 ]
Steve Loughran commented on HADOOP-14770:
-----------------------------------------

# Add the Hadoop version to the JIRA, thanks.
# What is the file format: simple, or columnar (ORC, Parquet)?
# It looks like the connection is being closed on every seek. That is a sign either of HADOOP-13203 not engaging (random IO) or, on a sequential read, of forward reads aborting and reopening the connection rather than skipping forward.

Make sure you are using the Hadoop 2.8.x JARs, then:

For columnar data, enable random IO:
{code}
spark.hadoop.fs.s3a.experimental.fadvise=random
{code}

For sequential data with big forward skips:
{code}
spark.hadoop.fs.s3a.readahead.range = 768K
{code}

If this fixes it, close as a duplicate of HADOOP-13203.

If this doesn't fix it, you can print both the input stream and the s3a FS, as their toString() operations print all their stats.

Oh, one more possible cause: split calculation isn't getting it right. Look at your s3a block size, and the format itself.

> S3A http connection in s3a driver not reuse in Spark application
> ----------------------------------------------------------------
>
>                 Key: HADOOP-14770
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14770
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Yonger
>            Assignee: Yonger
>
> I print out connection stats every 2 s when running a Spark application against
> s3-compatible storage:
> ESTAB 0 0 ::ffff:10.0.2.36:44446 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44454 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44374 ::ffff:10.0.2.254:80
> ESTAB 159724 0 ::ffff:10.0.2.36:44436 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44448 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44338 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44438 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44414 ::ffff:10.0.2.254:80
> ESTAB 0 480 ::ffff:10.0.2.36:44450 ::ffff:10.0.2.254:80 timer:(on,170ms,0)
> ESTAB 0 0 ::ffff:10.0.2.36:44442 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44390 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44326 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44452 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44394 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44444 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44456 ::ffff:10.0.2.254:80
> ======================
> ESTAB 0 0 ::ffff:10.0.2.36:44508 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44476 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44524 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44374 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44500 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44504 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44512 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44506 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44464 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44518 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44510 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44442 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44526 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44472 ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44466 ::ffff:10.0.2.254:80
> The connections above and below the "=" separator changed all the time, but
> this hasn't been seen in the MR application.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
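
The connection churn described in the quoted report can be quantified by comparing the local ports in the two snapshots. The following is a minimal Python sketch, not something from the thread itself: the function name `local_ports` and the `snapshot` helper are hypothetical, and it assumes each connection line has the columns state, Recv-Q, Send-Q, local address and peer address, as in the output quoted above. The port numbers are copied from the report.

```python
def local_ports(snapshot: str) -> set[int]:
    """Return the local TCP ports of ESTAB connections in one snapshot."""
    ports = set()
    for line in snapshot.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port [timer]
        if len(fields) >= 5 and fields[0] == "ESTAB":
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


def snapshot(ports: list[int]) -> str:
    """Rebuild connection lines in the reported layout from a port list."""
    template = "ESTAB 0 0 ::ffff:10.0.2.36:{} ::ffff:10.0.2.254:80"
    return "\n".join(template.format(p) for p in ports)


# Local ports listed above the "=" separator in the report.
BEFORE = [44446, 44454, 44374, 44436, 44448, 44338, 44438, 44414,
          44450, 44442, 44390, 44326, 44452, 44394, 44444, 44456]
# Local ports listed below the separator (the next sample, 2 s later).
AFTER = [44508, 44476, 44524, 44374, 44500, 44504, 44512, 44506,
         44464, 44518, 44510, 44442, 44526, 44472, 44466]

before = local_ports(snapshot(BEFORE))
after = local_ports(snapshot(AFTER))
reused = before & after  # connections that survived between samples

print(f"{len(reused)} of {len(before)} connections survived: {sorted(reused)}")
```

On the two snapshots in the report, only ports 44374 and 44442 appear in both, which is consistent with the connections being closed and reopened rather than reused. Running the same comparison after changing the s3a settings suggested in the comment would show whether the churn has dropped.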