[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085488#comment-16085488 ]
Steve Loughran commented on SPARK-20703:
----------------------------------------

Yeah, I'm not worrying too much about the new committers: we get to fix things there ourselves (and I already have). It's just that even AWS caches negative HEAD/GET calls to prevent DoS. These sequences on a nonexistent file work:
{code}
PUT -> 200; HEAD -> 200
PUT -> 200; GET -> 200
{code}
but not always if there's a GET/HEAD first:
{code}
HEAD -> 404; PUT -> 200; HEAD -> 404
HEAD -> 404; PUT -> 200; GET -> 404
GET -> 404; PUT -> 200; GET -> 404
GET -> 404; PUT -> 200; HEAD -> 404
{code}
When does a HEAD go out before a PUT? When someone calls {{FileSystem.create(path)}}: there's a [check for the object first|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L584] to reject directory overwrites and, if overwrite=false, to verify the file isn't already there. HADOOP-13884 covers tuning that so we can maybe avoid the checks when overwrite=true, but if overwrite=false, the HEAD is needed.

It's a rare event, but it does surface sporadically; the fault injection we're doing with s3guard & committers can generate the various failures and inconsistencies, so I can replicate this. Hence my offer of a fix.

Returning to the HADOOP-13786 committer work:
# The Netflix staging committer writes to file://, so the dest dir is always consistent. Task commit [does an uncommitted multipart PUT to the final destination|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L754], at which point the files "disappear" from everywhere until job commit. This is the one they're using in production, and the one I'm going to recommend as the first one to use. Where it has fun is that the FS of {{FileOutputCommitter.getWorkingDirectory()}} doesn't live in the same FS as the final work.
Anything which assumes that is in trouble.
# The "magic" one, which promises better performance, streams direct to the destination path by recognising a path like {{destdir/__magic/attempt01/task01/__base/YEAR=2017/part-0000.orc}} and initiating the multipart PUT to {{destdir/YEAR=2017/part-0000.orc}}. In {{stream.close()}}, rather than complete the PUT, it just [persists the commit data|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L95], here to {{destdir/__magic/attempt01/task01/__base/YEAR=2017/part-0000.orc.pending}}, and commits it later. After your patch it now also saves a 0-byte file to that initial output path, so the getFileStatus succeeds, it just returns 0 bytes as the outcome. This committer should be faster as it uploads the output in 128 MB blocks during {{OutputStream.write()}}, whereas the staging committer has a task-commit delay of {{data/bandwidth}} and requires the VMs to have enough disk space for all active worker threads. But the magic committer is the strange one. If it can be made to work, it's also likely to be the strategy for a WASB committer.

Even with my 0-byte PUT, or if I revert to the classic-and-unreliable rename-based committer, if I turn on negative GET/HEAD caching, the HEAD check straight after a PUT completes can cause problems. Sometimes that's going to arise in production whenever things write to S3, no matter what they are trying to do. I'll do a quick patch. I don't think I can easily write a test for it though, other than through the fault-injecting S3A client, which isn't ready to play with yet, even if you wanted to go near it.
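To make the failure mode above concrete, here's a toy model of a store with a negative HEAD cache (my own sketch for illustration, not AWS or S3A code): a 404 answered before the PUT keeps being served after it, because the write path never invalidates the cached negative entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of an object store fronted by a negative-result cache.
public final class NegativeCacheModel {
    private final Set<String> objects = new HashSet<>();        // the real store
    private final Map<String, Integer> headCache = new HashMap<>(); // cached HEAD status

    /** HEAD: answers from the cache; 404s get cached too. */
    public int head(String key) {
        return headCache.computeIfAbsent(key,
            k -> objects.contains(k) ? 200 : 404);
    }

    /** PUT: the write succeeds, but the negative cache is never invalidated. */
    public int put(String key) {
        objects.add(key);
        return 200;
    }
}
```

With no probe before the write, HEAD-after-PUT is fine; probe first and the stale 404 sticks, exactly the {{GET -> 404; PUT -> 200; GET -> 404}} sequence.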
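Going back to the magic committer's path convention: the rewrite it performs can be sketched as pure string surgery (this is my own illustration of the layout described above, with a made-up helper name, not the actual S3A implementation): everything from {{__magic}} through {{__base}} is cut out, and the remainder is appended to the destination directory.

```java
// Sketch of the magic-path rewrite: destdir/__magic/<attempt>/<task>/__base/rel -> destdir/rel.
// Assumes the layout from the comment above; helper name is hypothetical.
public final class MagicPath {
    static final String MAGIC = "__magic";
    static final String BASE = "__base";

    public static String finalDestination(String magicPath) {
        String[] parts = magicPath.split("/");
        StringBuilder out = new StringBuilder();
        int i = 0;
        // copy everything before __magic: the destination directory
        for (; i < parts.length && !parts[i].equals(MAGIC); i++) {
            if (out.length() > 0) out.append('/');
            out.append(parts[i]);
        }
        // skip the job/task attempt elements up to and including __base
        while (i < parts.length && !parts[i].equals(BASE)) {
            i++;
        }
        i++;
        // append the path relative to the destination
        for (; i < parts.length; i++) {
            out.append('/').append(parts[i]);
        }
        return out.toString();
    }
}
```

So {{destdir/__magic/attempt01/task01/__base/YEAR=2017/part-0000.orc}} maps to {{destdir/YEAR=2017/part-0000.orc}}; it's this rewrite that lets the stream open the multipart PUT against the final destination while the visible write goes under {{__magic}}.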
> Add an operator for writing data out
> ------------------------------------
>
>                 Key: SPARK-20703
>                 URL: https://issues.apache.org/jira/browse/SPARK-20703
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.3.0
>
>
> We should add an operator for writing data out. Right now in the explain plan
> / UI there is no way to tell whether a query is writing data out, and also
> there is no way to associate metrics with data writes. It'd be tremendously
> valuable to do this for adding metrics and for visibility.