[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085488#comment-16085488 ]

Steve Loughran commented on SPARK-20703:
----------------------------------------

yeah, I'm not worrying too much about the new committers: we get to fix things 
there ourselves (and I already have). It's just that even AWS caches negative 
HEAD/GET results to prevent DoS attacks. 

These sequences on a nonexistent file always work:

{code}
PUT -> 200; HEAD -> 200
PUT -> 200; GET -> 200
{code}

but they don't always work if there's a GET/HEAD first:

{code}
HEAD -> 404; PUT -> 200; HEAD -> 404
HEAD -> 404; PUT -> 200; GET -> 404
GET -> 404; PUT -> 200; GET -> 404
GET -> 404; PUT -> 200; HEAD -> 404
{code}
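The failing sequences above can be reproduced with a toy model: a store whose HEAD caches a negative result and keeps serving it even after a later PUT succeeds. This is a minimal illustrative sketch in Python, not AWS code; the class and its behaviour are made up to mirror the traces:

```python
# Toy object store with negative HEAD/GET caching (illustrative only).
class NegativeCachingStore:
    def __init__(self):
        self.objects = {}       # key -> stored bytes
        self.neg_cache = set()  # keys whose absence has been cached

    def put(self, key, data):
        self.objects[key] = data
        return 200              # the PUT itself always succeeds

    def head(self, key):
        if key in self.neg_cache:
            return 404          # the cached negative result wins
        if key in self.objects:
            return 200
        self.neg_cache.add(key) # cache "not found" to deflect repeated probes
        return 404

# PUT -> HEAD: works.
store = NegativeCachingStore()
store.put("a", b"x")
print(store.head("a"))   # 200

# HEAD -> PUT -> HEAD: the early miss is cached, so the final HEAD fails.
store2 = NegativeCachingStore()
store2.head("b")         # 404, and the miss is remembered
store2.put("b", b"y")
print(store2.head("b"))  # 404
```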

When does a HEAD go out before a PUT? When someone calls 
{{FileSystem.create(path)}}; there's a [check for the object 
first|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L584]
 to reject directory overwrites and, if overwrite=false, to verify the file 
isn't already there. HADOOP-13884 covers tuning that so we can maybe skip the 
checks when overwrite=true, but if overwrite=false, the HEAD is needed.
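The shape of that precondition check can be sketched as follows. This is illustrative Python, not the real S3AFileSystem code; {{FakeFS}} and its method names are invented for the example:

```python
class FakeFS:
    """Toy stand-in for a filesystem client; get_file_status is the HEAD."""
    def __init__(self, entries):
        self.entries = entries            # path -> {"is_directory": bool}

    def get_file_status(self, path):
        return self.entries.get(path)     # None stands in for a 404

    def open_for_write(self, path):
        return f"stream:{path}"           # real code returns an output stream

def create(fs, path, overwrite=False):
    # The HEAD before the PUT: always reject directories, and reject
    # existing files when overwrite=False.
    status = fs.get_file_status(path)
    if status is not None:
        if status["is_directory"]:
            raise IsADirectoryError(path)
        if not overwrite:
            raise FileExistsError(path)
    return fs.open_for_write(path)
```

With overwrite=true the existence probe could in principle be skipped, which is exactly the tuning HADOOP-13884 is about; with overwrite=false the HEAD is unavoidable.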

It's a rare event, but it does surface sporadically; the fault injection we're 
doing with s3guard & the committers can generate the various failures and 
inconsistencies, so I can replicate this. Hence my offer of a fix.

Returning to the HADOOP-13786 committer work:
# the Netflix staging committer writes to file://, so the dest dir is always 
consistent. Task commit [does an uncommitted multipart PUT to the final 
destination|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L754],
 at which point the files "disappear" from everywhere until job commit. This is 
the one they're using in production, and the one I'm going to recommend as the 
first one to use. Where it gets interesting is that 
{{FileOutputCommitter.getWorkingDirectory()}} doesn't live in the same FS as 
the final work; anything which assumes it does is in trouble.
# The "magic" one, which promises better performance, will stream direct to the 
destination path, recognising a path like 
{{destdir/__magic/attempt01/task01/__base/YEAR=2017/part-0000.orc}} and 
initiating the multipart PUT to {{destdir/YEAR=2017/part-0000.orc}}. In 
{{stream.close()}}, rather than complete the PUT, it just [persists the commit 
data|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L95],
 here to 
{{destdir/__magic/attempt01/task01/__base/YEAR=2017/part-0000.orc.pending}}, 
and commits it later. After your patch it now also saves a 0-byte file to that 
initial output path, so the getFileStatus succeeds, just returning 0 bytes. 
This committer should be faster, as it uploads the output in 128 MB blocks 
during {{OutputStream.write()}}, whereas the staging committer has a task 
commit delay of {{data/bandwidth}} and requires VMs to have enough disk space 
for all active worker threads. But the magic committer is the stranger one. If 
it can be made to work, it's also likely to be the strategy for a WASB 
committer.
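The path remapping the magic committer relies on can be sketched like this. Illustrative Python only; the real logic lives in the HADOOP-13786 branch linked above, and the function name here is invented:

```python
# Map a path written under __magic/.../__base/ to its final destination:
# everything before __magic is the dest root, everything after __base is
# the relative path under it.
def magic_destination(path):
    parts = path.split("/")
    if "__magic" not in parts:
        return path                    # not a magic path: write in place
    dest_root = parts[:parts.index("__magic")]
    if "__base" in parts:
        relative = parts[parts.index("__base") + 1:]
    else:
        relative = parts[-1:]          # no __base marker: keep the filename
    return "/".join(dest_root + relative)

print(magic_destination(
    "destdir/__magic/attempt01/task01/__base/YEAR=2017/part-0000.orc"))
# -> destdir/YEAR=2017/part-0000.orc
```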

Even with my 0-byte PUT, or if I revert to the classic-and-unreliable 
rename-based committer, turning on negative GET/HEAD caching means the HEAD 
check straight after a PUT completes can fail. That's going to surface 
sporadically in production whenever anything writes to S3, no matter what it is 
trying to do.

I'll do a quick patch. I don't think I can easily write a test for it though, 
other than through the fault-injecting S3A client, which isn't ready to play 
with yet, even if you wanted to go near it.




> Add an operator for writing data out
> ------------------------------------
>
>                 Key: SPARK-20703
>                 URL: https://issues.apache.org/jira/browse/SPARK-20703
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.3.0
>
>
> We should add an operator for writing data out. Right now in the explain plan 
> / UI there is no way to tell whether a query is writing data out, and also 
> there is no way to associate metrics with data writes. It'd be tremendously 
> valuable to do this for adding metrics and for visibility.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
