[ https://issues.apache.org/jira/browse/HADOOP-17833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-17833:
------------------------------------
    Summary: Improve Magic Committer Performance  (was: createFile() under a magic path to skip all probes for file/dir at end of path)

> Improve Magic Committer Performance
> -----------------------------------
>
>                 Key: HADOOP-17833
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17833
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Magic committer tasks can be slow because every file created with
> overwrite=false triggers a HEAD (to verify there's no file) and a LIST (to
> verify there's no dir). And because files under a magic path are only
> manifested later, the check may not behave as expected anyway.
> ParquetOutputFormat is one example of a library which does this.
> We could fix Parquet to use overwrite=true, but (a) there may be surprises in
> other uses, (b) it'd still leave the LIST, and (c) it would do nothing for
> other formats which make the same call.
> Proposed: createFile() under a magic path to skip all probes for file/dir at
> end of path.
> Only a single task attempt will be writing to that directory, and it should
> know what it is doing. If there are conflicting file names and parts across
> tasks, they won't even get picked up at this point. Oh, and none of the
> committers ever check for this: you'll get the last file manifested (s3a) or
> renamed (file).
> If we skip the checks we will save 2 HTTP requests/file.
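The request arithmetic in the description can be sketched as a small toy model. This is not the S3A implementation: the class `MagicProbeModel`, its methods, and the `skipMagicProbes` flag are hypothetical names used only for illustration, and the sketch assumes a magic path is recognised by a `__magic` path element. It counts only the HEAD and LIST probes the issue mentions.

```java
import java.util.List;

/**
 * Toy model of the pre-create probes described in the issue.
 * Hypothetical illustration only; none of these names exist in Hadoop.
 */
final class MagicProbeModel {

    /** Path element assumed to mark a magic committer path. */
    static final String MAGIC = "__magic";

    private MagicProbeModel() {
    }

    /** Returns true if any element of the path is the magic marker. */
    static boolean isUnderMagicPath(String path) {
        for (String element : path.split("/")) {
            if (MAGIC.equals(element)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Number of HTTP probes issued before a file is created.
     * Current behaviour per the issue: LIST always, plus HEAD when overwrite=false.
     * Proposed behaviour: no probes at all for files under a magic path.
     */
    static int probesBeforeCreate(String path, boolean overwrite, boolean skipMagicProbes) {
        if (skipMagicProbes && isUnderMagicPath(path)) {
            return 0;
        }
        int probes = 1;          // LIST: verify there is no directory at the path
        if (!overwrite) {
            probes++;            // HEAD: verify there is no file at the path
        }
        return probes;
    }

    /** Total probes for a batch of files created with the same settings. */
    static int totalProbes(List<String> paths, boolean overwrite, boolean skipMagicProbes) {
        int total = 0;
        for (String path : paths) {
            total += probesBeforeCreate(path, overwrite, skipMagicProbes);
        }
        return total;
    }
}
```

Under this model, a task writing with overwrite=false pays two probes per file today and one with overwrite=true, while the proposed skip removes both for any file below the magic path and leaves other paths unchanged.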