[ 
https://issues.apache.org/jira/browse/HADOOP-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456209#comment-16456209
 ] 

Steve Loughran commented on HADOOP-15421:
-----------------------------------------

+ [~rdblue], [~jzhuge]. I know Iceberg has its own format, which uses unique 
filenames to avoid update inconsistency, but they might have some suggestions 
here.

Current format: 
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java

There has always been a version marker in the file, so that we can switch to a 
new format and tests can detect the change by checking the version field alone.
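
To illustrate the kind of check a test could make, here is a minimal Python 
sketch that classifies the contents of a _SUCCESS file by its version field 
alone. The field name "version", the supported value 1 and the helper name 
check_success_marker are assumptions made for illustration; the authoritative 
definition is the SuccessData class linked above. The sketch also allows for 
the zero-byte _SUCCESS marker written by the classic file output committers.

```python
import json

# Assumed value for illustration; the real constant is defined in SuccessData.java.
SUPPORTED_VERSION = 1


def check_success_marker(text):
    """Classify the contents of a _SUCCESS file using only its version field."""
    if not text.strip():
        # A zero-byte marker carries no JSON summary at all.
        return "empty"
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "not-json"
    if not isinstance(data, dict):
        # Valid JSON, but not an object that could hold a version field.
        return "not-json"
    version = data.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, int) and not isinstance(version, bool):
        if version == SUPPORTED_VERSION:
            return "supported"
    return "unsupported"


if __name__ == "__main__":
    sample = '{"version": 1, "filenames": []}'  # hypothetical marker contents
    print(check_success_marker(sample))
```

A test that only needs to know whether it understands the file can stop at 
this check and skip the detailed assertions when the result is anything other 
than "supported".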

> Stabilise/formalise the JSON _SUCCESS format used in the S3A committers
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-15421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15421
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> The S3A committers rely on an atomic PUT to save a JSON summary of the job to 
> the destination FS, containing files, statistics, etc. This is for internal 
> testing, but it turns out to be useful for Spark integration testing, Hive, etc.
> IBM's Stocator also generated a manifest.
> Proposed: come up with an (extensible) design that we are happy with as a 
> long-lived format.



