[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Syed Shameerur Rahman updated HADOOP-18776:
-------------------------------------------
Description:
The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter*. It is a variant of the S3A Magic committer that delivers better performance in exchange for a few tradeoffs.

The following table lists the differences between the MagicCommitter and the OptimizedMagicCommitter:
||Operation||Magic Committer||*OptimizedS3AMagicCommitter*||
|commitTask|1. Lists all {{.pending}} files in its attempt directory.
2. The contents are loaded into a list of single pending uploads.
3. The list is saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory.
2. The contents are loaded into a list of single pending uploads.
3. For each pending upload, the commit operation (complete multipart upload) is called.|
|commitJob|1. Loads all {{.pendingset}} files in its job attempt directory.
2. Every pending commit in the job is committed.
3. The "SUCCESS" marker is created (if enabled in the configuration).
4. The "__magic" directory is cleaned up.|1. The "SUCCESS" marker is created (if enabled in the configuration).
2. The "__magic" directory is cleaned up.|

*Performance Benefits :-*
# The primary performance boost comes from distributing the complete multipart upload calls across the task attempts (task containers/executors) instead of issuing all of them from the single job driver. With the MagicCommitter, the driver completes every upload itself, so job commit time grows as O(files/threads), where threads is the number of committer threads in the driver.
# It also saves the S3 calls needed to PUT the {{.pendingset}} files and to READ them back in the job driver.

*TradeOffs :-*
The tradeoffs are similar to those of the FileOutputCommitter V2 algorithm. Users migrating from FileOutputCommitter V2 to the OptimizedS3AMagicCommitter will not see any behavioral change:
# During execution, intermediate data becomes visible as soon as the commitTask operation completes.
# On a failure, all output must be deleted and the job must be restarted.
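To make the table concrete, here is a minimal sketch built on a toy in-memory model, not the real S3A classes: {{CommitFlowSketch}}, {{S3Calls}} and the per-task pending counts are hypothetical names used only for illustration. It records where the complete multipart upload calls are issued and how many {{.pendingset}} PUT/GET requests each flow makes. Listing the {{.pending}} files, the SUCCESS marker and the "__magic" cleanup are deliberately not counted.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two commit flows (hypothetical; not the S3A implementation). */
public class CommitFlowSketch {

    /** Request counters for the toy model. */
    public static final class S3Calls {
        public int completeInTasks;   // completeMultipartUpload issued from task attempts
        public int completeInDriver;  // completeMultipartUpload issued from the job driver
        public int pendingsetPut;     // PUT of a .pendingset file (task commit)
        public int pendingsetGet;     // GET of a .pendingset file (job commit)
    }

    /** Magic committer: tasks persist their pending uploads; the driver completes them. */
    public static S3Calls magic(List<Integer> pendingPerTask) {
        S3Calls calls = new S3Calls();
        List<Integer> pendingsets = new ArrayList<>();
        // commitTask: aggregate each task's .pending files into one .pendingset
        for (int pending : pendingPerTask) {
            pendingsets.add(pending);
            calls.pendingsetPut++;
        }
        // commitJob: read every .pendingset back and complete each upload in the driver
        for (int pending : pendingsets) {
            calls.pendingsetGet++;
            calls.completeInDriver += pending;
        }
        return calls;
    }

    /** Optimized committer: each task completes its own uploads during commitTask. */
    public static S3Calls optimized(List<Integer> pendingPerTask) {
        S3Calls calls = new S3Calls();
        for (int pending : pendingPerTask) {
            calls.completeInTasks += pending; // the task's files become visible here
        }
        // commitJob: only the SUCCESS marker and __magic cleanup remain (not counted)
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> tasks = List.of(3, 5, 2); // hypothetical pending uploads per task
        for (String name : List.of("magic", "optimized")) {
            S3Calls c = name.equals("magic") ? magic(tasks) : optimized(tasks);
            System.out.printf("%-9s driver=%d tasks=%d put=%d get=%d%n",
                name, c.completeInDriver, c.completeInTasks, c.pendingsetPut, c.pendingsetGet);
        }
    }
}
{code}

With three tasks holding 3, 5 and 2 pending uploads, both flows complete the same 10 uploads. The Magic flow completes them in the driver and adds 3 PUTs and 3 GETs of {{.pendingset}} files; the optimized flow completes them inside the tasks with no {{.pendingset}} traffic, which is also why task output becomes visible before job commit.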
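The O(files/threads) remark can be illustrated with an idealized back-of-envelope model. This is an assumption, not the committer's code: it treats every complete multipart upload call as one equal-length "round", spreads files evenly, and assumes all workers issue calls at the same time. {{CommitCostModel}} and its parameters (driver threads, executors, concurrent task slots) are hypothetical.

{code:java}
/**
 * Idealized model (hypothetical, not the S3A implementation): every
 * completeMultipartUpload call takes one "round", files are spread evenly,
 * and all available workers issue calls in parallel.
 */
public class CommitCostModel {

    /** ceil(a / b) for positive ints. */
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** Magic committer: the job driver completes every upload with its own thread pool. */
    public static int driverRounds(int files, int driverThreads) {
        return ceilDiv(files, driverThreads); // O(files / threads)
    }

    /** Optimized committer: uploads are completed by the tasks that wrote them. */
    public static int distributedRounds(int files, int executors, int slotsPerExecutor) {
        // assumes one committing task per slot, each completing its files sequentially
        return ceilDiv(files, executors * slotsPerExecutor);
    }

    public static void main(String[] args) {
        int files = 3000; // roughly the file count of the benchmark query
        System.out.println("driver rounds:      " + driverRounds(files, 8));
        System.out.println("distributed rounds: " + distributedRounds(files, 5, 16));
    }
}
{code}

With the hypothetical values above (8 driver threads versus 5 executors with 16 task slots each), the driver needs 375 rounds while the distributed commit needs 38. Real timings also depend on S3 latency, throttling and task scheduling, which this model ignores.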
*Performance Benchmark :-*
Cluster : c4.8xlarge (EC2 instance)
Instances : 1 (primary) + 5 (core)
Data Size : 3 TB, partitioned (TPC-DS store_sales data)
Engine : Apache Spark 3.3.1 / Hadoop 3.3.3
Query: The following query inserts 3000+ files into the table directory (run for 3 iterations):
{code:sql}
insert into <table> select ss_quantity from store_sales;
{code}
Run time per iteration (lower is better):
||Committer||Iteration 1||Iteration 2||Iteration 3||
|Magic|126|127|122|
|OptimizedMagic|50|51|58|

On average (125 vs. 53), the OptimizedMagicCommitter was *~2.3x* faster than the MagicCommitter.

_*Note: Unlike the MagicCommitter, the OptimizedMagicCommitter is not suitable for cases where the user requires the guarantee that files are not visible after a failure. Given the performance benefit, users may choose it if they do not require this guarantee or have some mechanism to clean up the data before retrying.*_

was:
(Identical to the description above, except that the benchmark Engine line read "Apache Spark 3.3.1" without "/ Hadoop 3.3.3".)

> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-18776
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18776
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
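The ~2.3x figure in the benchmark can be re-derived from its table. The sketch below only averages the reported per-iteration numbers (the issue does not state their unit) and divides the two means; {{BenchmarkSpeedup}} is a hypothetical helper name.

{code:java}
import java.util.Arrays;
import java.util.Locale;

/** Recomputes the benchmark averages and speedup from the reported values. */
public class BenchmarkSpeedup {

    /** Arithmetic mean of the per-iteration values. */
    public static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] magic = {126, 127, 122};    // Magic committer, iterations 1-3
        double[] optimized = {50, 51, 58};   // OptimizedMagic committer, iterations 1-3
        double m = mean(magic);
        double o = mean(optimized);
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale
        System.out.printf(Locale.ROOT, "magic=%.1f optimized=%.1f speedup=%.2fx%n", m, o, m / o);
    }
}
{code}

It prints {{magic=125.0 optimized=53.0 speedup=2.36x}}, consistent with the ~2.3x quoted in the benchmark.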