[ https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858874#comment-15858874 ]
Aaron Fabbri commented on HADOOP-14041: --------------------------------------- Thank you for the patch. Overall it looks pretty good. A couple of things need addressing. The core-default and the InterruptedException are the important ones. {noformat} + @InterfaceStability.Unstable + public static final String S3GUARD_CLI_PRUNE_AGE = + "fs.s3a.s3guard.cli.prune.age"; {noformat} Probably want a snippet in core-default.xml. {noformat} + @InterfaceStability.Unstable + public static final String S3GUARD_DDB_BATCH_SLEEP_MSEC_KEY = + "fs.s3a.s3guard.ddb.batch.sleep"; + public static final int S3GUARD_DDB_BATCH_SLEEP_MSEC_DEFAULT = 25; {noformat} Same here. Also wondering if we should call this "...ddb.prune.batch.sleep" as to not cause confusion with stuff like HADOOP-13904. I think prune is going to remain a special case since it is a background priority job. We could also call it "...ddb.background.sleep" to future-proof it for other background tasks (e.g. if we introduced an background scrubber or integrity checker? {noformat} + deletionBatch.add(path); + if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) { + Thread.sleep(delay); + processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]); + } + } catch (IOException e) { + LOG.error(e.getMessage()); + } + if (deletionBatch.size() > 0) { + Thread.sleep(delay); + processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]); + } {noformat} Minor nit: I would make sleep happen between batches (not before the first). e.g. {noformat} long batchCount = 0; ... deletionBatch.add(path); if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) { if (batchCount++ > 0) { // don't sleep before first batch Thread.sleep(delay); } processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]); ... {noformat} You could also use that for an interesting log message {{LOG.debug("Finished processing {} batches", batchCount);}} {noformat} + } catch (InterruptedException e) { + LOG.warn("Pruning operation was interrupted!"); + } {noformat} You need to propagate this exception, or set the threads' interrupt status. > CLI command to prune old metadata > --------------------------------- > > Key: HADOOP-14041 > URL: https://issues.apache.org/jira/browse/HADOOP-14041 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Reporter: Sean Mackrory > Assignee: Sean Mackrory > Attachments: HADOOP-14041-HADOOP-13345.001.patch, > HADOOP-14041-HADOOP-13345.002.patch, HADOOP-14041-HADOOP-13345.003.patch > > > Add a CLI command that allows users to specify an age at which to prune > metadata that hasn't been modified for an extended period of time. Since the > primary use-case targeted at the moment is list consistency, it would make > sense (especially when authoritative=false) to prune metadata that is > expected to have become consistent a long time ago. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org