[ 
https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858874#comment-15858874
 ] 

Aaron Fabbri commented on HADOOP-14041:
---------------------------------------

Thank you for the patch.  Overall it looks pretty good.  A couple of things 
need addressing.  The core-default and the InterruptedException are the 
important ones.

{noformat}
+  @InterfaceStability.Unstable
+  public static final String S3GUARD_CLI_PRUNE_AGE =
+      "fs.s3a.s3guard.cli.prune.age";
{noformat}

Probably want a snippet in core-default.xml.

{noformat}
+  @InterfaceStability.Unstable
+  public static final String S3GUARD_DDB_BATCH_SLEEP_MSEC_KEY =
+      "fs.s3a.s3guard.ddb.batch.sleep";
+  public static final int S3GUARD_DDB_BATCH_SLEEP_MSEC_DEFAULT = 25;
{noformat}

Same here.  Also wondering if we should call this "...ddb.prune.batch.sleep" as 
to not cause confusion with stuff like HADOOP-13904.  I think prune is going to 
remain a special case since it is a background priority job.  We could also 
call it "...ddb.background.sleep" to future-proof it for other background tasks 
(e.g. if we introduced an background scrubber or integrity checker?

{noformat}
+    deletionBatch.add(path);
+          if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) {
+            Thread.sleep(delay);
+            processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
+          }
+        } catch (IOException e) {
+          LOG.error(e.getMessage());
+        }
+        if (deletionBatch.size() > 0) {
+          Thread.sleep(delay);
+          processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
+        }
{noformat}

Minor nit: I would make sleep happen between batches (not before the first).  
e.g. 

{noformat}
long batchCount = 0;
...
deletionBatch.add(path);
if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) {
    if (batchCount++ > 0) {    // don't sleep before first batch
        Thread.sleep(delay); 
    }
    processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
...
{noformat}

You could also use that for an interesting log message {{LOG.debug("Finished 
processing {} batches", batchCount);}}

{noformat}
+    } catch (InterruptedException e) {
+      LOG.warn("Pruning operation was interrupted!");
+    }
{noformat}
You need to propagate this exception, or set the threads' interrupt status.






> CLI command to prune old metadata
> ---------------------------------
>
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch, 
> HADOOP-14041-HADOOP-13345.002.patch, HADOOP-14041-HADOOP-13345.003.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune 
> metadata that hasn't been modified for an extended period of time. Since the 
> primary use-case targeted at the moment is list consistency, it would make 
> sense (especially when authoritative=false) to prune metadata that is 
> expected to have become consistent a long time ago.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to