[jira] [Commented] (HADOOP-16830) Add public IOStatistics API; S3A to support

2020-07-22 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163131#comment-17163131
 ] 

Aaron Fabbri commented on HADOOP-16830:
---

Been following along. I should be able to finish a review by the end of the 
week. Wonder if we could get [~mackrorysd] to skim over the S3A stats changes?

> Add public IOStatistics API; S3A to support
> ---
>
> Key: HADOOP-16830
> URL: https://issues.apache.org/jira/browse/HADOOP-16830
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Applications want to collect statistics on the specific operations performed 
> during the execution of FS API calls by their individual worker threads, and 
> return these to their job driver.
> * S3A has a statistics API for some streams, but it's a non-standard one; 
> Impala etc. can't use it
> * FileSystem storage statistics are public, but as they aren't cross-thread, 
> they don't aggregate properly
> Proposed
> # A new IOStatistics interface to serve up statistics
> # S3A to implement it
> # other stores to follow
> # Pass-through from the usual wrapper classes (FS data input/output streams)
> It's hard to work out how best to offer an API for operation-context stats, 
> and how to actually implement it.
> ThreadLocal isn't enough because the helper threads need to update the 
> thread-local value of the instigating thread.
> My initial PoC doesn't address that issue, but it shows what I'm thinking of.
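To make the proposal concrete, here is a rough sketch of the kind of statistics 
interface and probe being described. The names (IOStatistics, 
IOStatisticsSource, the counter key) are illustrative placeholders, not the 
final API:

{code}
import java.util.Map;

interface IOStatistics {
  /** Snapshot of counter name -> value, e.g. "stream_read_operations" -> 42. */
  Map<String, Long> counters();
}

interface IOStatisticsSource {
  /** Return statistics, or null if this source does not collect any. */
  default IOStatistics getIOStatistics() {
    return null;
  }
}

final class IOStatisticsSupport {
  private IOStatisticsSupport() {
  }

  /** Probe a stream/filesystem for statistics without knowing its concrete type. */
  static IOStatistics retrieve(Object source) {
    return source instanceof IOStatisticsSource
        ? ((IOStatisticsSource) source).getIOStatistics()
        : null;
  }
}
{code}

The pass-through idea in item 4 would then be the wrapper classes implementing 
the same source interface and delegating to the wrapped stream.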



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16798) job commit failure in S3A MR magic committer test

2020-07-07 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153199#comment-17153199
 ] 

Aaron Fabbri commented on HADOOP-16798:
---

I missed the party on this one, but I just had a thought: did you consider 
inserting a failure point that hangs one of the commit threads when they POST 
data? Either delay the POST or the response? Would that make it easier to 
reproduce these cases?

Thanks for the fix.

> job commit failure in S3A MR magic committer test
> -
>
> Key: HADOOP-16798
> URL: https://issues.apache.org/jira/browse/HADOOP-16798
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: stdout
>
>
> failure in 
> {code}
> ITestS3ACommitterMRJob.test_200_execute:304->Assert.fail:88 Job 
> job_1578669113137_0003 failed in state FAILED with cause Job commit failed: 
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.FutureTask@6e894de2 rejected from 
> org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@225eed53[Terminated,
>  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
> {code}
> Stack implies thread pool rejected it, but toString says "Terminated". Race 
> condition?
> *update 2020-04-22*: it's caused when a task is aborted in the AM: the 
> threadpool is disposed of, and while that is shutting down in one thread, 
> task commit is initiated using the same thread pool. When the task 
> committer's destroy operation times out, it kills all the active uploads.
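The rejection itself is easy to reproduce outside Hadoop: submitting to an 
ExecutorService that has already been shut down fails in exactly this way. A 
minimal standalone illustration of the failure mode (not the MR/committer code 
itself):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedAfterShutdown {
  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.shutdownNow();   // pool is disposed of in one code path...
    try {
      // ...while another code path still submits commit work to it
      pool.submit(() -> System.out.println("commit work"));
    } catch (RejectedExecutionException e) {
      // Same failure mode as the job commit: task rejected from a Terminated pool.
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}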



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13230) S3A to optionally retain directory markers; look under a marker for files when needEmptyDir=true

2020-03-10 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056471#comment-17056471
 ] 

Aaron Fabbri commented on HADOOP-13230:
---

[~ste...@apache.org] I'll take a look if you want. You can always Download As 
-> PDF and post that here.

> S3A to optionally retain directory markers; look under a marker for files 
> when needEmptyDir=true
> 
>
> Key: HADOOP-13230
> URL: https://issues.apache.org/jira/browse/HADOOP-13230
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Aaron Fabbri
>Priority: Major
>
> Users of s3a may not realize that, in some cases, it does not interoperate 
> well with other s3 tools, such as the AWS CLI (see HIVE-13778, IMPALA-3558).
> Specifically, if a user:
> - Creates an empty directory with hadoop fs -mkdir s3a://bucket/path
> - Copies data into that directory via another tool, e.g. the AWS CLI.
> - Tries to access the data in that directory with any Hadoop software.
> Then the last step fails because the fake empty-directory blob that s3a wrote 
> in the first step causes s3a (listStatus() etc.) to continue to treat that 
> directory as empty, even though the second step was supposed to populate the 
> directory with data.
> I wanted to document this fact for users. We may mark this as won't-fix, "by 
> design". It may also be interesting to brainstorm solutions and/or a config 
> option to change the behavior if folks care.
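A hedged sketch of the reported sequence using the Hadoop FileSystem API 
(bucket name and configuration are placeholders); the point is only that the 
listing in the last step can come back empty because of the leftover marker 
object:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerRepro {
  public static void main(String[] args) throws Exception {
    Path dir = new Path("s3a://example-bucket/path");   // placeholder bucket
    FileSystem fs = FileSystem.get(dir.toUri(), new Configuration());

    fs.mkdirs(dir);   // step 1: writes a fake empty-directory marker object

    // step 2 happens outside Hadoop: another tool (e.g. the AWS CLI) copies
    // objects under s3a://example-bucket/path/ without deleting the marker.

    FileStatus[] listing = fs.listStatus(dir);   // step 3: may still treat the dir as empty
    System.out.println("entries seen: " + listing.length);
  }
}
{code}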



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16792) Let s3 clients configure request timeout

2020-01-08 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011401#comment-17011401
 ] 

Aaron Fabbri commented on HADOOP-16792:
---

Hi [~mustafaiman]. All the site docs live here: 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/

You can see these published at apache.org 
[here|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html].
 Also note the defaults and descriptions in 
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

> Let s3 clients configure request timeout
> 
>
> Key: HADOOP-16792
> URL: https://issues.apache.org/jira/browse/HADOOP-16792
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> S3 does not guarantee latency. Every once in a while a request may straggle 
> and drive up latency for the larger operation. In these cases, simply timing 
> out the individual request is beneficial so that the client application can 
> retry; the retry tends to complete faster than the original straggling 
> request most of the time. Others have experienced this issue too: 
> [https://arxiv.org/pdf/1911.11727.pdf].
> The S3 client configuration already provides a timeout facility via 
> `ClientConfiguration#setTimeout`. Exposing this configuration is beneficial 
> for latency-sensitive applications. The S3 client configuration is shared 
> with the DynamoDB client, which is also affected by unreliable worst-case 
> latency.
>  
>  
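For reference, a rough sketch of how a per-request timeout can be set on the 
AWS SDK for Java v1 client configuration and passed to the S3 client builder. 
The setter names are as I recall them from the 1.11 SDK line and the timeout 
values and region are illustrative only; worth double-checking against the SDK 
version actually bundled with Hadoop:

{code}
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class RequestTimeoutExample {
  public static void main(String[] args) {
    ClientConfiguration conf = new ClientConfiguration();
    conf.setRequestTimeout(30_000);           // fail an individual HTTP request after 30s
    conf.setClientExecutionTimeout(120_000);  // cap the whole call, including SDK retries

    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withClientConfiguration(conf)
        .withRegion("us-west-2")              // placeholder region
        .build();
    // A straggling request now times out and can be retried, instead of
    // stalling the larger job.
  }
}
{code}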



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16725) s3guard prune can delete directories -leaving orphan children.

2019-11-25 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981786#comment-16981786
 ] 

Aaron Fabbri commented on HADOOP-16725:
---

Sounds good. Thanks for running through and checking this out.

> s3guard prune can delete directories -leaving orphan children.
> --
>
> Key: HADOOP-16725
> URL: https://issues.apache.org/jira/browse/HADOOP-16725
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> When s3guard prune is invoked to delete entries not updated since a specific 
> time, it doesn't check whether an expired directory entry has any children. 
> As a result, if a child is newer than the cut-off date, the dir entry can be 
> removed but not the child. This can leave S3Guard in an inconsistent state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16605) NPE in TestAdlSdkConfiguration failing in yetus

2019-10-02 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943302#comment-16943302
 ] 

Aaron Fabbri commented on HADOOP-16605:
---

PR looks good to me. +1

> NPE in TestAdlSdkConfiguration failing in yetus
> ---
>
> Key: HADOOP-16605
> URL: https://issues.apache.org/jira/browse/HADOOP-16605
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> Yetus builds are failing with an NPE in TestAdlSdkConfiguration if they go 
> near hadoop-azure-datalake. Assuming HADOOP-16438 is the cause until proven 
> otherwise, though HADOOP-16371 may have done something too (how?), something 
> which wasn't picked up, as yetus didn't know that hadoop-azure-datalake was 
> affected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15691) Add PathCapabilities to FS and FC to complement StreamCapabilities

2019-09-20 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934199#comment-16934199
 ] 

Aaron Fabbri commented on HADOOP-15691:
---

+1 on the latest pull request, pending a clean precommit result. This has been 
long overdue; thanks for the contribution, and thanks for reviewing, 
[~adam.antal].

 

> Add PathCapabilities to FS and FC to complement StreamCapabilities
> --
>
> Key: HADOOP-15691
> URL: https://issues.apache.org/jira/browse/HADOOP-15691
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15691-001.patch, HADOOP-15691-002.patch, 
> HADOOP-15691-003.patch, HADOOP-15691-004.patch
>
>
> Add a {{PathCapabilities}} interface to both FileSystem and FileContext so 
> that callers can probe, through either API, the capabilities available under 
> a given path of a filesystem.
> This is needed for:
> * HADOOP-14707: declaring that a destination FS supports permissions
> * object stores declaring that they offer PUT-in-place alongside (slow) 
> rename
> * anything else where the implementation semantics of an FS are so different 
> that caller apps would benefit from probing for the underlying semantics
> I know, we want all filesystems to work *exactly* the same. But that doesn't 
> hold, especially for object stores, and to use them efficiently, callers need 
> to be able to ask for specific features.
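Once this lands, a caller-side probe might look roughly like the sketch below. 
The path and the capability key string are illustrative only; the real 
capability constants live wherever the final API defines them:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CapabilityProbe {
  public static void main(String[] args) throws Exception {
    Path dest = new Path("s3a://example-bucket/output");   // placeholder path
    FileSystem fs = FileSystem.get(dest.toUri(), new Configuration());

    // Illustrative capability key: ask whether permissions/ACLs are actually
    // enforced at this destination before relying on them.
    boolean permissionsEnforced =
        fs.hasPathCapability(dest, "fs.capability.paths.acls");
    System.out.println("permissions enforced at " + dest + "? "
        + permissionsEnforced);
  }
}
{code}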



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16547) s3guard prune command doesn't get AWS auth chain from FS

2019-09-17 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931680#comment-16931680
 ] 

Aaron Fabbri commented on HADOOP-16547:
---

Current patch is +1 LGTM after you address [~gabor.bota]'s comment about adding 
a test.

 

> s3guard prune command doesn't get AWS auth chain from FS
> 
>
> Key: HADOOP-16547
> URL: https://issues.apache.org/jira/browse/HADOOP-16547
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> The s3guard prune command doesn't get the AWS auth chain from any FS, so it 
> just drives the DDB store from the conf settings. If S3A is set up to use 
> delegation tokens, then the DTs/custom AWS auth sequence is not picked up, so 
> you get an auth failure.
> Fix:
> # instantiate the FS before calling initMetadataStore
> # review other commands to make sure the problem isn't replicated



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16430) S3AFilesystem.delete to incrementally update s3guard with deletions

2019-09-04 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922903#comment-16922903
 ] 

Aaron Fabbri commented on HADOOP-16430:
---

Latest PR commits reviewed (+1). Aside: it's nice not having to run a diff on 
a diff just to see what changed, and to be able to follow your commit history 
in the PR.

> S3AFilesystem.delete to incrementally update s3guard with deletions
> ---
>
> Key: HADOOP-16430
> URL: https://issues.apache.org/jira/browse/HADOOP-16430
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screenshot 2019-07-16 at 22.08.31.png
>
>
> Currently S3AFilesystem.delete() only updates S3Guard at the end of a paged 
> delete operation. This makes it slow when there are many thousands of files 
> to delete, and increases the window of vulnerability to failures.
> Preferred:
> * after every bulk DELETE call is issued to S3, queue the (async) delete of 
> all entries in that batch.
> * at the end of the delete, await the completion of these operations.
> * inside S3AFS, also do the delete across threads, so that different HTTPS 
> connections can be used.
> This should maximise DDB throughput against tables which aren't IO limited.
> When executed against small, IOP-limited tables, the parallel DDB DELETE 
> batches will trigger a lot of throttling events; we should make sure these 
> aren't going to trigger failures.
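A hedged sketch of the queue-then-await pattern described above, using plain 
java.util.concurrent and a stand-in MetadataStore interface rather than the 
real S3A/S3Guard classes; issueBulkDeleteToS3 is a placeholder for the actual 
bulk delete request:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IncrementalDeleteSketch {

  interface MetadataStore {               // stand-in for the real S3Guard store
    void delete(String key);
  }

  static void deletePaged(List<List<String>> pages, MetadataStore store) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    try {
      for (List<String> page : pages) {
        issueBulkDeleteToS3(page);        // placeholder: one bulk DELETE request per page
        // queue the async metadata-store updates for this page immediately
        pending.add(CompletableFuture.runAsync(
            () -> page.forEach(store::delete), pool));
      }
      // at the end of the delete, await completion of all queued updates
      CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    } finally {
      pool.shutdown();
    }
  }

  private static void issueBulkDeleteToS3(List<String> keys) {
    // placeholder for the real S3 bulk delete call
  }
}
{code}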



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16430) S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918773#comment-16918773
 ] 

Aaron Fabbri commented on HADOOP-16430:
---

+1 LGTM after your clarifying comments. Thanks for the contribution.

> S3AFilesystem.delete to incrementally update s3guard with deletions
> ---
>
> Key: HADOOP-16430
> URL: https://issues.apache.org/jira/browse/HADOOP-16430
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screenshot 2019-07-16 at 22.08.31.png
>
>
> Currently S3AFilesystem.delete() only updates S3Guard at the end of a paged 
> delete operation. This makes it slow when there are many thousands of files 
> to delete, and increases the window of vulnerability to failures.
> Preferred:
> * after every bulk DELETE call is issued to S3, queue the (async) delete of 
> all entries in that batch.
> * at the end of the delete, await the completion of these operations.
> * inside S3AFS, also do the delete across threads, so that different HTTPS 
> connections can be used.
> This should maximise DDB throughput against tables which aren't IO limited.
> When executed against small, IOP-limited tables, the parallel DDB DELETE 
> batches will trigger a lot of throttling events; we should make sure these 
> aren't going to trigger failures.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16430) S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-28 Thread Aaron Fabbri (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918167#comment-16918167
 ] 

Aaron Fabbri commented on HADOOP-16430:
---

Thanks for the work on this. Looks pretty good. Made a couple of comments on 
the PR.

> S3AFilesystem.delete to incrementally update s3guard with deletions
> ---
>
> Key: HADOOP-16430
> URL: https://issues.apache.org/jira/browse/HADOOP-16430
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: Screenshot 2019-07-16 at 22.08.31.png
>
>
> Currently S3AFilesystem.delete() only updates S3Guard at the end of a paged 
> delete operation. This makes it slow when there are many thousands of files 
> to delete, and increases the window of vulnerability to failures.
> Preferred:
> * after every bulk DELETE call is issued to S3, queue the (async) delete of 
> all entries in that batch.
> * at the end of the delete, await the completion of these operations.
> * inside S3AFS, also do the delete across threads, so that different HTTPS 
> connections can be used.
> This should maximise DDB throughput against tables which aren't IO limited.
> When executed against small, IOP-limited tables, the parallel DDB DELETE 
> batches will trigger a lot of throttling events; we should make sure these 
> aren't going to trigger failures.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16382) Clock skew can cause S3Guard to think object metadata is out of date

2019-06-19 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868071#comment-16868071
 ] 

Aaron Fabbri commented on HADOOP-16382:
---

 
{quote}It'd be nice if we actually got a timestamp off AWS on the completion of 
the PUT
{quote}
Yeah, the Date field returned in the PUT [response is defined 
as|https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]
 the "time S3 service responded", and I don't see any mention of whether or not 
this matches the S3 metadata time.

Patch looks good to me.

> Clock skew can cause S3Guard to think object metadata is out of date
> 
>
> Key: HADOOP-16382
> URL: https://issues.apache.org/jira/browse/HADOOP-16382
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When an S3Guard entry is added for an object, its last-updated flag is taken 
> from the local clock: if a getFileStatus is made immediately afterwards, the 
> timestamp of the file from the HEAD may be greater than the local time, so 
> the DDB entry gets updated.
> This happens even if the clocks are *close*. When updating an entry from S3, 
> the actual timestamp of the file should be used, not the local clock.
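A small sketch of the fix being suggested, with a hypothetical entry type; the 
point is just to stamp the entry from the object's own timestamp rather than 
the local clock:

{code}
import org.apache.hadoop.fs.FileStatus;

public class TimestampChoice {

  static class Entry {          // hypothetical S3Guard-style record
    long lastUpdated;
  }

  // Fragile: the local clock may lag the S3 HEAD timestamp, so a follow-up
  // getFileStatus looks "newer" and rewrites the entry.
  static void stampFromLocalClock(Entry e) {
    e.lastUpdated = System.currentTimeMillis();
  }

  // Suggested: use the object's own modification time as returned by S3.
  static void stampFromObject(Entry e, FileStatus statusFromS3) {
    e.lastUpdated = statusFromS3.getModificationTime();
  }
}
{code}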



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15183) S3Guard store becomes inconsistent after partial failure of rename

2019-06-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862604#comment-16862604
 ] 

Aaron Fabbri commented on HADOOP-15183:
---

Made it through PR 951 except two files I still need to review: 
DynamoDBMetadataStore and ITestPartialRenamesDeletes. Will try to resume by 
tomorrow.

> S3Guard store becomes inconsistent after partial failure of rename
> --
>
> Key: HADOOP-15183
> URL: https://issues.apache.org/jira/browse/HADOOP-15183
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15183-001.patch, HADOOP-15183-002.patch, 
> org.apache.hadoop.fs.s3a.auth.ITestAssumeRole-output.txt
>
>
> If an S3A rename() operation fails partway through, such as when the user 
> doesn't have permissions to delete the source files after copying to the 
> destination, then the s3guard view of the world ends up inconsistent. In 
> particular the sequence
>  (assuming src/file* is a list of files file1...file10 and read only to 
> caller)
>
> # create file rename src/file1 dest/ ; expect AccessDeniedException in the 
> delete, dest/file1 will exist
> # delete file dest/file1
> # rename src/file* dest/  ; expect failure
> # list dest; you will not see dest/file1
> You will not see file1 in the listing, presumably because it will have a 
> tombstone marker and the update at the end of the rename() didn't take place: 
> the old data is still there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16117) Upgrade to latest AWS SDK

2019-06-05 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857270#comment-16857270
 ] 

Aaron Fabbri commented on HADOOP-16117:
---

+1 LGTM for trunk. Agreed that extended testing is to be expected, as the AWS 
SDK has a history of subtle bugs.

> Upgrade to latest AWS SDK
> -
>
> Key: HADOOP-16117
> URL: https://issues.apache.org/jira/browse/HADOOP-16117
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Upgrade to the most recent AWS SDK. That's 1.11; even though there's a 2.0 
> out, moving to it would be a more significant upgrade, with downstream impact.
> The new [AWS SDK update 
> process|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update]
>  *must* be followed, and we should plan for 1-2 surprises afterwards anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-05 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857029#comment-16857029
 ] 

Aaron Fabbri edited comment on HADOOP-13980 at 6/5/19 8:55 PM:
---

Thanks for your draft of FSCK requirements [~ste...@apache.org]. This is a good 
start.

One thing that comes to mind: I don't know that we want to consider "auth mode" 
as a factor here.  Erring on the side of over-explaining this stuff for clarity:

There are two main authoritative mode flags in play:

(1) per-directory metastore bit that says "this directory is fully loaded into 
the metastore"

(2) s3a client config bit fs.s3a.metadatastore.authoritative, which allows s3a 
to short-circuit (skip) s3 on some metadata queries. This one is just a runtime 
client behavior flag. You could have multiple clients with different settings 
sharing a bucket. FSCK could also have a different config.  I think you'll 
still want some FSCK options to select the level of enforcement / paranoia as 
you outline, just don't think it needs to be conflated with client's allow auth 
flag. I'd imagine this as a growing set of invariant checks that can be 
categorized into something like basic / paranoid / full.

Whether or not a s3a client has metadatastore.authoritative bit set in its 
config doesn't really affect the contents of the metadata store or its 
relationship to the underlying storage (s3) state\*.  If the is_authoritative 
bit is set on a directory in the metastore, however, that directory listing 
from metadatastore should *match* the listing of that dir from s3. If the bit 
is not set, the metastore listing should be a subset of the s3 listing.

I would also split the consistency checks into two categories: 
MetadataStore-specific, and generic. Majority of the stuff here are generic 
tests that work with any MetadataStore. DDB also needs to check its internal 
consistency (since it uses the ancestor-exists invariant to avoid table scans).

Also agreed you'll need table scans here–but how do we expose this for FSCK 
only? FSCK traditionally reaches below the FS to check its structures. (e.g. 
ext3 fsck uses a block device below the ext3 fs to check on disk format, 
right?).

\* some nuance here, if we want to discuss further.


was (Author: fabbri):
Thanks for your draft of FSCK requirements [~ste...@apache.org]. This is a good 
start.

One thing that comes to mind: I don't know that we want to consider "auth mode" 
as a factor here.  Erring on the side of over-explaining this stuff for clarity:

There are two main authoritative mode flags in play:

(1) per-directory metastore bit that says "this directory is fully loaded into 
the metastore"

(2) s3a client config bit fs.s3a.metadatastore.authoritative, which allows s3a 
to short-circuit (skip) s3 on some metadata queries. This one is just a runtime 
client behavior flag. You could have multiple clients with different settings 
sharing a bucket. FSCK could also have a different config.  I think you'll 
still want some FSCK options to select the level of enforcement / paranoia as 
you outline, just don't think it needs to be conflated with client's allow auth 
flag. I'd imagine this as a growing set of invariant checks that can be 
categorized into something like basic / paranoid / full.

Whether or not a s3a client has metadatastore.authoritative bit set in its 
config doesn't really affect the contents of the metadata store or its 
relationship to the underlying storage (s3) state**.  If the is_authoritative 
bit is set on a directory in the metastore, however, that directory listing 
from metadatastore should *match* the listing of that dir from s3. If the bit 
is not set, the metastore listing should be a subset of the s3 listing.

I would also split the consistency checks into two categories: 
MetadataStore-specific, and generic. Majority of the stuff here are generic 
tests that work with any MetadataStore. DDB also needs to check its internal 
consistency (since it uses the ancestor-exists invariant to avoid table scans).

Also agreed you'll need table scans here–but how do we expose this for FSCK 
only? FSCK traditionally reaches below the FS to check its structures. (e.g. 
ext3 fsck uses a block device below the ext3 fs to check on disk format, 
right?).

 

** some nuance here, if we want to discuss further.

> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.

[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-06-05 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857029#comment-16857029
 ] 

Aaron Fabbri commented on HADOOP-13980:
---

Thanks for your draft of FSCK requirements [~ste...@apache.org]. This is a good 
start.

One thing that comes to mind: I don't know that we want to consider "auth mode" 
as a factor here.  Erring on the side of over-explaining this stuff for clarity:

There are two main authoritative mode flags in play:

(1) per-directory metastore bit that says "this directory is fully loaded into 
the metastore"

(2) s3a client config bit fs.s3a.metadatastore.authoritative, which allows s3a 
to short-circuit (skip) s3 on some metadata queries. This one is just a runtime 
client behavior flag. You could have multiple clients with different settings 
sharing a bucket. FSCK could also have a different config.  I think you'll 
still want some FSCK options to select the level of enforcement / paranoia as 
you outline, just don't think it needs to be conflated with client's allow auth 
flag. I'd imagine this as a growing set of invariant checks that can be 
categorized into something like basic / paranoid / full.

Whether or not a s3a client has metadatastore.authoritative bit set in its 
config doesn't really affect the contents of the metadata store or its 
relationship to the underlying storage (s3) state**.  If the is_authoritative 
bit is set on a directory in the metastore, however, that directory listing 
from metadatastore should *match* the listing of that dir from s3. If the bit 
is not set, the metastore listing should be a subset of the s3 listing.

I would also split the consistency checks into two categories: 
MetadataStore-specific, and generic. Majority of the stuff here are generic 
tests that work with any MetadataStore. DDB also needs to check its internal 
consistency (since it uses the ancestor-exists invariant to avoid table scans).

Also agreed you'll need table scans here–but how do we expose this for FSCK 
only? FSCK traditionally reaches below the FS to check its structures. (e.g. 
ext3 fsck uses a block device below the ext3 fs to check on disk format, 
right?).

 

** some nuance here, if we want to discuss further.
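A minimal sketch of the listing invariant described above (equality when 
is_authoritative is set, subset otherwise), using hypothetical listing helpers 
rather than the real MetadataStore/S3 APIs:

{code}
import java.util.Set;

public class ListingInvariantSketch {

  /**
   * Check one directory. Both sets are child names for the same path:
   * msListing from the metadata store, s3Listing from a real S3 LIST.
   */
  static boolean check(Set<String> msListing, Set<String> s3Listing,
      boolean isAuthoritative) {
    if (isAuthoritative) {
      // authoritative bit set: the metastore listing must match S3 exactly
      return msListing.equals(s3Listing);
    }
    // otherwise the metastore listing must be a subset of what S3 reports
    return s3Listing.containsAll(msListing);
  }
}
{code}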

> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15183) S3Guard store becomes inconsistent after partial failure of rename

2019-05-29 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851518#comment-16851518
 ] 

Aaron Fabbri commented on HADOOP-15183:
---

Looking good, [~ste...@apache.org]. A lot of nice improvements here, and it 
looks good to me so far. I still have a couple of files to work through (it's 
a large diff) in the latest PR.

> S3Guard store becomes inconsistent after partial failure of rename
> --
>
> Key: HADOOP-15183
> URL: https://issues.apache.org/jira/browse/HADOOP-15183
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15183-001.patch, HADOOP-15183-002.patch, 
> org.apache.hadoop.fs.s3a.auth.ITestAssumeRole-output.txt
>
>
> If an S3A rename() operation fails partway through, such as when the user 
> doesn't have permissions to delete the source files after copying to the 
> destination, then the s3guard view of the world ends up inconsistent. In 
> particular the sequence
>  (assuming src/file* is a list of files file1...file10 and read only to 
> caller)
>
> # create file rename src/file1 dest/ ; expect AccessDeniedException in the 
> delete, dest/file1 will exist
> # delete file dest/file1
> # rename src/file* dest/  ; expect failure
> # list dest; you will not see dest/file1
> You will not see file1 in the listing, presumably because it will have a 
> tombstone marker and the update at the end of the rename() didn't take place: 
> the old data is still there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16324) S3A Delegation Token code to spell "Marshalled" as Marshaled

2019-05-22 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846140#comment-16846140
 ] 

Aaron Fabbri commented on HADOOP-16324:
---

Blimey this one is easy peasy. Either option is brilliant (+1):
 * Tell IDE to bugger off, and Bob's your uncle.
 * Change the spelling and you'll be chuffed then off to Bedfordshire.

 

> S3A Delegation Token code to spell "Marshalled" as Marshaled
> 
>
> Key: HADOOP-16324
> URL: https://issues.apache.org/jira/browse/HADOOP-16324
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>
> Apparently {{MarshalledCredentials}} is the EN_UK locality spelling; the 
> EN_US one is {{Marshaled}}. Fix in code and docs before anything ships, 
> because those classes do end up being used by all external implementations of 
> S3A Delegation Tokens.
> I am grateful to [~rlevas] for pointing out the error of my ways.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-22 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846126#comment-16846126
 ] 

Aaron Fabbri commented on HADOOP-16279:
---

{quote}GB> I'm having a hard time implementing prune for the dynamo ms if we use 
last_updated instead of mod_time.
The issue is that we won't update the parent directories' last_updated field, 
so the implementation will remove all parent directories if the query includes 
directories and not just files.{quote}

 

Interesting. What if we treat last_updated = 0 as "does not expire"? That is, 
only prune if last_updated != 0 and last_updated < prune_time.

 

Are there other issues?
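A tiny sketch of that rule, with hypothetical field and method names:

{code}
public class PruneRule {

  static class Entry {            // hypothetical DDB item view
    long lastUpdated;             // 0 means "never expires"
  }

  static boolean shouldPrune(Entry entry, long pruneTimeMillis) {
    // only prune if last_updated is set and older than the cut-off
    return entry.lastUpdated != 0 && entry.lastUpdated < pruneTimeMillis;
  }
}
{code}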

> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: Screenshot 2019-05-17 at 13.21.26.png
>
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-22 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846116#comment-16846116
 ] 

Aaron Fabbri commented on HADOOP-16279:
---

Good discussion, thanks.
{quote}AF> ... when data from S3 conflicts with data from MS
 GB> I would say this is out of scope for this issue. We would like to solve 
only the metadata expiry with this, and not add policies for conflict 
resolution.
{quote}
Yes. I didn't mean it should be part of this JIRA, just sketching the larger 
design that this fits within.
{quote}SL>we shouldn't support pruning file entries if the client is in auth 
mode
{quote}
Why not just clear the authoritative bit on the parent dir? I thought we 
already did that.

 Looking at your Prune comments next [~gabor.bota].

 

> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: Screenshot 2019-05-17 at 13.21.26.png
>
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-15 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840849#comment-16840849
 ] 

Aaron Fabbri commented on HADOOP-16279:
---

{quote}With HADOOP-15999 we changed a bit how authoritative mode works. In 
S3AFileSystem#innerGetFileStatus if allowAuthoritative is false then it will 
send a HEAD request{quote}

You might be able to resolve HADOOP-14468 then. 

Caught up on that issue and asked a followup question 
[there|https://issues.apache.org/jira/browse/HADOOP-15999?focusedCommentId=16840844&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16840844].
 Using TTL or expiry to resolve the OOB problem makes sense.

{quote} AF> why we need more prune() functions added to the MS interface
GB> That prune is for removing expired entries from the ddbms. It uses 
last_updated for expiry rather than mod_time.
{quote}

I would err on the side of simplicity and deleting code, especially from public 
interfaces like MetadataStore. We want it to be easy to implement and 
understand. The prune(optional_prefix, age) contract is basically **deletes as 
many entries as possible that are older than age**. (This is not a MUST delete 
because some MS implementations may decide to keep some state around for 
internal reasons, e.g. dynamo's ancestor requirement, or not wanting to prune 
directories due to complexity).

 I think you can satisfy the prune() contract with only one of the time bases 
(mod time or last updated). It seems like an internal implementation detail 
that doesn't need to be exposed.

Interesting related thought: Can we claim that last_updated (metastore write 
time) >= mod_time? In general, assuming no clock issues or null fields? Seems 
like it. (Argument: you cannot write a FileStatus with mod_time in the future 
unless your clocks are messed up). 

With that property, can't we just simplify prune to always use last_updated?  
Or, either way, it is an internal implementation detail?

{quote}AF> smarter logic that allows you set a policy for handling S3 versus MS 
conflicts
GB> So basically what you mean is to add a conflict resolution algorithm when 
an entry is expired? 
{quote}
Not so much when entry is expired, but when data from S3 conflicts with data 
from MS.  For example, MS has tombstone but S3 says file exists. 

Thanks again for the patch and discussion!




> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---

[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-05-15 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840844#comment-16840844
 ] 

Aaron Fabbri commented on HADOOP-15999:
---

What about in-band delete (create tombstone) and then OOB create? Didn't see 
this covered in the cases here but maybe I missed it.  Seems like a common case 
(OOB process dropping data in bucket). Might want to add a test case and 
document this in a new JIRA if it is not already covered here.

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-15 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840799#comment-16840799
 ] 

Aaron Fabbri edited comment on HADOOP-16279 at 5/15/19 9:34 PM:


{quote}with HADOOP-15999 we changed a bit how authoritative mode works.{quote}

Apologies I missed the later work on that issue. Apache Jira was still 
configured with my old email address so I was not getting notifications for 
months. I'll take a look now.


was (Author: fabbri):
{quote}with HADOOP-15999 we changed a bit how authoritative mode works.\{quote}

Apologies I missed the later work on that issue. Apache Jira was still 
configured with my old email address so I was not getting notifications for 
months. I'll take a look now.

> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-15 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840799#comment-16840799
 ] 

Aaron Fabbri commented on HADOOP-16279:
---

{quote}with HADOOP-15999 we changed a bit how authoritative mode works.{quote}

Apologies I missed the later work on that issue. Apache Jira was still 
configured with my old email address so I was not getting notifications for 
months. I'll take a look now.

> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-10 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri resolved HADOOP-16251.
---
   Resolution: Fixed
Fix Version/s: 3.2.1

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
> Fix For: 3.2.1
>
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-10 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837600#comment-16837600
 ] 

Aaron Fabbri commented on HADOOP-16251:
---

Committed to trunk. Thank you for the contribution 
[@DadanielZ|https://github.com/DadanielZ]. If you'd like to change the comments 
to address my open question above, feel free to publish another patch.

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-09 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836626#comment-16836626
 ] 

Aaron Fabbri commented on HADOOP-16251:
---

I had one remaining question on your Ignore annotation and comment about perms 
on a listing test. +1 otherwise. I can commit this tomorrow morning if someone 
else doesn't get to it first.

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831936#comment-16831936
 ] 

Aaron Fabbri edited comment on HADOOP-16251 at 5/9/19 1:04 AM:
---

Thanks for the patch [~DanielZhou]. We really appreciate you adding extra test 
coverage for cloud filesystems (ABFS)

Couple of questions about the patch:
{noformat}
@Ignore("There shouldn't be permission check for getFileInfo")
public void 
testListStatusThrowsExceptionForUnreadableDir() {{noformat}
Since this is a listing test, wouldn't the READ | EXECUTE checks still be valid?

*EDIT: Never mind on the getFileInfo comment below. I confused the HA check with 
the permission check there.*

Also, I'm surprised about getFileStatus / getFileInfo being listed as "N/A" for 
permission checks. It seems wrong from a security perspective and -also looking 
at the code doesn't seem to be the case see this 
[link|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204]:-
{noformat}
HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
boolean needLocation, boolean needBlockToken) throws IOException {
  // if the client requests block tokens, then it can read data blocks
  // and should appear in the audit log as if getBlockLocations had been
  // called
  final String operationName = needBlockToken ? "open" : "getfileinfo";
  checkOperation(OperationCategory.READ);
  HdfsFileStatus stat = null;
  final FSPermissionChecker pc = getPermissionChecker();
  readLock();
  try {
checkOperation(OperationCategory.READ);
stat = FSDirStatAndListingOp.getFileInfo({noformat}
-Looks like the HDFS Permissions doc is incorrect, no?-


was (Author: fabbri):
Thanks for the patch [~DanielZhou]. We really appreciate you adding extra test 
coverage for cloud filesystems (ABFS)

Couple of questions about the patch:
{noformat}
@Ignore("There shouldn't be permission check for getFileInfo")
public void 
testListStatusThrowsExceptionForUnreadableDir() {{noformat}
Since this is a listing test, wouldn't the READ | EXECUTE checks still be valid?

Also, I'm surprised about getFileStatus / getFileInfo being listed as "N/A" for 
permission checks. It seems wrong from a security perspective and also looking at 
the code doesn't seem to be the case - see this 
[link|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204]:
{noformat}
HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
boolean needLocation, boolean needBlockToken) throws IOException {
  // if the client requests block tokens, then it can read data blocks
  // and should appear in the audit log as if getBlockLocations had been
  // called
  final String operationName = needBlockToken ? "open" : "getfileinfo";
  checkOperation(OperationCategory.READ);
  HdfsFileStatus stat = null;
  final FSPermissionChecker pc = getPermissionChecker();
  readLock();
  try {
checkOperation(OperationCategory.READ);
stat = FSDirStatAndListingOp.getFileInfo({noformat}
Looks like the HDFS Permissions doc is incorrect, no?

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835999#comment-16835999
 ] 

Aaron Fabbri commented on HADOOP-16251:
---

Sorry for the confusion [~DanielZhou]. I misread that code. I saw the READ check 
and the AccessControlException catch and assumed it was a permission check, but 
it is not; it is checking HA status. I'll edit my comment above.

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835944#comment-16835944
 ] 

Aaron Fabbri commented on HADOOP-16279:
---

Thanks for the work on this stuff [~gabor.bota]. I commented on the PR. The 
logic looks pretty good but I think the design needs discussion here. The 
current patch sort of combines the two different ideas:

1. "Authoritative TTL": how fresh a MetadataStore entry needs to be for S3A to 
skip S3 query.
2. "Max entry lifetime" in MetadataStore.

I think these concepts should be kept separate in the public APIs/configs at 
least.

There are a couple of cases when querying MetadataStore (MS):
I. MetadataStore returns null (no information on that path)
II. MetadataStore returns something (has metadata entry for that path).
  II.a. entry is newer than authoritative TTL (S3A may short-circuit and skip 
S3 query)
  II.b. entry is older than authoritative TTL (there is data but S3A needs to 
also query S3)

The patch combines II.b and I.

Sticking with the "general design, specific implementation" ideal, I'd keep the 
public interfaces and config params designed as above instead. That doesn't 
prevent you from doing a simpler implementation (e.g. for now, return null 
from S3Guard.getWithTtl() in case II.b. as you do in your patch. That works 
because it *does* cause S3A to query S3.)

So the patch made sense except for the naming and description of the configuration 
parameter (I think it should be specifically about whether an entry is 
"authoritative", not about the existence of an entry in the MS). And I didn't 
understand why we need more prune() functions added to the MS interface. Also I 
thought the LocalMetadataStore use of guava Cache meant the work was already done 
there?

My hope is that later on, we can replace this implementation of II.b. (where 
getWithTtl() returns null) with smarter logic that allows you to set a policy for 
handling S3 versus MS conflicts. (In this case, get() returns a PathMetadata, 
S3A would check whether the auth TTL has expired; if so, it still queries S3, and 
if the data in S3 and the MS conflict, it takes action depending on the configured 
conflict policy.)

Shout if I can clarify this at all.
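
To make the case split concrete, here is a rough sketch of the decision I have 
in mind. The names (AuthoritativePolicy, CachedEntry) are placeholders for 
illustration only, not the actual S3Guard/MetadataStore classes, and a separate 
"max entry lifetime" would be an additional knob layered on top of this:
{code}
// Illustrative sketch of the I / II.a / II.b split above; not real S3Guard code.
final class AuthoritativePolicy {

  static final class CachedEntry {
    final Object metadata;       // stand-in for PathMetadata
    final long lastUpdatedMs;
    CachedEntry(Object metadata, long lastUpdatedMs) {
      this.metadata = metadata;
      this.lastUpdatedMs = lastUpdatedMs;
    }
  }

  // "authoritative TTL": how fresh an entry must be for S3A to skip the S3 query
  private final long authTtlMs;

  AuthoritativePolicy(long authTtlMs) {
    this.authTtlMs = authTtlMs;
  }

  /**
   * Case I:    entry == null           -> must query S3 (no information).
   * Case II.a: entry fresher than TTL  -> may short-circuit and skip S3.
   * Case II.b: entry older than TTL    -> have data, but must still query S3.
   */
  boolean mustQueryS3(CachedEntry entry, long nowMs) {
    if (entry == null) {
      return true;                                    // case I
    }
    boolean fresh = (nowMs - entry.lastUpdatedMs) < authTtlMs;
    return !fresh;                                    // II.a -> false, II.b -> true
  }
}
{code}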



> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16279) S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835785#comment-16835785
 ] 

Aaron Fabbri commented on HADOOP-16279:
---

[~ste...@apache.org] I'd argue LocalMetadataStore is still useful -- but if I'm 
the only one, we could consider cutting it. You should be able to use it as a 
metadata cache for read-only or single-writer operations to speed things up in 
real-world workloads (think setting it up as authoritative on a distcp, for 
example).

I'll take a peek at the PR here. Thanks for working on this [~gabor.bota]

> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> ---
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need to use separate expiry 
> times for entries and tombstones. My +1 on not using separate settings - so 
> only one config name and value.
> 
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the 
> guava cache's internal solution for the TTL of these entries. This is an 
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need a different behavior than what ddb promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be 
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) 
> to MetadataStore interface. These methods will delete all expired metadata 
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative 
> directory expiry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16278) With S3A Filesystem, Long Running services End up Doing lot of GC and eventually die

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835783#comment-16835783
 ] 

Aaron Fabbri commented on HADOOP-16278:
---

Agreed, +1 this simple patch stopping the quantiles on FS close.
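
For anyone reading the archive, here is a minimal JDK-only sketch of the leak 
pattern and of the fix being +1'd here: cancel the per-instance scheduled 
rollover task when the owner is closed. This is illustrative only, not the 
actual MutableQuantiles/S3AInstrumentation code:
{code}
// Illustrative JDK-only sketch of the pattern, not the actual Hadoop code.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class QuantileLikeMetric implements AutoCloseable {
  // shared scheduler, like the single-threaded one used by the quantiles
  private static final ScheduledExecutorService SCHEDULER =
      Executors.newScheduledThreadPool(1, r -> {
        Thread t = new Thread(r, "rollover");
        t.setDaemon(true);
        return t;
      });

  private final ScheduledFuture<?> rollover;

  QuantileLikeMetric(long intervalSeconds) {
    // one task per instance; without a matching cancel() these accumulate forever
    rollover = SCHEDULER.scheduleWithFixedDelay(
        this::rollOver, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  private void rollOver() { /* snapshot + reset the sample, elided */ }

  @Override
  public void close() {
    // the fix: cancel the scheduled task when the owning instrumentation closes
    rollover.cancel(false);
  }
}
{code}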

> With S3A Filesystem, Long Running services End up Doing lot of GC and 
> eventually die
> 
>
> Key: HADOOP-16278
> URL: https://issues.apache.org/jira/browse/HADOOP-16278
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, hadoop-aws, metrics
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Rajat Khandelwal
>Priority: Major
> Fix For: 3.1.3
>
> Attachments: HADOOP-16278.patch, Screenshot 2019-04-30 at 12.52.42 
> PM.png, Screenshot 2019-04-30 at 2.33.59 PM.png
>
>
> I'll start with the symptoms and eventually come to the cause. 
>  
> We are using HDP 3.1 and Noticed that every couple of days the Hive Metastore 
> starts doing GC, sometimes with 30 minute long pauses. Although nothing is 
> collected and the Heap remains fully used. 
>  
> Next, we looked at the Heap Dump and found that 99% of the memory is taken up 
> by one Executor Service for its task queue. 
>  
> !Screenshot 2019-04-30 at 12.52.42 PM.png!
> The Instance is Created like this:
> {{ private static final ScheduledExecutorService scheduler = Executors}}
>  {{ .newScheduledThreadPool(1, new ThreadFactoryBuilder().setDaemon(true)}}
>  {{ .setNameFormat("MutableQuantiles-%d").build());}}
>  
> So All the instances of MutableQuantiles are using a Shared single threaded 
> ExecutorService
> The second thing to notice is this block of code in the Constructor of 
> MutableQuantiles:
> {{this.scheduledTask = scheduler.scheduleAtFixedRate(new 
> MutableQuantiles.RolloverSample(this), (long)interval, (long)interval, 
> TimeUnit.SECONDS);}}
> So As soon as a MutableQuantiles Instance is created, one task is scheduled 
> at Fix Rate. Instead of that, it could schedule them at Fixed Delay (Refer 
> HADOOP-16248). 
> Now coming to why it's related to S3. 
>  
> S3AFileSystem Creates an instance of S3AInstrumentation, which creates two 
> quantiles (related to S3Guard) with 1s(hardcoded) interval and leaves them 
> hanging. By hanging I mean perpetually scheduled. As and when new Instances 
> of S3AFileSystem are created, two new quantiles are created, which in turn 
> create two scheduled tasks and never cancel them. This way number of 
> scheduled tasks keeps on growing without ever getting cleaned up, leading to 
> GC/OOM/Crash. 
>  
> MutableQuantiles has a numInfo field which tells things like the name of the 
> metric. From the Heapdump, I found one numInfo and traced all objects 
> referencing that.
>  
> !Screenshot 2019-04-30 at 2.33.59 PM.png!
>  
> There seem to be 300K objects of for the same metric 
> (S3Guard_metadatastore_throttle_rate). 
> As expected, there are other 300K objects for the other MutableQuantiles 
> created by S3AInstrumentation class. 
> Although the number of instances of S3AInstrumentation class is only 4. 
> Clearly, there is a leak. One S3AInstrumentation instance is creating two 
> scheduled tasks to be run every second. These tasks are left scheduled and 
> not cancelled when S3AInstrumentation.close() is called. Hence, they are 
> never cleaned up. GC is also not able to collect them since they are referred 
> by the scheduler. 
> Who creates S3AInstrumentation instances? S3AFileSystem.initialize(), which 
> is called in FileSystem.get(URI, Configuration). Since hive metastore is a 
> service that deals with a lot of Path Objects and hence needs to do a lot of 
> calls to FileSystem.get, it's the one to first shows these symptoms. 
> We're seeing similar symptoms in AM for long-running jobs (for both Tez AM 
> and MR AM). 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16278) With S3A Filesystem, Long Running services End up Doing lot of GC and eventually die

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835783#comment-16835783
 ] 

Aaron Fabbri edited comment on HADOOP-16278 at 5/8/19 5:47 PM:
---

Agreed, +1 this simple patch stopping the quantiles on FS close. Also wanted to 
say nice work on this Jira [~prongs].


was (Author: fabbri):
Agreed, +1 this simple patch stopping the quantiles on FS close.

> With S3A Filesystem, Long Running services End up Doing lot of GC and 
> eventually die
> 
>
> Key: HADOOP-16278
> URL: https://issues.apache.org/jira/browse/HADOOP-16278
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, hadoop-aws, metrics
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Rajat Khandelwal
>Priority: Major
> Fix For: 3.1.3
>
> Attachments: HADOOP-16278.patch, Screenshot 2019-04-30 at 12.52.42 
> PM.png, Screenshot 2019-04-30 at 2.33.59 PM.png
>
>
> I'll start with the symptoms and eventually come to the cause. 
>  
> We are using HDP 3.1 and Noticed that every couple of days the Hive Metastore 
> starts doing GC, sometimes with 30 minute long pauses. Although nothing is 
> collected and the Heap remains fully used. 
>  
> Next, we looked at the Heap Dump and found that 99% of the memory is taken up 
> by one Executor Service for its task queue. 
>  
> !Screenshot 2019-04-30 at 12.52.42 PM.png!
> The Instance is Created like this:
> {{ private static final ScheduledExecutorService scheduler = Executors}}
>  {{ .newScheduledThreadPool(1, new ThreadFactoryBuilder().setDaemon(true)}}
>  {{ .setNameFormat("MutableQuantiles-%d").build());}}
>  
> So All the instances of MutableQuantiles are using a Shared single threaded 
> ExecutorService
> The second thing to notice is this block of code in the Constructor of 
> MutableQuantiles:
> {{this.scheduledTask = scheduler.scheduleAtFixedRate(new 
> MutableQuantiles.RolloverSample(this), (long)interval, (long)interval, 
> TimeUnit.SECONDS);}}
> So As soon as a MutableQuantiles Instance is created, one task is scheduled 
> at Fix Rate. Instead of that, it could schedule them at Fixed Delay (Refer 
> HADOOP-16248). 
> Now coming to why it's related to S3. 
>  
> S3AFileSystem Creates an instance of S3AInstrumentation, which creates two 
> quantiles (related to S3Guard) with 1s(hardcoded) interval and leaves them 
> hanging. By hanging I mean perpetually scheduled. As and when new Instances 
> of S3AFileSystem are created, two new quantiles are created, which in turn 
> create two scheduled tasks and never cancel them. This way number of 
> scheduled tasks keeps on growing without ever getting cleaned up, leading to 
> GC/OOM/Crash. 
>  
> MutableQuantiles has a numInfo field which tells things like the name of the 
> metric. From the Heapdump, I found one numInfo and traced all objects 
> referencing that.
>  
> !Screenshot 2019-04-30 at 2.33.59 PM.png!
>  
> There seem to be 300K objects of for the same metric 
> (S3Guard_metadatastore_throttle_rate). 
> As expected, there are other 300K objects for the other MutableQuantiles 
> created by S3AInstrumentation class. 
> Although the number of instances of S3AInstrumentation class is only 4. 
> Clearly, there is a leak. One S3AInstrumentation instance is creating two 
> scheduled tasks to be run every second. These tasks are left scheduled and 
> not cancelled when S3AInstrumentation.close() is called. Hence, they are 
> never cleaned up. GC is also not able to collect them since they are referred 
> by the scheduler. 
> Who creates S3AInstrumentation instances? S3AFileSystem.initialize(), which 
> is called in FileSystem.get(URI, Configuration). Since hive metastore is a 
> service that deals with a lot of Path Objects and hence needs to do a lot of 
> calls to FileSystem.get, it's the one to first shows these symptoms. 
> We're seeing similar symptoms in AM for long-running jobs (for both Tez AM 
> and MR AM). 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16269) ABFS: add listFileStatus with StartFrom

2019-05-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835778#comment-16835778
 ] 

Aaron Fabbri commented on HADOOP-16269:
---

This was on my todo list today but [~ste...@apache.org] beat me to it. Thanks 
for the contribution [~DanielZhou] and the commit Steve.

> ABFS: add listFileStatus with StartFrom
> ---
>
> Key: HADOOP-16269
> URL: https://issues.apache.org/jira/browse/HADOOP-16269
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-16269-001.patch, HADOOP-16269-002.patch, 
> HADOOP-16269-003.patch
>
>
> Adding a ListFileStatus in a path from a entry name in lexical order.
> This is added to AzureBlobFileSystemStore and won't be exposed to FS level 
> api.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16291) HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()

2019-05-03 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832644#comment-16832644
 ] 

Aaron Fabbri commented on HADOOP-16291:
---

Thanks [~daryn]. Thought it was strange the docs were wrong for this long. Was 
going to ask for a sanity check on this JIRA but you beat me to it.

> HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()
> 
>
> Key: HADOOP-16291
> URL: https://issues.apache.org/jira/browse/HADOOP-16291
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: Aaron Fabbri
>Priority: Minor
>  Labels: newbie
>
> Fix some errors in the HDFS Permissions doc.
> Noticed this when reviewing HADOOP-16251. The FS Permissions 
> [documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html]
>  seems to mark a lot of permissions as Not Applicable (N/A) when that is not 
> the case. In particular getFileInfo (getFileStatus) checks READ permission 
> bit 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204],
>  as it should.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-02 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831967#comment-16831967
 ] 

Aaron Fabbri commented on HADOOP-16251:
---

FYI: Filed HADOOP-16291 to address the errors in the FS permissions docs.

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16291) HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()

2019-05-02 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-16291:
-

 Summary: HDFS Permissions Guide appears incorrect about 
getFileStatus()/getFileInfo()
 Key: HADOOP-16291
 URL: https://issues.apache.org/jira/browse/HADOOP-16291
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Aaron Fabbri


Fix some errors in the HDFS Permissions doc.

Noticed this when reviewing HADOOP-16251. The FS Permissions 
[documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html]
 seems to mark a lot of permissions as Not Applicable (N/A) when that is not 
the case. In particular getFileInfo (getFileStatus) checks READ permission bit 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204],
 as it should.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16291) HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()

2019-05-02 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-16291:
--
Labels: newbie  (was: )

> HDFS Permissions Guide appears incorrect about getFileStatus()/getFileInfo()
> 
>
> Key: HADOOP-16291
> URL: https://issues.apache.org/jira/browse/HADOOP-16291
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: Aaron Fabbri
>Priority: Minor
>  Labels: newbie
>
> Fix some errors in the HDFS Permissions doc.
> Noticed this when reviewing HADOOP-16251. The FS Permissions 
> [documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html]
>  seems to mark a lot of permissions as Not Applicable (N/A) when that is not 
> the case. In particular getFileInfo (getFileStatus) checks READ permission 
> bit 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204],
>  as it should.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-02 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831936#comment-16831936
 ] 

Aaron Fabbri edited comment on HADOOP-16251 at 5/2/19 8:31 PM:
---

Thanks for the patch [~DanielZhou]. We really appreciate you adding extra test 
coverage for cloud filesystems (ABFS)

Couple of questions about the patch:
{noformat}
@Ignore("There shouldn't be permission check for getFileInfo")
public void 
testListStatusThrowsExceptionForUnreadableDir() {{noformat}
Since this is a listing test, wouldn't the READ | EXECUTE checks still be valid?

Also, I'm surprised about getFileStatus / getFileInfo being listed as "N/A" for 
permission checks. It seems wrong from a security perspective and also looking at 
the code doesn't seem to be the case - see this 
[link|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204]:
{noformat}
HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
boolean needLocation, boolean needBlockToken) throws IOException {
  // if the client requests block tokens, then it can read data blocks
  // and should appear in the audit log as if getBlockLocations had been
  // called
  final String operationName = needBlockToken ? "open" : "getfileinfo";
  checkOperation(OperationCategory.READ);
  HdfsFileStatus stat = null;
  final FSPermissionChecker pc = getPermissionChecker();
  readLock();
  try {
checkOperation(OperationCategory.READ);
stat = FSDirStatAndListingOp.getFileInfo({noformat}
Looks like the HDFS Permissions doc is incorrect, no?


was (Author: fabbri):
Thanks for the patch [~DanielZhou]. We really appreciate you adding extra test 
coverage for cloud filesystems (ABFS)

Couple of questions about the patch:
{noformat}
@Ignore("There shouldn't be permission check for getFileInfo")
public void 
testListStatusThrowsExceptionForUnreadableDir() {{noformat}
Since this is a listing test, wouldn't the READ | EXECUTE checks still be valid?

Also, I'm surprised about getFileStatus / getFileInfo being listed as "N/A" for 
permission checks. It seems wrong from a security perspective and also looking at 
the code doesn't seem to be the case 
([link|[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204]]):
{noformat}
HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
boolean needLocation, boolean needBlockToken) throws IOException {
  // if the client requests block tokens, then it can read data blocks
  // and should appear in the audit log as if getBlockLocations had been
  // called
  final String operationName = needBlockToken ? "open" : "getfileinfo";
  checkOperation(OperationCategory.READ);
  HdfsFileStatus stat = null;
  final FSPermissionChecker pc = getPermissionChecker();
  readLock();
  try {
checkOperation(OperationCategory.READ);
stat = FSDirStatAndListingOp.getFileInfo({noformat}
Looks like the HDFS Permissions doc is incorrect, no?

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16251) ABFS: add FSMainOperationsBaseTest

2019-05-02 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831936#comment-16831936
 ] 

Aaron Fabbri commented on HADOOP-16251:
---

Thanks for the patch [~DanielZhou]. We really appreciate you adding extra test 
coverage for cloud filesystems (ABFS)

Couple of questions about the patch:
{noformat}
@Ignore("There shouldn't be permission check for getFileInfo")
public void 
testListStatusThrowsExceptionForUnreadableDir() {{noformat}
Since this is a listing test, wouldn't the READ | EXECUTE checks still be valid?

Also, I'm surprised about getFileStatus / getFileInfo being listed as "N/A" for 
permission checks. It seems wrong from a security perspective and also looking at 
the code doesn't seem to be the case 
([link|[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3202-L3204]]):
{noformat}
HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
boolean needLocation, boolean needBlockToken) throws IOException {
  // if the client requests block tokens, then it can read data blocks
  // and should appear in the audit log as if getBlockLocations had been
  // called
  final String operationName = needBlockToken ? "open" : "getfileinfo";
  checkOperation(OperationCategory.READ);
  HdfsFileStatus stat = null;
  final FSPermissionChecker pc = getPermissionChecker();
  readLock();
  try {
checkOperation(OperationCategory.READ);
stat = FSDirStatAndListingOp.getFileInfo({noformat}
Looks like the HDFS Permissions doc is incorrect, no?

> ABFS: add FSMainOperationsBaseTest
> --
>
> Key: HADOOP-16251
> URL: https://issues.apache.org/jira/browse/HADOOP-16251
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> Just happened to see 
> "hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FSMainOperationsBaseTest.java",
>  ABFS could inherit this test to increase its test coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16221) S3Guard: fail write that doesn't update metadata store

2019-04-30 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830848#comment-16830848
 ] 

Aaron Fabbri commented on HADOOP-16221:
---

Sorry I didn't see this until now. Thanks for the contribution and 
documentation.

I'll give some background on the existing logic at least.

As you can see, we generally chose to fall back to raw S3 behavior when there 
are failures with the Metadata Store. S3Guard was targeted at existing S3 
customers, so that made sense to me.

The MetadataStore is conceptually a "trailing log of metadata changes made to 
S3". You can also think of it as a consistency hint. There are few 
guarantees with the semantics that S3 exposes (e.g. no upper bound on eventual 
consistency time; think about what that means for your write. You need a write 
journal w/ fast scalable queries and transactions to really solve this, but 
you'd be better off ditching S3 for a real storage system IMO).

We are logging things that already happened in S3. With error semantics, if you 
mutate S3 but fail to mutate the MetadataStore, I thought you should either (1) 
roll back the transaction and return failure, or (2) don't roll back and return 
success. #1 is seen as too complex and slow to do right above S3, but #2 returns 
success after successful mutation of S3 state.

So this was an intentional decision: not returning failure when you 
successfully write a file to S3. As you note, there is no rollback.

I can see the argument for doing it the new way as well. My bias is that it is 
important to know whether or not you actually wrote data to the backing store. 
I spent some time in finance (the wrong write can cost you) and at storage 
companies, which sort of formed my bias.

Essentially both options are "wrong". Before, we'd return success but give up 
consistency hints on that path; now we return failure even though we wrote the 
data to S3.

In lieu of a real storage system, I think having this well-documented and 
configurable is fine. The retries on a MetadataStore are pretty robust, so 
failures should be pretty rare.

Hope this background was interesting. Feel free to email me if you ever need a 
review. My email filters tend to catch a lot of stuff that I should have 
noticed.
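
To make the trade-off concrete, here is a rough sketch of how a configurable 
"fail the write when the MetadataStore update fails" option could be wired up. 
The flag, class and method names below are hypothetical placeholders, not the 
actual property or code added by this JIRA:
{code}
// Sketch only: configurable handling of a MetadataStore failure after a
// successful S3 write. All names here are placeholders.
import java.io.IOException;

class GuardedWriteSketch {

  /** stand-in for the MetadataStore interface */
  interface MetadataStoreLike {
    void put(String key) throws IOException;
  }

  private final MetadataStoreLike store;
  private final boolean failOnMetadataWriteError;   // e.g. driven by a config flag

  GuardedWriteSketch(MetadataStoreLike store, boolean failOnMetadataWriteError) {
    this.store = store;
    this.failOnMetadataWriteError = failOnMetadataWriteError;
  }

  /** called after the object has already been written to S3 successfully */
  void finishedWrite(String key) throws IOException {
    try {
      store.put(key);                                // record the new entry
    } catch (IOException e) {
      if (failOnMetadataWriteError) {
        // new option: surface the failure even though the S3 write succeeded
        throw new IOException(
            "S3 write succeeded but MetadataStore update failed: " + key, e);
      }
      // previous behaviour: log and continue, falling back to raw-S3 consistency
      System.err.println("MetadataStore update failed for " + key + ": " + e);
    }
  }
}
{code}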

 

> S3Guard: fail write that doesn't update metadata store
> --
>
> Key: HADOOP-16221
> URL: https://issues.apache.org/jira/browse/HADOOP-16221
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Ben Roling
>Assignee: Ben Roling
>Priority: Major
> Fix For: 3.3.0
>
>
> Right now, a failure to write to the S3Guard metadata store (e.g. DynamoDB) 
> is [merely 
> logged|https://github.com/apache/hadoop/blob/rel/release-3.1.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2708-L2712].
>  It does not fail the S3AFileSystem write operation itself. As such, the 
> writer has no idea that anything went wrong. The implication of this is that 
> S3Guard doesn't always provide the consistency it advertises.
> For example [this 
> article|https://blog.cloudera.com/blog/2017/08/introducing-s3guard-s3-consistency-for-apache-hadoop/]
>  states:
> {quote}If a Hadoop S3A client creates or moves a file, and then a client 
> lists its directory, that file is now guaranteed to be included in the 
> listing.
> {quote}
> Unfortunately, this is sort of untrue and could result in exactly the sort of 
> problem S3Guard is supposed to avoid:
> {quote}Missing data that is silently dropped. Multi-step Hadoop jobs that 
> depend on output of previous jobs may silently omit some data. This omission 
> happens when a job chooses which files to consume based on a directory 
> listing, which may not include recently-written items.
> {quote}
> Imagine the typical multi-job Hadoop processing pipeline. Job 1 runs and 
> succeeds, but one (or more) S3Guard metadata write failed under the covers. 
> Job 2 picks up the output directory from Job 1 and runs its processing, 
> potentially seeing an inconsistent listing, silently missing some of the Job 
> 1 output files.
> S3Guard should at least provide a configuration option to fail if the 
> metadata write fails. It seems even ideally this should be the default?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16252) Use configurable dynamo table name prefix in S3Guard tests

2019-04-17 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820403#comment-16820403
 ] 

Aaron Fabbri commented on HADOOP-16252:
---

+1 LGTM. You're only affecting the table names used for lifecycle tests 
(create/destroy), which is good.

I see you mentioned your testing in the PR. (Note we usually also declare which 
AWS region we run tests in, as the different regions have behaved differently 
historically.) Thanks!
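
For readers of the archive, a minimal sketch of the idea being +1'd: derive the 
test table names from a configurable prefix so a least-privilege IAM policy can 
be scoped to that prefix. The property name and default below are hypothetical 
placeholders, not necessarily what the patch uses:
{code}
// Sketch only; property name and default are hypothetical.
import org.apache.hadoop.conf.Configuration;

final class TestTableNames {
  // hypothetical property, not necessarily the one introduced by the patch
  static final String TEST_TABLE_PREFIX_KEY = "fs.s3a.s3guard.test.dynamo.table.prefix";
  static final String DEFAULT_PREFIX = "s3guard.test.";

  /** prepend the configured prefix so IAM policies can match "prefix*" table names */
  static String testTableName(Configuration conf, String base) {
    return conf.getTrimmed(TEST_TABLE_PREFIX_KEY, DEFAULT_PREFIX) + base;
  }
}
{code}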

 

> Use configurable dynamo table name prefix in S3Guard tests
> --
>
> Key: HADOOP-16252
> URL: https://issues.apache.org/jira/browse/HADOOP-16252
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Ben Roling
>Priority: Major
>
> Table names are hardcoded into tests for S3Guard with DynamoDB.  This makes 
> it awkward to set up a least-privilege type AWS IAM user or role that can 
> successfully execute the full test suite.  You either have to know all the 
> specific hardcoded table names and give the user Dynamo read/write access to 
> those by name or just give blanket read/write access to all Dynamo tables in 
> the account.
> I propose the tests use a configuration property to specify a prefix for the 
> table names used.  Then the full test suite can be run by a user that is 
> given read/write access to all tables with names starting with the configured 
> prefix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16118) S3Guard to support on-demand DDB tables

2019-04-10 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815057#comment-16815057
 ] 

Aaron Fabbri commented on HADOOP-16118:
---

+1 on the patch. Thank you for the contribution [~ste...@apache.org].
 * Thanks for pulling DDBCapacities out into its own file.
 * Test modifications look good.
 * Good doc updates.

> S3Guard to support on-demand DDB tables
> ---
>
> Key: HADOOP-16118
> URL: https://issues.apache.org/jira/browse/HADOOP-16118
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> AWS now supports [on demand DDB 
> capacity|https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing/]
>  
> This has lowest cost and best scalability, so could be the default capacity. 
> + add a new option to set-capacity.
> Will depend on an SDK update: created HADOOP-16117.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16193) add extra S3A MPU test to see what happens if a file is created during the MPU

2019-04-04 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810320#comment-16810320
 ] 

Aaron Fabbri commented on HADOOP-16193:
---

Patch looks good to me but I had a failure running the test:

{noformat}
[INFO] Running 
org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader
[ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 57.812 
s <<< FAILURE! - in 
org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader
[ERROR] 
testMultipartOverlapWithTransientFile(org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader)
  Time elapsed: 32.224 s  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: 
s3a://fabbri-self/test/testMultipartOverlapWithTransientFile
at 
org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader.lambda$testMultipartOverlapWithTransientFile$0(ITestS3AContractMultipartUploader.java:210)
at 
org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader.testMultipartOverlapWithTransientFile(ITestS3AContractMultipartUploader.java:209)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors: 
[ERROR]   
ITestS3AContractMultipartUploader.testMultipartOverlapWithTransientFile:209->lambda$testMultipartOverlapWithTransientFile$0:210
 » FileNotFound
[INFO]
[ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 1

{noformat}

> add extra S3A MPU test to see what happens if a file is created during the MPU
> --
>
> Key: HADOOP-16193
> URL: https://issues.apache.org/jira/browse/HADOOP-16193
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> Proposed extra test for the S3A MPU: if you create and then delete a file 
> while an MPU is in progress, when you finally complete the MPU the new data 
> is present.
> This verifies that the other FS operations don't somehow cancel the 
> in-progress upload, and that eventual consistency brings the latest value out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16188) s3a rename failed during copy, "Unable to copy part" + 200 error code

2019-03-14 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793058#comment-16793058
 ] 

Aaron Fabbri commented on HADOOP-16188:
---

uhhh.. I'm shaking my head here. S3 is sending a response that the SDK fails on 
so the connection doesn't time out?

Not saying we shouldn't retry ourselves, just commentary on the state of the S3 
storage stack. Feels like the SDK should retry.



> s3a rename failed during copy, "Unable to copy part" + 200 error code
> -
>
> Key: HADOOP-16188
> URL: https://issues.apache.org/jira/browse/HADOOP-16188
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Minor
>
> Error during a rename where AWS S3 seems to have some internal error *which 
> is not retried and returns status code 200"
> {code}
> com.amazonaws.SdkClientException: Unable to copy part: We encountered an 
> internal error. Please try again. (Service: Amazon S3; Status Code: 200; 
> Error Code: InternalError;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16169) ABFS: Bug fix for getPathProperties

2019-03-07 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787507#comment-16787507
 ] 

Aaron Fabbri commented on HADOOP-16169:
---

+1, patch looks good to me [~DanielZhou]. Yetus is happy and you declared your 
integration tests, thank you. I wasn't able to test this today; I need to get 
a new Azure account provisioned first.

> ABFS: Bug fix for getPathProperties
> ---
>
> Key: HADOOP-16169
> URL: https://issues.apache.org/jira/browse/HADOOP-16169
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-16169-001.patch
>
>
> There is a bug in AbfsClient, getPathProperties().
> For both xns accnout and non-xns account, it should use 
> AbfsRestOperationType.GetPathStatus.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16119) KMS on Hadoop RPC Engine

2019-02-28 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780873#comment-16780873
 ] 

Aaron Fabbri commented on HADOOP-16119:
---

Thank you for writing this up [~jojochuang]. The doc looks good.

> KMS on Hadoop RPC Engine
> 
>
> Key: HADOOP-16119
> URL: https://issues.apache.org/jira/browse/HADOOP-16119
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: Design doc_ KMS v2.pdf
>
>
> Per discussion on common-dev and text copied here for ease of reference.
> https://lists.apache.org/thread.html/0e2eeaf07b013f17fad6d362393f53d52041828feec53dcddff04808@%3Ccommon-dev.hadoop.apache.org%3E
> {noformat}
> Thanks all for the inputs,
> To offer additional information (while Daryn is working on his stuff),
> optimizing RPC encryption opens up another possibility: migrating KMS
> service to use Hadoop RPC.
> Today's KMS uses HTTPS + REST API, much like webhdfs. It has very
> undesirable performance (a few thousand ops per second) compared to
> NameNode. Unfortunately for each NameNode namespace operation you also need
> to access KMS too.
> Migrating KMS to Hadoop RPC greatly improves its performance (if
> implemented correctly), and RPC encryption would be a prerequisite. So
> please keep that in mind when discussing the Hadoop RPC encryption
> improvements. Cloudera is very interested to help with the Hadoop RPC
> encryption project because a lot of our customers are using at-rest
> encryption, and some of them are starting to hit KMS performance limit.
> This whole "migrating KMS to Hadoop RPC" was Daryn's idea. I heard this
> idea in the meetup and I am very thrilled to see this happening because it
> is a real issue bothering some of our customers, and I suspect it is the
> right solution to address this tech debt.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16085) S3Guard: use object version to protect against inconsistent read after replace/overwrite

2019-02-07 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763078#comment-16763078
 ] 

Aaron Fabbri commented on HADOOP-16085:
---

{quote}My other concern is that this requires enabling object versioning. I 
know Aaron Fabbri has done some testing with that and I think eventually hit 
issues. Was it just a matter of the space all the versions were taking up, or 
was it actually a performance problem once there was enough overhead?{quote}

Yeah, I had "broken" certain paths (keys) on an S3 bucket by leaving versioning 
enabled on a dev bucket where I'd frequently delete and recreate the same keys. 
There appeared to be some scalability limit on the number of versions a 
particular key can have, so the lifecycle policy to purge old versions would 
be important, I think.

I share [~ste...@apache.org]'s hesitation about doing this all in S3Guard, just 
from experience with all the corner cases and S3 flakiness. I'm glad you are 
looking into it and prototyping, though; we want more people to learn this 
codebase.


> S3Guard: use object version to protect against inconsistent read after 
> replace/overwrite
> 
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Ben Roling
>Priority: Major
> Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16085) S3Guard: use object version to protect against inconsistent read after replace/overwrite

2019-01-31 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757837#comment-16757837
 ] 

Aaron Fabbri commented on HADOOP-16085:
---

Hi guys. We've thought about this issue a little in the past. You are right 
that S3Guard mostly focuses on metadata consistency. There is some degree of 
data consistency added (e.g. it stops you from reading deleted files or from 
missing recently created ones), but we don't store etags or object versions 
today.

Working on a patch would be a good learning experience for the codebase, which 
I encourage. Also feel free to send S3Guard questions our way (even better, ask 
on the email list and cc: us so others can learn as well). The implementation 
would need to consider some things (off the top of my head) below; a rough 
sketch of the read-path check follows the list. None of this is necessary for 
an RFC patch, but I hope it helps with the concepts.
 - Should be zero extra round trips when turned off (expense in $ and 
performance).
 - Would want to figure out where we'd need additional round trips and decide 
if it is worth it. Tests that assert a certain number of S3 ops will need to be 
made aware, and documentation should outline the marginal cost of the feature.
 - What is the conflict resolution policy and how is it configured? If we get 
an unexpected etag/version on read, what do we do? (e.g. retry policy then give 
up, or retry then serve the non-matching data; in the latter case, do we update 
the S3Guard MetadataStore with the etag/version we ended up getting from S3?) A 
rough sketch of a constrained read follows this list.
 - The racing writer issue. IIRC two writers racing to write the same object 
(path) in S3 cannot tell which of them will actually have their version 
materialized, unless versioning is turned on. This means if we supported this 
feature without versioning (just etags) it would be prone to the same sort of 
concurrent modification races that S3 has today. We at least need to document 
the behavior.
 - Backward / forward compatible with existing S3Guarded buckets and Dynamo 
tables.
 - Understand and document any interactions with MetadataStore expiry (related 
jira). In general, data can be expired or purged from the MetadataStore and the 
only negative consequence should be falling back to raw-S3 like consistency 
temporarily. This allows demand-loading the MetadataStore and implementing 
caching with the same APIs.
 - Another semi-related Jira to check out 
[here|https://issues.apache.org/jira/browse/HADOOP-15779].
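
To make the conflict-policy question a bit more concrete, here is a minimal 
sketch of an etag-constrained read with a bounded retry, using the AWS SDK for 
Java v1. The class and method names are hypothetical; this is not a proposal 
for the actual S3A read path, just the shape of the problem:

{code}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

/** Hypothetical helper: read a key only if it matches the etag S3Guard recorded. */
class GuardedReader {
  private final AmazonS3 s3;

  GuardedReader(AmazonS3 s3) {
    this.s3 = s3;
  }

  S3Object openWithExpectedETag(String bucket, String key, String expectedETag,
      int maxAttempts) throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // The SDK returns null when the etag constraint is not satisfied,
      // i.e. S3 would have served a different version of the object.
      GetObjectRequest request = new GetObjectRequest(bucket, key)
          .withMatchingETagConstraint(expectedETag);
      S3Object object = s3.getObject(request);
      if (object != null) {
        return object;
      }
      // Inconsistent read: back off and retry before applying the conflict policy.
      Thread.sleep(1000L * attempt);
    }
    // Conflict-policy decision point: fail, or serve the non-matching data and
    // (optionally) update the MetadataStore with what S3 actually returned.
    throw new IllegalStateException("etag " + expectedETag + " for s3://"
        + bucket + "/" + key + " not available after " + maxAttempts + " attempts");
  }
}
{code}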

> S3Guard: use object version to protect against inconsistent read after 
> replace/overwrite
> 
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Ben Roling
>Priority: Major
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile() + S3 Select

2019-01-24 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751744#comment-16751744
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

I looked at the latest large patch today. Also played around with the builder / 
future stuff. Ran a subset (due to cost) of integration tests including the 
select tests and some contract tests.

This LGTM (+1), assuming Yetus is happy and other folks have had a chance to 
conclude their reviews. Please do file follow-up JIRAs and link them here (e.g. 
more MR integration, etc.).


> Add FileSystem builder-based openFile() API to match createFile() + S3 Select
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch, HADOOP-15229-016.patch, HADOOP-15229-017.patch, 
> HADOOP-15229-018.patch, HADOOP-15229-019.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile() + S3 Select

2019-01-18 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746835#comment-16746835
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

Thanks for the discussion [~ste...@apache.org]. No objections here.

If this is still open next week I'll try to review the latest patch and play 
around with the API some. Sorry I didn't get to it this week.

> Add FileSystem builder-based openFile() API to match createFile() + S3 Select
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch, HADOOP-15229-016.patch, HADOOP-15229-017.patch, 
> HADOOP-15229-018.patch, HADOOP-15229-019.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-12-20 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726307#comment-16726307
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

Finished the rest of the v14 diff. Looks good overall. I only have small nits 
(above), nothing major.

Documentation is thorough and clear. I ran all the integration tests in US West 
2.  On a fresh development box I see this:
{noformat}
Running org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.468 s 
<<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials
[ERROR] testSTS(org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials) Time 
elapsed: 7.973 s <<< ERROR!
com.amazonaws.SdkClientException: Unable to find a region via the region 
provider chain. Must provide an explicit region in the builder or setup 
environment to supply a region.
at 
org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials.testSTS(ITestS3ATemporaryCredentials.java:90)
{noformat}

As well as some FS Closed IO exceptions (HADOOP-15819).

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-12-20 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726305#comment-16726305
 ] 

Aaron Fabbri commented on HADOOP-15819:
---

Just catching up on this as I hit the bug again. Sounds like a great find 
[~adam.antal].

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a sightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
>   at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
>   at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTest

[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-12-20 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726235#comment-16726235
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

Nice work here, as usual [~ste...@apache.org].

{noformat}
+ * the actual outcome i in the returned {@code CompletableFuture}.
{noformat}

/i/is/  This typo is pasted a couple of times.

{noformat}
+   * @throws IOException failure to resolve the link.
+   * @throws IllegalArgumentException unknown mandatory key
+   * @throws UnsupportedOperationException PathHandles are not supported.
+   */
+  protected CompletableFuture openFileWithOptions(
{noformat}
Do you want to mention that all except the UnsupportedOp exceptions are not 
thrown here but deferred to the future's get() call?

Also, are these methods still "throws IOException"?

You can elide the whitespace change to HarFileSystem.java.

Glad S3AOpContext is working out as a place to stash per-op stuff.

{noformat}

+  // method not allowed; seen on S3 Select.
+  // treated as a bad request
+  case 405:
+ioe = new AWSBadRequestException(message, s3Exception);
+break;
{noformat}
Good catch.

{noformat}
+case SelectTool.NAME:
+  // the select tool is not technically a S3Guard tool, but it's on the CLI
+  // because this is the defacto S3 CLI.
{noformat}
Indeed. 

{noformat}
 /**
+   * Closed bit. Volatile so reads are non-blocking.
+   * Updates must be in a synchronized block to guarantee an atomic check and
+   * set
+   */
{noformat}
Out of date comment (it is now an atomic). Volatile mentioned again in another 
comment. Minor nits here.

I'm not finding bugs in the actual code but here's one in the doc's example 
code ;-)

{noformat}

+try (FSDataInputStream select = future.get()) {
+  // process the output
+  stream.read();
+}
{noformat}
/stream/select/

On the current seek() implementation of the Select input stream, what are the 
next enhancements you think we will need? Can you elaborate a bit on the need 
for single-byte reads as a seek implementation? Is it a limitation of the 
underlying AWS stream or of the Select REST API?

I got to around line 6000 in the diff and am out of time for now. Will follow 
up with comments on the rest soon.









> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-12-20 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726157#comment-16726157
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

Applying and testing the v14 patch now while I read over the diff.

 

Replying to older comment, [~ste...@apache.org]:
{quote}I'm starting to stare at S3AFilesystem and then at Beck and Fowler 99, 
"refactoring". No real plan yet, but its too big and we could think about a 
clearer Model-View approach. 
{quote}
I also thought about trying a top half / bottom half approach. The bottom half 
interacts with S3 (the AWS SDK) and is not public (except for testing). The top 
half does everything else. We would have to actually go in and try some 
refactoring to see how it works, though.
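
Very roughly, the kind of split I have in mind looks like the sketch below. 
Every name here is hypothetical; nothing like this exists in the current code:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

/**
 * Hypothetical "bottom half": the only layer that talks to the AWS SDK.
 * Package-private except for tests, so the rest of S3AFileSystem (the
 * "top half": paths, qualification, S3Guard, statistics) never sees SDK types.
 */
interface RawS3Store {
  InputStream getObject(String key, long start, long length) throws IOException;
  void putObject(String key, InputStream data, long length) throws IOException;
  void deleteObjects(List<String> keys) throws IOException;
  List<String> listPrefix(String prefix, int maxKeys) throws IOException;
}
{code}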

 

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16015) Add bouncycastle jars to hadoop-aws as test dependencies

2018-12-19 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725504#comment-16725504
 ] 

Aaron Fabbri commented on HADOOP-16015:
---

+1 Looks good to me.

> Add bouncycastle jars to hadoop-aws as test dependencies
> 
>
> Key: HADOOP-16015
> URL: https://issues.apache.org/jira/browse/HADOOP-16015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-16015-001.patch
>
>
> This is a cherrypick of the POM changes from HADOOP-14556: add the 
> bouncy-castle dependencies to the hadoop-aws/pom.xml so the MR cluster tests 
> work again.
> I do want that full patch in, but at least with this there are fewer false 
> negatives on test failures across other feature branchs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14556) S3A to support Delegation Tokens

2018-12-19 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725483#comment-16725483
 ] 

Aaron Fabbri commented on HADOOP-14556:
---

Hi [~ste...@apache.org]. Looks like you need reviews on this, but it is a lot 
for reviewers to digest. Any interest in doing a show-and-tell (conf call) 
to walk through the patch and answer questions? Totally up to you, and just an 
idea, but if you do this, I'm in. [~gabor.bota] might be interested as well.

> S3A to support Delegation Tokens
> 
>
> Key: HADOOP-14556
> URL: https://issues.apache.org/jira/browse/HADOOP-14556
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14556-001.patch, HADOOP-14556-002.patch, 
> HADOOP-14556-003.patch, HADOOP-14556-004.patch, HADOOP-14556-005.patch, 
> HADOOP-14556-007.patch, HADOOP-14556-008.patch, HADOOP-14556-009.patch, 
> HADOOP-14556-010.patch, HADOOP-14556-010.patch, HADOOP-14556-011.patch, 
> HADOOP-14556-012.patch, HADOOP-14556-013.patch, HADOOP-14556-014.patch, 
> HADOOP-14556-015.patch, HADOOP-14556-016.patch, HADOOP-14556-017.patch, 
> HADOOP-14556-018a.patch, HADOOP-14556-019.patch, HADOOP-14556-020.patch, 
> HADOOP-14556-021.patch, HADOOP-14556-022.patch, HADOOP-14556-023.patch, 
> HADOOP-14556-024.patch, HADOOP-14556.oath-002.patch, HADOOP-14556.oath.patch
>
>
> S3A to support delegation tokens where
> * an authenticated client can request a token via 
> {{FileSystem.getDelegationToken()}}
> * Amazon's token service is used to request short-lived session secret & id; 
> these will be saved in the token and  marshalled with jobs
> * A new authentication provider will look for a token for the current user 
> and authenticate the user if found
> This will not support renewals; the lifespan of a token will be limited to 
> the initial duration. Also, as you can't request an STS token from a 
> temporary session, IAM instances won't be able to issue tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15999) [s3a] Better support for out-of-band operations

2018-12-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719456#comment-16719456
 ] 

Aaron Fabbri commented on HADOOP-15999:
---

Was going to link HADOOP-15780 here, but [~gabor.bota] beat me to it.

Currently getFileStatus() is always short-circuited, as you mentioned. We might 
just want to make that configurable (with a separate config knob, probably). If 
we are in "check both MS and S3" mode, we probably want a configurable or 
pluggable conflict policy. The default would probably be to go into a retry loop 
waiting for both systems (MetadataStore and S3) to agree. After the retry policy 
is exhausted, throw an error or continue, depending on the conflict policy.
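
Roughly, something like the sketch below is what I have in mind. Every name in 
it is made up; it is only meant to show the shape of the check-both-and-retry 
logic:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch only: none of these names exist in S3A. Consult both
 * sources, retry while they disagree, then let a conflict policy decide what
 * to do once the retries are exhausted.
 */
abstract class CheckBothSources {

  /** Lookup in the MetadataStore (e.g. DynamoDB); null if not found. */
  protected abstract FileStatus lookupMetadataStore(Path path) throws IOException;

  /** Lookup directly against S3 (HEAD); null if not found. */
  protected abstract FileStatus lookupS3(Path path) throws IOException;

  /** Conflict policy once retries run out: throw, or pick a winner. */
  protected abstract FileStatus resolveConflict(FileStatus fromStore,
      FileStatus fromS3) throws IOException;

  FileStatus getStatusCheckingBoth(Path path, int maxAttempts)
      throws IOException, InterruptedException {
    FileStatus fromStore = null;
    FileStatus fromS3 = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      fromStore = lookupMetadataStore(path);
      fromS3 = lookupS3(path);
      if (agree(fromStore, fromS3)) {
        return fromS3;
      }
      Thread.sleep(1000L * attempt);    // simple linear backoff
    }
    return resolveConflict(fromStore, fromS3);
  }

  private boolean agree(FileStatus a, FileStatus b) {
    if (a == null || b == null) {
      return a == b;   // both missing counts as agreement
    }
    return a.getLen() == b.getLen()
        && a.getModificationTime() == b.getModificationTime();
  }
}
{code}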

Feel free to ping me for reviews, etc.

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-11-30 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705291#comment-16705291
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

Looks good overall [~ste...@apache.org]. Good suggestions here too. I'd also 
lean towards an interface + composition for common code but defer to you.

As usual, good tests, good docs. Thank you. I prefer separate patches, but then 
again it is good to see the end-to-end example of how this all comes together 
with CLI tool -> builder -> S3A implementation.

Random comments on the patch follow; the minor wording nits are just suggestions.
{noformat}
+   * Reject a configuration if one or more mandatory keys are
+   * not in the set of mandatory keys.
{noformat}
/not in the set of known keys/

I think you can remove the whitespace change to HarFileSystem.java

The async API is interesting. Folks who want synchronous behavior should wrap it 
with LambdaUtils' eval()? I generally favor async APIs, but we should keep in 
mind that we can always translate between sync and async, and the common case 
should be the default, unless there is another benefit (e.g. performance, error 
behavior). Just some thoughts here; I'm ok with your async API.
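
For what it's worth, here is how I'd expect a caller who wants synchronous 
behavior to unwrap the returned future. This is plain java.util.concurrent, 
nothing Hadoop-specific, and the helper name is made up:

{code}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class SyncOpen {
  private SyncOpen() {
  }

  /** Block on the future and rethrow any IOException raised during the open. */
  static <T> T awaitSynchronously(CompletableFuture<T> future) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while opening file", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;   // e.g. FileNotFoundException deferred to get()
      }
      throw new IOException(cause);
    }
  }
}
{code}

A caller could then do something like 
awaitSynchronously(fs.openFile(path).build()), if I'm reading the builder shape 
in this patch correctly.
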
{noformat}
+The `openFile()` operation may check the state of the filesystem during this
+call, but as the state of the filesystem may change betwen this call and
+the actual `build()` and `get()` operations, this file-specific
+preconditions (file exists, file is readable, etc) MUST NOT be checked 
here.{noformat}
Took me a second to parse this but you clarify later on. Maybe change last word 
"here" to "in openFile()". Also "this file-specific" to "any file-specific". 
Also spelling on "betwen".
{noformat}
 some aspects of the state of the filesystem, MAY be checked in the initial 
+`openFile()` call, provided they are known to be invariants which will not
+change between `openFile()` and the `build().get()` sequence. For example,
+path validation.{noformat}
Makes sense. Path validation can happen in openFile but path existence, for 
example, must not.

Thanks for pulling the SelectBinding stuff into a separate class. I haven't 
tested it yet; I'm self-funding my AWS usage these days and still need to set 
that up.

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-11-27 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701357#comment-16701357
 ] 

Aaron Fabbri commented on HADOOP-15229:
---

Hey [~ste...@apache.org] been on vacation but will try to review and test this 
by end of week. Thanks for working on this.

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15621) S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing

2018-10-03 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636549#comment-16636549
 ] 

Aaron Fabbri commented on HADOOP-15621:
---

Apologies, my mistake. Should be fixed now.

> S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15621.001.patch, HADOOP-15621.002.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> This is a similar concept to an "online algorithm" version of the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (s3 and metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state. Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time bound. Thus "wait and restart your job" becomes a valid, if 
> ugly, way to get around any issues with FS client failure leaving things in a 
> bad state.
>  2. We could make manual invocation of `hadoop s3guard prune ...` 
> unnecessary, depending on the implementation.
>  3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
> directories due to the lack of true modification time.
> How:
>  I think we need a new column in the dynamo table "entry last written time". 
> This is updated each time the entry is written to dynamo.
>  After that we can either
>  1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
>  2. Have the client delete entries older than the TTL.
> The issue with #2 is it will increase latency if done inline in the context 
> of an FS operation. We could mitigate this some by using an async helper 
> thread, or probabilistically doing it "some times" to amortize the expense of 
> deleting stale entries (allowing some batching as well).
> Caveats:
>  - Clock synchronization as usual is a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15621) S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing

2018-10-02 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15621:
--
   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for all the work on this [~gabor.bota].

> S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15621.001.patch, HADOOP-15621.002.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> This is a similar concept to an "online algorithm" version of the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (s3 and metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state. Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time bound. Thus "wait and restart your job" becomes a valid, if 
> ugly, way to get around any issues with FS client failure leaving things in a 
> bad state.
>  2. We could make manual invocation of `hadoop s3guard prune ...` 
> unnecessary, depending on the implementation.
>  3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
> directories due to the lack of true modification time.
> How:
>  I think we need a new column in the dynamo table "entry last written time". 
> This is updated each time the entry is written to dynamo.
>  After that we can either
>  1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
>  2. Have the client delete entries older than the TTL.
> The issue with #2 is it will increase latency if done inline in the context 
> of an FS operation. We could mitigate this some by using an async helper 
> thread, or probabilistically doing it "some times" to amortize the expense of 
> deleting stale entries (allowing some batching as well).
> Caveats:
>  - Clock synchronization as usual is a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15621) S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing

2018-10-02 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636139#comment-16636139
 ] 

Aaron Fabbri commented on HADOOP-15621:
---

v2 patch looks great.  I'm really happy with how small the patch is.  I am 
running through the tests (Portland, OR -> us-west-2) and will commit this 
today if everything looks good.

A couple of minor things, which I will go ahead and fix before committing:
- Checkstyle (changed the member to be private)
- S3ATestUtils already has a function isMetadataStoreAuthoritative(), so I am 
renaming the moved function to metadataStorePersistsAuthoritativeBit(). This 
will be less confusing too.

> S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15621.001.patch, HADOOP-15621.002.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> This is a similar concept to an "online algorithm" version of the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (s3 and metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state. Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time bound. Thus "wait and restart your job" becomes a valid, if 
> ugly, way to get around any issues with FS client failure leaving things in a 
> bad state.
>  2. We could make manual invocation of `hadoop s3guard prune ...` 
> unnecessary, depending on the implementation.
>  3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
> directories due to the lack of true modification time.
> How:
>  I think we need a new column in the dynamo table "entry last written time". 
> This is updated each time the entry is written to dynamo.
>  After that we can either
>  1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
>  2. Have the client delete entries older than the TTL.
> The issue with #2 is it will increase latency if done inline in the context 
> of an FS operation. We could mitigate this some by using an async helper 
> thread, or probabilistically doing it "some times" to amortize the expense of 
> deleting stale entries (allowing some batching as well).
> Caveats:
>  - Clock synchronization as usual is a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15780) S3Guard: document how to deal with non-S3Guard processes writing data to S3Guarded buckets

2018-09-20 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15780:
-

 Summary: S3Guard: document how to deal with non-S3Guard processes 
writing data to S3Guarded buckets
 Key: HADOOP-15780
 URL: https://issues.apache.org/jira/browse/HADOOP-15780
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aaron Fabbri


Our general policy for S3Guard is this: all modifiers of a bucket that is 
configured for use with S3Guard must use S3Guard. Otherwise, the MetadataStore 
will not be properly updated as the S3 bucket changes, and problems will arise.

There are limited circumstances in which it may be safe to have an external 
(non-S3Guard) process writing data. There are also scenarios where it 
definitely breaks things.

I think we should start by documenting the cases where this works and does not 
work. After we've enumerated those, we can suggest enhancements as needed to 
make this sort of configuration easier to use.

To get the ball rolling, some things that do not work:
- Deleting a path *p* with S3Guard, then writing a new file at path *p* without 
S3Guard (the delete marker will still be in S3Guard, making the file appear to 
be deleted even though it is still visible in S3; a false "eventual consistency" 
symptom) (as [~ste...@apache.org] and I have discussed)
- When fs.s3a.metadatastore.authoritative is true, adding files to directories 
without S3Guard, then listing with S3Guard may exclude externally-written files 
from listings.

(Note, there are also S3A interop issues with other non-S3A clients even 
without S3Guard, due to the unique way S3A interprets empty directory markers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15489) S3Guard to self update on directory listings of S3

2018-09-20 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622914#comment-16622914
 ] 

Aaron Fabbri commented on HADOOP-15489:
---

[~ste...@apache.org] that was the original behavior, and we changed it to 
improve performance (those constant write-backs of listings into Dynamo become 
expensive). I think it is fairly close to a reasonable default behavior now, but 
I'd be OK with adding a non-default config knob to get the "always write back 
listings" behavior. I'm thinking it might be slow, though; it all depends on the 
access patterns.

> S3Guard to self update on directory listings of S3
> --
>
> Key: HADOOP-15489
> URL: https://issues.apache.org/jira/browse/HADOOP-15489
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: s3guard
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
>
> S3Guard updates its table on a getFileStatus call, but not on a directory 
> listing.
> While this makes directory listings faster (no need to push out an update), 
> it slows down subsequent queries of the files, such as a sequence of:
> {code}
> statuses = s3a.listFiles(dir)
> for (status: statuses) {
>   if (status.isFile) {
>   try(is = s3a.open(status.getPath())) {
> ... do something
>   }
> }
> {code}
> this is because the open() is doing the getFileStatus check, even after the 
> listing.
> Updating the DDB tables after a listing would give those reads a speedup, 
> albeit at the expense of initiating a (bulk) update in the list call. Of 
> course, we could consider making that async, though that design (essentially 
> a write-buffer) would require the buffer to be checked in the reads too. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15489) S3Guard to self update on directory listings of S3

2018-09-20 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15489:
--
Priority: Minor  (was: Major)

> S3Guard to self update on directory listings of S3
> --
>
> Key: HADOOP-15489
> URL: https://issues.apache.org/jira/browse/HADOOP-15489
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: s3guard
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Minor
>
> S3Guard updates its table on a getFileStatus call, but not on a directory 
> listing.
> While this makes directory listings faster (no need to push out an update), 
> it slows down subsequent queries of the files, such as a sequence of:
> {code}
> statuses = s3a.listFiles(dir)
> for (status: statuses) {
>   if (status.isFile) {
>   try(is = s3a.open(status.getPath())) {
> ... do something
>   }
> }
> {code}
> this is because the open() is doing the getFileStatus check, even after the 
> listing.
> Updating the DDB tables after a listing would give those reads a speedup, 
> albeit at the expense of initiating a (bulk) update in the list call. Of 
> course, we could consider making that async, though that design (essentially 
> a write-buffer) would require the buffer to be checked in the reads too. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15779) S3guard: add inconsistency detection metrics

2018-09-20 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15779:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-15619

> S3guard: add inconsistency detection metrics
> 
>
> Key: HADOOP-15779
> URL: https://issues.apache.org/jira/browse/HADOOP-15779
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Aaron Fabbri
>Priority: Major
>
> S3Guard uses a trailing log of metadata changes made to an S3 bucket to add 
> consistency to the eventually-consistent AWS S3 service. We should add some 
> metrics that are incremented when we detect inconsistency (eventual 
> consistency) in S3.
> I'm thinking at least two counters: (1) getFileStatus() (HEAD) inconsistency 
> detected, and (2) listing inconsistency detected. We may want to further 
> separate into categories (present / not present etc.)
> This is related to Auth. Mode and TTL work that is ongoing, so let me outline 
> how I think this should all evolve:
> This should happen after HADOOP-15621 (TTL for dynamo MetadataStore), since 
> that will change *when* we query both S3 and the MetadataStore (e.g. Dynamo) 
> for metadata. There I suggest that:
>  # Prune time is different than TTL. Prune time: "how long until 
> inconsistency is no longer a problem" . TTL time "how long a MetadataStore 
> entry is considered authoritative before refresh"
>  # Prune expired: delete entries (when hadoop CLI prune command is run). TTL 
> Expired: entries become non-authoritative.
>  #  Prune implemented in each MetadataStore, but TTL filtering happens in S3A.
> Once we have this, S3A will be consulting both S3 and MetadataStore depending 
> on configuration and/or age of the entry in the MetadataStore. Today 
> HEAD/getFileStatus() is always short-circuit (skips S3 if MetadataStore 
> returns results). I think S3A should consult both when TTL is stale, and 
> invoke a callback on inconsistency that increments the new metrics. For 
> listing, we already are comparing both sources of truth (except when S3A auth 
> mode is on and a directory is marked authoritative in MS), so it would be 
> pretty simple to invoke a callback on inconsistency and bump some metrics.
> Comments / suggestions / questions welcomed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15779) S3guard: add inconsistency detection metrics

2018-09-20 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15779:
-

 Summary: S3guard: add inconsistency detection metrics
 Key: HADOOP-15779
 URL: https://issues.apache.org/jira/browse/HADOOP-15779
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.2.0
Reporter: Aaron Fabbri


S3Guard uses a trailing log of metadata changes made to an S3 bucket to add 
consistency to the eventually-consistent AWS S3 service. We should add some 
metrics that are incremented when we detect inconsistency (eventual 
consistency) in S3.

I'm thinking at least two counters: (1) getFileStatus() (HEAD) inconsistency 
detected, and (2) listing inconsistency detected. We may want to further 
separate into categories (present / not present etc.)

This is related to Auth. Mode and TTL work that is ongoing, so let me outline 
how I think this should all evolve:

This should happen after HADOOP-15621 (TTL for dynamo MetadataStore), since 
that will change *when* we query both S3 and the MetadataStore (e.g. Dynamo) 
for metadata. There I suggest that:
 # Prune time is different from TTL. Prune time: "how long until inconsistency 
is no longer a problem". TTL time: "how long a MetadataStore entry is 
considered authoritative before refresh"
 # Prune expired: delete entries (when the hadoop CLI prune command is run). TTL 
expired: entries become non-authoritative.
 # Prune is implemented in each MetadataStore, but TTL filtering happens in S3A.

Once we have this, S3A will be consulting both S3 and MetadataStore depending 
on configuration and/or age of the entry in the MetadataStore. Today 
HEAD/getFileStatus() is always short-circuit (skips S3 if MetadataStore returns 
results). I think S3A should consult both when TTL is stale, and invoke a 
callback on inconsistency that increments the new metrics. For listing, we 
already are comparing both sources of truth (except when S3A auth mode is on 
and a directory is marked authoritative in MS), so it would be pretty simple to 
invoke a callback on inconsistency and bump some metrics.
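
To make the callback idea concrete, a tiny sketch (the names are made up; 
nothing here is existing S3A code):

{code}
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the inconsistency counters and callback. */
class InconsistencyMetrics {

  /** Where the disagreement between S3 and the MetadataStore was observed. */
  enum Source { GET_FILE_STATUS, LISTING }

  private final AtomicLong headInconsistencies = new AtomicLong();
  private final AtomicLong listingInconsistencies = new AtomicLong();

  /** Invoked by S3A whenever S3 and the MetadataStore disagree. */
  void onInconsistencyDetected(Source source, String path) {
    switch (source) {
      case GET_FILE_STATUS:
        headInconsistencies.incrementAndGet();
        break;
      case LISTING:
        listingInconsistencies.incrementAndGet();
        break;
      default:
        break;
    }
    // A real implementation would feed S3AInstrumentation / Hadoop metrics
    // rather than plain counters, and probably log the path at debug.
  }

  long getHeadInconsistencies() {
    return headInconsistencies.get();
  }

  long getListingInconsistencies() {
    return listingInconsistencies.get();
  }
}
{code}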

Comments / suggestions / questions welcomed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15779) S3guard: add inconsistency detection metrics

2018-09-20 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15779:
--
Issue Type: Improvement  (was: Bug)

> S3guard: add inconsistency detection metrics
> 
>
> Key: HADOOP-15779
> URL: https://issues.apache.org/jira/browse/HADOOP-15779
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Aaron Fabbri
>Priority: Major
>
> S3Guard uses a trailing log of metadata changes made to an S3 bucket to add 
> consistency to the eventually-consistent AWS S3 service. We should add some 
> metrics that are incremented when we detect inconsistency (eventual 
> consistency) in S3.
> I'm thinking at least two counters: (1) getFileStatus() (HEAD) inconsistency 
> detected, and (2) listing inconsistency detected. We may want to further 
> separate into categories (present / not present etc.)
> This is related to Auth. Mode and TTL work that is ongoing, so let me outline 
> how I think this should all evolve:
> This should happen after HADOOP-15621 (TTL for dynamo MetadataStore), since 
> that will change *when* we query both S3 and the MetadataStore (e.g. Dynamo) 
> for metadata. There I suggest that:
>  # Prune time is different than TTL. Prune time: "how long until 
> inconsistency is no longer a problem" . TTL time "how long a MetadataStore 
> entry is considered authoritative before refresh"
>  # Prune expired: delete entries (when hadoop CLI prune command is run). TTL 
> Expired: entries become non-authoritative.
>  #  Prune implemented in each MetadataStore, but TTL filtering happens in S3A.
> Once we have this, S3A will be consulting both S3 and MetadataStore depending 
> on configuration and/or age of the entry in the MetadataStore. Today 
> HEAD/getFileStatus() is always short-circuit (skips S3 if MetadataStore 
> returns results). I think S3A should consult both when TTL is stale, and 
> invoke a callback on inconsistency that increments the new metrics. For 
> listing, we already are comparing both sources of truth (except when S3A auth 
> mode is on and a directory is marked authoritative in MS), so it would be 
> pretty simple to invoke a callback on inconsistency and bump some metrics.
> Comments / suggestions / questions welcomed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15748) S3 listing inconsistency can raise NPE in globber

2018-09-19 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621174#comment-16621174
 ] 

Aaron Fabbri commented on HADOOP-15748:
---

Easy +1 by inspection (haven't tested it): looks like an obvious improvement to 
check for null here and continue.

Thanks for fixing this.
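
For anyone following along, the pattern being added is roughly this (a 
paraphrase with an invented helper name, not the exact Globber code):
{code}
// A listing may contain entries that were deleted after the LIST was issued,
// so a null getFileStatus() result (IOEs are downgraded to null there) is
// skipped instead of being dereferenced and raising an NPE.
private List<FileStatus> directoriesThatStillExist(List<FileStatus> candidates)
    throws IOException {
  List<FileStatus> results = new ArrayList<>();
  for (FileStatus candidate : candidates) {
    FileStatus status = getFileStatus(candidate.getPath());
    if (status == null) {
      continue;   // entry vanished between the LIST and the HEAD; skip it
    }
    if (status.isDirectory()) {
      results.add(status);
    }
  }
  return results;
}
{code}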

> S3 listing inconsistency can raise NPE in globber
> -
>
> Key: HADOOP-15748
> URL: https://issues.apache.org/jira/browse/HADOOP-15748
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.9.1, 2.8.4
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15748-001.patch
>
>
> FileSystem Globber does a listStatus(path) and then, if only one element is 
> returned, {{getFileStatus(path).isDirectory()}} to see if it is a dir. The 
> way getFileStatus() is wrapped, IOEs are downgraded to null
> On S3, if the path has had entries deleted, the listing may include files 
> which are no longer there, so the getFileStatus(path).isDirectory() call 
> triggers an NPE.
> While it's wrong to glob against S3 when it's being inconsistent, we should at 
> least fail gracefully here.
> Proposed:
> # log all IOEs raised in Globber.getFileStatus @ debug
> # catch FNFEs and downgrade to warn
> # continue
> The alternative would be to fail fast on FNFE, but that's more traumatic



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15773) ABFS: Fix issues raised by Yetus

2018-09-19 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620850#comment-16620850
 ] 

Aaron Fabbri commented on HADOOP-15773:
---

Patch looks good, +1.

I did not apply the patch on top of the HADOOP-15407 patch, nor test it, but I 
did review the changes and they all look good. No logic changes, just 
checkstyle stuff.

> ABFS: Fix issues raised by Yetus
> 
>
> Key: HADOOP-15773
> URL: https://issues.apache.org/jira/browse/HADOOP-15773
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15773-HADOOP-15407.001.patch
>
>
> I aggregated the HADOOP-15407 branch into a single patch and posted it on 
> HADOOP-15770 just to get an aggregate report of all current issues raised by 
> Yetus. There was a javac deprecation warning, a number of checkstyle issues, 
> some whitespace issues, and there are a couple of valid javadoc errors I see 
> locally. Let's fix them before we merge.
> I see a number of wildcard imports, all for the contents of large classes of 
> configuration constants, and I think those should stay the way they are. 
> There are a number of existing checkstyle issues in WASB, too that are 
> irrelevant for the merge, and there are some field visibility issues in tests 
> that are required that way for the tests to work as designed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15754) s3guard: testDynamoTableTagging should clear existing config

2018-09-13 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613804#comment-16613804
 ] 

Aaron Fabbri commented on HADOOP-15754:
---

Wild guess: Gabor and I have a region set in our test config (e.g. 
auth-keys.xml), but [~ste...@apache.org] does not?

> s3guard: testDynamoTableTagging should clear existing config
> 
>
> Key: HADOOP-15754
> URL: https://issues.apache.org/jira/browse/HADOOP-15754
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15754.001.patch
>
>
> I recently committed HADOOP-14734 which adds support for tagging Dynamo DB 
> tables for S3Guard when they are created.
>  
> Later, when testing another patch, I hit a test failure because I still had a 
> tag option set in my test configuration (auth-keys.xml) that was adding my 
> own table tag.
> {noformat}
> [ERROR] 
> testDynamoTableTagging(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)
>   Time elapsed: 13.384 s  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<3>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.junit.Assert.assertEquals(Assert.java:555)
>         at org.junit.Assert.assertEquals(Assert.java:542)
>         at 
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB.testDynamoTableTagging(ITestS3GuardToolDynamoDB.java:129)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>         at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){noformat}
> I think the solution is just to clear any tag.* options set in the 
> configuration at the beginning of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) Make S3guard client resilient to DDB throttle events and network failures

2018-09-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613007#comment-16613007
 ] 

Aaron Fabbri commented on HADOOP-15426:
---

filed INFRA-17015 for the Hadoop-trunk-Commit failure.

> Make S3guard client resilient to DDB throttle events and network failures
> -
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch, 
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch, 
> HADOOP-15426-009.patch, HADOOP-15426-010.patch, Screen Shot 2018-07-24 at 
> 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 
> at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) Make S3guard client resilient to DDB throttle events and network failures

2018-09-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612991#comment-16612991
 ] 

Aaron Fabbri commented on HADOOP-15426:
---

post-commit test failure seems to be an infra issue?
{noformat}
[INFO] 
[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.2.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
[ERROR] {noformat}

> Make S3guard client resilient to DDB throttle events and network failures
> -
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch, 
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch, 
> HADOOP-15426-009.patch, HADOOP-15426-010.patch, Screen Shot 2018-07-24 at 
> 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 
> at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15426) Make S3guard client resilient to DDB throttle events and network failures

2018-09-12 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15426:
--
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Latest patch looks great. Again really appreciate the extra focus on docs and 
usability.

I went ahead and committed this. I wanted to test out the `--author` feature 
with your Apache ID, which we should probably discuss again on the list. Let 
me know your personal opinion on committing with git author attribution.

Testing: us-west-2. All integration tests (parallel) and scale tests.

> Make S3guard client resilient to DDB throttle events and network failures
> -
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch, 
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch, 
> HADOOP-15426-009.patch, HADOOP-15426-010.patch, Screen Shot 2018-07-24 at 
> 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 
> at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14734) add option to tag DDB table(s) created

2018-09-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612981#comment-16612981
 ] 

Aaron Fabbri commented on HADOOP-14734:
---

FYI [~gabor.bota] just filed HADOOP-15754 which is low priority but would be a 
nice little improvement in the integration test.

> add option to tag DDB table(s) created
> --
>
> Key: HADOOP-14734
> URL: https://issues.apache.org/jira/browse/HADOOP-14734
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-14734-001.patch, HADOOP-14734-002.patch, 
> HADOOP-14734-003.patch, HADOOP-14734.004.patch, HADOOP-14734.005.patch
>
>
> Many organisations have a "no untagged" resource policy; s3guard runs into 
> this when a table is created untagged. If there's a strict "delete untagged 
> resources" policy, the tables will go without warning.
> Proposed: we add an option which can be used to declare the tags for a table 
> when created, use it in creation. No need to worry about updating/viewing 
> tags, as the AWS console can do that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15754) s3guard: testDynamoTableTagging should clear existing config

2018-09-12 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-15754:
-

 Summary: s3guard: testDynamoTableTagging should clear existing 
config
 Key: HADOOP-15754
 URL: https://issues.apache.org/jira/browse/HADOOP-15754
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.2.0
Reporter: Aaron Fabbri


I recently committed HADOOP-14734 which adds support for tagging Dynamo DB 
tables for S3Guard when they are created.

 

Later, when testing another patch, I hit a test failure because I still had a 
tag option set in my test configuration (auth-keys.xml) that was adding my own 
table tag.
{noformat}
[ERROR] 
testDynamoTableTagging(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)
  Time elapsed: 13.384 s  <<< FAILURE!

java.lang.AssertionError: expected:<2> but was:<3>

        at org.junit.Assert.fail(Assert.java:88)

        at org.junit.Assert.failNotEquals(Assert.java:743)

        at org.junit.Assert.assertEquals(Assert.java:118)

        at org.junit.Assert.assertEquals(Assert.java:555)

        at org.junit.Assert.assertEquals(Assert.java:542)

        at 
org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB.testDynamoTableTagging(ITestS3GuardToolDynamoDB.java:129)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:498)

        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)

        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)

        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)

        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)

        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){noformat}
I think the solution is just to clear any tag.* options set in the 
configuration at the beginning of the test.
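
A rough sketch of what I have in mind (setup hook and exact property prefix to 
be confirmed; this is not final code):
{code}
// Drop any fs.s3a.s3guard.ddb.table.tag.* options inherited from the tester's
// auth-keys.xml so the test only observes the tags it sets itself.
private static void clearTableTagOptions(Configuration conf) {
  String prefix = "fs.s3a.s3guard.ddb.table.tag.";
  for (String tagKey : conf.getPropsWithPrefix(prefix).keySet()) {
    conf.unset(prefix + tagKey);
  }
}
{code}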



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) Make S3guard client resilient to DDB throttle events and network failures

2018-09-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612872#comment-16612872
 ] 

Aaron Fabbri commented on HADOOP-15426:
---

Yep. Testing and reviewing now.

> Make S3guard client resilient to DDB throttle events and network failures
> -
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch, 
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch, 
> HADOOP-15426-009.patch, HADOOP-15426-010.patch, Screen Shot 2018-07-24 at 
> 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 
> at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14734) add option to tag DDB table(s) created

2018-09-12 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14734:
--
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you for the excellent contribution [~gabor.bota] and 
[~abrahamfine].

> add option to tag DDB table(s) created
> --
>
> Key: HADOOP-14734
> URL: https://issues.apache.org/jira/browse/HADOOP-14734
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-14734-001.patch, HADOOP-14734-002.patch, 
> HADOOP-14734-003.patch, HADOOP-14734.004.patch, HADOOP-14734.005.patch
>
>
> Many organisations have a "no untagged" resource policy; s3guard runs into 
> this when a table is created untagged. If there's a strict "delete untagged 
> resources" policy, the tables will go without warning.
> Proposed: we add an option which can be used to declare the tags for a table 
> when created, use it in creation. No need to worry about updating/viewing 
> tags, as the AWS console can do that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14734) add option to tag DDB table(s) created

2018-09-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612840#comment-16612840
 ] 

Aaron Fabbri commented on HADOOP-14734:
---

I tested this in us-west-2 by:
 * Running all integration and unit tests with -Ds3guard -Ddynamo
 * Doing the same with {{fs.s3a.s3guard.ddb.table.create}} and 
{{fs.s3a.s3guard.ddb.table.tag.blah}} set and confirming it creates a tagged 
table.
 * Using the CLI to manually create a tagged dynamo table.

Will commit shortly.

One possible future issue with the tagging integration test: we've seen that 
Dynamo table metadata seems to be eventually consistent (i.e. table names are 
cached and create/destroy table is not always immediately visible). I didn't 
see any flakiness, but if the part of the test that reads back the tags you set 
ever becomes flaky, we may need to put a retry around it (rough sketch below). 
Not sure this is needed at this point, but FYI.
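
Purely hypothetical, only worth adding if that assertion ever does turn flaky 
(the SDK calls are the standard DynamoDB v1 client ones; the helper name and 
timings are made up):
{code}
// Hypothetical retry around the tag read-back; table metadata is eventually
// consistent, so give it a few attempts before failing the assertion.
private List<Tag> readTagsWithRetry(AmazonDynamoDB ddb, String tableArn,
    int expectedCount) throws InterruptedException {
  List<Tag> tags = Collections.emptyList();
  for (int attempt = 0; attempt < 5; attempt++) {
    tags = ddb.listTagsOfResource(
        new ListTagsOfResourceRequest().withResourceArn(tableArn)).getTags();
    if (tags.size() >= expectedCount) {
      break;
    }
    Thread.sleep(1000);
  }
  return tags;
}
{code}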

> add option to tag DDB table(s) created
> --
>
> Key: HADOOP-14734
> URL: https://issues.apache.org/jira/browse/HADOOP-14734
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14734-001.patch, HADOOP-14734-002.patch, 
> HADOOP-14734-003.patch, HADOOP-14734.004.patch, HADOOP-14734.005.patch
>
>
> Many organisations have a "no untagged" resource policy; s3guard runs into 
> this when a table is created untagged. If there's a strict "delete untagged 
> resources" policy, the tables will go without warning.
> Proposed: we add an option which can be used to declare the tags for a table 
> when created, use it in creation. No need to worry about updating/viewing 
> tags, as the AWS console can do that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14734) add option to tag DDB table(s) created

2018-09-12 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612754#comment-16612754
 ] 

Aaron Fabbri commented on HADOOP-14734:
---

Thanks for picking this up [~gabor.bota]. Your patch (v4) looks good to me. 
Going to test it out now.

> add option to tag DDB table(s) created
> --
>
> Key: HADOOP-14734
> URL: https://issues.apache.org/jira/browse/HADOOP-14734
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14734-001.patch, HADOOP-14734-002.patch, 
> HADOOP-14734-003.patch, HADOOP-14734.004.patch, HADOOP-14734.005.patch
>
>
> Many organisations have a "no untagged" resource policy; s3guard runs into 
> this when a table is created untagged. If there's a strict "delete untagged 
> resources" policy, the tables will go without warning.
> Proposed: we add an option which can be used to declare the tags for a table 
> when created, use it in creation. No need to worry about updating/viewing 
> tags, as the AWS console can do that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15709) Move S3Guard LocalMetadataStore constants to org.apache.hadoop.fs.s3a.Constants

2018-09-07 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15709:
--
Fix Version/s: 3.2.0

> Move S3Guard LocalMetadataStore constants to 
> org.apache.hadoop.fs.s3a.Constants
> ---
>
> Key: HADOOP-15709
> URL: https://issues.apache.org/jira/browse/HADOOP-15709
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-15709.001.patch, HADOOP-15709.002.patch, 
> HADOOP-15709.003.patch
>
>
> Move the following constants from 
> {{org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore}} to  
> {{org.apache.hadoop.fs.s3a.Constants}} (where they should be):
> * DEFAULT_MAX_RECORDS
> * DEFAULT_CACHE_ENTRY_TTL_MSEC
> * CONF_MAX_RECORDS
> * CONF_CACHE_ENTRY_TTL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15709) Move S3Guard LocalMetadataStore constants to org.apache.hadoop.fs.s3a.Constants

2018-09-07 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15709:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the contribution [~gabor.bota] and for the 
review [~smeng]

> Move S3Guard LocalMetadataStore constants to 
> org.apache.hadoop.fs.s3a.Constants
> ---
>
> Key: HADOOP-15709
> URL: https://issues.apache.org/jira/browse/HADOOP-15709
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15709.001.patch, HADOOP-15709.002.patch, 
> HADOOP-15709.003.patch
>
>
> Move the following constants from 
> {{org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore}} to  
> {{org.apache.hadoop.fs.s3a.Constants}} (where they should be):
> * DEFAULT_MAX_RECORDS
> * DEFAULT_CACHE_ENTRY_TTL_MSEC
> * CONF_MAX_RECORDS
> * CONF_CACHE_ENTRY_TTL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15709) Move S3Guard LocalMetadataStore constants to org.apache.hadoop.fs.s3a.Constants

2018-09-07 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607402#comment-16607402
 ] 

Aaron Fabbri commented on HADOOP-15709:
---

+1 LGTM.

> Move S3Guard LocalMetadataStore constants to 
> org.apache.hadoop.fs.s3a.Constants
> ---
>
> Key: HADOOP-15709
> URL: https://issues.apache.org/jira/browse/HADOOP-15709
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15709.001.patch, HADOOP-15709.002.patch, 
> HADOOP-15709.003.patch
>
>
> Move the following constants from 
> {{org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore}} to  
> {{org.apache.hadoop.fs.s3a.Constants}} (where they should be):
> * DEFAULT_MAX_RECORDS
> * DEFAULT_CACHE_ENTRY_TTL_MSEC
> * CONF_MAX_RECORDS
> * CONF_CACHE_ENTRY_TTL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-06 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15621:
--
Description: 
Similar to HADOOP-13649, I think we should add a TTL (time to live) feature to 
the Dynamo metadata store (MS) for S3Guard.

This is a similar concept to an "online algorithm" version of the CLI prune() 
function, which is the "offline algorithm".

Why: 
 1. Self healing (soft state): since we do not implement transactions around 
modification of the two systems (s3 and metadata store), certain failures can 
lead to inconsistency between S3 and the metadata store (MS) state. Having a 
time to live (TTL) on each entry in S3Guard means that any inconsistencies will 
be time bound. Thus "wait and restart your job" becomes a valid, if ugly, way 
to get around any issues with FS client failure leaving things in a bad state.
 2. We could make manual invocation of `hadoop s3guard prune ...` unnecessary, 
depending on the implementation.
 3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
directories due to the lack of true modification time.

How:
 I think we need a new column in the dynamo table "entry last written time". 
This is updated each time the entry is written to dynamo.
 After that we can either
 1. Have the client simply ignore / elide any entries that are older than the 
configured TTL.
 2. Have the client delete entries older than the TTL.

The issue with #2 is it will increase latency if done inline in the context of 
an FS operation. We could mitigate this some by using an async helper thread, 
or probabilistically doing it "some times" to amortize the expense of deleting 
stale entries (allowing some batching as well).

Caveats:
 - Clock synchronization as usual is a concern. Many clusters already keep 
clocks close enough via NTP. We should at least document the requirement along 
with the configuration knob that enables the feature.

  was:
Similar to HADOOP-13649, I think we should add a TTL (time to live) feature to 
the Dynamo metadata store (MS) for S3Guard.

Think of this as the "online algorithm" version of the CLI prune() function, 
which is the "offline algorithm".

Why: 
1. Self healing (soft state): since we do not implement transactions around 
modification of the two systems (s3 and metadata store), certain failures can 
lead to inconsistency between S3 and the metadata store (MS) state.  Having a 
time to live (TTL) on each entry in S3Guard means that any inconsistencies will 
be time bound.  Thus "wait and restart your job" becomes a valid, if ugly, way 
to get around any issues with FS client failure leaving things in a bad state.
2. We could make manual invocation of `hadoop s3guard prune ...` unnecessary, 
depending on the implementation.
3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
directories due to the lack of true modification time.

How:
I think we need a new column in the dynamo table "entry last written time".  
This is updated each time the entry is written to dynamo.
After that we can either
1. Have the client simply ignore / elide any entries that are older than the 
configured TTL.
2. Have the client delete entries older than the TTL.

The issue with #2 is it will increase latency if done inline in the context of 
an FS operation. We could mitigate this some by using an async helper thread, 
or probabilistically doing it "some times" to amortize the expense of deleting 
stale entries (allowing some batching as well).

Caveats:
- Clock synchronization as usual is a concern. Many clusters already keep 
clocks close enough via NTP. We should at least document the requirement along 
with the configuration knob that enables the feature.





> s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15621.001.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> This is a similar concept to an "online algorithm" version of the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (s3 and metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state. Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time bound. Thus "wait and restart your job" be

[jira] [Commented] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606497#comment-16606497
 ] 

Aaron Fabbri commented on HADOOP-15621:
---

Thanks for the v1 patch. You've done some nice work here, and I'm glad to see 
someone new mastering this part of the codebase. Looks pretty good overall.

I think we should consider a couple of changes, now that you've explored the 
details of the implementation. Let me know if you agree / disagree.
 # Don't do the "online" prune / delete here. We can do that in a later jira if 
we want. We have the prune CLI ("offline") and the fs.s3a.s3guard.cli.prune.age 
config today. I think that is good enough for now, and I want us to be able to 
performance test this without that extra variable.
 # It seems simpler to do TTL filtering in the FS instead of in each metadata 
store.
 Pros:

 - All Metadata Stores behave the same.
 - Less code duplication (filtering logic implemented once in FS).
 - S3A would need logic to implement parts of TTL anyway (to deal with
 getFileStatus() not being authoritative if the last updated timestamp in
 PathMetadata is older than the TTL). This could be done later as a better 
solution to HADOOP-14468.
 - Clearer MetadataStore API semantics (MS behavior not dependent on external 
Configuration
 API)
 - Fewer config knobs. fs.s3a.metadatastore.authoritative.ttl: How long an 
entry in the MS is considered as authoritative before it will be refreshed.
 - Easier to test.
 - Future-proof for metadata caching. A FS can choose cache policy on a per-file
 basis, e.g. from coherency hints at open() or create() time. The FS controls 
it.

Cons:
 - Would need some convenience wrappers around MetadataStore API in S3A.
 - Would require changes to MetadataStore API (include last_updated field in 
PathMetadata,
 DirListingMetadata)
 - Would require changes to LocalMetadataStore (though it could be quite 
easy--just store the lastUpdated field on PathMetadata and DirListingMetadata. 
The Local MS can still have its own separate TTL value, which is used to limit 
memory usage... just keep the two separate).

Other thoughts:
 Cool test cases, thanks.  We should also probably add an integration test that 
uses FS and S3guard all together. E.g.:
{noformat}
set auth mode = true
configure s3a auth ttl = x seconds
s3afs.mkdir(test/)
s3afs.touch(test/file)
s3afs.listStatus(test)  // this should write full dir into MS with auth=true
assert is_authoritative(s3afs.getMetadataStore().listChildren(test))  // A

*fast forward time, via sleep() or s3afs.test_time_offset += 2x* or a 
fs.getTime() mock?*

assert ! is_authoritative(s3afs.getMetadataStore().listChildren(test))  // B

{noformat}
Also maybe even do this next:
{noformat}
s3afs.listStatus(test)  // this should again write full dir into MS with 
auth=true
assert is_authoritative(s3afs.getMetadataStore().listChildren(test))  // C
{noformat}
 
 So, we the "refresh MS on TTL expiry" behavior. A cache refresh. We have shown 
that TTL expiry clears the auth bit and makes listStatus() re-load a new, 
fresh, listing back into the MS with auth=true and a new TTL time.

Does that make sense?

Other thoughts:
{noformat}
   public DDBPathMetadata(FileStatus fileStatus, Tristate isEmptyDir,

+this.lastUpdated = getInitialLastUpdated();
{noformat}
Wondering if we can do this lazily. Or, just init to 0 and make the FS set it (in 
the putWithTTL() wrapper you'd add, e.g. in S3Guard.java)? Getting system time 
is cheaper than it used to be (vsyscalls), but still nice to avoid until 
necessary.
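
For illustration, the kind of wrapper I mean (method and field names are 
guesses, not an API proposal):
{code}
// Sketch of a put-with-TTL helper in S3Guard.java: the FS stamps the entry's
// lastUpdated time once, here, instead of each MetadataStore doing it.
public static void putWithTTL(MetadataStore ms, PathMetadata entry)
    throws IOException {
  // assumes this patch adds a lastUpdated field + setter to PathMetadata
  entry.setLastUpdated(System.currentTimeMillis());
  ms.put(entry);
}
{code}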
{noformat}
+  void checkIsEmptyDirectory(ItemCollection items) {
{noformat}
Maybe call this setIsEmptyDirectory instead of check?
{noformat}
+return Time.monotonicNow();
{noformat}
Reminder to make sure we are being consistent throughout S3A.. using the same 
clock source. Not sure we need monotonic here but we should probably follow 
what the rest of the code uses. S3AFileStatus, for example, uses 
System.currentTimeMillis().
{noformat}
+if(entry == null) {
{noformat}
spacing nit

Overall impressive v1 patch. Thank you for being flexible and working with my 
code reviews.

> s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15621.001.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> Think of this as the "online algorithm" version of the CLI prune() function, 
> which is the "offline algorithm

[jira] [Commented] (HADOOP-15426) Make S3guard client resilient to DDB throttle events and network failures

2018-09-04 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603775#comment-16603775
 ] 

Aaron Fabbri commented on HADOOP-15426:
---

FYI I also ran through the scale tests without any issues.

> Make S3guard client resilient to DDB throttle events and network failures
> -
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch, 
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch, 
> HADOOP-15426-009.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 
> 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen 
> Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15426) Make S3guard client resilient to DDB throttle events and network failures

2018-09-04 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603774#comment-16603774
 ] 

Aaron Fabbri commented on HADOOP-15426:
---

Looking at the v9 patch. Integration tests passed in us-west-2. Overall I think 
this looks good. I have one comment below that needs addressing (the 
double-nested retry in getVersionMarkerItem()).

We are almost to the point where the slogan of S3A can be Never Give Up*

*(until your retry policy has been exhausted)

Pretty cool.

{noformat}
-  private RetryPolicy dataAccessRetryPolicy;
+  /**
+   * This policy is purely for batched writes, not for processing
+   * exceptions in invoke() calls.
+   */
+  private RetryPolicy batchWriteRetryPolicy;
{noformat}

Later on in the DDBMS code:
{noformat}
+  Item getVersionMarkerItem() throws IOException {
 final PrimaryKey versionMarkerKey =
 createVersionMarkerPrimaryKey(VERSION_MARKER);
 int retryCount = 0;
-Item versionMarker = table.getItem(versionMarkerKey);
+Item versionMarker = queryVersionMarker(versionMarkerKey);
 while (versionMarker == null) {
   try {
-RetryPolicy.RetryAction action = 
dataAccessRetryPolicy.shouldRetry(null,
+RetryPolicy.RetryAction action = 
batchWriteRetryPolicy.shouldRetry(null,
 retryCount, 0, true);
 if (action.action == RetryPolicy.RetryAction.RetryDecision.FAIL) {
   break;
@@ -1085,11 +1183,26 @@ private Item getVersionMarkerItem() throws IOException {
 throw new IOException("initTable: Unexpected exception", e);
   }
   retryCount++;
-  versionMarker = table.getItem(versionMarkerKey);
+  versionMarker = queryVersionMarker(versionMarkerKey);
 }
 return versionMarker;
   }
 
+  /**
+   * Issue the query to get the version marker, with throttling for overloaded
+   * DDB tables.
+   * @param versionMarkerKey key to look up
+   * @return the marker
+   * @throws IOException failure
+   */
+  @Retries.RetryTranslated
+  private Item queryVersionMarker(final PrimaryKey versionMarkerKey)
+  throws IOException {
{noformat}

Two thoughts:
1. Do you want to change the comment where batchWriteRetryPolicy is declared?
2. Is the nested / double retry logic in getVersionMarkerItem() necessary?  Why 
not just use queryVersionMarker() directly, now that it has a retry wrapper?

Also, minor tidiness nit:  do you want to move init of 
DynamoDBMetadataStore#invoker to initDataAccessRetries as well?  It is getting 
a little confusing with the multiple invokers spread about. Might also want to 
add a comment to invoker saying it is the "utility" invoker for other 
operations like table provisioning and deletion?

{noformat}
+The DynamoDB costs come from the number of entries stores and the allocated 
capacity.
{noformat}
/stores/stored/

Thanks for adding to the docs, looks good.


> Make S3guard client resilient to DDB throttle events and network failures
> -
>
> Key: HADOOP-15426
> URL: https://issues.apache.org/jira/browse/HADOOP-15426
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch, 
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch, 
> HADOOP-15426-009.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 
> 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen 
> Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at 
> {code}
> We should be able to handle this. 400 "bad things happened" error though, not 
> the 503 from S3.

[jira] [Commented] (HADOOP-14630) Contract Tests to verify create, mkdirs and rename under a file is forbidden

2018-08-29 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596817#comment-16596817
 ] 

Aaron Fabbri commented on HADOOP-14630:
---

+1 on the patch as soon as you get a green result from yetus.

I did not test this patch, just reviewed by inspection. I trust you to run the 
tests before committing.
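
For anyone skimming, the cases in the issue description boil down to contract 
tests of roughly this shape (a sketch with assumed helper names, not the actual 
patch):
{code}
// One case from the list: mkdirs() directly under an existing file must be
// forbidden; helper methods follow the FS contract test base classes.
@Test
public void testMkdirsUnderFileForbidden() throws Throwable {
  FileSystem fs = getFileSystem();
  Path file = path("file");
  ContractTestUtils.touch(fs, file);
  // creating a directory underneath a file should fail on every store
  LambdaTestUtils.intercept(IOException.class,
      () -> fs.mkdirs(new Path(file, "subdir")));
}
{code}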

> Contract Tests to verify create, mkdirs and rename under a file is forbidden
> 
>
> Key: HADOOP-14630
> URL: https://issues.apache.org/jira/browse/HADOOP-14630
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, fs/s3, fs/swift
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14630-001.patch, HADOOP-14630-002.patch, 
> HADOOP-14630-003.patch, HADOOP-14630-004.patch
>
>
> Object stores can get into trouble in ways which an FS would never, do, ways 
> so obvious we've never done tests for them. We know what the problems are: 
> test for file and dir creation directly/indirectly under other files
> * mkdir(file/file)
> * mkdir(file/subdir)
> * dir under file/subdir/subdir
> * dir/dir2/file, verify dir & dir2 exist
> * dir/dir2/dir3, verify dir & dir2 exist 
> * rename(src, file/dest)
> * rename(src, file/dir/dest)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs

2018-08-29 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596784#comment-16596784
 ] 

Aaron Fabbri commented on HADOOP-15107:
---

+1 on the v4 patch. Code all looks good.

{noformat}
+} else {
+  LOG.warn("Using standard FileOutputCommitter to commit work."
+  + " This is slow and potentially unsafe.");
+  return createFileOutputCommitter(outputPath, context);{noformat}

Good call, I like it.

On the docs changes, just some random questions:
{noformat}
```python
def recoverTask(tac):
  oldAttemptId = appAttemptId - 1
{noformat}
Interesting. New job attempts always get the previous attempt ID plus one? (I 
don't know how those are allocated.)

The mergePathsV1 logic seems pretty straightforward; not sure why the actual 
code is so complicated. Your pseudocode representation is fairly intuitive: 
overwrite whatever exists in the destination, but recursively, so you don't just 
nuke directories that already exist in the destination; instead, descend and 
remove destination conflicts (files) as they arise. Special case: if src is a 
file but dest is a dir, nuke dest.
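
Writing that down for my own notes, a Java-ish paraphrase of the shape of it 
(this is not the FileOutputCommitter source):
{code}
// Merge src (file or directory tree) into dst, overwriting conflicts:
// descend into directories that already exist at the destination rather than
// deleting them wholesale.
void mergePathsV1(FileSystem fs, FileStatus src, Path dst) throws IOException {
  FileStatus dstStatus = fs.exists(dst) ? fs.getFileStatus(dst) : null;
  if (src.isFile()) {
    if (dstStatus != null) {
      fs.delete(dst, true);            // whatever is at dst loses to the file
    }
    fs.rename(src.getPath(), dst);
  } else if (dstStatus != null && dstStatus.isFile()) {
    fs.delete(dst, false);             // a file in the way of a directory: remove it
    fs.rename(src.getPath(), dst);
  } else if (dstStatus == null) {
    fs.rename(src.getPath(), dst);     // nothing there: move the whole tree
  } else {
    // both are directories: merge child by child
    for (FileStatus child : fs.listStatus(src.getPath())) {
      mergePathsV1(fs, child, new Path(dst, child.getPath().getName()));
    }
  }
}
{code}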

{noformat}
### v2 Job Recovery Before `commitJob()`


Because the data has been renamed into the destination directory, all tasks
recorded as having being committed have no recovery needed at all:

```python
def recoverTask(tac):
```

All active and queued tasks are scheduled for execution.

There is a weakness here, the same one on a failure during `commitTask()`:
it is only safe to repeat a task which failed during that commit operation
if the name of all generated files are constant across all task attempts.

If the Job AM fails while a task attempt has been instructed to commit,
and that commit is not recorded as having completed, the state of that
in-progress task is unknown...really it isn't be safe to recover the
job at this point.
{noformat}

Interesting. What happens in this case?  Is it detected? Do we get duplicate 
data in the final job (re-attempt) output?

> Stabilize/tune S3A committers; review correctness & docs
> 
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15107-001.patch, HADOOP-15107-002.patch, 
> HADOOP-15107-003.patch, HADOOP-15107-004.patch
>
>
> I'm writing about the paper on the committers, one which, being a proper 
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the 
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is 
> implicit in that it uses the V1 FileOutputCommitter to marshall .pendingset 
> lists from committed tasks to the final destination, where they are read and 
> committed.
> # Show the magic committer also works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs

2018-08-29 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596683#comment-16596683
 ] 

Aaron Fabbri commented on HADOOP-15107:
---

I don't want to rob others of the joys of learning the new committers, but I 
can review the code (patch) today.

> Stabilize/tune S3A committers; review correctness & docs
> 
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15107-001.patch, HADOOP-15107-002.patch, 
> HADOOP-15107-003.patch, HADOOP-15107-004.patch
>
>
> I'm writing about the paper on the committers, one which, being a proper 
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the 
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is 
> implicit in that it uses the V1 FileOutputCommitter to marshall .pendingset 
> lists from committed tasks to the final destination, where they are read and 
> committed.
> # Show the magic committer also works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-08-28 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595930#comment-16595930
 ] 

Aaron Fabbri commented on HADOOP-15621:
---

Hey, thank you for working on this.

{quote}

The current implementation uses {{mod_time}} field when using prune. It would 
be wise to use the same because this is an online version of prune. Thus, we 
don't need to add a new field to the item.

{quote}

mod_time is currently used to persist the same field in the FileStatus.  S3A 
*does* persist mod_time for files, just not directories.  So, I do not see how 
we can use mod time to express "table entry last written time" without breaking 
mod_time for the FileStatus.

There are two reasons to expire entries from the table

(1) because it wastes space, and after 24 hours (prune time), we assume S3 is 
consistent.

(2) because we want the cache to be frequently refreshed. That is, we want soft 
state (auto-healing over time) to make short-circuit listings from Dynamo 
(auth. mode) safer in case Dynamo and S3 go out of sync; in this case, after the 
TTL expires, the problem goes away, as S3A will fetch the listing again from S3 
and write back a new, fresh copy of the listing.

mod_time basically works for #1 (though only for files), but not for #2. We 
don't store mod_time for directories because of the way directories are 
emulated on S3.

Thinking about this more, I believe "prune time" should be "the time after 
which we believe S3 will be consistent", and TTL should be a shorter time that 
is the max lifetime of an authoritative dir listing in Dynamo.

So, for example, if prune time = 24 hours and TTL = 1 second:

After 24 hours, entries are deleted from the table.  S3 is consistent by then, so 
they are no longer needed.

After 1 second, a directory listing is no longer considered authoritative.  We might 
also disable the short-circuit behavior on getFileStatus() once the dynamo 
entry is older than the TTL.

This implies that when a row in Dynamo is older than the TTL:

(a) if it is a directory, we clear the auth bit before returning results to the 
FS client (S3A).

(b) if it is a file, we may want to check both S3 and Dynamo instead of 
skipping S3, which is the current behavior.

I think (b) could be follow-up work done later.

Let me know if this makes sense. 
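
To make that concrete, here is a minimal sketch of the two thresholds.  The names 
(TtlPolicy, lastWrittenTime, getLastWrittenTime) are hypothetical, not actual 
S3Guard/DynamoDBMetadataStore code; it just shows prune time vs. TTL and how the 
auth bit would be cleared for stale directory listings.

{code:java}
// Minimal sketch only -- class and field names are hypothetical, not the
// actual S3Guard code.
public class TtlPolicy {
  private final long pruneMillis; // e.g. 24h: after this, S3 is assumed consistent
  private final long ttlMillis;   // e.g. 1s: after this, a listing loses auth status

  public TtlPolicy(long pruneMillis, long ttlMillis) {
    this.pruneMillis = pruneMillis;
    this.ttlMillis = ttlMillis;
  }

  /** Row old enough that S3 should be consistent; safe to delete from the table. */
  public boolean shouldPrune(long lastWrittenTime, long now) {
    return now - lastWrittenTime > pruneMillis;
  }

  /** Directory listing too old to be trusted as authoritative. */
  public boolean authExpired(long lastWrittenTime, long now) {
    return now - lastWrittenTime > ttlMillis;
  }
}
{code}

Point (a) above would then look something like this (again, the 
getLastWrittenTime() accessor is assumed, not existing API):

{code:java}
// (a) in the directory listing path: clear the auth bit before returning
// results to S3A when the stored listing is older than the TTL.
if (ttlPolicy.authExpired(dirMeta.getLastWrittenTime(), System.currentTimeMillis())) {
  dirMeta.setAuthoritative(false);
}
{code}

shouldPrune() could keep feeding the existing offline prune() path (or an async 
helper), while authExpired() runs inline on every listing read.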

> s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> Think of this as the "online algorithm" version of the CLI prune() function, 
> which is the "offline algorithm".
> Why: 
> 1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (s3 and metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state.  Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time bound.  Thus "wait and restart your job" becomes a valid, if 
> ugly, way to get around any issues with FS client failure leaving things in a 
> bad state.
> 2. We could make manual invocation of `hadoop s3guard prune ...` unnecessary, 
> depending on the implementation.
> 3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
> directories due to the lack of true modification time.
> How:
> I think we need a new column in the dynamo table "entry last written time".  
> This is updated each time the entry is written to dynamo.
> After that we can either
> 1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
> 2. Have the client delete entries older than the TTL.
> The issue with #2 is it will increase latency if done inline in the context 
> of an FS operation. We could mitigate this some by using an async helper 
> thread, or probabilistically doing it "some times" to amortize the expense of 
> deleting stale entries (allowing some batching as well).
> Caveats:
> - Clock synchronization as usual is a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-17 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14154:
--
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Good work on the patch, [~gabor.bota], and also on figuring 
out the performance issue. Thank you!

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, HADOOP-14154.005.patch, 
> HADOOP-14154.006.patch, HADOOP-14154.007.patch, all-logs.txt, 
> perf-eval-v1.diff, run-dir-perf-itest-v2.sh, run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-16 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583310#comment-16583310
 ] 

Aaron Fabbri commented on HADOOP-14154:
---

v7 patch: minor checkstyle cleanups.

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, HADOOP-14154.005.patch, 
> HADOOP-14154.006.patch, HADOOP-14154.007.patch, all-logs.txt, 
> perf-eval-v1.diff, run-dir-perf-itest-v2.sh, run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-16 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14154:
--
Attachment: HADOOP-14154.007.patch

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, HADOOP-14154.005.patch, 
> HADOOP-14154.006.patch, HADOOP-14154.007.patch, all-logs.txt, 
> perf-eval-v1.diff, run-dir-perf-itest-v2.sh, run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-16 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14154:
--
Attachment: HADOOP-14154.006.patch

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, HADOOP-14154.005.patch, 
> HADOOP-14154.006.patch, all-logs.txt, perf-eval-v1.diff, 
> run-dir-perf-itest-v2.sh, run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-16 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583161#comment-16583161
 ] 

Aaron Fabbri commented on HADOOP-14154:
---

Nice work [~gabor.bota]. v5 patch looks good.  I'm making one cosmetic change 
to add some parentheses and attaching a v6 patch for the precommit tests. 
{noformat}
< +    changed = changed || !dirMeta.isAuthoritative() && isAuthoritative;
---
> +    changed = changed || (!dirMeta.isAuthoritative() && isAuthoritative);
{noformat}
Ran through the integration tests against us-west-2 successfully.  Will commit this 
evening.
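
(Aside: in Java {{&&}} binds tighter than {{||}}, so the parenthesised form is 
behaviorally identical to the original; the change is readability only.  A 
throwaway check, not part of any patch, that confirms this over all eight input 
combinations:)

{code:java}
// Throwaway check, not part of the patch: the parenthesised and
// unparenthesised forms of the expression agree for every boolean combination.
public class PrecedenceCheck {
  public static void main(String[] args) {
    boolean[] vals = {false, true};
    for (boolean changed : vals) {
      for (boolean dirAuth : vals) {
        for (boolean isAuth : vals) {
          boolean without = changed || !dirAuth && isAuth;
          boolean with = changed || (!dirAuth && isAuth);
          if (without != with) {
            throw new AssertionError("precedence surprise");
          }
        }
      }
    }
    System.out.println("Both forms agree for all 8 input combinations.");
  }
}
{code}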

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, HADOOP-14154.005.patch, 
> HADOOP-14154.006.patch, all-logs.txt, perf-eval-v1.diff, 
> run-dir-perf-itest-v2.sh, run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-14 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580248#comment-16580248
 ] 

Aaron Fabbri commented on HADOOP-14154:
---

Ah, interesting.  The existing code appears to work ok with LocalMetadataStore. 
 Maybe it doesn't get {{changed == true}} in DynamoDBMetadataStore because that 
implementation always creates ancestor directories all the way up the tree?  
Just a theory.

 

It seems like the boolean {{changed}} in {{dirListingUnion()}} should also take 
the auth bit into account.  It is passed a param {{dirMeta}}, which is the last 
dir listing fetched from the MetadataStore, I believe.  If 
{{dirMeta.isAuthoritative == false}} but the {{isAuthoritative}} parameter is 
true, we should probably set {{changed = true}} to force a write of the auth 
bit for that directory.  The next time that directory is listed, it should skip 
the extra write since both {{dirMeta.isAuthoritative}} and {{isAuthoritative}} 
will be true.  What do you think?
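
Roughly the shape I have in mind, as a simplified sketch from memory (not the 
exact trunk code for {{S3Guard.dirListingUnion()}}; the signature is abbreviated 
and only the {{changed}} handling is shown):

{code:java}
// Simplified sketch from memory -- MetadataStore and DirListingMetadata are
// the S3Guard types; method name and exact signature may not match trunk.
static void dirListingUnionSketch(MetadataStore ms,
    DirListingMetadata dirMeta,
    Iterable<FileStatus> backingStatuses,
    boolean isAuthoritative) throws IOException {
  boolean changed = false;
  for (FileStatus s : backingStatuses) {
    // put() returns true when the entry was new or differed from what the
    // MetadataStore already had.
    changed = dirMeta.put(s) || changed;
  }
  // Extra term: force a write-back when we are about to mark the listing
  // authoritative but the stored copy was not yet marked as such.
  changed = changed || (!dirMeta.isAuthoritative() && isAuthoritative);
  if (changed && isAuthoritative) {
    dirMeta.setAuthoritative(true); // full directory contents are now known
    ms.put(dirMeta);                // write the merged listing back
  }
}
{code}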

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, all-logs-v2.txt, 
> perf-eval-v1.diff, perf-eval-v2.diff, run-dir-perf-itest-v2.sh, 
> run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-13 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579034#comment-16579034
 ] 

Aaron Fabbri commented on HADOOP-14154:
---

Interesting results.  The fact that you had to make that code change suggests 
something is not working properly.  The change you added in 
perf-eval-v2.diff should make things slower, not faster, if the feature is implemented 
correctly.  The idea of that logic is that it only writes the dir listing back 
to the MetadataStore if it is different from what we already got from 
MetadataStore#listChildren().  Your change should mean that the listing is *always* 
written back, which we would expect to be slower (time) and more expensive ($).  
We probably need to do more debugging to figure out what is happening.

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, HADOOP-14154.004.patch, all-logs-v2.txt, 
> perf-eval-v1.diff, perf-eval-v2.diff, run-dir-perf-itest-v2.sh, 
> run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)

2018-08-08 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574200#comment-16574200
 ] 

Aaron Fabbri commented on HADOOP-14154:
---

Attached a tiny patch and script (perf-eval-v1.diff and run-dir-perf-itest.sh) I 
used to take some performance measurements today.  I ran it on my laptop from 
home.  Results may look better when actually running in AWS, since the WAN latency 
goes away (so the dynamo latency speedup is larger), if you want to try that.  
This also made me realize it would be useful to have some metrics around (1) hit 
rate for dir listings (when we skip the S3 list) and (2) hit rate for listStatus() 
writeback (it skips writing the listing back to dynamo if nothing changed).  We 
could add those in a future JIRA.
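
Something along these lines (invented names, not existing S3AInstrumentation 
counters) is what I mean by hit-rate metrics:

{code:java}
// Invented names -- not existing S3AInstrumentation counters; just sketches
// the two hit/miss pairs worth tracking.
import java.util.concurrent.atomic.AtomicLong;

class S3GuardListingMetrics {
  final AtomicLong authListingHits = new AtomicLong();     // served from dynamo, S3 list skipped
  final AtomicLong authListingMisses = new AtomicLong();   // had to list S3
  final AtomicLong writebacksSkipped = new AtomicLong();   // listing unchanged, no dynamo write
  final AtomicLong writebacksPerformed = new AtomicLong(); // listing changed, wrote back to dynamo

  double authListingHitRate() {
    long hits = authListingHits.get();
    long total = hits + authListingMisses.get();
    return total == 0 ? 0.0 : (double) hits / total;
  }
}
{code}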

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Rajesh Balamohan
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf, 
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch, 
> HADOOP-14154.003.patch, perf-eval-v1.diff, run-dir-perf-itest.sh
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in 
> {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a 
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to 
> .isFullListing to eliminate the multiple uses and meanings of the word 
> "authoritative".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org


