[
https://issues.apache.org/jira/browse/HADOOP-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970466#comment-16970466
]
Steve Loughran edited comment on HADOOP-16484 at 11/8/19 5:44 PM:
--
OK, this is good and I am already pleased to see it in my logs.
But I realise we've missed something: in the s3guard tool we explicitly disable
S3Guard when instantiating the FS, so we get warning messages which are not in
fact correct.
{code}
2019-11-08 17:38:35,656 [main] DEBUG s3guard.S3Guard
(S3Guard.java:getMetadataStoreClass(136)) - Metastore option source
[fs.s3a.bucket.hwdev-steve-ireland-new.metadatastore.impl via [S3AUtils]]
2019-11-08 17:38:35,657 [main] DEBUG s3guard.S3Guard
(S3Guard.java:getMetadataStore(108)) - Using NullMetadataStore metadata store
for s3a filesystem
2019-11-08 17:38:35,659 [main] INFO s3a.S3AFileSystem
(S3Guard.java:logS3GuardDisabled(849)) - S3Guard is disabled on this bucket:
hwdev-steve-ireland-new
2019-11-08 17:38:35,659 [main] DEBUG s3a.S3AUtils
(S3AUtils.java:longOption(1001)) - Value of fs.s3a.multipart.purge.age is
360
2019-11-08 17:38:35,665 [main] DEBUG s3a.MultipartUtils
(MultipartUtils.java:requestNextBatch(158)) - [1], Requesting next 5000 uploads
prefix , next key null, next upload id null
2019-11-08 17:38:35,667 [main] DEBUG s3a.Invoker (DurationInfo.java:(74))
- Starting: listMultipartUploads
2019-11-08 17:38:36,004 [main] DEBUG s3a.Invoker (DurationInfo.java:close(89))
- listMultipartUploads: duration 0:00.338s
2019-11-08 17:38:36,005 [main] DEBUG s3a.MultipartUtils
(MultipartUtils.java:requestNextBatch(165)) - New listing state: Upload
iterator: prefix ; list count 2; isTruncated=false
Total 0 uploads found.
2019-11-08 17:38:36,008 [shutdown-hook-0] DEBUG s3a.S3AFileSystem
(S3AFileSystem.java:close(3117)) - Filesystem s3a://hwdev-steve-ireland-new is
closed
{code}
Proposed: just as we force in the null metastore there, we will need to drop
that log message to debug level.
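A minimal sketch of the idea, assuming we can tune the relevant logger around the forced null-metastore setup. Hadoop actually logs through SLF4J/Log4J, so this java.util.logging version, and the logger name and class name in it, are only illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch: raise the logger threshold so the expected-but-misleading
 *  "S3Guard is disabled" INFO message is dropped, while real warnings
 *  still get through. Illustrative only: Hadoop uses SLF4J/Log4J. */
public class QuietS3GuardWarning {
  static final List<LogRecord> CAPTURED = new ArrayList<>();

  public static Logger quietedLogger() {
    Logger log = Logger.getLogger("org.apache.hadoop.fs.s3a.s3guard.S3Guard");
    for (Handler h : log.getHandlers()) {
      log.removeHandler(h);                 // keep the sketch deterministic
    }
    log.setUseParentHandlers(false);
    log.addHandler(new Handler() {          // capture what would be printed
      @Override public void publish(LogRecord r) { CAPTURED.add(r); }
      @Override public void flush() { }
      @Override public void close() { }
    });
    // The s3guard tool forces in the null metastore deliberately, so
    // anything below WARNING from this logger is noise here.
    log.setLevel(Level.WARNING);
    return log;
  }

  public static void main(String[] args) {
    Logger log = quietedLogger();
    log.info("S3Guard is disabled on this bucket: example-bucket"); // dropped
    log.warning("a genuine problem");                               // kept
    System.out.println("records kept: " + CAPTURED.size());
  }
}
```

The point is only that the threshold is raised for that one logger while the tool runs, not globally.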
I'm just going to reopen this as a followup.
[~gabor.bota]: do you want to do this or shall I do the code and you do the
review?
was (Author: ste...@apache.org):
S3A auth mode can cause confusion in deployments, because people expect there
never to be any HTTP requests to S3 in a path marked as authoritative.
This is *not* the case when S3Guard doesn't have an entry for the path in the
table, which is the state it is in when the directory was populated using
different tools (e.g. the AWS s3 command).
Proposed:
1. HADOOP-16684 to give more diagnostics about the bucket
2. add an audit command to take a path and verify that it is marked in DynamoDB
as authoritative *all the way down*
This command is designed to be executed from the command line and will return
different error codes in different situations:
* path isn't guarded
* path is not authoritative in s3a settings (dir, path)
* path not known in table: use the 404/44 response
* path contains 1+ dir entry which is non-auth
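The exit-code contract for those situations might be sketched like this; only the 404-style 44 comes from the list above, and every other number and name here is a placeholder, not one of Hadoop's real launcher codes:

```java
/** Sketch of the proposed audit command's exit-code contract.
 *  Only 44 (the 404-style "not found" code) is from the proposal;
 *  the other numbers are placeholders. */
public class S3GuardAuditCodes {
  public static final int EXIT_AUTHORITATIVE = 0;        // auth all the way down
  public static final int EXIT_NOT_FOUND = 44;           // path not known in table
  public static final int EXIT_UNGUARDED = 50;           // placeholder
  public static final int EXIT_NOT_AUTH_IN_CONFIG = 51;  // placeholder
  public static final int EXIT_NON_AUTH_DIR_ENTRY = 52;  // placeholder

  /** Map audit findings to the code a calling script would test. */
  public static int exitCode(boolean guarded, boolean authInConfig,
      boolean knownInTable, boolean allDirsAuth) {
    if (!guarded) {
      return EXIT_UNGUARDED;
    }
    if (!authInConfig) {
      return EXIT_NOT_AUTH_IN_CONFIG;
    }
    if (!knownInTable) {
      return EXIT_NOT_FOUND;
    }
    return allDirsAuth ? EXIT_AUTHORITATIVE : EXIT_NON_AUTH_DIR_ENTRY;
  }

  public static void main(String[] args) {
    System.out.println(exitCode(true, true, false, false)); // prints 44
  }
}
```

Distinct codes matter here because test-suite scripts would branch on the return value rather than scrape output.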
3. Use this audit after some of the bulk rename, delete, import, commit (soon:
upload, copy) operations to verify that, where appropriate, we do update the
directories. Particularly for incremental rename(), where I have long suspected
we may have to do more.
4. Review the documentation and make it clear what is needed (import) after
uploading/generating data through other tools.
I'm going to pull in the open JIRAs on this topic, as they are all related.
There shouldn't be anything wrong with using the AWS S3 command to create the
test table - we just need to tell S3Guard to scan it afterwards, which "s3guard
import" does. The audit command will make sure that everything is set up in
DynamoDB before the next stage in the test suite. Then, if we still see IO
against S3 during list operations, we can start worrying about whether or
not there is actually a bug in the s3a code. (We could use it after things like
DDB and Spark & Hive queries too, to validate that the output is being tagged
as auth.)
Plus: add some tests of listLocatedStatus, listFiles and listStatus to verify
they don't go near S3 on parts they consider authoritative.
{code}
Examine the path metadata, declare whether it should be queued for recursive
scanning
@throws ExitUtil
{code}
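That recursive-scan step could be sketched as a breadth-first walk that stops at the first non-authoritative directory. DirMeta here is a toy stand-in for S3Guard's real path-metadata types, not its API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Sketch of "authoritative all the way down": walk the directory metadata
 *  breadth-first, queueing children for recursive scanning, and stop at the
 *  first non-authoritative listing. Toy types, not the real S3Guard API. */
public class AuthAuditSketch {
  static final class DirMeta {
    final String path;
    final boolean authoritative;
    final List<DirMeta> children;
    DirMeta(String path, boolean authoritative, List<DirMeta> children) {
      this.path = path;
      this.authoritative = authoritative;
      this.children = children;
    }
  }

  /** @return the first non-authoritative directory found, or null if clean. */
  public static DirMeta firstNonAuth(DirMeta root) {
    Deque<DirMeta> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      DirMeta dir = queue.remove();
      if (!dir.authoritative) {
        return dir;               // audit fails here, e.g. via an exit code
      }
      queue.addAll(dir.children); // queue children for recursive scanning
    }
    return null;                  // authoritative all the way down
  }

  public static void main(String[] args) {
    DirMeta leaf = new DirMeta("/a/b", false, List.of());
    DirMeta root = new DirMeta("/a", true, List.of(leaf));
    System.out.println(firstNonAuth(root).path); // prints /a/b
  }
}
```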