[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-05-15 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840846#comment-16840846
 ] 

Sean Mackrory edited comment on HADOOP-15999 at 5/15/19 10:49 PM:
--

{quote} Seems like a common case{quote}

Yeah that's actually the exact case that motivated this ticket in the first 
case, IIRC...


was (Author: mackrorysd):
{quote} Seems like a common case{quote}

Yeah that's actually the exact case that motivated this ticket, IIRC...

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-03-14 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792645#comment-16792645
 ] 

Gabor Bota edited comment on HADOOP-15999 at 3/14/19 1:44 PM:
--

Verify run successfully against Ireland. (note to myself: if I use low 
ddb.table.capacity.read and forget to modify it on the dashboard tests 
will timeout and fail)


was (Author: gabor.bota):
verify run successfully against Ireland. 

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-03-14 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792645#comment-16792645
 ] 

Gabor Bota edited comment on HADOOP-15999 at 3/14/19 1:34 PM:
--

verify ran successfully against Ireland. 


was (Author: gabor.bota):
verify ran successfuly against ireland. 

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-03-14 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792645#comment-16792645
 ] 

Gabor Bota edited comment on HADOOP-15999 at 3/14/19 1:34 PM:
--

verify run successfully against Ireland. 


was (Author: gabor.bota):
verify ran successfully against Ireland. 

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-03-05 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784294#comment-16784294
 ] 

Gabor Bota edited comment on HADOOP-15999 at 3/5/19 10:29 AM:
--

I figured it out: the timeout in the LocalMS for file metadata was set to 10s 
and the test was running longer than this. Because of the timeout the metadata 
was no longer available from the local metadata store, so we got the exception 
on {{getFileStatus.}} The {{DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL}} will be 
increased to 60s, so no there will be no more issues with this kind of test.
We can increase this value safely as the LocalMS is an only testing 
implementation.

 I'll run all other tests with dynamo && local && null. If everything's ok I'll 
upload a new patch.


was (Author: gabor.bota):
I figured it out: the timeout in the LocalMS for file metadata was set to 10s 
and the test was running longer than this. Because of the timeout the metadata 
was no longer available from the local metadata store, so we hot the exception 
on {{getFileStatus.}} The {{DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL}} will be 
increased to 60s, so no there will be no more issues with this kind of test.
We can increase this value safely as the LocalMS is an only testing 
implementation.

 I'll run all other tests with dynamo && local && null. If everything's ok I'll 
upload a new patch.

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-03-04 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783633#comment-16783633
 ] 

Gabor Bota edited comment on HADOOP-15999 at 3/4/19 6:39 PM:
-

It was really me - so running the tests in my ide with the setting:
{noformat}
  
fs.s3a.s3guard.test.implementation
local
  
{noformat}
Running the same test with *dynamo* everything passes. 
Turned out the reason for *NPE*s when using local was we had the issue with the 
reference for the localms again. When we rebuild the fs or build a new fs 
instance we have to set the same cache and the NPEs are gone.

After fixing the NPEs the next issue is 
{{java.util.concurrent.ExecutionException: java.io.FileNotFoundException:}} - 
only for *local* again.
In {{expectExceptionWhenReadingOpenFileAPI}} when the following is called:
{code:java}
  try (FSDataInputStream in = guardedFs.openFile(testFilePath).build().get()) {
  intercept(FileNotFoundException.class, () -> {
byte[] bytes = new byte[text.length()];
return in.read(bytes, 0, bytes.length);
  });
}
{code}
The *{{FSDataInputStream in = guardedFs.openFile(testFilePath).build().get()}}* 
throws *FNFE*, and that's even before it's expected. That means there's 
something wrong going on with open file API is used. I don't have a clue right 
now why would this happen just when using local and not when using dynamo, but 
I need to figure it out.


was (Author: gabor.bota):
It was really me - so running the tests in my ide with the setting:
{noformat}
  
fs.s3a.s3guard.test.implementation
local
  
{noformat}
Running the same test with *dynamo* everything passes. 
Turned out the reason for *NPE*s when using local was we had the issue with the 
reference for the localms again. When we rebuild the fs or build a new fs 
instance we have to set the same cache and the NPEs are gone.

After fixing the NPEs the next issue is 
{{java.util.concurrent.ExecutionException: java.io.FileNotFoundException:}} - 
only for *local* again.
In {{expectExceptionWhenReadingOpenFileAPI}} when the following is called:
{code:java}
  try (FSDataInputStream in = guardedFs.openFile(testFilePath).build().get()) {
  intercept(FileNotFoundException.class, () -> {
byte[] bytes = new byte[text.length()];
return in.read(bytes, 0, bytes.length);
  });
}
{code}
The *{{FSDataInputStream in = guardedFs.openFile(testFilePath).build().get()}}* 
throws *FNFE*, and that's even before it's expected. That means there's 
something wrong going on with open file API is used. I don't have a clue right 
now why would this happen just when using local and not when using dynamo, but 
I need to figure it out.

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more