[ 
https://issues.apache.org/jira/browse/HDDS-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kohei Sugihara updated HDDS-8371:
---------------------------------
    Description: 
The listStatus API returns a path with a repeated prefix when the path of the 
key is deep. We noticed that listStatus returns a corrupt result for some 
specific keys in a bucket: the requested key prefix is repeated in the final 
list of the listStatus result, as in the following:
{code:java}
# expected case
% aws s3 ls s3://bucket/a/b/c/d/e/f/g/file.zip
<timestamp> <bytes> a/b/c/d/e/f/g/file.zip
...
# actual: "a/b/c/d/e/f/g" is duplicated
% aws s3 ls s3://bucket/a/b/c/d/e/f/g/file.zip
<timestamp> <bytes> a/b/c/d/e/f/g/a/b/c/d/e/f/g/file.zip
... {code}
Environment:
 * Ozone 1.3 [compatible|https://github.com/apache/ozone/commit/9c61a8aa497ab96c014ad3bb7b1ee4f731ebfaf8] version (same environment as HDDS-7701, HDDS-7925)
 * Several large files, all uploaded via multipart upload with the AWS CLI, 
divided into 8 MB chunks
 * An FSO-enabled bucket
 * OM HA

Problem Details:

I dug into the OM DB and found that, for some keys, the keyName metadata in 
the keyTable holds the full path of the key instead of the file name, so the 
prefix appears twice in the result of the listStatus API.
{code:java}
# keyName has a full path for the key
- v/b=volume/bucket parent=-9223371931475161596 Key=0g0pustv.tar.gz size=2276021708 time=1672052153340 checksum=null id=-9223371931457566207 keyName=a/b/c/d/e/0g0pustv.tar.gz
- v/b=volume/bucket parent=-9223371931475161596 Key=0g0pustv.zip size=2333733892 time=1672052222395 checksum=null id=-9223371931408929023 keyName=a/b/c/d/e/0g0pustv.zip
- v/b=volume/bucket parent=-9223371931475161596 Key=0nh5ww00.tar.gz size=249321741 time=1672052233487 checksum=null id=-9223371931393057791 keyName=a/b/c/d/e/0nh5ww00.tar.gz
- v/b=volume/bucket parent=-9223371931475161596 Key=0nh5ww00.zip size=255764877 time=1672052242830 checksum=null id=-9223371931388326655 keyName=a/b/c/d/e/0nh5ww00.zip
- v/b=volume/bucket parent=-9223371931475161596 Key=5b2uha1h.tar.gz size=2276346612 time=1672052331175 checksum=null id=-9223371931348859135 keyName=a/b/c/d/e/5b2uha1h.tar.gz
...

# other keys which have the same parent do not have their prefix in the key
- v/b=volume/bucket parent=-9223371931475161596 Key=kh7vbwlh.zip size=573797127 time=1672052273970 checksum=null id=-9223371931375503871 keyName=kh7vbwlh.zip
- v/b=volume/bucket parent=-9223371931475161596 Key=ngaxsd8c.tar.gz size=380094900 time=1672052284433 checksum=null id=-9223371931368669695 keyName=ngaxsd8c.tar.gz
- v/b=volume/bucket parent=-9223371931475161596 Key=ngaxsd8c.zip size=393085953 time=1672052099618 checksum=null id=-9223371931473057023 keyName=ngaxsd8c.zip
- v/b=volume/bucket parent=-9223371931475161596 Key=nrou31c3.tar.gz size=568466718 time=1672052124043 checksum=null id=-9223371931461502975 keyName=nrou31c3.tar.gz
- v/b=volume/bucket parent=-9223371931475161596 Key=nrou31c3.zip size=574807485 time=1672052149947 checksum=null id=-9223371931446918911 keyName=nrou31c3.zip
- v/b=volume/bucket parent=-9223371931475161596 Key=ol8dhbqo.tar.gz size=555722830 time=1672052168904 checksum=null id=-9223371931435349759 keyName=ol8dhbqo.tar.gz
{code}
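
To make the effect concrete, here is a minimal sketch. It is not Ozone's actual listStatus code. It assumes that a listing in an FSO bucket builds each returned path by joining the parent directory path with the stored keyName, and that a healthy keyName is a bare file name without '/'. The class and method names (KeyNameCheck, listedPath, hasFullPath, suspectKeys) are hypothetical, and the regular expression only targets the dump format shown above, with one record per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyNameCheck {
  // Extracts the Key= and keyName= fields from one record of the dump format above.
  private static final Pattern FIELDS =
      Pattern.compile("\\bKey=(\\S+).*\\bkeyName=(\\S+)");

  private KeyNameCheck() { }

  // Assumed listing behavior: parent directory path + "/" + stored keyName.
  static String listedPath(String parentPath, String keyName) {
    return parentPath.isEmpty() ? keyName : parentPath + "/" + keyName;
  }

  // Assumption: in an FSO bucket the stored keyName should be a bare file name.
  static boolean hasFullPath(String keyName) {
    return keyName.indexOf('/') >= 0;
  }

  // Returns the keyName of every record whose keyName looks like a full path.
  static List<String> suspectKeys(List<String> dumpLines) {
    List<String> suspects = new ArrayList<>();
    for (String line : dumpLines) {
      Matcher m = FIELDS.matcher(line);
      if (m.find() && hasFullPath(m.group(2))) {
        suspects.add(m.group(2));
      }
    }
    return suspects;
  }

  public static void main(String[] args) {
    String parent = "a/b/c/d/e";
    // A healthy entry yields the expected path.
    System.out.println(listedPath(parent, "kh7vbwlh.zip"));
    // An entry whose keyName already holds the full path yields a repeated prefix.
    System.out.println(listedPath(parent, "a/b/c/d/e/0g0pustv.zip"));
  }
}
```

Feeding the dump records to suspectKeys would flag only the entries in the first group, which matches the observation that some keys under the same parent are affected and others are not.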
 

 


> A keyName field in the keyTable might contain a full path for the key instead 
> of the file name
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDDS-8371
>                 URL: https://issues.apache.org/jira/browse/HDDS-8371
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>    Affects Versions: 1.3.0
>            Reporter: Kohei Sugihara
>            Priority: Major


