[jira] [Comment Edited] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-03-23 Thread daimin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224
 ] 

daimin edited comment on HDFS-16422 at 3/23/22, 12:30 PM:
--

[~jingzhao] I tested this again. My test steps were:
 # Set up a cluster with 11 datanodes and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check the md5sum of these files through HDFS FUSE; this is a simple way to 
create concurrent preads (indirect IO on FUSE)
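
As an illustration of step 3, here is a minimal Java sketch of how concurrent preads could be generated and checked without FUSE. It is a stand-in under stated assumptions, not the test that was actually run: ConcurrentPreadCheck and its methods are hypothetical names, and it reads a local file through java.nio FileChannel rather than an HDFS stream. The idea is the same: several threads issue positional reads against one open file, and the reassembled bytes are hashed so the result can be compared with a known-good MD5.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of Hadoop or of the test above.
class ConcurrentPreadCheck {

    /** Returns the MD5 of the given bytes as lowercase hex, the way md5sum prints it. */
    static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes data to a temporary file so the sketch has something to read. */
    static Path writeTempFile(byte[] data) {
        try {
            Path file = Files.createTempFile("pread-check", ".bin");
            file.toFile().deleteOnExit();
            return Files.write(file, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file in chunks via concurrent positional reads, then hashes it. */
    static String md5ViaConcurrentPreads(Path file, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            int size = Math.toIntExact(channel.size()); // sketch: small files only
            byte[] content = new byte[size];
            List<Future<Void>> pending = new ArrayList<>();
            for (int offset = 0; offset < size; offset += chunkSize) {
                final int start = offset;
                final int length = Math.min(chunkSize, size - offset);
                pending.add(pool.submit(() -> {
                    ByteBuffer chunk = ByteBuffer.wrap(content, start, length);
                    long position = start;
                    while (chunk.hasRemaining()) {
                        // read(dst, position) is a pread: the channel's own position is untouched.
                        int n = channel.read(chunk, position);
                        if (n < 0) {
                            throw new IOException("unexpected end of file");
                        }
                        position += n;
                    }
                    return null;
                }));
            }
            for (Future<Void> task : pending) {
                task.get(); // propagate any read failure
            }
            return md5Hex(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data);
        Path file = writeTempFile(data);
        // Compares the hash of the original bytes with the hash seen through concurrent preads.
        System.out.println(md5Hex(data).equals(md5ViaConcurrentPreads(file, 64 * 1024, 8)));
    }
}
```

Against a correct reader the two hashes should agree on every run; a decoder that is not thread safe would be expected to show up as a mismatch that varies between runs.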

Here are the test results:
* md5sum check before the datanode went down:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} 

* md5sum after the datanode went down, with the native (ISA-L) decoder:

{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote}

* md5sum after the datanode went down, with the pure Java decoder:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder appears to be thread safe, while NativeRSRawDecoder 
is not. The read/write lock seems unable to protect the native decodeImpl 
method.
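
One possible reading of this result, sketched in Java under stated assumptions: a read lock is shared, so on its own it does not stop two threads from being inside decode at the same time. DecodeGuardDemo below is a hypothetical class that models only the locking, not the real decoder. Its decodeBody stand-in counts a hit only when two callers are inside it simultaneously, which lets the sketch compare a read-lock-only guard with one that also uses synchronized.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only models the locking, not the real decoder.
class DecodeGuardDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final CyclicBarrier bothInside = new CyclicBarrier(2);
    private final AtomicInteger overlaps = new AtomicInteger();

    /** Stand-in for decodeImpl: counts a hit only if two callers are inside at once. */
    private void decodeBody(long waitMillis) {
        try {
            bothInside.await(waitMillis, TimeUnit.MILLISECONDS);
            overlaps.incrementAndGet();
        } catch (Exception e) {
            // Timeout or broken barrier: no second caller was inside with us.
        }
    }

    /** Guarded by the read lock only, which any number of threads may hold. */
    void decodeWithReadLock(long waitMillis) {
        lock.readLock().lock();
        try {
            decodeBody(waitMillis);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Same as above, but the object's monitor admits one caller at a time. */
    synchronized void decodeSynchronized(long waitMillis) {
        decodeWithReadLock(waitMillis);
    }

    /** Runs two concurrent callers and reports whether they overlapped. */
    static boolean callersOverlap(boolean useSynchronized, long waitMillis) {
        DecodeGuardDemo demo = new DecodeGuardDemo();
        Runnable call = useSynchronized
                ? () -> demo.decodeSynchronized(waitMillis)
                : () -> demo.decodeWithReadLock(waitMillis);
        Thread first = new Thread(call);
        Thread second = new Thread(call);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return demo.overlaps.get() == 2;
    }

    public static void main(String[] args) {
        System.out.println("read lock only: overlap = " + callersOverlap(false, 2000));
        System.out.println("synchronized:   overlap = " + callersOverlap(true, 200));
    }
}
```

With the read lock alone both callers meet inside the body, whereas the synchronized variant serializes them. Whether overlapping calls actually corrupt output depends on what state the decoder shares, which this sketch does not model.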

I also ran the md5sum check repeatedly on the same file with the native (ISA-L) 
decoder, and the result was different every time.
{quote}
for i in {1..5}; do md5sum /mnt/fuse/1g; done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of the decoder; however, it broke 
the correctness of the decode method when invoked concurrently. We should bring 
synchronized back, and it's OK to keep the read/write lock too, as it protects 
decode from the init/release methods. Thanks again, [~jingzhao].
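
A minimal Java sketch of that combination, with every name hypothetical: GuardedXorDecoder is not the Hadoop class, and its XOR "decode" merely stands in for a decoder that keeps shared mutable state (here a scratch buffer). The synchronized keyword serializes decode calls, while the read/write lock keeps release() from pulling that state away during a decode.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical decoder used only for illustration; not the Hadoop implementation.
class GuardedXorDecoder {
    private final ReentrantReadWriteLock lifecycleLock = new ReentrantReadWriteLock();
    private byte[] scratch; // shared mutable state, a stand-in for a native coder context

    GuardedXorDecoder(int cellSize) {
        scratch = new byte[cellSize];
    }

    /** Rebuilds one missing cell as the XOR of the surviving cells. */
    synchronized byte[] decode(byte[][] survivors) {
        lifecycleLock.readLock().lock(); // keeps release() out while decoding
        try {
            if (scratch == null) {
                throw new IllegalStateException("decoder already released");
            }
            Arrays.fill(scratch, (byte) 0);
            for (byte[] cell : survivors) {
                for (int i = 0; i < scratch.length; i++) {
                    scratch[i] ^= cell[i];
                }
            }
            return scratch.clone();
        } finally {
            lifecycleLock.readLock().unlock();
        }
    }

    /** Drops the shared state; the write lock waits for in-flight decodes. */
    void release() {
        lifecycleLock.writeLock().lock();
        try {
            scratch = null;
        } finally {
            lifecycleLock.writeLock().unlock();
        }
    }

    /** Decodes from several threads and returns how many results were wrong. */
    static int countMismatches(int threads, int rounds, int cellSize) {
        GuardedXorDecoder decoder = new GuardedXorDecoder(cellSize);
        AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int seed = t;
            workers[t] = new Thread(() -> {
                Random random = new Random(seed);
                for (int r = 0; r < rounds; r++) {
                    byte[][] survivors = new byte[3][cellSize];
                    byte[] expected = new byte[cellSize];
                    for (byte[] cell : survivors) {
                        random.nextBytes(cell);
                        for (int i = 0; i < cellSize; i++) {
                            expected[i] ^= cell[i];
                        }
                    }
                    if (!Arrays.equals(expected, decoder.decode(survivors))) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8, 1000, 4096));
    }
}
```

The trade-off is the one noted above: serializing decode gives up some of the concurrency that the lock-only version allowed, in exchange for correct output.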


was (Author: cndaimin):
[~jingzhao] I tested this again. My test steps were:
 # Set up a cluster with 11 datanodes and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check the md5sum of these files through HDFS FUSE; this is a simple way to 
create concurrent preads (indirect IO on FUSE)

Here are the test results:
* md5sum check before the datanode went down:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} 

* md5sum after the datanode went down, with the native (ISA-L) decoder:

{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote}

* md5sum after the datanode went down, with the pure Java decoder:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder appears to be thread safe, while NativeRSRawDecoder 
is not. The read/write lock seems unable to protect the native decodeImpl 
method.

I also ran the md5sum check repeatedly on the same file with the native (ISA-L) 
decoder, and the result was different every time.
{quote}
for i in {1..5}; do md5sum /mnt/fuse/1g; done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of the decoder; however, it broke 
the correctness of the decode method when invoked concurrently. We should bring 
synchronized back, and I will submit a new PR later to do this work. Thanks 
again, [~jingzhao].

> Fix thread safety of EC decoding during concurrent preads
> ---------------------------------------------------------
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, ec, erasure-coding
> Affects Versions: 3.3.0, 3.3.1
> Reporter: daimin
> Assignee: daimin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas (internal block of 
> block group) will cause online reconstruction: read dataUnits part of data 
