[ https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511224#comment-17511224 ]
daimin edited comment on HDFS-16422 at 3/23/22, 12:30 PM:
----------------------------------------------------------

[~jingzhao] I tested this again, and my test steps are:
# Set up a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 8g
# Stop one datanode
# Check the md5sum of these files through HDFS FUSE; this is a simple way to create concurrent preads (indirect IO on FUSE)

Here are the test results:
* md5sum check before datanode down:
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b /mnt/fuse/1g
782173623681c129558c09e89251f46d /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c /mnt/fuse/4g
adb81da2c34161f249439597c515db1d /mnt/fuse/8g
{quote}
* md5sum after datanode down, with native (ISA-L) decoder:
{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629 /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2 /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3 /mnt/fuse/8g
{quote}
* md5sum after datanode down, with pure Java decoder:
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b /mnt/fuse/1g
782173623681c129558c09e89251f46d /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c /mnt/fuse/4g
adb81da2c34161f249439597c515db1d /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not thread safe, and the read/write lock seems unable to protect the native decodeImpl method.

I also repeated the md5sum check on the same file with the native (ISA-L) decoder, and the result is different every time.
{quote}for i in {1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464 /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9 /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83 /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of the decoder; however, it broke the correctness of the decode method when invoked concurrently.
We should bring synchronized back, and it's OK to keep the read/write lock too, as it protects against the init/release methods. Thanks [~jingzhao] again.

> Fix thread safety of EC decoding during concurrent preads
> ---------------------------------------------------------
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, ec, erasure-coding
> Affects Versions: 3.3.0, 3.3.1
> Reporter: daimin
> Assignee: daimin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Reading an erasure-coded file with missing replicas (internal blocks of a
> block group) triggers online reconstruction: read dataUnits parts of the
> data and decode them into the missing target data. Each
> DFSStripedInputStream object has a RawErasureDecoder object, and when we do
> preads concurrently, RawErasureDecoder.decode is invoked concurrently too.
> RawErasureDecoder.decode is not thread safe, and as a result we
> occasionally get wrong data from pread.
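
The locking conclusion in the comment above can be illustrated with a small toy sketch. It is not Hadoop code: ToyDecoder, decodeWithReadLockOnly, decodeSynchronized and callersOverlap are hypothetical names, and the sketch assumes the decode path takes only the read side of a ReentrantReadWriteLock. Because a read lock is shared, two callers can be inside decodeImpl at the same time; adding synchronized serializes them.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DecoderLockSketch {

    /** Hypothetical stand-in for a decoder with shared state; not Hadoop code. */
    static class ToyDecoder {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // Two threads pass this barrier only if both are inside decodeImpl together.
        private final CyclicBarrier meet = new CyclicBarrier(2);
        final AtomicBoolean overlapped = new AtomicBoolean(false);

        // Stands in for the native decode call that must not run concurrently.
        private void decodeImpl() {
            try {
                meet.await(2, TimeUnit.SECONDS);
                overlapped.set(true); // another caller was inside at the same time
            } catch (TimeoutException | BrokenBarrierException e) {
                // No other caller was inside decodeImpl while we waited.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Assumed pattern: only the shared read lock guards decode.
        void decodeWithReadLockOnly() {
            lock.readLock().lock();
            try {
                decodeImpl();
            } finally {
                lock.readLock().unlock();
            }
        }

        // Same path, but synchronized lets in one caller at a time.
        synchronized void decodeSynchronized() {
            decodeWithReadLockOnly();
        }
    }

    /** Runs two concurrent decode calls and reports whether they overlapped. */
    static boolean callersOverlap(boolean useSynchronized) throws InterruptedException {
        ToyDecoder decoder = new ToyDecoder();
        Runnable call = useSynchronized
                ? decoder::decodeSynchronized
                : decoder::decodeWithReadLockOnly;
        Thread a = new Thread(call);
        Thread b = new Thread(call);
        a.start();
        b.start();
        a.join();
        b.join();
        return decoder.overlapped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("read lock only: overlap = " + callersOverlap(false));
        System.out.println("synchronized:   overlap = " + callersOverlap(true));
    }
}
```

On a typical run the read-lock-only call should report an overlap and the synchronized call should not. The write lock still has a role in this picture: it keeps init/release from running while any decode holds the read lock.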