[ https://issues.apache.org/jira/browse/HDDS-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770154#comment-16770154 ]
Bharat Viswanadham commented on HDDS-1121:
------------------------------------------

Thank you [~linyiqun] for the review.

{quote}we just pass checksum type and bytes per checksum to construct the checksum instance instead of. I'm not so fully understanding the root cause of this issue. Could you please describe a little on this?{quote}

Previously, the Checksum object was constructed once in RpcClient, and the same instance was passed to KeyOutputStream and BlockOutputStream. When multiple threads create keys in the same bucket, that Checksum object is shared, so the underlying SHA digest object is also shared across the threads (and with it the internal buffers used to compute the digest). The concurrent updates build the checksum list incorrectly, and the wrong checksum list for the block is stored on the DataNode. When the key is later read, checksum verification fails because the stored checksum list is corrupted. That is the root cause of this issue.

The fix is to stop sharing the Checksum object across threads: the Checksum creation is removed from RpcClient, and instead the checksum type and bytesPerChecksum are passed along, with the object created at the point where the checksum is computed. (The same reasoning applies to not creating the Checksum object in KeyOutputStream.)

I hope this is clear; let me know if you have any more questions. You can also try the test case in the patch; it sometimes reproduces the same error mentioned in the Jira description. I will fix the other review comments and upload a new patch.
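To make the fix pattern concrete, here is a minimal sketch under simplified, hypothetical types (BlockWriterSketch is illustrative; it is not the actual Ozone BlockOutputStream): the writer holds only the immutable checksum parameters and constructs a fresh digest for every computation, so no mutable state is shared between threads.

{code:java}
// Minimal sketch of the fix pattern described above; hypothetical simplified
// types, not the actual Ozone classes.
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class BlockWriterSketch {
  // Immutable parameters: safe to share across threads.
  private final String checksumAlgorithm; // e.g. "SHA-256"
  private final int bytesPerChecksum;

  BlockWriterSketch(String checksumAlgorithm, int bytesPerChecksum) {
    this.checksumAlgorithm = checksumAlgorithm;
    this.bytesPerChecksum = bytesPerChecksum;
  }

  // A new MessageDigest is created per call, so its internal buffers are
  // confined to the calling thread, unlike a single shared instance.
  byte[] computeChecksum(byte[] chunk, int off, int len)
      throws NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
    digest.update(chunk, off, Math.min(len, bytesPerChecksum));
    return digest.digest();
  }
}
{code}

Creating the digest per computation is cheap relative to hashing the chunk itself, and it avoids any locking on the write path.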
> Key read failure when data is written parallel in to Ozone
> ----------------------------------------------------------
>
>                 Key: HDDS-1121
>                 URL: https://issues.apache.org/jira/browse/HDDS-1121
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>         Attachments: HDDS-1121.00.patch
>
>
> When Hive is run with multiple threads for data ingestion into Ozone, reads after ingestion fail with the error below.
> This issue was found during Hive testing, by [~t3rmin4t0r].
> {code:java}
> caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
> at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:143)
> at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:239)
> at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:217)
> at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readChunkFromContainer(BlockInputStream.java:227)
> at org.apache.hadoop.hdds.scm.storage.BlockInputStream.seek(BlockInputStream.java:259)
> at org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.seek(KeyInputStream.java:249)
> at org.apache.hadoop.ozone.client.io.KeyInputStream.seek(KeyInputStream.java:180)
> at org.apache.hadoop.fs.ozone.OzoneFSInputStream.seek(OzoneFSInputStream.java:62)
> at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:82)
> at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
> at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:555)
> at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:61)
> at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1647)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1533)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1329)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1513)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1510)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1510)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1329)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266){code}
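The failure mode in the stack trace above can be reproduced outside Ozone with nothing but the JDK, since MessageDigest instances are not thread-safe. The following is a standalone sketch, not the test case from the attached patch:

{code:java}
// Standalone demonstration (not the patch's test case): two threads sharing
// one MessageDigest interleave its internal update/reset, so some digests
// disagree with the serially computed value.
import java.security.MessageDigest;
import java.util.Arrays;

public class SharedDigestRepro {
  public static void main(String[] args) throws Exception {
    byte[] data = new byte[1 << 20]; // 1 MiB of zeros, large enough to widen the race
    byte[] expected = MessageDigest.getInstance("SHA-256").digest(data);

    // The bug pattern: one digest instance shared by both writer threads.
    MessageDigest shared = MessageDigest.getInstance("SHA-256");

    Runnable task = () -> {
      for (int i = 0; i < 1_000; i++) {
        // digest(byte[]) performs update + finish + reset; concurrent calls
        // corrupt the internal state, mirroring the corrupted checksum list
        // described in HDDS-1121.
        byte[] actual = shared.digest(data);
        if (!Arrays.equals(expected, actual)) {
          System.out.println("Checksum mismatch at iteration " + i);
        }
      }
    };

    Thread t1 = new Thread(task);
    Thread t2 = new Thread(task);
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}
{code}

Depending on the interleaving, the shared digest may also throw an exception rather than return a wrong value; either outcome shows why the Checksum instance must not be shared across writer threads.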