[ 
https://issues.apache.org/jira/browse/HDDS-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770154#comment-16770154
 ] 

Bharat Viswanadham commented on HDDS-1121:
------------------------------------------

Thank you, [~linyiqun], for the review.
{quote}we just pass checksum type and bytes per checksum to construct the 
checksum instance instead of. I'm not so fully understanding the root cause of 
this issue. Could you please describe a little on this?
{quote}
The issue is that previously the Checksum object was constructed in RpcClient, 
and the same object was passed to KeyOutputStream and BlockOutputStream. If 
multiple threads create keys in the bucket, this Checksum object is shared, so 
the underlying SHA object is also shared across threads (meaning the internal 
buffers used to compute the digest are shared). This causes the checksum list 
for a block to be built incorrectly, and the wrong checksum list is stored for 
that block on the DN. When the key is read back, checksum verification fails 
because the stored checksum list was corrupted at write time and no longer 
matches the data. That is the root cause of the issue. The fix is to stop 
sharing the Checksum object across threads: the Checksum object creation was 
removed from RpcClient, and instead the checksum type and bytesPerChecksum are 
passed down so the object is created at the point where the checksum is 
computed. (The same reasoning applies to not creating the Checksum object in 
KeyOutputStream.) I hope this makes it clear; let me know if you have any more 
questions.
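
To make the race concrete, here is a minimal, self-contained sketch in plain 
JDK Java (not Ozone code; the class and names are mine for illustration). It 
shows why one shared MessageDigest, like the SHA object inside a shared 
Checksum instance, typically produces corrupted digests under concurrent use, 
and why building a fresh instance at the point of computation, which is the 
shape of this fix, avoids it.
{code:java}
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedDigestHazard {
  public static void main(String[] args) throws Exception {
    byte[] chunk = new byte[1024];
    Arrays.fill(chunk, (byte) 'a');

    // Reference digest, computed single-threaded.
    byte[] expected = MessageDigest.getInstance("SHA-256").digest(chunk);

    // BROKEN: one digest instance shared by all writer threads, analogous
    // to the single Checksum object RpcClient used to hand to every
    // KeyOutputStream/BlockOutputStream.
    MessageDigest shared = MessageDigest.getInstance("SHA-256");

    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 8; i++) {
      pool.submit(() -> {
        for (int n = 0; n < 1000; n++) {
          // digest(byte[]) updates and then resets the instance's internal
          // state; interleaved calls from several threads corrupt it.
          byte[] actual = shared.digest(chunk);
          if (!MessageDigest.isEqual(expected, actual)) {
            System.out.println("corrupted digest for identical input");
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);

    // FIX (the shape of the patch): share only plain parameters across
    // threads and construct a fresh instance at the point of computation.
    byte[] safe = MessageDigest.getInstance("SHA-256").digest(chunk);
    System.out.println("fresh instance matches reference: "
        + MessageDigest.isEqual(expected, safe));
  }
}
{code}
MessageDigest is documented as not thread-safe, which is exactly why the patch 
shares only plain values (the checksum type and bytesPerChecksum) across 
threads and defers creating the Checksum object until the checksum is actually 
computed.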

 

You can also try the test case included in the patch; I sometimes get the same 
error as mentioned in the Jira description.

 

I will fix other review comments and upload a new patch.

> Key read failure when data is written parallel in to Ozone
> ----------------------------------------------------------
>
>                 Key: HDDS-1121
>                 URL: https://issues.apache.org/jira/browse/HDDS-1121
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>         Attachments: HDDS-1121.00.patch
>
>
> When Hive is run with multiple threads for data ingestion into Ozone, we see 
> the error below during reads after ingestion is done.
> This issue was found during Hive testing by [~t3rmin4t0r].
> {code:java}
> caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
> mismatch at index 0
>  at 
> org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:143)
>  at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:239)
>  at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:217)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.readChunkFromContainer(BlockInputStream.java:227)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.seek(BlockInputStream.java:259)
>  at 
> org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.seek(KeyInputStream.java:249)
>  at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.seek(KeyInputStream.java:180)
>  at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.seek(OzoneFSInputStream.java:62)
>  at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:82)
>  at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
>  at 
> org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
>  at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:555)
>  at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
>  at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:61)
>  at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1647)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1533)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1329)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1513)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1510)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1510)
>  at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1329)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266){code}


