[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969379#comment-16969379
 ] 

Shashikant Banerjee commented on HDDS-2372:
-------------------------------------------

In Ratis, Raft log entries can get truncated after a leader election. The 
data write actually happens as part of appending the log entry itself. 
Currently, if the Raft log gets truncated, we don't do any handling for those 
entries, i.e., we don't delete or validate the chunk files written as part of 
the log entry. This is because the data always lives in tmp files stamped with 
the term and log index, which are not visible and simply remain as garbage 
even after the corresponding log entries in the Raft log have been truncated. 

If we instead write to the actual chunk file as part of writing the log entry 
itself, then when those log entries get truncated we might need to handle this 
inside Ozone, either by deleting the corresponding chunk files to maintain 
consistency or by validating the data while updating the RocksDB entries.
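
As a rough illustration of the clean-up side of this, here is a minimal sketch. 
It assumes the tmp files keep the <chunkName>.tmp.<term>.<logIndex> naming 
that the path in the stack trace below suggests (the order of term and index 
is my assumption). TruncatedChunkCleaner and its methods are hypothetical 
names, not existing Ozone or Ratis code, and where exactly Ozone would learn 
the first truncated index is left open. For writes that go to the actual chunk 
file, some equivalent mapping from log index to chunk file would be needed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Ozone or Ratis. */
final class TruncatedChunkCleaner {

  // Assumed tmp file layout: <chunkName>.tmp.<term>.<logIndex>
  private static final Pattern TMP_CHUNK =
      Pattern.compile("^(.+)\\.tmp\\.(\\d{1,18})\\.(\\d{1,18})$");

  private TruncatedChunkCleaner() { }

  /** Returns the log index stamped on a tmp chunk file name, or -1 if none. */
  static long logIndexOf(String fileName) {
    Matcher m = TMP_CHUNK.matcher(fileName);
    return m.matches() ? Long.parseLong(m.group(3)) : -1L;
  }

  /**
   * Deletes every tmp chunk file in chunkDir whose stamped log index is at or
   * after firstTruncatedIndex, and returns the paths that were removed.
   * Files that do not match the assumed naming are left untouched.
   */
  static List<Path> deleteTruncated(Path chunkDir, long firstTruncatedIndex)
      throws IOException {
    List<Path> deleted = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(chunkDir)) {
      for (Path file : files) {
        long index = logIndexOf(file.getFileName().toString());
        if (index >= 0 && index >= firstTruncatedIndex
            && Files.deleteIfExists(file)) {
          deleted.add(file);
        }
      }
    }
    return deleted;
  }
}
```

Calling TruncatedChunkCleaner.deleteTruncated(chunkDir, index) with the first 
truncated log index would remove only the tmp files stamped at or after that 
index. Committed chunk files and tmp files from earlier entries stay in place.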

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
>                 Key: HDDS-2372
>                 URL: https://issues.apache.org/jira/browse/HDDS-2372
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Assignee: Shashikant Banerjee
>            Priority: Critical
>
> Found it on a k8s-based test cluster using a simple 3-node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



