[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962112#comment-16962112 ]
Tsz-wo Sze commented on HDDS-2372:
----------------------------------

Some questions (sorry, I don't fully understand the test):
- Did the NoSuchFileException happen on all three datanodes, or on just one?
- What did the test do? Did it write a lot of chunks to one Ratis pipeline?
- Did the read in B.3 fail? It sounds like it did, given "the chunk can't be read any more from the tmp file." Was the tmp file moved to another location? If so, the read should also try reading from there.

Since this can be reproduced, we should add more log messages to trace when the tmp file was created, moved, or deleted.

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
>                 Key: HDDS-2372
>                 URL: https://issues.apache.org/jira/browse/HDDS-2372
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Priority: Critical
>
> Found it on a k8s-based test cluster using a simple 3-node cluster and the
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException:
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> java.nio.file.NoSuchFileException:
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
> {code}
> Can be reproduced.
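The fallback read asked about in the B.3 question (try the tmp file first, then the location it may have been moved to, and log the failure) could look roughly like this Java sketch. It is a hypothetical illustration, not Ozone's actual chunk-reading code: the class `ChunkReadFallback`, its methods, and the rule that the committed chunk file is the tmp file name with its `.tmp.*` suffix stripped (e.g. `gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830` becoming `gGYYgiTTeg_testdata_chunk_13931`) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.logging.Logger;

/**
 * Hypothetical helper (NOT an Ozone API): reads chunk bytes from a tmp chunk
 * file and, if that file no longer exists, retries at the assumed committed path.
 */
public final class ChunkReadFallback {
  private static final Logger LOG =
      Logger.getLogger(ChunkReadFallback.class.getName());

  private ChunkReadFallback() {}

  /** Assumption: the committed file name is the tmp name with ".tmp.*" removed. */
  static Path committedPathFor(Path tmpPath) {
    String name = tmpPath.getFileName().toString();
    int idx = name.indexOf(".tmp.");
    if (idx < 0) {
      return tmpPath; // not a tmp file name; nothing to strip
    }
    return tmpPath.resolveSibling(name.substring(0, idx));
  }

  /** Reads the tmp file; on NoSuchFileException, logs and tries the committed file. */
  static byte[] readChunk(Path tmpPath) throws IOException {
    try {
      return Files.readAllBytes(tmpPath);
    } catch (NoSuchFileException e) {
      Path committed = committedPathFor(tmpPath);
      if (committed.equals(tmpPath)) {
        throw e; // no alternative location to try
      }
      // Record both paths so the create/move/delete sequence can be traced later.
      LOG.warning("tmp chunk file " + tmpPath + " not found; retrying at " + committed);
      // If the committed file is also missing, this throws NoSuchFileException.
      return Files.readAllBytes(committed);
    }
  }
}
```

Logging both paths at the point of failure is one cheap way to get the create/move/delete trace the comment asks for. Whether a datanode should silently fall back like this, rather than fail the request, is a separate design question this sketch does not settle.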