[ https://issues.apache.org/jira/browse/HDDS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914521#comment-16914521 ]
Anu Engineer edited comment on HDDS-2026 at 8/23/19 6:10 PM:
-------------------------------------------------------------

A version with no locking is in changes.diff. We rely on the semantics of Ozone here: chunks are by definition never overwritten, and writes become visible only when we commit the metadata. This guarantees that read threads never have concurrent access to chunk files while writes are going on, which allows us to read without locks, since chunk files are immutable. Delete is not an issue either: if the file is open, the OS will keep it around until we are done.

was (Author: anu):
A version with no locking is in changes.diff. We rely on the semantics of Ozone here: chunks are by definition never overwritten, and writes become visible only when we commit the metadata. This guarantees that read threads never have concurrent access to chunk files, which allows us to read without locks, since chunk files are immutable. Delete is not an issue either: if the file is open, the OS will keep it around until we are done.
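To make the argument above concrete, here is a minimal, self-contained sketch. It is not the code in changes.diff; the class and method names (LockFreeChunkRead, readRegion, secondLockIsRejected, readConcurrently) are hypothetical and exist only for illustration. It shows two things using plain java.nio: a second overlapping FileLock taken inside the same JVM is rejected with OverlappingFileLockException, as in the stack trace quoted in the issue description, while positional reads of an immutable file need no lock and can run from several threads at once.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not taken from the Ozone code base.
class LockFreeChunkRead {

  /** Reads up to len bytes at offset with positional reads; no FileLock is taken. */
  public static byte[] readRegion(Path chunk, long offset, int len) throws IOException {
    try (FileChannel ch = FileChannel.open(chunk, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(len);
      long pos = offset;
      while (buf.hasRemaining()) {
        int n = ch.read(buf, pos); // positional read, does not move the channel position
        if (n < 0) {
          break; // end of file reached before len bytes were read
        }
        pos += n;
      }
      return Arrays.copyOf(buf.array(), buf.position());
    }
  }

  /** Returns true if a second overlapping lock in the same JVM is rejected. */
  public static boolean secondLockIsRejected(Path chunk) throws IOException {
    try (FileChannel a = FileChannel.open(chunk, StandardOpenOption.READ);
         FileChannel b = FileChannel.open(chunk, StandardOpenOption.READ)) {
      FileLock first = a.lock(0, 4, true); // shared lock on bytes [0, 4)
      try {
        b.tryLock(0, 4, true); // same region, same JVM, different channel
        return false;          // any lock obtained here is released when b closes
      } catch (OverlappingFileLockException e) {
        return true;
      } finally {
        first.release();
      }
    }
  }

  /** Runs the same lock-free read from several threads and collects the results. */
  public static List<byte[]> readConcurrently(Path chunk, long offset, int len, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<byte[]>> tasks = new ArrayList<>();
      for (int i = 0; i < threads; i++) {
        tasks.add(() -> readRegion(chunk, offset, len));
      }
      List<byte[]> results = new ArrayList<>();
      for (Future<byte[]> f : pool.invokeAll(tasks)) {
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Path chunk = Files.createTempFile("chunk", ".bin");
    try {
      Files.write(chunk, "0123456789".getBytes(StandardCharsets.US_ASCII));
      System.out.println("second overlapping lock rejected: " + secondLockIsRejected(chunk));
      for (byte[] r : readConcurrently(chunk, 2, 4, 8)) {
        System.out.println(new String(r, StandardCharsets.US_ASCII));
      }
    } finally {
      Files.deleteIfExists(chunk);
    }
  }
}
```

The sketch is only safe under the assumption stated in the comment: the file is never modified after it becomes visible to readers. If chunks could be overwritten in place, readers would need some other form of coordination.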
> Overlapping chunk region cannot be read concurrently
> ----------------------------------------------------
>
>                 Key: HDDS-2026
>                 URL: https://issues.apache.org/jira/browse/HDDS-2026
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Doroszlai, Attila
>            Priority: Critical
>         Attachments: HDDS-2026-repro.patch, changes.diff, first-cut-proposed.diff
>
>
> Concurrent requests to datanode for the same chunk may result in the following exception in datanode:
> {code}
> java.nio.channels.OverlappingFileLockException
>    at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
>    at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
>    at java.base/sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>    at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>    at java.base/sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>    at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:175)
>    at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:213)
>    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:574)
>    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:195)
>    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271)
>    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
>    at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
>    at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> {code}
> It seems this is covered by retry logic, as key read is eventually successful at client side.
> The problem is that:
> bq. File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine.
> ([source|https://docs.oracle.com/javase/8/docs/api/java/nio/channels/FileLock.html])
> code ref: [{{ChunkUtils.readData}}|https://github.com/apache/hadoop/blob/c92de8209d1c7da9e7ce607abeecb777c4a52c6a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L175]

--
This message was sent by Atlassian Jira
(v8.3.2#803003)