Gargi-jais11 commented on code in PR #8932:
URL: https://github.com/apache/ozone/pull/8932#discussion_r2306123820
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -897,6 +901,84 @@ public DataScanResult scanData(DataTransferThrottler
throttler, Canceler cancele
return checker.fullCheck(throttler, canceler);
}
+ @Override
+ public void copyContainerDirectory(Path destination) throws IOException {
+ readLock();
+ try {
+ // Closed/ Quasi closed containers are considered for replication by
+ // replication manager if they are under-replicated.
+ ContainerProtos.ContainerDataProto.State state =
+ getContainerData().getState();
+ if (!(state == ContainerProtos.ContainerDataProto.State.CLOSED ||
+ state == ContainerProtos.ContainerDataProto.State.QUASI_CLOSED)) {
+ throw new IllegalStateException(
+ "Only (quasi)closed containers can be exported, but " +
+ "ContainerId=" + getContainerData().getContainerID() +
+ " is in state " + state);
+ }
+
+ if (!containerData.getSchemaVersion().equals(OzoneConsts.SCHEMA_V3)) {
+ compactDB();
+ // Close DB (and remove from cache) to avoid concurrent modification
+ // while copying it.
+ BlockUtils.removeDB(containerData, config);
+ }
+
+ if (containerData.getSchemaVersion().equals(OzoneConsts.SCHEMA_V3)) {
+ // Synchronize the dump and copy operation,
+ // so concurrent copy don't get dump files overwritten.
+ // We seldom got concurrent exports for a container,
+ // so it should not influence performance much.
+ synchronized (dumpLock) {
+ BlockUtils.dumpKVContainerDataToFiles(containerData, config);
+ copyContainerToDestination(destination);
+ }
+ } else {
+ copyContainerToDestination(destination);
Review Comment:
In this method, the goal is to copy the container's files to destination
volume. This is a **read-only** operation from the perspective of the
container's data and state. The code needs to guarantee that another thread
doesn't, for example, start deleting the container while it's being copied.
Acquiring a **readLock()** perfectly achieves this. It prevents any other
thread from getting a **writeLock()** to modify the container, while still
allowing other threads to perform concurrent read operations if needed.
Using a **writeLock()** here would be too restrictive and would
unnecessarily block other threads from reading the container.
> if multiple thread does same thing.
https://github.com/apache/ozone/blob/00ca998a966d42487f816393a59db6d20a7ac387/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java#L932-L938
The **synchronized** block ensures that even if two threads enter the method
concurrently, the process of dumping the RocksDB data to temporary files
(`dumpKVContainerDataToFiles`) is serialized. This prevents the two threads
from overwriting each other's temporary dump files, which would lead to
corruption.
Once the dump is complete and the files are copied, the lock is released,
and the next thread can proceed safely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]