[GitHub] [ozone] ChenSammi commented on a diff in pull request #3292: HDDS-6541. [Merge rocksdb in datanode] Per-disk DB location management.

GitBox Thu, 14 Apr 2022 02:41:29 -0700


ChenSammi commented on code in PR #3292:
URL: https://github.com/apache/ozone/pull/3292#discussion_r850272728



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/HddsVolumeUtil.java:
##########
@@ -234,4 +257,154 @@ public static boolean checkVolume(HddsVolume hddsVolume, 
String scmId,
       return true;
     }
   }
+
+  /**
+   * Randomly pick a DbVolume for HddsVolume and init db instance.
+   * Use the HddsVolume directly if no DbVolume found.
+   * @param hddsVolume
+   * @param dbVolumeSet
+   * @param clusterID
+   * @param conf
+   */
+  public static void formatDbStoreForHddsVolume(HddsVolume hddsVolume,
+      MutableVolumeSet dbVolumeSet, String clusterID,
+      ConfigurationSource conf) throws IOException {
+    DbVolume chosenDbVolume = null;
+    File dbParentDir;
+
+    if (dbVolumeSet == null || dbVolumeSet.getVolumesList().isEmpty()) {
+      // No extra db volumes specified, just create db under the HddsVolume.
+      dbParentDir = new File(hddsVolume.getStorageDir(), clusterID);
+    } else {
+      // Randomly choose a DbVolume for simplicity.
+      List<DbVolume> dbVolumeList = StorageVolumeUtil.getDbVolumesList(
+          dbVolumeSet.getVolumesList());
+      chosenDbVolume = dbVolumeList.get(
+          ThreadLocalRandom.current().nextInt(dbVolumeList.size()));
+      dbParentDir = new File(chosenDbVolume.getStorageDir(), clusterID);
+    }
+
+    // Init subdir with the storageID of HddsVolume.
+    File storageIdDir = new File(dbParentDir, hddsVolume.getStorageID());
+    if (!storageIdDir.mkdirs() && !storageIdDir.exists()) {
+      throw new IOException("Can't make subdir under "
+          + dbParentDir.getAbsolutePath() + " for volume "
+          + hddsVolume.getStorageID());
+    }
+
+    // Init the db instance for HddsVolume under the subdir above.
+    String containerDBPath = new File(storageIdDir, CONTAINER_DB_NAME)
+        .getAbsolutePath();
+    try {
+      initPerDiskDBStore(containerDBPath, conf);
+    } catch (IOException e) {
+      throw new IOException("Can't init db instance under path "
+          + containerDBPath + " for volume " + hddsVolume.getStorageID());
+    }
+
+    // Set the dbVolume and dbParentDir of the HddsVolume for db path lookup.
+    hddsVolume.setDbVolume(chosenDbVolume);
+    hddsVolume.setDbParentDir(storageIdDir);
+  }
+
+  /**
+   * Load already formatted db instances for all HddsVolumes.
+   * @param hddsVolumeSet
+   * @param dbVolumeSet
+   * @param conf
+   * @param logger
+   */
+  public static void loadAllHddsVolumeDbStore(MutableVolumeSet hddsVolumeSet,
+      MutableVolumeSet dbVolumeSet, ConfigurationSource conf, Logger logger) {
+    // Scan subdirs under the db volumes and build a one-to-one map
+    // between each HddsVolume -> DbVolume.
+    mapDbVolumesToDataVolumesIfNeeded(hddsVolumeSet, dbVolumeSet);
+
+    List<HddsVolume> hddsVolumeList = StorageVolumeUtil.getHddsVolumesList(
+        hddsVolumeSet.getVolumesList());
+
+    for (HddsVolume volume : hddsVolumeList) {
+      String clusterID = volume.getClusterID();
+
+      // DN startup for the first time, not registered yet,
+      // so the DbVolume is not formatted.
+      if (clusterID == null) {
+        continue;
+      }
+
+      File clusterIdDir = new File(volume.getDbVolume() == null ?
+          volume.getStorageDir() : volume.getDbVolume().getStorageDir(),
+          clusterID);
+
+      if (!clusterIdDir.exists()) {
+        if (logger != null) {
+          logger.error("HddsVolume {} db instance not formatted, " +
+                  "clusterID {} directory not found",
+              volume.getStorageDir().getAbsolutePath(),
+              clusterIdDir.getAbsolutePath());
+        }
+        continue;

Review Comment:
   
   >     2. The behavior you stated sounds reasonable to me(I'll check bad 
volume if db load failed in the next push), so as I understand, we should 
continue rather than throw&out when we hit a bad volume. Agreed?
   
   After all the discussions, we should add the failed hddsVolume to failed 
volume list, then check that if failed volumes exceed the limit, right?  
   
   > 
   >     3. Let me clarify it a bit. A DbVolume is only a place holder for 
potential db instances for HddsVolumes, it is formatted when DN register 
happens(only for now, I'll change it to format on DN startup), before all 
HddsVolumes. A db instance is formatted for each HddsVolume when `checkVolume` 
is called for the first time(after all DbVolumes are formatted). Check my 
comment below about the condition check `hddsFiles.length == 1`. Upon the first 
register of DN -> SCM, DN get a clusterID back and create a directory(e.g. 
/data1/hdds/CID-) for each HddsVolume, we only create a db instance together 
with the clusterID directory. On the 2nd, 3rd,... DN restart and registers, 
`checkVolume` will found that `hddsFiles.length == 2`, so no redundant db 
instance will be created. Does it make sense?
   
   Here are two cases, 
   1. RocksDB on HddsVolume.  Your explanation makes sense. But, current 
hddsFiles.length == 2 logic is SCM HA upgrade fix.  If you depends on 
hddsFiles.length 2 to detect whether RocksDB created or not, you need to 
distinguish these two different cases. 
   2. RocksDB on dbVolume. In the case, RocksDB existence is irrelevant with 
hddsFiles.length, right? 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [ozone] ChenSammi commented on a diff in pull request #3292: HDDS-6541. [Merge rocksdb in datanode] Per-disk DB location management.

Reply via email to