[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

ASF GitHub Bot (JIRA) Tue, 06 Aug 2019 15:58:54 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290046
 ]


ASF GitHub Bot logged work on HDDS-1366:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Aug/19 22:57
            Start Date: 06/Aug/19 22:57
    Worklog Time Spent: 10m 
      Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311309493
 
 

 ##########
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##########
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
         LOG.error("Unexpected exception while updating key data : {} {}",
                 updatedKey, e.getMessage());
         return new ImmutablePair<>(getTaskName(), false);
-      } finally {
-        populateFileCountBySizeDB();
       }
+      populateFileCountBySizeDB();
     }
     LOG.info("Completed a 'process' run of FileSizeCountTask.");
     return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
    * Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
    *
    * @param dataSize Size of the key.
    * @return int bin index in upperBoundCount
    */
-  private int calcBinIndex(long dataSize) {
-    if(dataSize >= maxFileSizeUpperBound) {
-      return Integer.MIN_VALUE;
-    } else if (dataSize > SIZE_512_TB) {
-      //given the small difference in 512TB and 512TB + 1B, index for both would
-      //return same, to differentiate specific condition added.
-      return maxBinSize - 1;
-    }
-    int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-    if(logValue < 10){
-      return 0;
-    } else{
-      return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+    int index = 0;
+    while(dataSize != 0) {
+      dataSize >>= 1;
+      index += 1;
     }
+    return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-    int index = calcBinIndex(omKeyInfo.getDataSize());
-    if(index == Integer.MIN_VALUE) {
-      throw new IOException("File Size larger than permissible file size "
-          + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+    int index;
+    if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+      index = maxBinSize - 1;
+    } else {
+      index = calculateBinIndex(omKeyInfo.getDataSize());
     }
     upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
     for (int i = 0; i < upperBoundCount.length; i++) {
       long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
       FileCountBySize fileCountRecord =
           fileCountBySizeDao.findById(fileSizeUpperBound);
       FileCountBySize newRecord = new
           FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-      if(fileCountRecord == null){
+      if (fileCountRecord == null) {
 
 Review comment:
   Yes, it should be `Long.MAX_VALUE`.
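
   For reference, here is a minimal standalone sketch of the binning introduced in the diff above. It assumes non-negative key sizes. The class name `BinIndexSketch`, the constant `MAX_BINS`, and the helper `upperBoundForBin` are hypothetical and not part of the PR. Reporting `Long.MAX_VALUE` as the upper bound of the last bin is one reading of this comment, not confirmed behavior of the task.

```java
public class BinIndexSketch {
  // Hypothetical bin count for illustration only; the real task derives
  // its bin count from its own configuration.
  static final int MAX_BINS = 41;

  // Same logic as calculateBinIndex in the diff: count the right shifts
  // needed until dataSize becomes zero, then offset by 10 so that every
  // size below 1 KB lands in bin 0. Assumes dataSize >= 0.
  static int calculateBinIndex(long dataSize) {
    int index = 0;
    while (dataSize != 0) {
      dataSize >>= 1;
      index += 1;
    }
    return index < 10 ? 0 : index - 10;
  }

  // Assumption: the last bin is a catch-all, so its upper bound is
  // reported as Long.MAX_VALUE instead of a power of two.
  static long upperBoundForBin(int binIndex) {
    return binIndex == MAX_BINS - 1 ? Long.MAX_VALUE : 1L << (10 + binIndex);
  }

  public static void main(String[] args) {
    long[] sizes = {0L, 1023L, 1024L, 2047L, 2048L};
    for (long size : sizes) {
      int bin = calculateBinIndex(size);
      System.out.println(size + " bytes -> bin " + bin
          + ", upper bound " + upperBoundForBin(bin));
    }
  }
}
```

   With this scheme a 1023-byte key falls in bin 0 and a 1024-byte key falls in bin 1, so each power-of-two upper bound is exclusive.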
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 290046)
    Time Spent: 7h  (was: 6h 50m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-1366
>                 URL: https://issues.apache.org/jira/browse/HDDS-1366
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Recon
>            Reporter: Aravindan Vijayan
>            Assignee: Shweta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

