[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

ASF GitHub Bot (JIRA) Wed, 07 Aug 2019 16:24:06 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290868
 ]


ASF GitHub Bot logged work on HDDS-1366:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Aug/19 23:22
            Start Date: 07/Aug/19 23:22
    Worklog Time Spent: 10m 
      Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311801586
 
 

 ##########
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##########
 @@ -70,39 +77,51 @@ public void testGetFileCounts() throws IOException {
     verify(utilizationService, times(1)).getFileCounts();
     verify(fileCountBySizeDao, times(1)).findAll();
 
-    assertEquals(41, resultList.size());
-    long fileSize = 4096L;
+    assertEquals(maxBinSize, resultList.size());
+    long fileSize = 4096L;              // 4KB
     int index =  findIndex(fileSize);
     long count = resultList.get(index).getCount();
     assertEquals(index, count);
 
-    fileSize = 1125899906842624L;
+    fileSize = 1125899906842624L;       // 1PB
     index = findIndex(fileSize);
-    if (index == Integer.MIN_VALUE) {
-      throw new IOException("File Size larger than permissible file size");
-    }
+    count = resultList.get(index).getCount();
+    assertEquals(maxBinSize - 1, index);
+    assertEquals(index, count);
 
-    fileSize = 1025L;
+    fileSize = 1025L;                   // 1 KB + 1B
     index = findIndex(fileSize);
-    count = resultList.get(index).getCount();
+    count = resultList.get(index).getCount(); //last extra bin for files >= 1PB
     assertEquals(index, count);
 
     fileSize = 25L;
     index = findIndex(fileSize);
     count = resultList.get(index).getCount();
     assertEquals(index, count);
+
+    fileSize = 1125899906842623L;       // 1PB - 1B
+    index = findIndex(fileSize);
+    count = resultList.get(index).getCount();
+    assertEquals(index, count);
+
+    fileSize = 1125899906842624L * 4;       // 4 PB
+    index = findIndex(fileSize);
+    count = resultList.get(index).getCount();
+    assertEquals(maxBinSize - 1, index);
+    assertEquals(index, count);
   }
 
   public int findIndex(long dataSize) {
-    int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-    if (logValue < 10) {
-      return 0;
-    } else {
-      int index = logValue - 10;
-      if (index > maxBinSize) {
-        return Integer.MIN_VALUE;
-      }
-      return (dataSize % oneKb == 0) ? index + 1 : index;
+    if (dataSize > Math.pow(2, (maxBinSize + 10 - 2))) {  // 1 PB = 2 ^ 50
+      return maxBinSize - 1;
+    }
+    int index = 0;
+    while(dataSize != 0) {
+      dataSize >>= 1;
+      index += 1;
 
 Review comment:
   This makes the unit test void. If the test helper here uses the same logic 
as the methods under test, the assertions will always pass. We should assert 
against constant expected values that exercise the actual methods instead.
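
   A minimal sketch of what testing against constants could look like. The 
class name `BinIndexSketch`, the method `binIndex`, and the bin layout (42 bins, 
bin 0 for sizes below 1 KB, one bin per power of two after that, and a last bin 
for sizes of 1 PB and above) are assumptions made for illustration; they are 
not taken from the Recon code. The point is only that the expected indices are 
written out by hand rather than recomputed by a helper that mirrors the 
production logic.

```java
// Hypothetical stand-in for the production bin calculation; the real
// Recon method and its constants may differ.
final class BinIndexSketch {
  static final int MAX_BINS = 42;       // assumed: 41 power-of-two bins + 1 overflow bin
  static final long ONE_KB = 1L << 10;
  static final long ONE_PB = 1L << 50;

  // Returns the bin index for a file of the given size in bytes.
  static int binIndex(long dataSize) {
    if (dataSize >= ONE_PB) {
      return MAX_BINS - 1;              // everything at or above 1 PB shares the last bin
    }
    if (dataSize < ONE_KB) {
      return 0;                         // everything below 1 KB shares the first bin
    }
    // 63 - numberOfLeadingZeros(x) is floor(log2(x)) for positive x.
    int floorLog2 = 63 - Long.numberOfLeadingZeros(dataSize);
    return floorLog2 - 9;               // 1 KB (2^10) maps to bin 1
  }

  // Compares against a hard-coded expectation; no index logic is duplicated here.
  static void check(long dataSize, int expectedIndex) {
    int actual = binIndex(dataSize);
    if (actual != expectedIndex) {
      throw new AssertionError(
          "size " + dataSize + ": expected bin " + expectedIndex + ", got " + actual);
    }
  }

  public static void main(String[] args) {
    check(25L, 0);                      // well below 1 KB
    check(1023L, 0);                    // 1 KB - 1 B
    check(1024L, 1);                    // exactly 1 KB
    check(1025L, 1);                    // 1 KB + 1 B
    check(4096L, 3);                    // 4 KB
    check(1125899906842623L, 40);       // 1 PB - 1 B
    check(1125899906842624L, 41);       // exactly 1 PB
    check(1125899906842624L * 4, 41);   // 4 PB
    System.out.println("all checks passed");
  }
}
```

   In the real test the same idea would apply to the values returned through 
`resultList`: compare each bin against a literal expected index or count, so 
that a regression in the production calculation makes the test fail.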
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 290868)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-1366
>                 URL: https://issues.apache.org/jira/browse/HDDS-1366
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Recon
>            Reporter: Aravindan Vijayan
>            Assignee: Shweta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
