[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292545
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 10/Aug/19 17:14
Start Date: 10/Aug/19 17:14
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 292545)
Time Spent: 12h  (was: 11h 50m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
>                 Key: HDDS-1366
>                 URL: https://issues.apache.org/jira/browse/HDDS-1366
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Recon
>            Reporter: Aravindan Vijayan
>            Assignee: Shweta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 
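
The bucketing described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the FileSizeCountTask from the pull request: the class name `FileSizeBinSketch`, its constants, and `countBySize` are hypothetical, and the bin boundaries are taken from values asserted in the test quoted later in this thread (1 KB maps to bin 1, 512 TB to bin 40, anything at or above 1 PB to the last of 42 bins). It assumes every size below 1 KB shares bin 0 and does not try to reproduce every assertion in the quoted test.

```java
/** Hypothetical helper for illustration; not the FileSizeCountTask from the pull request. */
final class FileSizeBinSketch {
  static final long ONE_KB = 1024L;                        // smallest tracked boundary
  static final long MAX_FILE_SIZE_UPPER_BOUND = 1L << 50;  // 1 PB (assumed upper bound)
  static final int MAX_BIN_SIZE = 42;                      // bins 0..40 plus one overflow bin

  /** Maps a data size in bytes to a power-of-two bin index. */
  static int calculateBinIndex(long dataSize) {
    if (dataSize >= MAX_FILE_SIZE_UPPER_BOUND) {
      return MAX_BIN_SIZE - 1;  // overflow bin for sizes >= 1 PB
    }
    if (dataSize < ONE_KB) {
      return 0;                 // assumption: everything under 1 KB shares bin 0
    }
    // floor(log2(dataSize)) is 10 for 1 KB, so subtracting 9 puts 1 KB in bin 1.
    int floorLog2 = 63 - Long.numberOfLeadingZeros(dataSize);
    return floorLog2 - 9;
  }

  /** Iterates over key sizes (standing in for the OM Key Table) and counts keys per bin. */
  static long[] countBySize(long[] keySizes) {
    long[] bins = new long[MAX_BIN_SIZE];
    for (long size : keySizes) {
      bins[calculateBinIndex(size)]++;
    }
    return bins;
  }

  public static void main(String[] args) {
    long[] bins = countBySize(new long[] {512L, 1024L, 4096L, 1L << 50});
    for (int i = 0; i < bins.length; i++) {
      if (bins[i] > 0) {
        System.out.println("bin " + i + ": " + bins[i] + " key(s)");
      }
    }
  }
}
```

Using `Long.numberOfLeadingZeros` avoids a shift loop. Whether the actual task computes the index this way is not shown in this thread.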



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292544
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 10/Aug/19 17:14
Start Date: 10/Aug/19 17:14
Worklog Time Spent: 10m 
  Work Description: arp7 commented on issue #1146: HDDS-1366. Add ability 
in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-520165586
 
 
   +1. The test failures look unrelated. There are a few checkstyle failures in 
tests; can you file a follow-up JIRA to address them right away?
   
   Thanks for the contribution @shwetayakkali and thanks @avijayanhwx and 
@vivekratnavel  for the reviews!
 



Issue Time Tracking
---

Worklog Id: (was: 292544)
Time Spent: 11h 50m  (was: 11h 40m)




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292337
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 09/Aug/19 23:37
Start Date: 09/Aug/19 23:37
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-520095919
 
 
   LGTM +1
 



Issue Time Tracking
---

Worklog Id: (was: 292337)
Time Spent: 11h 40m  (was: 11.5h)




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292241
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 09/Aug/19 20:50
Start Date: 09/Aug/19 20:50
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-520059219
 
 
   +1 LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 292241)
Time Spent: 11.5h  (was: 11h 20m)




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292234
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 09/Aug/19 20:10
Start Date: 09/Aug/19 20:10
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312632071
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.apache.hadoop.ozone.recon.tasks.
+OMDBUpdateEvent.OMDBUpdateAction.PUT;
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(OmKeyInfo.class)
+
+public class TestFileSizeCountTask {
+  @Test
+  public void testCalculateBinIndex() {
+    FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+
+    when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+        thenReturn(1125899906842624L);  // 1 PB
+    when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
+    when(fileSizeCountTask.getMaxBinSize()).thenReturn(42);
+    when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+    when(fileSizeCountTask.nextClosestPowerIndexOfTwo(
+        anyLong())).thenCallRealMethod();
+
+    long fileSize = 1024L;  // 1 KB
+    int binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(1, binIndex);
+
+    fileSize = 1023L;  // 1 KB - 1B
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(0, binIndex);
+
+    fileSize = 562949953421312L;  // 512 TB
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(40, binIndex);
+
+    fileSize = 562949953421313L;  // (512 TB + 1B)
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(40, binIndex);
+
+    fileSize = 562949953421311L;  // (512 TB - 1B)
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(39, binIndex);
+
+    fileSize = 1125899906842624L;  // 1 PB - last (extra) bin
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(41, binIndex);
+
+    fileSize = 10L;
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(7, binIndex);
+
+    fileSize = 1125899906842623L;  // (1 PB - 1B)
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(40, binIndex);
+
+    fileSize = 1125899906842624L * 4;  // 4 PB - last extra bin
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(41, binIndex);
+
+    fileSize = Long.MAX_VALUE;  // extra bin
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(41, binIndex);
+  }
+
+  @Test
+  public void testFileCountBySizeReprocess() throws IOException {
 
 Review comment:
   Sure.
   
 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292205
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 09/Aug/19 18:21
Start Date: 09/Aug/19 18:21
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312596657
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,140 @@
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.apache.hadoop.ozone.recon.tasks.
+OMDBUpdateEvent.OMDBUpdateAction.PUT;
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(OmKeyInfo.class)
+
+public class TestFileSizeCountTask {
+  @Test
+  public void testCalculateBinIndex() {
+    FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+
+    when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+        thenReturn(1125899906842624L);  // 1 PB
+    when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
+    when(fileSizeCountTask.getMaxBinSize()).thenReturn(42);
+    when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+    when(fileSizeCountTask.nextClosestPowerIndexOfTwo(
+        anyLong())).thenCallRealMethod();
+
+    long fileSize = 1024L;  // 1 KB
+    int binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(1, binIndex);
+
+    fileSize = 1023L;  // 1 KB - 1B
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(0, binIndex);
+
+    fileSize = 562949953421312L;  // 512 TB
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(40, binIndex);
+
+    fileSize = 562949953421313L;  // (512 TB + 1B)
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(40, binIndex);
+
+    fileSize = 562949953421311L;  // (512 TB - 1B)
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(39, binIndex);
+
+    fileSize = 1125899906842624L;  // 1 PB - last (extra) bin
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(41, binIndex);
+
+    fileSize = 10L;
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(7, binIndex);
+
+    fileSize = 1125899906842623L;  // (1 PB - 1B)
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(40, binIndex);
+
+    fileSize = 1125899906842624L * 4;  // 4 PB - last extra bin
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(41, binIndex);
+
+    fileSize = Long.MAX_VALUE;  // extra bin
+    binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+    assertEquals(41, binIndex);
+  }
+
+  @Test
+  public void testFileCountBySizeReprocess() throws IOException {
 
 Review comment:
   We will need another test 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=292206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292206
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 09/Aug/19 18:21
Start Date: 09/Aug/19 18:21
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312595826
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,140 @@
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.apache.hadoop.ozone.recon.tasks.
+OMDBUpdateEvent.OMDBUpdateAction.PUT;
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for Container Key mapper task.
 
 Review comment:
   nit: change this to File Size Count task
 



Issue Time Tracking
---

Worklog Id: (was: 292206)
Time Spent: 11h 10m  (was: 11h)




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291652
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 23:20
Start Date: 08/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312281064
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -62,66 +64,73 @@ public void setUpResultList() {
 (long) i));
   }
 }
+return resultList;
   }
 
   @Test
   public void testGetFileCounts() throws IOException {
-setUpResultList();
+List resultList = setUpResultList();
 
 utilizationService = mock(UtilizationService.class);
 when(utilizationService.getFileCounts()).thenCallRealMethod();
 when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
 when(fileCountBySizeDao.findAll()).thenReturn(resultList);
 
-utilizationService.getFileCounts();
+Response response = utilizationService.getFileCounts();
+// get result list from Response entity
+List responseList =
+(List) response.getEntity();
+
 verify(utilizationService, times(1)).getFileCounts();
 verify(fileCountBySizeDao, times(1)).findAll();
 
-assertEquals(maxBinSize, resultList.size());
+FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+thenReturn(1125899906842624L);
+when(fileSizeCountTask.getMaxBinSize()).thenReturn(maxBinSize);
+when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+assertEquals(maxBinSize, responseList.size());
+
 long fileSize = 4096L;  // 4KB
-int index =  findIndex(fileSize);
-long count = resultList.get(index).getCount();
+int index =  fileSizeCountTask.calculateBinIndex(fileSize);
+
+long count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842624L;   // 1PB
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
+//last extra bin for files >= 1PB
 assertEquals(maxBinSize - 1, index);
 assertEquals(index, count);
 
 fileSize = 1025L;   // 1 KB + 1B
-index = findIndex(fileSize);
-count = resultList.get(index).getCount(); //last extra bin for files >= 1PB
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 25L;
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842623L;   // 1PB - 1B
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842624L * 4;   // 4 PB
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
 
 Review comment:
   So, what exactly should testGetFileCounts() test?
 



Issue Time Tracking
---

Worklog Id: (was: 291652)
Time Spent: 10h 40m  (was: 10.5h)


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291653
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 23:20
Start Date: 08/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312281064
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -62,66 +64,73 @@ public void setUpResultList() {
 (long) i));
   }
 }
+return resultList;
   }
 
   @Test
   public void testGetFileCounts() throws IOException {
-setUpResultList();
+List resultList = setUpResultList();
 
 utilizationService = mock(UtilizationService.class);
 when(utilizationService.getFileCounts()).thenCallRealMethod();
 when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
 when(fileCountBySizeDao.findAll()).thenReturn(resultList);
 
-utilizationService.getFileCounts();
+Response response = utilizationService.getFileCounts();
+// get result list from Response entity
+List responseList =
+(List) response.getEntity();
+
 verify(utilizationService, times(1)).getFileCounts();
 verify(fileCountBySizeDao, times(1)).findAll();
 
-assertEquals(maxBinSize, resultList.size());
+FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+thenReturn(1125899906842624L);
+when(fileSizeCountTask.getMaxBinSize()).thenReturn(maxBinSize);
+when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+assertEquals(maxBinSize, responseList.size());
+
 long fileSize = 4096L;  // 4KB
-int index =  findIndex(fileSize);
-long count = resultList.get(index).getCount();
+int index =  fileSizeCountTask.calculateBinIndex(fileSize);
+
+long count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842624L;   // 1PB
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
+//last extra bin for files >= 1PB
 assertEquals(maxBinSize - 1, index);
 assertEquals(index, count);
 
 fileSize = 1025L;   // 1 KB + 1B
-index = findIndex(fileSize);
-count = resultList.get(index).getCount(); //last extra bin for files >= 1PB
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 25L;
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842623L;   // 1PB - 1B
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842624L * 4;   // 4 PB
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
 
 Review comment:
   So, what exactly should testGetFileCounts() test? as part of assertions?
 



Issue Time Tracking
---

Worklog Id: (was: 291653)
Time Spent: 10h 50m  (was: 10h 40m)


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291649
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 23:18
Start Date: 08/Aug/19 23:18
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1146: HDDS-1366. 
Add ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312280564
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -62,66 +64,73 @@ public void setUpResultList() {
 (long) i));
   }
 }
+return resultList;
   }
 
   @Test
   public void testGetFileCounts() throws IOException {
-setUpResultList();
+List resultList = setUpResultList();
 
 utilizationService = mock(UtilizationService.class);
 when(utilizationService.getFileCounts()).thenCallRealMethod();
 when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
 when(fileCountBySizeDao.findAll()).thenReturn(resultList);
 
-utilizationService.getFileCounts();
+Response response = utilizationService.getFileCounts();
+// get result list from Response entity
+List responseList =
+(List) response.getEntity();
+
 verify(utilizationService, times(1)).getFileCounts();
 verify(fileCountBySizeDao, times(1)).findAll();
 
-assertEquals(maxBinSize, resultList.size());
+FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+thenReturn(1125899906842624L);
+when(fileSizeCountTask.getMaxBinSize()).thenReturn(maxBinSize);
+when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+assertEquals(maxBinSize, responseList.size());
+
 long fileSize = 4096L;  // 4KB
-int index =  findIndex(fileSize);
-long count = resultList.get(index).getCount();
+int index =  fileSizeCountTask.calculateBinIndex(fileSize);
+
+long count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842624L;   // 1PB
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
+//last extra bin for files >= 1PB
 assertEquals(maxBinSize - 1, index);
 assertEquals(index, count);
 
 fileSize = 1025L;   // 1 KB + 1B
-index = findIndex(fileSize);
-count = resultList.get(index).getCount(); //last extra bin for files >= 1PB
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 25L;
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842623L;   // 1PB - 1B
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
+count = responseList.get(index).getCount();
 assertEquals(index, count);
 
 fileSize = 1125899906842624L * 4;   // 4 PB
-index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+index = fileSizeCountTask.calculateBinIndex(fileSize);
 
 Review comment:
   These assertions are not needed. The behavior of FileSizeCountTask is already 
covered by the TestFileSizeCountTask unit test class. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 291649)
Time Spent: 10.5h  (was: 10h 20m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291645
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 23:16
Start Date: 08/Aug/19 23:16
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1146: HDDS-1366. 
Add ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312278472
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -67,19 +72,22 @@ public FileSizeCountTask(OMMetadataManager 
omMetadataManager,
 upperBoundCount = new long[getMaxBinSize()];
   }
 
-  protected long getOneKB() {
+  @VisibleForTesting
+  public long getOneKB() {
 
 Review comment:
   A public method does not need the VisibleForTesting annotation.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 291645)
Time Spent: 10h 20m  (was: 10h 10m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291646
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 23:16
Start Date: 08/Aug/19 23:16
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1146: HDDS-1366. 
Add ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312278997
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -62,66 +64,73 @@ public void setUpResultList() {
 (long) i));
   }
 }
+return resultList;
   }
 
   @Test
   public void testGetFileCounts() throws IOException {
-setUpResultList();
+List resultList = setUpResultList();
 
 utilizationService = mock(UtilizationService.class);
 when(utilizationService.getFileCounts()).thenCallRealMethod();
 when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
 when(fileCountBySizeDao.findAll()).thenReturn(resultList);
 
-utilizationService.getFileCounts();
+Response response = utilizationService.getFileCounts();
+// get result list from Response entity
+List responseList =
+(List) response.getEntity();
+
 verify(utilizationService, times(1)).getFileCounts();
 
 Review comment:
   Why are we verifying the call to the method under test? Method-call 
verification is generally used for mocked methods (so that we know the code 
path went through them). 
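   The distinction can be illustrated without a mocking library. In the sketch 
below, every name (`CountDao`, `RecordingDao`, `CountService`) is hypothetical 
and not a Recon class: the collaborator is replaced by a recording fake, the 
method under test runs for real, the interaction is checked only on the fake, 
and the real method is checked through its return value.

```java
import java.util.Arrays;
import java.util.List;

public class VerifySketch {

  /** Hypothetical collaborator interface, standing in for a DAO. */
  public interface CountDao {
    List<Long> findAll();
  }

  /** Hand-rolled fake that records how often it was called. */
  public static class RecordingDao implements CountDao {
    private int findAllCalls = 0;

    @Override
    public List<Long> findAll() {
      findAllCalls++;
      return Arrays.asList(0L, 1L, 2L);
    }

    public int getFindAllCalls() {
      return findAllCalls;
    }
  }

  /** Hypothetical class under test; its method is exercised for real. */
  public static class CountService {
    private final CountDao dao;

    public CountService(CountDao dao) {
      this.dao = dao;
    }

    public List<Long> getFileCounts() {
      return dao.findAll();
    }
  }

  public static void main(String[] args) {
    RecordingDao dao = new RecordingDao();
    CountService service = new CountService(dao);

    List<Long> counts = service.getFileCounts();

    // The interaction check belongs on the fake collaborator ...
    if (dao.getFindAllCalls() != 1) {
      throw new AssertionError("findAll should be called exactly once");
    }
    // ... while the real method is checked through what it returns.
    if (counts.size() != 3) {
      throw new AssertionError("unexpected number of entries");
    }
  }
}
```

   With a mocking framework the same split would apply: verify calls on the 
mocked DAO, and assert on the response of the real service method.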
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 291646)
Time Spent: 10h 20m  (was: 10h 10m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291644
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 23:16
Start Date: 08/Aug/19 23:16
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1146: HDDS-1366. 
Add ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312268532
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -38,6 +40,9 @@
 import java.util.Iterator;
 import java.util.List;
 
+import static org.apache.hadoop.utils.BatchOperation.Operation.DELETE;
 
 Review comment:
   Let's use 
org.apache.hadoop.ozone.recon.tasks.OMDBUpdateEvent.OMDBUpdateAction to keep it 
consistent. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 291644)
Time Spent: 10h 10m  (was: 10h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=291227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291227
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 12:20
Start Date: 08/Aug/19 12:20
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-519494577
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 65 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 5 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 15 | Maven dependency ordering for branch |
   | +1 | mvninstall | 589 | trunk passed |
   | +1 | compile | 354 | trunk passed |
   | +1 | checkstyle | 64 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 787 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 147 | trunk passed |
   | 0 | spotbugs | 436 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 627 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 18 | Maven dependency ordering for patch |
   | +1 | mvninstall | 536 | the patch passed |
   | +1 | compile | 360 | the patch passed |
   | +1 | javac | 360 | the patch passed |
   | +1 | checkstyle | 63 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 627 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 83 | hadoop-ozone generated 7 new + 13 unchanged - 0 fixed 
= 20 total (was 13) |
   | +1 | findbugs | 643 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 349 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2704 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 47 | The patch does not generate ASF License warnings. |
   | | | 8295 | |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.ozone.TestMiniChaosOzoneCluster |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.om.TestOmInit |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStream |
   |   | hadoop.ozone.om.TestOzoneManagerRestInterface |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   |   | hadoop.ozone.client.rpc.TestCommitWatcher |
   |   | hadoop.ozone.om.TestOmMetrics |
   |   | hadoop.ozone.om.TestOzoneManagerHA |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.scm.TestSCMNodeManagerMXBean |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.scm.TestGetCommittedBlockLengthAndPutKey |
   |   | hadoop.ozone.om.TestKeyManagerImpl |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1146 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 49656151e0e6 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 397a563 |
   | Default Java | 1.8.0_222 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/4/artifact/out/diff-javadoc-javadoc-hadoop-ozone.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/4/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/4/testReport/ |
   | Max. process+thread count | 4049 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/ozone-recon-codegen hadoop-ozone/ozone-recon U: 
hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/4/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290982
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 08/Aug/19 04:04
Start Date: 08/Aug/19 04:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-519354016
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 82 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 5 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 15 | Maven dependency ordering for branch |
   | +1 | mvninstall | 609 | trunk passed |
   | +1 | compile | 397 | trunk passed |
   | +1 | checkstyle | 72 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 897 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 172 | trunk passed |
   | 0 | spotbugs | 477 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 709 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 21 | Maven dependency ordering for patch |
   | +1 | mvninstall | 595 | the patch passed |
   | +1 | compile | 426 | the patch passed |
   | +1 | javac | 426 | the patch passed |
   | +1 | checkstyle | 86 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 833 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 106 | hadoop-ozone generated 7 new + 13 unchanged - 0 fixed 
= 20 total (was 13) |
   | +1 | findbugs | 706 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 369 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2032 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 44 | The patch does not generate ASF License warnings. |
   | | | 8440 | |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | 
hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.om.TestOzoneManagerHA |
   |   | hadoop.ozone.TestMiniOzoneCluster |
   |   | hadoop.ozone.client.rpc.Test2WayCommitInRatis |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.om.TestKeyManagerImpl |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.TestStorageContainerManager |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1146 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux ac00b7984441 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 70b4617 |
   | Default Java | 1.8.0_212 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/3/artifact/out/diff-javadoc-javadoc-hadoop-ozone.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/3/testReport/ |
   | Max. process+thread count | 4088 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/ozone-recon-codegen hadoop-ozone/ozone-recon U: 
hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290868=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290868
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:22
Start Date: 07/Aug/19 23:22
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311801586
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -70,39 +77,51 @@ public void testGetFileCounts() throws IOException {
 verify(utilizationService, times(1)).getFileCounts();
 verify(fileCountBySizeDao, times(1)).findAll();
 
-assertEquals(41, resultList.size());
-long fileSize = 4096L;
+assertEquals(maxBinSize, resultList.size());
+long fileSize = 4096L;  // 4KB
 int index =  findIndex(fileSize);
 long count = resultList.get(index).getCount();
 assertEquals(index, count);
 
-fileSize = 1125899906842624L;
+fileSize = 1125899906842624L;   // 1PB
 index = findIndex(fileSize);
-if (index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size");
-}
+count = resultList.get(index).getCount();
+assertEquals(maxBinSize - 1, index);
+assertEquals(index, count);
 
-fileSize = 1025L;
+fileSize = 1025L;   // 1 KB + 1B
 index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+count = resultList.get(index).getCount(); //last extra bin for files >= 1PB
 assertEquals(index, count);
 
 fileSize = 25L;
 index = findIndex(fileSize);
 count = resultList.get(index).getCount();
 assertEquals(index, count);
+
+fileSize = 1125899906842623L;   // 1PB - 1B
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L * 4;   // 4 PB
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(maxBinSize - 1, index);
+assertEquals(index, count);
   }
 
   public int findIndex(long dataSize) {
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if (logValue < 10) {
-  return 0;
-} else {
-  int index = logValue - 10;
-  if (index > maxBinSize) {
-return Integer.MIN_VALUE;
-  }
-  return (dataSize % oneKb == 0) ? index + 1 : index;
+if (dataSize > Math.pow(2, (maxBinSize + 10 - 2))) {  // 1 PB = 2 ^ 50
+  return maxBinSize - 1;
+}
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 
 Review comment:
   This makes the unit test void. If the test reuses the same logic as the 
actual methods, then its assertions will always pass. We should use constant 
values to test against the actual methods.
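   A minimal sketch of that idea, under stated assumptions: the expected bin 
indices are written as literal constants instead of being recomputed by a copy 
of the production loop. The class `BinIndexSketch`, the method 
`calculateBinIndex(long, int)`, and the binning rule itself (bin 0 for sizes 
below 1 KB, one bin per power of two above that, last bin as overflow) are 
hypothetical stand-ins, not the actual FileSizeCountTask code.

```java
/** Hypothetical stand-in for the production binning logic; not the real FileSizeCountTask. */
public class BinIndexSketch {

  /**
   * Assumed rule: bin 0 holds sizes below 1 KB, bin i (i >= 1) holds sizes in
   * [2^(i+9), 2^(i+10)), and the last bin collects everything larger.
   */
  public static int calculateBinIndex(long dataSize, int maxBinSize) {
    if (dataSize < 1024L) {
      return 0;
    }
    // floor(log2(dataSize)) for a positive long.
    int log2 = 63 - Long.numberOfLeadingZeros(dataSize);
    return Math.min(log2 - 9, maxBinSize - 1);
  }

  /** Fails loudly when the actual value differs from the hard-coded expectation. */
  public static void check(int expected, int actual) {
    if (expected != actual) {
      throw new AssertionError("expected " + expected + " but got " + actual);
    }
  }

  public static void main(String[] args) {
    int maxBinSize = 42; // assumed: bins 0..40 plus one overflow bin at index 41
    // Expected indices are literals worked out by hand, not recomputed in the test.
    check(0, calculateBinIndex(25L, maxBinSize));
    check(1, calculateBinIndex(1025L, maxBinSize));                  // 1 KB + 1 B
    check(3, calculateBinIndex(4096L, maxBinSize));                  // 4 KB
    check(40, calculateBinIndex(1125899906842623L, maxBinSize));     // 1 PB - 1 B
    check(41, calculateBinIndex(1125899906842624L, maxBinSize));     // 1 PB
    check(41, calculateBinIndex(1125899906842624L * 4, maxBinSize)); // 4 PB
  }
}
```

   Because the expected values are fixed, a regression in the binning method 
makes a check fail rather than being mirrored by the test helper.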
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290868)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290867
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:22
Start Date: 07/Aug/19 23:22
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311799843
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -78,7 +78,8 @@ protected long getMaxFileSizeUpperBound() {
   protected int getMaxBinSize() {
 if (maxBinSize == -1) {
   // extra bin to add files > 1PB.
-  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+  // 1 KB (2 ^ 10) is the smallest tracked file.
+  maxBinSize = nextClosetPowerIndexOfTwo(maxFileSizeUpperBound) - 10 + 1;
 
 Review comment:
   nit: typo; `nextClosetPowerIndexOfTwo` should be `nextClosestPowerIndexOfTwo`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290867)
Time Spent: 9.5h  (was: 9h 20m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290866
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:22
Start Date: 07/Aug/19 23:22
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311801756
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -116,13 +126,13 @@ public void testFileCountBySizeReprocess() throws 
IOException {
 when(fileSizeCountTask.getMaxFileSizeUpperBound()).
 thenReturn(4096L);
 when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
-when(fileSizeCountTask.getMaxBinSize()).thenReturn(3);
+//when(fileSizeCountTask.getMaxBinSize()).thenReturn(3);
 
 Review comment:
   This line should be removed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290866)
Time Spent: 9h 20m  (was: 9h 10m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290869
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:22
Start Date: 07/Aug/19 23:22
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311800232
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -98,7 +99,9 @@ protected int getMaxBinSize() {
 keyIter = omKeyInfoTable.iterator()) {
   while (keyIter.hasNext()) {
 Table.KeyValue kv = keyIter.next();
-countFileSize(kv.getValue());
+
+// reprocess() is a PUT operation on the DB.
+updateUpperBoundCount(kv.getValue(), "PUT");
 
 Review comment:
   nit: update this with enum `Operation.PUT` in the future. 
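   A rough sketch of what the enum-based variant could look like. The names 
here (`OperationSketch`, its nested `Operation` enum, `updateCount`) are 
hypothetical and only illustrate the shape; the code base has its own 
operation types, and the clamp at zero is an assumption of this sketch.

```java
public class OperationSketch {

  /** Hypothetical operation enum; the real code base has its own types. */
  public enum Operation { PUT, DELETE }

  private final long[] upperBoundCount;

  public OperationSketch(int bins) {
    this.upperBoundCount = new long[bins];
  }

  /** Adjusts one bin's count; a misspelled operation fails at compile time. */
  public void updateCount(int binIndex, Operation op) {
    switch (op) {
    case PUT:
      upperBoundCount[binIndex]++;
      break;
    case DELETE:
      // Assumption for this sketch: never let a count drop below zero.
      upperBoundCount[binIndex] = Math.max(0L, upperBoundCount[binIndex] - 1);
      break;
    default:
      throw new IllegalArgumentException("Unsupported operation: " + op);
    }
  }

  public long getCount(int binIndex) {
    return upperBoundCount[binIndex];
  }

  public static void main(String[] args) {
    OperationSketch counts = new OperationSketch(3);
    counts.updateCount(1, Operation.PUT);
    counts.updateCount(1, Operation.PUT);
    counts.updateCount(1, Operation.DELETE);
    System.out.println(counts.getCount(1));
  }
}
```

   Compared with the string literal "PUT", the enum lets the compiler catch a 
mistyped operation instead of letting it silently fall through a string 
comparison.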
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290869)
Time Spent: 9h 40m  (was: 9.5h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290833
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:43
Start Date: 07/Aug/19 22:43
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311792973
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon-codegen/src/main/java/org/hadoop/ozone/recon/schema/UtilizationSchemaDefinition.java
 ##
 @@ -65,5 +69,12 @@ void createClusterGrowthTable(Connection conn) {
 .execute();
   }
 
-
+  void createFileSizeCount(Connection conn) {
+DSL.using(conn).createTableIfNotExists(FILE_COUNT_BY_SIZE_TABLE_NAME)
+.column("file_size_kb", SQLDataType.BIGINT)
 
 Review comment:
   Sure.
 



Issue Time Tracking
---

Worklog Id: (was: 290833)
Time Spent: 9h 10m  (was: 9h)







[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290830
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:42
Start Date: 07/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311792886
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2Kb..,4MB,.., 1TB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection<String> tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection<String> getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List<FileCountBySize> resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290832
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:42
Start Date: 07/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311792914
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290826
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:41
Start Date: 07/Aug/19 22:41
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311792637
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290825
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:41
Start Date: 07/Aug/19 22:41
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311792479
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+
+/**
+ * Test for File size count service.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService {
+  private UtilizationService utilizationService;
+  @Mock private FileCountBySizeDao fileCountBySizeDao;
+  private List<FileCountBySize> resultList = new ArrayList<>();
+  private int oneKb = 1024;
+  private int maxBinSize = 41;
+
+  public void setUpResultList() {
+for(int i = 0; i < 41; i++){
+  resultList.add(new FileCountBySize((long) Math.pow(2, (10+i)), (long) i));
+}
+  }
+
+  @Test
+  public void testGetFileCounts() throws IOException {
+setUpResultList();
+
+utilizationService = mock(UtilizationService.class);
+when(utilizationService.getFileCounts()).thenCallRealMethod();
+when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
+when(fileCountBySizeDao.findAll()).thenReturn(resultList);
+
+utilizationService.getFileCounts();
+verify(utilizationService, times(1)).getFileCounts();
+verify(fileCountBySizeDao, times(1)).findAll();
+
+assertEquals(41, resultList.size());
+long fileSize = 4096L;
+int index =  findIndex(fileSize);
+long count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L;
+index = findIndex(fileSize);
+if (index == Integer.MIN_VALUE) {
+  throw new IOException("File Size larger than permissible file size");
+}
+
+fileSize = 1025L;
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 25L;
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+  }
+
+  public int findIndex(long dataSize) {
+int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
+if (logValue < 10) {
+  return 0;
+} else {
+  int index = logValue - 10;
+  if (index > maxBinSize) {
+return Integer.MIN_VALUE;
 
 Review comment:
   Sure.
 



Issue Time Tracking
---

Worklog Id: (was: 290825)
Time Spent: 8.5h  (was: 8h 20m)


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290229
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311368897
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290225
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311368484
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290228
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311374276
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,129 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for File Size Count task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(OmKeyInfo.class)
+
+public class TestFileSizeCountTask {
+  @Test
+  public void testCalculateBinIndex() {
+FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+
+when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+thenReturn(1125899906842624L);// 1 PB
+when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
+when(fileSizeCountTask.getMaxBinSize()).thenReturn(42);
+when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+
+long fileSize = 1024L;// 1 KB
+int binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(1, binIndex);
+
+fileSize = 1023L;
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(0, binIndex);
+
+fileSize = 562949953421312L;  // 512 TB
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 562949953421313L;  // (512 TB + 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 562949953421311L;  // (512 TB - 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(39, binIndex);
+
+fileSize = 1125899906842624L;  // 1 PB - last (extra) bin
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(41, binIndex);
+
+fileSize = 10L;
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(7, binIndex);
+
+fileSize = 1125899906842623L;
 
 Review comment:
   I suppose this is 1 PB - 1B. Can you add a comment for this one and the 
previous one as well?
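
For reference, 1125899906842623 is 2^50 - 1, that is, 1 PB - 1 B. The boundary cases in this test can be reproduced with a small standalone sketch of the shift-based bin index used in this pull request. The class name `BinIndexSketch` is hypothetical, and the unsigned shift (`>>>`) is an assumption added here so that a negative input cannot loop forever; it is not taken from the patch.

```java
public class BinIndexSketch {

  // Bin index = (number of significant bits in dataSize) - 10, floored at 0.
  // Assumption: an unsigned shift is used so the loop always terminates.
  static int calculateBinIndex(long dataSize) {
    int bits = 0;
    while (dataSize != 0) {
      dataSize >>>= 1;
      bits++;
    }
    return bits < 10 ? 0 : bits - 10;
  }

  public static void main(String[] args) {
    long oneKb = 1L << 10;
    long size512Tb = 1L << 49;
    long onePb = 1L << 50;

    System.out.println(calculateBinIndex(oneKb - 1));      // 0
    System.out.println(calculateBinIndex(oneKb));          // 1
    System.out.println(calculateBinIndex(size512Tb - 1));  // 39
    System.out.println(calculateBinIndex(size512Tb));      // 40
    System.out.println(calculateBinIndex(size512Tb + 1));  // 40
    System.out.println(calculateBinIndex(onePb - 1));      // 40
    System.out.println(calculateBinIndex(onePb));          // 41
  }
}
```

Writing the sizes as shifts (`1L << 49`, `1L << 50`) rather than long decimal literals may also make the intent of each case easier to see than a trailing comment.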
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290228)
Time Spent: 8h 10m  (was: 8h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290224
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311366032
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon-codegen/src/main/java/org/hadoop/ozone/recon/schema/UtilizationSchemaDefinition.java
 ##
 @@ -65,5 +69,12 @@ void createClusterGrowthTable(Connection conn) {
 .execute();
   }
 
-
+  void createFileSizeCount(Connection conn) {
+DSL.using(conn).createTableIfNotExists(FILE_COUNT_BY_SIZE_TABLE_NAME)
+.column("file_size_kb", SQLDataType.BIGINT)
 
 Review comment:
   Aren't we storing file size in bytes? Can we change this to just file_size?
 



Issue Time Tracking
---

Worklog Id: (was: 290224)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290226
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311369007
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2KB, .., 4MB, .., 1TB, .., 1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290223
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311369498
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290221
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311371872
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+
+/**
+ * Test for File size count service.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService {
+  private UtilizationService utilizationService;
+  @Mock private FileCountBySizeDao fileCountBySizeDao;
+  private List resultList = new ArrayList<>();
+  private int oneKb = 1024;
+  private int maxBinSize = 41;
+
+  public void setUpResultList() {
+for(int i = 0; i < 41; i++){
+  resultList.add(new FileCountBySize((long) Math.pow(2, (10+i)), (long) i));
+}
+  }
+
+  @Test
+  public void testGetFileCounts() throws IOException {
+setUpResultList();
+
+utilizationService = mock(UtilizationService.class);
+when(utilizationService.getFileCounts()).thenCallRealMethod();
+when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
+when(fileCountBySizeDao.findAll()).thenReturn(resultList);
+
+utilizationService.getFileCounts();
+verify(utilizationService, times(1)).getFileCounts();
+verify(fileCountBySizeDao, times(1)).findAll();
+
+assertEquals(41, resultList.size());
+long fileSize = 4096L;
+int index =  findIndex(fileSize);
+long count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L;
+index = findIndex(fileSize);
+if (index == Integer.MIN_VALUE) {
 
 Review comment:
   This is not required
 



Issue Time Tracking
---

Worklog Id: (was: 290221)
Time Spent: 7.5h  (was: 7h 20m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290222
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311372538
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+
+/**
+ * Test for File size count service.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService {
+  private UtilizationService utilizationService;
+  @Mock private FileCountBySizeDao fileCountBySizeDao;
+  private List resultList = new ArrayList<>();
+  private int oneKb = 1024;
+  private int maxBinSize = 41;
+
+  public void setUpResultList() {
+for(int i = 0; i < 41; i++){
+  resultList.add(new FileCountBySize((long) Math.pow(2, (10+i)), (long) i));
+}
+  }
+
+  @Test
+  public void testGetFileCounts() throws IOException {
+setUpResultList();
+
+utilizationService = mock(UtilizationService.class);
+when(utilizationService.getFileCounts()).thenCallRealMethod();
+when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
+when(fileCountBySizeDao.findAll()).thenReturn(resultList);
+
+utilizationService.getFileCounts();
+verify(utilizationService, times(1)).getFileCounts();
+verify(fileCountBySizeDao, times(1)).findAll();
+
+assertEquals(41, resultList.size());
+long fileSize = 4096L;
+int index =  findIndex(fileSize);
+long count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L;
+index = findIndex(fileSize);
+if (index == Integer.MIN_VALUE) {
+  throw new IOException("File Size larger than permissible file size");
+}
+
+fileSize = 1025L;
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 25L;
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+  }
+
+  public int findIndex(long dataSize) {
+int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
+if (logValue < 10) {
+  return 0;
+} else {
+  int index = logValue - 10;
+  if (index > maxBinSize) {
+return Integer.MIN_VALUE;
 
 Review comment:
   This needs to be updated.
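
One possible shape for the updated helper, sketched under assumptions: instead of returning a sentinel value, sizes past the last bound are clamped into the last bin, and the bit length comes from `Long.numberOfLeadingZeros` rather than floating-point logarithms, which can be off by a rounding error near exact powers of two. `FindIndexSketch` and the `binCount` parameter are hypothetical names, not part of the patch.

```java
public class FindIndexSketch {

  // Hypothetical replacement for the test's findIndex helper.
  // binCount is the number of rows expected in the result list.
  static int findIndex(long dataSize, int binCount) {
    if (dataSize < 0 || binCount <= 0) {
      throw new IllegalArgumentException(
          "dataSize must be >= 0 and binCount must be > 0");
    }
    // Number of significant bits; 0 when dataSize is 0.
    int bits = Long.SIZE - Long.numberOfLeadingZeros(dataSize);
    int index = bits < 10 ? 0 : bits - 10;
    // Clamp oversized files into the last bin instead of signalling an error.
    return Math.min(index, binCount - 1);
  }

  public static void main(String[] args) {
    int binCount = 41;
    System.out.println(findIndex(25L, binCount));       // 0
    System.out.println(findIndex(1025L, binCount));     // 1
    System.out.println(findIndex(4096L, binCount));     // 3
    System.out.println(findIndex(1L << 50, binCount));  // 40 (clamped)
  }
}
```

With a helper like this, the `Integer.MIN_VALUE` check and the `IOException` in the test body would no longer be needed, since every non-negative size maps to a valid list index.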
 



Issue Time Tracking
---

Worklog Id: (was: 290222)
Time Spent: 7h 40m  (was: 7.5h)

> Add 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290227
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 05:33
Start Date: 07/Aug/19 05:33
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311371119
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290156
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 01:55
Start Date: 07/Aug/19 01:55
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311341219
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 290156)
Time Spent: 7h 20m  (was: 7h 10m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290153
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 07/Aug/19 01:53
Start Date: 07/Aug/19 01:53
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311308069
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 fileCountBySizeDao.insert(newRecord);
-  } else{
+  } else {
 fileCountBySizeDao.update(newRecord);
   }
 }
   }
 
   private void updateUpperBoundCount(OmKeyInfo value, String operation)
   throws IOException {
-int binIndex = calcBinIndex(value.getDataSize());
-if(binIndex == Integer.MIN_VALUE) {
+int binIndex = calculateBinIndex(value.getDataSize());
+if (binIndex == Integer.MIN_VALUE) {
 
 Review comment:
   Yes, it was left over from a previous check that threw an exception for fileSize > the permitted value of 1 PB.
 



Issue Time Tracking
---

Worklog Id: (was: 290153)
Time Spent: 7h 10m  (was: 7h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> Ozone 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290046
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:57
Start Date: 06/Aug/19 22:57
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311309493
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 
 Review comment:
   Yes, it should be `Long.MAX_VALUE`.
 



Issue Time Tracking
---

Worklog Id: (was: 290046)
Time Spent: 7h  (was: 6h 50m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 7h
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 





[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290044
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:55
Start Date: 06/Aug/19 22:55
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311308884
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 
 Review comment:
   Sure, it is an extra bin to add files > maxFileSizeUpperBound. 
   Also, did you mean `Long.MAX_VALUE`?
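
The extra-bin behaviour described here can be sketched in isolation. In this sketch the class name `OverflowBinSketch`, the 1 PB value for the upper bound, and the way the bin count is derived are assumptions for illustration; only the branch in `countFileSize` and the loop in `calculateBinIndex` follow the diff above.

```java
// Hypothetical standalone sketch; not the code from the pull request.
final class OverflowBinSketch {

  // Assumption: a 1 PB upper bound (2^50 bytes).
  static final long MAX_FILE_SIZE_UPPER_BOUND = 1L << 50;

  // Assumption: one bin per power of two below the bound,
  // plus one extra bin for everything at or above it.
  static final int MAX_BIN_SIZE =
      calculateBinIndex(MAX_FILE_SIZE_UPPER_BOUND) + 1;

  // Number of right shifts until dataSize is zero, offset by 10.
  // Assumes dataSize >= 0.
  static int calculateBinIndex(long dataSize) {
    int index = 0;
    while (dataSize != 0) {
      dataSize >>= 1;
      index += 1;
    }
    return index < 10 ? 0 : index - 10;
  }

  // Files at or above the bound are counted in the last (extra) bin
  // instead of raising an exception.
  static void countFileSize(long dataSize, long[] upperBoundCount) {
    int index;
    if (dataSize >= MAX_FILE_SIZE_UPPER_BOUND) {
      index = MAX_BIN_SIZE - 1;
    } else {
      index = calculateBinIndex(dataSize);
    }
    upperBoundCount[index]++;
  }

  public static void main(String[] args) {
    long[] counts = new long[MAX_BIN_SIZE];
    countFileSize(500L, counts);            // below 1 KB
    countFileSize(1L << 50, counts);        // exactly at the bound
    countFileSize(Long.MAX_VALUE, counts);  // far above the bound
    System.out.println("bins = " + MAX_BIN_SIZE
        + ", first = " + counts[0]
        + ", last = " + counts[MAX_BIN_SIZE - 1]);
  }
}
```

With this layout no key size needs to be rejected, which is why `countFileSize` no longer has to declare `IOException`.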
 



Issue Time Tracking
---

Worklog Id: (was: 290044)
Time Spent: 6h 50m  (was: 6h 40m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290043
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:52
Start Date: 06/Aug/19 22:52
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311308096
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
 
 Review comment:
   Sure.
   
 



Issue Time Tracking
---

Worklog Id: (was: 290043)
Time Spent: 6h 40m  (was: 6.5h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290042
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:52
Start Date: 06/Aug/19 22:52
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311308069
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 fileCountBySizeDao.insert(newRecord);
-  } else{
+  } else {
 fileCountBySizeDao.update(newRecord);
   }
 }
   }
 
   private void updateUpperBoundCount(OmKeyInfo value, String operation)
   throws IOException {
-int binIndex = calcBinIndex(value.getDataSize());
-if(binIndex == Integer.MIN_VALUE) {
+int binIndex = calculateBinIndex(value.getDataSize());
+if (binIndex == Integer.MIN_VALUE) {
 
 Review comment:
   Yes, it was left over from a previous check that threw an exception for fileSize > the permitted value of 1 PB.
 



Issue Time Tracking
---

Worklog Id: (was: 290042)
Time Spent: 6.5h  (was: 6h 20m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Ozone 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290038
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:39
Start Date: 06/Aug/19 22:39
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311293167
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -65,6 +64,23 @@ public FileSizeCountTask(OMMetadataManager omMetadataManager,
 } catch (Exception e) {
   LOG.error("Unable to fetch Key Table updates ", e);
 }
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return ONE_KB;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
 
 Review comment:
   Can we change this method to take `fileSize` as an argument? 
 



Issue Time Tracking
---

Worklog Id: (was: 290038)
Time Spent: 6h 20m  (was: 6h 10m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 20m
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290034
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:36
Start Date: 06/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311293167
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -65,6 +64,23 @@ public FileSizeCountTask(OMMetadataManager omMetadataManager,
 } catch (Exception e) {
   LOG.error("Unable to fetch Key Table updates ", e);
 }
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return ONE_KB;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
 
 Review comment:
   Can we make this method take `fileSize` as an argument?
 



Issue Time Tracking
---

Worklog Id: (was: 290034)
Time Spent: 5h 40m  (was: 5.5h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290037
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:36
Start Date: 06/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311301668
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 fileCountBySizeDao.insert(newRecord);
-  } else{
+  } else {
 fileCountBySizeDao.update(newRecord);
   }
 }
   }
 
   private void updateUpperBoundCount(OmKeyInfo value, String operation)
   throws IOException {
-int binIndex = calcBinIndex(value.getDataSize());
-if(binIndex == Integer.MIN_VALUE) {
+int binIndex = calculateBinIndex(value.getDataSize());
+if (binIndex == Integer.MIN_VALUE) {
 
 Review comment:
   This is not required, as `calculateBinIndex` will never return `Integer.MIN_VALUE`.
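   The claim can be checked directly: the loop performs at most 64 shifts, so the result always lies between 0 and 54 and cannot equal `Integer.MIN_VALUE`. The following is a minimal standalone sketch, not the code in the PR; the class name `BinIndexRangeSketch` is hypothetical. It also uses the unsigned shift `>>>=` instead of the `>>=` shown in the diff, which is an assumption made here so that the loop terminates for a negative `dataSize` (a signed shift would settle at -1). Whether negative sizes can occur is not stated in this thread.

```java
/** Hypothetical standalone sketch; not part of FileSizeCountTask. */
final class BinIndexRangeSketch {
  private BinIndexRangeSketch() { }

  /** Counts the right shifts needed to reach zero, then offsets by 10. */
  static int calculateBinIndex(long dataSize) {
    int index = 0;
    while (dataSize != 0) {
      dataSize >>>= 1;  // unsigned shift: an assumption, the diff uses >>=
      index += 1;
    }
    // index is in [0, 64], so the result is in [0, 54].
    return index < 10 ? 0 : index - 10;
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1L, 1023L, 1024L, 1L << 50, Long.MAX_VALUE};
    for (long size : samples) {
      System.out.println(size + " -> bin " + calculateBinIndex(size));
    }
  }
}
```

   Since no input produces a negative index, the `binIndex == Integer.MIN_VALUE` branch in `updateUpperBoundCount` is unreachable.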
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290037)
Time Spent: 6h 10m  (was: 6h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Ozone users may want to track 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290035
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:36
Start Date: 06/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311304029
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 
 Review comment:
   From the logic to calculate bins, the last bin holds the total count of all
the files > `maxFileSizeUpperBound`. But while writing to the DB, the last bin's
key is written as `2 * maxFileSizeUpperBound`. In this case, the last bin's upper
bound will be written as 2PB, which is wrong. Can we change this logic to use
`Integer.MAX_VALUE` as the key for the last bin? And can you verify this in the
unit test for the API response as well?
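   To make the concern concrete, here is a minimal sketch of how the per-bin keys could be derived with a sentinel for the overflow bin. It is an illustration under stated assumptions, not the code in the PR: the class `BinUpperBoundSketch` and method `upperBoundForBin` are hypothetical names, the bin count of 42 follows from `setMaxBinSize` with a 1 PB limit (50 - 10 + 2), and the sentinel is `Long.MAX_VALUE` rather than the `Integer.MAX_VALUE` suggested above, chosen here only because the key is a `long`.

```java
/** Hypothetical standalone sketch; not part of FileSizeCountTask. */
final class BinUpperBoundSketch {
  // 1 PB = 2^50 bytes, the limit used in the diff.
  static final long MAX_FILE_SIZE_UPPER_BOUND = 1L << 50;
  // log2(1 PB) - 10 + 2 = 42 bins, mirroring setMaxBinSize in the diff.
  static final int MAX_BIN_SIZE =
      Long.numberOfTrailingZeros(MAX_FILE_SIZE_UPPER_BOUND) - 10 + 2;

  private BinUpperBoundSketch() { }

  /** Key under which the count for bin i would be stored. */
  static long upperBoundForBin(int i) {
    if (i == MAX_BIN_SIZE - 1) {
      // Overflow bin: no finite upper bound, so use a sentinel (assumption).
      return Long.MAX_VALUE;
    }
    return 1L << (10 + i);  // 1 KB for bin 0, doubling per bin
  }

  public static void main(String[] args) {
    int last = MAX_BIN_SIZE - 1;
    System.out.println("bin 0 key: " + upperBoundForBin(0));
    System.out.println("bin " + (last - 1) + " key: " + upperBoundForBin(last - 1));
    System.out.println("bin " + last + " key: " + upperBoundForBin(last));
    // Without the special case the last key would be 2^51, i.e. 2 PB.
    System.out.println("unguarded last key: " + (1L << (10 + last)));
  }
}
```

   With this layout, bin 40 keeps 1 PB as its key and only the overflow bin gets the sentinel, so an API consumer can tell "larger than 1 PB" apart from a real 2 PB bound.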
 



Issue Time Tracking
---

Worklog Id: (was: 290035)
Time Spent: 5h 50m  (was: 5h 40m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290036
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 22:36
Start Date: 06/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311301854
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
 
 Review comment:
   Can you add a comment explaining why we need to subtract 10 from this value?
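   One way to read the `index - 10`: the first bin's upper bound is 1 KB = 2^10 bytes, so every size that fits in at most 10 bits lands in bin 0, and each additional bit moves the size one bin up. The sketch below illustrates that reading; it is not the code in the PR. The class name `BinIndexSketch` and the constant `FIRST_BIN_BITS` are hypothetical, and `Long.numberOfLeadingZeros` is used in place of the shift loop (for non-negative sizes the two give the same bit count).

```java
/** Hypothetical standalone sketch; not part of FileSizeCountTask. */
final class BinIndexSketch {
  // 1 KB = 2^10 bytes is assumed to be the upper bound of the first bin.
  static final int FIRST_BIN_BITS = 10;

  private BinIndexSketch() { }

  /** Number of bits needed to represent a non-negative size (0 for 0). */
  static int bitLength(long dataSize) {
    return Long.SIZE - Long.numberOfLeadingZeros(dataSize);
  }

  /** Sizes below 1 KB map to bin 0; each doubling moves one bin up. */
  static int calculateBinIndex(long dataSize) {
    int bits = bitLength(dataSize);
    return bits < FIRST_BIN_BITS ? 0 : bits - FIRST_BIN_BITS;
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1023L, 1024L, 2047L, 2048L, 1L << 20};
    for (long size : samples) {
      System.out.println(size + " bytes -> bin " + calculateBinIndex(size));
    }
  }
}
```

   Naming the constant makes the offset self-documenting: 1023 bytes needs 10 bits and stays in bin 0, while 1024 bytes needs 11 bits and moves to bin 1.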
 



Issue Time Tracking
---

Worklog Id: (was: 290036)
Time Spent: 6h  (was: 5h 50m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=290013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290013
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 06/Aug/19 21:39
Start Date: 06/Aug/19 21:39
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311286958
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,254 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  protected long getOneKB() {
+return ONE_KB;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  void setMaxBinSize() {
+maxBinSize = (int)(long) (Math.log(getMaxFileSizeUpperBound())
+/Math.log(2)) - 10;
+maxBinSize += 2;  // extra bin to add files > 1PB.
+  }
+
+  void fetchUpperBoundCount(String type) {
+setMaxBinSize();
+if (type.equals("process")) {
+  //update array with file size count from DB
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=289221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-289221
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 05/Aug/19 21:48
Start Date: 05/Aug/19 21:48
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r310806041
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,254 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=289219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-289219
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 05/Aug/19 21:45
Start Date: 05/Aug/19 21:45
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r310805057
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,254 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=289197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-289197
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 05/Aug/19 20:59
Start Date: 05/Aug/19 20:59
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r310789626
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,254 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-08-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=287437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-287437
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 02/Aug/19 01:42
Start Date: 02/Aug/19 01:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-517515593
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 89 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 13 | Maven dependency ordering for branch |
   | +1 | mvninstall | 627 | trunk passed |
   | +1 | compile | 372 | trunk passed |
   | +1 | checkstyle | 72 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 955 | branch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 189 | trunk passed |
   | 0 | spotbugs | 497 | Used deprecated FindBugs config; considering switching to SpotBugs. |
   | +1 | findbugs | 722 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 35 | Maven dependency ordering for patch |
   | +1 | mvninstall | 592 | the patch passed |
   | +1 | compile | 396 | the patch passed |
   | +1 | javac | 396 | the patch passed |
   | -0 | checkstyle | 38 | hadoop-ozone: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 750 | patch has no errors when building and testing our client artifacts. |
   | -1 | javadoc | 94 | hadoop-ozone generated 7 new + 13 unchanged - 0 fixed = 20 total (was 13) |
   | +1 | findbugs | 764 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 349 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2415 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 8789 | |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.om.TestSecureOzoneManager |
   |   | hadoop.ozone.client.rpc.TestContainerStateMachineFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestContainerStateMachine |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1146 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux c60914aaf75c 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d086d05 |
   | Default Java | 1.8.0_212 |
   | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/2/artifact/out/diff-checkstyle-hadoop-ozone.txt |
   | javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/2/artifact/out/diff-javadoc-javadoc-hadoop-ozone.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/2/artifact/out/patch-unit-hadoop-ozone.txt |
   | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/2/testReport/ |
   | Max. process+thread count | 5321 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/ozone-recon-codegen hadoop-ozone/ozone-recon U: hadoop-ozone |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/2/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=286289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286289
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 01/Aug/19 00:00
Start Date: 01/Aug/19 00:00
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309410908
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
 
 Review comment:
   Can we change these ranges to (2KB, 4KB, 8KB, 16KB, ...1PB) ?
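
   For illustration only, a minimal sketch of what power-of-two bins with
   upper bounds 2KB, 4KB, ..., 1PB could look like. The class and method
   names (FileSizeBins, binIndex, upperBound) are hypothetical and are not
   taken from the patch; the bin count of 41 and the 1 PB limit mirror
   maxBinSize and maxFileSizeUpperBound in the diff above, and treating the
   last bin as an overflow bin for sizes of 1 PB and above is an assumption.

   ```java
   /** Hypothetical helper; not part of the patch under review. */
   public final class FileSizeBins {
     // Mirrors maxFileSizeUpperBound in the diff: 1 PB = 2^50 bytes.
     static final long MAX_UPPER_BOUND = 1L << 50;
     // Mirrors maxBinSize in the diff: 40 power-of-two bins plus one
     // overflow bin (the overflow bin is an assumption of this sketch).
     static final int NUM_BINS = 41;
     // Everything below 2 KB lands in bin 0, so bin 0 has upper bound 2^11.
     static final int FIRST_UPPER_BOUND_EXP = 11;

     private FileSizeBins() { }

     /** Index of the bin whose range contains the given size in bytes. */
     static int binIndex(long size) {
       if (size < 0) {
         throw new IllegalArgumentException("negative size: " + size);
       }
       if (size >= MAX_UPPER_BOUND) {
         return NUM_BINS - 1;                 // overflow bin
       }
       if (size < (1L << FIRST_UPPER_BOUND_EXP)) {
         return 0;                            // everything under 2 KB
       }
       // floor(log2(size)); size is at least 2^11 here.
       int log2 = 63 - Long.numberOfLeadingZeros(size);
       return log2 - (FIRST_UPPER_BOUND_EXP - 1);
     }

     /** Exclusive upper bound in bytes of the bin at the given index. */
     static long upperBound(int index) {
       if (index < 0 || index >= NUM_BINS) {
         throw new IllegalArgumentException("bad bin index: " + index);
       }
       if (index == NUM_BINS - 1) {
         return Long.MAX_VALUE;               // overflow bin has no bound
       }
       return 1L << (FIRST_UPPER_BOUND_EXP + index);
     }
   }
   ```

   With this layout a counter array of length 41 can be indexed directly,
   for example counts[FileSizeBins.binIndex(dataSize)]++, and the bound
   for a stored row is recovered with FileSizeBins.upperBound(index).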
 



Issue Time Tracking
---

Worklog Id: (was: 286289)
Time Spent: 4h 40m  (was: 4.5h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=286178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286178
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 31/Jul/19 20:08
Start Date: 31/Jul/19 20:08
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309408238
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = 41;
+  private long maxFileSizeUpperBound = 1125899906842624L;
+  private long SIZE_512_TB = 562949953421312L;
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void fetchUpperBoundCount(String type) {
+if(type.equals("process")) {
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;
+  if(resultSet != null) {
+for (FileCountBySize row : resultSet) {
+  upperBoundCount[index] = row.getCount();
+  index++;
+}
+  }
+} else {
+  upperBoundCount = new long[maxBinSize];
+}
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=286094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286094
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 31/Jul/19 17:47
Start Date: 31/Jul/19 17:47
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309349175
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = 41;
+  private long maxFileSizeUpperBound = 1125899906842624L;
+  private long SIZE_512_TB = 562949953421312L;
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void fetchUpperBoundCount(String type) {
+if(type.equals("process")) {
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;
+  if(resultSet != null) {
+for (FileCountBySize row : resultSet) {
+  upperBoundCount[index] = row.getCount();
+  index++;
+}
+  }
+} else {
+  upperBoundCount = new long[maxBinSize];
+}
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=286092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286092
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 31/Jul/19 17:45
Start Date: 31/Jul/19 17:45
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309347235
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = 41;
+  private long maxFileSizeUpperBound = 1125899906842624L;
+  private long SIZE_512_TB = 562949953421312L;
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void fetchUpperBoundCount(String type) {
+if(type.equals("process")) {
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;
+  if(resultSet != null) {
+for (FileCountBySize row : resultSet) {
+  upperBoundCount[index] = row.getCount();
+  index++;
+}
+  }
+} else {
+  upperBoundCount = new long[maxBinSize];
+}
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=286091=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286091
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 31/Jul/19 17:44
Start Date: 31/Jul/19 17:44
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309349175
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = 41;
+  private long maxFileSizeUpperBound = 1125899906842624L;
+  private long SIZE_512_TB = 562949953421312L;
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void fetchUpperBoundCount(String type) {
+if(type.equals("process")) {
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;
+  if(resultSet != null) {
+for (FileCountBySize row : resultSet) {
+  upperBoundCount[index] = row.getCount();
+  index++;
+}
+  }
+} else {
+  upperBoundCount = new long[maxBinSize];
+}
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=286090=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286090
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 31/Jul/19 17:40
Start Date: 31/Jul/19 17:40
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309347235
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = 41;
+  private long maxFileSizeUpperBound = 1125899906842624L;
+  private long SIZE_512_TB = 562949953421312L;
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void fetchUpperBoundCount(String type) {
+if(type.equals("process")) {
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;
+  if(resultSet != null) {
+for (FileCountBySize row : resultSet) {
+  upperBoundCount[index] = row.getCount();
+  index++;
+}
+  }
+} else {
+  upperBoundCount = new long[maxBinSize];
+}
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=284474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-284474
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 29/Jul/19 18:53
Start Date: 29/Jul/19 18:53
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1146: HDDS-1366. 
Add ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r308384358
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection<String> tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", ioEx);
+return new ImmutablePair<>(getTaskName(), false);
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection<String> getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair<String, Boolean> process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator<OMDBUpdateEvent> eventIterator = events.getIterator();
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent<String, OmKeyInfo> omdbUpdateEvent = eventIterator.next();
+  String updatedKey = omdbUpdateEvent.getKey();
+  OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+  try{
+switch (omdbUpdateEvent.getAction()) {
+  case PUT:
+updateCountForKey(updatedValue, 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=284475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-284475
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 29/Jul/19 18:53
Start Date: 29/Jul/19 18:53
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1146: HDDS-1366. 
Add ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r308378424
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/api/UtilizationService.java
 ##
 @@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import javax.inject.Inject;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.ws.rs.GET;
+import javax.ws.rs.Path;
+import javax.ws.rs.Produces;
+import javax.ws.rs.core.MediaType;
+import javax.ws.rs.core.Response;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Endpoint for querying the counts of a certain file Size.
+ */
+@Path("/utilization")
+@Produces(MediaType.APPLICATION_JSON)
+public class UtilizationService {
+ private static final Logger LOG =
+ LoggerFactory.getLogger(UtilizationService.class);
+
+ @Inject
+  private Configuration sqlConfiguration ;
+
+  /**
+   * Return the file counts from Recon DB.
+   * @return {@link Response}
+   */
+  @GET
+  @Path("/fileCount")
+  public Response getFileCounts() {
+FileCountBySizeDao fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+List<FileCountBySize> resultSet = fileCountBySizeDao.findAll();
+
+Map fileSizeCountResponseMap = new LinkedHashMap<>();
 
 Review comment:
   Nit. We are relying on table row order here. If we want to make sure we 
always return the data in sorted size order, a TreeMap may be better.
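
A minimal sketch of the ordering difference this comment points at. It assumes each row reduces to an (upper bound, count) pair; the `FileCountOrdering` class and the `long[][]` row shape are hypothetical stand-ins for illustration and are not the patch's `FileCountBySize` or DAO API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only; not part of the patch under review. */
final class FileCountOrdering {

  /** Keeps whatever order the rows arrive in, as a LinkedHashMap does. */
  static Map<Long, Long> inRowOrder(long[][] rows) {
    Map<Long, Long> result = new LinkedHashMap<>();
    for (long[] row : rows) {
      result.put(row[0], row[1]); // row[0] = upper bound, row[1] = count
    }
    return result;
  }

  /** Sorts entries by upper bound regardless of row order, as a TreeMap does. */
  static Map<Long, Long> inSizeOrder(long[][] rows) {
    Map<Long, Long> result = new TreeMap<>();
    for (long[] row : rows) {
      result.put(row[0], row[1]);
    }
    return result;
  }

  public static void main(String[] args) {
    // Rows deliberately out of size order, as a table scan might return them.
    long[][] rows = {{4096L, 7L}, {1024L, 3L}, {2048L, 5L}};
    System.out.println(inRowOrder(rows).keySet());  // [4096, 1024, 2048]
    System.out.println(inSizeOrder(rows).keySet()); // [1024, 2048, 4096]
  }
}
```

With a `TreeMap` the response order no longer depends on how the rows were inserted or read back, at the cost of O(log n) puts, which is negligible for a handful of size bins.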
 



Issue Time Tracking
---

Worklog Id: (was: 284475)
Time Spent: 3h 40m  (was: 3.5h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 
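
The bucketing described above can be sketched as follows. This is one possible scheme, assuming power-of-two upper bounds that start at 1 KB and a final catch-all bin; the constants mirror `minFileSize = 1024L` and `new long[16]` from the diff, but the class `FileSizeBinning`, its method names, and the power-of-two choice are assumptions, since the patch's actual binning logic is not shown in these messages.

```java
/** Illustrative only; the real task's binning logic may differ. */
final class FileSizeBinning {
  // Smallest upper bound (1 KB), matching minFileSize in the diff.
  static final long MIN_UPPER_BOUND = 1024L;
  // Number of bins, matching the length of upperBoundCount in the diff.
  static final int NUM_BINS = 16;

  /**
   * Returns the bin index for a file of the given size in bytes.
   * Bin i holds sizes below MIN_UPPER_BOUND << i; the last bin
   * also absorbs everything larger.
   */
  static int binIndex(long dataSize) {
    long upperBound = MIN_UPPER_BOUND;
    int index = 0;
    while (index < NUM_BINS - 1 && dataSize >= upperBound) {
      upperBound <<= 1; // double the bound for the next bin
      index++;
    }
    return index;
  }

  /** Counts how many of the given sizes fall into each bin. */
  static long[] countBySize(long[] dataSizes) {
    long[] counts = new long[NUM_BINS];
    for (long size : dataSizes) {
      counts[binIndex(size)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] counts = countBySize(new long[] {500L, 1500L, 1600L, 5000L});
    // 500 falls below 1 KB, 1500 and 1600 below 2 KB, 5000 below 8 KB.
    System.out.println(counts[0] + " " + counts[1] + " " + counts[3]); // 1 2 1
  }
}
```

A fixed array indexed by bin keeps a full scan of the key table at constant extra memory, which is presumably why the task holds its counts in a `long[]`.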






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=283325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283325
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 26/Jul/19 12:54
Start Date: 26/Jul/19 12:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-515439202
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 48 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 17 | Maven dependency ordering for branch |
   | +1 | mvninstall | 600 | trunk passed |
   | +1 | compile | 381 | trunk passed |
   | +1 | checkstyle | 79 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 884 | branch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 169 | trunk passed |
   | 0 | spotbugs | 419 | Used deprecated FindBugs config; considering switching to SpotBugs. |
   | +1 | findbugs | 613 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for patch |
   | +1 | mvninstall | 562 | the patch passed |
   | +1 | compile | 373 | the patch passed |
   | +1 | javac | 373 | the patch passed |
   | -0 | checkstyle | 46 | hadoop-ozone: The patch generated 42 new + 0 unchanged - 0 fixed = 42 total (was 0) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 680 | patch has no errors when building and testing our client artifacts. |
   | -1 | javadoc | 92 | hadoop-ozone generated 5 new + 13 unchanged - 0 fixed = 18 total (was 13) |
   | +1 | findbugs | 636 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 281 | hadoop-hdds in the patch failed. |
   | -1 | unit | 1619 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 54 | The patch does not generate ASF License warnings. |
   | | | 7446 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdds.scm.pipeline.TestSCMPipelineManager |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.om.TestOzoneManagerHA |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestCommitWatcher |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1146 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 6a0cff69b35c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / aebac6d |
   | Default Java | 1.8.0_212 |
   | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/artifact/out/diff-checkstyle-hadoop-ozone.txt |
   | javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/artifact/out/diff-javadoc-javadoc-hadoop-ozone.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/artifact/out/patch-unit-hadoop-hdds.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/artifact/out/patch-unit-hadoop-ozone.txt |
   |  Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/testReport/ |
   | Max. process+thread count | 4384 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/ozone-recon-codegen hadoop-ozone/ozone-recon U: hadoop-ozone |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1146/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=283151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283151
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 26/Jul/19 06:27
Start Date: 26/Jul/19 06:27
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r307603546
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,311 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.utils.db.Table;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.jooq.impl.DSL;
+import org.jooq.impl.DefaultConfiguration;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.sql.DataSource;
+import java.io.IOException;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
 
 Review comment:
   Nothing yet
 



Issue Time Tracking
---

Worklog Id: (was: 283151)
Time Spent: 3h 10m  (was: 3h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281441
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 24/Jul/19 02:37
Start Date: 24/Jul/19 02:37
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306606697
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection<String> tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", ioEx);
+return new ImmutablePair<>(getTaskName(), false);
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection<String> getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair<String, Boolean> process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator<OMDBUpdateEvent> eventIterator = events.getIterator();
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent<String, OmKeyInfo> omdbUpdateEvent = eventIterator.next();
+  String updatedKey = omdbUpdateEvent.getKey();
+  OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+  try{
+switch (omdbUpdateEvent.getAction()) {
+  case PUT:
+updateCountForKey(updatedValue, "PUT");
+  

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281440
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 24/Jul/19 02:35
Start Date: 24/Jul/19 02:35
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306606413
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.ws.rs.core.Response;
+import java.util.*;
+
+import static org.junit.Assert.assertEquals;
+
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService extends AbstractOMMetadataManagerTest {
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;  //should we use interface?
+  private OMMetadataManager omMetadataManager;
+  private GuiceInjectorUtilsForTestsImpl guiceInjectorTest =
+  new GuiceInjectorUtilsForTestsImpl();
+  private boolean isSetupDone = false;
+  private UtilizationService utilizationService;
+
+  private Injector getInjector() {
+return injector;
+  }
+  private Configuration sqlConfiguration;
+
+  private void initializeInjector() throws Exception {
+omMetadataManager = initializeNewOmMetadataManager();
+OzoneConfiguration configuration =
+guiceInjectorTest.getTestOzoneConfiguration(temporaryFolder);
+
+ozoneManagerServiceProvider = new OzoneManagerServiceProviderImpl(
+configuration);
+ReconOMMetadataManager reconOMMetadataManager =
+getTestMetadataManager(omMetadataManager);
+
+Injector parentInjector = guiceInjectorTest.getInjector(
+ozoneManagerServiceProvider, reconOMMetadataManager, temporaryFolder);
+
+injector = parentInjector.createChildInjector(new AbstractModule() {
+  @Override
+  protected void configure() {
+utilizationService = new UtilizationService();
+bind(UtilizationService.class).toInstance(utilizationService);
+  }
+});
+  }
+
+  @Before
+  public void setUp() throws Exception {
+
+if (!isSetupDone) {
 
 Review comment:
   Well, bad design is bad design :-) static methods are generally bad for 
testing and need reliance on PowerMock and Mockito like frameworks which allow 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281374
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:49
Start Date: 23/Jul/19 22:49
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306565326
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/api/ContainerKeyService.java
 ##
 @@ -38,6 +37,7 @@
 import javax.ws.rs.core.MediaType;
 import javax.ws.rs.core.Response;
 
+import javax.inject.Inject;
 
 Review comment:
   Fair enough!
 



Issue Time Tracking
---

Worklog Id: (was: 281374)
Time Spent: 2h 40m  (was: 2.5h)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281373
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:48
Start Date: 23/Jul/19 22:48
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306565085
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.ws.rs.core.Response;
+import java.util.*;
+
+import static org.junit.Assert.assertEquals;
+
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService extends AbstractOMMetadataManagerTest {
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;  //should we use interface?
+  private OMMetadataManager omMetadataManager;
+  private GuiceInjectorUtilsForTestsImpl guiceInjectorTest =
+  new GuiceInjectorUtilsForTestsImpl();
+  private boolean isSetupDone = false;
+  private UtilizationService utilizationService;
+
+  private Injector getInjector() {
+return injector;
+  }
+  private Configuration sqlConfiguration;
+
+  private void initializeInjector() throws Exception {
+omMetadataManager = initializeNewOmMetadataManager();
+OzoneConfiguration configuration =
+guiceInjectorTest.getTestOzoneConfiguration(temporaryFolder);
+
+ozoneManagerServiceProvider = new OzoneManagerServiceProviderImpl(
+configuration);
+ReconOMMetadataManager reconOMMetadataManager =
+getTestMetadataManager(omMetadataManager);
+
+Injector parentInjector = guiceInjectorTest.getInjector(
+ozoneManagerServiceProvider, reconOMMetadataManager, temporaryFolder);
+
+injector = parentInjector.createChildInjector(new AbstractModule() {
+  @Override
+  protected void configure() {
+utilizationService = new UtilizationService();
+bind(UtilizationService.class).toInstance(utilizationService);
+  }
+});
+  }
+
+  @Before
+  public void setUp() throws Exception {
+
+if (!isSetupDone) {
 
 Review comment:
   This is because PowerMockRunner does not apply JUnit class rules. 
https://github.com/apache/hadoop/pull/1055#discussion_r301358174 
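
The run-once guard discussed in this comment can be sketched in plain Java as follows. This is an illustration only, not code from the pull request: the class SetupOnceExample and its methods are hypothetical, and JUnit is left out so the block stays self-contained. It assumes the flag has to be static, because JUnit 4 normally creates a fresh test instance for each test method, so an instance field would be reset every time.

```java
// Hypothetical stand-in for a test class whose expensive setup should run once.
final class SetupOnceExample {
  // Static so the flag survives across instances (assumption: the test
  // framework builds a new instance per test method).
  private static boolean setupDone = false;
  private static int initCount = 0;

  // Placeholder for costly work such as building an injector or a schema.
  private static void expensiveInit() {
    initCount++;
  }

  // Would carry @Before in a real JUnit 4 test.
  void setUp() {
    if (!setupDone) {
      expensiveInit();
      setupDone = true;
    }
  }

  static int initCount() {
    return initCount;
  }

  public static void main(String[] args) {
    // Simulate two test methods, each running on its own instance.
    new SetupOnceExample().setUp();
    new SetupOnceExample().setUp();
    System.out.println("expensiveInit ran " + initCount() + " time(s)");
  }
}
```

With a non-static flag the guard would likely evaluate to true on every instance, which is worth checking against the runner actually in use.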
 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281372
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:46
Start Date: 23/Jul/19 22:46
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306564640
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/api/ContainerKeyService.java
 ##
 @@ -38,6 +37,7 @@
 import javax.ws.rs.core.MediaType;
 import javax.ws.rs.core.Response;
 
+import javax.inject.Inject;
 
 Review comment:
   We use javax.inject for servlets and bridge it to the Guice bindings here:
https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/ReconRestServletModule.java#L110
 



Issue Time Tracking
---

Worklog Id: (was: 281372)
Time Spent: 2h 20m  (was: 2h 10m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281371
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:43
Start Date: 23/Jul/19 22:43
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306563171
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.ws.rs.core.Response;
+import java.util.*;
+
+import static org.junit.Assert.assertEquals;
+
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService extends AbstractOMMetadataManagerTest {
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;  //should we use interface?
 
 Review comment:
   Do we need this comment?
 



Issue Time Tracking
---

Worklog Id: (was: 281371)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281370
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:43
Start Date: 23/Jul/19 22:43
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306562245
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", ioEx);
+return new ImmutablePair<>(getTaskName(), false);
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator<OMDBUpdateEvent> eventIterator = events.getIterator();
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent<String, OmKeyInfo> omdbUpdateEvent = eventIterator.next();
+  String updatedKey = omdbUpdateEvent.getKey();
+  OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+  try{
+switch (omdbUpdateEvent.getAction()) {
+  case PUT:
+updateCountForKey(updatedValue, 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281369
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:43
Start Date: 23/Jul/19 22:43
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306562822
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", ioEx);
+return new ImmutablePair<>(getTaskName(), false);
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator<OMDBUpdateEvent> eventIterator = events.getIterator();
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent<String, OmKeyInfo> omdbUpdateEvent = eventIterator.next();
+  String updatedKey = omdbUpdateEvent.getKey();
+  OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+  try{
+switch (omdbUpdateEvent.getAction()) {
+  case PUT:
+updateCountForKey(updatedValue, 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281368
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:43
Start Date: 23/Jul/19 22:43
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306501528
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", ioEx);
 
 Review comment:
   Can we log a better error message here? We are only trying to populate file 
counts by size and not container key prefix here. Also, the indentation seems 
to be off here.
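
For context on what "file counts by size" means in the diff above, here is a minimal sketch of binning file sizes by upper bound. It is not the code under review: the class FileSizeBins and its methods are hypothetical, and doubling each upper bound is an assumption made for illustration. Only the 1024-byte minimum and the 16-slot array are taken from the diff.

```java
// Hypothetical sketch: count files per size bin, where bin 0 holds sizes
// below 1 KB and each following bin doubles the upper bound (assumption).
final class FileSizeBins {
  // Both values appear in the diff (minFileSize and the array length).
  static final long MIN_FILE_SIZE = 1024L;
  static final int NUM_BINS = 16;

  private final long[] upperBoundCount = new long[NUM_BINS];

  // Returns the index of the first bin whose upper bound exceeds dataSize.
  // Sizes beyond the last upper bound all land in the final bin.
  static int binIndex(long dataSize) {
    if (dataSize < 0) {
      throw new IllegalArgumentException("negative size: " + dataSize);
    }
    long upperBound = MIN_FILE_SIZE;
    int index = 0;
    while (index < NUM_BINS - 1 && dataSize >= upperBound) {
      upperBound <<= 1;
      index++;
    }
    return index;
  }

  // A PUT event adds one file to its bin.
  void onPut(long dataSize) {
    upperBoundCount[binIndex(dataSize)]++;
  }

  // A DELETE event removes one file from its bin, never going below zero.
  void onDelete(long dataSize) {
    int index = binIndex(dataSize);
    if (upperBoundCount[index] > 0) {
      upperBoundCount[index]--;
    }
  }

  long count(int index) {
    return upperBoundCount[index];
  }

  public static void main(String[] args) {
    FileSizeBins bins = new FileSizeBins();
    bins.onPut(500L);
    bins.onPut(1024L);
    bins.onPut(3000L);
    bins.onDelete(500L);
    for (int i = 0; i < 3; i++) {
      System.out.println("bin " + i + ": " + bins.count(i));
    }
  }
}
```

An error message in the catch block could then name the actual operation, for example that the file count by size could not be populated, which is the point the review comment raises.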
 



Issue Time Tracking
---

Worklog Id: (was: 281368)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281355
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:25
Start Date: 23/Jul/19 22:25
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306559378
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,311 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.utils.db.Table;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.jooq.impl.DSL;
+import org.jooq.impl.DefaultConfiguration;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.sql.DataSource;
+import java.io.IOException;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
 
 Review comment:
   Are we mocking anything in the tests?  
 



Issue Time Tracking
---

Worklog Id: (was: 281355)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281354
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:24
Start Date: 23/Jul/19 22:24
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306559214
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,311 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.utils.db.Table;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.jooq.impl.DSL;
+import org.jooq.impl.DefaultConfiguration;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.sql.DataSource;
+import java.io.IOException;
+import java.util.*;
+
+import static org.junit.Assert.*;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestFileSizeCountTask extends AbstractOMMetadataManagerTest {
+  private OMMetadataManager omMetadataManager;
+  private ReconOMMetadataManager reconOMMetadataManager;
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;
+  private boolean setUpIsDone = false;
+  private GuiceInjectorUtilsForTestsImpl guiceInjectorTest =
+  new GuiceInjectorUtilsForTestsImpl();
+
+  private Injector getInjector() {
+return injector;
+  }
+  private Configuration sqlConfiguration;
+
+  @Rule
+  TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private void initializeInjector() throws Exception {
+omMetadataManager = initializeNewOmMetadataManager();
+OzoneConfiguration configuration =
+guiceInjectorTest.getTestOzoneConfiguration(temporaryFolder);
+
+ozoneManagerServiceProvider = new OzoneManagerServiceProviderImpl(
+configuration);
+reconOMMetadataManager = getTestMetadataManager(omMetadataManager);
+
+injector = guiceInjectorTest.getInjector(
+ozoneManagerServiceProvider, reconOMMetadataManager, temporaryFolder);
+  }
+
+  @Before
+  public void setUp() throws Exception {
+// The following setup is run only once
+if (!setUpIsDone) {
+  initializeInjector();
+
+  DSL.using(new DefaultConfiguration().set(
+  injector.getInstance(DataSource.class)));
+
+  UtilizationSchemaDefinition utilizationSchemaDefinition =
+  getInjector().getInstance(
+  

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281352
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:20
Start Date: 23/Jul/19 22:20
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306558000
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.ws.rs.core.Response;
+import java.util.*;
+
+import static org.junit.Assert.assertEquals;
+
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService extends AbstractOMMetadataManagerTest {
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;  //should we use interface?
+  private OMMetadataManager omMetadataManager;
+  private GuiceInjectorUtilsForTestsImpl guiceInjectorTest =
+  new GuiceInjectorUtilsForTestsImpl();
+  private boolean isSetupDone = false;
+  private UtilizationService utilizationService;
+
+  private Injector getInjector() {
+return injector;
+  }
+  private Configuration sqlConfiguration;
+
+  private void initializeInjector() throws Exception {
+omMetadataManager = initializeNewOmMetadataManager();
+OzoneConfiguration configuration =
+guiceInjectorTest.getTestOzoneConfiguration(temporaryFolder);
+
+ozoneManagerServiceProvider = new OzoneManagerServiceProviderImpl(
+configuration);
+ReconOMMetadataManager reconOMMetadataManager =
+getTestMetadataManager(omMetadataManager);
+
+Injector parentInjector = guiceInjectorTest.getInjector(
+ozoneManagerServiceProvider, reconOMMetadataManager, temporaryFolder);
+
+injector = parentInjector.createChildInjector(new AbstractModule() {
+  @Override
+  protected void configure() {
+utilizationService = new UtilizationService();
+bind(UtilizationService.class).toInstance(utilizationService);
+  }
+});
+  }
+
+  @Before
+  public void setUp() throws Exception {
+
+if (!isSetupDone) {
+  initializeInjector();
+  sqlConfiguration = getInjector().getInstance(Configuration.class);
+  isSetupDone = true;
+}
+
+UtilizationSchemaDefinition 

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281351
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:18
Start Date: 23/Jul/19 22:18
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306557434
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+  @Before
+  public void setUp() throws Exception {
+
+if (!isSetupDone) {
 
 Review comment:
   This is an anti-pattern; JUnit's @BeforeClass already provides run-once 
semantics.
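
   To illustrate the point of the comment: JUnit 4's standard runner creates a 
fresh instance of the test class for every test method, so a flag stored in an 
instance field starts out false again each time and the guarded setup still runs 
once per test. The sketch below simulates that in plain Java, without JUnit on 
the classpath. All class and method names in it are hypothetical stand-ins, not 
code from the pull request; in a real test, setUpClass() would be a public 
static void method annotated with @BeforeClass.

```java
/** Stand-in for a test class that guards its setup with an instance flag. */
final class InstanceFlagFixture {
  static int expensiveInitCalls = 0;

  // Instance field: every new fixture object starts with false again.
  private boolean setUpIsDone = false;

  /** Plays the role of an @Before method. */
  void setUp() {
    if (!setUpIsDone) {
      expensiveInitCalls++;
      setUpIsDone = true;
    }
  }
}

/** Stand-in for a test class that initializes once at class level. */
final class ClassLevelFixture {
  static int expensiveInitCalls = 0;

  /** Plays the role of a public static @BeforeClass method. */
  static void setUpClass() {
    expensiveInitCalls++;
  }
}

final class RunOnceDemo {
  /** Simulates a runner that builds one fixture instance per test method. */
  static int initCallsWithInstanceFlag(int testMethods) {
    InstanceFlagFixture.expensiveInitCalls = 0;
    for (int i = 0; i < testMethods; i++) {
      InstanceFlagFixture fixture = new InstanceFlagFixture();
      fixture.setUp();
    }
    return InstanceFlagFixture.expensiveInitCalls;
  }

  /** Simulates class-level setup followed by the same test methods. */
  static int initCallsWithClassLevelSetup(int testMethods) {
    ClassLevelFixture.expensiveInitCalls = 0;
    ClassLevelFixture.setUpClass();
    for (int i = 0; i < testMethods; i++) {
      new ClassLevelFixture(); // one instance per test method, no extra init
    }
    return ClassLevelFixture.expensiveInitCalls;
  }

  public static void main(String[] args) {
    System.out.println("instance flag:   " + initCallsWithInstanceFlag(3));
    System.out.println("class-level run: " + initCallsWithClassLevelSetup(3));
  }
}
```

   With three simulated test methods, the instance-flag variant initializes 
three times and the class-level variant once. Making the flag static would also 
work, but @BeforeClass expresses the same intent without hand-written state.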
 


[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281348
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:12
Start Date: 23/Jul/19 22:12
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306555784
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10KB, ..., 10MB, ..., 1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+      LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection<String> tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+      Configuration sqlConfiguration) {
+    super("FileSizeCountTask");
+    try {
+      tables.add(omMetadataManager.getKeyTable().getName());
+      fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+    } catch (Exception e) {
+      LOG.error("Unable to fetch Key Table updates ", e);
+    }
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
+    LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+    Table<String, OmKeyInfo> omKeyInfoTable = omMetadataManager.getKeyTable();
+    try (TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+        keyIter = omKeyInfoTable.iterator()) {
+      while (keyIter.hasNext()) {
+        Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
+        countFileSize(kv.getValue());
+      }
+      populateFileCountBySizeDB();
+
+    } catch (IOException ioEx) {
+      LOG.error("Unable to populate File Size Count in Recon DB. ",
+          ioEx);
+      return new ImmutablePair<>(getTaskName(), false);
+    }
+
+    LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+    return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection<String> getTaskTables() {
+    return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair<String, Boolean> process(OMUpdateEventBatch events) {
+    LOG.info("Starting a 'process' run of FileSizeCountTask.");
+    Iterator<OMDBUpdateEvent> eventIterator = events.getIterator();
+    while (eventIterator.hasNext()) {
+      OMDBUpdateEvent<String, OmKeyInfo> omdbUpdateEvent =
+          eventIterator.next();
+      String updatedKey = omdbUpdateEvent.getKey();
+      OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+      try {
+        switch (omdbUpdateEvent.getAction()) {
+          case PUT:
+            updateCountForKey(updatedValue, "PUT");
+  

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281342
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:02
Start Date: 23/Jul/19 22:02
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306553115
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281341
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 22:02
Start Date: 23/Jul/19 22:02
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306553029
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@

[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281337=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281337
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 21:56
Start Date: 23/Jul/19 21:56
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306551153
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/api/ContainerKeyService.java
 ##
 @@ -38,6 +37,7 @@
 import javax.ws.rs.core.MediaType;
 import javax.ws.rs.core.Response;
 
+import javax.inject.Inject;
 
 Review comment:
   Why javax.inject.Inject and not com.google.inject.Inject?
 



Issue Time Tracking
---

Worklog Id: (was: 281337)
Time Spent: 0.5h  (was: 20m)

> Add ability in Recon to track the number of small files in an Ozone cluster.
> 
>
> Key: HDDS-1366
> URL: https://issues.apache.org/jira/browse/HDDS-1366
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone users may want to track the number of small files they have in their 
> cluster and where they are present. Recon can help them with the information 
> by iterating the OM Key Table and dividing the keys into different buckets 
> based on the data size. 
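
The bucketing described above can be sketched roughly as follows. This is a 
minimal illustration, not the code from the pull request: the class 
FileSizeHistogram, its method names, and the growth factor of 10 between 
consecutive upper bounds are assumptions made for the example. Only the array 
of 16 counters and the 1 KB minimum appear in the quoted diff. PUT and DELETE 
events are modelled as an increment and a decrement of the matching bucket.

```java
/** Hypothetical helper, not taken from the pull request. */
final class FileSizeHistogram {
  // 16 counters and a 1 KB minimum, as in the quoted diff.
  static final int NUM_BINS = 16;
  static final long MIN_UPPER_BOUND = 1024L;
  // Assumed growth factor between consecutive upper bounds.
  static final long GROWTH_FACTOR = 10L;

  private final long[] counts = new long[NUM_BINS];

  /** Returns the index of the first bin whose upper bound is >= dataSize. */
  static int binIndex(long dataSize) {
    long upperBound = MIN_UPPER_BOUND;
    int index = 0;
    // The last bin is a catch-all, so stop before running past it.
    while (dataSize > upperBound && index < NUM_BINS - 1) {
      upperBound *= GROWTH_FACTOR;
      index++;
    }
    return index;
  }

  /** A key was written: count it in its bin. */
  void put(long dataSize) {
    counts[binIndex(dataSize)]++;
  }

  /** A key was deleted: remove it from its bin, never going below zero. */
  void delete(long dataSize) {
    int index = binIndex(dataSize);
    if (counts[index] > 0) {
      counts[index]--;
    }
  }

  long count(int index) {
    return counts[index];
  }
}

final class FileSizeHistogramDemo {
  public static void main(String[] args) {
    FileSizeHistogram histogram = new FileSizeHistogram();
    long[] sizes = {500L, 1024L, 1025L, 5_000_000L};
    for (long size : sizes) {
      histogram.put(size);
    }
    histogram.delete(500L);
    for (int i = 0; i < FileSizeHistogram.NUM_BINS; i++) {
      System.out.println("bin " + i + ": " + histogram.count(i));
    }
  }
}
```

A linear scan over 16 bounds is cheap enough per key; a different bound 
sequence only changes GROWTH_FACTOR or the loop body, not the counting logic.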






[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281248
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 19:28
Start Date: 23/Jul/19 19:28
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on issue #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#issuecomment-514350136
 
 
   /label ozone
   
 



Issue Time Tracking
---

Worklog Id: (was: 281248)
Time Spent: 20m  (was: 10m)




[jira] [Work logged] (HDDS-1366) Add ability in Recon to track the number of small files in an Ozone cluster.

2019-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1366?focusedWorklogId=281245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-281245
 ]

ASF GitHub Bot logged work on HDDS-1366:


Author: ASF GitHub Bot
Created on: 23/Jul/19 19:26
Start Date: 23/Jul/19 19:26
Worklog Time Spent: 10m 
  Work Description: shwetayakkali commented on pull request #1146: 
HDDS-1366. Add ability in Recon to track the number of small files in an Ozone 
Cluster
URL: https://github.com/apache/hadoop/pull/1146
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 281245)
Time Spent: 10m
Remaining Estimate: 0h
