[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=245874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245874
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 21/May/19 09:20
Start Date: 21/May/19 09:20
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on issue #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#issuecomment-494311487
 
 
   @mukul1987 @xiaoyuyao Thanks for reviewing the PR! I have merged it to trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 245874)
Time Spent: 2h 20m  (was: 2h 10m)

> Optimize listStatus api in OzoneFileSystem
> --
>
> Key: HDDS-1461
> URL: https://issues.apache.org/jira/browse/HDDS-1461
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently in listStatus we make multiple getFileStatus calls. This can be 
> optimized by converting to a single rpc call for listStatus.
> Also currently listStatus has to traverse a directory recursively in order to 
> list its immediate children. This happens because in OzoneManager all the 
> metadata is stored in rocksdb sorted on keynames. The Jira also aims to fix 
> this by using seek api provided by rocksdb.
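
The seek idea in the description can be sketched as follows. This is only an illustration under stated assumptions: a `java.util.TreeSet` stands in for the RocksDB key table sorted on key names, `ceiling` stands in for the RocksDB seek call, and the class and method names are hypothetical rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SeekListingSketch {
  /** Lists the immediate children of dir (for example "a/") from a sorted key set. */
  static List<String> listImmediateChildren(TreeSet<String> keys, String dir) {
    List<String> children = new ArrayList<>();
    String cursor = keys.ceiling(dir); // "seek": first key >= dir
    while (cursor != null && cursor.startsWith(dir)) {
      if (cursor.equals(dir)) {
        cursor = keys.higher(cursor); // skip the directory entry itself
        continue;
      }
      String rest = cursor.substring(dir.length());
      int slash = rest.indexOf('/');
      if (slash < 0) {
        children.add(rest); // immediate child file
        cursor = keys.higher(cursor);
      } else {
        String child = rest.substring(0, slash + 1); // child directory, e.g. "b/"
        children.add(child);
        // Seek past every descendant of the child directory instead of
        // visiting them one by one: replace the trailing '/' with '/' + 1.
        String prefix = dir + child;
        String next = prefix.substring(0, prefix.length() - 1) + (char) ('/' + 1);
        cursor = keys.ceiling(next);
      }
    }
    return children;
  }

  public static void main(String[] args) {
    TreeSet<String> keys = new TreeSet<>(Arrays.asList(
        "a/", "a/b/", "a/b/c", "a/b/d/e", "a/f", "a/g/h", "b/x"));
    System.out.println(listImmediateChildren(keys, "a/"));
  }
}
```

With the sample keys in `main`, the listing of `a/` contains `b/`, `f` and `g/`; the keys `a/b/c` and `a/b/d/e` are never visited because the cursor jumps straight from `a/b/` to the first key at or after `a/b0`.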



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=245872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245872
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 21/May/19 09:17
Start Date: 21/May/19 09:17
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 245872)
Time Spent: 2h 10m  (was: 2h)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=245851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-245851
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 21/May/19 08:52
Start Date: 21/May/19 08:52
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on issue #782: HDDS-1461. Optimize 
listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#issuecomment-494301675
 
 
   +1, the patch looks good to me.
 



Issue Time Tracking
---

Worklog Id: (was: 245851)
Time Spent: 2h  (was: 1h 50m)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=243285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243285
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 16/May/19 12:17
Start Date: 16/May/19 12:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#issuecomment-493043382
 
 
   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 525 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 3 new or modified test files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 66 | Maven dependency ordering for branch |
   | +1 | mvninstall | 405 | trunk passed |
   | +1 | compile | 207 | trunk passed |
   | +1 | checkstyle | 53 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 829 | branch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 128 | trunk passed |
   | 0 | spotbugs | 236 | Used deprecated FindBugs config; considering switching to SpotBugs. |
   | +1 | findbugs | 417 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 28 | Maven dependency ordering for patch |
   | +1 | mvninstall | 398 | the patch passed |
   | +1 | compile | 212 | the patch passed |
   | +1 | cc | 212 | the patch passed |
   | +1 | javac | 212 | the patch passed |
   | -0 | checkstyle | 29 | hadoop-ozone: The patch generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 663 | patch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 123 | the patch passed |
   | +1 | findbugs | 433 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 150 | hadoop-hdds in the patch failed. |
   | -1 | unit | 1309 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 40 | The patch does not generate ASF License warnings. |
   | | | 7626 | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-782/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/782 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc |
   | uname | Linux 4cd11d86f21c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / de01422 |
   | Default Java | 1.8.0_212 |
   | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/3/artifact/out/diff-checkstyle-hadoop-ozone.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/3/artifact/out/patch-unit-hadoop-hdds.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/3/artifact/out/patch-unit-hadoop-ozone.txt |
   | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/3/testReport/ |
   | Max. process+thread count | 4646 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-ozone/client hadoop-ozone/common hadoop-ozone/ozone-manager hadoop-ozone/ozonefs U: . |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

   This message was automatically generated.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 243285)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=243235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243235
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 16/May/19 10:10
Start Date: 16/May/19 10:10
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on issue #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#issuecomment-493006449
 
 
   @xiaoyuyao @mukul1987 I have addressed the review comments in the latest
commit.
 



Issue Time Tracking
---

Worklog Id: (was: 243235)
Time Spent: 1h 40m  (was: 1.5h)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=243208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243208
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 16/May/19 09:16
Start Date: 16/May/19 09:16
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r284612628
 
 

 ##
 File path: 
hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java
 ##
 @@ -494,130 +491,33 @@ private boolean o3Exists(final Path f) throws IOException {
 }
   }
 
-  private class ListStatusIterator extends OzoneListingIterator {
-// _fileStatuses_ maintains a list of file(s) which is either the input
-// path itself or a child of the input directory path.
-private List fileStatuses = new ArrayList<>(LISTING_PAGE_SIZE);
-// _subDirStatuses_ maintains a list of sub-dirs of the input directory
-// path.
-private Map subDirStatuses =
-new HashMap<>(LISTING_PAGE_SIZE);
-private Path f; // the input path
-
-ListStatusIterator(Path f) throws IOException {
-  super(f);
-  this.f = f;
-}
+  @Override
+  public FileStatus[] listStatus(Path f) throws IOException {
+incrementCounter(Statistic.INVOCATION_LIST_STATUS);
+statistics.incrementReadOps(1);
+LOG.trace("listStatus() path:{}", f);
+int numEntries = LISTING_PAGE_SIZE;
+LinkedList statuses = new LinkedList<>();
+List tmpStatus;
+String startKey = "";
 
-/**
- * Add the key to the listStatus result if the key corresponds to the
- * input path or is an immediate child of the input path.
- *
- * @param key key to be processed
- * @return always returns true
- * @throws IOException
- */
-@Override
-boolean processKey(String key) throws IOException {
-  Path keyPath = new Path(OZONE_URI_DELIMITER + key);
-  if (key.equals(getPathKey())) {
-if (pathIsDirectory()) {
-  // if input path is a directory, we add the sub-directories and
-  // files under this directory.
-  return true;
-} else {
-  addFileStatus(keyPath);
-  return true;
-}
-  }
-  // Left with only subkeys now
-  // We add only the immediate child files and sub-dirs i.e. we go only
-  // upto one level down the directory tree structure.
-  if (pathToKey(keyPath.getParent()).equals(pathToKey(f))) {
-// This key is an immediate child. Can be file or directory
-if (key.endsWith(OZONE_URI_DELIMITER)) {
-  // Key is a directory
-  addSubDirStatus(keyPath);
+do {
+  tmpStatus = adapter.listStatus(pathToKey(f), false, startKey, numEntries);
+  if (!tmpStatus.isEmpty()) {
+if (startKey.isEmpty()) {
+  statuses.addAll(tmpStatus);
 } else {
-  addFileStatus(keyPath);
-}
-  } else {
-// This key is not the immediate child of the input directory. So we
-// traverse the parent tree structure of this key until we get the
-// immediate child of the input directory.
-Path immediateChildPath = getImmediateChildPath(keyPath.getParent());
-if (immediateChildPath != null) {
-  addSubDirStatus(immediateChildPath);
+  statuses.addAll(tmpStatus.subList(1, tmpStatus.size()));
 }
+startKey = pathToKey(statuses.getLast().getPath());
   }
-  return true;
-}
+} while (tmpStatus.size() == numEntries);
 
 Review comment:
   Please add a comment here to explain this line
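
A rough sketch of the paging loop under review may help explain why every page after the first drops its first entry. It assumes, as the listStatus javadoc in this PR states, that the start key is included in the returned page; `listPage` is a hypothetical stand-in for the adapter call and plain strings stand in for file statuses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;

class PagedListingSketch {
  /** Hypothetical listing call: up to numEntries keys >= startKey, start key inclusive. */
  static List<String> listPage(TreeSet<String> keys, String startKey, int numEntries) {
    List<String> page = new ArrayList<>();
    for (String key : keys.tailSet(startKey, true)) {
      if (page.size() == numEntries) {
        break;
      }
      page.add(key);
    }
    return page;
  }

  static List<String> listAll(TreeSet<String> keys, int pageSize) {
    if (pageSize < 2) {
      // With a page size of 1 every later page would hold only the repeated
      // start key, so the loop could never make progress.
      throw new IllegalArgumentException("pageSize must be at least 2");
    }
    LinkedList<String> all = new LinkedList<>();
    List<String> page;
    String startKey = "";
    do {
      page = listPage(keys, startKey, pageSize);
      if (!page.isEmpty()) {
        if (startKey.isEmpty()) {
          all.addAll(page); // first page: keep everything
        } else {
          // Later pages begin with the start key, which is already in the
          // result, so skip index 0 to avoid a duplicate.
          all.addAll(page.subList(1, page.size()));
        }
        startKey = all.getLast(); // resume from the last entry seen
      }
    } while (page.size() == pageSize); // a short page means the listing is done
    return all;
  }

  public static void main(String[] args) {
    TreeSet<String> keys = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
    System.out.println(listAll(keys, 3));
  }
}
```

The loop issues one extra call whenever the number of entries lines up with a page boundary, because only a short page signals the end of the listing.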
 



Issue Time Tracking
---

Worklog Id: (was: 243208)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=243207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243207
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 16/May/19 09:15
Start Date: 16/May/19 09:15
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r283858738
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
 ##
 @@ -1416,44 +1421,47 @@ public void createDirectory(OmKeyArgs args) throws IOException {
 try {
   metadataManager.getLock().acquireBucketLock(volumeName, bucketName);
 
-  // verify bucket exists
-  OmBucketInfo bucketInfo = getBucketInfo(volumeName, bucketName);
-
   // Check if this is the root of the filesystem.
   if (keyName.length() == 0) {
 return;
   }
 
-  verifyNoFilesInPath(volumeName, bucketName, Paths.get(keyName), false);
-  String dir = addTrailingSlashIfNeeded(keyName);
-  String dirDbKey =
-  metadataManager.getOzoneKey(volumeName, bucketName, dir);
-  FileEncryptionInfo encInfo = getFileEncryptionInfo(bucketInfo);
+  Path keyPath = Paths.get(keyName);
+  OzoneFileStatus status =
+  verifyNoFilesInPath(volumeName, bucketName, keyPath, false);
+  if (status != null && OzoneFSUtils.pathToKey(status.getPath())
+  .equals(keyPath.toString())) {
 
 Review comment:
   keyName can be used here, as keyPath was derived from that.
 



Issue Time Tracking
---

Worklog Id: (was: 243207)
Time Spent: 1h 20m  (was: 1h 10m)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=243206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243206
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 16/May/19 09:15
Start Date: 16/May/19 09:15
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r284610750
 
 

 ##
 File path: 
hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileSystem.java
 ##
 @@ -219,6 +222,32 @@ public void testListStatusOnRoot() throws Exception {
 assertFalse(fileStatus2.equals(dir12.toString()));
   }
 
+  /**
+   * Tests listStatus operation on root directory.
+   */
+  @Test
+  public void testListStatusOnLargeDirectory() throws Exception {
+Path root = new Path("/");
+Set paths = new TreeSet<>();
+int numDirs = 5111;
+for(int i = 0; i < numDirs; i++) {
+  Path p = new Path(root, String.valueOf(i));
+  fs.mkdirs(p);
+  paths.add(p.getName());
+}
+
+// ListStatus on root should return dir1 (even though /dir1 key does not
 
 Review comment:
   The comment here does not match the assert on the next line. Could you
please take a look?
 



Issue Time Tracking
---

Worklog Id: (was: 243206)
Time Spent: 1h 10m  (was: 1h)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=242942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-242942
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 15/May/19 23:04
Start Date: 15/May/19 23:04
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r284485092
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
 ##
 @@ -1416,44 +1421,47 @@ public void createDirectory(OmKeyArgs args) throws IOException {
 try {
   metadataManager.getLock().acquireBucketLock(volumeName, bucketName);
 
 Review comment:
   This needs to be moved out of the try block.
 



Issue Time Tracking
---

Worklog Id: (was: 242942)
Time Spent: 1h  (was: 50m)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=242940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-242940
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 15/May/19 23:03
Start Date: 15/May/19 23:03
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r284484826
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
 ##
 @@ -1355,15 +1360,15 @@ public OzoneFileStatus getFileStatus(OmKeyArgs args) throws IOException {
 String bucketName = args.getBucketName();
 String keyName = args.getKeyName();
 
-metadataManager.getLock().acquireBucketLock(volumeName, bucketName);
 try {
+  metadataManager.getLock().acquireBucketLock(volumeName, bucketName);
 
 Review comment:
   Why do we move the acquireBucketLock inside the try block? The original
pattern seems good to me.
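
For reference, the pattern the reviewer prefers looks roughly like the sketch below. It uses `java.util.concurrent.locks.ReentrantLock` as a stand-in for the Ozone bucket lock, and the class and method names are hypothetical. The point is that when the acquire call sits before `try`, a failed acquire never reaches `finally`, so the code cannot release a lock it does not hold.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockPatternSketch {
  // Stand-in for the bucket lock; not the Ozone lock manager API.
  private final ReentrantLock bucketLock = new ReentrantLock();
  private int reads;

  int readUnderLock() {
    // Acquire outside the try block: if lock() itself throws, the finally
    // block below is never entered and unlock() is not called spuriously.
    bucketLock.lock();
    try {
      return ++reads; // work done while holding the lock
    } finally {
      bucketLock.unlock(); // always paired with the successful lock() above
    }
  }

  boolean isLocked() {
    return bucketLock.isLocked();
  }

  public static void main(String[] args) {
    LockPatternSketch sketch = new LockPatternSketch();
    System.out.println(sketch.readUnderLock() + " " + sketch.isLocked());
  }
}
```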
 



Issue Time Tracking
---

Worklog Id: (was: 242940)
Time Spent: 50m  (was: 40m)




[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=240312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240312
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 10/May/19 16:22
Start Date: 10/May/19 16:22
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r282951671
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
 ##
 @@ -1546,6 +1552,101 @@ public OmKeyInfo lookupFile(OmKeyArgs args) throws IOException {
 ResultCodes.NOT_A_FILE);
   }
 
+  /**
+   * List the status for a file or a directory and its contents.
+   *
+   * @param args   Key args
+   * @param recursive  For a directory if true all the descendants of a
+   *   particular directory are listed
+   * @param startKey   Key from which listing needs to start. If startKey exists
+   *   its status is included in the final list.
+   * @param numEntries Number of entries to list from the start key
+   * @return list of file status
+   */
+  public List<OzoneFileStatus> listStatus(OmKeyArgs args, boolean recursive,
+  String startKey, long numEntries) throws IOException {
+Preconditions.checkNotNull(args, "Key args can not be null");
+String volumeName = args.getVolumeName();
+String bucketName = args.getBucketName();
+String keyName = args.getKeyName();
+
+List<OzoneFileStatus> fileStatusList = new ArrayList<>();
+try {
+  metadataManager.getLock().acquireBucketLock(volumeName, bucketName);
+  if (Strings.isNullOrEmpty(startKey)) {
+OzoneFileStatus fileStatus = getFileStatus(args);
+if (fileStatus.isFile()) {
+  return Collections.singletonList(fileStatus);
+}
+startKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
+  }
+
+  String seekKeyInDb =
+  metadataManager.getOzoneKey(volumeName, bucketName, startKey);
+  String keyInDb = OzoneFSUtils.addTrailingSlashIfNeeded(
+  metadataManager.getOzoneKey(volumeName, bucketName, keyName));
+  TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+  iterator = metadataManager.getKeyTable().iterator();
+  iterator.seek(seekKeyInDb);
+
+  if (!iterator.hasNext()) {
+return Collections.emptyList();
+  }
+
+  if (iterator.key().equals(keyInDb)) {
+// skip the key which needs to be listed
+iterator.next();
+  }
+
+  while (iterator.hasNext() && numEntries - fileStatusList.size() > 0) {
+String entryInDb = iterator.key();
+OmKeyInfo value = iterator.value().getValue();
+if (entryInDb.startsWith(keyInDb)) {
+  String entryKeyName = value.getKeyName();
+  if (recursive) {
+// for recursive list all the entries
+fileStatusList.add(new OzoneFileStatus(value, scmBlockSize,
+!OzoneFSUtils.isFile(entryKeyName)));
+iterator.next();
+  } else {
+// get the child of the directory to list from the entry. For
+// example if directory to list is /a and entry is /a/b/c where
+// c is a file. The immediate child is b which is a directory. c
+// should not be listed as child of a.
+String immediateChild = OzoneFSUtils
+.getImmediateChild(entryKeyName, keyName);
+boolean isFile = OzoneFSUtils.isFile(immediateChild);
+if (isFile) {
+  fileStatusList
+  .add(new OzoneFileStatus(value, scmBlockSize, !isFile));
+  iterator.next();
+} else {
+  // if entry is a directory
+  fileStatusList.add(new OzoneFileStatus(immediateChild));
+  // skip the other descendants of this child directory.
+  iterator.seek(
+  getNextGreaterString(volumeName, bucketName, immediateChild));
+}
+  }
+} else {
+  break;
+}
+  }
+} finally {
+  metadataManager.getLock().releaseBucketLock(volumeName, bucketName);
+}
+return fileStatusList;
+  }
+
+  private String getNextGreaterString(String volumeName, String bucketName,
+  String keyPrefix) {
+// TODO: Use string codec
+// Increment the last character of the string and return the new ozone key.
+String nextPrefix = keyPrefix.substring(0, keyPrefix.length() - 1) +
+String.valueOf((char) (keyPrefix.charAt(keyPrefix.length() - 1) + 1));
 
 Review comment:
   The first character in the ASCII table would not work in our case.
   The second commit uses the codec of RDBStore to handle the UTF-8 case.
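
The string-based approach from the first commit can be sketched as below; the class and method names are hypothetical. It relies on Java comparing strings by UTF-16 code unit, which is an assumption that need not match the byte order a key-value store uses for encoded keys, and that mismatch is presumably what the codec-based second commit addresses.

```java
class NextGreaterStringSketch {
  /**
   * Returns a string that sorts after every string starting with prefix,
   * obtained by incrementing the last character of prefix.
   */
  static String nextGreater(String prefix) {
    if (prefix.isEmpty()) {
      throw new IllegalArgumentException("prefix must not be empty");
    }
    int last = prefix.length() - 1;
    char c = prefix.charAt(last);
    if (c == Character.MAX_VALUE) {
      // Incrementing would wrap around to 0; this sketch does not handle it.
      throw new IllegalArgumentException("last character cannot be incremented");
    }
    return prefix.substring(0, last) + (char) (c + 1);
  }

  public static void main(String[] args) {
    // '/' is followed by '0' in ASCII, so "a/b/" maps to "a/b0".
    System.out.println(nextGreater("a/b/"));
  }
}
```

Seeking to `nextGreater("a/b/")` lands on the first key that is not under `a/b/`, which is how the listing skips the remaining descendants of a child directory.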
 


[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-05-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=238254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-238254
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 07/May/19 05:02
Start Date: 07/May/19 05:02
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#discussion_r281460619
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
 ##
 @@ -1546,6 +1552,101 @@ public OmKeyInfo lookupFile(OmKeyArgs args) throws IOException {
 ResultCodes.NOT_A_FILE);
   }
 
+  /**
+   * List the status for a file or a directory and its contents.
+   *
+   * @param args   Key args
+   * @param recursive  For a directory if true all the descendants of a
+   *   particular directory are listed
+   * @param startKey   Key from which listing needs to start. If startKey exists
+   *   its status is included in the final list.
+   * @param numEntries Number of entries to list from the start key
+   * @return list of file status
+   */
+  public List<OzoneFileStatus> listStatus(OmKeyArgs args, boolean recursive,
+  String startKey, long numEntries) throws IOException {
+Preconditions.checkNotNull(args, "Key args can not be null");
+String volumeName = args.getVolumeName();
+String bucketName = args.getBucketName();
+String keyName = args.getKeyName();
+
+List<OzoneFileStatus> fileStatusList = new ArrayList<>();
+try {
+  metadataManager.getLock().acquireBucketLock(volumeName, bucketName);
+  if (Strings.isNullOrEmpty(startKey)) {
+OzoneFileStatus fileStatus = getFileStatus(args);
+if (fileStatus.isFile()) {
+  return Collections.singletonList(fileStatus);
+}
+startKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
+  }
+
+  String seekKeyInDb =
+  metadataManager.getOzoneKey(volumeName, bucketName, startKey);
+  String keyInDb = OzoneFSUtils.addTrailingSlashIfNeeded(
+  metadataManager.getOzoneKey(volumeName, bucketName, keyName));
+  TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+  iterator = metadataManager.getKeyTable().iterator();
+  iterator.seek(seekKeyInDb);
+
+  if (!iterator.hasNext()) {
+return Collections.emptyList();
+  }
+
+  if (iterator.key().equals(keyInDb)) {
+// skip the key which needs to be listed
+iterator.next();
+  }
+
+  while (iterator.hasNext() && numEntries - fileStatusList.size() > 0) {
+String entryInDb = iterator.key();
+OmKeyInfo value = iterator.value().getValue();
+if (entryInDb.startsWith(keyInDb)) {
+  String entryKeyName = value.getKeyName();
+  if (recursive) {
+// for recursive list all the entries
+fileStatusList.add(new OzoneFileStatus(value, scmBlockSize,
+!OzoneFSUtils.isFile(entryKeyName)));
+iterator.next();
+  } else {
+// get the child of the directory to list from the entry. For
+// example if directory to list is /a and entry is /a/b/c where
+// c is a file. The immediate child is b which is a directory. c
+// should not be listed as child of a.
+String immediateChild = OzoneFSUtils
+.getImmediateChild(entryKeyName, keyName);
+boolean isFile = OzoneFSUtils.isFile(immediateChild);
+if (isFile) {
+  fileStatusList
+  .add(new OzoneFileStatus(value, scmBlockSize, !isFile));
+  iterator.next();
+} else {
+  // if entry is a directory
+  fileStatusList.add(new OzoneFileStatus(immediateChild));
+  // skip the other descendants of this child directory.
+  iterator.seek(
+  getNextGreaterString(volumeName, bucketName, immediateChild));
+}
+  }
+} else {
+  break;
+}
+  }
+} finally {
+  metadataManager.getLock().releaseBucketLock(volumeName, bucketName);
+}
+return fileStatusList;
+  }
+
+  private String getNextGreaterString(String volumeName, String bucketName,
+  String keyPrefix) {
+// TODO: Use string codec
+// Increment the last character of the string and return the new ozone key.
+String nextPrefix = keyPrefix.substring(0, keyPrefix.length() - 1) +
+String.valueOf((char) (keyPrefix.charAt(keyPrefix.length() - 1) + 1));
 
 Review comment:
   This is a great optimization.
   Should this point to the first character in the ASCII table? Also, let's 
verify that this works for UTF-8 encoding as well.
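
One way to check the UTF-8 question is sketched below. It assumes that the store orders keys by the unsigned bytes of their UTF-8 encoding and that the directory prefix ends with `/` (as the `isFile` check on `immediateChild` suggests). `NextGreaterCheck`, `incrementLastChar` and `compareUtf8` are hypothetical names; `incrementLastChar` only mirrors the string manipulation in the quoted snippet.

```java
import java.nio.charset.StandardCharsets;

class NextGreaterCheck {

  // Mirrors the quoted snippet: bump the last char of the prefix by one.
  static String incrementLastChar(String prefix) {
    int last = prefix.length() - 1;
    return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
  }

  // Unsigned lexicographic comparison of the UTF-8 encodings of a and b.
  static int compareUtf8(String a, String b) {
    byte[] x = a.getBytes(StandardCharsets.UTF_8);
    byte[] y = b.getBytes(StandardCharsets.UTF_8);
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int d = (x[i] & 0xff) - (y[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return x.length - y.length;
  }

  public static void main(String[] args) {
    String dir = "a/b/";
    String bound = incrementLastChar(dir); // '/' + 1 is '0'
    String[] descendants = {
        "a/b/", "a/b/c", "a/b/\u65E5\u672C\u8A9E/x", "a/b/\uD83D\uDE00"};
    for (String key : descendants) {
      // Every key under dir should sort before the bound.
      System.out.println(compareUtf8(key, bound) < 0);
    }
    // A sibling outside dir should not sort before the bound.
    System.out.println(compareUtf8("a/c", bound) < 0);
  }
}
```

Under these assumptions the bound sits after every descendant, including non-ASCII ones, because the comparison is already decided at the `/` versus `0` byte. The char arithmetic itself would wrap around to `\u0000` if the last character were `Character.MAX_VALUE`, which is one reason to prefer a byte-level approach.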
 


[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-04-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=234481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234481
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 29/Apr/19 11:55
Start Date: 29/Apr/19 11:55
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782#issuecomment-487550776
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 58 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 3 new or modified test files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 35 | Maven dependency ordering for branch |
   | +1 | mvninstall | 1358 | trunk passed |
   | +1 | compile | 146 | trunk passed |
   | +1 | checkstyle | 56 | trunk passed |
   | +1 | mvnsite | 157 | trunk passed |
   | +1 | shadedclient | 1039 | branch has no errors when building and testing our client artifacts. |
   | +1 | findbugs | 214 | trunk passed |
   | +1 | javadoc | 126 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 15 | Maven dependency ordering for patch |
   | +1 | mvninstall | 138 | the patch passed |
   | +1 | compile | 120 | the patch passed |
   | +1 | cc | 120 | the patch passed |
   | +1 | javac | 120 | the patch passed |
   | -0 | checkstyle | 28 | hadoop-ozone: The patch generated 6 new + 0 unchanged - 0 fixed = 6 total (was 0) |
   | +1 | mvnsite | 127 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 920 | patch has no errors when building and testing our client artifacts. |
   | +1 | findbugs | 247 | the patch passed |
   | +1 | javadoc | 113 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 43 | common in the patch passed. |
   | +1 | unit | 33 | client in the patch passed. |
   | -1 | unit | 75 | ozone-manager in the patch failed. |
   | -1 | unit | 171 | ozonefs in the patch failed. |
   | +1 | asflicense | 31 | The patch does not generate ASF License warnings. |
   | | | 5245 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-782/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/782 |
   | Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
   | uname | Linux 24af074b3c69 4.4.0-141-generic #167~14.04.1-Ubuntu SMP Mon Dec 10 13:20:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / b434f55 |
   | maven | version: Apache Maven 3.3.9 |
   | Default Java | 1.8.0_191 |
   | findbugs | v3.1.0-RC1 |
   | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/1/artifact/out/diff-checkstyle-hadoop-ozone.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/1/artifact/out/patch-unit-hadoop-ozone_ozone-manager.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/1/artifact/out/patch-unit-hadoop-ozone_ozonefs.txt |
   |  Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/1/testReport/ |
   | Max. process+thread count | 2720 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/common hadoop-ozone/client hadoop-ozone/ozone-manager hadoop-ozone/ozonefs U: hadoop-ozone |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-782/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 234481)
Time Spent: 20m  (was: 10m)

> Optimize listStatus api in OzoneFileSystem
> --
>
> Key: HDDS-1461
> URL: https://issues.apache.org/jira/browse/HDDS-1461
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
> 

[jira] [Work logged] (HDDS-1461) Optimize listStatus api in OzoneFileSystem

2019-04-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1461?focusedWorklogId=234435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234435
 ]

ASF GitHub Bot logged work on HDDS-1461:


Author: ASF GitHub Bot
Created on: 29/Apr/19 10:26
Start Date: 29/Apr/19 10:26
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #782: HDDS-1461. 
Optimize listStatus api in OzoneFileSystem
URL: https://github.com/apache/hadoop/pull/782
 
 
   Currently, listStatus makes multiple getFileStatus calls. This can be 
optimized by converting them into a single RPC call for listStatus.
   
   Also, listStatus currently has to traverse a directory recursively in order 
to list its immediate children. This happens because OzoneManager stores all 
metadata in RocksDB, sorted by key name. The Jira also aims to fix this by 
using the seek API provided by RocksDB.
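
The seek idea can be sketched independently of Ozone. The example below is a simplified illustration, not the PR's implementation: a `TreeSet<String>` stands in for the sorted key table, and `ceiling()` stands in for the iterator's seek. All names (`SeekListingSketch`, `listImmediateChildren`) are hypothetical, and keys are assumed to be plain ASCII paths in which directories end with `/`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SeekListingSketch {

  /** Lists the immediate children of dir, where dir ends with '/'. */
  static List<String> listImmediateChildren(TreeSet<String> keys, String dir) {
    List<String> children = new ArrayList<>();
    // ceiling() acts like a seek: it returns the first key >= its argument.
    String current = keys.ceiling(dir);
    while (current != null && current.startsWith(dir)) {
      if (current.equals(dir)) {
        // The directory entry itself is not one of its children.
        current = keys.higher(current);
        continue;
      }
      int slash = current.indexOf('/', dir.length());
      if (slash < 0) {
        // A file directly under dir.
        children.add(current);
        current = keys.higher(current);
      } else {
        // A key inside a child directory: record the directory once,
        // then jump past all of its descendants with a single seek.
        children.add(current.substring(0, slash + 1));
        // '0' is the character right after '/', so this bound sorts
        // after every key that starts with the child directory.
        String bound = current.substring(0, slash) + (char) ('/' + 1);
        current = keys.ceiling(bound);
      }
    }
    return children;
  }

  public static void main(String[] args) {
    TreeSet<String> keys = new TreeSet<>(Arrays.asList(
        "a/", "a/b/", "a/b/c", "a/b/d/e", "a/f", "a/g/h", "b/x"));
    System.out.println(listImmediateChildren(keys, "a/"));
  }
}
```

With the sample keys, the loop never visits `a/b/c` or `a/b/d/e`: after recording `a/b/` it seeks straight to `a/f`. That skipping is what removes the recursive traversal when only immediate children are needed.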
 



Issue Time Tracking
---

Worklog Id: (was: 234435)
Time Spent: 10m
Remaining Estimate: 0h

> Optimize listStatus api in OzoneFileSystem
> --
>
> Key: HDDS-1461
> URL: https://issues.apache.org/jira/browse/HDDS-1461
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, listStatus makes multiple getFileStatus calls. This can be 
> optimized by converting them into a single RPC call for listStatus.
> Also, listStatus currently has to traverse a directory recursively in order to 
> list its immediate children. This happens because OzoneManager stores all 
> metadata in RocksDB, sorted by key name. The Jira also aims to fix this by 
> using the seek API provided by RocksDB.


