[ https://issues.apache.org/jira/browse/HDDS-1986?focusedWorklogId=324797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324797 ]
ASF GitHub Bot logged work on HDDS-1986:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Oct/19 01:30
            Start Date: 08/Oct/19 01:30
    Worklog Time Spent: 10m
      Work Description: arp7 commented on pull request #1588: HDDS-1986. Fix listkeys API.
URL: https://github.com/apache/hadoop/pull/1588#discussion_r332303770

 ##########
 File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
 ##########
 @@ -680,26 +688,85 @@ public boolean isBucketEmpty(String volume, String bucket)
           seekPrefix = getBucketKey(volumeName, bucketName + OM_KEY_PREFIX);
         }
         int currentCount = 0;
 -    try (TableIterator<String, ? extends KeyValue<String, OmKeyInfo>> keyIter =
 -        getKeyTable()
 -            .iterator()) {
 -      KeyValue<String, OmKeyInfo> kv = keyIter.seek(seekKey);
 -      while (currentCount < maxKeys && keyIter.hasNext()) {
 -        kv = keyIter.next();
 -        // Skip the Start key if needed.
 -        if (kv != null && skipStartKey && kv.getKey().equals(seekKey)) {
 -          continue;
 +
 +
 +    TreeMap<String, OmKeyInfo> cacheKeyMap = new TreeMap<>();
 +    Set<String> deletedKeySet = new TreeSet<>();
 +    Iterator<Map.Entry<CacheKey<String>, CacheValue<OmKeyInfo>>> iterator =
 +        keyTable.cacheIterator();
 +
 +    //TODO: We can avoid this iteration if table cache has stored entries in
 +    // treemap. Currently HashMap is used in Cache. HashMap get operation is an
 +    // constant time operation, where as for treeMap get is log(n).
 +    // So if we move to treemap, the get operation will be affected. As get
 +    // is frequent operation on table. So, for now in list we iterate cache map
 +    // and construct treeMap which match with keyPrefix and are greater than or
 +    // equal to startKey. Later we can revisit this, if list operation
 +    // is becoming slow.
 +    while (iterator.hasNext()) {
 +      Map.Entry< CacheKey<String>, CacheValue<OmKeyInfo>> entry =
 +          iterator.next();
 +
 +      String key = entry.getKey().getCacheKey();
 +      OmKeyInfo omKeyInfo = entry.getValue().getCacheValue();
 +      // Making sure that entry in cache is not for delete key request.
 +
 +      if (omKeyInfo != null) {
 +        if (key.startsWith(seekPrefix) && key.compareTo(seekKey) >= 0) {
 +          cacheKeyMap.put(key, omKeyInfo);
           }
 +      } else {
 +        deletedKeySet.add(key);
 +      }
 +    }
 +
 +    // Get maxKeys from DB if it has.
 +
 +    try (TableIterator<String, ? extends KeyValue<String, OmKeyInfo>>
 +        keyIter = getKeyTable().iterator()) {
 +      KeyValue< String, OmKeyInfo > kv;
 +      keyIter.seek(seekKey);
 +      // we need to iterate maxKeys + 1 here because if skipStartKey is true,
 +      // we should skip that entry and return the result.
 +      while (currentCount < maxKeys + 1 && keyIter.hasNext()) {
 +        kv = keyIter.next();
           if (kv != null && kv.getKey().startsWith(seekPrefix)) {
 -          result.add(kv.getValue());
 -          currentCount++;
 +
 +          // Entry should not be marked for delete, consider only those
 +          // entries.
 +          if(!deletedKeySet.contains(kv.getKey())) {
 +            cacheKeyMap.put(kv.getKey(), kv.getValue());
 +            currentCount++;
 +          }
           } else {
             // The SeekPrefix does not match any more, we can break out of the
             // loop.
             break;
           }
         }
       }
 +
 +    // Finally DB entries and cache entries are merged, then return the count
 +    // of maxKeys from the sorted map.
 +    currentCount = 0;
 +
 +    for (Map.Entry<String, OmKeyInfo> cacheKey : cacheKeyMap.entrySet()) {

 Review comment:
 The second iteration is unfortunate. We should see if there is a way to avoid it.
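The review comment asks whether the second pass over `cacheKeyMap` can be avoided. One possible approach, sketched below in self-contained Java, is a two-way merge of the two sorted sources that stops as soon as `maxKeys` results have been collected. This is not the code from the PR, only an illustration under stated assumptions: `String` values stand in for `OmKeyInfo`, a `HashMap` whose `null` values mark pending deletes stands in for the table cache, a `TreeMap` stands in for the RocksDB key table, and the class and method names (`ListKeysMergeSketch`, `listKeys`, `nextWithPrefix`) are hypothetical. It also assumes that the cache holds the newer state, so a cache entry shadows a DB entry with the same key, and that `seekKey` itself starts with `seekPrefix`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: merges a pending-write cache with a sorted key table.
final class ListKeysMergeSketch {

  // cache: unsorted pending writes; a null value marks a pending delete.
  // db: sorted, flushed key table. Returns up to maxKeys keys under
  // seekPrefix, starting at seekKey (assumed to start with seekPrefix).
  static List<String> listKeys(Map<String, String> cache,
      NavigableMap<String, String> db, String seekPrefix, String seekKey,
      boolean skipStartKey, int maxKeys) {
    // Single pass over the unsorted cache. Tombstones are kept in the sorted
    // view so they can hide DB entries with the same key during the merge.
    TreeMap<String, String> cacheView = new TreeMap<>();
    for (Map.Entry<String, String> e : cache.entrySet()) {
      String key = e.getKey();
      if (key.startsWith(seekPrefix) && key.compareTo(seekKey) >= 0) {
        cacheView.put(key, e.getValue());
      }
    }

    Iterator<Map.Entry<String, String>> cacheIt =
        cacheView.entrySet().iterator();
    Iterator<Map.Entry<String, String>> dbIt =
        db.tailMap(seekKey, true).entrySet().iterator();
    Map.Entry<String, String> c = next(cacheIt);
    Map.Entry<String, String> d = nextWithPrefix(dbIt, seekPrefix);

    List<String> result = new ArrayList<>();
    // Two-way merge of sorted streams; stops once maxKeys keys are collected.
    while (result.size() < maxKeys && (c != null || d != null)) {
      int cmp = c == null ? 1
          : d == null ? -1
          : c.getKey().compareTo(d.getKey());
      Map.Entry<String, String> chosen;
      if (cmp <= 0) {
        // Cache entry comes first, or shadows a DB entry with the same key.
        chosen = c;
        if (cmp == 0) {
          d = nextWithPrefix(dbIt, seekPrefix);
        }
        c = next(cacheIt);
      } else {
        chosen = d;
        d = nextWithPrefix(dbIt, seekPrefix);
      }
      if (chosen.getValue() == null) {
        continue; // pending delete: hide the key
      }
      if (skipStartKey && chosen.getKey().equals(seekKey)) {
        continue; // skipped keys do not count toward maxKeys
      }
      result.add(chosen.getKey());
    }
    return result;
  }

  private static Map.Entry<String, String> next(
      Iterator<Map.Entry<String, String>> it) {
    return it.hasNext() ? it.next() : null;
  }

  // Keys are sorted, so the first key outside the prefix ends the DB scan,
  // like the break in the original loop.
  private static Map.Entry<String, String> nextWithPrefix(
      Iterator<Map.Entry<String, String>> it, String prefix) {
    if (!it.hasNext()) {
      return null;
    }
    Map.Entry<String, String> e = it.next();
    return e.getKey().startsWith(prefix) ? e : null;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> db = new TreeMap<>();
    db.put("/vol/bkt/a", "db-a");
    db.put("/vol/bkt/b", "db-b");
    db.put("/vol/bkt/d", "db-d");
    db.put("/vol/other/x", "db-x");
    Map<String, String> cache = new HashMap<>();
    cache.put("/vol/bkt/c", "cache-c"); // written but not yet flushed
    cache.put("/vol/bkt/b", null);      // deleted but not yet flushed
    // prints [/vol/bkt/a, /vol/bkt/c, /vol/bkt/d]
    System.out.println(listKeys(cache, db, "/vol/bkt/", "/vol/bkt/", false, 10));
  }
}
```

The cache still needs one pass to build a sorted view, which the TODO in the patch attributes to the cache being a HashMap; what the merge avoids is copying every DB entry into that map and then walking the combined map again. Tombstones and the skipped start key do not count toward `maxKeys`, which is the case the `maxKeys + 1` bound in the patch handles for the start key.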
Issue Time Tracking
-------------------

    Worklog Id:     (was: 324797)
    Time Spent: 1h 10m  (was: 1h)

> Fix listkeys API
> ----------------
>
>                 Key: HDDS-1986
>                 URL: https://issues.apache.org/jira/browse/HDDS-1986
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listKeys API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory
> cache and return the response, and later the double buffer thread picks it
> up and flushes it to disk. So when we do listKeys, it should use both the
> in-memory cache and the RocksDB key table to list the keys in a bucket.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)