haridsv commented on code in PR #8236:
URL: https://github.com/apache/hbase/pull/8236#discussion_r3241243806
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionLocator.java:
##########
@@ -130,6 +131,44 @@ default List<HRegionLocation> getRegionLocations(byte[]
row) throws IOException
*/
List<HRegionLocation> getAllRegionLocations() throws IOException;
+ /**
+ * Bulk lookup of region locations from {@code hbase:meta} in a single RPC,
starting at
+ * {@code startKey} (region start-key boundary, inclusive) and returning at
most {@code limit}
+ * regions in start-key order.
+ * <p/>
+ * The returned list includes all replicas of each region (matching
+ * {@link #getAllRegionLocations()}), and the result is also written to the
connection's region
+ * location cache.
+ * <p/>
+ * Ordering: regions are returned in ascending region start-key order (the
natural order of
+ * {@code hbase:meta} rows for a single table). Within each region, replicas
are returned in
+ * ascending replica-id order (replica 0, then 1, then 2, ...). Split
parents and offline regions
+ * are filtered out, which may cause a page to contain fewer than {@code
limit} regions but never
Review Comment:
Where is this filtering happening? I didn't see any test coverage either.
The existing methods don't do any such filtering correct, so is this even
needed?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotRegionLocator.java:
##########
@@ -119,6 +119,12 @@ public List<HRegionLocation> getAllRegionLocations()
throws IOException {
return rawLocations;
}
+ @Override
+ public List<HRegionLocation> getRegionLocations(byte[] startKey, int limit)
throws IOException {
+ // No need to page as region locations are already in-memory.
+ return getAllRegionLocations();
+ }
Review Comment:
This is actually breaking the contract established by the javadoc by always
returning all regions.
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionLocator.java:
##########
@@ -130,6 +131,44 @@ default List<HRegionLocation> getRegionLocations(byte[]
row) throws IOException
*/
List<HRegionLocation> getAllRegionLocations() throws IOException;
+ /**
+ * Bulk lookup of region locations from {@code hbase:meta} in a single RPC,
starting at
+ * {@code startKey} (region start-key boundary, inclusive) and returning at
most {@code limit}
+ * regions in start-key order.
+ * <p/>
+ * The returned list includes all replicas of each region (matching
+ * {@link #getAllRegionLocations()}), and the result is also written to the
connection's region
+ * location cache.
+ * <p/>
+ * Ordering: regions are returned in ascending region start-key order (the
natural order of
+ * {@code hbase:meta} rows for a single table). Within each region, replicas
are returned in
+ * ascending replica-id order (replica 0, then 1, then 2, ...). Split
parents and offline regions
+ * are filtered out, which may cause a page to contain fewer than {@code
limit} regions but never
+ * disturbs ordering of the survivors.
+ * <p/>
+ * To page through all regions of a table, call repeatedly passing
+ * {@code last.getRegion().getEndKey()} as the next {@code startKey}, where
{@code last} is the
+ * final element of the previous response. All replicas of a region share
the same
+ * {@link RegionInfo}, so the last entry's end key is the correct cursor
regardless of which
+ * replica it is. Pass {@code null} for the first call. Stop paging when the
returned list is
+ * empty or when the last region's end key is {@link
HConstants#EMPTY_END_ROW} (zero-length) -
+ * that signals the end of the table; passing it back in would re-scan from
the beginning since by
+ * convention an empty start key means "from the first region".
+ * <p/>
+ * Unlike {@link #getAllRegionLocations()}, this method performs at most one
RPC against
+ * {@code hbase:meta} per invocation, so its latency is bounded by {@code
limit} rather than table
+ * size. Suitable for callers that wrap meta lookups in a lock with a fixed
timeout, e.g. for bulk
+ * region-cache warmup.
+ * @param startKey region start-key to begin scanning from (inclusive);
{@code null} or empty
+ * starts from the first region
+ * @param limit maximum number of regions to return; if <= 0, falls
back to
+ * {@code hbase.meta.scanner.caching}
+ * @return up to {@code limit} {@link HRegionLocation}s in start-key order,
possibly empty when no
+ * more regions exist
+ * @throws IOException if a remote or network exception occurs
+ */
+ List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws
IOException;
Review Comment:
Since this is changing a public interface on a stable branch, this will
break external code implementing the interface, why not have a default
implementation perhaps throw unsupported or return a blank list?
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionLocator.java:
##########
@@ -130,6 +131,44 @@ default List<HRegionLocation> getRegionLocations(byte[]
row) throws IOException
*/
List<HRegionLocation> getAllRegionLocations() throws IOException;
+ /**
+ * Bulk lookup of region locations from {@code hbase:meta} in a single RPC,
starting at
+ * {@code startKey} (region start-key boundary, inclusive) and returning at
most {@code limit}
+ * regions in start-key order.
+ * <p/>
+ * The returned list includes all replicas of each region (matching
+ * {@link #getAllRegionLocations()}), and the result is also written to the
connection's region
+ * location cache.
+ * <p/>
+ * Ordering: regions are returned in ascending region start-key order (the
natural order of
+ * {@code hbase:meta} rows for a single table). Within each region, replicas
are returned in
+ * ascending replica-id order (replica 0, then 1, then 2, ...). Split
parents and offline regions
+ * are filtered out, which may cause a page to contain fewer than {@code
limit} regions but never
+ * disturbs ordering of the survivors.
+ * <p/>
+ * To page through all regions of a table, call repeatedly passing
+ * {@code last.getRegion().getEndKey()} as the next {@code startKey}, where
{@code last} is the
+ * final element of the previous response. All replicas of a region share
the same
+ * {@link RegionInfo}, so the last entry's end key is the correct cursor
regardless of which
+ * replica it is. Pass {@code null} for the first call. Stop paging when the
returned list is
+ * empty or when the last region's end key is {@link
HConstants#EMPTY_END_ROW} (zero-length) -
+ * that signals the end of the table; passing it back in would re-scan from
the beginning since by
+ * convention an empty start key means "from the first region".
+ * <p/>
+ * Unlike {@link #getAllRegionLocations()}, this method performs at most one
RPC against
+ * {@code hbase:meta} per invocation, so its latency is bounded by {@code
limit} rather than table
+ * size. Suitable for callers that wrap meta lookups in a lock with a fixed
timeout, e.g. for bulk
+ * region-cache warmup.
+ * @param startKey region start-key to begin scanning from (inclusive);
{@code null} or empty
+ * starts from the first region
+ * @param limit maximum number of regions to return; if <= 0, falls
back to
+ * {@code hbase.meta.scanner.caching}
+ * @return up to {@code limit} {@link HRegionLocation}s in start-key order,
possibly empty when no
+ * more regions exist
+ * @throws IOException if a remote or network exception occurs
+ */
+ List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws
IOException;
Review Comment:
Also, this is overloading an existing methods that takes a data row to find
its locations, which is a totally different semantic and so reusing that name
for this can be very confusing. Perhaps just call this `getLocations` or may be
you can come up with a better name.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]