[ https://issues.apache.org/jira/browse/BEAM-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300705#comment-16300705 ]
ASF GitHub Bot commented on BEAM-3154:
--------------------------------------

rniemo-g opened a new pull request #4312: [BEAM-3154] Support Multiple KeyRanges when reading from BigTable
URL: https://github.com/apache/beam/pull/4312

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [x] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [x] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

Support using multiple key ranges when reading from BigTable. This is useful for applications that want to read non-contiguous keys in one BigTable query.

First, this PR tweaks BigTableSource's existing methods that split and estimate size based on samples so that they work with multiple key ranges. These are relatively simple extensions: the previous logic for splitting one key range is applied to each key range, with some edge cases. For example, when estimating the size of the ranges based on samples, even if two ranges overlap a given sample range, we only want to add the sample range's size to the estimate once.

Then, a range tracker for multiple key ranges is defined as ByteKeyRangesTracker. This tracker sorts the list of key ranges on instantiation and operates on them much as ByteKeyRangeTracker operates on its single range. The two classes behave similarly enough that an abstract base class was created to share functionality. A key difference is that the multi-range tracker has a method to interpolate a key across its multiple ranges. The BigTableReader uses this multi-range tracker when splitting its source into primary and residual parts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Support multiple KeyRanges when reading from BigTable
> -----------------------------------------------------
>
>                 Key: BEAM-3154
>                 URL: https://issues.apache.org/jira/browse/BEAM-3154
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-gcp
>            Reporter: Ryan Niemocienski
>            Assignee: Solomon Duskis
>            Priority: Minor
>
> BigTableIO.Read currently only supports reading one KeyRange from BT. It
> would be nice to read multiple ranges from BigTable in one read. Thoughts on
> the feasibility of this before I dig into it?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
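
The size-estimation edge case mentioned in the pull request description (count a sample range once even when several requested key ranges overlap it) can be illustrated with a minimal sketch. This is not the code from the pull request: `MultiRangeSizeEstimate`, `KeyRange`, `Sample`, and `estimateSizeBytes` are hypothetical names, and keys are modeled as plain strings rather than Beam's byte keys to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Beam.
class MultiRangeSizeEstimate {

  /** Half-open key range [start, end). An empty end key means "unbounded". */
  static final class KeyRange {
    final String start;
    final String end;

    KeyRange(String start, String end) {
      this.start = start;
      this.end = end;
    }

    /** True when the two half-open ranges share at least one key. */
    boolean overlaps(KeyRange other) {
      boolean startsBeforeOtherEnds = other.end.isEmpty() || start.compareTo(other.end) < 0;
      boolean otherStartsBeforeEnd = end.isEmpty() || other.start.compareTo(end) < 0;
      return startsBeforeOtherEnds && otherStartsBeforeEnd;
    }
  }

  /** A sampled slice of the table together with its approximate size. */
  static final class Sample {
    final KeyRange range;
    final long sizeBytes;

    Sample(KeyRange range, long sizeBytes) {
      this.range = range;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Sums the sizes of all sample ranges touched by at least one requested
   * range. Iterating over samples in the outer loop and stopping at the
   * first overlapping request ensures each sample is added at most once.
   */
  static long estimateSizeBytes(List<Sample> samples, List<KeyRange> requested) {
    long total = 0;
    for (Sample sample : samples) {
      for (KeyRange range : requested) {
        if (sample.range.overlaps(range)) {
          total += sample.sizeBytes;
          break; // this sample is already counted; skip remaining requests
        }
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<Sample> samples = Arrays.asList(
        new Sample(new KeyRange("a", "g"), 10L),
        new Sample(new KeyRange("g", "n"), 20L),
        new Sample(new KeyRange("n", ""), 30L));
    // Both requested ranges fall inside the first sample, so it counts once.
    List<KeyRange> requested = Arrays.asList(new KeyRange("b", "c"), new KeyRange("d", "e"));
    System.out.println(estimateSizeBytes(samples, requested)); // prints 10
  }
}
```

Looping over samples first, rather than over requested ranges, is what keeps the estimate from double counting. A per-range loop that summed overlapping samples would report 20 bytes for the input in `main`.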