Github user JamesRTaylor commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/3#discussion_r14899559
--- Diff:
phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java
---
@@ -140,7 +142,14 @@ public boolean apply(HRegionLocation location) {
// distributed across regions, using this scheme compensates for regions that
// have more rows than others, by applying tighter splits and therefore spawning
// off more scans over the overloaded regions.
- int splitsPerRegion = regions.size() >= targetConcurrency ? 1 : (regions.size() > targetConcurrency / 2 ? maxConcurrency : targetConcurrency) / regions.size();
+ PTable table = tableRef.getTable();
--- End diff --
That's what the splitsPerRegion variable and the subsequent logic in
ParallelIterators do: they create additional split points within the range so
that multiple scans get run over a single region. We'd want to prefix each of
these split points with the same start region key. I'll open a separate JIRA
for this too. It's not a big deal, since most of the time the parallelization
slots would be used up by having to do a scan in each region anyway.
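
For reference, here is a rough sketch of the arithmetic being discussed. The
splitsPerRegion method mirrors the expression on the removed line of the diff
above; the replacement logic is not shown in this hunk, so it is not
reproduced here. The class name SplitsPerRegionSketch and the splitPoints
helper are hypothetical, and splitPoints uses plain long keys rather than
byte[] row keys, so it only illustrates the idea of evenly spaced split
points within one region's range, not what ParallelIterators actually does.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Phoenix.
final class SplitsPerRegionSketch {

    // Mirrors the removed expression from the diff: one scan per region once
    // there are at least targetConcurrency regions; otherwise divide a
    // concurrency budget evenly across the regions (integer division).
    static int splitsPerRegion(int regionCount, int targetConcurrency, int maxConcurrency) {
        if (regionCount >= targetConcurrency) {
            return 1;
        }
        int budget = regionCount > targetConcurrency / 2 ? maxConcurrency : targetConcurrency;
        return budget / regionCount;
    }

    // Assumption: keys are modeled as longs for simplicity. Returns the
    // (splits - 1) interior points that cut [start, end) into `splits` parts.
    static long[] splitPoints(long start, long end, int splits) {
        if (splits < 1 || end < start) {
            throw new IllegalArgumentException("need splits >= 1 and end >= start");
        }
        long[] points = new long[splits - 1];
        for (int i = 1; i < splits; i++) {
            points[i - 1] = start + (end - start) * i / splits;
        }
        return points;
    }

    public static void main(String[] args) {
        int splits = splitsPerRegion(4, 8, 16);
        System.out.println("splits per region: " + splits);
        System.out.println("split points: " + Arrays.toString(splitPoints(0, 100, splits)));
    }
}
```

With few regions each region gets several scans, and once the region count
reaches targetConcurrency each region gets exactly one, which is the case
where the slots are already used up by the per-region scans.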