GitHub user ramkrish86 commented on a diff in the pull request:
https://github.com/apache/phoenix/pull/8#discussion_r16796576
--- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
@@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
// split each region in s splits such that:
// s = max(x) where s * x < t
//
- // The idea is to align splits with region boundaries. If rows are not evenly
- // distributed across regions, using this scheme compensates for regions that
- // have more rows than others, by applying tighter splits and therefore spawning
- // off more scans over the overloaded regions.
- int splitsPerRegion = getSplitsPerRegion(regions.size());
// Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
// that we keep each region server busy for each query.
- ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
+ int splitsPerRegion = getSplitsPerRegion(regions.size());
+ ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
if (splitsPerRegion == 1) {
for (HRegionLocation region : regions) {
--- End diff --
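The multimap in the diff exists so that key ranges can be handed out round robin, keeping every region server busy. A minimal sketch of that interleaving, assuming plain Strings stand in for HRegionLocation and KeyRange and a LinkedHashMap of lists replaces Guava's ListMultimap; RoundRobinSketch and roundRobin are hypothetical names, not Phoenix's actual code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinSketch {
    // Takes one range from each key (region) in turn until all ranges are used,
    // so consecutive entries in the result come from different regions.
    static <K, V> List<V> roundRobin(Map<K, List<V>> rangesPerRegion) {
        List<Iterator<V>> iterators = new ArrayList<>();
        int total = 0;
        for (List<V> ranges : rangesPerRegion.values()) {
            iterators.add(ranges.iterator());
            total += ranges.size();
        }
        List<V> result = new ArrayList<>(total);
        while (result.size() < total) {
            // One pass: at most one range from every region that still has some left.
            for (Iterator<V> it : iterators) {
                if (it.hasNext()) {
                    result.add(it.next());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for HRegionLocation keys and KeyRange values.
        Map<String, List<String>> rangesPerRegion = new LinkedHashMap<>();
        rangesPerRegion.put("region1", List.of("r1a", "r1b", "r1c"));
        rangesPerRegion.put("region2", List.of("r2a"));
        rangesPerRegion.put("region3", List.of("r3a", "r3b"));
        System.out.println(roundRobin(rangesPerRegion));
        // prints [r1a, r2a, r3a, r1b, r3b, r1c]
    }
}
```

Regions with more splits simply keep contributing after the smaller ones run out, so the tail of the list is dominated by the most heavily split region.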
Actually, in the current schema I have added the CF to the primary key, followed
by the stats columns (which include the min key, max key, and the guide posts), so
I can collect the information per CF. When we collect the guide posts it will be
per region anyway, and within each region we could group them per CF and write
them to the stats table. Would the stats table then have table name (NOT NULL),
region name, and CF name, followed by the min key, max key, and guide posts?
By using a VARBINARY ARRAY, the entire set of guide posts collected for that
region can be stored as one entry.
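A rough sketch of the per-region, per-CF grouping proposed above. Everything here is hypothetical: StatsCollectorSketch, StatsRow, and the fixed every-Nth-row guide post interval are illustrations only, Strings replace byte[] row keys, and a List<String> stands in for the single VARBINARY ARRAY value holding all guide posts of one region/CF:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StatsCollectorSketch {
    // One row of the proposed stats table: (tableName, regionName, cf) identify the row,
    // followed by min key, max key, and all guide posts as a single array value.
    static final class StatsRow {
        final String tableName;
        final String regionName;
        final String cf;
        final String minKey;
        final String maxKey;
        final List<String> guidePosts; // stands in for a VARBINARY ARRAY column

        StatsRow(String tableName, String regionName, String cf,
                 String minKey, String maxKey, List<String> guidePosts) {
            this.tableName = tableName;
            this.regionName = regionName;
            this.cf = cf;
            this.minKey = minKey;
            this.maxKey = maxKey;
            this.guidePosts = guidePosts;
        }

        @Override
        public String toString() {
            return tableName + "/" + regionName + "/" + cf + ": min=" + minKey
                    + " max=" + maxKey + " guidePosts=" + guidePosts;
        }
    }

    // Running statistics for one column family while a region is scanned.
    static final class CfStats {
        String minKey;
        String maxKey;
        long rowsSeen;
        final List<String> guidePosts = new ArrayList<>();
    }

    // Builds a (cf, rowKey) cell for the example input.
    static String[] cell(String cf, String rowKey) {
        return new String[] {cf, rowKey};
    }

    // Scans one region's cells in row-key order and emits one StatsRow per CF.
    // A guide post is recorded every `interval` rows of a CF; this fixed row count
    // is a hypothetical simplification of a size-based guide post width.
    static List<StatsRow> collect(String table, String region,
                                  List<String[]> cells, int interval) {
        Map<String, CfStats> perCf = new TreeMap<>();
        for (String[] c : cells) {
            CfStats s = perCf.computeIfAbsent(c[0], k -> new CfStats());
            if (s.minKey == null) {
                s.minKey = c[1]; // input is sorted, so the first key seen is the smallest
            }
            s.maxKey = c[1];     // and the last key seen is the largest
            s.rowsSeen++;
            if (s.rowsSeen % interval == 0) {
                s.guidePosts.add(c[1]);
            }
        }
        List<StatsRow> rows = new ArrayList<>();
        for (Map.Entry<String, CfStats> e : perCf.entrySet()) {
            CfStats s = e.getValue();
            rows.add(new StatsRow(table, region, e.getKey(),
                    s.minKey, s.maxKey, List.copyOf(s.guidePosts)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> cells = List.of(
                cell("cf1", "a"), cell("cf2", "a"), cell("cf1", "b"),
                cell("cf1", "c"), cell("cf2", "c"), cell("cf1", "d"));
        collect("T", "R1", cells, 2).forEach(System.out::println);
        // T/R1/cf1: min=a max=d guidePosts=[b, d]
        // T/R1/cf2: min=a max=c guidePosts=[c]
    }
}
```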
So if we are adding guide posts per CF, should PTableStats also accept a map
with the CF as the key and the guide posts as the value? Sorry for asking more
questions; I just want to make sure I understand.
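If PTableStats did take a map keyed by CF, one possible shape is sketched below. TableStatsSketch and its methods are hypothetical and not the PTableStats API. Because byte[] uses identity equals/hashCode, the sketch keys a TreeMap with Arrays.compareUnsigned (unsigned lexicographic order, matching HBase row-key ordering) rather than using a HashMap:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TableStatsSketch {
    // Unsigned lexicographic byte order, the order HBase uses for row keys.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // CF name -> guide posts for that CF, sorted by row key. byte[] has identity
    // equals/hashCode, so a comparator-based TreeMap is used instead of a HashMap.
    private final TreeMap<byte[], List<byte[]>> guidePostsByCf = new TreeMap<>(UNSIGNED);

    TableStatsSketch(Map<byte[], List<byte[]>> guidePosts) {
        for (Map.Entry<byte[], List<byte[]>> e : guidePosts.entrySet()) {
            List<byte[]> sorted = new ArrayList<>(e.getValue());
            sorted.sort(UNSIGNED);
            guidePostsByCf.put(e.getKey(), Collections.unmodifiableList(sorted));
        }
    }

    // Guide posts collected for one CF, or an empty list if there are none.
    List<byte[]> getGuidePosts(byte[] cf) {
        return guidePostsByCf.getOrDefault(cf, Collections.emptyList());
    }

    // Guide posts of one CF strictly between startKey and endKey; an empty
    // endKey means "no upper bound", as with the last region of a table.
    List<byte[]> guidePostsWithin(byte[] cf, byte[] startKey, byte[] endKey) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] gp : getGuidePosts(cf)) {
            boolean afterStart = UNSIGNED.compare(gp, startKey) > 0;
            boolean beforeEnd = endKey.length == 0 || UNSIGNED.compare(gp, endKey) < 0;
            if (afterStart && beforeEnd) {
                result.add(gp);
            }
        }
        return result;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<byte[], List<byte[]>> input = new TreeMap<>(UNSIGNED);
        input.put(bytes("cf1"), List.of(bytes("f"), bytes("b"), bytes("d")));
        input.put(bytes("cf2"), List.of(bytes("c")));
        TableStatsSketch stats = new TableStatsSketch(input);
        for (byte[] gp : stats.guidePostsWithin(bytes("cf1"), bytes("a"), bytes("e"))) {
            System.out.println(new String(gp, StandardCharsets.UTF_8));
        }
        // prints b, then d
    }
}
```

With a shape like this, a region splitter could ask for the guide posts of one CF that fall inside a region's key range and use them as split points, instead of a fixed number of splits per region.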