[ https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093252#comment-15093252 ]
Samarth Jain commented on PHOENIX-2143: --------------------------------------- [~ankit.singhal] - I see that a lot of changes in your patch are white-space related. Can you make sure you don't format the code that you are not changing. Also, why is this deleted in StatisticsScannerCallable? {code} - if (mergeRegions != null) { - if (mergeRegions.getFirst() != null) { - if (LOG.isDebugEnabled()) { - LOG.debug("Deleting stale stats for the region " - + mergeRegions.getFirst().getRegionNameAsString() + " as part of major compaction"); - } - stats.deleteStats(mergeRegions.getFirst().getRegionName(), tracker, family, mutations); - } - if (mergeRegions.getSecond() != null) { - if (LOG.isDebugEnabled()) { - LOG.debug("Deleting stale stats for the region " - + mergeRegions.getSecond().getRegionNameAsString() + " as part of major compaction"); - } - stats.deleteStats(mergeRegions.getSecond().getRegionName(), tracker, family, mutations); - } - } {code} In StatsCollectorIT we already have test that run UPDATE STATISTICS. Not sure what the test that you added is testing that the existing tests are not. I notice there is more deleted code in UngroupedAggregateRegionObserver: {code} - @Override - public void postSplit(final ObserverContext<RegionCoprocessorEnvironment> e, final Region l, - final Region r) throws IOException { - final Configuration config = e.getEnvironment().getConfiguration(); - boolean async = config.getBoolean(COMMIT_STATS_ASYNC, DEFAULT_COMMIT_STATS_ASYNC); - if (!async) { - splitStatsInternal(e, l, r); - } else { - StatisticsCollectionRunTracker.getInstance(config).runTask(new Callable<Void>() { - @Override - public Void call() throws Exception { - splitStatsInternal(e, l, r); - return null; - } - }); - } - } - - private void splitStatsInternal(final ObserverContext<RegionCoprocessorEnvironment> e, - final Region l, final Region r) { - final RegionCoprocessorEnvironment env = e.getEnvironment(); - final Region region = env.getRegion(); - final TableName table = region.getRegionInfo().getTable(); - try { - boolean useCurrentTime = - env.getConfiguration().getBoolean(QueryServices.STATS_USE_CURRENT_TIME_ATTRIB, - QueryServicesOptions.DEFAULT_STATS_USE_CURRENT_TIME); - // Provides a means of clients controlling their timestamps to not use current time - // when background tasks are updating stats. Instead we track the max timestamp of - // the cells and use that. - final long clientTimeStamp = useCurrentTime ? TimeKeeper.SYSTEM.getCurrentTime() : - StatisticsCollector.NO_TIMESTAMP; - User.runAsLoginUser(new PrivilegedExceptionAction<Void>() { - @Override - public Void run() throws Exception { - StatisticsCollector stats = new StatisticsCollector(env, - table.getNameAsString(), clientTimeStamp); - try { - stats.splitStats(region, l, r); - return null; - } finally { - if (stats != null) stats.close(); - } - } - }); - } catch (IOException ioe) { - if(logger.isWarnEnabled()) { - logger.warn("Error while collecting stats during split for " + table,ioe); - } - } - } {code} Can you tell me more why is this not needed anymore? Are existing tests passing for you? > Use guidepost bytes instead of region name in stats primary key > --------------------------------------------------------------- > > Key: PHOENIX-2143 > URL: https://issues.apache.org/jira/browse/PHOENIX-2143 > Project: Phoenix > Issue Type: Sub-task > Reporter: James Taylor > Assignee: Ankit Singhal > Attachments: PHOENIX-2143.patch, PHOENIX-2143_wip.patch, > PHOENIX-2143_wip_2.patch > > > Our current SYSTEM.STATS table uses the region name as the last column in the > primary key constraint. Instead, we should use the MIN_KEY column (which > corresponds to the region start key). The advantage would be that the stats > would then be ordered by region start key allowing us to approximate the > number of guideposts which would be traversed given the start/stop row of a > scan: > {code} > SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND > min_key < :2 > {code} > where :1 is the start row and :2 is the stop row of the scan. With an UNNEST > operator for ARRAYs, we could get a better approximation. > As part of the upgrade to the new Phoenix version containing this fix, stats > could simply be dropped and they'd be recalculated with the new schema. > An alternative, even more granular approach would be to *not* use arrays to > store the guide posts, but instead store them as individual rows with a > schema like this. > |PHYSICAL_NAME|VARCHAR| > |COLUMN_FAMILY|VARCHAR| > |GUIDE_POST_KEY|VARBINARY| > In this alternative, the maintenance during compaction is higher, though, as > you'd need to run a separate query to do the deletion of the old guideposts, > followed by a commit of the new guideposts. The other disadvantage (besides > requiring multiple queries) is that this couldn't be done transactionally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)