[
https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093252#comment-15093252
]
Samarth Jain commented on PHOENIX-2143:
---------------------------------------
[~ankit.singhal] - I see that a lot of changes in your patch are white-space
related. Can you make sure you don't format the code that you are not changing.
Also, why is this deleted in StatisticsScannerCallable?
{code}
- if (mergeRegions != null) {
- if (mergeRegions.getFirst() != null) {
- if (LOG.isDebugEnabled()) {
- LOG.debug("Deleting stale stats for the region "
- +
mergeRegions.getFirst().getRegionNameAsString() + " as part of major
compaction");
- }
-
stats.deleteStats(mergeRegions.getFirst().getRegionName(), tracker, family,
mutations);
- }
- if (mergeRegions.getSecond() != null) {
- if (LOG.isDebugEnabled()) {
- LOG.debug("Deleting stale stats for the region "
- +
mergeRegions.getSecond().getRegionNameAsString() + " as part of major
compaction");
- }
-
stats.deleteStats(mergeRegions.getSecond().getRegionName(), tracker, family,
mutations);
- }
- }
{code}
In StatsCollectorIT we already have test that run UPDATE STATISTICS. Not sure
what the test that you added is testing that the existing tests are not.
I notice there is more deleted code in UngroupedAggregateRegionObserver:
{code}
- @Override
- public void postSplit(final ObserverContext<RegionCoprocessorEnvironment>
e, final Region l,
- final Region r) throws IOException {
- final Configuration config = e.getEnvironment().getConfiguration();
- boolean async = config.getBoolean(COMMIT_STATS_ASYNC,
DEFAULT_COMMIT_STATS_ASYNC);
- if (!async) {
- splitStatsInternal(e, l, r);
- } else {
- StatisticsCollectionRunTracker.getInstance(config).runTask(new
Callable<Void>() {
- @Override
- public Void call() throws Exception {
- splitStatsInternal(e, l, r);
- return null;
- }
- });
- }
- }
-
- private void splitStatsInternal(final
ObserverContext<RegionCoprocessorEnvironment> e,
- final Region l, final Region r) {
- final RegionCoprocessorEnvironment env = e.getEnvironment();
- final Region region = env.getRegion();
- final TableName table = region.getRegionInfo().getTable();
- try {
- boolean useCurrentTime =
-
env.getConfiguration().getBoolean(QueryServices.STATS_USE_CURRENT_TIME_ATTRIB,
-
QueryServicesOptions.DEFAULT_STATS_USE_CURRENT_TIME);
- // Provides a means of clients controlling their timestamps to not
use current time
- // when background tasks are updating stats. Instead we track the
max timestamp of
- // the cells and use that.
- final long clientTimeStamp = useCurrentTime ?
TimeKeeper.SYSTEM.getCurrentTime() :
- StatisticsCollector.NO_TIMESTAMP;
- User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
- @Override
- public Void run() throws Exception {
- StatisticsCollector stats = new StatisticsCollector(env,
- table.getNameAsString(), clientTimeStamp);
- try {
- stats.splitStats(region, l, r);
- return null;
- } finally {
- if (stats != null) stats.close();
- }
- }
- });
- } catch (IOException ioe) {
- if(logger.isWarnEnabled()) {
- logger.warn("Error while collecting stats during split for " +
table,ioe);
- }
- }
- }
{code}
Can you tell me more why is this not needed anymore? Are existing tests passing
for you?
> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
> Key: PHOENIX-2143
> URL: https://issues.apache.org/jira/browse/PHOENIX-2143
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Ankit Singhal
> Attachments: PHOENIX-2143.patch, PHOENIX-2143_wip.patch,
> PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the
> primary key constraint. Instead, we should use the MIN_KEY column (which
> corresponds to the region start key). The advantage would be that the stats
> would then be ordered by region start key allowing us to approximate the
> number of guideposts which would be traversed given the start/stop row of a
> scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND
> min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST
> operator for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats
> could simply be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to
> store the guide posts, but instead store them as individual rows with a
> schema like this.
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as
> you'd need to run a separate query to do the deletion of the old guideposts,
> followed by a commit of the new guideposts. The other disadvantage (besides
> requiring multiple queries) is that this couldn't be done transactionally.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)