[
https://issues.apache.org/jira/browse/PHOENIX-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149618#comment-14149618
]
James Taylor commented on PHOENIX-1296:
---------------------------------------
Thanks for the patch, [~ramkrishna]. Much of the work you're doing here is more
related to PHOENIX-1263. I'd attach this patch over there with some
modifications and/or combine it with this one. Take a look there - I'll add a
comment. Good catch on the empty key value not being processed by that loop.
The easiest way to get that to happen (rather than add an extra loop) is to add
the empty key value to our TABLE_KV_COLUMNS list here:
{code}
private static final KeyValue EMPTY_KEYVALUE_KV =
    KeyValue.createFirstOnRow(
        ByteUtil.EMPTY_BYTE_ARRAY,
        TABLE_FAMILY_BYTES,
        QueryConstants.EMPTY_COLUMN_BYTES);
private static final List<KeyValue> TABLE_KV_COLUMNS =
    Arrays.<KeyValue>asList(
        EMPTY_KEYVALUE_KV,
{code}
As far as your question:
bq. I don't think this we have to do. Consider there are two tenants that has
the Tenant ID as AZ and SZ (for example).
Yes, we do have to do this, but there's an easy way to do it (see below).
First, why do we have to do it? Along the same lines as your example, let's say
that tenants AY and SY both live in the same region. Let's say we have the
following guideposts for this region: AYd, AYm, SYg, SYt
Now if we allow the ANALYZE to run only between [AY - AZ), then we'll recompute
new guideposts only for this range: say AYc, AYj, AYm, AYv, and these would
replace the guideposts for this region. The SYg and SYt guideposts would be
inadvertently removed, because we didn't scan the entire region.
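To make the loss concrete, here is a small self-contained simulation. It is only a sketch: the row keys, the `collectGuideposts` helper, and the "every second row becomes a guidepost" rule are made up for illustration and are not how Phoenix actually computes stats. It just shows that whatever a range-limited scan collects is all that is left once it replaces the region's guideposts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class GuidepostLossDemo {
    /** Row keys of one hypothetical region holding tenants AY and SY. */
    static TreeSet<String> sampleRegion() {
        return new TreeSet<>(Arrays.asList(
            "AYa", "AYd", "AYj", "AYm", "SYb", "SYg", "SYp", "SYt"));
    }

    /**
     * Emits every interval-th row key seen in [start, stop) as a guidepost.
     * An empty string for start or stop means "unbounded" on that side.
     * (Illustrative rule only; real guideposts are based on bytes scanned.)
     */
    static List<String> collectGuideposts(TreeSet<String> rows, String start,
                                          String stop, int interval) {
        List<String> guideposts = new ArrayList<>();
        int seen = 0;
        for (String key : rows) {
            boolean afterStart = start.isEmpty() || key.compareTo(start) >= 0;
            boolean beforeStop = stop.isEmpty() || key.compareTo(stop) < 0;
            // Only rows inside the scan range are counted.
            if (afterStart && beforeStop && ++seen % interval == 0) {
                guideposts.add(key);
            }
        }
        return guideposts;
    }

    public static void main(String[] args) {
        TreeSet<String> region = sampleRegion();
        // Whole-region scan: guideposts for both tenants survive.
        System.out.println("full region scan: " + collectGuideposts(region, "", "", 2));
        // Scan limited to [AY, AZ): the result replaces the region's stats,
        // so the SY guideposts are gone.
        System.out.println("tenant-only scan: " + collectGuideposts(region, "AY", "AZ", 2));
    }
}
```

With these sample keys the full scan yields AYd, AYm, SYg, SYt, while the tenant-only scan yields just AYd, AYm.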
The fix is pretty easy, though. On the server-side, in the
UngroupedAggregateRegionObserver, in your if statement that detects an analyze
is being done, just always set the scan start/stop row to
HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW. This will force the scan
to go over the entire region when an analyze is done, which is exactly what we
want.
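The shape of that check can be sketched as follows. To keep it runnable without HBase on the classpath, `SimpleScan` and the two constants below are hypothetical stand-ins; in the coprocessor itself this would presumably operate on the real HBase Scan and the HConstants values, guarded by something like ScanUtil.isAnalyzeTable(scan).

```java
import java.nio.charset.StandardCharsets;

class AnalyzeScanBoundsDemo {
    // Stand-ins for HBase's empty start/stop row constants (zero-length arrays).
    static final byte[] EMPTY_START_ROW = new byte[0];
    static final byte[] EMPTY_END_ROW = new byte[0];

    /** Hypothetical minimal scan: a key range plus an "analyze" flag. */
    static class SimpleScan {
        byte[] startRow;
        byte[] stopRow;
        final boolean analyze;

        SimpleScan(String startRow, String stopRow, boolean analyze) {
            this.startRow = startRow.getBytes(StandardCharsets.UTF_8);
            this.stopRow = stopRow.getBytes(StandardCharsets.UTF_8);
            this.analyze = analyze;
        }
    }

    /** Widens an analyze scan to the whole region; other scans are left alone. */
    static SimpleScan widenIfAnalyze(SimpleScan scan) {
        if (scan.analyze) {
            scan.startRow = EMPTY_START_ROW;
            scan.stopRow = EMPTY_END_ROW;
        }
        return scan;
    }

    public static void main(String[] args) {
        SimpleScan analyze = widenIfAnalyze(new SimpleScan("AY", "AZ", true));
        SimpleScan query = widenIfAnalyze(new SimpleScan("AY", "AZ", false));
        // A zero-length start/stop row means "no bound on that side".
        System.out.println("analyze scan start length: " + analyze.startRow.length);
        System.out.println("regular scan start length: " + query.startRow.length);
    }
}
```

Doing the widening on the server side keeps the client free to narrow the scan to the tenant's key range for region selection, while still guaranteeing each touched region is scanned in full.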
> Scan entire region when tenant-specific table is analyzed
> ---------------------------------------------------------
>
> Key: PHOENIX-1296
> URL: https://issues.apache.org/jira/browse/PHOENIX-1296
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Attachments: Phoenix-1296_1.patch
>
>
> Based on the issue you've uncovered (that stats must be updated completely
> for a region), there's a bit of follow on work needed if an ANALYZE is done
> on a tenant-specific table. This case will be optimized to only scan and
> analyze the current tenant's data, however we have to make sure that the
> entire region(s) containing that tenant's data is scanned (or we'll end up
> replacing the stats for that region with just the one we calculated for that
> tenant).
> We should be able to do that based on ScanUtil.isAnalyzeTable(scan) being
> true in DefaultParallelIteratorRegionSplitter and/or ParallelIterators.