[jira] [Commented] (PHOENIX-1296) Scan entire region when tenant-specific table is analyzed

James Taylor (JIRA) Fri, 26 Sep 2014 09:43:15 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149618#comment-14149618 ]


James Taylor commented on PHOENIX-1296:
---------------------------------------

Thanks for the patch, [~ramkrishna]. Much of the work you're doing here is more 
closely related to PHOENIX-1263. I'd attach this patch over there with some 
modifications and/or combine it with this one. Take a look there; I'll add a 
comment. Good catch on the empty key value not being processed by that loop. 
The easiest way to get it processed (rather than adding an extra loop) is to 
add the empty key value to our TABLE_KV_COLUMNS list here:
{code}
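    // Empty key value for the table header row, in the same family as the other table columns
    private static final KeyValue EMPTY_KEYVALUE_KV =
        KeyValue.createFirstOnRow(
            ByteUtil.EMPTY_BYTE_ARRAY,
            TABLE_FAMILY_BYTES,
            QueryConstants.EMPTY_COLUMN_BYTES);
    // Adding it here lets the existing loop over TABLE_KV_COLUMNS process it too
    private static final List<KeyValue> TABLE_KV_COLUMNS = Arrays.<KeyValue>asList(
            EMPTY_KEYVALUE_KV,
            // ... (existing TABLE_KV_COLUMNS entries, unchanged)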
{code}
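To illustrate why putting the empty key value into TABLE_KV_COLUMNS avoids a second loop, here is a minimal, hypothetical model (not Phoenix's actual code). Qualifiers are plain strings, the column names other than the empty column are made up, and the empty column is assumed to be "_0". A single merge-style pass matches a row's sorted cells against the sorted list of expected columns, so any qualifier in that list, including the empty column, is picked up by the same loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KvColumnMatcher {
    // Hypothetical stand-in for QueryConstants.EMPTY_COLUMN_BYTES (assumed "_0").
    static final String EMPTY_COLUMN = "_0";

    // Hypothetical expected qualifiers; the empty column is just one more entry.
    static final List<String> TABLE_KV_COLUMNS =
            sorted(EMPTY_COLUMN, "COLUMN_COUNT", "TABLE_SEQ_NUM", "TABLE_TYPE");

    static List<String> sorted(String... qualifiers) {
        List<String> list = new ArrayList<>(Arrays.asList(qualifiers));
        list.sort(null); // natural lexicographic order, like sorted cell qualifiers
        return list;
    }

    /**
     * Single merge-style pass over a row's qualifiers (sorted).
     * Returns one slot per expected column: the matched qualifier, or null if absent.
     */
    static String[] match(List<String> rowQualifiers) {
        String[] slots = new String[TABLE_KV_COLUMNS.size()];
        int i = 0, j = 0;
        while (i < rowQualifiers.size() && j < TABLE_KV_COLUMNS.size()) {
            int cmp = rowQualifiers.get(i).compareTo(TABLE_KV_COLUMNS.get(j));
            if (cmp == 0) {        // expected column is present in the row
                slots[j++] = rowQualifiers.get(i++);
            } else if (cmp < 0) {  // cell is not an expected column: skip it
                i++;
            } else {               // expected column is missing from the row
                j++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        List<String> row = sorted("COLUMN_COUNT", "TABLE_TYPE", EMPTY_COLUMN, "OTHER");
        System.out.println(Arrays.toString(match(row))); // [COLUMN_COUNT, null, TABLE_TYPE, _0]
    }
}
```

Because the expected list is kept sorted, the empty column lands in its natural position and needs no special-case handling.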

As for your question:
bq. I don't think we have to do this. Consider two tenants that have the tenant IDs AZ and SZ (for example).

Yes, we do have to do this, but there's an easy way to do it (see below). 
First, why do we have to do it? Along the same lines as your example, let's say 
that tenants AY and SY both live in the same region, and that this region has 
the following guideposts: AYd, AYm, SYg, SYt.

Now if we allow the ANALYZE to scan only the range [AY, AZ), we'll recompute 
guideposts for that range alone, say AYc, AYj, AYm, AYv, and these would 
replace all of the guideposts for this region. The SYg and SYt guideposts would 
be inadvertently removed, because we didn't scan the entire region.
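The scenario above can be checked with a small, hypothetical model (not Phoenix code): guideposts are strings in a sorted set, and an ANALYZE replaces the region's whole guidepost set with whatever its scan produced. The helper below reports which pre-existing guideposts outside the scanned range are gone afterwards. The "full region" result is an assumed outcome, used only to contrast with the partial scan.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

final class GuidepostReplacement {
    /**
     * Guideposts that existed outside the scanned range [start, stop) before the
     * ANALYZE and are missing from the replacement set afterwards.
     */
    static NavigableSet<String> lostOutsideScan(NavigableSet<String> before,
                                                NavigableSet<String> after,
                                                String start, String stop) {
        NavigableSet<String> lost = new TreeSet<>(before);
        lost.removeAll(before.subSet(start, true, stop, false)); // keep only keys outside the scan
        lost.removeAll(after);                                    // drop the ones that survived
        return lost;
    }

    static NavigableSet<String> setOf(String... keys) {
        return new TreeSet<>(Arrays.asList(keys));
    }

    public static void main(String[] args) {
        NavigableSet<String> before = setOf("AYd", "AYm", "SYg", "SYt");
        // ANALYZE restricted to tenant AY, i.e. [AY, AZ): the region's stats become only these
        NavigableSet<String> partial = setOf("AYc", "AYj", "AYm", "AYv");
        System.out.println(lostOutsideScan(before, partial, "AY", "AZ")); // [SYg, SYt]

        // Assumed result of an ANALYZE over the entire region: SY guideposts are recomputed too
        NavigableSet<String> full = setOf("AYc", "AYj", "AYm", "AYv", "SYg", "SYt");
        System.out.println(lostOutsideScan(before, full, "AY", "AZ"));    // []
    }
}
```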

The fix is pretty easy, though. On the server side, in 
UngroupedAggregateRegionObserver, in your if statement that detects that an 
analyze is being done, just always set the scan start/stop row to 
HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW. This forces the scan 
to cover the entire region whenever an analyze is done, which is exactly what 
we want.
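A minimal sketch of that adjustment, written without the HBase dependency so it runs standalone. In real code it would presumably amount to scan.setStartRow(HConstants.EMPTY_START_ROW) and scan.setStopRow(HConstants.EMPTY_END_ROW) when the scan is an analyze (ScanUtil.isAnalyzeTable(scan), per the issue description). Here the ScanBounds class and the string row keys are hypothetical, and "" stands in for the empty byte arrays that mean "unbounded". Because a region scanner never reads past its region's boundaries, unbounded start/stop rows mean "this whole region".

```java
import java.util.Arrays;

final class AnalyzeScanBounds {
    // "" stands in for HConstants.EMPTY_START_ROW / EMPTY_END_ROW (empty byte arrays).
    static final String EMPTY_START_ROW = "";
    static final String EMPTY_END_ROW = "";

    /** Hypothetical, simplified stand-in for an HBase Scan's row bounds. */
    static final class ScanBounds {
        String startRow;
        String stopRow;
        ScanBounds(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /** Server-side adjustment: an analyze always covers the entire region. */
    static void adjustForAnalyze(ScanBounds scan, boolean isAnalyze) {
        if (isAnalyze) {
            scan.startRow = EMPTY_START_ROW;
            scan.stopRow = EMPTY_END_ROW;
        }
    }

    /** Rows actually read: the scan range clipped to the region [regionStart, regionEnd). */
    static String[] effectiveRange(ScanBounds scan, String regionStart, String regionEnd) {
        // An empty start sorts first, so the larger start is the effective one.
        String start = scan.startRow.compareTo(regionStart) >= 0 ? scan.startRow : regionStart;
        // An empty end means "no upper bound", so it must not win a plain min().
        String stop;
        if (scan.stopRow.isEmpty()) {
            stop = regionEnd;
        } else if (regionEnd.isEmpty()) {
            stop = scan.stopRow;
        } else {
            stop = scan.stopRow.compareTo(regionEnd) <= 0 ? scan.stopRow : regionEnd;
        }
        return new String[] {start, stop};
    }

    public static void main(String[] args) {
        // Region [AA, TA) holds both tenant AY and tenant SY.
        ScanBounds tenantScan = new ScanBounds("AY", "AZ");
        System.out.println(Arrays.toString(effectiveRange(tenantScan, "AA", "TA"))); // [AY, AZ]
        adjustForAnalyze(tenantScan, true);
        System.out.println(Arrays.toString(effectiveRange(tenantScan, "AA", "TA"))); // [AA, TA]
    }
}
```

Widening the bounds on the server keeps the client-side optimization of touching only the regions that hold the tenant's data, while guaranteeing that each touched region is analyzed completely.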


> Scan entire region when tenant-specific table is analyzed
> ---------------------------------------------------------
>
>                 Key: PHOENIX-1296
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1296
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1296_1.patch
>
>
> Based on the issue you've uncovered (that stats must be updated completely 
> for a region), there's a bit of follow-on work needed if an ANALYZE is done 
> on a tenant-specific table. This case will be optimized to only scan and 
> analyze the current tenant's data; however, we have to make sure that the 
> entire region(s) containing that tenant's data are scanned (or we'll end up 
> replacing the stats for that region with just the ones we calculated for that 
> tenant).
> We should be able to do that based on ScanUtil.isAnalyzeTable(scan) being 
> true in DefaultParallelIteratorRegionSplitter and/or ParallelIterators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
