[jira] [Comment Edited] (HIVE-19985) ACID: Skip decoding the ROW__ID sections for read-only queries

Gopal V (JIRA) Fri, 03 Aug 2018 13:15:25 -0700


    [ https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568707#comment-16568707 ]


Gopal V edited comment on HIVE-19985 at 8/3/18 8:14 PM:
--------------------------------------------------------

+1 (orc pending release)

Tested with and without the flag set to true, using 
{{select count(1), sum(ss_net_profit) from store_sales;}}.

with:  31.383 seconds (cold run), 15.278 seconds (hot run)
without: 35.525 seconds (cold run),  22.934 seconds (hot run)

The latest patch has a big impact on the cached runs: the hot run drops from 
22.934 to 15.278 seconds.

I can see there's an L1 cache miss hotspot in the double System.arraycopy: once 
to make the AcidWrapper and then again to copy it back in copyBase().

{code}
if (isAcidScan) {
+        int acidColCount = acidReader.includeAcidColumns() ? OrcInputFormat.getRootColumn(false) - 1 : 0;
...
+          int ixInVrb = includes.getPhysicalColumnIds().get(ixInReadSet) -
+              (acidReader.includeAcidColumns() ? 0 : OrcRecordUpdater.ROW);
{code}

Can that be changed to {{if (isAcidScan && innerReader.includeAcidColumns())}} 
to skip that block entirely, since the offsets then fall back to the non-ACID 
implementation in the same way?
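
To make the offset arithmetic in the question concrete, here is a minimal standalone sketch. It is not the Hive code: the class name, the helper methods, and the sample column ids are hypothetical, and the constant ROW only stands in for OrcRecordUpdater.ROW (its value of 5 here is an assumption for illustration). It shows that the subtraction is a no-op when the ACID columns are included and a fixed shift when they are skipped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the index arithmetic quoted above; not Hive code.
final class AcidOffsetSketch {
    // Assumed stand-in for OrcRecordUpdater.ROW; the value is illustrative only.
    static final int ROW = 5;

    // Mirrors the quoted expression: the physical column id is shifted down
    // by ROW only when the ACID columns are NOT included in the read.
    static int indexInBatch(int physicalColumnId, boolean includeAcidColumns) {
        return physicalColumnId - (includeAcidColumns ? 0 : ROW);
    }

    // Applies the same mapping to every column id in a read set.
    static int[] mapAll(List<Integer> physicalIds, boolean includeAcidColumns) {
        return physicalIds.stream()
                .mapToInt(id -> indexInBatch(id, includeAcidColumns))
                .toArray();
    }

    public static void main(String[] args) {
        // Hypothetical physical column ids chosen for the example.
        List<Integer> ids = Arrays.asList(5, 6, 8);
        System.out.println(Arrays.toString(mapAll(ids, true)));  // ids unchanged
        System.out.println(Arrays.toString(mapAll(ids, false))); // ids shifted by ROW
    }
}
```

Under these assumptions the two branches differ only by a constant shift, which is the case the question hopes can be handled by the existing non-ACID path.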


> ACID: Skip decoding the ROW__ID sections for read-only queries 
> ---------------------------------------------------------------
>
>                 Key: HIVE-19985
>                 URL: https://issues.apache.org/jira/browse/HIVE-19985
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Gopal V
>            Assignee: Eugene Koifman
>            Priority: Major
>              Labels: Branch3Candidate
>         Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>
> For a base_n file there are no aborted transactions within the file and if 
> there are no pending delete deltas, the entire ACID ROW__ID can be skipped 
> for all read-only queries (i.e. SELECT), though it still needs to be projected 
> out for MERGE, UPDATE and DELETE queries.
> This patch tries to entirely ignore the ACID ROW__ID fields for all tables 
> where there are no possible deletes or aborted transactions for an ACID split.
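
The condition in the description can be read as a simple predicate. The sketch below is only an illustration of that reading: the class, the enum, and the parameter names are hypothetical and do not correspond to anything in the patch.

```java
// Hypothetical illustration of the skip condition described in the issue; not Hive code.
final class RowIdSkipSketch {
    enum QueryKind { SELECT, MERGE, UPDATE, DELETE }

    // ROW__ID decoding can be skipped only when all three conditions hold:
    // the split reads a base file, no delete deltas are pending, and the
    // query is read-only (SELECT). MERGE, UPDATE and DELETE still need ROW__ID.
    static boolean canSkipRowId(boolean isBaseFile, int pendingDeleteDeltas, QueryKind kind) {
        boolean readOnly = kind == QueryKind.SELECT;
        return isBaseFile && pendingDeleteDeltas == 0 && readOnly;
    }

    public static void main(String[] args) {
        System.out.println(canSkipRowId(true, 0, QueryKind.SELECT));
        System.out.println(canSkipRowId(true, 0, QueryKind.UPDATE));
        System.out.println(canSkipRowId(true, 2, QueryKind.SELECT));
    }
}
```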



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
