[jira] [Updated] (PHOENIX-3023) Slow performance when limit queries are executed in parallel by default

Samarth Jain (JIRA) Fri, 24 Jun 2016 16:54:32 -0700

     [ 
https://issues.apache.org/jira/browse/PHOENIX-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Samarth Jain updated PHOENIX-3023:
----------------------------------
    Attachment: PHOENIX-3023_v2.patch

Thanks for the review, [~jamestaylor]. Attached is the updated patch. 

bq. if there's a WHERE clause (or even if the stats are out of date), you might 
need more than one scan. Is that case handled?

In this patch, I changed the logic so that we no longer initialize only the 
first scan when the query has a filter. 

Essentially these two checks:

{code}
private static boolean isAmountOfDataToScanWithinThreshold(StatementContext context,
        PTable table, Integer perScanLimit) throws SQLException {
        Scan scan = context.getScan();
        /*
         * If a limit is not provided or if we have a filter, then we are not able to
         * decide whether the amount of data we need to scan is less than the threshold.
         */
        if (perScanLimit == null || scan.getFilter() != null) {
            return false;
        }
{code}

{code}
boolean initFirstScanOnly =
        (orderBy == OrderBy.FWD_ROW_KEY_ORDER_BY || orderBy == OrderBy.REV_ROW_KEY_ORDER_BY)
                && isDataWithinThreshold;
{code}
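
To make it easier to see how the two checks combine, here is a minimal standalone sketch. It is not the patch code: the class name, the simplified Ordering enum, the hasFilter flag and the estimatedBytes/thresholdBytes parameters are placeholders I am assuming for illustration, and the actual threshold comparison in the patch is not shown in the excerpt above.

```java
final class ScanInitSketch {

    /** Simplified stand-in for Phoenix's OrderBy (hypothetical, illustration only). */
    enum Ordering { FWD_ROW_KEY, REV_ROW_KEY, OTHER }

    /**
     * Mirrors the first check: with no limit, or with a filter present, we
     * cannot tell how much data must be scanned, so we answer "not within threshold".
     * The byte comparison is an assumed placeholder for the real estimate.
     */
    static boolean isWithinThreshold(Integer perScanLimit, boolean hasFilter,
                                     long estimatedBytes, long thresholdBytes) {
        if (perScanLimit == null || hasFilter) {
            return false;
        }
        return estimatedBytes <= thresholdBytes;
    }

    /**
     * Mirrors the second check: only the first scan is initialized when the
     * query is ordered by row key (forward or reverse) and the data to scan
     * is within the threshold.
     */
    static boolean shouldInitFirstScanOnly(Ordering ordering, boolean isDataWithinThreshold) {
        boolean rowKeyOrdered =
                ordering == Ordering.FWD_ROW_KEY || ordering == Ordering.REV_ROW_KEY;
        return rowKeyOrdered && isDataWithinThreshold;
    }

    public static void main(String[] args) {
        // LIMIT 1, reverse row key order, no filter: eligible for first-scan-only init.
        boolean noFilter = isWithinThreshold(1, false, 10L, 100L);
        System.out.println(shouldInitFirstScanOnly(Ordering.REV_ROW_KEY, noFilter));

        // Same query with a WHERE clause (filter): all scans are initialized.
        boolean withFilter = isWithinThreshold(1, true, 10L, 100L);
        System.out.println(shouldInitFirstScanOnly(Ordering.REV_ROW_KEY, withFilter));
    }
}
```

In this sketch a filter, a missing limit, or a non-row-key ordering each independently rule out the first-scan-only path, which is the behaviour the two snippets above are meant to give.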

bq. Any consideration for if we're doing round robin and we don't call peek here?

We do round robin only when no ordering is needed, so initFirstScanOnly will 
never be true in that case.




> Slow performance when limit queries are executed in parallel by default
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-3023
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3023
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-3023.patch, PHOENIX-3023_v2.patch
>
>
> After 
> [this|https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=54362430d71be788d515944573572624628a09b6]
>  commit, limit queries are executed in parallel which causes performance to 
> be ~5-10x slower. Providing a serial hint fixes it though.
> After commit:
> {code}
> select * from WIDE_PK order by mypk DESC limit 1; // this takes ~400ms
> CLIENT 1280-CHUNK 1996304 ROWS 6380181208 BYTES PARALLEL 4-WAY REVERSE FULL 
> SCAN OVER WIDE_PK SERVER 1 ROW LIMIT CLIENT MERGE SORT CLIENT 1 ROW LIMIT
> {code}
> Before commit:
> {code}
> select * from WIDE_PK order by mypk DESC limit 1; // this takes ~40ms
> CLIENT 1280-CHUNK 1996304 ROWS 6380181208 BYTES SERIAL 4-WAY REVERSE FULL 
> SCAN OVER WIDE_PK SERVER 1 ROW LIMIT CLIENT MERGE SORT CLIENT 1 ROW LIMIT
> {code}
> Test was done on a single node machine running HBase 0.98.17, with 
> phoenix.stats.guidepost.width of 5000000. DDL used was 
> {code}
> CREATE TABLE WIDE_PK (MYPK CHAR(500) NOT NULL PRIMARY KEY,CF.column1 
> INTEGER,CF.column2 INTEGER,CF.column3 INTEGER,CF.column4 INTEGER,CF.column5 
> INTEGER) SALT_BUCKETS=4
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
