[
https://issues.apache.org/jira/browse/PHOENIX-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
alex updated PHOENIX-846:
-------------------------
Description:
When running SELECT DISTINCT with LIMIT, Phoenix does a full scan and
aggregation (no PageFilter/limit is used on the server side).
This severely affects performance (the query returns in 20 sec vs. 300 ms
without DISTINCT).
: jdbc:phoenix:localhost> explain select DISTINCT ROWKEY from TEST_1M LIMIT 100;
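+---------------------------------------------------------+
| PLAN                                                    |
+---------------------------------------------------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M           |
| SERVER FILTER BY FIRST KEY ONLY                         |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [ROWKEY] |
| CLIENT MERGE SORT                                       |
| CLIENT 100 ROW LIMIT                                    |
+---------------------------------------------------------+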
-------------------------------------------------
For comparison, SELECT without DISTINCT uses a PageFilter limit of 100 on the
server side and doesn't do a full scan (the query returns in 300 ms):
explain select ROWKEY from TEST_1M LIMIT 100;
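+----------------------------------------------------+
| PLAN                                               |
+----------------------------------------------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M      |
| SERVER FILTER BY FIRST KEY ONLY AND PageFilter 100 |
| CLIENT MERGE SORT                                  |
| CLIENT 100 ROW LIMIT                               |
+----------------------------------------------------+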
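
To illustrate why the missing server-side limit matters, here is a minimal
Python sketch. It is not Phoenix code, and the names make_region, CountingScan,
query_full and query_pushdown are hypothetical. It assumes each region returns
rows in row-key order (as the ORDERED DISTINCT step in the plan suggests), so
a region only has to produce its first LIMIT distinct keys before the client
merge, instead of aggregating every row.

```python
import heapq
from itertools import groupby, islice


def make_region(start, stop, dup=3):
    """Hypothetical region: sorted row keys, each repeated `dup` times."""
    return [key for key in range(start, stop) for _ in range(dup)]


class CountingScan:
    """Iterates over a region's rows and counts how many were read."""

    def __init__(self, rows):
        self.rows = rows
        self.read = 0

    def __iter__(self):
        for row in self.rows:
            self.read += 1
            yield row


def distinct_sorted(rows):
    """Distinct over key-ordered input: duplicates are always adjacent."""
    return (key for key, _ in groupby(rows))


def query_full(regions, limit):
    """Mimics the reported plan: aggregate every row, limit on the client."""
    scans = [CountingScan(r) for r in regions]
    per_region = [list(distinct_sorted(s)) for s in scans]
    merged = distinct_sorted(heapq.merge(*per_region))
    return list(islice(merged, limit)), sum(s.read for s in scans)


def query_pushdown(regions, limit):
    """Assumed alternative: each region stops after `limit` distinct keys."""
    scans = [CountingScan(r) for r in regions]
    per_region = [list(islice(distinct_sorted(s), limit)) for s in scans]
    merged = distinct_sorted(heapq.merge(*per_region))
    return list(islice(merged, limit)), sum(s.read for s in scans)


if __name__ == "__main__":
    regions = [make_region(i * 1000, (i + 1) * 1000) for i in range(4)]
    rows_a, read_a = query_full(regions, 100)
    rows_b, read_b = query_pushdown(regions, 100)
    print("same result:", rows_a == rows_b)
    print("rows read without pushdown:", read_a)
    print("rows read with pushdown:", read_b)
```

Both functions return the same 100 keys, but the second reads only a small
prefix of each region. The early stop is safe in this sketch only because the
DISTINCT column is the sort key; for an unordered column each region would
still need to see all of its rows.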
was:
When running SELECT DISTINCT with LIMIT, Phoenix does a full scan on the
server side (no PageFilter/limit is used on the server side).
This severely affects performance (the query returns in 20 sec vs. 300 ms
without DISTINCT).
: jdbc:phoenix:localhost> explain select DISTINCT ROWKEY from TEST_1M LIMIT 100;
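+---------------------------------------------------------+
| PLAN                                                    |
+---------------------------------------------------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M           |
| SERVER FILTER BY FIRST KEY ONLY                         |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [ROWKEY] |
| CLIENT MERGE SORT                                       |
| CLIENT 100 ROW LIMIT                                    |
+---------------------------------------------------------+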
-------------------------------------------------
For comparison, SELECT without DISTINCT uses a PageFilter limit of 100 on the
server side and doesn't do a full scan (the query returns in 300 ms):
explain select ROWKEY from TEST_1M LIMIT 100;
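+----------------------------------------------------+
| PLAN                                               |
+----------------------------------------------------+
| CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M      |
| SERVER FILTER BY FIRST KEY ONLY AND PageFilter 100 |
| CLIENT MERGE SORT                                  |
| CLIENT 100 ROW LIMIT                               |
+----------------------------------------------------+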
> Select DISTINCT with LIMIT does full scans
> ------------------------------------------
>
> Key: PHOENIX-846
> URL: https://issues.apache.org/jira/browse/PHOENIX-846
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 3.0.0, 4.0.0
> Reporter: alex
> Priority: Critical