[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Wang updated PHOENIX-153:
-------------------------------
    Description:
Support the standard SQL TABLESAMPLE clause by implementing a filter that uses a skip next hint based on the region boundaries of the table to only return n rows per region.

When the TABLESAMPLE clause is used, Phoenix will sample (N) percent of the HBase table with only O(M) run time complexity. (N is size of table, M is size of stats)

[Update]
Usage:
https://phoenix.apache.org/tablesample.html

Syntax of using table sampling:
select * from PERSON TABLESAMPLE(45);
select count( * ) from PERSON TABLESAMPLE (49) LIMIT 2

Source Code:
https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=5e33dc12bc088bd0008d89f0a5cd7d5c368efa25

  was:
Support the standard SQL TABLESAMPLE clause by implementing a filter that uses a skip next hint based on the region boundaries of the table to only return n rows per region.

When the TABLESAMPLE clause is used, Phoenix will sample (N) percent of the HBase table with only O(M) run time complexity. (N is size of table, M is size of stats)

[Update]
Syntax of using table sampling:
select * from PERSON TABLESAMPLE(45);
select count( * ) from PERSON TABLESAMPLE (49) LIMIT 2

Source Code:
https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=5e33dc12bc088bd0008d89f0a5cd7d5c368efa25

> Implement TABLESAMPLE clause
> ----------------------------
>
>                 Key: PHOENIX-153
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-153
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Ethan Wang
>              Labels: enhancement
>             Fix For: 4.12.0
>
>         Attachments: Sampling_Accuracy_Performance.jpg
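
The O(M) claim in the description can be illustrated with a small sketch. This is not Phoenix's implementation: it assumes the stats are a sorted list of guidepost keys that split the table into chunks, and that each chunk is kept or skipped as a whole based on a hash of its start key. The names select_chunks and sample_rows, and the use of CRC32 as the hash, are hypothetical choices made for the example only.

```python
import bisect
import zlib


def select_chunks(guideposts, percent):
    """Pick which chunks to keep; one decision per guidepost, so O(M) work.

    guideposts: sorted list of chunk start keys (bytes); assumed input.
    percent:    sampling rate in the range 0..100.
    """
    kept = []
    for i, key in enumerate(guideposts):
        # Deterministic bucket in 0..99 derived from the chunk start key.
        # CRC32 is an assumption for this sketch, not Phoenix's hash.
        bucket = zlib.crc32(key) % 100
        if bucket < percent:
            kept.append(i)
    return kept


def sample_rows(rows, guideposts, percent):
    """Return the rows that fall inside the kept chunks.

    rows: sorted list of row keys (bytes). The first guidepost is assumed
    to be b"" so that every row belongs to exactly one chunk.
    """
    sampled = []
    for i in select_chunks(guideposts, percent):
        start = guideposts[i]
        lo = bisect.bisect_left(rows, start)
        if i + 1 < len(guideposts):
            hi = bisect.bisect_left(rows, guideposts[i + 1])
        else:
            hi = len(rows)  # last chunk runs to the end of the table
        # Skipped chunks are never touched; this plays the role of the
        # "skip next" hint in the description.
        sampled.extend(rows[lo:hi])
    return sampled


if __name__ == "__main__":
    rows = [f"row{i:04d}".encode() for i in range(1000)]
    guideposts = [b""] + [f"row{i:04d}".encode() for i in range(100, 1000, 100)]
    print(len(sample_rows(rows, guideposts, 45)), "of", len(rows), "rows sampled")
```

Because whole chunks are kept or dropped, the fraction of rows returned in this sketch only approximates the requested percentage, and the same rate always selects the same chunks.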