[ 
https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062546#comment-16062546
 ] 

Ethan Wang commented on PHOENIX-153:
------------------------------------

Valid point.

In addition, by design, this coarseness problem is magnified when any of the
following three things happens (and reduced when the opposite holds):
1. The table is too small.
2. The guidepost width is set too wide, or no stats have been collected at all.
3. The user specifies not to use the stats table for parallelization.
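The coarseness described above can be illustrated with a small simulation. This is a minimal sketch, assuming the sample is drawn by keeping or dropping whole guidepost-delimited chunks of rows, each independently with probability equal to the sampling rate. The function names `sample_chunks` and `mean_relative_error` and the chunk sizes are hypothetical and do not come from Phoenix. All three conditions listed reduce to "fewer, larger chunks": a small table has few chunks, a wide (or missing) guidepost width makes each chunk large, and not using stats presumably falls back to roughly one chunk per region.

```python
import random


def sample_chunks(total_rows, rows_per_chunk, rate, seed=0):
    """Hypothetical model (not Phoenix code): keep or drop each
    guidepost-delimited chunk as a whole, each with probability `rate`,
    and return how many rows end up in the sample."""
    rng = random.Random(seed)
    full_chunks, remainder = divmod(total_rows, rows_per_chunk)
    chunk_sizes = [rows_per_chunk] * full_chunks
    if remainder:
        chunk_sizes.append(remainder)  # last, partially filled chunk
    # The decision is all-or-nothing per chunk, so the sample size can only
    # move in steps of roughly one chunk.
    return sum(size for size in chunk_sizes if rng.random() < rate)


def mean_relative_error(total_rows, rows_per_chunk, rate, trials=200):
    """Average |sampled - expected| / expected over independent trials."""
    expected = total_rows * rate
    errors = [
        abs(sample_chunks(total_rows, rows_per_chunk, rate, seed=s) - expected) / expected
        for s in range(trials)
    ]
    return sum(errors) / trials


if __name__ == "__main__":
    # Same 400K-row table and 10% rate; only the chunk granularity changes.
    # rows_per_chunk == total_rows models a single chunk covering the table.
    for rows_per_chunk in (100, 4_000, 40_000, 400_000):
        err = mean_relative_error(400_000, rows_per_chunk, 0.10)
        print(f"rows per chunk = {rows_per_chunk:>7}: mean relative error = {err:.1%}")
```

For the same table and rate, the mean relative error should grow as the chunks get coarser, which is the effect the list above describes.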

Based on testing against a table with 400K rows and GUIDE_POSTS_WIDTH = 10KB or
200KB, the sampled size was usually within ±5% of the expected size. Accuracy
keeps improving as the guideposts used become more granular (a detailed chart is
attached).
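A back-of-the-envelope model explains why finer guideposts help. It is an illustrative assumption, not a formula derived from the attached measurements: suppose a table of N rows is split into n guidepost chunks of c rows each (N = nc), and each chunk is kept independently with probability p. The sampled size S is then c times a binomial count:

```latex
\begin{aligned}
S &= cK, \qquad K \sim \mathrm{Binomial}(n, p), \qquad N = nc \\
\mathbb{E}[S] &= c\,np = pN \\
\operatorname{Var}[S] &= c^2\,np(1-p) = c\,N\,p(1-p) \\
\frac{\sqrt{\operatorname{Var}[S]}}{\mathbb{E}[S]} &= \sqrt{\frac{c\,(1-p)}{N\,p}} = \sqrt{\frac{1-p}{n\,p}}
\end{aligned}
```

Under this model the relative spread shrinks like 1/√n: halving the guidepost width roughly doubles n and cuts the spread by about √2, while a small table or missing stats keeps n small. As a purely arithmetic illustration, at p = 0.1 a relative standard deviation of 5% requires (1 - p)/(np) = 0.0025, i.e. n = 3600 chunks.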

> Implement TABLESAMPLE clause
> ----------------------------
>
>                 Key: PHOENIX-153
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-153
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Ethan Wang
>              Labels: enhancement
>         Attachments: Sampling_Accuracy_Performance.jpg
>
>
> Support the standard SQL TABLESAMPLE clause by implementing a filter that 
> uses a skip next hint based on the region boundaries of the table to only 
> return n rows per region.
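The approach in the issue description (return only n rows per region, then skip ahead) can be modeled without HBase. This is a minimal sketch over an in-memory sorted list of row keys; `n_rows_per_region` and its parameters are hypothetical, not Phoenix or HBase API. The `bisect` jump to the next region's start key stands in for the skip-next hint; a real HBase filter would express that jump through its seek-hint mechanism (`SEEK_NEXT_USING_HINT`).

```python
import bisect


def n_rows_per_region(sorted_keys, region_starts, n):
    """Return at most n keys from each region.

    sorted_keys:   all row keys of the table, in sorted order.
    region_starts: sorted region start keys; the first one is the table's
                   first start key (e.g. '' for "beginning of table").
    """
    sampled = []
    pos = 0
    for r, start in enumerate(region_starts):
        # "Skip hint": seek directly to the first key >= this region's start
        # key instead of scanning the rows left over in the previous region.
        pos = bisect.bisect_left(sorted_keys, start, lo=pos)
        end = region_starts[r + 1] if r + 1 < len(region_starts) else None
        taken = 0
        # Emit rows until n are taken or the region boundary is reached.
        while (pos < len(sorted_keys) and taken < n
               and (end is None or sorted_keys[pos] < end)):
            sampled.append(sorted_keys[pos])
            pos += 1
            taken += 1
    return sampled


if __name__ == "__main__":
    keys = ["a", "b", "c", "m", "n", "x", "y", "z"]
    regions = ["", "k", "w"]  # three regions: [..k), [k..w), [w..]
    print(n_rows_per_region(keys, regions, 2))  # ['a', 'b', 'm', 'n', 'x', 'y']
```

Note that this per-region cap returns a fixed number of rows per region regardless of region size, which is different from sampling a fixed fraction of the table.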



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
