[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151751#comment-16151751 ]

hanzhi commented on PHOENIX-153:

Awesome!!

> Implement TABLESAMPLE clause
>
> Key: PHOENIX-153
> URL: https://issues.apache.org/jira/browse/PHOENIX-153
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
> Assignee: Ethan Wang
> Labels: enhancement
> Fix For: 4.12.0
> Attachments: Sampling_Accuracy_Performance.jpg
>
> Support the standard SQL TABLESAMPLE clause by implementing a filter that
> uses a skip next hint based on the region boundaries of the table to only
> return n rows per region.
> [Update]
> Source Code Patch:
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=5e33dc12bc088bd0008d89f0a5cd7d5c368efa25

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141232#comment-16141232 ]

Lars Hofhansl commented on PHOENIX-153:

Nice job [~aertoria]!
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110284#comment-16110284 ]

Ethan Wang commented on PHOENIX-153:

Thanks [~jamestaylor]!
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109722#comment-16109722 ]

Hudson commented on PHOENIX-153:

FAILURE: Integrated in Jenkins build Phoenix-master #1725 (See [https://builds.apache.org/job/Phoenix-master/1725/])
PHOENIX-153 Implement TABLESAMPLE clause (Ethan Wang) (jamestaylor: rev 5e33dc12bc088bd0008d89f0a5cd7d5c368efa25)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/FilterableStatement.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java
* (add) phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSamplerPredicate.java
* (add) phoenix-core/src/it/java/org/apache/phoenix/end2end/QueryWithTableSampleIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/compile/JoinCompiler.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/SelectStatement.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/DeleteStatement.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/ConcreteTableNode.java
* (edit) phoenix-core/src/main/antlr3/PhoenixSQL.g
* (edit) phoenix-core/src/main/java/org/apache/phoenix/parse/NamedTableNode.java
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109519#comment-16109519 ]

James Taylor commented on PHOENIX-153:

+1. Nice work, [~aertoria]. Will let you know if patch doesn't apply cleanly to other 4.x branches.
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109342#comment-16109342 ]

ASF GitHub Bot commented on PHOENIX-153:

Github user aertoria commented on the issue:

https://github.com/apache/phoenix/pull/262

Commit 0507d4f change list:
1, explain plan done a new way (thanks for the suggestion)
2, squashed the previous four commits into one
3, revised all commit messages to start with PHOENIX-153+space

Preview on a single select:

`CLIENT 3-CHUNK 30 ROWS 2370 BYTES PARALLEL 1-WAY 0.2-SAMPLED ROUND ROBIN FULL SCAN OVER PERSON`

On a join select:

```
CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY 0.65-SAMPLED ROUND ROBIN FULL SCAN OVER INX_ADDRESS_PERSON
    SERVER FILTER BY FIRST KEY ONLY
    PARALLEL INNER-JOIN TABLE 0
        CLIENT 1-CHUNK 1 ROWS 32 BYTES PARALLEL 1-WAY 0.15-SAMPLED ROUND ROBIN FULL SCAN OVER US_POPULATION
    DYNAMIC SERVER FILTER BY TO_CHAR("INX_ADDRESS_PERSON.0:ADDRESS") IN (US_POPULATION.STATE)
```
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108408#comment-16108408 ]

Ethan Wang commented on PHOENIX-153:

Makes sense. Thanks [~jamestaylor]
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108390#comment-16108390 ]

James Taylor commented on PHOENIX-153:

Seems like review comments aren't appearing here in JIRA (maybe because your commit message doesn't include the JIRA number in the expected format), so I'll repeat it here:

Let's move the explain for the sampling into the first line, before we recurse down for the other steps. You can put it on the same line, after the "-WAY " like this:

CLIENT PARALLEL 1-WAY 0.48-SAMPLED ...

Otherwise, users will interpret the sampling as happening after the scan/filtering, which isn't the case.
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108354#comment-16108354 ]

ASF GitHub Bot commented on PHOENIX-153:

Github user JamesRTaylor commented on the issue:

https://github.com/apache/phoenix/pull/262

Ping @aertoria - would you have a few spare cycles to make that last change? Also, please squash all commits into one and amend your commit message to be prefixed with PHOENIX-153 (i.e. include the dash). Otherwise, the pull request isn't tied to the JIRA.
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071041#comment-16071041 ]

Lars Hofhansl commented on PHOENIX-153:

The default guidepost width is 300MB. Maybe we could go down to 10MB, once we have guidepost combining. Less than that will be a huge management burden to the system.

Still a good thing to do! On small tables you do not need to sample in the first place, and for large tables - where it matters - we'll have sufficiently many guideposts. (A 1TB table has over 3000 300MB guideposts, i.e. you'll have a resolution of 0.03%, which is plenty good!)
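The back-of-the-envelope estimate in the comment above can be checked with a few lines of arithmetic. This is only a sketch: it assumes binary units (1TB = 1024^4 bytes), that every guidepost covers exactly the configured width, and that "resolution" means the share of the table covered by one guidepost. The helper names are made up for illustration and are not Phoenix APIs.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def guidepost_count(table_bytes: int, guidepost_width_bytes: int) -> int:
    """Number of full guidepost-sized chunks in a table (assumes uniform width)."""
    return table_bytes // guidepost_width_bytes


def resolution_percent(count: int) -> float:
    """Share of the table covered by one guidepost, as a percentage."""
    return 100.0 / count


if __name__ == "__main__":
    # Compare the default width mentioned in the thread with the proposed 10MB.
    for width_mb in (300, 10):
        count = guidepost_count(1 * TB, width_mb * MB)
        print(f"{width_mb}MB guideposts: {count} chunks, "
              f"~{resolution_percent(count):.4f}% of the table each")
```

Under these assumptions a 1TB table has 3495 guideposts of 300MB, which matches the "over 3000" and roughly 0.03% figures quoted above.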
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062546#comment-16062546 ]

Ethan Wang commented on PHOENIX-153:

Valid point. In addition, by design, this coarseness problem gets magnified when three things happen (and vice versa):
1, The table is too small
2, The guidepost width is set too wide, or no stats are collected at all
3, The user specifies not to use the stats table for parallelization.

Based on observations from testing on a table with 400K rows and GUIDE_POSTS_WIDTH = 10KB or 200KB, the sampled size was usually within ±5% of the expected size. This accuracy gets better and better as the GuidePosts used become more granular (detailed chart attached).
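One rough way to see why more granular guideposts should tighten the sampled size is a simple binomial model. This is an assumption for illustration, not what the patch or the attached chart measures: suppose the table splits into n equal-sized guidepost chunks and each chunk is kept independently with probability p. The sampled fraction then has mean p and standard deviation sqrt(p(1-p)/n), so the error relative to the expected size shrinks like 1/sqrt(n). The function name below is hypothetical.

```python
import math


def relative_std_error(sample_rate: float, guidepost_count: int) -> float:
    """Relative standard deviation of the sampled fraction.

    Assumes equal-sized guidepost chunks, each included independently with
    probability sample_rate (a simplified model, not Phoenix's actual logic).
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    if guidepost_count <= 0:
        raise ValueError("guidepost_count must be positive")
    # sqrt(p * (1 - p) / n) / p simplifies to sqrt((1 - p) / (p * n)).
    return math.sqrt((1.0 - sample_rate) / (sample_rate * guidepost_count))


if __name__ == "__main__":
    # More guideposts (finer granularity) give a smaller relative error.
    for n in (10, 100, 1000, 10000):
        print(n, relative_std_error(0.5, n))
```

The model also reflects the first point in the list above: a small table has few guideposts, so n is small and the relative error is large.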
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062143#comment-16062143 ]

Lars Hofhansl commented on PHOENIX-153:

Good idea. Skipping whole guideposts is pretty coarse, though. At the same time I cannot think of anything else efficient.
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061761#comment-16061761 ]

James Taylor commented on PHOENIX-153:

Yes, +1 to following Calcite syntax
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061645#comment-16061645 ]

Ethan Wang commented on PHOENIX-153:

+1. After some study of _calcite/parse.jj_ and _calcite/SqlValidatorFeatureTest.java_, my understanding is that Calcite seems to be very close to the Postgres TABLESAMPLE syntax (which PHOENIX-153 is also designed to be similar to). I'd like to sum up two differences below (please correct me if I'm mistaken [~julianhyde]).

1, Calcite's table sampling rate input is 0 to 100 (PHOENIX-153 currently is 0 to 1).
2, Syntax difference
Calcite: select name from dept TABLESAMPLE system(58)
PHOENIX-153: select name from dept TABLESAMPLE 0.58

Proposed change for PHOENIX-153:
Let's change the Phoenix side to be
select name from dept TABLESAMPLE(0.58)

Thoughts?

Reference:
https://github.com/apache/calcite/blob/d619304070bf2874ab760c92ec2573ee6c19f536/piglet/src/main/javacc/PigletParser.jj
https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/test/java/org/apache/calcite/test/SqlValidatorFeatureTest.java
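To make the first difference concrete, here is a small sketch of mapping a Calcite-style percentage (0 to 100) onto the 0-to-1 rate that PHOENIX-153 used at this point in the discussion. Both helpers are hypothetical and exist only for illustration; the clause text they build simply mirrors the `TABLESAMPLE(0.58)` form proposed in the comment above and says nothing about what the final grammar accepts.

```python
def calcite_percent_to_phoenix_rate(percent: float) -> float:
    """Convert a 0-100 sampling percentage into a 0-1 sampling rate."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100


def phoenix_tablesample_clause(rate: float) -> str:
    """Render the parenthesized form proposed in the thread (illustrative only)."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return f"TABLESAMPLE({rate})"


if __name__ == "__main__":
    rate = calcite_percent_to_phoenix_rate(58)
    print(f"select name from dept {phoenix_tablesample_clause(rate)}")
```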
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052505#comment-16052505 ]

James Taylor commented on PHOENIX-153:

+1. Do you have something you can point us to for the Calcite TABLESAMPLE syntax?
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052472#comment-16052472 ]

Julian Hyde commented on PHOENIX-153:

Since Calcite already supports TABLESAMPLE let's save ourselves a headache and make sure that the 4.x syntax is compatible with Calcite's syntax.
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044823#comment-16044823 ]

Ethan Wang commented on PHOENIX-153:

Spec of this patch. Feedback please.

++
++The following are SUPPORTED
++
===BASE CASE===
select * from Person;
select * from PERSON TABLESAMPLE 0.45;

===WHERE CLAUSE===
select * from PERSON where ADDRESS = 'CA' OR name>'tina3';
select * from PERSON TABLESAMPLE 0.49 where ADDRESS = 'CA' OR name>'tina3';
select * from PERSON TABLESAMPLE 0.49 where ADDRESS = 'CA' OR name>'tina3' LIMIT 1;

===Wired Table===
select * from LOCAL_ADDRESS TABLESAMPLE 0.79;
select * from SYSTEM.STATS TABLESAMPLE 0.41;

===CORNER CASE===
select * from PERSON TABLESAMPLE 0;
select * from PERSON TABLESAMPLE 1.45;
select * from PERSON TABLESAMPLE kko;

++
++The following are NOT SUPPORTED
++
===Subquery and outer join not supported===
select * from (select * from PERSON where ADDRESS = 'CA') TABLESAMPLE 0.2 where Name > 'tina10'

===AGGREGATION===
select count(*) from PERSON TABLESAMPLE 0.5 LIMIT 2
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033617#comment-16033617 ]

Ethan Wang commented on PHOENIX-153:

Implementation proposal. (Feedback please)

Proposing table sampling on a per-table basis (at the 'FROM' part of the query). The sample size is decided by applying the input sampling rate to the primary key's frequency.

Syntax:
`select name from person SAMPLE(0.10) where name='ethan'`

Returns:
The `person SAMPLE(0.10)` part returns rows amounting to about 10% of the volume of the PERSON table, reducing the performance cost from a PERSON table scan to a Person-STATS table scan.

Implementation detail:
For table PERSON, assume STATS is populated with a GuidePost inserted on every other PK (50% coverage).
Step 1, within the query's scanning key range, iterate through the STATS table.
Step 2, for every GuidePost encountered, consult a random number generator to decide whether this GuidePost will be included in or excluded from the sample. This dice has a 10% chance of winning.
Step 3, once we decide to include this GuidePost, every PK on the original PERSON table that lies between this GuidePost and the next GuidePost will be included in the final sample.
Repeat this process until all GuidePosts are visited.

Example:

PERSON
|ID(PK)|
|1     |
|2     |
|3     |
|4     |
|5     |
|6     |

STATS
|GuidePost|
|1        |
|3        |
|5        |

During the dice rolling process, GuidePost 3 is included. PKs in [3,5) will be included. The final result will be the rows with PK 3, 4.

This implementation:
a, similar to Microsoft SQL Server TABLESAMPLE, focuses mainly on the performance benefit. It does not guarantee an even distribution of the sample over the original table (representativeness).
b, works well with any GUIDE_POSTS_WIDTH and any input sample rate. However, if the table is too small, the sample output may include more or fewer rows than the expected count (sample_rate X table_size).

Summary of other popular TABLESAMPLE implementations. Basically two categories:

1, Sampling on a query basis. (Such as BlinkDB. https://sameeragarwal.github.io/blinkdb_eurosys13.pdf)
This implementation bases the sampling process on the entire query, such as:
`select name from person where name='ethan' SAMPLE WITH ERROR 10% CONFIDENCE 95%`
BlinkDB does so by assuming "the data used for similar grouping and filtering clause does not change over time across future queries". Based on heuristic experience, the query engine pre-builds certain stratified sample groups extracted from the actual table, caches them, and uses them to evaluate an approximate result for some expensive queries, thereby avoiding a full table scan.
This approach:
a, Optimizes for the best performance-accuracy trade-off. Once given the accuracy tolerance, it automatically decides the sampling rate for the user.
b, The engine takes filtering and grouping into consideration, therefore it's powerful. On the other hand, it may not perform at the same level for all kinds of queries.
c, Being based on heuristic info, it involves a gradual machine learning process.

2, Sampling on a table basis. (Such as Postgres, MS SQL Server. https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation)
This approach places TABLESAMPLE only on the "FROM" part of the query, such as:
`select name from person TABLESAMPLE(10 PERCENT) where name='ethan'`
This approach first samples the original table down to a smaller 'view' based on the primary key frequency and a given sampling rate. That 'view' then participates in the rest of the query in place of the original table. Usually a random selection process is used during the view creation. In MS SQL Server, a linear one-pass pointer travels through each "page" and asks a random generator to decide whether this page will be part of the sample. Once accepted, every single row on this page becomes part of the new sample.
This MSSQL tablesample:
a, gives the flexibility to satisfy any sampling rate.
b, gains performance by reducing the length of a table scan (but the big O complexity is still the same)
c, only cares about the performance gain, doesn't care about sample distribution.

[~jamestaylor] [~gjacoby] [~samarthjain]
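The three steps of the proposal above can be sketched in a few lines. This is a minimal illustration of the idea as described in the comment, not the code from the patch: it assumes primary keys and guideposts are sorted in-memory lists, that the first guidepost sits at or before the first key, and that inclusion is decided by an injected random source. `FixedRolls` and `sample_by_guideposts` are hypothetical names.

```python
import bisect
import random


class FixedRolls:
    """Stand-in for a random generator that replays preset rolls (for the demo)."""

    def __init__(self, rolls):
        self._rolls = iter(rolls)

    def random(self):
        return next(self._rolls)


def sample_by_guideposts(primary_keys, guideposts, sample_rate, rng):
    """Keep every key in [guidepost, next_guidepost) for each guidepost that wins its roll.

    primary_keys and guideposts must be sorted; rng needs a random() method
    returning a float in [0, 1), such as random.Random.
    """
    sampled = []
    for i, start in enumerate(guideposts):
        # Step 2: roll the dice for this guidepost.
        if rng.random() >= sample_rate:
            continue
        # Step 3: include all keys up to (but excluding) the next guidepost.
        lo = bisect.bisect_left(primary_keys, start)
        if i + 1 < len(guideposts):
            hi = bisect.bisect_left(primary_keys, guideposts[i + 1])
        else:
            hi = len(primary_keys)
        sampled.extend(primary_keys[lo:hi])
    return sampled


if __name__ == "__main__":
    person_ids = [1, 2, 3, 4, 5, 6]
    stats_guideposts = [1, 3, 5]
    # Rolls chosen so that only GuidePost 3 wins at a 10% rate, as in the example.
    rolls = FixedRolls([0.9, 0.05, 0.9])
    print(sample_by_guideposts(person_ids, stats_guideposts, 0.10, rolls))  # [3, 4]
    # With a real generator the outcome varies from run to run.
    print(sample_by_guideposts(person_ids, stats_guideposts, 0.10, random.Random()))
```

Because whole guidepost ranges are kept or dropped, the number of returned rows only approximates sample_rate X table_size, which is the coarseness the proposal itself points out for small tables.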