[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Atanu Mishra reopened TRAFODION-1662:
-------------------------------------

    Assignee: Atanu Mishra  (was: Eric Owhadi)

Reopened to correct the Fix Version field, which was left blank.

> Predicate push down revisited (V2)
> ----------------------------------
>
>                 Key: TRAFODION-1662
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1662
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Eric Owhadi
>            Assignee: Atanu Mishra
>              Labels: predicate, pushdown
>         Attachments: Advanced predicate push down feature.docx, Advanced predicate push down feature.docx, Performance results analyzing effects of optimizations introduced in pushdown V2.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Trafodion's current predicate push down to HBase supports only the following case:
>   <Column><op><Value> AND <Column><op><Value> AND ...
> and it requires that:
> - the columns are "SERIALIZED" (comparable with a binary comparator),
> - the value's data type is not a superset of the column's data type,
> - char-type columns are not case insensitive or upshifted, and
> - no Big Number columns are involved.
> The current implementation suffers from several issues:
> - Handling of nullable columns:
> When a nullable column is involved in the predicate, a binary comparison cannot treat NULL the way SQL semantics require, because of the way nulls are encoded in Trafodion (either a missing cell, or a cell whose first byte is set to 0xFF). The current behavior is therefore that null column values are never filtered out and are always returned, letting Trafodion perform a second-pass predicate evaluation to deal with nulls. This can quickly become counterproductive for very sparse columns: the region-server-side filtering is useless (all nulls pass through), and the optimizer has not been coded to turn the feature off for sparse columns.
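The null-handling problem above can be sketched in a few lines of Python. This is a toy model, not Trafodion code; it only assumes the encoding stated in the description (NULL is a missing cell or a cell whose first byte is 0xFF) and illustrates why a raw binary comparator disagrees with SQL semantics:

```python
# Toy model of Trafodion's null encoding for a nullable column, per the
# description above: NULL is either a missing cell (None here) or a cell
# whose first byte is 0xFF. All names are illustrative, not Trafodion code.

NULL_BYTE = 0xFF

def decode(cell):
    """Return the column value, or None if the encoded cell is SQL NULL."""
    if cell is None or cell[0] == NULL_BYTE:
        return None
    return cell[1:]  # payload after the null-indicator byte

def sql_greater_than(cell, value):
    """SQL semantics: NULL > value is UNKNOWN, so the row does not qualify."""
    v = decode(cell)
    return v is not None and v > value

def binary_greater_than(cell, value):
    """What a plain binary comparator sees: only the raw cell bytes."""
    return cell is not None and cell > value

null_cell = bytes([NULL_BYTE]) + b"????"  # NULL encoded in-cell
real_cell = bytes([0x00]) + b"abc"        # non-null value "abc"

# The binary comparator wrongly qualifies the NULL cell (0xFF sorts above
# everything), so the region server passes it up and Trafodion must
# re-evaluate the predicate in a second pass.
print(binary_greater_than(null_cell, bytes([0x00]) + b"zzz"))  # True (wrong)
print(sql_greater_than(null_cell, b"zzz"))                     # False
print(sql_greater_than(real_cell, b"abb"))                     # True
```

For a very sparse column almost every cell takes the first (wrong) path, which is exactly why the server-side filter degenerates into pure overhead.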
> In addition, since null handling is done on the Trafodion side, the current code artificially pulls up all key columns, to make sure that a null encoded as an absent cell is correctly pushed up for evaluation at the Trafodion layer. This could be optimized in the current code by requiring only a single non-nullable column, but that is another story... As you will see below, the proposed new way of doing pushdown will handle 100% of nulls at the HBase layer, so a non-nullable column needs to be added only when a nullable column is needed in the select list (not in the predicate).
> - Always returning predicate columns:
> SELECT a FROM t WHERE b > 10 always returns the b column to Trafodion, even if b is non-nullable. This is unnecessary and results in useless network and CPU consumption, even when the predicate is not re-evaluated.
> The new advanced predicate push down feature will do the following:
> Support any of these primitives:
>   <col><op><value>
>   <col><op><col> (nice to have; high cost of a custom filter, low value after a TPC-DS query survey)
>   IS NULL
>   IS NOT NULL
>   LIKE -> to be investigated, not yet covered in this document
> and any combination of these primitives with an arbitrary number of ORs and ANDs with parenthesized associations, given that within a pair of parentheses there is only either any number of ORs or any number of ANDs; no mixing of OR and AND inside (). I suspect that the normalizer will always convert expressions so that this mixing never happens...
> It will also remove the two shortcomings of the previous implementation: all null cases will be handled at the HBase layer, never requiring re-evaluation and the associated pushing up of null columns; and predicate columns will not be pushed up unless the node needs them for tasks other than predicate evaluation.
> Note that BETWEEN and IN predicates, when normalized to one of the forms supported above, will be pushed down too; nothing in the code needs to be done to support this.
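The restriction above (each parenthesized group is homogeneous, all-AND or all-OR) maps naturally onto nested HBase FilterLists, whose MUST_PASS_ALL and MUST_PASS_ONE operators correspond to AND and OR. A minimal Python sketch of that shape and its evaluation semantics; the tuple-based nodes are illustrative stand-ins, not the actual filter classes:

```python
# Sketch of how the restricted predicate shapes described above map onto
# nested HBase-style FilterLists. MUST_PASS_ALL/MUST_PASS_ONE mirror
# FilterList.Operator; the node representation is a toy model.

MUST_PASS_ALL = "AND"  # every child filter must accept the row
MUST_PASS_ONE = "OR"   # at least one child filter must accept the row

def leaf(col, op, value):
    """A <col><op><value> primitive."""
    return ("LEAF", col, op, value)

def filter_list(operator, children):
    # Per the restriction above, a group is homogeneous (all-AND or
    # all-OR); mixing the two requires a nested group.
    return (operator, children)

def evaluate(node, row):
    if node[0] == "LEAF":
        _, col, op, value = node
        v = row.get(col)
        if v is None:  # SQL NULL never satisfies <col><op><value>
            return False
        return {"<": v < value, ">": v > value, "=": v == value}[op]
    operator, children = node
    results = [evaluate(c, row) for c in children]
    return all(results) if operator == MUST_PASS_ALL else any(results)

# (a > 10 AND b < 5) OR c = 7
tree = filter_list(MUST_PASS_ONE, [
    filter_list(MUST_PASS_ALL, [leaf("a", ">", 10), leaf("b", "<", 5)]),
    leaf("c", "=", 7),
])
print(evaluate(tree, {"a": 11, "b": 4, "c": 0}))    # True: AND group passes
print(evaluate(tree, {"a": None, "b": 4, "c": 7}))  # True: via the OR branch
print(evaluate(tree, {"a": 1, "b": 9, "c": 0}))     # False
```

Because the grouping is homogeneous, each parenthesized group becomes exactly one FilterList; a BETWEEN or IN predicate that the normalizer rewrites into these forms falls out of the same mapping for free.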
> Improvement of EXPLAIN:
> We currently do not show predicate push down information in the scan node. Two key pieces of information are needed:
> - whether predicate push down is used, and
> - which columns are retrieved by the scan node (investigate why we get "column all" instead of accurate information).
> The first is used to determine whether all the conditions for push down are met, and the second to make sure we are not pushing up data from columns we don't need.
> Note that column info is inconsistently shown today; this needs to be fixed.
> Enablement: the existing ON/OFF CQD (HBASE_FILTER_PREDS) will be replaced with a multi-value CQD that enables various levels of push down optimization, like the levels we have for PCODE optimization.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)