[ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Atanu Mishra reopened TRAFODION-1662:
-------------------------------------
      Assignee: Atanu Mishra  (was: Eric Owhadi)

Reopen to correct the Fix version field, which was left blank

> Predicate push down revisited (V2)
> ----------------------------------
>
>                 Key: TRAFODION-1662
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1662
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Eric Owhadi
>            Assignee: Atanu Mishra
>              Labels: predicate, pushdown
>         Attachments: Advanced predicate push down feature.docx, Advanced 
> predicate push down feature.docx, Performance results analyzing effects of 
> optimizations introduced in pushdown V2.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Currently, Trafodion predicate push down to HBase supports only the 
> following case:
> <Column><op><Value> AND <Column><op><Value> AND…
> and requires that columns be “SERIALIZED” (comparable using the binary 
> comparator), 
> that the value data type is not a superset of the column data type,
> that char types are not case-insensitive or upshifted,
> and that Big Numbers are not used.
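The "SERIALIZED" requirement exists because HBase's binary comparator works on raw bytes: pushdown is only correct when the encoded byte order matches the value order. A minimal plain-Java illustration of this idea (the sign-bit-flip big-endian encoding shown here is a common order-preserving scheme used for illustration, not necessarily the exact Trafodion column encoding):

```java
public class OrderPreservingEncoding {
    // Encode an int so that unsigned byte-wise comparison matches numeric
    // order: flip the sign bit, then write big-endian. (Illustrative scheme;
    // Trafodion's actual serialized encoding may differ.)
    public static byte[] encode(int v) {
        int biased = v ^ 0x80000000; // map signed range onto unsigned range
        return new byte[] {
            (byte) (biased >>> 24), (byte) (biased >>> 16),
            (byte) (biased >>> 8),  (byte) biased
        };
    }

    // What a binary comparator effectively does: lexicographic compare of
    // unsigned bytes, with no knowledge of the SQL data type.
    public static int byteCompare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] vals = { -5, -1, 0, 3, 1000 };
        for (int i = 0; i + 1 < vals.length; i++) {
            // Agrees with numeric order only because the encoding is
            // order-preserving; a raw two's-complement dump would put
            // negatives after positives and break pushdown.
            System.out.println(byteCompare(encode(vals[i]), encode(vals[i + 1])) < 0);
        }
    }
}
```

A column whose encoding is not order-preserving in this sense cannot be filtered correctly by a byte comparator, which is why such columns are excluded from pushdown.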
> It suffers from several issues:
> -     Handling of nullable columns:
> When a nullable column is involved in the predicate, because of the way NULLs 
> are handled in Trafodion (either a missing cell, or a cell with its first 
> byte set to 0xFF), a binary compare cannot semantically treat NULL the way a 
> SQL expression requires. So the current behavior is that NULL column values 
> are never filtered out and are always returned, letting Trafodion perform a 
> second-pass predicate evaluation to deal with NULLs. This can quickly become 
> counterproductive for very sparse columns, as we would perform useless 
> filtering on the region server side (since all NULLs pass), and the optimizer 
> has not been coded to turn off the feature for sparse columns.
> In addition, since NULL handling is done on the Trafodion side, the current 
> code artificially pulls all key columns to make sure that a NULL coded as an 
> absent cell is correctly pushed up for evaluation at the Trafodion layer. 
> This could be optimized by requiring only a single non-nullable column in the 
> current code, but that is another story… As you will see below, the proposed 
> new way of doing pushdown will handle NULLs 100% at the HBase layer, 
> therefore requiring a non-nullable column to be added only when a nullable 
> column is needed in the select list (not in the predicate).
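To make the NULL problem concrete, here is a small plain-Java sketch of the two encodings described above (missing cell, or first byte 0xFF) and of the difference between SQL NULL semantics and the current pass-everything behavior. The cell layout (0x00 marker byte followed by a big-endian int) is an assumption for illustration, not the exact Trafodion format:

```java
public class NullAwarePushdown {
    static final byte NULL_MARKER = (byte) 0xFF; // first byte 0xFF => SQL NULL

    // Decode a cell: an absent cell (null) or a 0xFF-prefixed cell both
    // represent SQL NULL in this sketch.
    public static boolean isSqlNull(byte[] cell) {
        return cell == null || (cell.length > 0 && cell[0] == NULL_MARKER);
    }

    // SQL semantics: "col > k" is UNKNOWN (treated as false) when col is
    // NULL, so a NULL-aware filter could safely drop the row at the region
    // server, as the proposed V2 design does.
    public static boolean sqlGreaterThan(byte[] cell, int k) {
        if (isSqlNull(cell)) return false;
        return decodeInt(cell) > k;
    }

    // Current behavior described above: NULLs are never filtered out; they
    // are always returned so Trafodion can re-evaluate the predicate itself.
    public static boolean currentFilterKeeps(byte[] cell, int k) {
        return isSqlNull(cell) || sqlGreaterThan(cell, k);
    }

    // Assumed non-null cell layout: 0x00 marker byte + big-endian int.
    static int decodeInt(byte[] b) {
        return ((b[1] & 0xFF) << 24) | ((b[2] & 0xFF) << 16)
             | ((b[3] & 0xFF) << 8)  |  (b[4] & 0xFF);
    }

    public static byte[] encodeInt(int v) {
        return new byte[] { 0x00, (byte) (v >>> 24), (byte) (v >>> 16),
                            (byte) (v >>> 8), (byte) v };
    }
}
```

On a very sparse column, `currentFilterKeeps` returns true for nearly every row, which is exactly the counterproductive case described above.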
> -     Always returning predicate columns:
> "Select a from t where b > 10" would always return the b column to 
> Trafodion, even if b is non-nullable. This is not necessary and results in 
> useless network and CPU consumption, even when the predicate is not 
> re-evaluated.
> The new advanced predicate push down feature will do the following:
> Support any of these primitives:
> <col><op><value>
> <col><op><col>                (nice to have; high cost of a custom filter, 
> low value after a TPC-DS query survey) 
> IS NULL
> IS NOT NULL
> LIKE                  -> to be investigated, not yet covered in this document
> and any combination of these primitives with an arbitrary number of ORs and 
> ANDs with ( ) associations, given that within ( ) there are either only ORs 
> or only ANDs, with no mixing of OR and AND inside ( ). I suspect that the 
> normalizer will always convert expressions so that this mixing never happens…
> It will also remove the two shortcomings of the previous implementation: all 
> NULL cases will be handled at the HBase layer, never requiring re-evaluation 
> and the associated pushing up of NULL columns, and predicate columns will not 
> be pushed up if they are not needed by the node for any task other than 
> predicate evaluation.
> Note that BETWEEN and IN predicates, when normalized into one of the forms 
> supported above, will be pushed down too. Nothing in the code will need to be 
> done to support this. 
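The restriction that a parenthesized group contains only ANDs or only ORs maps naturally onto nested HBase filter lists, which combine child filters with MUST_PASS_ALL (AND) or MUST_PASS_ONE (OR). A self-contained Java model of that evaluation scheme, written without the hbase-client dependency (the class and method names here are mine for illustration, not the actual Trafodion or HBase API):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class PredicateTree {
    // Mirrors the two HBase FilterList combining operators.
    public enum Op { MUST_PASS_ALL, MUST_PASS_ONE }

    // A group is homogeneous: all-AND or all-OR, matching the restriction
    // that ( ) never mixes OR and AND. Groups nest to arbitrary depth.
    public static Predicate<Map<String, Integer>> group(
            Op op, List<Predicate<Map<String, Integer>>> children) {
        return row -> op == Op.MUST_PASS_ALL
                ? children.stream().allMatch(p -> p.test(row))
                : children.stream().anyMatch(p -> p.test(row));
    }

    // <col><op><value> primitive, e.g. colGt("b", 10) for "b > 10".
    // A missing column models SQL NULL: the comparison is not satisfied.
    public static Predicate<Map<String, Integer>> colGt(String col, int v) {
        return row -> row.containsKey(col) && row.get(col) > v;
    }

    public static Predicate<Map<String, Integer>> colEq(String col, int v) {
        return row -> row.containsKey(col) && row.get(col) == v;
    }
}
```

For example, "(a = 1 OR a = 2) AND b > 10" becomes a MUST_PASS_ALL group containing a MUST_PASS_ONE group and one primitive, which is exactly the homogeneous-group shape the restriction guarantees.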
> Improvement of explain:
> We currently do not show predicate push down information in the scan node. 
> Two key pieces of information are needed:
> -     Is predicate push down used?
> -     What columns are retrieved by the scan node (investigate why we get 
> "column all" instead of accurate information)?
> The first is obviously used to determine whether all the conditions are met 
> for push down to be available, and the second is used to make sure we are 
> not pushing up data from columns we don't need.
> Note that column info is shown inconsistently today. This needs to be fixed.
> Enablement: the existing ON/OFF CQD (HBASE_FILTER_PREDS) will be replaced 
> with a multi-value CQD that enables various levels of push down 
> optimization, like we have for the PCODE optimization level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
