kasakrisz commented on pull request #1324:
URL: https://github.com/apache/hive/pull/1324#issuecomment-666155487
@HunterL
Thanks for reviewing this patch.
The expression
```
filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null)
```
is the result of merging two `TableScanOperator`s. Both scan the same table (`alias: s`), but they had different `filterExpr`s:
- TS1: `((s_floor_space > 1000) and s_store_sk is not null)`
- TS2: `s_store_sk is not null`
`SharedWorkOptimizer` naively combined the filter expressions with `or`, because the merged scan has to produce the union of the records produced by both TS operators. You are right that in this particular case the filter expression could be reduced to `s_store_sk is not null`.
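The reduction is an instance of the absorption law, `(A and B) or B` ≡ `B`. A minimal sketch that checks it by brute force under SQL's three-valued logic is below. `None` stands in for UNKNOWN, because `s_floor_space > 1000` is UNKNOWN when `s_floor_space` is NULL, whereas `IS NOT NULL` is always TRUE or FALSE. The helpers `and3`, `or3` and `absorption_holds` are hypothetical illustrations, not part of Hive, and the sketch says nothing about whether Hive itself performs this simplification.

```python
from itertools import product
from typing import Optional

# SQL three-valued logic: True, False, or None (UNKNOWN).
Tri = Optional[bool]


def and3(x: Tri, y: Tri) -> Tri:
    # Kleene AND: FALSE dominates, then UNKNOWN, else TRUE.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def or3(x: Tri, y: Tri) -> Tri:
    # Kleene OR: TRUE dominates, then UNKNOWN, else FALSE.
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def absorption_holds() -> bool:
    # a = (s_floor_space > 1000): may be UNKNOWN when s_floor_space is NULL.
    # b = (s_store_sk is not null): IS NOT NULL never yields UNKNOWN.
    for a, b in product([True, False, None], [True, False]):
        if or3(and3(a, b), b) != b:
            return False
    return True


print(absorption_holds())  # True
```

Restricting `b` to TRUE/FALSE matters: it is what lets the merged filter collapse to `s_store_sk is not null` without changing which rows the scan emits.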
The new TS has two children:
```
TableScan
  alias: s
  filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null) (type: boolean)
  Filter Operator
    predicate: ((s_floor_space > 1000) and s_store_sk is not null) (type: boolean)
    ...
  Filter Operator
    predicate: s_store_sk is not null (type: boolean)
    ...
```
Both of them are `Filter Operator`s, each the root of a subtree that broadcasts the proper subset of records to its reducer edge (Reducer 2 and Reducer 3).
If `and` were used to combine the filter expressions of the TS operators, the branch that does not have the `s_floor_space > 1000` filter would lose a subset of its records.
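To make the `or` versus `and` difference concrete, here is a small simulation, assuming a handful of made-up rows of the store table and modelling each operator as a plain Python filter. The names `shared_scan`, `ts1`, `ts2` and the sample data are hypothetical, and this is not how Hive executes the plan. The shared scan applies the combined predicate, then each branch re-applies its own predicate, as the two Filter Operators above do.

```python
from typing import Callable, Dict, List, Optional

# A row of the store table (alias s); None models SQL NULL.
Row = Dict[str, Optional[int]]

# Hypothetical sample data, made up for illustration.
ROWS: List[Row] = [
    {"s_store_sk": 1, "s_floor_space": 5000},
    {"s_store_sk": 2, "s_floor_space": 800},
    {"s_store_sk": None, "s_floor_space": 9000},
    {"s_store_sk": 3, "s_floor_space": None},
]


def ts1(r: Row) -> bool:
    # TS1 filter: (s_floor_space > 1000) and s_store_sk is not null.
    # A NULL comparison is UNKNOWN, which a filter drops, so treat it as False.
    big = r["s_floor_space"] is not None and r["s_floor_space"] > 1000
    return big and r["s_store_sk"] is not None


def ts2(r: Row) -> bool:
    # TS2 filter: s_store_sk is not null
    return r["s_store_sk"] is not None


def shared_scan(rows: List[Row], combine: Callable[[bool, bool], bool]) -> Dict[str, List[Row]]:
    # The merged TableScan applies the combined filterExpr ...
    scanned = [r for r in rows if combine(ts1(r), ts2(r))]
    # ... and each child Filter Operator re-applies its own predicate.
    return {
        "branch1": [r for r in scanned if ts1(r)],
        "branch2": [r for r in scanned if ts2(r)],
    }


# What two independent scans would have produced.
expected = {
    "branch1": [r for r in ROWS if ts1(r)],
    "branch2": [r for r in ROWS if ts2(r)],
}
with_or = shared_scan(ROWS, lambda a, b: a or b)
with_and = shared_scan(ROWS, lambda a, b: a and b)

print(with_or == expected)   # True
print(with_and == expected)  # False
print(len(expected["branch2"]), len(with_and["branch2"]))  # 3 1
```

With `or`, both branches see exactly the rows that separate scans would have produced. With `and`, the `s_store_sk is not null` branch keeps only the rows that also satisfy `s_floor_space > 1000`, so two of its three rows are lost.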