[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898120#action_12898120 ]
Amareshwari Sriramadasu commented on HIVE-1538:
-----------------------------------------------
I see that if a query has a WHERE clause, the FilterOperator is applied twice.
EXPLAIN output for a query with a WHERE clause:
hive> explain select * from input1 where input1.key != 10;
{noformat}
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= (. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        input1
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
                        expr: value
                        type: int
                  outputColumnNames: _col0, _col1
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.099 seconds
{noformat}
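Applying the same predicate twice in a row is idempotent, so the second Filter Operator in this plan can never reject a row that the first one let through. The following is a minimal Java sketch, assuming a hypothetical, simplified linear operator chain, of how an adjacent duplicate filter could be collapsed. The `Op` class and the `removeAdjacentDuplicateFilters` method are illustrative names only; they are not Hive's Operator API or its optimizer. Predicates are compared as plain strings, which is a simplification: a real optimizer pass would more likely compare expression trees or avoid emitting the duplicate filter in the first place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical, simplified model of a linear map-side operator chain such as
 * TS -> FIL -> FIL -> SEL -> FS. This is NOT Hive's Operator API; it only
 * illustrates why two adjacent identical filters are redundant.
 */
final class RedundantFilterSketch {

    /** One operator: its kind ("TS", "FIL", "SEL", "FS") and, for FIL, a predicate string. */
    static final class Op {
        final String kind;
        final String predicate; // null for non-filter operators

        Op(String kind, String predicate) {
            this.kind = kind;
            this.predicate = predicate;
        }

        boolean isFilter() {
            return "FIL".equals(kind);
        }

        @Override
        public String toString() {
            return isFilter() ? kind + "(" + predicate + ")" : kind;
        }
    }

    /**
     * Returns a copy of the chain in which a filter is dropped when the
     * operator kept just before it is a filter with an identical predicate.
     * filter(p) after filter(p) never rejects a row, so output rows are unchanged.
     */
    static List<Op> removeAdjacentDuplicateFilters(List<Op> chain) {
        List<Op> result = new ArrayList<>();
        for (Op op : chain) {
            Op prev = result.isEmpty() ? null : result.get(result.size() - 1);
            boolean duplicate = op.isFilter() && prev != null && prev.isFilter()
                    && Objects.equals(prev.predicate, op.predicate);
            if (!duplicate) {
                result.add(op);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shape of the map-side plan from the EXPLAIN output above.
        List<Op> plan = new ArrayList<>();
        plan.add(new Op("TS", null));
        plan.add(new Op("FIL", "key <> 10"));
        plan.add(new Op("FIL", "key <> 10"));
        plan.add(new Op("SEL", null));
        plan.add(new Op("FS", null));
        System.out.println(removeAdjacentDuplicateFilters(plan)); // [TS, FIL(key <> 10), SEL, FS]
    }
}
```

Filters with different predicates are left alone, since neither one makes the other redundant.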
The mapper logs show the same duplicated FilterOperator. The first FilterOperator (Id 1) does all the
filtering, and the second one (Id 2) always filters zero rows.
{noformat}
...
2010-08-13 13:20:21,451 INFO ExecMapper:
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
...
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarding 1 rows
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2010-08-13 13:20:21,600 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 10765360
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/000000_0
2010-08-13 13:20:21,601 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/_tmp.000000_0
2010-08-13 13:20:21,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/000000_0
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2010-08-13 13:20:21,629 INFO ExecMapper: ExecMapper: processed 1 rows: used memory = 11454224
...
{noformat}
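The PASSED/FILTERED counters in this log follow directly from chaining two identical filters: every row that reaches FilterOperator 2 has already satisfied `key <> 10` in FilterOperator 1, so FilterOperator 2 can only ever report FILTERED:0. Below is a hypothetical Java sketch of that effect. The `CountingFilter` class and its `passed`/`filtered` fields are illustrative names, not Hive's FilterOperator, and rows are modelled as bare int keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch (not Hive's FilterOperator) of two chained filters
 * that keep PASSED/FILTERED counters like the ones in the mapper log.
 */
final class ChainedFilterCounters {

    static final class CountingFilter {
        final IntPredicate predicate;
        long passed;   // rows forwarded to the next operator
        long filtered; // rows rejected by the predicate

        CountingFilter(IntPredicate predicate) {
            this.predicate = predicate;
        }

        /** Forwards the rows that satisfy the predicate and updates the counters. */
        List<Integer> process(List<Integer> rows) {
            List<Integer> out = new ArrayList<>();
            for (int key : rows) {
                if (predicate.test(key)) {
                    passed++;
                    out.add(key);
                } else {
                    filtered++;
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        IntPredicate keyNot10 = key -> key != 10;             // WHERE key <> 10
        CountingFilter fil1 = new CountingFilter(keyNot10);
        CountingFilter fil2 = new CountingFilter(keyNot10);   // duplicate of the same predicate

        List<Integer> input = new ArrayList<>();
        input.add(10);
        input.add(3);
        input.add(10);
        input.add(7);

        List<Integer> output = fil2.process(fil1.process(input));
        System.out.println("FIL 1 PASSED:" + fil1.passed + " FILTERED:" + fil1.filtered);
        System.out.println("FIL 2 PASSED:" + fil2.passed + " FILTERED:" + fil2.filtered);
        System.out.println("output: " + output);
    }
}
```

With the four keys 10, 3, 10, 7 the first filter reports PASSED:2 FILTERED:2 and the second reports PASSED:2 FILTERED:0. With a single input row that fails the predicate, the counters reduce to the ones in the log: PASSED:0 FILTERED:1 for filter 1 and PASSED:0 FILTERED:0 for filter 2.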
> FilterOperator is applied twice with ppd on.
> --------------------------------------------
>
> Key: HIVE-1538
> URL: https://issues.apache.org/jira/browse/HIVE-1538
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Amareshwari Sriramadasu
>
> With hive.optimize.ppd set to true, the FilterOperator is applied twice, and
> the second operator always seems to filter zero rows.
--
This message is automatically generated by JIRA.