Hi,
I see that if a query has a WHERE clause, the FilterOperator is applied twice.
Can you tell me why it is done this way?
It seems the second operator always filters zero rows.
Here is the EXPLAIN output for a query with a WHERE clause:
hive> explain select * from input1 where input1.key != 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= (. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        input1
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
                        expr: value
                        type: int
                  outputColumnNames: _col0, _col1
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
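The plan above is a straight chain, TableScan -> Filter -> Filter -> Select -> File Output, in which both Filter Operators carry the identical predicate `(key <> 10)`. As a rough illustration of why the second one looks redundant, here is a small Java sketch that walks such a chain and flags each Filter whose direct parent is a Filter with the same predicate. The `Op` record and `redundantFilters` method are hypothetical and only model the plan text; they are not part of Hive's planner. Comparing predicates as plain strings, and assuming they are deterministic, are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan model: each operator has a type name, an optional predicate
// string, and at most one child, like the linear TS -> FIL -> FIL -> SEL -> FS
// chain in the EXPLAIN output. Illustrative only, not Hive's planner API.
public class PlanChainCheck {

    record Op(String type, String predicate, Op child) {}

    // Walks the chain and returns the predicate of every Filter whose direct
    // parent is a Filter with the identical predicate string. Assuming the
    // predicate is deterministic, such a child Filter can never reject a row.
    static List<String> redundantFilters(Op root) {
        List<String> found = new ArrayList<>();
        for (Op op = root; op != null && op.child() != null; op = op.child()) {
            Op next = op.child();
            if (op.type().equals("FIL") && next.type().equals("FIL")
                    && op.predicate().equals(next.predicate())) {
                found.add(next.predicate());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Build the chain bottom-up, mirroring the Map Operator Tree above.
        Op fs = new Op("FS", null, null);
        Op sel = new Op("SEL", null, fs);
        Op fil2 = new Op("FIL", "(key <> 10)", sel);
        Op fil1 = new Op("FIL", "(key <> 10)", fil2);
        Op ts = new Op("TS", null, fil1);
        System.out.println(redundantFilters(ts)); // prints [(key <> 10)]
    }
}
```

The check only looks at adjacent operators, which is enough for a linear map-side chain like this one; a plan with branching operators would need a tree walk instead.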
The Mapper logs show the same thing: the first FilterOperator does the
filtering, and the second one always filters zero rows.
2010-08-12 14:33:22,149 INFO ExecMapper:
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarding 1 rows
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2010-08-12 14:33:22,450 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 4417072
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/000000_0
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/_tmp.000000_0
2010-08-12 14:33:22,454 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/000000_0
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2010-08-12 14:33:22,485 INFO ExecMapper: ExecMapper: processed 1 rows: used memory = 5135888
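Put another way, once the first filter has dropped every row with key = 10, each row that reaches the second filter already satisfies `key <> 10`, so the second filter cannot reject anything. The Java sketch below models this with two chained filters that keep FILTERED/PASSED counters like the ones in the log. The `Filter` class and `runTwice` method are hypothetical, assume a deterministic predicate, and are not Hive's `FilterOperator` implementation. The single input row with key 10 is an assumption that is consistent with the log (1 row read, FILTERED:1 on the first filter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical, simplified model of two chained filter operators. This is NOT
// Hive's FilterOperator API; it only mimics its FILTERED/PASSED counters.
public class FilterChainDemo {

    static final class Filter {
        final IntPredicate predicate;
        long filtered = 0; // rows rejected by this filter
        long passed = 0;   // rows forwarded to the next operator

        Filter(IntPredicate predicate) {
            this.predicate = predicate;
        }

        // Returns the keys that satisfy the predicate, updating the counters.
        List<Integer> process(List<Integer> keys) {
            List<Integer> out = new ArrayList<>();
            for (int key : keys) {
                if (predicate.test(key)) {
                    passed++;
                    out.add(key);
                } else {
                    filtered++;
                }
            }
            return out;
        }
    }

    // Runs the keys through two filters sharing the same predicate, mirroring
    // the FIL(1) -> FIL(2) chain in the plan, and returns both filters.
    static Filter[] runTwice(List<Integer> keys, IntPredicate predicate) {
        Filter first = new Filter(predicate);
        Filter second = new Filter(predicate);
        second.process(first.process(keys));
        return new Filter[] {first, second};
    }

    public static void main(String[] args) {
        IntPredicate notTen = key -> key != 10; // models the predicate (key <> 10)
        Filter[] filters = runTwice(List.of(10), notTen);
        for (int i = 0; i < filters.length; i++) {
            System.out.println("Filter " + (i + 1)
                + " FILTERED:" + filters[i].filtered
                + " PASSED:" + filters[i].passed);
        }
    }
}
```

Running `main` should print `Filter 1 FILTERED:1 PASSED:0` and `Filter 2 FILTERED:0 PASSED:0`, the same pattern as operators 1 and 2 in the log. With other inputs, such as keys 1, 10, 20, 10, the first filter reports FILTERED:2 PASSED:2 while the second still reports FILTERED:0.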
Thanks
Amareshwari