[ https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-626:
----------------------------

    Attachment: HIVE-626.1.showinfo.patch

I added some instrumentation to the code (see HIVE-626.1.showinfo.patch).

The result of "explain extended" (below) shows that the order of the output columns of the JoinOperator does not match the order expected by the FileSinkOperator:

{code}
hive> explain extended
    > select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join zshao_bar on zshao_foo.foo_id =
    > zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF zshao_foo) (TOK_TABREF zshao_bar) (= (. (TOK_TABLE_OR_COL zshao_foo) foo_id) (. (TOK_TABLE_OR_COL zshao_bar) foo_id))) (TOK_TABREF zshao_count) (= (. (TOK_TABLE_OR_COL zshao_count) bar_id) (. (TOK_TABLE_OR_COL zshao_bar) bar_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL zshao_foo) foo_name)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL zshao_bar) bar_name)) (TOK_SELEXPR (TOK_TABLE_OR_COL n)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        ...
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col1}
            1 {VALUE._col0} {VALUE._col4}
          output names: _col1, _col6, _col10
          File Output Operator
            compressed: true
            GlobalTableId: 0
            directory: hdfs://xxx:9000/tmp/hive-zshao/1413634235/10002
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                properties:
                  name binary_table
                  serialization.ddl struct binary_table { string _col1, string _col10, i32 _col6}
                  serialization.format com.facebook.thrift.protocol.TBinaryProtocol
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        $INTNAME
        ...
{code}

The output of the join has the order:
output names: _col1, _col6, _col10

The FileSinkOperator expects:
struct binary_table { string _col1, string _col10, i32 _col6}

> Typecast bug in Join operator
> -----------------------------
>
>                 Key: HIVE-626
>                 URL: https://issues.apache.org/jira/browse/HIVE-626
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>         Attachments: HIVE-626.1.showinfo.patch
>
>
> There is a type cast error in Join operator. Produced by the following steps:
> {code}
> create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b string,
> foo_c string, foo_d string) row format delimited fields terminated by ','
> stored as textfile;
> create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name string,
> bar_a string, bar_b string, bar_c string, bar_d string) row format
> delimited fields terminated by ',' stored as textfile;
> create table zshao_count (bar_id int, n int) row format delimited fields
> terminated by ',' stored as textfile;
> Each table has a single row as follows:
> zshao_foo:
> 1,foo1,a,b,c,d
> zshao_bar:
> 10,0,1,1,bar10,a,b,c,d
> zshao_count:
> 10,2
> load data local inpath 'zshao_foo' overwrite into table zshao_foo;
> load data local inpath 'zshao_bar' overwrite into table zshao_bar;
> load data local inpath 'zshao_count' overwrite into table zshao_count;
> explain extended
> select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join
> zshao_bar on zshao_foo.foo_id =
> zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
> {code}
> The case is from David Lerman.
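
To make the mismatch reported in the comment concrete, here is a minimal Python sketch (not Hive code) that compares the two column orders position by position. The column names and types are copied from the plan output; the helper `positional_mismatches` is hypothetical, and the sketch assumes the sink consumes the join's columns by position, which the comment implies but does not state outright.

```python
# Column order emitted by the Join Operator ("output names" in the plan).
join_output_names = ["_col1", "_col6", "_col10"]

# Column order and types the File Output Operator expects (serialization.ddl).
sink_schema = [("_col1", "string"), ("_col10", "string"), ("_col6", "i32")]

# Declared type of each column, looked up by name from the sink's DDL.
declared_type = dict(sink_schema)


def positional_mismatches(output_names, schema):
    """Return (position, produced_name, expected_name) wherever the orders differ.

    Hypothetical helper: assumes columns are matched by position, not by name.
    """
    return [
        (i, produced, expected)
        for i, (produced, (expected, _type)) in enumerate(zip(output_names, schema))
        if produced != expected
    ]


mismatches = positional_mismatches(join_output_names, sink_schema)
for pos, produced, expected in mismatches:
    print(
        f"position {pos}: join emits {produced} ({declared_type[produced]}), "
        f"sink expects {expected} ({declared_type[expected]})"
    )
```

Under that assumption, positions 1 and 2 are swapped: an i32 column arrives where a string is expected and a string arrives where an i32 is expected, which is consistent with the type cast error described in the issue.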