I want to follow up on this a little. Here are the schemas for the source
and destination tables, and the query I am trying to run.
Source table:
hive> DESCRIBE EXTENDED users;
OK
occurred_at int
id string
properties map<string,string>
account string
application string
dataset string
hour int
Detailed Table Information:
Table(tableName:users,dbName:default,owner:Josh,createTime:1231485489,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:occurred_at,type:int,comment:null),
FieldSchema(name:id,type:string,comment:null),
FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
Destination table:
hive> DESCRIBE EXTENDED distinct_users;
OK
occurred_at int
id string
properties map<string,string>
account string
application string
dataset string
hour int
Detailed Table Information:
Table(tableName:distinct_users,dbName:default,owner:Josh,createTime:1231488500,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:occurred_at,type:int,comment:null),
FieldSchema(name:id,type:string,comment:null),
FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/distinct_users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
The query:
hive> INSERT OVERWRITE TABLE distinct_users SELECT
TRANSFORM(users.occurred_at, users.id, users.properties) USING '/bin/cat' AS
(occurred_at, id, properties) FROM users;
FAILED: Error in semantic analysis: line 1:23 Cannot insert into target
table because column number/types are different distinct_users: Cannot
convert column 2 from string to map<string,string>.
I'm really confused because the two tables are exactly the same except for
their names, and I'm just trying to do an insert from one into the other
using a script.
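My best guess is that the script output is coming back as plain strings, so
maybe I need to spell the types out in the AS clause. Something like the
sketch below is what I had in mind (untested, and I don't know whether a map
type is even accepted there):
hive> INSERT OVERWRITE TABLE distinct_users SELECT
TRANSFORM(users.occurred_at, users.id, users.properties) USING '/bin/cat' AS
(occurred_at INT, id STRING, properties MAP<STRING,STRING>) FROM users;
If that's not the right way to get a map back out of a script, I'd love to
know what is.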
For reference, this appears to work:
hive> INSERT OVERWRITE TABLE distinct_users SELECT occurred_at, id,
properties FROM users;
What is it about the TRANSFORM that messes up the semantic analysis?
Josh Ferguson
On Fri, Jan 9, 2009 at 11:52 AM, Josh Ferguson <[email protected]> wrote:
> Is it possible to do a query like the following:
> INSERT OVERWRITE TABLE table1 PARTITION(...)
> FROM table2
> SELECT TRANSFORM(table2.col1, table2.col2, ...) USING '/my/script' AS
> (col1, col2, ...)
> WHERE (...)
>
> I can run the SELECT TRANSFORM segment of the query by itself fine, and I
> get the results I expect.
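> To be concrete, a stripped-down version along these lines (placeholder
> names, roughly what I'm running) works on its own:
>
> SELECT TRANSFORM(table2.col1, table2.col2) USING '/my/script' AS
> (col1, col2) FROM table2 WHERE (...);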
>
> When I try to do the insert as well, I get errors about column type
> mismatches, even though my script outputs three columns with the exact same
> types, in the exact order that they appear in table1. I tried doing this
> with both a mapper and a reducer, similar to what was shown in the ApacheCon
> slides, and it still didn't work. Am I doing something wrong query-wise?
>
> I'm using the 0.19 release.
>
> Josh Ferguson
>