Hi Josh,

Yes, the transform assumes every output column will be a string, and if an
input to the transform is not a string, it will be converted to a string.
But we don't have a mechanism in the language to convert a string back to
map<string,string>. What do you think we should do to support that?
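One option would be a built-in that parses a delimited string back into a
map, so a query could rebuild the column after the transform. A minimal
sketch, assuming a hypothetical str_to_map(text, pair_delim, kv_delim)
built-in (and assuming the script writes map entries as comma-separated
key:value pairs, matching the table's collection delimiter 44 and map-key
delimiter 58):

  -- Sketch only: str_to_map is hypothetical here, not something the
  -- language currently provides. The subquery's transform emits every
  -- column as a string; the outer select parses the properties string
  -- back into a map before the insert.
  INSERT OVERWRITE TABLE distinct_users
  SELECT t.occurred_at, t.id, str_to_map(t.properties, ',', ':')
  FROM (
    SELECT TRANSFORM(occurred_at, id, properties) USING '/bin/cat'
           AS (occurred_at, id, properties)
    FROM users
  ) t;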
Zheng

On Sat, Jan 10, 2009 at 10:25 PM, Josh Ferguson <[email protected]> wrote:
> One more small update: it seems that transform doesn't work at all for
> inserting into columns of type MAP<X,Y>. I suspect this is because the
> semantic analyzer treats all columns out of a custom map phase as type
> 'STRING' and then complains when it can't convert the assumed type into the
> type necessary, which is MAP<STRING, STRING> in this case. Is this correct?
> Is anyone else using a MAP type with custom map or reduce scripts? What
> queries have you gotten to work?
>
> Josh
>
> On Sat, Jan 10, 2009 at 12:16 PM, Josh Ferguson <[email protected]> wrote:
>
>> I want to follow up on this a little; here are the schemas for the source
>> and destination tables and the query I am trying to run.
>>
>> Source table:
>>
>> hive> DESCRIBE EXTENDED users;
>> OK
>> occurred_at int
>> id string
>> properties map<string,string>
>> account string
>> application string
>> dataset string
>> hour int
>> Detailed Table Information:
>> Table(tableName:users,dbName:default,owner:Josh,createTime:1231485489,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:occurred_at,type:int,comment:null),
>> FieldSchema(name:id,type:string,comment:null),
>> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
>> FieldSchema(name:application,type:string,comment:null),
>> FieldSchema(name:dataset,type:string,comment:null),
>> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>>
>> Destination table:
>>
>> hive> DESCRIBE EXTENDED distinct_users;
>> OK
>> occurred_at int
>> id string
>> properties map<string,string>
>> account string
>> application string
>> dataset string
>> hour int
>> Detailed Table Information:
>> Table(tableName:distinct_users,dbName:default,owner:Josh,createTime:1231488500,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:occurred_at,type:int,comment:null),
>> FieldSchema(name:id,type:string,comment:null),
>> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/distinct_users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
>> FieldSchema(name:application,type:string,comment:null),
>> FieldSchema(name:dataset,type:string,comment:null),
>> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>>
>> The query:
>>
>> hive> INSERT OVERWRITE TABLE distinct_users SELECT
>> TRANSFORM(users.occurred_at, users.id, users.properties) USING '/bin/cat'
>> AS (occurred_at, id, properties) FROM users;
>> FAILED: Error in semantic analysis: line 1:23
>> Cannot insert into target table because column number/types are different
>> distinct_users: Cannot convert column 2 from string to map<string,string>.
>>
>> I'm really confused, because the two tables are exactly the same except
>> for their names, and I'm just trying to do an insert from one of them into
>> the other using a script.
>>
>> For reference, this appears to work:
>>
>> hive> INSERT OVERWRITE TABLE distinct_users SELECT occurred_at, id,
>> properties FROM users;
>>
>> What is it about transforming that is messing up the semantic analysis?
>>
>> Josh Ferguson
>>
>> On Fri, Jan 9, 2009 at 11:52 AM, Josh Ferguson <[email protected]> wrote:
>>
>>> Is it possible to do a query like the following?
>>>
>>> INSERT OVERWRITE TABLE table1 PARTITION(...)
>>> FROM table2
>>> SELECT TRANSFORM(table2.col1, table2.col2, ...) USING '/my/script' AS
>>> (col1, col2, ...)
>>> WHERE (...)
>>>
>>> I can run the SELECT TRANSFORM segment of the query by itself just fine,
>>> and I get the results I expect.
>>>
>>> When I try to do the insert as well, I get errors about column type
>>> mismatches, even though my script outputs 3 columns with the exact same
>>> types, in the exact order, that they appear in table1. I tried doing this
>>> with both a mapper and a reducer, similar to what was shown in the
>>> ApacheCon slides, and it still didn't work. Am I doing something wrong
>>> query-wise?
>>>
>>> I'm using the 0.19 release.
>>>
>>> Josh Ferguson

--
Yours,
Zheng
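P.S. Until there is a string-to-map conversion, one workaround that fits the
current behavior is to land the transform output in a table whose columns
are all primitive types, since those can be implicitly converted from the
transform's string output (the error above only complained about the map
column, not the int one). A sketch, assuming a hypothetical staging table
distinct_users_flat that is like distinct_users but declares properties as a
plain string:

  -- distinct_users_flat is hypothetical: same columns as distinct_users,
  -- but properties is STRING instead of map<string,string>, so the
  -- string-typed transform output inserts cleanly.
  CREATE TABLE distinct_users_flat (occurred_at INT, id STRING, properties STRING);

  FROM users
  INSERT OVERWRITE TABLE distinct_users_flat
  SELECT TRANSFORM(users.occurred_at, users.id, users.properties)
         USING '/bin/cat'
         AS (occurred_at, id, properties);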
