Any pointers?
Thanks,
Pradeep

________________________________
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com]
Sent: Wednesday, July 07, 2010 4:47 PM
To: hive-user@hadoop.apache.org
Subject: Complex types, lateral view and RCFile

Hi,

I have data with complex types (map, struct, array of maps) stored as a text file. I am able to successfully create an external table based on this data and further build a lateral view on it:

    hive -e 'select rownum, bag_item from complex_text LATERAL VIEW explode(bagofmap) explodedTable AS bag_item ;'

    1    {"k1":"v1","k2":"v2"}
    1    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}
    2    {"a1":"b1","a2":"b2"}
    2    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

Here is how I created this table:

    CREATE external TABLE if not exists complex_text (
        mymap map<string, string>,
        mytuple struct<num:int,str:string,dbl:double>,
        bagofmap array<map<string,string>>,
        rownum int
    )
    row format DELIMITED
        FIELDS TERMINATED BY '\001'
        COLLECTION ITEMS TERMINATED BY '\002'
        MAP KEYS TERMINATED BY '\003'
    stored as textfile
    location '/user/pradeepk/complex_text';

The data contents are (^C stands for ctrl-C, i.e. '\003'):

    mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1
    mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^Ca4^Db4^Ca5^Db5^Ca6^Db6^A2

Now I created a table using RCFile for storage based on the above table as follows:

    create table complex_rcfile
    stored as RCFile
    location '/user/pradeepk/complex_rcfile'
    as select mymap, mytuple, bagofmap, rownum from complex_text;

The same query against this table gives incorrect results (nulls for the rownum column):

    hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW explode(bagofmap) explodedTable AS bag_item ;'

    NULL    {"a1":"b1","a2":"b2"}
    NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}
    NULL    {"k1":"v1","k2":"v2"}
    NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

I have a feeling the delimiters are not being correctly interpreted in RCFile format.

Strangely, a non-lateral-view query works fine:

    hive -e 'select rownum, bagofmap from complex_rcfile;'

    2    [{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]
    1    [{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]

Any pointers?

Thanks,
Pradeep
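
For anyone reproducing this, here is a minimal Python sketch of how the delimiters in the text data above appear to nest. It is only an illustration of the sample row layout, not Hive's SerDe code. The helper names (`parse_map`, `parse_row`, `from_caret`) are hypothetical, and the use of ^D ('\004') as the key/value separator inside `bagofmap` is inferred from the sample data rather than from the table definition.

```python
# Separators as they appear in the sample rows: ^A, ^B, ^C, ^D.
# ^A/^B/^C come from the CREATE TABLE statement; ^D is inferred from the data.
FIELD, ITEM, KEY, INNER = "\x01", "\x02", "\x03", "\x04"


def parse_map(text, entry_sep, kv_sep):
    """Split 'key<kv_sep>value' pairs separated by entry_sep into a dict."""
    return dict(entry.split(kv_sep, 1) for entry in text.split(entry_sep))


def parse_row(line):
    """Parse one row of complex_text into plain Python values."""
    mymap, mytuple, bagofmap, rownum = line.split(FIELD)
    num, text, dbl = mytuple.split(ITEM)
    return {
        # Top-level map: entries split on ^B, key/value on ^C.
        "mymap": parse_map(mymap, ITEM, KEY),
        "mytuple": {"num": int(num), "str": text, "dbl": float(dbl)},
        # Array of maps: one level deeper, so maps split on ^B,
        # entries on ^C, and key/value on ^D.
        "bagofmap": [parse_map(m, KEY, INNER) for m in bagofmap.split(ITEM)],
        "rownum": int(rownum),
    }


def from_caret(text):
    """Turn the caret notation used in the email into real control characters."""
    for caret, char in (("^A", FIELD), ("^B", ITEM), ("^C", KEY), ("^D", INNER)):
        text = text.replace(caret, char)
    return text


row = parse_row(from_caret(
    "mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5"
    "^Ak1^Dv1^Ck2^Dv2^Bk3^Dv3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1"
))
print(row["rownum"], row["bagofmap"])
```

Parsed this way, the first sample row yields a `rownum` of 1 and two maps in `bagofmap`, matching what the query against `complex_text` returns.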