If it makes more sense, you could also store your lines with the default serde and extract the fields you intend to query using a UDF.
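To show the kind of extraction such a UDF would have to do, here is a rough sketch in plain Python (not Hive code). The names LOG_LINE and parse_line are made up for illustration, and the regex assumes every line looks like the three samples in the quoted message: IP, bracketed timestamp, quoted request.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical pattern, assuming: IP [timestamp] "METHOD target PROTOCOL"
LOG_LINE = re.compile(r'^(\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)"$')

def parse_line(line):
    """Turn one log line into a dict of fields, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ip, timestamp, method, target, protocol = m.groups()
    url = urlsplit(target)
    # parse_qs returns a list per key; keep the first value of each parameter.
    # Looking values up by key makes the parameter order irrelevant.
    params = {key: values[0] for key, values in parse_qs(url.query).items()}
    # "/client/action1/" -> "action1" (assumes the action is the last path segment)
    action = url.path.strip("/").split("/")[-1]
    return {"ip": ip, "timestamp": timestamp, "action": action, **params}
```

Because the parameters are looked up by name, two lines that carry the same keys in a different order end up with the same set of fields.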
For example, you could use parse_url(string urlString, string partToExtract [, string keyToExtract]) to pull the parameters out of the URL.

Good luck

-----Original Message-----
From: Raimon Bosch [mailto:raimon.bo...@gmail.com]
Sent: Friday, September 16, 2011 10:36 PM
To: core-u...@hadoop.apache.org
Subject: Re: Creating a hive table for a custom log

Any ideas?

The most common approach would be writing your own serde and plugging it into Hive, like:
http://code.google.com/p/hive-json-serde/

But I'm wondering whether some work has already been done in this area.

Raimon Bosch wrote:
>
> Hi,
>
> I'm trying to create a table similar to apache_log, but I'm trying to avoid
> writing my own map-reduce task because I don't want to have my HDFS files
> twice.
>
> So if you're working with log lines like this:
>
> 186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET /client/action1/?transaction_id=8002&user_id=871793100001248&ts=1314749223525&item1=271&item2=6045&environment=2 HTTP/1.1"
>
> 112.201.65.238 [31/Aug/2011:00:10:41 +0000] "GET /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 HTTP/1.1"
>
> 90.45.198.251 [31/Aug/2011:00:10:41 +0000] "GET /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 HTTP/1.1"
>
> And keeping in mind that the parameters could be in different orders, what
> would be the best strategy to create this table? Write my own
> org.apache.hadoop.hive.contrib.serde2? Is there any resource already
> implemented that I could use to perform this task?
>
> In the end, the objective is to convert all the parameters into fields and
> use the "action" as the type. With this big table I will be able to perform
> my queries, my joins or my views.
>
> Any ideas?
>
> Thanks in advance,
> Raimon Bosch.