Hi, I'm trying to create a table similar to apache_log, but I want to avoid writing my own map-reduce task because I don't want to store my HDFS files twice.
So if you're working with log lines like these:

186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET /client/action1/?transaction_id=8002&user_id=871793100001248&ts=1314749223525&item1=271&item2=6045&environment=2 HTTP/1.1"
112.201.65.238 [31/Aug/2011:00:10:41 +0000] "GET /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 HTTP/1.1"
90.45.198.251 [31/Aug/2011:00:10:41 +0000] "GET /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 HTTP/1.1"

and keeping in mind that the parameters can appear in different orders: what would be the best strategy for creating this table? Should I write my own org.apache.hadoop.hive.contrib.serde2? Is there anything already implemented that I could use for this task?

In the end, the objective is to convert all the parameters into fields and to use the "action" as the type. With this big table I will be able to run my queries, my joins, and my views.

Any ideas?

Thanks in advance,
Raimon Bosch.
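P.S. In case it helps to clarify what I'm after, here is a rough, untested sketch of what I was considering, assuming the RegexSerDe that ships with hive-contrib (the table name raw_log, the jar path, and the HDFS location are just placeholders I made up):

  ADD JAR /path/to/hive-contrib.jar;  -- hypothetical path to the contrib jar

  -- One column per fixed piece of the line; the query string stays raw.
  CREATE EXTERNAL TABLE raw_log (
    host         STRING,
    log_time     STRING,
    action       STRING,
    query_string STRING
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    -- capture groups: host, timestamp, action, raw query string
    "input.regex" = "([\\d.]+) \\[([^\\]]+)\\] \"GET /client/([^/]+)/\\?(\\S+) HTTP/1.1\"",
    "output.format.string" = "%1$s %2$s %3$s %4$s"
  )
  LOCATION '/path/to/logs';  -- hypothetical HDFS location, files are not copied

  -- Parameters could then be pulled out by name, regardless of their order:
  SELECT
    parse_url(concat('http://x/?', query_string), 'QUERY', 'transaction_id') AS transaction_id,
    parse_url(concat('http://x/?', query_string), 'QUERY', 'user_id')        AS user_id,
    parse_url(concat('http://x/?', query_string), 'QUERY', 'item1')          AS item1
  FROM raw_log
  WHERE action = 'action1';

Since the table is EXTERNAL and points at the existing log directory, nothing gets duplicated in HDFS, which was my original concern. But I don't know if this is the right approach, or whether a custom SerDe that explodes the parameters into real columns would be better.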