Hi,

I'm trying to create a table similar to apache_log, but I want to avoid
writing my own map-reduce task because I don't want to store my HDFS
files twice.

So if you're working with log lines like this:

186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET
/client/action1/?transaction_id=8002&user_id=871793100001248&ts=1314749223525&item1=271&item2=6045&environment=2
HTTP/1.1"

112.201.65.238 [31/Aug/2011:00:10:41 +0000] "GET
/client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
HTTP/1.1"

90.45.198.251 [31/Aug/2011:00:10:41 +0000] "GET
/client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
HTTP/1.1"

And bearing in mind that the parameters can appear in different orders,
what would be the best strategy to create this table? Should I write my
own org.apache.hadoop.hive.contrib.serde2? Is there anything already
implemented that I could use for this task?
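
The closest I've come so far without a custom SerDe is the contrib
RegexSerDe, keeping the whole query string as a single column. This is
just an untested sketch (the table name, column names, regex, and
location are my own guesses), but something like:

CREATE EXTERNAL TABLE raw_log (
  ip STRING,
  log_time STRING,
  method STRING,
  path STRING,
  query_string STRING,
  protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- one capture group per column: ip, [timestamp], "METHOD path?query PROTOCOL"
  'input.regex' = '([^ ]*) \\[([^\\]]*)\\] "([^ ]*) ([^?]*)\\?([^ ]*) ([^"]*)"'
)
LOCATION '/path/to/logs';

If I point it at the existing directory with EXTERNAL ... LOCATION, that
would also avoid copying the files in HDFS, which is exactly what I'm
trying to achieve.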

In the end, the objective is to turn every parameter into a field and to
use the "action" as the type. With this big table I will be able to run
my queries, my joins, or my views.
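
On top of that raw table, I imagine a view could pull each parameter out
by name, so the order in the URL wouldn't matter. Again an untested
sketch, reusing the hypothetical raw_log table from above; since
parse_url() expects a complete URL, I prepend a dummy scheme and host:

CREATE VIEW parsed_log AS
SELECT
  ip,
  log_time,
  -- the "action" segment of the path, e.g. /client/action1/ -> action1
  regexp_extract(path, '/client/([^/]*)/', 1) AS action,
  -- look each parameter up by key, so parameter order is irrelevant
  parse_url(concat('http://x', path, '?', query_string),
            'QUERY', 'transaction_id') AS transaction_id,
  parse_url(concat('http://x', path, '?', query_string),
            'QUERY', 'user_id') AS user_id,
  parse_url(concat('http://x', path, '?', query_string),
            'QUERY', 'ts') AS ts,
  parse_url(concat('http://x', path, '?', query_string),
            'QUERY', 'item1') AS item1,
  parse_url(concat('http://x', path, '?', query_string),
            'QUERY', 'item2') AS item2,
  parse_url(concat('http://x', path, '?', query_string),
            'QUERY', 'environment') AS environment
FROM raw_log;

But I don't know if this is the sane approach, or whether a custom SerDe
that emits the parameters as real columns would be better.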

Any ideas?

Thanks in advance,
Raimon Bosch.