We do a couple of different things. First, we have a bunch of logs
that are just key/value pairs mapping a transaction ID to a server
event in JSON form. I have scripts which add a new partition for
every day's log data. To Hive, it's just a two-column table, although
the second column is a huge JSON field. To make that column
queryable, I created a bunch of UDFs which understand our log format,
so I can do stuff like:
select count(tid) from txns where UserAgent(txn) like '%Chrome%' and
FooBar(txn) = 'baz';
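Hive UDFs like the ones above are implemented in Java, but the logic they apply to the JSON column is easy to sketch. The following Python function shows roughly what a UDF such as UserAgent(txn) would do; the field name "user_agent" and the exact log format are assumptions, not Bobby's actual schema:

```python
import json

def user_agent(txn_json):
    """Sketch of UserAgent(txn): pull one field out of the JSON event column.
    The "user_agent" key is a hypothetical field name."""
    try:
        return json.loads(txn_json).get("user_agent")
    except ValueError:
        # Malformed events map to None, much as a Hive UDF would return NULL.
        return None
```

A real deployment would wrap equivalent extraction logic in a Java class registered with CREATE TEMPORARY FUNCTION so it can be called directly in HQL.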
There are also tables which get generated from that transactions
table; those are simple columnar tables.
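Derived tables like that are typically populated with an INSERT OVERWRITE ... SELECT over the raw table. A minimal sketch, built as an HQL string the way a Python wrapper would generate it; the table name browser_counts and the ds partition key are hypothetical, and UserAgent is the UDF from the query above:

```python
def build_daily_rollup_hql(day):
    """Build the HQL that rebuilds one day's partition of a derived
    columnar table (browser_counts is an illustrative name)."""
    return ("INSERT OVERWRITE TABLE browser_counts PARTITION (ds='%s') "
            "SELECT UserAgent(txn), count(tid) FROM txns "
            "WHERE ds='%s' GROUP BY UserAgent(txn)" % (day, day))
```

A statement like this can then be handed to the Hive CLI once per day after the raw partition lands.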
I've written Python wrappers for many Hive commands: execute_hql(),
add_partition(), drop_partition(), etc.
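Those wrappers can be reconstructed roughly as follows: build the HQL string, then shell out to the Hive CLI. This is a hypothetical sketch, not Bobby's actual code; the ds partition key, the path layout, and the `hive -e` invocation are all assumptions:

```python
import subprocess

def execute_hql(hql):
    """Run one HQL statement via the Hive CLI and return its stdout."""
    out = subprocess.run(["hive", "-e", hql],
                         capture_output=True, text=True, check=True)
    return out.stdout

def add_partition_hql(table, day, location):
    # One partition per day's log data, keyed on a hypothetical ds column.
    return ("ALTER TABLE %s ADD PARTITION (ds='%s') LOCATION '%s'"
            % (table, day, location))

def drop_partition_hql(table, day):
    return "ALTER TABLE %s DROP PARTITION (ds='%s')" % (table, day)

def add_partition(table, day, location):
    return execute_hql(add_partition_hql(table, day, location))

def drop_partition(table, day):
    return execute_hql(drop_partition_hql(table, day))
```

Keeping the string-building separate from the CLI call makes the generated HQL easy to log and test before anything touches the cluster.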
Bobby
On Dec 14, 2009, at 1:00 PM, Vijay wrote:
Can anyone share their ETL workflow experiences with Hive? For
example, how do you transform data from log files to Hive tables? Do
you use Hive with map/reduce scripts, or do you use Hive
programmatically? Or do you do something entirely different? I
haven't found any samples or details about the programmatic usage of
Hive.
Thanks in advance,
Vijay