Should the SQL schema for HIVE be identical to the SQL schema for a relational database?

2013-04-08 Thread Matthieu Labour
Hi

It would be terrific to get some advice on migrating a schema from RDMS to
Hive.

Should the SQL schema for HIVE be identical to the SQL schema for a
Posgresql/mysql database?

Specifically:

I have an application that generates events that look like the following:

{"ts":N+1,"userId":"123","event":"location","payload":{"verticalAccuracy":10,"longitude":-73.99718090313884,"latitude":40.72473278788106,"altitude":27.79653739929199,"horizontalAccuracy":65}}
{"ts":N+2,"userId":"123","event":"addProduct","payload":["cart","osprey-kestrel-48"]}
...

Events are being written to persistent storage (AWS S3). A 'worker' wakes
up periodically, reads the new events received and inserts them in a
postgresql database. The database has  user, product, user_product
(user_id, product_id, action:(viewed|wishlist...), timestamp), location
etc... tables.

We are migrating to HIVE.

Should we also create user, product, user_product, locations etc... as HIVE
tables and have a MapReduce job process event files to populate the HIVE
tables? Or should/can we implement a different schema that would allow for
external HIVE tables to map directly to the event files generated. Or a mix
of both?

Thank you for your help!

-matt


On hive date functions

2012-10-08 Thread Matthieu Labour
Hi
Is it possible with Hive to truncate date to a specified precision?
For example in Postgresql date_trunc('hour',timestamp '2001-02-16
20:38:40') will return 2001-02-16 20:00:00
There is the to_date function in hive
I am trying to achieve the following
select distinct date_trunc('hour', timestamp) as hour, count(*) from table
 group by hour;
Thank you for your help
Matthieu