Hi
It would be terrific to get some advice on migrating a schema from RDMS to
Hive.
Should the SQL schema for HIVE be identical to the SQL schema for a
Posgresql/mysql database?
Specifically:
I have an application that generates events that look like the following:
{"ts":N+1,"userId":"123","event":"location","payload":{"verticalAccuracy":10,"longitude":-73.99718090313884,"latitude":40.72473278788106,"altitude":27.79653739929199,"horizontalAccuracy":65}}
{"ts":N+2,"userId":"123","event":"addProduct","payload":["cart","osprey-kestrel-48"]}
...
Events are being written to persistent storage (AWS S3). A 'worker' wakes
up periodically, reads the new events received and inserts them in a
postgresql database. The database has user, product, user_product
(user_id, product_id, action:(viewed|wishlist...), timestamp), location
etc... tables.
We are migrating to HIVE.
Should we also create user, product, user_product, locations etc... as HIVE
tables and have a MapReduce job process event files to populate the HIVE
tables? Or should/can we implement a different schema that would allow for
external HIVE tables to map directly to the event files generated. Or a mix
of both?
Thank you for your help!
-matt