Changelog table

2015-11-12 Thread Alex Newman
If I have a series of entries that look like


{ "add", {"baz" : "foo" }}
{ "update", {"baz" : "bar" }}
{ "delete", "baz" }

Is there a way to get Hive to compact these operations into a state table?

Alex Newman


Re: Changelog table

2015-11-17 Thread Gopal Vijayaraghavan

> If I have a series of entries that look like
...
> { "update", {"baz" : "bar" }}

Because of the way Hive distributes input splits across parallel tasks,
the order of rows in the files is not preserved, so you need a global
ordering key for each operation.

0, "ADD", "baz", ""
1, "SET", "baz", "bar"
2, "DEL", "baz", null

If you never have updates coming in within the same second, you could
store a timestamp and use it as that ordering key.
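
A quick way to check whether a second-resolution timestamp is a safe
ordering key is to look for operations on the same key that share a
timestamp. The following is only a plain-Python sketch of that check, not
Hive code; the function name ambiguous_keys, the (ts, op, key, value) row
layout, and the sample timestamps are all assumptions made for
illustration.

```python
from collections import Counter


def ambiguous_keys(rows):
    """Return the keys that have two or more operations with the same timestamp.

    Each row is assumed to be a (ts, op, key, value) tuple, where ts is a
    timestamp in whole seconds.
    """
    counts = Counter((key, ts) for ts, op, key, value in rows)
    return sorted({key for (key, ts), n in counts.items() if n > 1})


# Hypothetical sample data: the SET and the DEL land in the same second,
# so ordering by ts alone cannot tell which one happened last.
rows = [
    (1447300000, "ADD", "baz", "foo"),
    (1447300001, "SET", "baz", "bar"),
    (1447300001, "DEL", "baz", None),
]

print(ambiguous_keys(rows))  # ['baz']
```

If this returns any keys, a sequence number (as in the 0, 1, 2 column
above) is the safer choice.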

Then you can write a windowing function for Hive to merge/order them.

select flatten_txns(op, key, value) over (partition by key order by ts)
from txns;
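
Note that flatten_txns is a function you would have to write yourself. As
a rough illustration of the merge it needs to perform, here is a
plain-Python sketch (not a Hive UDF) that replays the operations in
ordering-key order and keeps only the final state. The name compact, the
(seq, op, key, value) row layout, and the choice to treat ADD and SET
identically are assumptions for this example.

```python
def compact(rows):
    """Fold an operation log into a {key: value} state table.

    Each row is assumed to be a (seq, op, key, value) tuple, where seq is
    the global ordering key. ADD and SET both overwrite unconditionally;
    DEL removes the key if it is present.
    """
    state = {}
    # Sort on the ordering key only, since input order is not guaranteed.
    for seq, op, key, value in sorted(rows, key=lambda r: r[0]):
        if op in ("ADD", "SET"):
            state[key] = value
        elif op == "DEL":
            state.pop(key, None)
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return state


rows = [
    (0, "ADD", "baz", "foo"),
    (1, "SET", "baz", "bar"),
    (2, "DEL", "baz", None),
]

print(compact(rows))      # {}
print(compact(rows[:2]))  # {'baz': 'bar'}
```

Because the rows are sorted on seq before being replayed, the result is
the same no matter which order the rows arrive in.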

At this point, you're nearly reinventing what Hive's own
insert/update/delete statements do.

The difference is that these updates are faster, since each one is
really an unconditional SET.

Cheers,
Gopal