Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
Yes, I use HDP 2.6.5, so I still have to deal with Hive 2. The migration to HDP 3 has been planned, but only in a couple of months. So, thanks for your reply; I will dig deeper into the ACID support for ORC in Hive 2. On Tue, Mar 12, 2019 at 22:51, Alan Gates wrote: > That's the old
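
(As a rough sketch of that investigation: a Hive 2 ACID table must be stored as ORC, bucketed, and flagged transactional. The HiveServer2 URL, table and column names below are placeholders, and issuing the DDL over JDBC is just one convenient way to do it.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateAcidTableSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2-host:10000/default", "user", "");
             Statement stmt = conn.createStatement()) {
          // Hive 2 ACID requires ORC storage, bucketing and the transactional table property.
          stmt.execute(
              "CREATE TABLE events_acid (id BIGINT, payload STRING) "
                  + "CLUSTERED BY (id) INTO 4 BUCKETS "
                  + "STORED AS ORC "
                  + "TBLPROPERTIES ('transactional'='true')");
        }
      }
    }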

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread Alan Gates
That's the old (Hive 2) version of ACID. In the newer version (Hive 3) there's no update, just insert and delete (an update is an insert plus a delete). If you're working against Hive 2, what you have is what you want. If you're working against Hive 3, you'll need the newer stuff. Alan. On Tue, Mar 12,

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
Thanks Alan. Yes, the problem in fact was that this streaming API does not handle updates and deletes. I've used native ORC files, and the next step I've planned is to use the ACID support described here: https://orc.apache.org/docs/acid.html INSERT/UPDATE/DELETE seem to be implemented:
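
(The page above describes the ACID layout ORC uses: every user row is wrapped in operation and transaction metadata, and readers merge base and delta files by that metadata. A small sketch of that wrapper schema, with a made-up inner row struct:)

    import org.apache.orc.TypeDescription;

    public class AcidSchemaSketch {
      public static void main(String[] args) {
        // Operation codes per the ORC ACID spec: 0 = insert, 1 = update, 2 = delete.
        // The inner "row" struct (id, payload) stands in for the real user columns.
        TypeDescription acidSchema = TypeDescription.fromString(
            "struct<operation:int,originalTransaction:bigint,bucket:int,"
                + "rowId:bigint,currentTransaction:bigint,"
                + "row:struct<id:bigint,payload:string>>");
        System.out.println(acidSchema);
      }
    }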

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread Alan Gates
Have you looked at Hive's streaming ingest? https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest It is designed for this case, though it only handles insert (not update), so if you need updates you'd have to do the merge as you are currently doing. Alan. On Mon, Mar 11, 2019 at
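
(A minimal sketch of the Hive 2-era hive-hcatalog-streaming API that page describes; the metastore URI, database, table and column names are placeholders, and the target table must already be transactional and bucketed.)

    import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    public class StreamingIngestSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder metastore URI, database and (unpartitioned) target table.
        HiveEndPoint endPoint = new HiveEndPoint(
            "thrift://metastore-host:9083", "default", "events_acid", null);
        StreamingConnection conn = endPoint.newConnection(true);
        DelimitedInputWriter writer =
            new DelimitedInputWriter(new String[] {"id", "payload"}, ",", endPoint);

        // A batch groups several transactions; the API only appends rows (no update/delete).
        TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
        txnBatch.beginNextTransaction();
        txnBatch.write("1,hello".getBytes());
        txnBatch.write("2,world".getBytes());
        txnBatch.commit();
        txnBatch.close();
        conn.close();
      }
    }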

How to update Hive ACID tables in Flink

2019-03-11 Thread David Morin
Hello, I've just implemented a pipeline based on Apache Flink to synchronize data between MySQL and Hive (transactional + bucketed) on an HDP cluster. Flink jobs run on YARN. I've used ORC files, but without ACID properties. Then, we've created external tables on these HDFS directories that
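
(For context, a minimal sketch of writing plain, non-ACID ORC files with the orc-core Java API, which is roughly what this first iteration amounts to; the schema, output path and values are made up for illustration.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
    import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    import org.apache.orc.OrcFile;
    import org.apache.orc.TypeDescription;
    import org.apache.orc.Writer;

    public class PlainOrcWriterSketch {
      public static void main(String[] args) throws Exception {
        // Made-up schema and HDFS path; adapt to the real table layout.
        TypeDescription schema = TypeDescription.fromString("struct<id:bigint,payload:string>");
        Writer writer = OrcFile.createWriter(
            new Path("hdfs:///tmp/example/part-00000.orc"),
            OrcFile.writerOptions(new Configuration()).setSchema(schema));

        VectorizedRowBatch batch = schema.createRowBatch();
        LongColumnVector id = (LongColumnVector) batch.cols[0];
        BytesColumnVector payload = (BytesColumnVector) batch.cols[1];

        int row = batch.size++;
        id.vector[row] = 1L;
        payload.setVal(row, "hello".getBytes());

        writer.addRowBatch(batch); // files written this way carry no ACID metadata
        writer.close();
      }
    }

Files produced this way can back an external table, but rewriting or compacting them to reflect MySQL updates stays on the application's side.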