New Spark Datasource for Hive ACID tables

Abhishek Somani Fri, 26 Jul 2019 05:43:47 -0700

Hi All,

We at Qubole <https://www.qubole.com/> have open-sourced a datasource that
lets users work with their Hive ACID transactional tables
<https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions> using
Spark.


GitHub: https://github.com/qubole/spark-acid

Hive ACID tables let users work on their data transactionally, and also
give them the ability to delete, update and merge data efficiently without
having to rewrite all of the data in a table, partition or file. We believe
that being able to work on these tables from Spark is a much-desired
addition, as is apparent from the interest shown in
https://issues.apache.org/jira/browse/SPARK-15348 and
https://issues.apache.org/jira/browse/SPARK-16996. Currently the datasource
supports only reading from these ACID tables; we are working on adding the
ability to write to them from Spark as well.

The datasource is also available as a Spark package, and instructions on
how to use it are on the GitHub page
<https://github.com/qubole/spark-acid>.

We welcome your feedback and suggestions.

Thanks,
Abhishek Somani
