Hi All,

We at Qubole <https://www.qubole.com/> have open-sourced a datasource that enables users to work with their Hive ACID transactional tables <https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions> from Spark.

GitHub: https://github.com/qubole/spark-acid

Hive ACID tables let users work on their data transactionally, and give them the ability to Delete, Update and Merge data efficiently without having to rewrite all of the data in a table, partition or file. We believe that being able to work with these tables from Spark is a much-desired addition, as is apparent from https://issues.apache.org/jira/browse/SPARK-15348 and https://issues.apache.org/jira/browse/SPARK-16996, where multiple people have asked for it.

Currently the datasource only supports reading from these ACID tables; we are working on adding the ability to write to them from Spark as well.

The datasource is also available as a Spark package, and instructions on how to use it are on the GitHub page <https://github.com/qubole/spark-acid>.

We welcome your feedback and suggestions.

Thanks,
Abhishek Somani