Thanks Abhishek.

Will it work on a Hive ACID table that is not compacted, i.e. a table
having base and delta files?

Let's say we have a Hive ACID table customer:

CREATE TABLE customer (customer_id INT, customer_name STRING, customer_email STRING)
CLUSTERED BY (customer_id) INTO 10 BUCKETS
STORED AS ORC
LOCATION '/test/customer'
TBLPROPERTIES ('transactional'='true')


And the table's HDFS path contains the following directories:

/test/customer/base_15234/
/test/customer/delta_1234_456


That means the table has updates and major compaction has not run.

Will the Spark reader work?
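For background on the question: Hive's ACID readers build a snapshot by taking the
latest base_<writeId> directory and merging in any delta_<min>_<max> directories whose
transactions that base does not already cover. A rough Python sketch of that
directory-selection logic (select_read_dirs is a hypothetical helper for illustration,
not part of spark-acid; it ignores delete_delta directories, statement ids, and
open/aborted transaction filtering that a real reader must also handle):

```python
import re

def select_read_dirs(dirs):
    """Pick the directories a snapshot reader would scan: the latest
    base_<writeId>, plus delta_<min>_<max> directories whose minimum
    transaction id is newer than that base."""
    # Find all base_<id> directories and keep the one with the highest id.
    bases = [(int(m.group(1)), d) for d in dirs
             if (m := re.fullmatch(r"base_(\d+)", d.strip("/").split("/")[-1]))]
    base_id, base_dir = max(bases, default=(-1, None))

    # Keep only deltas not already folded into the chosen base.
    deltas = []
    for d in dirs:
        m = re.fullmatch(r"delta_(\d+)_(\d+)", d.strip("/").split("/")[-1])
        if m and int(m.group(1)) > base_id:
            deltas.append(d)

    return ([base_dir] if base_dir else []) + sorted(deltas)

# Hypothetical layout: a base plus a newer, uncompacted delta.
print(select_read_dirs(["/test/customer/base_15234",
                        "/test/customer/delta_15235_15240"]))
# → ['/test/customer/base_15234', '/test/customer/delta_15235_15240']
```

The point is that an ACID-aware reader must do this merge-on-read itself, which is
exactly what a plain Spark file scan of the table location cannot do.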


Thank you,
Naresh







On Fri, Jul 26, 2019 at 7:38 AM Abhishek Somani <abhisheksoman...@gmail.com>
wrote:

> Hi All,
>
> We at Qubole <https://www.qubole.com/> have open sourced a datasource
> that will enable users to work on their Hive ACID Transactional Tables
> <https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions>
> using Spark.
>
> Github: https://github.com/qubole/spark-acid
>
> Hive ACID tables allow users to work on their data transactionally, and
> also gives them the ability to Delete, Update and Merge data efficiently
> without having to rewrite all of their data in a table, partition or file.
> We believe that being able to work on these tables from Spark is a much
> desired value add, as is also apparent in
> https://issues.apache.org/jira/browse/SPARK-15348 and
> https://issues.apache.org/jira/browse/SPARK-16996 with multiple people
> looking for it. Currently the datasource supports reading from these ACID
> tables only, and we are working on adding the ability to write into these
> tables via Spark as well.
>
> The datasource is also available as a spark package, and instructions on
> how to use it are available on the Github page
> <https://github.com/qubole/spark-acid>.
>
> We welcome your feedback and suggestions.
>
> Thanks,
> Abhishek Somani
>
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/
