[ 
https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968038#comment-16968038
 ] 

Yanjia Gary Li commented on IMPALA-8778:
----------------------------------------

Hello [~tarmstrong], I'd like to resume the discussion on this topic. Yuanbin 
finished his internship a few months ago, so please assign this ticket to me. 

After reading the relevant code on both the Impala and Hudi sides, I can think 
of the following approaches:
 * As discussed above, create a new class in Impala, similar to HdfsTable, that 
depends on Hudi to filter file paths. 
 * Implement everything on the Hudi side and send a sequence of queries to the 
Impala server to ALTER the table. The hive sync tool in the Hudi repo uses 
this method. I think this approach could be easier than the first one because 
we could follow a strategy similar to the hive sync tool's, and we wouldn't 
need to wait for the next release to use this feature.

To confirm that the second approach is feasible, I'd like to know which query 
could handle the following situation:
 * First stage: the HDFS partition year=2019/month=10/day=1 contains 
file1_v1.parquet and file2_v1.parquet.
 * Second stage: after a Hudi job updates the partition 
year=2019/month=10/day=1, it contains file1_v1.parquet, file1_v2.parquet, and 
file2_v1.parquet.
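
To make the desired end state concrete, here is a minimal Python sketch of the 
filtering that either approach would have to perform: keep only the newest 
version of each file and treat the rest as stale. It assumes the simplified 
<file id>_v<version>.parquet naming from the example above, which is not Hudi's 
real file naming, and the function name latest_versions is hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Simplified naming from the example above: <file id>_v<version>.parquet.
# Real Hudi file names differ; this pattern is an assumption for illustration.
_NAME = re.compile(r"^(?P<file_id>.+)_v(?P<version>\d+)\.parquet$")


def latest_versions(file_names: Iterable[str]) -> List[str]:
    """Return only the newest version of each file id, sorted by name."""
    newest: Dict[str, Tuple[int, str]] = {}
    for name in file_names:
        match = _NAME.match(name)
        if match is None:
            continue  # not a data file under this simplified scheme
        file_id = match.group("file_id")
        version = int(match.group("version"))
        # Keep the entry with the highest version seen so far for this file id.
        if file_id not in newest or version > newest[file_id][0]:
            newest[file_id] = (version, name)
    return sorted(name for _, name in newest.values())


# Second-stage contents of partition year=2019/month=10/day=1.
stage2 = ["file1_v1.parquet", "file1_v2.parquet", "file2_v1.parquet"]
visible = latest_versions(stage2)           # files a query should read
stale = sorted(set(stage2) - set(visible))  # files a query should ignore
```

With the second-stage listing, visible holds file1_v2.parquet and 
file2_v1.parquet, and stale holds file1_v1.parquet.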

If we want to *drop* file1_v1.parquet and *load* file1_v2.parquet into the 
table, what query should I run? What will happen if another user submits a 
query while the metadata is being updated?

Thanks

> Support read/write Apache Hudi tables
> -------------------------------------
>
>                 Key: IMPALA-8778
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8778
>             Project: IMPALA
>          Issue Type: New Feature
>            Reporter: Yuanbin Cheng
>            Assignee: Yuanbin Cheng
>            Priority: Major
>
> Apache Impala currently does not support Apache Hudi; it cannot even pull 
> metadata from Hive.
> Related issue: 
> [https://github.com/apache/incubator-hudi/issues/179] 
> [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
>  


