[ 
https://issues.apache.org/jira/browse/IMPALA-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873745#comment-16873745
 ] 

Gautam Gopalakrishnan commented on IMPALA-8708:
-----------------------------------------------

The workflow here involves two clusters, one on-prem and one on AWS. 
The on-prem cluster does the majority of ingestion, and some content is copied 
over (via BDR) to S3 for the AWS cluster. Several dashboards and 
interactive users query the S3-backed tables. During the copy phase, queries 
fail because files can go missing. This happens because BDR supports neither 
creating a separate partition on copy nor generating new table names at 
runtime (which would let us simply rename tables once the copy is over).

With S3Guard in use, Impala knows for sure that a certain file has been deleted. 
So my hope was to avoid aborting query execution and carry on with whatever 
files are present. Once the copy is over, a metadata refresh is executed to get 
things back to normal. It's only the copy phase that causes issues.
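
The requested behavior can be sketched roughly as follows. This is a minimal Python illustration, not Impala code: the function scan_present_files, the read_file callback, and the known_deleted set (standing in for what a consistent metadata store such as S3Guard could report) are all hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator, Set


def scan_present_files(
    paths: Iterable[str],
    read_file: Callable[[str], bytes],
    known_deleted: Set[str],
) -> Iterator[bytes]:
    """Yield the contents of each file, skipping files known to be deleted.

    `known_deleted` is a hypothetical stand-in for an authoritative record
    of deletions; it is an assumption made for this sketch.
    """
    for path in paths:
        try:
            yield read_file(path)
        except FileNotFoundError:
            if path in known_deleted:
                # The deletion is confirmed, so skip the file and carry on
                # with whatever files are still present.
                continue
            # The file is missing for an unknown reason: still fail the scan.
            raise
```

Only confirmed deletions are skipped here; any other missing file still aborts the scan, so an unexpected inconsistency is not silently hidden.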



> Impala should ignore deleted files
> ----------------------------------
>
>                 Key: IMPALA-8708
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8708
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Gautam Gopalakrishnan
>            Priority: Major
>
> When querying an S3 backed table that is being modified (e.g. distcp content 
> from another cluster) and Impala is able to determine that a file in that 
> table has been deleted (e.g. using the S3guard feature in CDH), queries still 
> fail with a {{FileNotFound}} exception.
> Performing a metadata refresh after the copy completes does resolve the 
> problem. However, this doesn't help during the copy phase. Requesting an 
> enhancement where Impala can ignore files if it knows that they've been deleted.



