[ 
https://issues.apache.org/jira/browse/IMPALA-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873514#comment-16873514
 ] 

Tim Armstrong commented on IMPALA-8708:
---------------------------------------

[~ggop] can you elaborate on what the workflow looks like? Can any file be 
deleted at any point while it is being queried? What file format is this? 
Parquet?

I think this is quite difficult to do cleanly (i.e. it is not as simple as 
checking whether the file exists when opening it), since the file could 
disappear part-way through being scanned and the error could bubble up through 
any number of code paths. As a result, a query could still return some rows 
from a file that was deleted mid-scan.

There's some precedent for this in the abort_on_error behaviour that skips over 
parse errors. It might be possible to detect disk I/O errors in a similar way 
and skip the affected file rather than failing the whole query.
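
To illustrate the two points above, here is a minimal Python sketch. It is not 
Impala code: FakeStore, FileGone, scan and the skip_deleted flag are all 
hypothetical names made up for the example, and the in-memory store only 
imitates an object store where every ranged read is a separate request. It 
shows that a check at open time is not enough, and that skipping on a read 
error can still leave rows from the deleted file in the result.

```python
from dataclasses import dataclass, field


class FileGone(Exception):
    """Stand-in for the FileNotFound error raised by a ranged read."""


@dataclass
class FakeStore:
    """Hypothetical in-memory object store: path -> list of row chunks."""
    files: dict = field(default_factory=dict)

    def exists(self, path):
        return path in self.files

    def read_chunk(self, path, index):
        # Each read is an independent request, so it can fail even if an
        # earlier read of the same file succeeded.
        if path not in self.files:
            raise FileGone(path)
        return self.files[path][index]


def scan(store, plan, skip_deleted, between_chunks=None):
    """Scan every (path, n_chunks) entry of a plan built from cached metadata.

    between_chunks is a test hook called after each successful chunk read;
    it lets the example delete a file part-way through the scan.
    """
    rows, skipped = [], []
    for path, n_chunks in plan:
        try:
            for i in range(n_chunks):
                rows.extend(store.read_chunk(path, i))
                if between_chunks is not None:
                    between_chunks(store, path, i)
        except FileGone:
            if not skip_deleted:
                raise  # default behaviour: the whole query fails
            skipped.append(path)  # rows already emitted are NOT rolled back
    return rows, skipped


def delete_a_after_first_chunk(store, path, index):
    # Simulates a concurrent writer removing file "a" mid-scan.
    if path == "a" and index == 0:
        store.files.pop("a", None)


store = FakeStore({"a": [[1, 2], [3, 4]], "b": [[5, 6]]})
plan = [("a", 2), ("b", 1)]
assert store.exists("a")  # the open-time check passes...
rows, skipped = scan(store, plan, skip_deleted=True,
                     between_chunks=delete_a_after_first_chunk)
print(rows, skipped)  # ...yet "a" vanishes before its second chunk is read
```

In this sketch the scan finishes instead of failing, but rows 1 and 2 from the 
deleted file remain in the output, which is the partial-result problem 
described above. A real implementation would also need to decide which errors 
count as "file deleted" as opposed to a genuine I/O failure.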



> Impala should ignore deleted files
> ----------------------------------
>
>                 Key: IMPALA-8708
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8708
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Gautam Gopalakrishnan
>            Priority: Major
>
> When querying an S3-backed table that is being modified (e.g. distcp content 
> from another cluster) and Impala is able to determine that a file in that 
> table has been deleted (e.g. using the S3guard feature in CDH), queries still 
> fail with a {{FileNotFound}} exception.
> Performing a metadata refresh after the copy completes does resolve the 
> problem. However, this doesn't help during the copy phase. Requesting an 
> enhancement where Impala can ignore files if it knows that they've been deleted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
