[ https://issues.apache.org/jira/browse/IMPALA-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873514#comment-16873514 ]
Tim Armstrong commented on IMPALA-8708:
---------------------------------------

[~ggop] can you elaborate on what the workflow looks like? Can any file be deleted at any point while it is being queried? What file format is this? Parquet?

I think this is quite difficult to do cleanly (i.e. it is not as simple as checking whether the file exists when opening it), since the file could disappear part-way through being scanned and the error could bubble up through any number of code paths. So it would be possible for some rows from a deleted file to appear.

There's some precedent for this in the abort_on_error behaviour that skips over parse errors. It might be possible to detect disk I/O errors and not propagate those failures to a query failure.

> Impala should ignore deleted files
> ----------------------------------
>
>                 Key: IMPALA-8708
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8708
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Gautam Gopalakrishnan
>            Priority: Major
>
> When querying an S3-backed table that is being modified (e.g. distcp content
> from another cluster) and Impala is able to determine that a file in that
> table has been deleted (e.g. using the S3Guard feature in CDH), queries still
> fail with a {{FileNotFound}} exception.
> Performing a metadata refresh after the copy completes does resolve the
> problem. However, this doesn't help during the copy phase. Requesting an
> enhancement where Impala can ignore files if it knows that they've been deleted.
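
To illustrate the point in the comment about a file disappearing part-way through a scan, here is a minimal sketch in Python. It is not Impala code: `FakeObjectStore`, `scan`, and the `on_chunk` hook are hypothetical names invented for this example, and the in-memory store only stands in for a remote store such as S3. It shows that tolerating a missing-file error in the read loop (rather than only checking existence at open) can still leave some rows from the deleted file in the result.

```python
class FakeObjectStore:
    """Hypothetical in-memory stand-in for a remote store such as S3."""

    def __init__(self, files):
        self._files = dict(files)  # path -> list of row chunks

    def delete(self, path):
        self._files.pop(path, None)

    def read_chunk(self, path, index):
        # Every read can fail, not just the first one: the file may have
        # been deleted since the previous chunk was fetched.
        if path not in self._files:
            raise FileNotFoundError(path)
        chunks = self._files[path]
        return chunks[index] if index < len(chunks) else None


def scan(store, paths, skip_missing=True, on_chunk=None):
    """Read all rows from `paths`, optionally skipping files that vanish.

    Returns (rows, skipped_paths). Rows read before a file vanished are
    kept, so a skipped file may still have contributed partial results.
    """
    rows, skipped = [], []
    for path in paths:
        index = 0
        try:
            while True:
                chunk = store.read_chunk(path, index)
                if chunk is None:
                    break
                rows.extend(chunk)
                index += 1
                if on_chunk is not None:
                    # Hook used only to simulate a concurrent delete.
                    on_chunk(path, index)
        except FileNotFoundError:
            if not skip_missing:
                raise  # behaviour described in the issue: the query fails
            skipped.append(path)
    return rows, skipped


store = FakeObjectStore({
    "a": [["a1"], ["a2"]],
    "b": [["b1"], ["b2"]],
    "c": [["c1"]],
})


def delete_b_after_first_chunk(path, chunks_read):
    # Simulates another process deleting "b" while it is being scanned.
    if path == "b" and chunks_read == 1:
        store.delete("b")


rows, skipped = scan(store, ["a", "b", "c"], on_chunk=delete_b_after_first_chunk)
print(rows)
print(skipped)
```

In this sketch the row `b1` is returned even though file `b` ends up in the skipped list, which is the partial-result behaviour the comment warns about. Whether that is acceptable would depend on the workflow being asked about.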