[ 
https://issues.apache.org/jira/browse/IMPALA-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823528#comment-17823528
 ] 

ASF subversion and git services commented on IMPALA-12640:
----------------------------------------------------------

Commit 4428db37b3884373482071fc918936b0c080e47c in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4428db37b ]

IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations

We must invoke validateDataFilesExist for RowDelta operations (DELETE/
UPDATE/MERGE). Without this, a concurrent RewriteFiles (compaction) and
RowDelta can corrupt a table.

IcebergBufferedDeleteSink now also collects the filenames of the data
files that are referenced in the position delete files. It adds them to
the DML exec state which is then collected by the Coordinator. The
Coordinator passes the file paths to CatalogD, which executes Iceberg's
RowDelta operation and now invokes validateDataFilesExist() with the
file paths. It also invokes validateDeletedFiles().
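
The conflict that this validation guards against can be illustrated with
a minimal, self-contained sketch. This is not Impala or Iceberg code:
ToyTable, rewrite_files, row_delta and ValidationError are hypothetical
names invented for illustration, and the check only mimics the idea of
refusing a RowDelta commit whose position deletes reference data files
that a concurrent compaction has already replaced.

```python
from dataclasses import dataclass, field


class ValidationError(Exception):
    """Raised when a commit references a data file that is no longer live."""


@dataclass
class ToyTable:
    # Hypothetical stand-in for a table: only tracks the set of live data files
    # and the delete files committed so far.
    live_data_files: set = field(default_factory=set)
    delete_files: list = field(default_factory=list)

    def rewrite_files(self, old_files, new_files):
        """Compaction: replace old data files with rewritten ones."""
        self.live_data_files -= set(old_files)
        self.live_data_files |= set(new_files)

    def row_delta(self, delete_file, referenced_data_files, validate=True):
        """Commit a position delete file that points into referenced_data_files."""
        if validate:
            # Reject the commit if any referenced data file has disappeared,
            # e.g. because a concurrent compaction rewrote it.
            missing = sorted(set(referenced_data_files) - self.live_data_files)
            if missing:
                raise ValidationError(f"missing data files: {missing}")
        self.delete_files.append((delete_file, tuple(referenced_data_files)))
```

In this toy model, if rewrite_files() replaces "a.parquet" before a
row_delta() that references "a.parquet" commits, the validated commit
raises ValidationError instead of recording a delete file that points at
a data file that no longer exists. With validate=False the dangling
delete file would be accepted silently.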

This patch set also resolves IMPALA-12640, which is about replacing
IcebergDeleteSink with IcebergBufferedDeleteSink, as from now on
we use the buffered version for all DML operations that write
position delete files.

Testing:
 * adds new stress test with DELETE + UPDATE + OPTIMIZE

Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c
Reviewed-on: http://gerrit.cloudera.org:8080/21099
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkas...@cloudera.com>


> Remove IcebergDeleteSink
> ------------------------
>
>                 Key: IMPALA-12640
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12640
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> UPDATE part 3 CR (https://gerrit.cloudera.org/#/c/20760/) introduces a new 
> sink operator for position delete records: IcebergBufferedDeleteSink.
> The new operator can be used in the context of UPDATEs even in the case when 
> updating a partition column value, or the table has SORT BY properties.
> IcebergBufferedDeleteSink doesn't require its input to be sorted by delete 
> partitions, file paths, and positions, as it takes care of the sorting itself.
> The only area where IcebergBufferedDeleteSink lags behind IcebergDeleteSink 
> is that it cannot spill to disk. But since it stores file paths and positions 
> in a compact format, it is unlikely that it would ever need to spill to disk 
> in a real-life situation. E.g., even if 100M rows need to be deleted per 
> Impala executor, the amount of memory required is not much larger than 
> 800 MB per executor.
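
The buffering idea and the memory figure in the description above can be
sketched as follows. This is a rough illustration, not the actual sink:
ToyDeleteBuffer and estimate_position_bytes are hypothetical names, the
sketch ignores delete partitions, and the 8 bytes per position is an
assumption inferred from the 800 MB figure (the description only says
the format is compact).

```python
from array import array
from collections import defaultdict


class ToyDeleteBuffer:
    """Hypothetical sketch: buffers (file path, position) pairs, sorts on flush."""

    def __init__(self):
        # Each file path is stored once; its positions go into a packed
        # array of signed 64-bit integers (8 bytes per entry).
        self._positions = defaultdict(lambda: array("q"))

    def add(self, file_path, position):
        """Accept a delete record in any order."""
        self._positions[file_path].append(position)

    def referenced_files(self):
        """Data files referenced by the buffered position deletes."""
        return sorted(self._positions)

    def flush(self):
        """Yield (file_path, position) ordered by path, then by position."""
        for path in sorted(self._positions):
            for pos in sorted(self._positions[path]):
                yield path, pos


def estimate_position_bytes(num_rows, bytes_per_position=8):
    """Bytes needed for the positions alone, assuming 8 bytes per position."""
    return num_rows * bytes_per_position
```

Under that assumption, estimate_position_bytes(100_000_000) gives
800,000,000 bytes, which matches the roughly 800 MB per executor quoted
in the description. The file paths add a little on top because each path
is stored once rather than once per row.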



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
