Zoltán Borók-Nagy created IMPALA-10135:
------------------------------------------

Summary: Insert events doesn't contain the inserted data files
Key: IMPALA-10135
URL: https://issues.apache.org/jira/browse/IMPALA-10135
Project: IMPALA
Issue Type: Bug
Reporter: Zoltán Borók-Nagy

When Impala generates INSERT events, it doesn't add the newly inserted data files.

The problem is that Impala misuses Sets.difference(set1, set2). From the API doc at [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]:

"The returned set contains all elements that are contained by {{set1}} and not contained by {{set2}}. {{set2}} may also contain elements not present in {{set1}}; these are simply ignored."

So the name "difference" is a bit misleading: it's really a subtraction of set2 from set1, not a symmetric difference.

Unfortunately, Impala passes the parameters in the wrong order, Sets.difference(beforeInsert, afterInsert):
[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

A plain INSERT only adds files, so every file in beforeInsert is also in afterInsert, and the result is always empty.

There's another problem with INSERT OVERWRITE: in that case we never fill in the data files of the insert event.
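
The argument-order problem described above can be reproduced with a minimal sketch. To stay self-contained it does not call Guava; the hypothetical helper `difference` mirrors the documented semantics of Sets.difference(set1, set2) using plain java.util (returning a copy rather than a view). The class name and file names are made up for illustration and are not taken from CatalogOpExecutor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class InsertEventFilesSketch {
    // Same result as Guava's Sets.difference(set1, set2):
    // the elements that are in set1 and not in set2.
    // Returns a sorted copy and leaves both inputs unmodified.
    static Set<String> difference(Set<String> set1, Set<String> set2) {
        Set<String> result = new TreeSet<>(set1);
        result.removeAll(set2);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        Set<String> beforeInsert =
            new HashSet<>(Arrays.asList("part-0.parq", "part-1.parq"));
        Set<String> afterInsert = new HashSet<>(beforeInsert);
        afterInsert.add("part-2.parq"); // the file written by the INSERT

        // Order from the report: every "before" file still exists afterwards,
        // so nothing is left once afterInsert is subtracted.
        System.out.println(difference(beforeInsert, afterInsert)); // prints []

        // Swapped order: only the newly inserted file remains.
        System.out.println(difference(afterInsert, beforeInsert)); // prints [part-2.parq]
    }
}
```

Swapping the arguments only covers the plain INSERT case. For INSERT OVERWRITE the old files are replaced, so the set of new files would have to be determined differently; this sketch does not attempt that.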