[ https://issues.apache.org/jira/browse/IMPALA-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877493#comment-17877493 ]
ASF subversion and git services commented on IMPALA-12839:
----------------------------------------------------------

Commit 9e0649b9ceac86643d69afeb62d32e01bbc43717 in impala's branch refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9e0649b9c ]

IMPALA-12867: Filter files to OPTIMIZE based on file size

The OPTIMIZE TABLE statement is currently used to rewrite the entire
Iceberg table. With the 'FILE_SIZE_THRESHOLD_MB' option, the user can
specify a file size limit to rewrite only small files.

Syntax:
OPTIMIZE TABLE <table_name> [(FILE_SIZE_THRESHOLD_MB=<value>)];

The value of the threshold is the file size in MBs. It must be a
non-negative integer. Data files larger than the given limit will only
be rewritten if they are referenced from delete files.
If only 1 file is selected in a partition, it will not be rewritten.
If the threshold is 0, only the delete files and the referenced data
files will be rewritten.

IMPALA-12839: 'Optimizing empty table should be no-op' is also
resolved in this patch.

With the file selection option, the OPTIMIZE operation can operate
in 3 different modes:
 - REWRITE_ALL: The entire table is rewritten. Either because the
   compaction was triggered by a simple 'OPTIMIZE TABLE' command
   without a specified 'FILE_SIZE_THRESHOLD_MB' parameter, or
   because all files of the table are deletes/referenced by deletes
   or are smaller than the limit.
 - PARTIAL: If the value of 'FILE_SIZE_THRESHOLD_MB' parameter is
   specified then only the small data files without deletes are
   selected and the delete files are merged. Large data files without
   deletes are kept to avoid unnecessary resource consuming writes.
 - NOOP: When no files qualify for the selection criteria, there is
   no need to rewrite any files. This is a no-operation.
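
The selection rules and the three modes described above can be illustrated with a small Python sketch. This is not the code from the patch: `DataFile`, `Mode`, and `select_files` are hypothetical names, a threshold of `None` stands for a plain 'OPTIMIZE TABLE' without the option, "smaller than the limit" is read as a strict comparison, and the single-file-per-partition rule is assumed to apply only to small data files without deletes.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple

MB = 1024 * 1024


class Mode(Enum):
    REWRITE_ALL = "REWRITE_ALL"
    PARTIAL = "PARTIAL"
    NOOP = "NOOP"


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an Iceberg data file (not an Impala class)."""
    path: str
    partition: str
    size_bytes: int
    has_deletes: bool  # True if at least one delete file references this file


def select_files(
    files: Iterable[DataFile], threshold_mb: Optional[int]
) -> Tuple[Mode, List[DataFile]]:
    """Return the assumed mode and the data files that would be rewritten."""
    files = list(files)
    if not files:
        # Empty table: nothing to rewrite (the IMPALA-12839 case).
        return Mode.NOOP, []
    if threshold_mb is None:
        # No FILE_SIZE_THRESHOLD_MB given: rewrite the entire table.
        return Mode.REWRITE_ALL, files
    if threshold_mb < 0:
        raise ValueError("FILE_SIZE_THRESHOLD_MB must be a non-negative integer")

    limit = threshold_mb * MB
    # Data files referenced from delete files are selected regardless of size.
    with_deletes = [f for f in files if f.has_deletes]

    # Small data files without deletes, grouped by partition.
    small_by_partition = defaultdict(list)
    for f in files:
        if not f.has_deletes and f.size_bytes < limit:
            small_by_partition[f.partition].append(f)

    # Assumption: a lone small file in a partition is left alone, because
    # rewriting a single file would not reduce the file count.
    small = [
        f
        for group in small_by_partition.values()
        if len(group) > 1
        for f in group
    ]

    selected = with_deletes + small
    if not selected:
        return Mode.NOOP, []
    if len(selected) == len(files):
        # Every file qualified, so this is equivalent to a full rewrite.
        return Mode.REWRITE_ALL, selected
    return Mode.PARTIAL, selected
```

Under these assumptions, a threshold of 0 selects only the data files referenced from delete files, because no file size is strictly below zero bytes.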

Testing:
 - Parser test
 - FE unit tests
 - E2E tests

Change-Id: Icfbb589513aacdb68a86c1aec4a0d39b12091820
Reviewed-on: http://gerrit.cloudera.org:8080/21388
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Optimizing empty Iceberg table should be no-op
> ----------------------------------------------
>
>                 Key: IMPALA-12839
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12839
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Noemi Pap-Takacs
>            Assignee: Noemi Pap-Takacs
>            Priority: Major
>              Labels: impala-iceberg
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)