[ 
https://issues.apache.org/jira/browse/IMPALA-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842513#comment-17842513
 ] 

ASF subversion and git services commented on IMPALA-13046:
----------------------------------------------------------

Commit 20f908b1ab3ce068b9ce14aa8f35a897f5cd0c88 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=20f908b1a ]

IMPALA-13046: Update Iceberg mixed format deletes test

Updates iceberg-mixed-format-position-deletes.test for HIVE-28069. Newer
versions of Hive will now remove a data file if a delete would negate
all rows in the data file to reduce the number of small files produced.
The test now ensures that every data file it expects to produce still has
a row after the delete (or circumvents the merge logic by using different
formats).

Change-Id: I87c23cc541983223c6b766372f4e582c33ae6836
Reviewed-on: http://gerrit.cloudera.org:8080/21373
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Rework Iceberg mixed format delete test for Hive optimization
> -------------------------------------------------------------
>
>                 Key: IMPALA-13046
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13046
>             Project: IMPALA
>          Issue Type: Task
>            Reporter: Michael Smith
>            Assignee: Michael Smith
>            Priority: Major
>             Fix For: Impala 4.x
>
>
> A Hive improvement (HIVE-27731) to Iceberg support changed Hive's behavior 
> around handling deletes. It used to always add a delete file, but now if a 
> delete would negate all the contents of a data file it instead removes the 
> data file. This breaks iceberg-mixed-format-position-deletes.test:
> {code}
> query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
>     vector, unique_database)
> common/impala_test_suite.py:820: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:627: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:520: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:290: in verify_query_result_is_subset
>     unicode(expected_row), unicode(actual_results))
> E   AssertionError: Could not find expected row 
> row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
>  in actual rows:
> E   
> 'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/00000-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-00001.orc','306B','','NONE'
> {code}
> The test fails because the only resulting data file is a .orc file containing the only valid row.

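The mismatch reported in the quoted failure can be reproduced in isolation. The sketch below is not Impala's result verifier: it compares only the file-path column, copies the expected pattern and the actual row from the traceback above, and assumes the pattern must match the whole path. The helper name has_match is hypothetical.

```python
import re

# Expected path pattern and actual path, copied from the failure output above.
# Assumption: only the first column (the file path) is compared here.
expected_path = (
    r"hdfs://localhost:20500/test-warehouse/"
    r"test_read_mixed_format_position_deletes_6fb8ae98.db/"
    r"ice_mixed_formats/data/.*-data-.*.parquet"
)
actual_paths = [
    "hdfs://localhost:20500/test-warehouse/"
    "test_read_mixed_format_position_deletes_6fb8ae98.db/"
    "ice_mixed_formats/data/00000-0-data-jenkins_20240427231939_"
    "2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-00001.orc",
]


def has_match(pattern, rows):
    """Return True if any row fully matches the pattern (hypothetical helper)."""
    return any(re.fullmatch(pattern, row) is not None for row in rows)


# The test expects a Parquet data file, but only an ORC file remains.
parquet_found = has_match(expected_path, actual_paths)
print(parquet_found)
```

Because the only surviving data file ends in .orc, no row satisfies the .parquet pattern, which is consistent with the "Could not find expected row" assertion.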

