Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/22177 )

Change subject: IMPALA-13594: Read Puffin stats also from older snapshots
......................................................................

IMPALA-13594: Read Puffin stats also from older snapshots

Before this change, Puffin stats were only read from the current
snapshot. Now we also consider older snapshots, and for each column we
choose the most recent available stats. As a result, the stats for
different columns may come from different snapshots.
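
The per-column selection described above can be illustrated with a
minimal sketch. It is not the code in PuffinStatsLoader.java; the
SnapshotStats and ColumnStat types and the newest_stats_per_column
function are hypothetical names introduced only for this example, and
stats are reduced to a single NDV value per column field id.

```python
from typing import Dict, Iterable, NamedTuple


class SnapshotStats(NamedTuple):
    """Hypothetical container: the stats found for one snapshot."""
    timestamp_ms: int                # snapshot timestamp
    ndv_by_field_id: Dict[int, int]  # column field id -> NDV


class ColumnStat(NamedTuple):
    """Hypothetical result type: a value plus the snapshot it came from."""
    ndv: int
    timestamp_ms: int


def newest_stats_per_column(
        snapshots: Iterable[SnapshotStats]) -> Dict[int, ColumnStat]:
    """For each column, keep the stats from the most recent snapshot
    that has stats for that column."""
    result: Dict[int, ColumnStat] = {}
    for snap in snapshots:
        for field_id, ndv in snap.ndv_by_field_id.items():
            current = result.get(field_id)
            # Replace the stored value only if this snapshot is newer.
            if current is None or snap.timestamp_ms > current.timestamp_ms:
                result[field_id] = ColumnStat(ndv, snap.timestamp_ms)
    return result


# Column 1 has stats in both snapshots, column 2 only in the older one,
# so the chosen stats come from two different snapshots.
snapshots = [
    SnapshotStats(timestamp_ms=100, ndv_by_field_id={1: 10, 2: 20}),
    SnapshotStats(timestamp_ms=200, ndv_by_field_id={1: 15}),
]
chosen = newest_stats_per_column(snapshots)
```

Here column 1 takes its value from the newer snapshot while column 2
falls back to the older one, which is the mixed-snapshot situation the
note above refers to.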

If a column has both HMS and Puffin stats, the more recent one is
used. To determine which is more recent, we compare the
'impala.lastComputeStatsTime' table property (for HMS stats) with the
snapshot timestamp (for Puffin stats).
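
A rough sketch of the HMS-versus-Puffin decision, under stated
assumptions: pick_stats_source is a hypothetical helper, both
timestamps are assumed to be already converted to the same unit, and
the tie-breaking rule (equal timestamps favour HMS) is an arbitrary
choice for the example, not something this change specifies.

```python
from typing import Optional


def pick_stats_source(hms_ts: Optional[int],
                      puffin_ts: Optional[int]) -> Optional[str]:
    """Return which stats source to use for one column.

    hms_ts:    value derived from 'impala.lastComputeStatsTime', or None
               if the column has no HMS stats.
    puffin_ts: timestamp of the snapshot holding the Puffin stats, or
               None if the column has no Puffin stats.
    Both are assumed to be in the same unit.
    """
    if hms_ts is None and puffin_ts is None:
        return None
    if hms_ts is None:
        return "puffin"
    if puffin_ts is None:
        return "hms"
    # Assumption for this sketch: on a tie, keep the HMS stats.
    return "puffin" if puffin_ts > hms_ts else "hms"
```

For example, pick_stats_source(100, 200) selects the Puffin stats
because the snapshot is newer than the last COMPUTE STATS run, and
pick_stats_source(300, 200) selects the HMS stats.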

Testing:
 - updated existing test cases and added new ones in
   test_iceberg_with_puffin.py
 - reorganised the tests in TestIcebergTableWithPuffinStats in
   test_iceberg_with_puffin.py: tests that modify table properties and
   other state that other tests rely on are now run separately to
   provide a clean environment for all tests.

Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39
---
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
M java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java
A testdata/ice_puffin/00000-c24f24ca-05a1-493f-ae7b-659daf21b5a9.metadata.json
A testdata/ice_puffin/00001-ae590078-1d64-45cb-892f-80b58829d673.metadata.json
A testdata/ice_puffin/00002-5302617e-4ca6-4e44-a513-0f2082b05700.metadata.json
A testdata/ice_puffin/00003-442f9acd-964c-43d7-92b8-e0737a39719a.metadata.json
A testdata/ice_puffin/00004-18244103-c1f4-4733-99ae-10b56c36f900.metadata.json
M testdata/ice_puffin/generated/all_files_corrupt.metadata.json
M testdata/ice_puffin/generated/all_stats.stats
M testdata/ice_puffin/generated/all_stats_in_1_file.metadata.json
M testdata/ice_puffin/generated/corrupt_file.stats
M testdata/ice_puffin/generated/corrupt_file1.stats
M testdata/ice_puffin/generated/corrupt_file2.stats
M testdata/ice_puffin/generated/current_snapshot_id.stats
M testdata/ice_puffin/generated/duplicate_stats_in_1_file.metadata.json
M testdata/ice_puffin/generated/duplicate_stats_in_1_file.stats
M testdata/ice_puffin/generated/duplicate_stats_in_2_files.metadata.json
M testdata/ice_puffin/generated/duplicate_stats_in_2_files1.stats
M testdata/ice_puffin/generated/duplicate_stats_in_2_files2.stats
M testdata/ice_puffin/generated/existing_file.stats
M testdata/ice_puffin/generated/file_contains_invalid_field_id.metadata.json
M testdata/ice_puffin/generated/file_contains_invalid_field_id.stats
M testdata/ice_puffin/generated/invalidAndCorruptSketches.metadata.json
M testdata/ice_puffin/generated/invalidAndCorruptSketches.stats
M testdata/ice_puffin/generated/metadata_ndv_ok_sketches_corrupt.stats
M testdata/ice_puffin/generated/metadata_ndv_ok_stats_file_corrupt.metadata.json
M testdata/ice_puffin/generated/missing_file.metadata.json
M testdata/ice_puffin/generated/multiple_field_ids.metadata.json
M testdata/ice_puffin/generated/multiple_field_ids.stats
M testdata/ice_puffin/generated/non_corrupt_file.stats
M testdata/ice_puffin/generated/not_all_blobs_current.metadata.json
M testdata/ice_puffin/generated/not_all_blobs_current.stats
M testdata/ice_puffin/generated/not_current_snapshot_id.stats
M testdata/ice_puffin/generated/one_file_corrupt_one_not.metadata.json
M testdata/ice_puffin/generated/one_file_current_one_not.metadata.json
A testdata/ice_puffin/generated/some_blobs_current_some_not_in_2_files.metadata.json
A testdata/ice_puffin/generated/some_blobs_current_some_not_in_2_files1.stats
A testdata/ice_puffin/generated/some_blobs_current_some_not_in_2_files2.stats
M testdata/ice_puffin/generated/stats_divided.metadata.json
M testdata/ice_puffin/generated/stats_divided1.stats
M testdata/ice_puffin/generated/stats_divided2.stats
M testdata/ice_puffin/generated/stats_for_unsupported_type.metadata.json
M testdata/ice_puffin/generated/stats_for_unsupported_type.stats
A testdata/ice_puffin/snap-2630643801692665966-1-5010cf53-bb8c-4dd2-94a6-ce516a3152d6.avro
A testdata/ice_puffin/snap-3941638984336887328-1-5896bbb0-146d-4f27-be25-c23bf13bf8ab.avro
A testdata/ice_puffin/snap-4323499932319869599-1-88451644-3db8-481b-8dfa-618535418394.avro
A testdata/ice_puffin/snap-6623980626006176926-1-5832292a-f62d-4c25-a10a-bf1a46098ead.avro
M tests/custom_cluster/test_iceberg_with_puffin.py
49 files changed, 2,491 insertions(+), 433 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/22177/4
--
To view, visit http://gerrit.cloudera.org:8080/22177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39
Gerrit-Change-Number: 22177
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
