[ 
https://issues.apache.org/jira/browse/IMPALA-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788948#comment-17788948
 ] 

ASF subversion and git services commented on IMPALA-12557:
----------------------------------------------------------

Commit a9ad48484a739cc66caf573cc30afa298cd0b767 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a9ad48484 ]

IMPALA-12557: DELETE throws DateTimeParseException when deleting from 
time-partitioned table

There's a bug in IcebergDeleteSink that prevents Impala from
successfully executing a DELETE operation on time-partitioned Iceberg
tables. During
DELETE we retrieve partition values from the virtual column
ICEBERG__PARTITION__SERIALIZED. This contains the transformed values,
e.g. in case of DAY-partitioning it contains the number of days since
the UNIX epoch.

Currently IcebergDeleteSink just uses these values as they are.
There are two problems with this. First, we want to place the delete
files under human-readable partition directories like other engines
do, and like our own INSERT statement does. I.e. we want a partition
directory /ts_day=2023-11-11/ and not /ts_day=19672/. The other problem
is that 'IcebergUtil.partitionDataFromDataFile()' also expects the
human-readable representation. This could be resolved at the CatalogD
side to just accept the integer values, but then we would still need
some logic in the IcebergDeleteSink to generate the human-readable
values for the file paths.
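
To illustrate the failure mode, here is a minimal Java sketch (not Impala
code; the class and method names are hypothetical). It assumes, based on
frame [3] of the stack trace in the issue, that the Catalog side ends up
calling LocalDate.parse() on the partition string. The raw DAY value
"19672" is not an ISO date, so parsing fails, while the same value
interpreted as days since the UNIX epoch is 2023-11-11.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical demo class, for illustration only.
final class DayPartitionParseDemo {
  // Returns true if 's' is accepted by LocalDate.parse (ISO yyyy-MM-dd).
  static boolean parsesAsIsoDate(String s) {
    try {
      LocalDate.parse(s);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  // Interprets the transformed DAY value as days since 1970-01-01.
  static String dayOffsetToIsoDate(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch).toString();
  }

  public static void main(String[] args) {
    System.out.println(parsesAsIsoDate("19672"));       // false
    System.out.println(parsesAsIsoDate("2023-11-11"));  // true
    System.out.println(dayOffsetToIsoDate(19672));      // 2023-11-11
  }
}
```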

Moreover, partition values from INSERT statements are also
received in the human-readable representation at the Catalog.

This patch fixes the error by adding functions that transform the
partition values to their human-readable representations. This is
done in the IcebergDeleteSink, so the Catalog-side logic is not
affected.

The above only affects the time-based transforms (YEAR, MONTH, DAY,
HOUR), as other partition transform values don't use different
representations.
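
A rough sketch of such conversions, written in Java for illustration
only (the actual change lives in the C++ IcebergDeleteSink and may look
quite different). The class and method names are hypothetical. The DAY
format matches the /ts_day=2023-11-11/ example above; the YEAR, MONTH
and HOUR formats (yyyy, yyyy-MM, yyyy-MM-dd-HH) are assumptions based on
the usual Iceberg partition path conventions.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class HumanReadablePartitionValues {
  // YEAR transform: years since 1970.
  static String year(int yearsSinceEpoch) {
    return String.format(Locale.ROOT, "%04d", 1970 + yearsSinceEpoch);
  }

  // MONTH transform: months since 1970-01.
  static String month(int monthsSinceEpoch) {
    YearMonth ym = YearMonth.of(1970, 1).plusMonths(monthsSinceEpoch);
    return String.format(Locale.ROOT, "%04d-%02d",
        ym.getYear(), ym.getMonthValue());
  }

  // DAY transform: days since 1970-01-01.
  static String day(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch).toString();
  }

  // HOUR transform: hours since 1970-01-01 00:00, interpreted in UTC.
  static String hour(int hoursSinceEpoch) {
    LocalDateTime t = LocalDateTime.ofEpochSecond(
        hoursSinceEpoch * 3600L, 0, ZoneOffset.UTC);
    return String.format(Locale.ROOT, "%04d-%02d-%02d-%02d",
        t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
  }

  public static void main(String[] args) {
    System.out.println(year(53));       // 2023
    System.out.println(month(646));     // 2023-11
    System.out.println(day(19672));     // 2023-11-11
    System.out.println(hour(472138));   // 2023-11-11-10
  }
}
```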

Some notes on HOUR transform and daylight saving time:
There is no 1:1 mapping between an offset and the human-readable
representation in a timezone that has daylight saving time. This is not
an issue, as Impala's TIMESTAMP type is timezone-less. This also won't
be an issue for the TIMESTAMPTZ type as timestamp values are normalized
to UTC when stored, and UTC doesn't have daylight saving time.
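
The daylight saving point can be seen with a small Java sketch (again
illustrative only, with hypothetical names; America/New_York is picked
arbitrarily as a zone with daylight saving time). Two instants one hour
apart fall into the same wall-clock hour when the clocks are set back,
so the local rendering is ambiguous, whereas the UTC rendering stays
distinct for each hour offset.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
final class HourTransformDstDemo {
  static final ZoneId NEW_YORK = ZoneId.of("America/New_York");
  // Both instants fall on the night US clocks were set back in 2023.
  static final Instant A = Instant.parse("2023-11-05T05:30:00Z");
  static final Instant B = Instant.parse("2023-11-05T06:30:00Z");

  // Value an HOUR transform would store: whole hours since the epoch.
  static long hoursSinceEpoch(Instant i) {
    return Math.floorDiv(i.getEpochSecond(), 3600L);
  }

  // Renders the hour of 'i' as yyyy-MM-dd-HH in the given zone.
  static String humanHour(Instant i, ZoneId zone) {
    LocalDateTime t = LocalDateTime.ofInstant(i, zone);
    return String.format(Locale.ROOT, "%04d-%02d-%02d-%02d",
        t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
  }

  public static void main(String[] args) {
    // Different offsets, same local hour in a DST zone:
    System.out.println(humanHour(A, NEW_YORK));        // 2023-11-05-01
    System.out.println(humanHour(B, NEW_YORK));        // 2023-11-05-01
    // In UTC every offset has its own representation:
    System.out.println(humanHour(A, ZoneOffset.UTC));  // 2023-11-05-05
    System.out.println(humanHour(B, ZoneOffset.UTC));  // 2023-11-05-06
  }
}
```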

Testing:
 * C++ backend tests
 * E2E tests for all time-based transforms, and also partition evolution
 * Also added an extra test for TRUNCATE on numeric values, which was
   previously untested

Change-Id: I1cfeaed6409289663eb0f65b1ee2ecebd93e6118
Reviewed-on: http://gerrit.cloudera.org:8080/20711
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> DELETE throws DateTimeParseException when deleting from DAY-partitioned 
> Iceberg table
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12557
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12557
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> DELETE throws DateTimeParseException when deleting from DAY-partitioned 
> Iceberg table.
> Stack trace:
> {noformat}
>   [1] java.time.format.DateTimeFormatter.parseResolved0 (DateTimeFormatter.java:1,949)
>   [2] java.time.format.DateTimeFormatter.parse (DateTimeFormatter.java:1,851)
>   [3] java.time.LocalDate.parse (LocalDate.java:400)
>   [4] org.apache.iceberg.expressions.Literals$StringLiteral.to (Literals.java:495)
>   [5] org.apache.iceberg.types.Conversions.fromPartitionString (Conversions.java:70)
>   [6] org.apache.impala.util.IcebergUtil.getPartitionValue (IcebergUtil.java:748)
>   [7] org.apache.impala.util.IcebergUtil.partitionDataFromDataFile (IcebergUtil.java:726)
>   [8] org.apache.impala.service.IcebergCatalogOpExecutor.deleteRows (IcebergCatalogOpExecutor.java:364)
>   [9] org.apache.impala.service.IcebergCatalogOpExecutor.execute (IcebergCatalogOpExecutor.java:337)
>   [10] org.apache.impala.service.CatalogOpExecutor.updateCatalogImpl (CatalogOpExecutor.java:7,042)
>   [11] org.apache.impala.service.CatalogOpExecutor.updateCatalog (CatalogOpExecutor.java:6,786)
>   [12] org.apache.impala.service.JniCatalog.lambda$updateCatalog$15 (JniCatalog.java:487)
> ...{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
