agnes-xinyi-lu opened a new issue, #6911:
URL: https://github.com/apache/iceberg/issues/6911
### Apache Iceberg version
None
### Query engine
None
### Please describe the bug 🐞
We started getting this exception in some of our unit tests after upgrading to
1.1.0. The test uses a string field in the partition spec and supplies string
partition values that look as if they were converted from a datetime, e.g.
"2020-20-20". Writes succeed, but reads throw exceptions like this:
java.time.format.DateTimeParseException: Text '2021-20-20' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 20
    at java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1855)
    at java.time.LocalDate.parse(LocalDate.java:400)
    at org.apache.iceberg.expressions.Literals$StringLiteral.to(Literals.java:495)
    at org.apache.iceberg.expressions.ExpressionUtil.sanitizeString(ExpressionUtil.java:380)
    at org.apache.iceberg.expressions.ExpressionUtil.sanitize(ExpressionUtil.java:320)
    at org.apache.iceberg.expressions.ExpressionUtil.access$300(ExpressionUtil.java:38)
    at org.apache.iceberg.expressions.ExpressionUtil$StringSanitizer.predicate(ExpressionUtil.java:269)
    at org.apache.iceberg.expressions.ExpressionUtil$StringSanitizer.predicate(ExpressionUtil.java:197)
    at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:347)
    at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:366)
    at org.apache.iceberg.expressions.ExpressionUtil.toSanitizedString(ExpressionUtil.java:82)
    at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:142)
    at org.apache.iceberg.DataTableScan.planFiles(DataTableScan.java:27)
The cause is in
[planFiles](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotScan.java#:~:text=ExpressionUtil.toSanitizedString(filter()))),
where the info-level log statement uses ExpressionUtil.toSanitizedString to log
the filter. But when a user defines the field as a string type, Iceberg
shouldn't assume the value follows any particular pattern, right? Even if the
string has an invalid day, month, or year, the scan should still work. It also
doesn't feel right for a log statement to throw.
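
The underlying failure can be reproduced with the JDK alone. This is a minimal sketch that assumes only what the stack trace shows, namely that the string value reaches `LocalDate.parse`. The class and method names (`DateShapedStringRepro`, `parsesAsDate`) are made up for illustration and are not Iceberg code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical helper class, not part of Iceberg.
final class DateShapedStringRepro {
  // Returns true when the text parses as an ISO-8601 calendar date.
  static boolean parsesAsDate(String text) {
    try {
      LocalDate.parse(text);
      return true;
    } catch (DateTimeParseException e) {
      // Thrown for date-shaped strings with out-of-range fields, e.g. month 20.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(parsesAsDate("2021-12-20")); // prints "true"
    System.out.println(parsesAsDate("2021-20-20")); // prints "false"
  }
}
```

A string such as "2021-20-20" is a perfectly valid value for a string column, so any code path that parses it as a date needs to handle the parse failure.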
On the latest master, both SnapshotScan and BaseAllMetadataTableScan log the
filter this way. We could probably change them to use ExpressionParser.toJson
instead.
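
Another option, separate from the ExpressionParser idea, would be to keep sanitizing but make it non-throwing. The sketch below is not Iceberg's implementation: the class `SafeSanitizer`, the date-shape regex, and the placeholder formats are all assumptions chosen for illustration. It only shows the general pattern of falling back to an opaque placeholder when a date-shaped string fails to parse.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.regex.Pattern;

// Hypothetical sanitizer, not Iceberg code.
final class SafeSanitizer {
  // Assumed shape check: four digits, two digits, two digits.
  private static final Pattern DATE_SHAPE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

  // Replaces a string literal with a placeholder that is safe to log.
  // Never throws for a non-null value, whatever the string contains.
  static String sanitize(String value, LocalDate today) {
    if (DATE_SHAPE.matcher(value).matches()) {
      try {
        long days = Math.abs(ChronoUnit.DAYS.between(LocalDate.parse(value), today));
        return "(date-" + days + "-days-from-today)";
      } catch (DateTimeParseException e) {
        // Date-shaped but not a real date: fall through to the generic placeholder.
      }
    }
    return "(hash-" + Integer.toHexString(value.hashCode()) + ")";
  }

  public static void main(String[] args) {
    LocalDate today = LocalDate.of(2021, 12, 25);
    System.out.println(sanitize("2021-12-20", today)); // prints "(date-5-days-from-today)"
    System.out.println(sanitize("2021-20-20", today)); // a "(hash-...)" placeholder
  }
}
```

With this shape, a bad partition value can change what the log line looks like, but it cannot make scan planning fail.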
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]