ndrluis commented on code in PR #3405:
URL: https://github.com/apache/iceberg-python/pull/3405#discussion_r3294817872
##########
pyiceberg/expressions/literals.py:
##########
@@ -68,6 +68,10 @@
UUID_BYTES_LENGTH = 16
+def _truncate_numeric_string_to_int(value: str) -> int:
+    return int(Decimal(value))
Review Comment:
`int(Decimal(value))` fixes the precision issue, but it also means we
materialize the full integer before doing the bounds check. For example,
`literal("1e1000000").to(IntegerType())` eventually returns `IntAboveMax`, but
only after constructing a very large Python `int` first. In my local check this
took ~17s.
Can we compare the parsed `Decimal` against `IntegerType.max/min` or
`LongType.max/min` before converting to `int`?
```python
number = Decimal(self.value)
if number > IntegerType.max:
    return IntAboveMax()
elif number < IntegerType.min:
    return IntBelowMin()
return LongLiteral(int(number))
```
That should preserve the precision fix while avoiding excessive CPU/memory
use for obviously out-of-range values.
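
To make the idea easy to try outside the codebase, here is a minimal standalone sketch of the same ordering (compare first, convert second). It does not import pyiceberg: `classify_numeric_string`, `INT_MIN`, and `INT_MAX` are hypothetical names used only for illustration, the string results `"above"`/`"below"` stand in for `IntAboveMax()`/`IntBelowMin()`, and the bounds assume a 32-bit signed integer.

```python
from decimal import Decimal, InvalidOperation

# Assumed 32-bit signed bounds, standing in for IntegerType.min / IntegerType.max.
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1


def classify_numeric_string(value: str, lower: int = INT_MIN, upper: int = INT_MAX):
    """Hypothetical helper: return "above", "below", or the truncated int."""
    try:
        number = Decimal(value)  # exact parse, no float rounding
    except InvalidOperation as e:
        raise ValueError(f"Could not parse {value!r} as a number") from e

    # Ordering comparisons on NaN raise, so reject it explicitly first.
    if number.is_nan():
        raise ValueError(f"Could not convert {value!r} into an integer")

    # Compare while still a Decimal, so a huge exponent is never expanded.
    if number > upper:
        return "above"
    if number < lower:
        return "below"

    # Only in-range values reach int(); this truncates toward zero.
    return int(number)


if __name__ == "__main__":
    print(classify_numeric_string("1e1000000"))  # out of range, no big int built
    print(classify_numeric_string("-3.99"))      # truncates toward zero
```

One behavior to be aware of with this ordering: a value such as `"2147483647.5"` compares above the bound and is reported as out of range, even though truncating it first would bring it back in range. Whether that edge case matters here is a separate decision.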