[GitHub] [iceberg] Fokko commented on a diff in pull request #8067: Python: Add more tests for the Avro writer

via GitHub Thu, 03 Aug 2023 04:30:49 -0700


Fokko commented on code in PR #8067:
URL: https://github.com/apache/iceberg/pull/8067#discussion_r1283066035



##########
python/pyiceberg/avro/reader.py:
##########
@@ -229,11 +260,19 @@ def skip(self, decoder: BinaryDecoder) -> None:
 
 @dataclass(frozen=True)
 class DecimalReader(Reader):
+    """Reads a value as a decimal.
+
+    Decimal bytes are decoded as signed short, int or long depending on the
+    size of bytes.
+    """
+
     precision: int = dataclassfield()
     scale: int = dataclassfield()
 
     def read(self, decoder: BinaryDecoder) -> Decimal:
-        return decoder.read_decimal_from_bytes(self.precision, self.scale)
+        data = decoder.read(decoder.read_int())
+        unscaled_datum = int.from_bytes(data, byteorder="big", signed=True)
+        return unscaled_to_decimal(unscaled_datum, self.scale)

Review Comment:
   > Why move this here? The encoder still has decimal logic. I don't mind 
either way whether decimal is handled by the encoder/decoder or in the 
reader/writer, but it seems like we should be consistent.
   
   I've moved it to the `decimal.py`, and created a function 
`bytes_to_decimal`, since we already have `decimal_to_bytes`.
   
   > Also, this reader assumes that the decimal is stored as variable-length 
binary and not fixed (because it is reading the length). The spec requires that 
decimals are written in Avro as fixed, length minBytesRequired(P): 
https://iceberg.apache.org/spec/#avro
   
   😱  Ah, this was always wrong, thanks for catching this. Let me add a test 
there



##########
python/pyiceberg/avro/reader.py:
##########
@@ -229,11 +260,19 @@ def skip(self, decoder: BinaryDecoder) -> None:
 
 @dataclass(frozen=True)
 class DecimalReader(Reader):
+    """Reads a value as a decimal.
+
+    Decimal bytes are decoded as signed short, int or long depending on the
+    size of bytes.
+    """
+
     precision: int = dataclassfield()
     scale: int = dataclassfield()
 
     def read(self, decoder: BinaryDecoder) -> Decimal:
-        return decoder.read_decimal_from_bytes(self.precision, self.scale)
+        data = decoder.read(decoder.read_int())
+        unscaled_datum = int.from_bytes(data, byteorder="big", signed=True)
+        return unscaled_to_decimal(unscaled_datum, self.scale)

Review Comment:
   > Why move this here? The encoder still has decimal logic. I don't mind 
either way whether decimal is handled by the encoder/decoder or in the 
reader/writer, but it seems like we should be consistent.
   
   I've moved it to the `decimal.py`, and created a function 
`bytes_to_decimal`, since we already have `decimal_to_bytes`.
   
   > Also, this reader assumes that the decimal is stored as variable-length 
binary and not fixed (because it is reading the length). The spec requires that 
decimals are written in Avro as fixed, length minBytesRequired(P): 
https://iceberg.apache.org/spec/#avro
   
   😱  Ah, this was always wrong, thanks for catching this. Let me add a test 
there



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] Fokko commented on a diff in pull request #8067: Python: Add more tests for the Avro writer

Reply via email to