Fokko commented on issue #1751:
URL:
https://github.com/apache/iceberg-python/issues/1751#issuecomment-2697492762
@andormarkus Sure thing, does the following help?
```
from io import BytesIO

from pyiceberg.avro.decoder_fast import CythonBinaryDecoder
from pyiceberg.avro.encoder import BinaryEncoder
from pyiceberg.avro.resolver import construct_writer, resolve_reader
from pyiceberg.manifest import (
    DATA_FILE_TYPE,
    DEFAULT_READ_VERSION,
    DataFile,
    DataFileContent,
    FileFormat,
)
from pyiceberg.typedef import Record


def test_serialize():
    data_file = DataFile(
        content=DataFileContent.DATA,
        file_path="s3://some-path/some-file.parquet",
        file_format=FileFormat.PARQUET,
        partition=Record(),
        record_count=131327,
        file_size_in_bytes=220669226,
        column_sizes={1: 220661854},
        value_counts={1: 131327},
        null_value_counts={1: 0},
        nan_value_counts={},
        lower_bounds={1: b"aaaaaaaaaaaaaaaa"},
        upper_bounds={1: b"zzzzzzzzzzzzzzzz"},
        key_metadata=b"\xde\xad\xbe\xef",
        split_offsets=[4, 133697593],
        equality_ids=[],
        sort_order_id=4,
    )

    # Encode
    output = BytesIO()
    encoder = BinaryEncoder(output)
    schema = DATA_FILE_TYPE[DEFAULT_READ_VERSION]
    construct_writer(file_schema=schema).write(encoder, data_file)

    # Decode
    decoder = CythonBinaryDecoder(output.getvalue())
    result = resolve_reader(
        schema,
        schema,
        read_types={-1: DataFile},
    ).read(decoder)

    assert result.file_path == "s3://some-path/some-file.parquet"
```
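
In case it helps to see what the encode/decode round trip does at the byte level, here is a minimal stdlib-only sketch of how Avro's binary format stores a `long` (zigzag plus variable-length bytes) and a `string` (a `long` byte length followed by UTF-8 data). The function names `write_long`, `read_long`, `write_string` and `read_string` are made up for illustration; this is not pyiceberg's implementation, just the same wire format in a few lines.

```
from io import BytesIO


def write_long(out, n):
    # Zigzag maps signed 64-bit values to non-negative ones: 0, -1, 1, -2 -> 0, 1, 2, 3
    z = (n << 1) ^ (n >> 63)
    # Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow"
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    shift = 0
    result = 0
    while True:
        b = inp.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    # Undo the zigzag mapping
    return (result >> 1) ^ -(result & 1)


def write_string(out, s):
    data = s.encode("utf-8")
    write_long(out, len(data))
    out.write(data)


def read_string(inp):
    length = read_long(inp)
    return inp.read(length).decode("utf-8")


# Round trip two of the values used in the test above
buf = BytesIO()
write_string(buf, "s3://some-path/some-file.parquet")
write_long(buf, 131327)

buf.seek(0)
assert read_string(buf) == "s3://some-path/some-file.parquet"
assert read_long(buf) == 131327
```

The real writer and reader walk the whole `DataFile` schema field by field, but every integer and string field ends up going through primitives of this shape.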
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]