gitzwz commented on issue #1479:
URL: https://github.com/apache/iceberg-python/issues/1479#issuecomment-2567363410
The most time-consuming part is this:
```python
for manifest_entry in chain(
    *executor.map(
        lambda args: _open_manifest(*args),
        [
            (
                self.io,
                manifest,
                partition_evaluators[manifest.partition_spec_id],
                metrics_evaluator,
            )
            for manifest in manifests
            if self._check_sequence_number(min_sequence_number, manifest)
        ],
    )
):
    ...
```
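To make the control flow of that loop concrete, here is a minimal, self-contained sketch of the same fan-out-then-flatten pattern. `Manifest`, `open_manifest`, `check_sequence_number`, and `plan_entries` are hypothetical stand-ins, not the pyiceberg implementation; in particular the sequence-number filter is deliberately simplified. The point it illustrates is that `executor.map` submits one task per surviving manifest up front, and the `*` unpacking waits for every task before `chain` yields the entries in manifest order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import chain
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Manifest:
    # Hypothetical stand-in for a manifest file.
    path: str
    partition_spec_id: int
    sequence_number: int
    entries: Tuple[str, ...]


def open_manifest(manifest: Manifest, evaluator: Callable[[str], bool]) -> List[str]:
    # Hypothetical stand-in for _open_manifest: read all entries of one
    # manifest and keep those accepted by its partition evaluator.
    return [e for e in manifest.entries if evaluator(e)]


def check_sequence_number(min_sequence_number: int, manifest: Manifest) -> bool:
    # Hypothetical, simplified filter: skip manifests older than the minimum.
    return manifest.sequence_number >= min_sequence_number


def plan_entries(
    manifests: List[Manifest],
    evaluators: Dict[int, Callable[[str], bool]],
    min_sequence_number: int,
    max_workers: int,
) -> List[str]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map submits one task per surviving manifest immediately;
        # the * unpacking then blocks until every task has returned, and
        # chain concatenates the per-manifest lists in submission order.
        return list(
            chain(
                *executor.map(
                    lambda args: open_manifest(*args),
                    [
                        (m, evaluators[m.partition_spec_id])
                        for m in manifests
                        if check_sequence_number(min_sequence_number, m)
                    ],
                )
            )
        )


manifests = [
    Manifest("m0.avro", 0, 5, ("x1", "y1", "x2")),
    Manifest("m1.avro", 0, 1, ("x3",)),
    Manifest("m2.avro", 1, 7, ("x4", "y2")),
]
# One evaluator per partition spec id, looked up per manifest as in the loop above.
evaluators = {0: lambda e: e.startswith("x"), 1: lambda e: e.startswith("y")}
print(plan_entries(manifests, evaluators, min_sequence_number=5, max_workers=4))
```

Here `m1.avro` is skipped by the sequence-number filter, and the entries of the other two manifests come back in manifest order regardless of which thread finished first.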
For instance, consider a scenario with 6 manifest files, each containing
7,000 entries. With **max-workers=32**, the code spawns 6 threads (one per
manifest) that run concurrently, and all of them finish after approximately
30 seconds. In contrast, with **max-workers=1**, the manifest files are
processed one after another, yet the whole scan still finishes in roughly
30 seconds. In other words, the extra workers give no speedup.
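One way to reproduce this kind of comparison outside pyiceberg is a small timing harness that runs the same `chain(*executor.map(...))` pattern with `max_workers=1` and `max_workers=32`. The sketch below mirrors the 6 manifests × 7,000 entries from the example, but `parse_manifest` is a hypothetical pure-Python, CPU-bound stand-in for `_open_manifest`, not the real decoding code. On a standard (GIL-enabled) CPython build only one thread executes Python bytecode at a time, so for work of this kind extra threads typically do not reduce wall-clock time; whether `_open_manifest` is actually CPU-bound in this way is an assumption, not something the measurements above establish.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import List, Tuple


def parse_manifest(n_entries: int) -> List[int]:
    # Hypothetical CPU-bound stand-in for decoding and evaluating manifest
    # entries in pure Python; the arithmetic only exists to burn CPU.
    out = []
    for i in range(n_entries):
        acc = 0
        for j in range(50):
            acc += (i * j) % 7
        out.append(acc)
    return out


def timed_scan(n_manifests: int, n_entries: int, max_workers: int) -> Tuple[List[int], float]:
    # Run the fan-out/flatten pattern once and return the entries plus wall time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        entries = list(chain(*executor.map(parse_manifest, [n_entries] * n_manifests)))
    return entries, time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 32):
        entries, secs = timed_scan(6, 7_000, workers)
        print(f"max_workers={workers}: {len(entries)} entries in {secs:.2f}s")
```

If the two timings come out close, the per-manifest work is likely serialized by the GIL rather than waiting on I/O, which would point toward process-based parallelism or moving the hot loop out of pure Python as directions worth investigating.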