flyrain commented on issue #2365:
URL: https://github.com/apache/polaris/issues/2365#issuecomment-3721345170
I am not sure the issue as currently described is actually valid.
The base64-encoded manifest objects [1] being discussed are not the manifest
files themselves. They are objects representing manifest files, which can be
reconstructed from the manifest entries stored in the ManifestList files. As a
result, the in-memory footprint should be roughly equivalent to the size of a
single manifest list file per snapshot, plus some additional base64 encoding
overhead. That overhead does not seem significant enough on its own to explain
large heap pressure.
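
To put a rough number on the base64 overhead, here is a minimal Python sketch.
It covers only the encoding step: standard padded base64 emits 4 output bytes
for every 3 input bytes, so about 33% growth. The 8 MiB payload size is a
hypothetical figure chosen for illustration, not a measurement from Polaris,
and the sketch says nothing about how the JVM stores the resulting strings.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Size in bytes of standard padded base64 output for raw_bytes of input."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


def overhead_ratio(raw_bytes: int) -> float:
    """Encoded size divided by raw size (approaches 4/3 for large inputs)."""
    return base64_encoded_size(raw_bytes) / raw_bytes


if __name__ == "__main__":
    # Hypothetical payload size, for illustration only.
    raw = 8 * 1024 * 1024
    encoded = base64_encoded_size(raw)
    # Cross-check the formula against the real encoder.
    assert encoded == len(base64.b64encode(bytes(raw)))
    print(f"raw={raw} encoded={encoded} ratio={overhead_ratio(raw):.3f}")
```

Under these assumptions the encoded form is about one third larger than the raw
bytes, which is a constant factor rather than an amplification that grows with
table size.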
This pattern is also handled in practice today. For example, several Spark
procedures and actions, as well as Spark planning, process these manifest
representations on a single node, typically the driver, without materializing
full manifest files in memory. One concrete example is here:
https://github.com/apache/iceberg/blob/bf1074ff373c929614a3f92dd4e46780028ac1ca/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java#L290
Given this, I am not convinced that embedding these manifest representations
is inherently problematic from a memory perspective. If there are concrete
scenarios where this still leads to excessive memory usage, it would be helpful
to clarify where the amplification happens beyond what is already expected from
manifest list processing.
Happy to be corrected if I am missing something, but wanted to share this
observation before we anchor further design decisions on this assumption.
[1] https://github.com/apache/polaris/blob/c9efc6c1af202686945efe2e19125e8f116a0206/runtime/service/src/main/java/org/apache/polaris/service/task/TableCleanupTaskHandler.java#L194