herbherbherb commented on PR #15769: URL: https://github.com/apache/iceberg/pull/15769#issuecomment-4305009028
@pvary Thank you for the feedback [here](https://github.com/apache/iceberg/pull/15316#issuecomment-4069625607). I completely understand your concern about exposing the internal components of `IcebergSink` and forcing the community to maintain those contracts long-term.

Based on your suggestion, I took a step back and revised the architecture to avoid exposing those internals. Instead of changing access modifiers, I am proposing that we add a few independent hooks:

1. `OutputFileFactoryProvider`: custom `OutputFileFactory` creation (https://github.com/apache/iceberg/issues/15763 / https://github.com/apache/iceberg/pull/15764)
2. `PostCommitHook`: a callback invoked after a successful Iceberg commit (https://github.com/apache/iceberg/issues/15768 / https://github.com/apache/iceberg/pull/15769)
3. `writeObserver`: observes each record written and produces per-checkpoint metadata that flows through the sink pipeline (https://github.com/apache/iceberg/issues/15783 / https://github.com/apache/iceberg/pull/15784)

Does this plugin-based approach alleviate your concerns about locking in the internal architecture? If this looks like a better direction, I would really appreciate it if you could review these individual PRs when you have a chance.

Also tagging @bryanck
