Claudenw commented on PR #13357: URL: https://github.com/apache/iceberg/pull/13357#issuecomment-3028505797
@kumarpritam863 Interestingly, the code you reference is the fix in the JDBC Catalog; it contains the fix for issue https://github.com/apache/iceberg/issues/13343. While we can fix it there, and have, the underlying problem is that there is no single-threaded Catalog.init before the Catalog is first used.

The argument that this change saves only a single initialization holds only for a lightly loaded system with few tasks. In systems with multiple tasks and reasonable load we have consistently seen the issue arise. When we encountered it, before the fix you reference, the initialization would fail and the connector would fail to start. With the patch in place we expect the initialization to complete, but to take significantly longer and to require multiple network requests to resolve the issue. Most alerting systems will detect if a connector fails to start, which would be the result of a badly configured catalog after this change.

I agree this is only evident in JdbcCatalog, but it is an exemplar of the issue, and one that does not have an easy fix. I have not looked at other Catalogs to see whether they can benefit from an early initialization, but I can envision several architectures where it would be advantageous.

This is not a generic problem of the Kafka Connector architecture, and while it appears tightly coupled to the JdbcCatalog, the problem arises because Iceberg requires a Catalog but places no constraints on its initialization and does not recognize that some initialization is best performed in a single thread. Otherwise, the Catalog interface would have a global initialization method and a local initialization method, in which case I would expect to see the global initialization called during `container.start()` and the local initialization called during `task.start()`.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
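The global/local split described in the comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `TwoPhaseCatalog`, `initializeGlobal`, `initializeLocal`, and `CountingCatalog` are hypothetical names that do not exist in Iceberg or Kafka Connect, and the thread pool merely stands in for tasks starting concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoPhaseInitSketch {

  /** Hypothetical interface; Iceberg's Catalog does not define these methods. */
  interface TwoPhaseCatalog {
    void initializeGlobal(); // intended to run once, in a single thread
    void initializeLocal();  // intended to run once per task, possibly concurrently
  }

  /** Stand-in catalog that only counts how often each phase is called. */
  static class CountingCatalog implements TwoPhaseCatalog {
    final AtomicInteger globalCalls = new AtomicInteger();
    final AtomicInteger localCalls = new AtomicInteger();

    @Override
    public void initializeGlobal() {
      // A real catalog might create its backing tables or schema here.
      globalCalls.incrementAndGet();
    }

    @Override
    public void initializeLocal() {
      if (globalCalls.get() == 0) {
        throw new IllegalStateException("global initialization has not run");
      }
      // A real catalog might open per-task connections here.
      localCalls.incrementAndGet();
    }
  }

  /** Runs the global phase once, then the local phase once per simulated task. */
  public static int[] run(int taskCount) throws Exception {
    CountingCatalog catalog = new CountingCatalog();

    // Corresponds to the single-threaded start phase in the comment.
    catalog.initializeGlobal();

    // Corresponds to each task's start; tasks may run in parallel.
    ExecutorService pool = Executors.newFixedThreadPool(taskCount);
    try {
      Runnable startTask = catalog::initializeLocal;
      List<Future<?>> pending = new ArrayList<>();
      for (int i = 0; i < taskCount; i++) {
        pending.add(pool.submit(startTask));
      }
      for (Future<?> f : pending) {
        f.get(); // rethrows any failure from a task's local initialization
      }
    } finally {
      pool.shutdown();
    }
    return new int[] {catalog.globalCalls.get(), catalog.localCalls.get()};
  }

  public static void main(String[] args) throws Exception {
    int[] counts = run(4);
    System.out.println("global=" + counts[0] + " local=" + counts[1]);
  }
}
```

With this shape, a misconfigured catalog would fail once in the single-threaded phase, before any task starts, rather than in every task at the same time.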
