Claudenw commented on PR #13357:
URL: https://github.com/apache/iceberg/pull/13357#issuecomment-3028505797

   @kumarpritam863 Interestingly, the code you reference is itself the fix in 
the JDBC Catalog for issue 
https://github.com/apache/iceberg/issues/13343.
   
   While we can fix it there, and have, the underlying problem is that there is 
no single-threaded Catalog.init before the Catalog is first used.
   
   The argument that this change saves a single initialization holds only for 
lightly loaded systems with few tasks.  In systems with multiple tasks and 
reasonable load we have consistently seen the issue arise.  When we encountered 
the issue, prior to the fix you reference, the initialization would fail and 
the connector would fail to start.  With the patch in place we expect that the 
initialization will complete, but that it will take significantly longer and 
require multiple network requests to resolve the issue.
   
   Most alerting systems will detect a connector that fails to start, which is 
what a badly configured catalog would cause after this change.
   
   I agree this is only evident in JdbcCatalog, but then it is an exemplar of 
the issue, and an exemplar that does not have an easy fix.  I have not looked 
at other Catalogs to see if they can benefit from an early initialization, but 
I can envision several architectures where it would be advantageous.
   
   This is not a generic problem of the Kafka Connector architecture, and while 
it appears tightly coupled to the JdbcCatalog, the problem arises because 
Iceberg requires a Catalog but places no constraints on its initialization and 
does not recognize that some initialization is best performed in a single 
thread.  If it did, the Catalog interface would have a global initialization 
method and a local initialization method, and I would expect to see the global 
initialization called during `container.start()` and the local initialization 
called during `task.start()`.
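
   To make the global/local split concrete, here is a minimal sketch of what 
such a two-phase contract might look like.  Everything in it is hypothetical: 
`TwoPhaseCatalog`, `initializeGlobal`, `initializeLocal` and the toy 
`CountingCatalog` are illustrative names and are not part of Iceberg's Catalog 
API or of the Kafka Connect API.  The plain threads only stand in for tasks 
that start concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface: NOT part of Iceberg's Catalog API.
interface TwoPhaseCatalog {
  // One-time setup that should run in a single thread before any task starts
  // (for example, creating shared backing tables).
  void initializeGlobal(Map<String, String> props);

  // Per-task setup that is safe to run concurrently in every task.
  void initializeLocal(Map<String, String> props);
}

// Toy implementation that only counts how often each phase runs.
class CountingCatalog implements TwoPhaseCatalog {
  private final AtomicInteger globalInits;
  private final AtomicInteger localInits;

  CountingCatalog(AtomicInteger globalInits, AtomicInteger localInits) {
    this.globalInits = globalInits;
    this.localInits = localInits;
  }

  @Override
  public void initializeGlobal(Map<String, String> props) {
    globalInits.incrementAndGet();
  }

  @Override
  public void initializeLocal(Map<String, String> props) {
    localInits.incrementAndGet();
  }
}

class TwoPhaseInitSketch {
  // Simulates one connector start followed by taskCount concurrent task starts.
  // Returns {number of global inits, number of local inits}.
  static int[] run(int taskCount) {
    AtomicInteger globalInits = new AtomicInteger();
    AtomicInteger localInits = new AtomicInteger();
    Map<String, String> props = Map.of("catalog-name", "demo");

    // Connector-level start: a single thread performs the global phase once.
    new CountingCatalog(globalInits, localInits).initializeGlobal(props);

    // Task-level start: each task builds its own catalog and runs only the
    // local phase, so no task races another on the shared setup.
    List<Thread> tasks = new ArrayList<>();
    for (int i = 0; i < taskCount; i++) {
      Thread task = new Thread(
          () -> new CountingCatalog(globalInits, localInits).initializeLocal(props));
      tasks.add(task);
      task.start();
    }
    for (Thread task : tasks) {
      try {
        task.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for tasks", e);
      }
    }
    return new int[] {globalInits.get(), localInits.get()};
  }

  public static void main(String[] args) {
    int[] counts = run(4);
    System.out.println("global=" + counts[0] + " local=" + counts[1]);
    // prints "global=1 local=4"
  }
}
```

   The point of the split is that the expensive or conflict-prone work runs 
exactly once regardless of the number of tasks, while each task still gets its 
own cheap local setup.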
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

