ThorneANN opened a new pull request, #4403:
URL: https://github.com/apache/flink-cdc/pull/4403

     CustomPostgresSchema#readTableSchema invokes jdbcConnection.readSchema with
     the full captured-table filter, so a single call already loads metadata for
     every captured table. However, the cache-population loop iterates only the
     requested subset and discards the rest. As a result, snapshot startup
     performs one full pg_catalog scan per split, scaling as O(N²) with the
     number of captured tables. This causes severe latency on multi-tenant
     Postgres deployments that capture hundreds of tables across schemas.
     
     This change caches every table discovered by readSchema into
     schemasByTableId, while the returned tableChanges still contains only the
     originally requested subset. Subsequent splits are served entirely from
     the cache.
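
The caching behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this pull request: the class SchemaCacheSketch, the String table ids and schemas, and the readSchemaForAllCaptured helper are hypothetical stand-ins for CustomPostgresSchema, TableId, and jdbcConnection.readSchema. Only the names schemasByTableId, readTableSchema, and tableChanges come from the description.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: table ids and schemas are plain Strings here.
class SchemaCacheSketch {
    private final List<String> capturedTables;
    private final Map<String, String> schemasByTableId = new HashMap<>();
    private int catalogScans = 0;

    SchemaCacheSketch(List<String> capturedTables) {
        this.capturedTables = capturedTables;
    }

    // Stand-in for a readSchema call with the full captured-table filter:
    // one call returns metadata for every captured table.
    private Map<String, String> readSchemaForAllCaptured() {
        catalogScans++;
        Map<String, String> all = new LinkedHashMap<>();
        for (String table : capturedTables) {
            all.put(table, "schema-of-" + table);
        }
        return all;
    }

    // Caches everything the scan discovered, but returns only what was requested.
    Map<String, String> readTableSchema(List<String> requested) {
        Map<String, String> discovered = readSchemaForAllCaptured();
        schemasByTableId.putAll(discovered);

        Map<String, String> tableChanges = new LinkedHashMap<>();
        for (String table : requested) {
            String schema = discovered.get(table);
            if (schema != null) {
                tableChanges.put(table, schema);
            }
        }
        return tableChanges;
    }

    // Serves from the cache when possible; scans the catalog only on a miss.
    String getTableSchema(String tableId) {
        String cached = schemasByTableId.get(tableId);
        if (cached != null) {
            return cached;
        }
        return readTableSchema(List.of(tableId)).get(tableId);
    }

    int getCatalogScans() {
        return catalogScans;
    }

    public static void main(String[] args) {
        SchemaCacheSketch cache = new SchemaCacheSketch(List.of("s1.a", "s1.b", "s2.c"));
        cache.getTableSchema("s1.a"); // first split triggers the scan
        cache.getTableSchema("s1.b"); // later splits hit the cache
        cache.getTableSchema("s2.c");
        System.out.println("catalog scans: " + cache.getCatalogScans());
    }
}
```

In this sketch the first lookup pays for the catalog scan and every later lookup for a captured table is a cache hit, which is the effect the change is aiming for.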
     
     Also fixes a related issue where getTableSchema(List<TableId>) re-fetched
     already-cached tables by passing the full tableIds list to readTableSchema
     instead of the unmatched subset. 
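
The second fix can be illustrated with a similar sketch. Again, this is an assumption-laden illustration rather than the actual patch: BatchLookupSketch, the String ids, and the fetchRequests log are hypothetical, and the loader below simply fabricates a schema string per table. It shows a batch lookup that passes only the unmatched (uncached) ids to the loader instead of the full list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: table ids and schemas are plain Strings here.
class BatchLookupSketch {
    private final Map<String, String> schemasByTableId = new HashMap<>();
    // Records each list handed to the loader, so the effect is observable.
    final List<List<String>> fetchRequests = new ArrayList<>();

    // Stand-in loader: fetches and caches the schemas for the given ids.
    private Map<String, String> readTableSchema(List<String> tableIds) {
        fetchRequests.add(new ArrayList<>(tableIds));
        Map<String, String> fetched = new LinkedHashMap<>();
        for (String id : tableIds) {
            String schema = "schema-of-" + id;
            schemasByTableId.put(id, schema);
            fetched.put(id, schema);
        }
        return fetched;
    }

    // Batch lookup: cached ids are answered directly; only the unmatched
    // subset is passed to readTableSchema.
    Map<String, String> getTableSchema(List<String> tableIds) {
        Map<String, String> result = new LinkedHashMap<>();
        List<String> unmatched = new ArrayList<>();
        for (String id : tableIds) {
            String cached = schemasByTableId.get(id);
            if (cached != null) {
                result.put(id, cached);
            } else {
                unmatched.add(id);
            }
        }
        if (!unmatched.isEmpty()) {
            result.putAll(readTableSchema(unmatched));
        }
        return result;
    }

    public static void main(String[] args) {
        BatchLookupSketch lookup = new BatchLookupSketch();
        lookup.getTableSchema(List.of("a", "b"));
        lookup.getTableSchema(List.of("a", "b", "c"));
        System.out.println("loader requests: " + lookup.fetchRequests);
    }
}
```

With the full list passed through, the second call in main would ask the loader for a, b, and c again; filtering first means only c is requested.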
     

