Hi all,

Recently, the Coordinator in our company's Druid cluster has hit a performance bottleneck when pulling metadata. The root cause is the sheer volume of metadata, which makes the full-table scan of the metadata store, and the deserialization of the results, very slow. We have already reduced the total metadata size through TTL, Compaction, Rollup, etc., but the effect was not significant.

Therefore, I want to design a scheme for the Coordinator to pull metadata incrementally, i.e. each poll fetches only the metadata added or changed since the last poll, so as to reduce both the query pressure on the metadata store and the cost of deserializing metadata.

The general idea is to add a last_update column to the druid_segments table to record the update time of each row. Queries against the metadata table can then filter on last_update and avoid a full table scan. Moreover, both MySQL and PostgreSQL, as metadata storage media, can keep such a timestamp column up to date automatically, somewhat like a trigger.

So, have you encountered this problem before? If so, how did you solve it? And do you have any suggestions or comments on the incremental metadata pulling described above? Please let me know, thanks a lot.
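For concreteness, here is a minimal sketch of what I have in mind, assuming the column name last_update and a :lastPollTime parameter carried over from the previous poll (both names are illustrative, nothing is settled yet). In MySQL the timestamp can be maintained automatically by the server:

    -- Add the column; MySQL updates it on every write to the row
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP;

    -- Index so the incremental filter uses a range scan, not a full scan
    CREATE INDEX idx_druid_segments_last_update
      ON druid_segments (last_update);

    -- Incremental pull: fetch only rows touched since the last poll
    SELECT payload
    FROM druid_segments
    WHERE used = true
      AND last_update > :lastPollTime;

PostgreSQL has no ON UPDATE CURRENT_TIMESTAMP clause, so the automatic update would need a small trigger:

    -- Trigger function that stamps the row on every insert/update
    CREATE FUNCTION touch_last_update() RETURNS trigger AS $$
    BEGIN
      NEW.last_update := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER druid_segments_touch_last_update
      BEFORE INSERT OR UPDATE ON druid_segments
      FOR EACH ROW EXECUTE PROCEDURE touch_last_update();

One caveat I see: hard-deleted rows leave no trace in such a query, and rows flipped to used = false would be missed if the incremental query filters on used, so the scheme would probably still need some reconciliation, e.g. an occasional full sync.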
Regards,
Benedict Jin
