Hi all,

Recently, the Coordinator in our company's Druid cluster has hit a performance bottleneck when pulling metadata. The root cause is the sheer volume of metadata, which makes the full-table scan of the metadata store, and the deserialization of the results, very slow. We have already reduced the total metadata size through TTL, Compaction, Rollup, etc., but the effect was not significant.

Therefore, I want to design a scheme for the Coordinator to pull metadata incrementally, i.e. each poll fetches only the metadata added or changed since the last poll, so as to reduce both the query pressure on the metadata store and the cost of deserializing metadata.

The general idea is to add a last_update column to the druid_segments table to record the update time of each row. Queries against the metadata table can then filter on last_update and avoid a full table scan. Moreover, both MySQL and PostgreSQL, as metadata storage media, can keep such a timestamp column up to date automatically, somewhat like a trigger.

So, have you encountered this problem before? If so, how did you solve it? And do you have any suggestions or comments on the incremental metadata pulling described above? Please let me know, thanks a lot.
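For concreteness, here is a minimal sketch of what I have in mind, assuming the column name last_update and a :lastPollTime parameter carried over from the previous poll (both names are illustrative, nothing is settled yet). In MySQL the timestamp can be maintained automatically by the server:

    -- Add the column; MySQL updates it on every write to the row
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP;

    -- Index so the incremental filter uses a range scan, not a full scan
    CREATE INDEX idx_druid_segments_last_update
      ON druid_segments (last_update);

    -- Incremental pull: fetch only rows touched since the last poll
    SELECT payload
    FROM druid_segments
    WHERE used = true
      AND last_update > :lastPollTime;

PostgreSQL has no ON UPDATE CURRENT_TIMESTAMP clause, so the automatic update would need a small trigger:

    -- Trigger function that stamps the row on every insert/update
    CREATE FUNCTION touch_last_update() RETURNS trigger AS $$
    BEGIN
      NEW.last_update := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER druid_segments_touch_last_update
      BEFORE INSERT OR UPDATE ON druid_segments
      FOR EACH ROW EXECUTE PROCEDURE touch_last_update();

One caveat I see: hard-deleted rows leave no trace in such a query, and rows flipped to used = false would be missed if the incremental query filters on used, so the scheme would probably still need some reconciliation, e.g. an occasional full sync.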
Regards,
Benedict Jin
