Hi Benedict,

I am curious to understand which Druid functionality you are seeing the slowness in. Is it the coordinator's work of assigning segments to historicals that is slow, or is it the querying of segment information? Have you looked at CPU/network metrics for your metadata RDS? Scaling up to a bigger instance might help. It would also be good to look at the query patterns and possibly tweak or add indexes to speed them up. Also, do you have cleanup of metadata rows enabled (https://druid.apache.org/docs/latest/tutorials/tutorial-delete-data.html#run-a-kill-task and `druid.coordinator.kill.on`)? That should further help control the size of the druid_segments table.
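To make the index suggestion concrete, here is a minimal, hypothetical sketch (MySQL syntax). It assumes the hot statements filter on used and dataSource, which you would want to confirm from the actual slow queries first (e.g. the MySQL slow query log or pg_stat_statements); the index name is illustrative.

```sql
-- Hypothetical example only: index for the query shape you actually observe.
-- This assumes the slow statements filter on used and dataSource.
CREATE INDEX idx_druid_segments_used_datasource
    ON druid_segments (used, dataSource);
```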
On Tue, Apr 6, 2021 at 8:08 AM Ben Krug <ben.k...@imply.io> wrote:

> I suppose, if we were going down this path, something like tombstones in
> Cassandra could be used, but it would increase the complexity
> significantly. I.e., a new row is inserted with a deletion marker and a
> timestamp, indicating that the corresponding row is deleted. Now, anyone
> who scans the table also needs to check for tombstones and process that
> logic. Then, after a configurable amount of time, both the original row
> and the tombstone row can be cleaned up.
>
> Probably a lot of work and complexity for this one use case, though.
>
> On Tue, Apr 6, 2021 at 4:02 AM Abhishek Agarwal <abhishek.agar...@imply.io>
> wrote:
>
> > If an entry is deleted from the metadata, how is the coordinator going
> > to update its own state?
> >
> > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe <itai.ya...@gmail.com> wrote:
> >
> > > Hey,
> > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > considerations here, but at first glance I like your proposal, as it
> > > resembles the *tsColumn* in JDBC lookups (
> > > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > ).
> > >
> > > Anyway, just my 2 cents.
> > >
> > > Thanks!
> > > Itai
> > >
> > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin <asdf2...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Recently, the Coordinator in our company's Druid cluster has hit a
> > > > performance bottleneck when pulling metadata. The main reason is the
> > > > huge amount of metadata, which makes the full-table scan of the
> > > > metadata storage and the deserialization of the metadata very slow.
> > > > We have already reduced the size of the metadata through TTL,
> > > > compaction, rollup, etc., but the effect has not been significant.
> > > > Therefore, I want to design a scheme for the Coordinator to pull
> > > > metadata incrementally, i.e., each time the Coordinator only pulls
> > > > newly added metadata, so as to reduce both the query pressure on the
> > > > metadata storage and the cost of deserializing metadata. The general
> > > > idea is to add a last_update column to the druid_segments table to
> > > > record the update time of each record. Then, when we query the
> > > > metadata table, we can add a filter condition on the last_update
> > > > column to avoid full-table scans. Moreover, both MySQL and
> > > > PostgreSQL, as metadata storage media, support automatic updates of
> > > > a timestamp field, which is somewhat similar to triggers. So, have
> > > > you encountered this problem before? If so, how did you solve it? In
> > > > addition, do you have any suggestions or comments on the above
> > > > incremental acquisition of metadata? Please let me know, thanks a
> > > > lot.
> > > >
> > > > Regards,
> > > > Benedict Jin
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > For additional commands, e-mail: dev-h...@druid.apache.org
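For what it's worth, a minimal sketch of the last_update idea described above might look like the following (MySQL syntax; the index name and the :last_poll_time parameter are illustrative, and PostgreSQL would need a trigger to maintain the column instead of ON UPDATE CURRENT_TIMESTAMP).

```sql
-- Sketch only: auto-maintained update timestamp on druid_segments (MySQL).
ALTER TABLE druid_segments
    ADD COLUMN last_update TIMESTAMP NOT NULL
    DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Index so the incremental poll can avoid a full-table scan.
CREATE INDEX idx_druid_segments_last_update
    ON druid_segments (last_update);

-- Incremental poll: fetch only rows touched since the previous poll.
SELECT payload
FROM druid_segments
WHERE used = true
  AND last_update > :last_poll_time;
```

Note that this only covers inserts and updates; as Abhishek points out above, rows deleted from the metadata store would still need separate handling for the coordinator to drop them from its own state.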