Hey Benedict,

Adding on to what Samarth said in their reply, could you provide some more
context to help the group better understand your issue:

   - Is this the area of the code that you are saying is non-performant?
   <https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java#L1003>
   - How many rows are in your druid_segments table?
   - Out of those rows, how many segments have used=true? (Example queries
   below.)
   - What are the general specs of the machine running your metastore and
   which metastore are you using?
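
If it helps, these two queries should answer the row-count questions above
on either MySQL or PostgreSQL (assuming the default metastore table name
druid_segments and its boolean used column):

    SELECT COUNT(*) FROM druid_segments;
    SELECT used, COUNT(*) FROM druid_segments GROUP BY used;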

I'm always eager to see coordinator performance improve, but I think we
should be cautious about any changes to metastore table schemas!
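
To make the discussion concrete, here's a rough, untested sketch of what the
schema change and the incremental poll might look like on MySQL (PostgreSQL
has no ON UPDATE CURRENT_TIMESTAMP, so it would need a trigger instead). The
last_update and is_deleted names are just the placeholders from this thread,
and :last_poll_time stands for whatever watermark the Coordinator would
persist from its previous successful poll:

    -- Auto-updating modification timestamp (MySQL syntax)
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

    -- Soft-delete flag, per Ben's suggestion, so removals also bump
    -- last_update and get picked up by the next incremental poll
    ALTER TABLE druid_segments
      ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE;

    -- Index so the incremental poll doesn't become a full scan itself
    CREATE INDEX idx_druid_segments_last_update
      ON druid_segments (last_update);

    -- Coordinator poll: only rows changed since the last watermark
    SELECT payload
    FROM druid_segments
    WHERE last_update > :last_poll_time;

Even then, clock skew and in-flight transactions can make a naive timestamp
watermark miss rows, which is part of why I'd want to be careful here.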

- Lucas

On Tue, Apr 6, 2021 at 1:28 PM Ben Krug <ben.k...@imply.io> wrote:

> Oh, that's easier than tombstones: flag is_deleted and update the timestamp
> (so it gets pulled again).
>
> On Tue, Apr 6, 2021 at 10:48 AM Tijo Thomas <tijothoma...@gmail.com>
> wrote:
>
> > Abhishek,
> > Good point.  Do we need one more column to store whether it's deleted or
> > not?
> >
> > On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal
> > <abhishek.agar...@imply.io> wrote:
> >
> > > If an entry is deleted from the metadata, how is the coordinator going
> > > to update its own state?
> > >
> > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe <itai.ya...@gmail.com>
> > > wrote:
> > >
> > > > Hey,
> > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > considerations here, but at first glance I like your proposal, as it
> > > > resembles the *tsColumn* in JDBC lookups (
> > > > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > > ).
> > > >
> > > > Anyway, just my 2 cents.
> > > >
> > > > Thanks!
> > > >           Itai
> > > >
> > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin <asdf2...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Recently, the Coordinator in our company's Druid cluster has hit a
> > > > > performance bottleneck when pulling metadata. The main cause is the
> > > > > huge amount of metadata, which makes scanning the full metadata
> > > > > storage table and deserializing the metadata very slow. We have
> > > > > already reduced the size of the full metadata through TTL,
> > > > > compaction, rollup, etc., but the effect is not very significant.
> > > > > Therefore, I want to design a scheme for the Coordinator to pull
> > > > > metadata incrementally, that is, each poll only pulls newly added
> > > > > metadata, so as to reduce both the query pressure on the metadata
> > > > > storage and the cost of deserializing metadata. The general idea is
> > > > > to add a last_update column to the druid_segments table to record
> > > > > the update time of each record. Then, when we query the metadata
> > > > > table, we can filter on the last_update column to avoid full table
> > > > > scans. Moreover, both MySQL and PostgreSQL, as metadata storage
> > > > > media, can keep such a timestamp column updated automatically, in a
> > > > > way similar to triggers. So, have you encountered this problem
> > > > > before? If so, how did you solve it? In addition, do you have any
> > > > > suggestions or comments on the above incremental acquisition of
> > > > > metadata? Please let me know, thanks a lot.
> > > > >
> > > > > Regards,
> > > > > Benedict Jin
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Thanks & Regards
> > Tijo Thomas
> >
>
