Hi Ben Krug,

Thank you very much for your ideas. However, I feel that introducing Cassandra
would be too heavyweight here. The tombstone feature of Cassandra that you
mentioned can actually be emulated with scheduled tasks in MySQL or PostgreSQL.
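
For example, just as a rough sketch using MySQL's event scheduler (PostgreSQL
would need pg_cron or an external cron job), and assuming the last_update
column from my proposal below plus the existing used flag on druid_segments:

  -- Requires the MySQL event scheduler to be enabled.
  CREATE EVENT purge_unused_segments
  ON SCHEDULE EVERY 1 HOUR
  DO
    -- Physically remove rows that were soft-deleted (used = false) long
    -- enough ago, playing the role of tombstone garbage collection.
    DELETE FROM druid_segments
    WHERE used = false
      AND last_update < NOW() - INTERVAL 7 DAY;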

Regards,
Benedict Jin

On 2021/04/06 15:08:03, Ben Krug <ben.k...@imply.io> wrote: 
> I suppose, if we were going down this path, something like tombstones in
> Cassandra could be used, but it would increase the complexity significantly.
> I.e., a new row is inserted with a deletion marker and a timestamp,
> indicating that the corresponding row is deleted. Now, whenever anything
> scans the table, it also has to check for tombstones and apply that logic.
> Then, after a configurable amount of time, both the original row and the
> tombstone row can be cleaned up.
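> 
> Roughly, in relational terms that might look something like this (just a
> sketch; the druid_segments_tombstones table and its columns are purely
> illustrative):
> 
>   -- Instead of deleting a segment row, record a tombstone with a timestamp.
>   INSERT INTO druid_segments_tombstones (segment_id, deleted_at)
>   VALUES ('some_segment_id', NOW());
> 
>   -- Every scan now has to filter out rows shadowed by a tombstone.
>   SELECT s.*
>   FROM druid_segments s
>   LEFT JOIN druid_segments_tombstones t ON t.segment_id = s.id
>   WHERE t.segment_id IS NULL;
> 
>   -- After a configurable retention period, remove the shadowed rows and
>   -- the tombstones themselves.
>   DELETE FROM druid_segments
>   WHERE id IN (SELECT segment_id FROM druid_segments_tombstones
>                WHERE deleted_at < NOW() - INTERVAL 7 DAY);
>   DELETE FROM druid_segments_tombstones
>   WHERE deleted_at < NOW() - INTERVAL 7 DAY;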
> 
> Probably a lot of work and complexity for this one use case, though.
> 
> On Tue, Apr 6, 2021 at 4:02 AM Abhishek Agarwal <abhishek.agar...@imply.io>
> wrote:
> 
> > If an entry is deleted from the metadata, how is the coordinator going to
> > update its own state?
> >
> > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe <itai.ya...@gmail.com> wrote:
> >
> > > Hey,
> > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > considerations here, but at first glance I like your proposal, as it
> > > resembles the *tsColumn* in JDBC lookups (
> > > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > ).
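> > >
> > > The general idea there is the same: keep a timestamp column and only
> > > look at rows that are newer than the last poll, conceptually something
> > > like (just a sketch, not the actual lookup implementation):
> > >
> > >   SELECT key_column, value_column
> > >   FROM lookup_table
> > >   WHERE ts_column > :last_poll_time;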
> > >
> > > Anyway, just my 2 cents.
> > >
> > > Thanks!
> > >           Itai
> > >
> > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin <asdf2...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Recently, the Coordinator in our company's Druid cluster has hit a
> > > > performance bottleneck when pulling metadata. The main cause is the
> > > > huge amount of metadata, which makes the full-table scan of the
> > > > metadata store and the deserialization of that metadata very slow. We
> > > > have already reduced the size of the full metadata through TTL,
> > > > compaction, rollup, etc., but the effect was not very significant.
> > > > Therefore, I want to design a scheme for the Coordinator to pull
> > > > metadata incrementally, i.e., each time the Coordinator only pulls
> > > > newly added metadata, so as to reduce both the query pressure on the
> > > > metadata store and the cost of deserializing metadata. The general
> > > > idea is to add a last_update column to the druid_segments table to
> > > > record the update time of each record. When querying the metadata
> > > > table, we can then filter on the last_update column and avoid a
> > > > full-table scan. Moreover, both MySQL and PostgreSQL, as metadata
> > > > storage backends, can update such a timestamp column automatically,
> > > > somewhat like a trigger.
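> > > >
> > > > As a rough sketch (MySQL syntax; the exact DDL is of course still
> > > > open, and the index is only there to make the filter cheap), it would
> > > > be something like:
> > > >
> > > >   -- MySQL can maintain the column automatically on every write; in
> > > >   -- PostgreSQL the same effect needs a small BEFORE UPDATE trigger.
> > > >   ALTER TABLE druid_segments
> > > >     ADD COLUMN last_update TIMESTAMP NOT NULL
> > > >       DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
> > > >
> > > >   CREATE INDEX idx_druid_segments_last_update
> > > >     ON druid_segments (last_update);
> > > >
> > > >   -- The Coordinator then only pulls rows that changed since its last
> > > >   -- poll instead of scanning the whole table:
> > > >   SELECT payload
> > > >   FROM druid_segments
> > > >   WHERE used = true
> > > >     AND last_update > :last_poll_time;
> > > >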
> > > > So, have you encountered this problem before? If so, how did you
> > > > solve it? And do you have any suggestions or comments on the above
> > > > incremental metadata pulling? Please let me know, thanks a lot.
> > > >
> > > > Regards,
> > > > Benedict Jin
> > > >
> > > >
> > >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org
