[389-devel] Re: Implementing Referential Integrity by using a queue instead of a file

William Brown Sun, 09 Jul 2017 15:14:51 -0700

On Fri, 2017-07-07 at 11:44 +0200, Ludwig Krispenz wrote:
> On 07/07/2017 10:44 AM, Ludwig Krispenz wrote:
> >
> > On 07/07/2017 07:10 AM, William Brown wrote:
> >>>>>> Any thoughts or objections on the above would be welcome.
> >>>>> The only problem with going to a queue is if the server goes down
> >>>>> unexpectedly.  In such a case those RI updates would be lost.
> >>>> We already have this issue because there is a delay between the change
> >>>> to the object and the log being sync() to disk. So we can already lose
> >>>> changes here. TBH the only fix is to remove the async model. I actually
> >>>> question why we still need async/delay processing of the refint
> >>>> plugin ...
> >>> Historically speaking, a long time ago, we used to see high CPU when the
> >>> RI plugin was engaged.  Setting the delay to 1 second, and allowing the
> >>> log thread to do the work, improved performance.  Of course this is now
> >>> obsolete with the betxn plugin model and other improvements, but I
> >>> wanted to share why the feature even existed.
> >> I guess that would be related to internal op searches / lack of
> >> indexing. These days it's not as big of an issue.
> > boldly said. How do you know, did you verify it ?
> > we have seen many customer issues with referint which were resolved 
> > by making it async, just removing this option without proof of a 
> > better solution is no good.
> > I also am not sure if we need to tie anything into the betxn. There 
> > are operations, which, in my opinion, can be delayed, even redone by 
> > fixup tasks, so it is not necessary to have it in one txn, and if the 
> > option is there to delay it if you want, we should not take it away
> >> So, lets open a ticket to remove delayed processing mode then?
> > you can, but I will oppose to implement it :-)
> I would even go and suggest to implement the delay feature for memberof, 
> like a continuous fixup task.


I disagree. Right now the delayed / async mode causes us issues because
there is a separation between "the change" and the database reflecting
that change, be it through memberof or refint. We have lost the atomic
nature of the DB as a result.
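
To make that window concrete, here is a toy sketch. It is only an
illustration under loud assumptions: a dict stands in for the database,
and ToyDirectory, delete_user and process_pending are hypothetical names,
not real 389-ds plugin APIs. It compares a delete whose refint cleanup
runs as part of the same operation with one whose cleanup is queued for
later.

```python
# Toy model only: in-memory sets stand in for the database, and none of
# these names correspond to real 389-ds plugin APIs.
from collections import deque


class ToyDirectory:
    def __init__(self, delayed):
        self.delayed = delayed
        self.users = {"alice", "bob"}
        self.groups = {"admins": {"alice", "bob"}}
        self.pending = deque()  # queued refint work (delayed mode only)

    def delete_user(self, user):
        self.users.discard(user)
        if self.delayed:
            # Delayed mode: the delete is visible now, cleanup happens later.
            self.pending.append(user)
        else:
            # Transactional mode: cleanup is part of the same operation.
            self._remove_references(user)

    def _remove_references(self, user):
        for members in self.groups.values():
            members.discard(user)

    def process_pending(self):
        # What a background refint thread would do once the delay expires.
        while self.pending:
            self._remove_references(self.pending.popleft())

    def dangling_references(self):
        # Group members that point at users which no longer exist.
        referenced = {m for members in self.groups.values() for m in members}
        return referenced - self.users


def demo():
    sync = ToyDirectory(delayed=False)
    sync.delete_user("bob")

    delayed = ToyDirectory(delayed=True)
    delayed.delete_user("bob")
    before = delayed.dangling_references()  # state a client can observe
    delayed.process_pending()
    after = delayed.dangling_references()
    return sync.dangling_references(), before, after
```

In the delayed case, a client that reads between delete_user() and
process_pending() still sees the deleted user as a group member. That is
the loss of atomicity I mean. If the server dies in that window, the
queued cleanup is gone as well.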

This matters especially for memberof, because SSSD uses it for
membership. If we implemented a delay there, we could cause undue delays
to access control changes, be it addition or removal, while a consumer
waits for the processing to occur.

Additionally, the async modes *force* the plugin to run on a single
master rather than on all masters, lest we cause many replication
conflicts.

For the sake of consistency between replication masters, being able to
run the same configuration on all masters is extremely important.

Finally, regarding any delays in refint/memberof processing:

* We have a better MO (memberof) algorithm; it just has not been
implemented yet.
* Perhaps refint is missing indexes, or needs an algorithmic change of
its own.


I think that it's a better investment of time to *fix* our problems,
rather than hide and delay them behind async processing tasks (which,
arguably, will cause other random delays by being batched and by holding
the write lock for longer than a single write would).

So, I will open this ticket. Given your concerns about this, Ludwig,
we'll need to keep in mind that refint processing performance is a key
aspect of this change, and that we will need to perform some significant
load testing to make sure we do not cause a regression in this space.

-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Australia/Brisbane


_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
