> On 14 Jan 2021, at 21:32, Pierre Rogier <prog...@redhat.com> wrote:
>
> Hi William,
>
> > It's a scenario we will need to fix via your BE work because of the MVCC
> > transaction model that
> > LMDB will force us to adopt :)
>
> As I see things, in the early phases the lmdb read txn will probably only be
> managed at the db plugin level rather than at the backend level. That means
> we will have the same inconsistency risk as today (i.e. as if using bdb and
> the implicit txn).
> The txn model redesign you are speaking about should only occur in one of the
> last phases (once bdb no longer coexists with lmdb).
> It must be done because it could provide a serious performance boost for read
> operations (IMHO, in most cases we could avoid duplicating the db data).
> But we should not do it while bdb is still around because of the risk of lock
> issues and excessive retries.

Yep, agreed. It will be needed for a large read performance boost, but also just to prevent exactly this kind of issue. We should be able to move to a model where everything is always within a transaction. We could introduce it earlier and have the read txns be a no-op for bdb and continue using the implied transactions that we currently have, but also perhaps there is then no benefit to doing this earlier :)

>
> Note I put a phasing section in
> https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html#phasing
> explaining that. But I guess I should move it within Ludwig's document that
> encompasses it.
>
> Pierre
>
> On Thu, Jan 14, 2021 at 12:01 AM William Brown <wbr...@suse.de> wrote:
>
> > On 13 Jan 2021, at 21:24, Pierre Rogier <prog...@redhat.com> wrote:
> >
> > Thank you William,
> > So far your scenario (entry found when reading base entry but no longer
> > existing when computing the candidates) is the only one that matches the
> > symptoms.
>
> It's a scenario we will need to fix via your BE work because of the MVCC
> transaction model that LMDB will force us to adopt :)
>
> > And that triggered a thought:
> > We cannot do anything for SUBTREE and ONE_LEVEL searches
> > because the fact that the base entry id is not in the candidate list may be
> > normal,
> > but IMHO we should improve the BASE search case.
> > In this case the candidate list is directly set to the base entry id
> > ==> if the candidate entry (in ldbm_back_next_search_entry) is not found
> > and the scope is BASE then we should return a LDAP_NO_SUCH_OBJECT error.
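
A minimal sketch of the BASE-scope check suggested above, under stated assumptions: the store, the function names and the `sketch_` constants are all hypothetical stand-ins and not the ldbm backend's real code. Only the numeric values (scope BASE = 0, result code 32 for "no such object") mirror standard LDAP. It only illustrates the rule "a missing candidate is skipped for ONE_LEVEL and SUBTREE, but is an error for BASE".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the numeric values mirror the standard LDAP ones. */
enum sketch_scope { SKETCH_SCOPE_BASE = 0, SKETCH_SCOPE_ONELEVEL = 1, SKETCH_SCOPE_SUBTREE = 2 };
enum sketch_rc { SKETCH_SUCCESS = 0, SKETCH_NO_SUCH_OBJECT = 32 };

/* Toy entry store: present[id] says whether entry `id` still exists. */
struct sketch_store {
    const bool *present;
    size_t count;
};

static bool sketch_entry_exists(const struct sketch_store *store, size_t id)
{
    return id < store->count && store->present[id];
}

/*
 * Walk the candidate ids and count the entries that can still be fetched.
 * A missing candidate is silently skipped for ONELEVEL/SUBTREE searches,
 * but for a BASE search (where the only candidate is the base entry itself)
 * it is reported as "no such object".
 */
static int sketch_return_entries(const struct sketch_store *store,
                                 enum sketch_scope scope,
                                 const size_t *candidates, size_t ncandidates,
                                 size_t *nreturned)
{
    assert(store != NULL && nreturned != NULL);
    *nreturned = 0;
    for (size_t i = 0; i < ncandidates; i++) {
        if (sketch_entry_exists(store, candidates[i])) {
            (*nreturned)++;
        } else if (scope == SKETCH_SCOPE_BASE) {
            return SKETCH_NO_SUCH_OBJECT;
        }
    }
    return SKETCH_SUCCESS;
}
```

Calling `sketch_return_entries()` with scope BASE and a candidate id whose entry was deleted in the meantime yields `SKETCH_NO_SUCH_OBJECT`, while the same missing id inside a SUBTREE candidate list is just skipped.
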
>
> I suspect that Mark has seen this email and submitted a PR to resolve this
> exact case :)
>
> >
> > Pierre
> >
> > On Wed, Jan 13, 2021 at 1:45 AM William Brown <wbr...@suse.de> wrote:
> > Hey there,
> >
> > https://github.com/389ds/389-ds-base/pull/4525/files
> >
> > I had a look and I can see a few possible contributing factors, but without
> > a core and the exact state I can't be sure if this is correct. It's all
> > just hypothetical from reading the code.
> >
> > The crash is in deref_do_deref_attr() which is called as part of
> > deref_pre_entry(). This is the SLAPI_PLUGIN_PRE_ENTRY_FN which is called by
> > "./ldap/servers/slapd/result.c:1488: rc = plugin_call_plugins(pb,
> > SLAPI_PLUGIN_PRE_ENTRY_FN);"
> >
> > I think what's important here is that the search conducted in
> > ./ldap/servers/slapd/opshared.c:818 rc = (*be->be_search)(pb); is *not*
> > in a transaction. That means that while the single search in be_search() is
> > consistent due to an implied transaction, the subsequent search in
> > deref_pre_entry() is likely conducted in a separate transaction. This
> > allows for other operations to potentially interleave and cause changes -
> > modrdn or delete would certainly be candidates to cause a DN to be removed
> > between these two points. It would be extremely hard to reproduce as a race
> > condition of course.
> >
> > A question you asked is why don't we get a "no such entry" error or
> > similar? I think that this is because build_candidate_list in ldbm_search.c
> > doesn't actually create an error if the base_candidates list is empty,
> > because an IDL is allocated with a value of 0 (no matching entries). This
> > allows the search to proceed, and there are no errors, and the result set
> > is set to NULL with size 0.
> > I can't see where LDAP_NO_SUCH_OBJECT is set in
> > this process, but without looking further into it, my suspicion is that
> > entries of size 0 won't return an error condition to internal_search_pb, so
> > it's valid for this to be empty.
> >
> > Anyway, again, this is just reading the code for 20 minutes, and is not a
> > complete in-depth investigation, but maybe it's some ideas about what
> > happened?
> >
> > Hope it helps :)
> >
> > —
> > Sincerely,
> >
> > William Brown
> >
> > Senior Software Engineer, 389 Directory Server
> > SUSE Labs, Australia
> > _______________________________________________
> > 389-devel mailing list -- 389-devel@lists.fedoraproject.org
> > To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
> > Fedora Code of Conduct:
> > https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives:
> > https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
> >
> > --
> > 389 Directory Server Development Team
>
> —
> Sincerely,
>
> William Brown
>
> Senior Software Engineer, 389 Directory Server
> SUSE Labs, Australia
>
> --
> 389 Directory Server Development Team

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia
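
To make the "read txns as a no-op for bdb" idea from earlier in this thread a bit more concrete, here is a rough sketch, not a proposal for the real backend API. Everything in it (the `sketch_` types, the single-entry toy backend, the `mvcc` flag) is a hypothetical illustration. It shows the same begin/lookup/end call sequence either pinning a snapshot (lmdb-like) or doing nothing and falling through to the live state (bdb-like, where the base lookup and a later deref lookup can still disagree).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy backend holding a single entry that may be deleted. */
struct sketch_backend {
    bool entry_present; /* live state, changed by writers */
    bool mvcc;          /* true: lmdb-like snapshots; false: bdb-like implied txns */
};

struct sketch_read_txn {
    const struct sketch_backend *be;
    bool snapshot_valid; /* true when begin() pinned a snapshot */
    bool entry_present;  /* pinned view of the entry, if any */
};

/* Begin a read txn: pins a snapshot for MVCC backends, no-op otherwise. */
static void sketch_read_txn_begin(const struct sketch_backend *be,
                                  struct sketch_read_txn *txn)
{
    assert(be != NULL && txn != NULL);
    txn->be = be;
    txn->snapshot_valid = be->mvcc;
    txn->entry_present = be->mvcc ? be->entry_present : false;
}

/* Look the entry up through the txn: snapshot if pinned, live state if not. */
static bool sketch_lookup(const struct sketch_read_txn *txn)
{
    assert(txn != NULL && txn->be != NULL);
    return txn->snapshot_valid ? txn->entry_present : txn->be->entry_present;
}

/* End the read txn and drop the pinned snapshot, if there was one. */
static void sketch_read_txn_end(struct sketch_read_txn *txn)
{
    assert(txn != NULL);
    txn->be = NULL;
    txn->snapshot_valid = false;
}
```

With `mvcc` set, two lookups inside one begin/end pair agree even if the entry is deleted in between. With `mvcc` cleared, the second lookup sees the delete, which is the interleaving discussed above.
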