> On 14 Jan 2021, at 21:32, Pierre Rogier <prog...@redhat.com> wrote:
> 
> Hi William, 
> 
> > It's a scenario we will need to fix via your BE work because of the MVCC 
> > transaction model that LMDB will force us to adopt :)
>   
> As I see things, in the early phases the lmdb read txn will probably only be 
> managed at the db plugin level rather than at the backend level. That means 
> we will have the same inconsistency risk as today (i.e. as if using bdb and 
> the implicit txn).  
> The txn model redesign you are speaking about should only occur in one of the 
> last phases (once bdb no longer coexists with lmdb).
> It must be done because it could provide a serious performance boost for read 
> operations (IMHO, in most cases we could avoid duplicating the db data).
> But we should not do it while bdb is still around because of the risk of lock 
> issues and excessive retries.

Yep, agreed. It will be needed not only for a large read performance boost, 
but also to prevent exactly this kind of issue. We should be able to move to a 
model where everything is always within a transaction.
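
To make the difference concrete, here is a minimal Python sketch. It is a toy 
copy-on-write store, not the server's C code: ToyStore, begin_read and the two 
deref_* helpers are hypothetical names made up for illustration. It only shows 
why two implied transactions can disagree while a single read txn cannot.

```python
class ToyStore:
    """Toy copy-on-write store. Hypothetical; not the 389-ds backend API."""

    def __init__(self, entries):
        self._current = dict(entries)

    def commit(self, mutate):
        # Writers build a new version; older versions are never modified.
        new = dict(self._current)
        mutate(new)
        self._current = new

    def get(self, dn):
        # "Implied transaction": each call sees the latest committed version.
        return self._current.get(dn)

    def begin_read(self):
        # Explicit read txn: pin the version that is current right now.
        return self._current


def deref_with_implied_txns(store, concurrent_op):
    group = store.get("cn=group")        # first implied txn
    concurrent_op()                      # e.g. a delete commits here
    return store.get(group["member"])    # second implied txn, newer state


def deref_in_one_read_txn(store, concurrent_op):
    snap = store.begin_read()
    group = snap.get("cn=group")
    concurrent_op()                      # commits, but snap is unaffected
    return snap.get(group["member"])     # same snapshot as the first read


def make_store():
    return ToyStore({
        "cn=group": {"member": "uid=alice"},
        "uid=alice": {"cn": "Alice"},
    })


if __name__ == "__main__":
    s = make_store()
    print(deref_with_implied_txns(
        s, lambda: s.commit(lambda d: d.pop("uid=alice"))))
    s = make_store()
    print(deref_in_one_read_txn(
        s, lambda: s.commit(lambda d: d.pop("uid=alice"))))
```

In the first call the referenced entry has vanished by the time it is 
dereferenced; in the second the reader still sees the state it started with.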

We could introduce it earlier and have the read txns be a no-op for bdb, 
continuing to use the implied transactions that we currently have, but then 
perhaps there is no benefit to doing this earlier :) 
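
Roughly what I mean, again as a toy Python sketch rather than real server 
code. The class names and the read_txn() context manager are assumptions for 
illustration only. Callers use the same shape for both backends, but the 
bdb-like one gains no consistency from it.

```python
from contextlib import contextmanager


class ImpliedTxnBackend:
    """Stands in for bdb: read_txn() is a no-op wrapper (hypothetical)."""

    def __init__(self, entries):
        self._data = dict(entries)

    def commit(self, dn, entry):
        # Modify the live data in place; entry=None means delete.
        if entry is None:
            self._data.pop(dn, None)
        else:
            self._data[dn] = entry

    @contextmanager
    def read_txn(self):
        # No snapshot is taken: every lookup goes to the live data.
        yield self._data.get


class SnapshotBackend:
    """Stands in for an MVCC store: read_txn() pins a version (hypothetical)."""

    def __init__(self, entries):
        self._data = dict(entries)

    def commit(self, dn, entry):
        # Copy-on-write: readers holding the old version are not disturbed.
        new = dict(self._data)
        if entry is None:
            new.pop(dn, None)
        else:
            new[dn] = entry
        self._data = new

    @contextmanager
    def read_txn(self):
        snapshot = self._data
        yield snapshot.get


def lookup_twice(backend, dn, between):
    # Same calling convention regardless of the backend behind it.
    with backend.read_txn() as get:
        first = get(dn)
        between()          # a concurrent write commits here
        second = get(dn)
    return first, second
```

With the no-op version the second lookup can still miss an entry the first 
one found, so the early change would mostly buy us the calling convention.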

> 
> Note I put a phasing section in
> https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html#phasing
> explaining that. But I guess I should move it into Ludwig's document, which 
> encompasses it.
> 
> Pierre
> 
> On Thu, Jan 14, 2021 at 12:01 AM William Brown <wbr...@suse.de> wrote:
> 
> 
> > On 13 Jan 2021, at 21:24, Pierre Rogier <prog...@redhat.com> wrote:
> > 
> > Thank you William,
> > So far your scenario (entry found when reading the base entry but no longer 
> > existing when computing the candidates) is the only one that matches the 
> > symptoms.
> 
> It's a scenario we will need to fix via your BE work because of the MVCC 
> transaction model that LMDB will force us to adopt :) 
> 
> > And that triggered a thought: 
> >  We cannot do anything for SUBTREE and ONE_LEVEL searches,
> >   because it may be normal for the base entry id to be missing from the 
> >   candidate list,
> >  but IMHO we should improve the BASE search case.
> > In this case the candidate list is directly set to the base entry id
> >  ==> if the candidate entry (in ldbm_back_next_search_entry) is not found 
> > and the scope is BASE then we should return an LDAP_NO_SUCH_OBJECT error.
> 
> I suspect that Mark has seen this email and submitted a PR to resolve this 
> exact case :) 
> 
> 
> > 
> >        Pierre
> > 
> > 
> > On Wed, Jan 13, 2021 at 1:45 AM William Brown <wbr...@suse.de> wrote:
> > Hey there,
> > 
> > https://github.com/389ds/389-ds-base/pull/4525/files
> > 
> > I had a look and I can see a few possible contributing factors, but without 
> > a core and the exact state I can't be sure if this is correct. It's all 
> > just hypothetical from reading the code.
> > 
> > 
> > The crash is in deref_do_deref_attr() which is called as part of 
> > deref_pre_entry(). This is the SLAPI_PLUGIN_PRE_ENTRY_FN which is called by 
> > "./ldap/servers/slapd/result.c:1488:    rc = plugin_call_plugins(pb, 
> > SLAPI_PLUGIN_PRE_ENTRY_FN);"
> > 
> > 
> > I think what's important here is that the search conducted in 
> > ./ldap/servers/slapd/opshared.c:818  rc = (*be->be_search)(pb);  is *not* 
> > in a transaction. That means that while the single search in be_search() is 
> > consistent due to an implied transaction, the subsequent search in 
> > deref_pre_entry() is likely conducted in a separate transaction. This 
> > allows other operations to potentially interleave and cause changes: 
> > modrdn or delete would certainly be candidates to cause a DN to be removed 
> > between these two points. It would be extremely hard to reproduce as a race 
> > condition of course. 
> > 
> > 
> > A question you asked is why don't we get a "no such entry" error or 
> > similar? I think that this is because build_candidate_list in ldbm_search.c 
> > doesn't actually create an error if the base_candidates list is empty, 
> > because an IDL is allocated with a value of 0 (no matching entries). This 
> > allows the search to proceed, and there are no errors, and the result set 
> > is set to NULL with size 0. I can't see where LDAP_NO_SUCH_OBJECT is set in 
> > this process, but without looking further into it, my suspicion is that 
> > entries of size 0 WON'T return an error condition to internal_search_pb, so 
> > it's valid for this to be empty.
> > 
> > Anyway, again, this is just reading the code for 20 minutes, and is not a 
> > complete in-depth investigation, but maybe it's some ideas about what 
> > happened?
> > 
> > Hope it helps :) 
> > 
> > 
> > 
> > —
> > Sincerely,
> > 
> > William Brown
> > 
> > Senior Software Engineer, 389 Directory Server
> > SUSE Labs, Australia
> > _______________________________________________
> > 389-devel mailing list -- 389-devel@lists.fedoraproject.org
> > To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
> > Fedora Code of Conduct: 
> > https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives: 
> > https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
> > 
> > 
> > -- 
> > --
> > 
> > 389 Directory Server Development Team
> 

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia