> On 27 Feb 2019, at 10:21, Rich Megginson <rmegg...@redhat.com> wrote:
> 
> On 2/26/19 4:26 PM, William Brown wrote:
>> 
>>> On 26 Feb 2019, at 18:32, Ludwig Krispenz <lkris...@redhat.com> wrote:
>>> 
>>> Hi, I need a bit of time to read the docs and clear my thoughts, but one 
>>> comment below
>>> On 02/25/2019 01:49 AM, William Brown wrote:
>>>>> On 23 Feb 2019, at 02:46, Mark Reynolds <mreyno...@redhat.com> wrote:
>>>>> 
>>>>> I want to start a brief discussion about a major problem we have with 
>>>>> backend transaction plugins and the entry caches.  I'm finding that when 
>>>>> we get into a nested state of betxn plugins and one of the later plugins 
>>>>> fails, the disk changes are correctly aborted/rolled back, but we DO 
>>>>> keep the entry cache changes!
>>>>> 
>>>>> For example, a modrdn operation triggers the referential integrity plugin 
>>>>> which renames the member attribute in some group and changes that group's 
>>>>> entry cache entry, but then later on the memberOf plugin fails for some 
>>>>> reason.  The database transaction is aborted, but the entry cache changes 
>>>>> that RI plugin did are still present :-(  I have also found other entry 
>>>>> cache issues with modrdn and BE TXN plugins, and we know of other 
>>>>> currently non-reproducible entry cache crashes as well related to 
>>>>> mishandling of cache entries after failed operations.
>>>>> 
>>>>> It's time to rework how we use the entry cache.  We basically need a 
>>>>> transaction style caching mechanism - we should not commit any entry 
>>>>> cache changes until the original operation is fully successful. 
>>>>> Unfortunately the way the entry cache is currently designed and used it 
>>>>> will be a major change to try to change it.
>>>>> 
>>>>> William wrote up this doc: 
>>>>> http://www.port389.org/docs/389ds/design/cache_redesign.html
>>>>> 
>>>>> But this also does not currently cover the nested plugin scenario either 
>>>>> (not yet).  I do not know how difficult it would be to implement 
>>>>> William's proposal, or how difficult it would be to incorporate the txn 
>>>>> style caching into his design.  What kind of time frame could this even 
>>>>> be implemented in?  William, what are your thoughts?
>>>> I like coffee? How cool are planes? My thoughts are simple :)
>>>> 
>>>> I think there is a simple reframing we can make here though. Nested 
>>>> transactions “don’t really exist”. We just have *recursive* operations 
>>>> inside of one transaction.
>>>> 
>>>> Once reframed like that, the entire situation becomes simpler. We have one 
>>>> thread in a write transaction that can have recursive/batched operations 
>>>> as required, which means that either “all operations succeed” or “none 
>>>> do”. Really, this is the behaviour we want anyway, and it’s the 
>>>> transaction model of LMDB and other kv stores that we could consider 
>>>> (wired tiger, sled in the future).
>>> I think the recursive/nested transactions on the database level are not the 
>>> problem; we already handle this correctly: either all changes become 
>>> persistent or none do.
>>> What we do not manage is the modifications we make in parallel on in-memory 
>>> structures like the entry cache. Changes to the EC are not managed by any 
>>> txn, and I do not see how any of the database txn models would help: they 
>>> do not know about the EC and cannot abort its changes.
>>> We would need to incorporate the EC into a generic txn model, or have a way 
>>> to flag EC entries as garbage if a txn is aborted.
>> The issue is that we allow parallel writes, which breaks the consistency 
>> guarantees of the EC anyway. LMDB won’t allow parallel writes (it’s single 
>> writer - concurrent parallel readers), and most other modern kv stores take 
>> this approach too, so we should be remodelling our transactions to match 
>> this IMO. It will make reasoning about the EC much simpler, I think.
> 
> 
> Some sort of in-memory data structure with fast lookup and transactional 
> semantics is needed: modify operations are stored as MVCC/COW, so each read 
> of the database with a given txn handle sees its own view of the EC; a txn 
> commit updates the parent txn's EC view (or the global EC view if there is 
> no parent) from the copy; and a txn abort deletes the txn's copy of the EC.  
> A quick google search turns up several hits.  I'm not sure whether the 
> B+Tree proposed at 
> http://www.port389.org/docs/389ds/design/cache_redesign.html has 
> transactional semantics, or whether such code could be added to its 
> implementation.

It does; this is an MVCC B+Tree implementation.
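To make the commit/abort semantics concrete, here is a rough sketch of the copy-on-write behaviour such a tree provides (Python, purely illustrative: `CowCache` and `Txn` are made-up names, and a dict stands in for the B+Tree; nothing here is from the 389-ds tree):

```python
class CowCache:
    """Committed state is an immutable snapshot; writers mutate a private copy."""

    def __init__(self):
        self._committed = {}  # snapshot visible to all readers

    def begin(self):
        # A writer gets a copy-on-write view of the committed state.
        return Txn(self, dict(self._committed))

    def get(self, key):
        # Readers always see the last committed snapshot.
        return self._committed.get(key)


class Txn:
    def __init__(self, cache, view):
        self._cache = cache
        self._view = view

    def put(self, key, value):
        self._view[key] = value  # only visible inside this txn

    def get(self, key):
        return self._view.get(key)

    def commit(self):
        # Atomically publish this txn's view as the new snapshot.
        self._cache._committed = self._view

    def abort(self):
        # Drop the private view; the committed snapshot is untouched.
        self._view = None
```

The property this buys us is exactly the one the RI/memberOf failure needs: an aborted txn leaves the cache bit-for-bit as it was before the operation started.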

> 
> With LMDB, if we could make the on-disk entry representation the same as the 
> in-memory entry representation, then we could use LMDB as the entry cache too 
> - the database would be the entry cache as well.

Yes, Ludwig has suggested this because it would remove the need for an Entry 
Cache at all. 
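Roughly, "fetching from the cache" would collapse to a lookup inside a read txn against the store's current snapshot. A toy sketch of that access pattern (a dict stands in for LMDB's memory map, and every name here is hypothetical, not an existing API):

```python
class Backend:
    """Single-writer / many-reader store standing in for LMDB.

    With an identical on-disk and in-memory entry layout there is no
    separate entry cache to keep consistent: every lookup is just a
    read transaction against the store's current snapshot.
    """

    def __init__(self):
        self._data = {}

    def read_txn(self):
        # LMDB readers see a stable snapshot; model that with a copy.
        return dict(self._data)

    def write(self, updates):
        # Single writer: publish the whole batch atomically, or not at all.
        new = dict(self._data)
        new.update(updates)
        self._data = new


def fetch_entry(backend, dn):
    # No cache_find_entry / cache_add dance: the DB *is* the cache.
    return backend.read_txn().get(dn)
```

Note that a reader holding an older snapshot keeps seeing it even while the writer publishes a new one, which is the MVCC behaviour we would be leaning on.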

> 
> 
>> 
>>>>> If William's design is too huge a change to implement safely in time, 
>>>>> then perhaps we need to look into revising the existing cache design to 
>>>>> use "cache_add_tentative" style functions and only apply them at the end 
>>>>> of the op.  This is also not a trivial change.
>>>> It’s pretty massive as a change - if we want to do it right. I’d say we 
>>>> need:
>>>> 
>>>> * development and testing of a MVCC/COW cache implementation (proof that 
>>>> it really really works transactionally)
>>>> * allow “disable/disconnect” of the entry cache, but with the higher level 
>>>> txns, so that we can prove the txn semantics are correct
>>>> * re-architect our transaction calls so that they sit “higher” up. An 
>>>> example is that internal_modify shouldn’t start a txn; it should be given 
>>>> the current txn state as an arg. Combined with the above, we can prove we 
>>>> haven’t corrupted our server transaction guarantees.
>>>> * integrate the transactional cache.
>>>> 
>>>> I don’t know if I would still write a transactional cache the same way as 
>>>> I proposed in that design, but I think the ideas are on the right path.
>>>> 
>>>>> And what impact would changing the entry cache have on Ludwig's pluggable 
>>>>> backend work?
>>>> Should be none, they’re separate layers. If anything this change is going 
>>>> to make Ludwig’s work better, because our current model won’t really take 
>>>> good advantage of the MVCC nature of modern kv stores.
>>>> 
>>>>> Anyway we need to start thinking about redesigning the entry cache - no 
>>>>> matter what approach we want to take.  If anyone has any ideas or 
>>>>> comments please share them, but I think due to the severity of this flaw 
>>>>> redesigning the entry cache should be one of our next major goals in DS 
>>>>> (1.4.1?).
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Mark
>>>>> _______________________________________________
>>>>> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
>>>>> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
>>>>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>>>>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>>> List Archives: 
>>>>> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
>>>> —
>>>> Sincerely,
>>>> 
>>>> William Brown
>>>> Software Engineer, 389 Directory Server
>>>> SUSE Labs
>>>> _______________________________________________
>>>> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
>>>> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
>>>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>>>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>>>> List Archives: 
>>>> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
>>> -- 
>>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, 
>>> Eric Shander
>> —
>> Sincerely,
>> 
>> William Brown
>> Software Engineer, 389 Directory Server
>> SUSE Labs
>> _______________________________________________
>> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
>> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives: 
>> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
> 

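To summarise the restructuring being discussed (internal_modify taking the caller's txn rather than opening its own, and tentative cache updates published only at the outermost commit), here is a toy sketch. Every name in it is hypothetical, chosen for illustration, not an existing 389-ds API:

```python
class WriteTxn:
    """Hypothetical write txn that stages entry-cache updates until commit."""

    def __init__(self, cache):
        self._cache = cache
        self._pending = {}  # tentative entry-cache updates, not yet visible

    def cache_add_tentative(self, dn, entry):
        self._pending[dn] = entry  # staged only

    def commit(self):
        self._cache.update(self._pending)  # publish everything at once

    def abort(self):
        self._pending.clear()  # nothing leaks into the shared cache


def internal_modify(txn, dn, entry):
    # Takes the caller's txn as an argument; never begins its own.
    txn.cache_add_tentative(dn, entry)


def do_modrdn(cache, dn, entry, betxn_plugins):
    txn = WriteTxn(cache)
    try:
        internal_modify(txn, dn, entry)
        for plugin in betxn_plugins:  # plugins run inside the same txn
            plugin(txn)
        txn.commit()
        return True
    except Exception:
        txn.abort()  # DB *and* cache roll back together
        return False
```

With this shape, a memberOf failure after an RI rename aborts the one outer txn, and the RI plugin's cache changes vanish with it.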
—
Sincerely,

William Brown
Software Engineer, 389 Directory Server
SUSE Labs
