Re: [controller-dev] Reply: Re: A explanation of https://bugs.opendaylight.org/show_bug.cgi?id=7033

2016-10-28 Thread Tom Pantelis
On Fri, Oct 28, 2016 at 11:08 AM, Robert Varga wrote:

> On 10/28/2016 04:24 PM, Tom Pantelis wrote:
> >
> >
> > On Fri, Oct 28, 2016 at 10:00 AM, Robert Varga wrote:
> >
> > On 10/28/2016 08:00 AM, he.yu...@zte.com.cn wrote:
> > > so we need to make the DataTreeCandidateTip.getTipRoot() API public
> > > in yangtools to get the last tx's root
> > >
> > > None of the above is related to the commit phase; the overall
> > > process is as follows:
> > >
> > >
> > > Tx1.prepare() ---> Tx1.candidate
> > >     Tx1 persist ReplicatedLogEntry
> > >     Tx1 add to pipelineTransactions
> > >     Tx2.prepare(Tx1.candidate) --> Tx2.candidate
> >
> > Actually the flow should be:
> >
> > TipProducingDataTree dataTree;
> > DataTreeCandidateTip tx1Candidate = dataTree.prepare(tx1);
> > persist tx1Candidate
> > DataTreeCandidateTip tx2Candidate = tx1Candidate.prepare(tx2);
> > persist tx2Candidate
> > dataTree.commit(tx1Candidate);
> > dataTree.commit(tx2Candidate);
> >
> > All of the accounting needed will occur inside the DataTree
> > implementation without leaking tipRoot. The API has been explicitly
> > designed for this use case -- it just has not been implemented yet,
> > because appendAndPersist() is synchronous and the surrounding code
> > assumes that once the candidate has been persisted, it has also been
> > committed.
> >
> >
> > I'm not clear on the last statement. Which surrounding code are you
> > referring to? appendAndPersist doesn't directly commit the entry. After
> > persistence completes, it replicates and commit occurs on ApplyState
> > once consensus is achieved. For single-node of course the replicate part
> > is skipped and ApplyState is sent immediately.
>
> Sorry about that, I was writing off the top of my head and have not
> properly introduced the context I am thinking in.
>
> My previous text is written from the point of view of ShardDataTree,
> which acts as an intermediary between frontend (DistributedDataStore et
> al.) and the Raft journal (via Shard and RaftActor).
>
> When I say 'persist' and 'persist completes' I really mean 'when
> ShardDataTree calls Shard.persistPayload()' and 'Shard invokes
> ShardDataTree.applyReplicatedPayload()'.
>
> The reason I did this is because it abstracts out the unneeded details
> of whether persistence is enabled and whether we are a single node --
> ShardDataTree does not care.
>
> Having completed a context switch to this topic, yes, parts of the
> persist process are asynchronous -- I guess I had not realized that
> before.
>
> > For this to work correctly in the face of failures and recovery, there
> > need to be further persist events, and frontend replies need to be
> > sent as follows:
> >
> > - once candidate persist completes, notify frontend of precommit
> >   success, wait for commit message (or shortcut in directCommit case)
> >
> >
> > So you propose to couple persistence in the pre-commit phase. Sounds
> > reasonable.
>
> Both phases, actually. There would be two callouts to
> Shard.persistPayload(), with two different payloads:
> - one at precommit time, which contains the DataTreeCandidate
> - one at commit time, which contains only the TransactionIdentifier
>
>
right - separate calls to persist and replicate. Makes sense.

> - once commit request arrives (or 3PC commit timer expires):
> >
> >
> > Isn't this where we would replicate?
> >
> >
> > dataTree.commit()
> >
> >
> > This would occur on ApplyState?
> >
> >
> > persist tx commit record (only identifier, no data)
> >
> >
> > Don't we already do this via ApplyJournalEntries?
>
> Well, we only have a single journal entry for each transaction. We need
> to have two.
>
>
I'm not clear yet on why we would need another entry to persist the tx ID
but we can talk about that in more detail on the clustering call.


> Hope this makes it a bit more clear.
>
> Thanks,
> Robert
>
>
___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] Reply: Re: A explanation of https://bugs.opendaylight.org/show_bug.cgi?id=7033

2016-10-28 Thread Robert Varga
On 10/28/2016 04:24 PM, Tom Pantelis wrote:
> 
> 
> On Fri, Oct 28, 2016 at 10:00 AM, Robert Varga wrote:
> 
> On 10/28/2016 08:00 AM, he.yu...@zte.com.cn wrote:
> > so we need to make the DataTreeCandidateTip.getTipRoot() API public
> > in yangtools to get the last tx's root
> >
> > None of the above is related to the commit phase; the overall process is
> > as follows:
> >
> >
> > Tx1.prepare() ---> Tx1.candidate
> >     Tx1 persist ReplicatedLogEntry
> >     Tx1 add to pipelineTransactions
> >     Tx2.prepare(Tx1.candidate) --> Tx2.candidate
> 
> Actually the flow should be:
> 
> TipProducingDataTree dataTree;
> DataTreeCandidateTip tx1Candidate = dataTree.prepare(tx1);
> persist tx1Candidate
> DataTreeCandidateTip tx2Candidate = tx1Candidate.prepare(tx2);
> persist tx2Candidate
> dataTree.commit(tx1Candidate);
> dataTree.commit(tx2Candidate);
> 
> All of the accounting needed will occur inside the DataTree
> implementation without leaking tipRoot. The API has been explicitly
> designed for this use case -- it just has not been implemented yet,
> because appendAndPersist() is synchronous and the surrounding code
> assumes that once the candidate has been persisted, it has also been
> committed.
> 
> 
> I'm not clear on the last statement. Which surrounding code are you
> referring to? appendAndPersist doesn't directly commit the entry. After
> persistence completes, it replicates and commit occurs on ApplyState
> once consensus is achieved. For single-node of course the replicate part
> is skipped and ApplyState is sent immediately.

Sorry about that, I was writing off the top of my head and have not
properly introduced the context I am thinking in.

My previous text is written from the point of view of ShardDataTree,
which acts as an intermediary between frontend (DistributedDataStore et
al.) and the Raft journal (via Shard and RaftActor).

When I say 'persist' and 'persist completes' I really mean 'when
ShardDataTree calls Shard.persistPayload()' and 'Shard invokes
ShardDataTree.applyReplicatedPayload()'.

The reason I did this is because it abstracts out the unneeded details
of whether persistence is enabled and whether we are a single node --
ShardDataTree does not care.

Having completed a context switch to this topic, yes, parts of the
persist process are asynchronous -- I guess I had not realized that before.

> For this to work correctly in the face of failures and recovery, there need
> to be further persist events, and frontend replies need to be sent as
> follows:
> 
> - once candidate persist completes, notify frontend of precommit success,
>   wait for commit message (or shortcut in directCommit case)
> 
> 
> So you propose to couple persistence in the pre-commit phase. Sounds
> reasonable.

Both phases, actually. There would be two callouts to
Shard.persistPayload(), with two different payloads:
- one at precommit time, which contains the DataTreeCandidate
- one at commit time, which contains only the TransactionIdentifier
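
To make the two callouts concrete, here is a minimal sketch using a toy
in-memory journal. All names in it (Payload, CandidatePayload, CommitRecord,
TwoPhasePersistSketch) are hypothetical and are not the controller's Shard or
payload classes. Persistence is modelled as a synchronous list append, which
glosses over the asynchronous parts discussed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical journal payload; illustrative only, not a controller class. */
interface Payload {
    String txId();
}

/** Persisted at precommit time: carries the transaction's data. */
final class CandidatePayload implements Payload {
    private final String txId;
    final String data; // stands in for a serialized DataTreeCandidate

    CandidatePayload(String txId, String data) {
        this.txId = txId;
        this.data = data;
    }

    @Override
    public String txId() {
        return txId;
    }
}

/** Persisted at commit time: carries only the transaction identifier. */
final class CommitRecord implements Payload {
    private final String txId;

    CommitRecord(String txId) {
        this.txId = txId;
    }

    @Override
    public String txId() {
        return txId;
    }
}

/** Toy leader-side flow; the journal is just an in-memory list. */
final class TwoPhasePersistSketch {
    final List<Payload> journal = new ArrayList<>();
    final List<String> frontendReplies = new ArrayList<>();

    /** Assumption: persistence completes synchronously, then runs the callback. */
    private void persist(Payload payload, Runnable onPersisted) {
        journal.add(payload);
        onPersisted.run();
    }

    /** First callout: persist the candidate, then report precommit success. */
    void preCommit(String txId, String data) {
        persist(new CandidatePayload(txId, data),
            () -> frontendReplies.add("precommit-ok:" + txId));
    }

    /** Second callout: persist only the identifier, then report commit success. */
    void commit(String txId) {
        persist(new CommitRecord(txId),
            () -> frontendReplies.add("commit-ok:" + txId));
    }
}
```

With this shape each transaction leaves two journal entries, and the frontend
reply for each phase is sent only after the corresponding entry is persisted.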

> - once commit request arrives (or 3PC commit timer expires):
> 
> 
> Isn't this where we would replicate? 
>  
> 
> dataTree.commit()
> 
> 
> This would occur on ApplyState?
>  
> 
> persist tx commit record (only identifier, no data)
> 
> 
> Don't we already do this via ApplyJournalEntries?

Well, we only have a single journal entry for each transaction. We need
to have two.

Hope this makes it a bit more clear.

Thanks,
Robert





Re: [controller-dev] Reply: Re: A explanation of https://bugs.opendaylight.org/show_bug.cgi?id=7033

2016-10-28 Thread Tom Pantelis
On Fri, Oct 28, 2016 at 10:00 AM, Robert Varga wrote:

> On 10/28/2016 08:00 AM, he.yu...@zte.com.cn wrote:
> > so we need to make the DataTreeCandidateTip.getTipRoot() API public
> > in yangtools to get the last tx's root
> >
> > None of the above is related to the commit phase; the overall process is
> > as follows:
> >
> >
> > Tx1.prepare() ---> Tx1.candidate
> >     Tx1 persist ReplicatedLogEntry
> >     Tx1 add to pipelineTransactions
> >     Tx2.prepare(Tx1.candidate) --> Tx2.candidate
>
> Actually the flow should be:
>
> TipProducingDataTree dataTree;
> DataTreeCandidateTip tx1Candidate = dataTree.prepare(tx1);
> persist tx1Candidate
> DataTreeCandidateTip tx2Candidate = tx1Candidate.prepare(tx2);
> persist tx2Candidate
> dataTree.commit(tx1Candidate);
> dataTree.commit(tx2Candidate);
>
> All of the accounting needed will occur inside the DataTree
> implementation without leaking tipRoot. The API has been explicitly
> designed for this use case -- it just has not been implemented yet,
> because appendAndPersist() is synchronous and the surrounding code
> assumes that once the candidate has been persisted, it has also been
> committed.
>

I'm not clear on the last statement. Which surrounding code are you
referring to? appendAndPersist doesn't directly commit the entry. After
persistence completes, it replicates and commit occurs on ApplyState once
consensus is achieved. For single-node of course the replicate part is
skipped and ApplyState is sent immediately.


>
> For this to work correctly in the face of failures and recovery, there need
> to be further persist events, and frontend replies need to be sent as
> follows:
>
> - once candidate persist completes, notify frontend of precommit success,
>   wait for commit message (or shortcut in directCommit case)
>

So you propose to couple persistence in the pre-commit phase. Sounds
reasonable.


> - once commit request arrives (or 3PC commit timer expires):
>

Isn't this where we would replicate?


> dataTree.commit()
>

This would occur on ApplyState?


> persist tx commit record (only identifier, no data)
>

Don't we already do this via ApplyJournalEntries?


> - once persist returns:
> notify frontend of commit success
>


> And on follower/recovery, the journal records need to be processed in a
> similar fashion: candidates should be committed to data tree only when
> commit record is seen. There can be multiple candidates for a
> transaction in the journal (due to failovers and similar), hence only
> the last candidate seen should be committed.
>
> Bye,
> Robert
>
>


Re: [controller-dev] Reply: Re: A explanation of https://bugs.opendaylight.org/show_bug.cgi?id=7033

2016-10-28 Thread Robert Varga
On 10/28/2016 08:00 AM, he.yu...@zte.com.cn wrote:
> so we need to make the DataTreeCandidateTip.getTipRoot() API public
> in yangtools to get the last tx's root
> 
> None of the above is related to the commit phase; the overall process is
> as follows:
> 
> 
> Tx1.prepare() ---> Tx1.candidate
>     Tx1 persist ReplicatedLogEntry
>     Tx1 add to pipelineTransactions
>     Tx2.prepare(Tx1.candidate) --> Tx2.candidate

Actually the flow should be:

TipProducingDataTree dataTree;
DataTreeCandidateTip tx1Candidate = dataTree.prepare(tx1);
persist tx1Candidate
DataTreeCandidateTip tx2Candidate = tx1Candidate.prepare(tx2);
persist tx2Candidate
dataTree.commit(tx1Candidate);
dataTree.commit(tx2Candidate);

All of the accounting needed will occur inside the DataTree
implementation without leaking tipRoot. The API has been explicitly
designed for this use case -- it just has not been implemented yet,
because appendAndPersist() is synchronous and the surrounding code
assumes that once the candidate has been persisted, it has also been
committed.
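
As a rough illustration of why the tip root does not need to leak, here is a
minimal sketch with a toy in-memory tree. ToyTree and its Candidate class are
hypothetical stand-ins, not the yangtools TipProducingDataTree or
DataTreeCandidateTip; a plain map plays the role of the tree root. Each
candidate remembers the state it was prepared against, so the tree itself can
check that commits arrive in pipeline order.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a data tree; NOT the yangtools API. */
final class ToyTree {
    private Map<String, String> root = new HashMap<>();

    /** A prepared but uncommitted state that later transactions can build on. */
    final class Candidate {
        private final Map<String, String> before; // state this was prepared against
        private final Map<String, String> after;  // state once the changes are applied

        private Candidate(Map<String, String> before, Map<String, String> changes) {
            this.before = before;
            this.after = new HashMap<>(before);
            this.after.putAll(changes);
        }

        /** Prepare the next transaction on top of this candidate (the "tip"). */
        Candidate prepare(Map<String, String> changes) {
            return new Candidate(after, changes);
        }
    }

    /** Prepare a transaction against the committed root. */
    Candidate prepare(Map<String, String> changes) {
        return new Candidate(root, changes);
    }

    /** Commit succeeds only if the candidate was prepared against the current root. */
    void commit(Candidate candidate) {
        if (candidate.before != root) {
            throw new IllegalStateException("candidate committed out of order");
        }
        root = candidate.after;
    }

    /** Defensive copy of the committed state. */
    Map<String, String> snapshot() {
        return new HashMap<>(root);
    }
}
```

The caller only ever holds opaque candidates: tx2 is prepared from tx1's
candidate, and committing tx1 then tx2 advances the root step by step, while
committing tx2 first is rejected.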

For this to work correctly in the face of failures and recovery, there need
to be further persist events, and frontend replies need to be sent as
follows:

- once candidate persist completes, notify frontend of precommit success,
  wait for commit message (or shortcut in directCommit case)
- once commit request arrives (or 3PC commit timer expires):
    dataTree.commit()
    persist tx commit record (only identifier, no data)
- once persist returns:
    notify frontend of commit success

And on follower/recovery, the journal records need to be processed in a
similar fashion: candidates should be committed to data tree only when
commit record is seen. There can be multiple candidates for a
transaction in the journal (due to failovers and similar), hence only
the last candidate seen should be committed.
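
The replay rule can be sketched as follows. This is a toy model under stated
assumptions: JournalReplaySketch and Entry are hypothetical names, the journal
is a plain list, and a string stands in for the candidate data. It is not the
controller's recovery code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy journal replay; hypothetical names, not the controller's recovery code. */
final class JournalReplaySketch {

    /** Hypothetical journal entry: data == null marks a commit record. */
    static final class Entry {
        final String txId;
        final String data;

        private Entry(String txId, String data) {
            this.txId = txId;
            this.data = data;
        }

        static Entry candidate(String txId, String data) {
            return new Entry(txId, data);
        }

        static Entry commit(String txId) {
            return new Entry(txId, null);
        }

        boolean isCommit() {
            return data == null;
        }
    }

    /** Returns the candidate data that would be applied to the tree, in commit order. */
    static List<String> replay(List<Entry> journal) {
        Map<String, String> pending = new HashMap<>();
        List<String> applied = new ArrayList<>();
        for (Entry entry : journal) {
            if (!entry.isCommit()) {
                // A later candidate for the same transaction replaces the earlier one.
                pending.put(entry.txId, entry.data);
            } else {
                // Apply only when the commit record is seen.
                String data = pending.remove(entry.txId);
                if (data != null) {
                    applied.add(data);
                }
            }
        }
        return applied;
    }
}
```

Candidates without a matching commit record stay in the pending map and are
never applied, which is the behaviour wanted for a transaction that was
precommitted but not committed before a failover.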

Bye,
Robert


