[389-devel] Re: Replication after full online init

2016-07-06 Thread William Brown

On Wed, 2016-07-06 at 17:38 +0200, Ludwig Krispenz wrote:
> Hi Noriko,
> 
> I have a test scenario for modrdns that are not handled correctly during total init:
> 
> - have a database with n entries, n large enough that total init takes
> long enough to be able to apply an update while it is running
> - add two entries
> -- n+1: cn=child,$SUFFIX
> -- n+2: cn=parent,$SUFFIX
> both have the same parentid and n+2 will be replayed after n+1
> - start total update
> - do a modrdn
> cn=child,$SUFFIX
> changetype: modrdn
> newrdn: cn=child
> newsuperior: cn=parent,$SUFFIX
> 
> now cn=child,cn=parent,$SUFFIX will be sent before its parent
> 
> I do not know how we can fix this, I think it is a corner case, but we 
> should keep it in mind
> 
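
The scenario above can be sketched as a small Python simulation. This is an illustration only, not 389-ds code: the `Entry` class, the `send_total_init` helper, the `dc=example,dc=com` suffix and the sort key are all assumptions made for the example. The ID list is computed once before sending, while each entry is read live at send time, so a modrdn that lands in between leaves the child ahead of its new parent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    entry_id: int
    dn: str
    parent_id: Optional[int]  # None for the suffix entry


def send_total_init(db: Dict[int, Entry], id_list: List[int]) -> List[str]:
    """Send entries in the pre-computed order.

    Returns the DNs that arrived before their parent (hypothetical
    consumer-side check, for illustration only).
    """
    received = set()
    orphans = []
    for entry_id in id_list:
        entry = db[entry_id]  # read live, so later updates are visible
        if entry.parent_id is not None and entry.parent_id not in received:
            orphans.append(entry.dn)
            continue
        received.add(entry_id)
    return orphans


suffix = "dc=example,dc=com"
db = {
    1: Entry(1, suffix, None),
    2: Entry(2, f"cn=child,{suffix}", 1),   # entry n+1
    3: Entry(3, f"cn=parent,{suffix}", 1),  # entry n+2
}

# The ID list is prepared at the start of the total update
# (assumed order: by parent ID, then by entry ID).
id_list = sorted(db, key=lambda i: (db[i].parent_id or 0, i))

# The modrdn happens after the list was prepared, before the child is sent.
db[2] = Entry(2, f"cn=child,cn=parent,{suffix}", 3)

orphans = send_total_init(db, id_list)
print(orphans)
```

In this sketch the moved child is the only entry reported as arriving without its parent.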

Wow, excellent find!

What is the actual cause of this? It sounds like a race condition
between the sort and the transmission of the entries.

-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Brisbane


--
389-devel mailing list
389-devel@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/389-devel@lists.fedoraproject.org

[389-devel] Re: Replication after full online init

2016-07-06 Thread Noriko Hosoi

Thank you, Ludwig. I'll run the scenario and investigate it.
--noriko

On 07/06/2016 08:38 AM, Ludwig Krispenz wrote:

Hi Noriko,

I have a test scenario for modrdns that are not handled correctly during total init:

- have a database with n entries, n large enough that total init
takes long enough to be able to apply an update while it is running

- add two entries
-- n+1: cn=child,$SUFFIX
-- n+2: cn=parent,$SUFFIX
both have the same parentid and n+2 will be replayed after n+1
- start total update
- do a modrdn
cn=child,$SUFFIX
changetype: modrdn
newrdn: cn=child
newsuperior: cn=parent,$SUFFIX

now cn=child,cn=parent,$SUFFIX will be sent before its parent

I do not know how we can fix this, I think it is a corner case, but we
should keep it in mind

Ludwig

On 06/30/2016 11:53 PM, Noriko Hosoi wrote:

On 06/30/2016 12:45 AM, Ludwig Krispenz wrote:

Hi William,

The test case would be something like this?
1. run online init on the supplier.
2. do some operation like move entries against the supplier while the
online init is still running on the consumer.
3. do some operation which depends upon the previous operation done
in the step 2.

4. check the consumer is healthy or not.

Thanks,
--noriko
Therefore the update resolution/entry state resolution on the consumer
side has to handle this: ignore changes already applied and apply new
changes. And it handles it; if there are bugs they have to be fixed.
Now, I am no longer sure if the fix for 48755 correctly handles all
modrdns received after the id list was prepared; the parentid might
change while the total init is in progress.
This brings up my original suggestion to handle the modrdn problems
also on the consumer side.

Ludwig

On 06/30/2016 02:34 AM, William Brown wrote:

Hi,

Now that https://fedorahosted.org/389/ticket/48755 is merged, I would
like to discuss the way we handle CSN with relation to this master. As
I'm not an expert on this topic, I want to get the input of everyone
about this.

Following this document:
http://www.port389.org/docs/389ds/design/changelog-processing-in-repl-state-sending-updates.html

As I understand it, after a full online init, the replica that consumed
the changes does not set its CSN to match the CSN of the master that
sent the changes.

As a result, after the online init, this causes a number of changes to
be replicated from the sending master to the consumer. These are ignored
by the URP, and we continue.

However, in a number of cases these are *not* ignored, and have caused
us some bugs in replication in the past. We also have some failing
changes that are skipped, which could in certain circumstances lead to
inconsistency in replicas. We have gone to a lot of effort to be able to
skip changes, to handle the case above.

The reason was that if a modrdn was performed, and the entry ID
of the entry that was moved was less than the new parent ID, this *had*
to be skipped, so that after the online init the modrdn change was
replayed and applied to the consumer.

Since 48755, which sorts based on the parent ID, this seems to no longer
be an issue. So we don't need to have the master replay its changelog
out to the consumer, because the consumer is now a literal clone of the
data.
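
To make the ordering argument concrete, here is a minimal Python sketch of a parent-first send order. It is not the implementation from ticket 48755; the `parents_first` function and the entry IDs are hypothetical and only show why an order in which every entry follows its parent removes the need to skip the moved entry. It assumes the entries form a tree, so there are no cycles.

```python
from typing import Dict, List, Optional


def parents_first(parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Return entry IDs ordered so that each entry follows its parent.

    parent_of maps an entry ID to its parent ID (None for the suffix).
    Assumes a tree, i.e. no cycles.
    """
    ordered: List[int] = []
    placed = set()

    def place(entry_id: int) -> None:
        if entry_id in placed:
            return
        parent = parent_of[entry_id]
        if parent is not None:
            place(parent)  # make sure the parent is emitted first
        placed.add(entry_id)
        ordered.append(entry_id)

    for entry_id in sorted(parent_of):
        place(entry_id)
    return ordered


# Suffix has ID 1, A has ID 2, ou=B has ID 3.
# After the modrdn, A (ID 2) lives under ou=B (ID 3).
parent_of = {1: None, 2: 3, 3: 1}

plain_id_order = sorted(parent_of)       # A would be sent before ou=B
safe_order = parents_first(parent_of)    # ou=B is sent before A
print(plain_id_order, safe_order)
```

With plain ID order, A (ID 2) is sent before its new parent ou=B (ID 3); with the parent-first order, ou=B goes first.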

So, is there a reason for us to leave the CSN of the consumer low to
allow this replay to occur? Or can we alter the behaviour of the
consumer to set its CSN to the CSN of the sending master, so that we
don't need to replay these changes?

--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric
Shander

[389-devel] Re: Replication after full online init

2016-07-06 Thread Ludwig Krispenz

Hi Noriko,

I have a test scenario for modrdns that are not handled correctly during total init:

- have a database with n entries, n large enough that total init takes
long enough to be able to apply an update while it is running

now cn=child,cn=parent,$SUFFIX will be sent before its parent

I do not know how we can fix this, I think it is a corner case, but we
should keep it in mind

Ludwig

On 06/30/2016 11:53 PM, Noriko Hosoi wrote:

On 06/30/2016 12:45 AM, Ludwig Krispenz wrote:

Hi William,

4. check the consumer is healthy or not.

Thanks,
--noriko
Therefore the update resolution/entry state resolution on the consumer
side has to handle this: ignore changes already applied and apply new
changes. And it handles it; if there are bugs they have to be fixed.
Now, I am no longer sure if the fix for 48755 correctly handles all
modrdns received after the id list was prepared; the parentid might
change while the total init is in progress.
This brings up my original suggestion to handle the modrdn problems
also on the consumer side.

Ludwig

On 06/30/2016 02:34 AM, William Brown wrote:

Hi,

Following this document:
http://www.port389.org/docs/389ds/design/changelog-processing-in-repl-state-sending-updates.html

As I understand it, after a full online init, the replica that consumed
the changes does not set its CSN to match the CSN of the master that
sent the changes.

As a result, after the online init, this causes a number of changes to
be replicated from the sending master to the consumer. These are ignored
by the URP, and we continue.

[389-devel] Re: Replication after full online init

2016-07-04 Thread William Brown

On Mon, 2016-07-04 at 09:37 +0200, Ludwig Krispenz wrote:
> On 07/04/2016 01:32 AM, William Brown wrote:
> >>> It's not the "post init" operations I'm worried about.
> >>>
> >>> It's that operations that were part of the init to the consumer are
> >>> replayed from the changelog.
> >>>
> >>> Operations that occurred after the init starts, definitely still need to
> >>> be replayed, and this makes sense.
> >>>
> >>> Lets say we have:
> >>>
> >>> 1 - insert A
> >>> 2 - insert ou=B
> >>> 3 - modrdn A under ou=B
> >>> 4 - insert C
> >>> xx <<-- We start to transmit the data here.
> >> if we start the total update here, the supplier will send its RUV in the
> >> start repl request, it will be set as RUV in the consumer after total
> >> init is complete.
> >> it skips to send the ruv entry
> > Are you sure? The behaviour that people are claiming to see would
> > contradict this behaviour.
> yes. As I said, with this behaviour and with the fix for 48755 there is 
> still a potential error if the modrdn is done while the online init is 
> in progress. So we would have to make the "people claim" more precise 
> and investigate

The issue is not with operation 5 post the init (it's just put in the
changelog awaiting replication.) The issue is with operation 3 being
sent *but not applied* during online init. At least, this was the
*previous* behaviour of the server prior to 48755.

So, in the previous behaviour of the server, with 3 being skipped, how
was the modrdn applied? You are claiming the RUV of the consumer is set
to that of the supplier which would mean that operation 3 would *never*
be applied to the consumer, causing inconsistency.

Prior to 48755, the only way to send 3, was to set the RUV of the
consumer to some low value, ie start of the changelog. This way, the
changelog would be replayed as a whole. 

I seem to remember Mark fixing a bug in URP earlier this year, related
to this topic. Because the consumer RUV was set to an earlier CSN, the
modrdn was being replayed. In the case where the entries *were* in order
and the modrdn could be applied during the init, the URP then failed
when it tried to apply the modrdn a second time.
The fix I think Mark applied was just to skip the failing update. This
bug could only have existed *because* of the consumer having its RUV
set to a low CSN, and after the init, having the CL replayed.

Given I don't know my way around the repl code very well, can you point
me to the location where the consumer ruv is updated as part of the
replication total init? 

Thanks,

-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Brisbane

[389-devel] Re: Replication after full online init

2016-07-04 Thread Ludwig Krispenz



On 07/04/2016 01:32 AM, William Brown wrote:

It's not the "post init" operations I'm worried about.

It's that operations that were part of the init to the consumer are
replayed from the changelog.

Operations that occurred after the init starts, definitely still need to
be replayed, and this makes sense.

Lets say we have:

1 - insert A
2 - insert ou=B
3 - modrdn A under ou=B
4 - insert C
xx <<-- We start to transmit the data here.

if we start the total update here, the supplier will send its RUV in the
start repl request, it will be set as RUV in the consumer after total
init is complete.
it skips to send the ruv entry

Are you sure? The behaviour that people are claiming to see would
contradict this behaviour.
yes. As I said, with this behaviour and with the fix for 48755 there is 
still a potential error if the modrdn is done while the online init is 
in progress. So we would have to make the "people claim" more precise 
and investigate

Certainly there have been a number of fixes
in URP related to replaying modrdns and related changes after an online
init.



Does that make sense?

yes, and I think that is what it is doing now

I don't think it is 
so what do you think the RUV of the consumer is after an online init ? 
it has to be set somehow and it is not random.




[389-devel] Re: Replication after full online init

2016-07-03 Thread William Brown


> >>
> > It's not the "post init" operations I'm worried about.
> >
> > It's that operations that were part of the init to the consumer are
> > replayed from the changelog.
> >
> > Operations that occurred after the init starts, definitely still need to
> > be replayed, and this makes sense.
> >
> > Lets say we have:
> >
> > 1 - insert A
> > 2 - insert ou=B
> > 3 - modrdn A under ou=B
> > 4 - insert C
> > xx <<-- We start to transmit the data here.
> if we start the total update here, the supplier will send its RUV in the 
> start repl request, it will be set as RUV in the consumer after total 
> init is complete.
> it skips to send the ruv entry

Are you sure? The behaviour that people are claiming to see would
contradict this behaviour. Certainly there have been a number of fixes
in URP related to replaying modrdns and related changes after an online
init.


> >
> > Does that make sense?
> yes, and I think that is what it is doing now

I don't think it is  

-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Brisbane


[389-devel] Re: Replication after full online init

2016-07-01 Thread Ludwig Krispenz

On 06/30/2016 11:53 PM, Noriko Hosoi wrote:

On 06/30/2016 12:45 AM, Ludwig Krispenz wrote:

Hi William,

the reason that after a total init the consumer does not have the
latest state of the supplier RUV and is receiving updates based on
the RUV at start of the total init is independent of the modrdn
problem. When a supplier is performing a total init it is still
accepting changes, the total init can take a while and there are
scenarios where an entry which is already sent is updated before
total init finishes. We cannot lose these changes.
OK... Then, RUV needs to be created at the time when the supplier
starts online init?
it basically is done like that. We could explicitly send the ruv as the
first entry of an online init; it would then reflect the state at the
start of the online init. Instead we send the supplier ruv as usual in
the start repl session, and the consumer stores it in the connection
extension (connext) and at the end of the online init creates the ruv
from this. The effect is the same.

4. check the consumer is healthy or not.

yes, that would have to be tested

Isn't it a timestamp issue from which operation should be replayed
after the total update? Regardless of the way how to fix 48755,
unless the step 2 operation(s) are replayed after the online init is
done, the consumer could get broken/inconsistent?
step 2 ops will be replicated, they have csns > csn(ruv), but I am
concerned that we could moddn an entry, which is not yet sent to a
newsuperior which also was not yet sent - and we have the same scenario
you wanted to fix with 48755

Thanks,
--noriko
Therefore the update resolution/entry state resolution on the consumer
side has to handle this: ignore changes already applied and apply new
changes. And it handles it; if there are bugs they have to be fixed.
Now, I am no longer sure if the fix for 48755 correctly handles all
modrdns received after the id list was prepared; the parentid might
change while the total init is in progress.
This brings up my original suggestion to handle the modrdn problems
also on the consumer side.

Ludwig

On 06/30/2016 02:34 AM, William Brown wrote:

Hi,

Following this document:
http://www.port389.org/docs/389ds/design/changelog-processing-in-repl-state-sending-updates.html

As I understand it, after a full online init, the replica that consumed
the changes does not set its CSN to match the CSN of the master that
sent the changes.

As a result, after the online init, this causes a number of changes to
be replicated from the sending master to the consumer. These are ignored
by the URP, and we continue.

[389-devel] Re: Replication after full online init

2016-07-01 Thread Ludwig Krispenz



On 07/01/2016 04:08 AM, William Brown wrote:

On Thu, 2016-06-30 at 14:53 -0700, Noriko Hosoi wrote:

On 06/30/2016 12:45 AM, Ludwig Krispenz wrote:

Hi William,

the reason that after  a total init the consumer does not have the
latest state of the supplier RUV and is receiving updates based on the
RUV at start of the total init is independent of the modrdn problem.
When a supplier is performing a total init it is still accepting
changes, the total init can take a while and there are scenarios where
an entry which is already sent is updated before total init finishes.
We cannot lose these changes.

OK...  Then, RUV needs to be created at the time when the supplier
starts online init?

The test case would be something like this?
1. run online init on the supplier.
2. do some operation like move entries against the supplier while the
online init is still running on the consumer.
3. do some operation which depends upon the previous operation done in
the step 2.
4. check the consumer is healthy or not.

Isn't it a timestamp issue from which operation should be replayed after
the total update?  Regardless of the way how to fix 48755, unless the
step 2 operation(s) are replayed after the online init is done, the
consumer could get broken/inconsistent?


It's not the "post init" operations I'm worried about.

It's that operations that were part of the init to the consumer are
replayed from the changelog.

Operations that occurred after the init starts, definitely still need to
be replayed, and this makes sense.

Lets say we have:

1 - insert A
2 - insert ou=B
3 - modrdn A under ou=B
4 - insert C
xx <<-- We start to transmit the data here.
if we start the total update here, the supplier will send its RUV in the 
start repl request, it will be set as RUV in the consumer after total 
init is complete.

it skips to send the ruv entry

so mods 1-4 will not be replayed

5 - modrdn C


Once the online init is complete, the master replays the log from event
1 -> 5 to the consumer, even though it should now be up to date at
position 4.

it is


Previously we could not guarantee this because in the scenario above, A
would have sorted before ou=B, but would not be able to be applied
because the consumer hadn't seen B yet. So after the init, the consumer
would have B and C, but not A, so we had to replay 1 -> 4 to fix this
up.

So I am suggesting that when we begin the online init we set the RUV of
the consumer to match the CSN of the master at the moment we begin the
transmission of data, so that we only need to replay event 5+, rather
than 1->5+

Does that make sense?

yes, and I think that is what it is doing now





[389-devel] Re: Replication after full online init

2016-06-30 Thread William Brown

On Thu, 2016-06-30 at 14:53 -0700, Noriko Hosoi wrote:
> On 06/30/2016 12:45 AM, Ludwig Krispenz wrote:
> > Hi William,
> >
> > the reason that after  a total init the consumer does not have the 
> > latest state of the supplier RUV and is receiving updates based on the 
> > RUV at start of the total init is independent of the modrdn problem. 
> > When a supplier is performing a total init it is still accepting 
> > changes, the total init can take a while and there are scenarios where 
> > an entry which is already sent is updated before total init finishes. 
> > We cannot lose these changes.
> OK...  Then, RUV needs to be created at the time when the supplier 
> starts online init?
> 
> The test case would be something like this?
> 1. run online init on the supplier.
> 2. do some operation like move entries against the supplier while the 
> online init is still running on the consumer.
> 3. do some operation which depends upon the previous operation done in 
> the step 2.
> 4. check the consumer is healthy or not.
> 
> Isn't it a timestamp issue from which operation should be replayed after 
> the total update?  Regardless of the way how to fix 48755, unless the 
> step 2 operation(s) are replayed after the online init is done, the 
> consumer could get broken/inconsistent?
> 

It's not the "post init" operations I'm worried about.

It's that operations that were part of the init to the consumer are
replayed from the changelog. 

Operations that occurred after the init starts, definitely still need to
be replayed, and this makes sense.

Lets say we have:

1 - insert A
2 - insert ou=B
3 - modrdn A under ou=B
4 - insert C
xx <<-- We start to transmit the data here.
5 - modrdn C


Once the online init is complete, the master replays the log from event
1 -> 5 to the consumer, even though it should now be up to date at
position 4.

Previously we could not guarantee this because in the scenario above, A
would have sorted before ou=B, but would not be able to be applied
because the consumer hadn't seen B yet. So after the init, the consumer
would have B and C, but not A, so we had to replay 1 -> 4 to fix this
up.

So I am suggesting that when we begin the online init we set the RUV of
the consumer to match the CSN of the master at the moment we begin the
transmission of data, so that we only need to replay event 5+, rather
than 1->5+
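
As a rough illustration of the difference between the two behaviours, the following Python sketch filters a changelog by the consumer's starting point. It is a simplification and not 389-ds code: real CSNs are structured values and the RUV tracks them per replica, whereas plain integers for a single master are assumed here, and the `changes_to_replay` helper is hypothetical.

```python
from typing import List, Tuple

# (csn, operation); integers stand in for real CSNs in this sketch.
Change = Tuple[int, str]


def changes_to_replay(changelog: List[Change], consumer_max_csn: int) -> List[Change]:
    """Return the changes the supplier still has to send to the consumer."""
    return [change for change in changelog if change[0] > consumer_max_csn]


changelog = [
    (1, "insert A"),
    (2, "insert ou=B"),
    (3, "modrdn A under ou=B"),
    (4, "insert C"),
    # the total init starts transmitting here
    (5, "modrdn C"),
]

# Consumer RUV left at a low value: the whole changelog is replayed.
replay_from_start = changes_to_replay(changelog, 0)

# Consumer RUV set to the supplier's state at the start of the init.
replay_from_init = changes_to_replay(changelog, 4)

print(len(replay_from_start), replay_from_init)
```

With the low starting point all five events are replayed; with the RUV taken at the start of the init only event 5 remains.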

Does that make sense? 


-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Brisbane


[389-devel] Re: Replication after full online init

2016-06-30 Thread Noriko Hosoi

On 06/30/2016 12:45 AM, Ludwig Krispenz wrote:

Hi William,

4. check the consumer is healthy or not.

Thanks,
--noriko
Therefore the update resolution/entry state resolution on the consumer
side has to handle this: ignore changes already applied and apply new
changes. And it handles it; if there are bugs they have to be fixed.
Now, I am no longer sure if the fix for 48755 correctly handles all
modrdns received after the id list was prepared; the parentid might
change while the total init is in progress.
This brings up my original suggestion to handle the modrdn problems
also on the consumer side.

Ludwig

On 06/30/2016 02:34 AM, William Brown wrote:

Hi,

Following this document:
http://www.port389.org/docs/389ds/design/changelog-processing-in-repl-state-sending-updates.html

As I understand it, after a full online init, the replica that consumed
the changes does not set its CSN to match the CSN of the master that
sent the changes.

As a result, after the online init, this causes a number of changes to
be replicated from the sending master to the consumer. These are ignored
by the URP, and we continue.

[389-devel] Re: Replication after full online init

2016-06-30 Thread Ludwig Krispenz

Hi William,

the reason that after a total init the consumer does not have the
latest state of the supplier RUV and is receiving updates based on the
RUV at start of the total init is independent of the modrdn problem.
When a supplier is performing a total init it is still accepting
changes, the total init can take a while and there are scenarios where
an entry which is already sent is updated before total init finishes. We
cannot lose these changes.
Therefore the update resolution/entry state resolution on the consumer
side has to handle this: ignore changes already applied and apply new
changes. And it handles it; if there are bugs they have to be fixed.
Now, I am no longer sure if the fix for 48755 correctly handles all
modrdns received after the id list was prepared; the parentid might
change while the total init is in progress.
This brings up my original suggestion to handle the modrdn problems also
on the consumer side.
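
The idea of ignoring changes already applied and applying new ones can be sketched roughly as below. This is a heavily simplified Python illustration, not the actual update resolution code: the `ConsumerEntry` class, its per-attribute CSN bookkeeping and the integer CSNs are assumptions made for the example.

```python
from typing import Dict


class ConsumerEntry:
    """Toy entry that remembers the CSN of the last change per attribute."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}
        self.attr_csn: Dict[str, int] = {}

    def apply_replace(self, attr: str, value: str, csn: int) -> bool:
        """Apply a replace only if it is newer than what the entry has seen."""
        if csn <= self.attr_csn.get(attr, 0):
            return False  # already reflected in the entry state, ignore
        self.values[attr] = value
        self.attr_csn[attr] = csn
        return True


# The entry arrives through total init already carrying the change with CSN 7.
entry = ConsumerEntry()
entry.apply_replace("mail", "new@example.com", 7)

# After the init the supplier replays changes based on the older RUV.
replayed_old = entry.apply_replace("mail", "old@example.com", 5)   # ignored
replayed_same = entry.apply_replace("mail", "new@example.com", 7)  # ignored
replayed_new = entry.apply_replace("mail", "newer@example.com", 9)  # applied

print(replayed_old, replayed_same, replayed_new, entry.values["mail"])
```

Changes with CSN 5 and 7 are dropped because the entry state already covers them, while the change with CSN 9 is applied.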

Ludwig

On 06/30/2016 02:34 AM, William Brown wrote:

Hi,

Now that https://fedorahosted.org/389/ticket/48755 is merged, I would
like to discuss the way we handle CSN with relation to this master. As
I'm not an expert on this topic, I want to get the input of everyone
about this.

Following this document:
http://www.port389.org/docs/389ds/design/changelog-processing-in-repl-state-sending-updates.html

As I understand it, after a full online init, the replica that consumed
the changes does not set its CSN to match the CSN of the master that
sent the changes.

As a result, after the online init, this causes a number of changes to
be replicated from the sending master to the consumer. These are ignored
by the URP, and we continue.
