[389-users] Re: 389DS v1.3.4.x after fixes for tickets 48766 and 48954

2016-09-07 Thread Ivanov Andrey (M.)
> De: "Ludwig Krispenz" 
> À: 389-users@lists.fedoraproject.org
> Envoyé: Mercredi 7 Septembre 2016 12:48:38
> Objet: [389-users] Re: 389DS v1.3.4.x after fixes for tickets 48766 and 48954

>>>> the fixes for the tickets you mention did change the iteration thru the
>>>> changelog and how it handles situations when the start csn is not found
>>>> in the changelog. and it also did change the logging, so you might see
>>>> messages now which were not there or hidden before.
>>> That was my understanding too.

>> so far I have not seen any replication problems related to these messages,
>> all generated csns seem to be replicated. What makes it a bit more difficult
>> is that most of the updates are updates of lastlogintime and the original
>> MOD is not logged. I still do not understand why we have these messages so
>> frequently, I will try to reproduce.
>> Or, if it is possible, could you run the servers for just an hour with
>> replication logging enabled?

> no more need for this, I found the messages in a deployment where repl logging
> was enabled. I think it happens when the smallest consumer maxCSN is ahead of
> the local maxCSN for this replicaID.
> It should do no harm, but in some scenarios could slow down replication a bit.
> I will continue to investigate and work on a fix.
Ok, thank you. And yes, as you say, apparently it does no harm - I check the
consistency of the three replicated servers from time to time and there is no
data discrepancy between them.
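
One way to compare the replica state across the servers is to read the RUV
tombstone on each one and compare the per-replica maxCSNs; a minimal sketch,
where the three host names are placeholders for our servers:

for h in ldap1 ldap2 ldap3; do
  echo "== $h =="
  # the RUV lives in a special tombstone entry under the replicated suffix;
  # -W prompts for the password on each server
  ldapsearch -o ldif-wrap=no -H "ldaps://$h:636" \
    -D "cn=Directory Manager" -W \
    -b "dc=id,dc=polytechnique,dc=edu" \
    "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" \
    nsds50ruv
done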

Anyway, enabling replication logging on production servers is not something
easily done, mainly for performance reasons. And I was not able to reproduce
the problem in our test environment with 2 replicated servers; maybe the load
or frequency of connections updating the lastlogintime attribute was not high
enough in the test environment. Or the three-server fully-replicated topology
makes things a bit different too, with one or two additional hops for the same
mod arriving at the consumer by two different paths.
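
For reference, replication logging is toggled by setting the error log level
on cn=config (8192 is the replication debugging level; setting it back to 0
restores the default). A minimal example against a local server:

# enable replication debug logging; expect a noticeable performance hit
ldapmodify -H ldap://localhost -D "cn=Directory Manager" -W <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
EOF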

>> When looking into the provided data set I did notice three replicated ops
>> with err=50, insufficient access. This should not happen and requires a
>> separate investigation.
Yes, I see the three modifications you are talking about. They are present only
on one of the three servers. Strange indeed. No more err=50 in replicated ops
today on any of the servers, I've just checked.
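
A quick way to spot them is to grep the access logs for err=50 results that
also carry a csn (RESULT lines for replicated operations include one); the log
path below assumes our slapd-ens instance layout and may differ:

grep 'err=50' /Local/dirsrv/var/log/dirsrv/slapd-ens/access* | grep 'csn='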


[389-users] Re: 389DS v1.3.4.x after fixes for tickets 48766 and 48954

2016-09-07 Thread Ludwig Krispenz


On 09/07/2016 08:55 AM, Ludwig Krispenz wrote:


On 09/06/2016 02:02 PM, Ivanov Andrey (M.) wrote:

Hi Ludwig,

the fixes for the tickets you mention did change the iteration
thru the changelog and how it handles situations when the start
csn is not found in the changelog. and it also did change the
logging, so you might see messages now which were not there or
hidden before.


That was my understanding too.
so far I have not seen any replication problems related to these
messages, all generated csns seem to be replicated. What makes it a bit
more difficult is that most of the updates are updates of
lastlogintime and the original MOD is not logged. I still do not
understand why we have these messages so frequently, I will try to
reproduce.
Or, if it is possible, could you run the servers for just an hour with
replication logging enabled?
no more need for this, I found the messages in a deployment where repl 
logging was enabled. I think it happens when the smallest consumer 
maxCSN is ahead of the local maxCSN for this replicaID.
It should do no harm, but in some scenarios could slow down replication 
a bit.

I will continue to investigate and work on a fix.


When looking into the provided data set I did notice three replicated 
ops with err=50, insufficient access. This should not happen and 
requires a separate investigation.



But I am very surprised to see them so frequently and I would
like to understand it.
First some questions: do you have changelog trimming enabled
(and how), and do you have fractional replication?

Yes for both questions.

Trimming: 14 days
Fractional replication:
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname


Changelog:
cn=changelog5,cn=config
objectClass: top
objectClass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /Local/dirsrv/var/lib/dirsrv/slapd-ens/changelogdb
nsslapd-changelogmaxage: 14d
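
If it ever needs adjusting, the trimming age can be changed at runtime on this
same entry; a minimal example, assuming a local bind as Directory Manager:

ldapmodify -H ldap://localhost -D "cn=Directory Manager" -W <<EOF
dn: cn=changelog5,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: 14d
EOF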


replica:
cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping 
tree,cn=config

objectClass: top
objectClass: nsDS5Replica
cn: replica
nsDS5ReplicaId: 1
nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
nsDS5Flags: 1
nsDS5ReplicaBindDN: cn=RepliX,cn=config
nsds5ReplicaPurgeDelay: 604800
nsds5ReplicaTombstonePurgeInterval: 86400
nsds5ReplicaLegacyConsumer: False
nsDS5ReplicaType: 3
nsState:: AQDCrc5XAQABAA==
nsDS5ReplicaName: eeb6d304-736c11e6-9bc5a1ff-40280b8e
nsds5ReplicaChangeCount: 114948
nsds5replicareapactive: 0


Typical replication agreement:

cn=Replication from ldap-lab. to ldap-adm.,cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping
tree,cn=config

objectClass: top
objectClass: nsDS5ReplicationAgreement
cn: Replication from ldap-lab. to ldap-adm.
description: Replication agreement from server ldap-lab. 
to server ldap-adm.

nsDS5ReplicaHost: ldap-adm.
nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
nsDS5ReplicaPort: 636
nsDS5ReplicaTransportInfo: SSL
nsDS5ReplicaBindDN: cn=RepliX,cn=config
nsDS5ReplicaBindMethod: simple
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname

nsds5replicaBusyWaitTime: 5
nsds5ReplicaFlowControlPause: 500
nsds5ReplicaFlowControlWindow: 1000
nsds5replicaTimeout: 120
nsDS5ReplicaCredentials: {AES-...
nsds50ruv: {replicageneration} 57cd73770002
nsds50ruv: {replica 2 ldap://ldap-adm.:389}
nsruvReplicaLastModified: {replica 2 ldap://ldap-adm.:389}

nsds5replicareapactive: 0
nsds5replicaLastUpdateStart: 20160906115520Z
nsds5replicaLastUpdateEnd: 20160906115520Z
nsds5replicaChangesSentSinceStartup: 3:13525/670 1:3671/0 2:1/0
nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: 
Incremental update succeeded

nsds5replicaUpdateInProgress: FALSE
nsds5replicaLastInitStart: 19700101000000Z
nsds5replicaLastInitEnd: 19700101000000Z
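
These last-update status attributes are convenient to poll when checking
agreement health; a minimal query, assuming a local bind as Directory Manager:

ldapsearch -o ldif-wrap=no -H ldap://localhost -D "cn=Directory Manager" -W \
  -b "cn=mapping tree,cn=config" "(objectclass=nsDS5ReplicationAgreement)" \
  nsds5replicaLastUpdateStatus nsds5replicaUpdateInProgress \
  nsds5replicaLastUpdateStart nsds5replicaLastUpdateEnd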



Next, is it possible to get the access and error logs for a
period of an hour from all servers (you can send them off list)?
I would like to track some of the reported csns.

Sure, I will send them to you off list in a moment.

Thank you,

Regards,
Andrey



Regards,
Ludwig


On 09/06/2016 12:31 PM, Ivanov Andrey (M.) wrote:

Hi,

We have been successfully using the compiled 1.3.4 git branch of
389DS in production on CentOS 7 for about a year
(approximately 40 000 entries, about 4000 groups, hundreds of
reads and tens of writes per second).
Our current topology consists of 3 servers