Re: [389-users] Serious write-performance problems on RHEL6 - CoS cache repeatedly rebuilding?

thierry bordaz Tue, 01 Apr 2014 10:37:47 -0700

On 04/01/2014 07:04 PM, Steve Holden wrote:

It seems like CoS is at fault here.


The CoS rules are working correctly, but they massively impact directory update 
performance.
If I delete all of the 15 rules, my simple LDIF update takes milliseconds.
If I re-add them (and wait for re-indexing), it takes 2m30s!

Anyone familiar with CoS got any suggestions?

Thanks,
Steve

-----Original Message-----
From: Steve Holden
Sent: 01 April 2014 14:49
To: 389-users@lists.fedoraproject.org
Subject: RE: [389-users] Serious write-performance problems on RHEL6 - CoS 
cache repeatedly rebuilding?

Just what I was thinking!

CoS is clearly the problem: disabling the CoS plugin (and the Legacy and 
Multimaster Replication plugins which depend on it) fixed the issue. Import the 
same LDIF now takes 0.22s ;-)

Re-enabling the plugins, and adding logging shows that the CoS cache is 
constantly rebuilding...

[01/Apr/2014:12:33:15 +0100] - Processing cosDefinition 
cn=cosRegistryDeptCode,ou=People,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:12:33:15 +0100] - cos_cache_add_attr: Added attribute 
uobRegistryDeptCode
[01/Apr/2014:12:33:15 +0100] - cos_cache_add_tmpl: Added template 
cn=a0,ou=Departments,ou=Tables,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:12:33:15 +0100] - cos_cache_add_attr: Added attribute 
uobRegistryDeptCode
[01/Apr/2014:12:33:15 +0100] - cos_cache_add_tmpl: Added template 
cn=a1,ou=Departments,ou=Tables,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:12:33:15 +0100] - cos_cache_add_dn_tmpls: incomplete cos template 
detected in cn=c0,ou=Departments,ou=Tables,dc=brighton,dc=ac,dc=uk, discarding 
from cache.

After this completed, I imported the LDIF again, which should not have affected 
any CoS attributes (just the 'title' one), but which nonetheless triggered a 
CoS change and apparently a refresh of the entire CoS cache!

[01/Apr/2014:13:38:06 +0100] - cos_cache_add_tmpl: Added template 
cn=1408,ou=Courses,ou=Tables,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:13:38:06 +0100] - cos_cache_add_attr: Added attribute 
uobCourseDescription
[01/Apr/2014:13:38:06 +0100] - cos_cache_change_notify:updating due to indirect 
template change(uid=st8,ou=people,dc=brighton,dc=ac,dc=uk)

Hello,

The message "cos_cache_change_notify:updating due to indirect templatechange(uid=st8,ou=people,dc=brighton,dc=ac,dc=uk)" means that the coscache will be rebuilt.

A possible cause is an update ofuid=st8,ou=people,dc=brighton,dc=ac,dc=uk is related to an indirect cosspecifier. That trigger a rebuild of the cos cache.


Would it be possible to get the cos definitions.

Also, if you enable audit log it can help to identify the modifiedattribute in the entry.


regards
thierry


The test account being updated doesn't have a course attribute which would need 
the CoS rule above.

The cache appears to being continually rebuilt...

[01/Apr/2014:13:51:30 +0100] - cos: Class of service cache built.
[01/Apr/2014:13:51:30 +0100] - cos: cos cache index built
[01/Apr/2014:13:51:30 +0100] - cos: Building class of service cache after 
status change.
[01/Apr/2014:13:53:28 +0100] - cos: Class of service cache built.
[01/Apr/2014:13:53:28 +0100] - cos: cos cache index built
[01/Apr/2014:13:53:28 +0100] - cos: Building class of service cache after 
status change.
[01/Apr/2014:13:55:38 +0100] - cos: Class of service cache built.
[01/Apr/2014:13:55:38 +0100] - cos: cos cache index built

[01/Apr/2014:14:49:02 +0100] - Processing cosDefinition 
cn=cosDeptDescription,ou=People,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:14:49:06 +0100] - Added cosDefinition 
cn=cosDeptDescription,ou=People,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:14:49:06 +0100] - Processing cosDefinition 
cn=cosTownName,ou=People,dc=brighton,dc=ac,dc=uk
[01/Apr/2014:14:49:09 +0100] - Added cosDefinition 
cn=cosTownName,ou=People,dc=brighton,dc=ac,dc=uk


Is this normal behaviour, a side-effect of something I've done (a problem with 
the CoS definitions, which were imported from an old Sun DS 5.2 server - but 
which are working correctly and returning the correct values) or potentially a 
bug?

Does anyone have any suggestions how I can debug (or fix) what's going wrong?

Kind regards,
Steve

-----Original Message-----
From: 389-users-boun...@lists.fedoraproject.org 
[mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Ludwig Krispenz
Sent: 01 April 2014 12:05
To: 389-users@lists.fedoraproject.org
Subject: Re: [389-users] Serious write-performance problems on RHEL6

Hi,

looks like the cos plugin is busy with internal searches, maybe these
are unidexed. Could you turn on logging for internal searches in the cos
plugin to see what kind of searches are performed:
http://directory.fedoraproject.org/wiki/Plugin_Logging

Ludwig

On 04/01/2014 12:52 PM, Steve Holden wrote:

Hi, Ludwig

Thanks for taking the time to reply.

I'm using this simple LDIF import file to generate the problem:
http://pastebin.com/NyNY650L
It generally hangs around the 7th record, and the complete import takes 2m32s!


Parent pid from /var/run/dirsrv/slapd-algieba.pid is 10382, and 'top -H' for it 
shows:

# top -H -p 10382

top - 11:45:02 up 5 days, 22:27,  9 users,  load average: 0.87, 0.35, 0.24
Tasks:  41 total,   1 running,  40 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.8%us,  0.2%sy,  0.0%ni, 97.6%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8060964k total,  7886612k used,   174352k free,   215412k buffers
Swap:  6291448k total,    11584k used,  6279864k free,  2588264k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10892 dirsrv    20   0 11.9g 6.1g 1.9g R 99.7 79.1  46:19.78 ns-slapd
10881 dirsrv    20   0 11.9g 6.1g 1.9g S  2.0 79.1  40:58.88 ns-slapd
...


Standard top output:

top - 11:29:09 up 5 days, 22:11,  9 users,  load average: 0.35, 0.14, 0.16
Tasks: 219 total,   1 running, 218 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.8%us,  0.2%sy,  0.0%ni, 97.6%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8060964k total,  7887052k used,   173912k free,   215412k buffers
Swap:  6291448k total,    11584k used,  6279864k free,  2587968k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10382 dirsrv    20   0 11.9g 6.1g 1.9g S 100.3 79.1  83:23.85 ns-slapd
25216 root      20   0 15168 1184  824 R  2.0  0.0   0:00.01 top


Output from pstack is here: http://pastebin.com/8LpfbdCb

I'm curious about the number of CoS lines, which I've highlighted.
I mention this as enabling Plugins in the error log shows an incredible amount 
of CoS activity - and the LDIF import above doesn't include any CoS attributes.
I'll disable the CoS rules and see whether this helps...


Thanks for the note about hardware. Sounds like it doesn't apply here; details 
are:
VM hardware: Dell PowerEdge R710 2x quad core Xeon.
Production hardware: Sun Fire V240 (Sparc, 8G RAM).

Best wishes,
Steve

-----Original Message-----
From: 389-users-boun...@lists.fedoraproject.org 
[mailto:389-users-boun...@lists.fedoraproject.org] On Behalf Of Ludwig Krispenz
Sent: 01 April 2014 08:33
To: 389-users@lists.fedoraproject.org
Subject: Re: [389-users] Serious write-performance problems on RHEL6

In the phase with high cpu usage, could you run
a) top -H -p <pid>
to see if there are many threads competing for the cpu or one or two
occupying the cpu
b) pstack <pid>
to see what the threads are doing, sometimes pstack for the complete
process doesn't look meaningful, you can also run pstack <tpid> where
tpid is one of the threads consuming the cpu

You are on a VM with 2cpus, what is the real HW, there have been
problems with RHDS on machines with Numa architecture if the threads of
teh process have been distributed to different nodes. What was the HW
for SunDS ?

Ludwig

On 03/31/2014 05:34 PM, Steve Holden wrote:

Hi, folks

I'm hoping to use 389 DS to replace our ancient Sun DS 5.2 service.

I've hit a snag with my 389 development server; it's performance far worse
than the 10 year-old servers it's intended to replace.

Things looked promising: the old directory data has been imported (with
only minor changes), read requests perform reasonably well, and isolated
write requests are ok.

However, even after a small number (typically 6) of consecutive write requests
(basic attribute changes to a single entry, say) the ns-slapd process hits >100%
CPU (of 2 CPUs) and stays there for *at least* 10 seconds per update, and blocks
the client process attempting the update.

I can't see anything obvious in the performance counters or the logs to suggest
a problem. The updates are logged with "etime=0" in the access log.

I've tried enabling different log levels in the error log.
Is it normal for the Plugin level to show constant re-scanning of CoS templates?

I'd be very grateful for any suggestions of how I can go about tracing where the
Problem might be and how to resolve it...

Best wishes,
Steve


Details

The RHEL6.5 server is a VMware ESXi VM with 8GB RAM and 2x CPUs,
and is running the latest EPEL package for RHEL6 (v1.2.11.15-32).
(After a package upgrade a few weeks ago, I ran "setup-ds-admin.pl -u").

The directory contains in excess of 200,000 entries, and
its databases consume over 3.5GB on disk.

The userRoot database has therefore been configured with a 4GB cache
(and the general LDBM max cache is set at 6GB - though it's quite possible
I haven't understood how to set these correctly - I've tried smaller numbers of 
each).

The directory contains custom attributes, some of which are CoS,
and many of which have been indexed (AFAIK, all attributes have been 
re-indexed).

No replication has been configured so far.

___________________________________________________________
This email has been scanned by MessageLabs' Email Security
System on behalf of the University of Brighton.
For more information see http://www.brighton.ac.uk/is/spam/
___________________________________________________________
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users

--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users

___________________________________________________________
This email has been scanned by MessageLabs' Email Security
System on behalf of the University of Brighton.
For more information see http://www.brighton.ac.uk/is/spam/
___________________________________________________________

___________________________________________________________
This email has been scanned by MessageLabs' Email Security
System on behalf of the University of Brighton.
For more information see http://www.brighton.ac.uk/is/spam/
___________________________________________________________
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users


--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users

Re: [389-users] Serious write-performance problems on RHEL6 - CoS cache repeatedly rebuilding?

Reply via email to