Hi

We had a comparable issue with IBM's GPFS cluster filesystem; in 11/2005 I posted about it on samba-technical (subject: "tdb_lock problem on gpfs filesystem"). In that case too, smbd sometimes went into the D state. We mostly noticed the problem with the printer tdb files (the Samba server was also acting as a print server).

I got the following information from the IBM gpfs list:
"Also, Samba uses fcntl locking extensively on these files and may be maintaining thousands of individual locks. GPFS specifically sets a limit on the number of fcntl ranges allowed on a file at one time (to prevent a runaway or deviant application from consuming large amounts of resources recording such locks). I expect you are exceeding this limit, but you can configure a larger value: mmchconfig maxFcntlRangesPerFile=10000.
The default is 200 and the acceptable range is currently 10-200000"

Increasing this (undocumented) value to 10000 solved the problem in our case.
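
To check whether a filesystem caps the number of fcntl byte-range locks per file, a small probe along the following lines might help. This is only a sketch under assumptions: the function name count_range_locks and the probe size are made up for illustration, and it assumes the filesystem reports the limit as ENOLCK, as in our case.

```python
import errno
import fcntl
import os
import tempfile


def count_range_locks(path, limit):
    """Acquire up to `limit` separate one-byte write locks on `path`.

    Returns the number of locks obtained before the filesystem refused
    with ENOLCK, or `limit` if it never refused.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        for i in range(limit):
            try:
                # Lock every second byte so that neighbouring ranges
                # cannot be merged into a single lock record.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                            1, 2 * i, os.SEEK_SET)
            except OSError as exc:
                if exc.errno == errno.ENOLCK:
                    return i
                raise
        return limit
    finally:
        # Closing the descriptor releases every lock taken above.
        os.close(fd)


if __name__ == "__main__":
    # Probe a scratch file in the current directory; run this from a
    # directory on the filesystem you want to test.
    with tempfile.NamedTemporaryFile(dir=".") as scratch:
        print(count_range_locks(scratch.name, 1000))
```

If the printed number is lower than the requested 1000, the filesystem refused further ranges at that point; a local filesystem would normally be expected to grant all of them.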

Maybe there is a similar restriction in the Veritas filesystem (VxFS).

Have you tried starting smbd under strace, for example

strace -e fcntl -f smbd

to track down the failing system call?
In our case it showed something like

fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=936, len=1}) = -1 ENOLCK (No locks available)

which indicates a problem with the filesystem.
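
If the trace gets long, a short script can pick out the failed fcntl calls. The following is a rough sketch, not a finished tool: the function name summarize_fcntl_failures is invented here, and the pattern assumes each call is on a single line in the format shown above.

```python
import re
from collections import Counter

# Matches failed calls such as:
#   fcntl(18, F_SETLKW, {type=F_WRLCK, ...}) = -1 ENOLCK (No locks available)
FAILED_FCNTL = re.compile(
    r"fcntl\((?P<fd>\d+), (?P<cmd>F_\w+),.*\)\s*=\s*-1 (?P<errno>E\w+)"
)


def summarize_fcntl_failures(lines):
    """Count failed fcntl calls per (command, errno) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_FCNTL.search(line)
        if match:
            counts[(match["cmd"], match["errno"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py smbd.trace
    with open(sys.argv[1]) as trace:
        for (cmd, err), n in summarize_fcntl_failures(trace).most_common():
            print(f"{n:6d}  {cmd}  {err}")
```

The trace file could be written with strace's -o option (strace -e fcntl -f -o smbd.trace smbd); the file name smbd.trace is just an example.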

Greetings

Hansjörg

Pappas, Bill wrote:

Jeremy,

I was in a position (last night) to upgrade to 3.0.23a. Again, I was using 3.0.21c.

If smbd goes into the D state, we can at least eliminate the possibility
that it is an unexpected 3.0.21c bug.

Thanks,
Bill Pappas - System Integration Engineer - SAN
St. Jude Children's Research Hospital
332 North Lauderdale
Memphis, TN 38105
Danny Thomas Tower - Room D1010
Mail Stop 312

-----Original Message-----
From: Pappas, Bill
Sent: Saturday, July 22, 2006 4:01 PM
To: Jeremy Allison
Cc: samba@lists.samba.org
Subject: RE: [Samba] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal

Jeremy Allison wrote:
Then it might be an intermittent bug in Veritas. What system call is
smbd hanging on ? smbd should never hang in the D wait state unless
it's a filesystem bug.

I am beginning to believe that this could make sense. Let me emphasize
that ./private/secrets.tdb is shared between two samba servers (via
clustered vxfs) that are running independently.  Only one server runs
nmbd at a time as veritas cluster server fails nmbd over between servers
as needed.  I figured that keeping smbd running on both servers would
reduce failover time.  I discovered that I had to share secrets.tdb to
ensure that either samba server would remain a domain member server.
Is there another way to do what I am doing?  I'd gladly stop sharing
this file if I could keep smbd up on both servers.  Does smbd need a
lock on secrets.tdb? I thought (probably wrongly) that only nmbd relied
on this file.

Further below, you will find some more logs between clients and the
server running nmbd and smbd (as the other was sitting idle with smbd
running). SJMEMDC05 is a windows domain controller and the other clients
are windows explorer clients.
When you see these logs, they appear to confirm that secrets.tdb is
directly involved, but how would a locking issue with this file cause
smbd to go into the D state (and stay there)?

log.hc-dfinkletest:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-dfinkletest:  tdb_chainlock_with_timeout_internal: alarm (10)
timed out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-dfinkletest:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-dfinkletest:  tdb_chainlock_with_timeout_internal: alarm (10)
timed out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-dfinkletest:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-dfinkletest:  tdb_chainlock_with_timeout_internal: alarm (10)
timed out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
tdb_lock failed on list 78 ltype=1 (Interrupted system call)
log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
out for key SJMEMDC05 in tdb
/usr/local/samba-3.0.21c/private/secrets.tdb
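
For context, "Interrupted system call" is what a blocking lock request reports when a signal cuts it short, and "alarm (10)" suggests that a 10-second alarm serves as the timeout. The following Python sketch illustrates that general pattern only; it is not Samba's code, and the names lock_byte_with_timeout and demo, the offsets, and the short timeout are assumptions chosen for illustration.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Raised from the SIGALRM handler to abort a blocking lock request."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def lock_byte_with_timeout(fd, offset, seconds):
    """Try to write-lock one byte at `offset`, giving up after `seconds`.

    Returns True if the lock was obtained, False on timeout.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        # Blocking request, comparable to fcntl(fd, F_SETLKW, ...).
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)
        return True
    except LockTimeout:
        return False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


def demo(timeout=0.3):
    """Hold a lock in a child process and let the parent time out on it."""
    with tempfile.NamedTemporaryFile() as tmp:
        ready_r, ready_w = os.pipe()
        done_r, done_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: lock byte 936, report readiness, wait to be released.
            try:
                os.close(ready_r)
                os.close(done_w)
                child_fd = os.open(tmp.name, os.O_RDWR)
                fcntl.lockf(child_fd, fcntl.LOCK_EX, 1, 936, os.SEEK_SET)
                os.write(ready_w, b"x")
                os.read(done_r, 1)
            finally:
                os._exit(0)
        os.close(ready_w)
        os.close(done_r)
        os.read(ready_r, 1)  # wait until the child holds its lock
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            contended = lock_byte_with_timeout(fd, 936, timeout)
            uncontended = lock_byte_with_timeout(fd, 0, timeout)
        finally:
            os.write(done_w, b"x")  # let the child exit and drop its lock
            os.waitpid(pid, 0)
            os.close(fd)
            os.close(ready_r)
            os.close(done_w)
        return contended, uncontended


if __name__ == "__main__":
    print(demo())
```

In the demo, the request for the byte held by the child gives up after the timeout, while the request for a free byte succeeds at once. A process stuck in the D state behaves differently: it sleeps uninterruptibly inside the kernel, so a timeout signal of this kind would not be expected to wake it.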

Thanks,
Bill Pappas - System Integration Engineer - SAN
St. Jude Children's Research Hospital

-----Original Message-----
From: Jeremy Allison [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 22, 2006 10:56 AM
To: Pappas, Bill
Cc: [EMAIL PROTECTED]; samba@lists.samba.org
Subject: Re: [Samba] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal

On Fri, Jul 21, 2006 at 06:17:09PM -0500, Pappas, Bill wrote:
I will say this works for weeks on end w/o a problem.  When you say
this will not work, why? I've had no real problems with the veritas
clustered fs.  It adheres to file locking and fcntl operations like any
normal local filesystem (ext3).

Then it might be an intermittent bug in Veritas. What system call is
smbd hanging on ? smbd should never hang in the D wait state unless
it's a filesystem bug.

Jeremy.


