On Aug 31, 2010, at 4:23 PM, Roman Gaechter wrote:

> The problem is intermittend and we are not able tho reproduce it as we like.

Intermittent is what makes it tricky to identify.

> At the moment the db_verify returns no error.

So technically no "corruption" if db_verify doesn't detect any.

rpm -qavv will show the data integrity checks: each installed header has a SHA1
digest attached, and you should see confirmation that the SHA1 digest
check passes when -vv is passed.

> Maybe the output of db_stat -CA will give a clue?
> 

If a "hang" is the issue, running db_stat will show what locks are held.

The output you have attached is from a quiescent system afaict.

Try running (as root so that locks can be acquired) rpm -qavv as before.
In the middle, hit ^Z.
then run db_stat -Cl.
You should see a shared/read lock from the running rpm -qa in the db_stat 
output.

The locks usually have a state attached like READ or WRITE or HELD.

And one cause of "hangs" is a deadlock or a "stale lock" (though
rpm should detect and clean up "stale locks" automatically these days).

One way to see "stale lock" cleanup is to start an rpm install operation, and 
then send kill -9
to rpm to cause immediate program termination.

The next rpm invocation (as root) should have msgs displaying the "stale lock"
cleanup (from the previously terminated by kill -9 rpm install).

That's basically the techniques for diagnosing "hangs" with RPM and Berkeley DB.

hth

73 de Jeff

> db_tool db_stat -CA
> 
> Default locking region information:
> 629     Last allocated locker ID
> 0x7fffffff      Current maximum unused locker ID
> 5       Number of lock modes
> 1000    Maximum number of locks possible
> 1000    Maximum number of lockers possible
> 1000    Maximum number of lock objects possible
> 160     Number of lock object partitions
> 0       Number of current locks
> 48      Maximum number of locks at any one time
> 4       Maximum number of locks in any one bucket
> 0       Maximum number of locks stolen by for an empty partition
> 0       Maximum number of locks stolen for any one partition
> 0       Number of current lockers
> 12      Maximum number of lockers at any one time
> 0       Number of current lock objects
> 34      Maximum number of lock objects at any one time
> 1       Maximum number of lock objects in any one bucket
> 0       Maximum number of objects stolen by for an empty partition
> 0       Maximum number of objects stolen for any one partition
> 12898   Total number of locks requested
> 12898   Total number of locks released
> 90      Total number of locks upgraded
> 495     Total number of locks downgraded
> 4       Lock requests not available due to conflicts, for which we waited
> 0       Lock requests not available due to conflicts, for which we did not 
> wait
> 0       Number of deadlocks
> 0       Lock timeout value
> 0       Number of locks that have timed out
> 0       Transaction timeout value
> 0       Number of transactions that have timed out
> 512KB   The size of the lock region
> 11      The number of partition locks that required waiting (0%)
> 7       The maximum number of times any partition lock was waited for (0%)
> 0       The number of object queue operations that required waiting (0%)
> 3       The number of locker allocations that required waiting (0%)
> 0       The number of region locks that required waiting (0%)
> 1       Maximum hash bucket length
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock REGINFO information:
> Lock    Region type
> 4       Region ID
> __db.004        Region name
> 0xfed00000      Original region address
> 0xfed00000      Region address
> 0xfed000c8      Region primary address
> 0       Region maximum allocation
> 0       Region allocated
> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
> Allocations by power-of-two sizes:
>   1KB   3002
>   2KB   0
>   4KB   0
>   8KB   1
>  16KB   2
>  32KB   0
>  64KB   1
> 128KB   0
> 256KB   0
> 512KB   0
> 1024KB  0
> REGION_JOIN_OK  Region flags
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock region parameters:
> 283     Lock region region mutex [0/223 0% 23806/0]
> 1031    locker table size
> 1031    object table size
> 560     obj_off
> 69560   locker_off
> 1       need_dd
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock conflict matrix:
> 0       0       0       0       0
> 0       0       1       0       0
> 0       1       1       1       1
> 0       0       0       0       0
> 0       0       1       0       1
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by lockers:
> Locker   Mode      Count Status  ----------------- Object ---------------
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by object:
> Locker   Mode      Count Status  ----------------- Object ---------------
> 
> Regards
> Roman
> 
> 
> On 08/31/2010 02:20 PM, Jeff Johnson wrote:
>> 
>> 
>> On Aug 31, 2010, at 4:23 AM, Roman Gächter wrote:
>> 
>>> Hello
>>> 
>>> We run into serious problems with OpenPKG 4.0.2 and 4.0.7 on solaris 10 
>>> SPARC in "solaris whole rooted" zones.
>>> 
>>> After some days of normal operation our rpm db gets corrupted.
>>> 
>>> 
>> 
>> (aside)
>> "Corrupted" means data loss or integrity failure technically
>> (although its quite common to hear "corruption" applied
>> to Berkeley DB with other meanings.)
>> 
>> Is there data loss or integrity failure? Running db_verify
>> 
>>  cd /var/lib/rpm
>>  db_verify Packages
>> 
>> against Packages or other "tables" (Berkeley DB calls
>> "tables" databases) will detect integrity failures.
>> 
>> Data loss can be detected by, say, counting the number
>> of packages installed like
>>  rpm -qa | wc -l
>>> We see hanging processes from the the openpkg rc facility.
>>> The openpkg rc script tests the rpm db with a "openpkg rpm -q openpkg" 
>>> query.
>>> These queries hang up.
>>> 
>> 
>> Hang up how? Can you add -vv to a query? There is also
>>  cd  /var/lib/rpm
>>  db_stat -CA # <-- -CA displays all info, -Cl displays lock info
>> 
>> Note that you MUST use db_verify/db_stat for the same version of
>> Berkeley DB used by RPM. The tools for the same version of Berkeley DB if 
>> internal
>> to RPM are released with RPM.
>>> When this happens the permission of the rpm lockfiles RPM/DB/__db.001 .002 
>>> .003 changes to root.
>>> 
>>> 
>> An rpmdb is protected by permissions on /var/lib/rpm.
>> 
>> The associated __db* files are created as needed.
>> 
>> I would expect "root" as owner typically, because /var/lib/rpm is
>> usually writable only by "root", and rpm installs are typically run as 
>> "root".
>> 
>> But the openpkg wrapper appears to have a different ownership model.
>>> To recover from this problem we have to terminate the rpm queries, cleanup 
>>> and rebuild the rpm db and to change back
>>> the permissions of the rpm lockfiles to the openpkg user.
>>> 
>>> The building of our openpkg4 binary package could be done without problems 
>>> with the original openpkg-4.0.7 source package.
>>> We are using the VALUE license.
>>> 
>>> Does anyone had similar problems?
>>> Any hints to resolve?
>>> 
>>> 
>> 
>> Identifying what is causing the "hang" is what is needed first.
>> 
>> hth
>> 
>> 73 de Jeff
>>> Best regards
>>> Roman
>>> 
>> 

Reply via email to