Replying to myself and top-posting...

A very helpful person contacted me off-list and pointed me to this PR that dates back to 2.2:

http://www.freebsd.org/cgi/query-pr.cgi?pr=2325

I'm going to submit a followup to that so that whoever claims it can see that it persists into at least 6.1.

In short, the machine I was rsyncing from had some very high UIDs and these seem to trip up the quota code. The big hint was the 4GB+ quota.user file. I'm still doing some more testing, but so far it looks very much like this bug was the root of all my problems.
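To make the link between high UIDs and the 4GB+ quota.user file concrete, here is a rough sketch of the arithmetic. It assumes the quota file is a flat array indexed by UID and that each record is 32 bytes (eight 32-bit fields). The record size is an assumption, not something verified against the 6.1 headers, and the function names are made up for illustration.

```python
# Assumption: each on-disk quota record (struct dqblk) is 32 bytes,
# i.e. eight 32-bit fields. Verify against <ufs/ufs/quota.h> on the
# release you are actually running.
DQBLK_SIZE = 32


def quota_record_offset(uid, record_size=DQBLK_SIZE):
    """Byte offset of the record for `uid` in a UID-indexed quota file."""
    return uid * record_size


def implied_file_size(max_uid, record_size=DQBLK_SIZE):
    """Apparent (possibly sparse) file size when `max_uid` is the highest UID."""
    return (max_uid + 1) * record_size


def smallest_uid_reaching(size_bytes, record_size=DQBLK_SIZE):
    """Smallest UID whose record ends at or beyond `size_bytes`."""
    # Ceiling division, then convert a record count into a zero-based UID.
    return -(-size_bytes // record_size) - 1


if __name__ == "__main__":
    # An ordinary UID such as 5315 sits very near the start of the file.
    print(quota_record_offset(5315))
    # A single UID in the 134-million range is enough to push the
    # apparent file size to 4 GiB under the 32-byte assumption.
    print(smallest_uid_reaching(4 * 1024**3))
```

Under those assumptions, about 5000 ordinary accounts would need well under a megabyte of quota file, so a multi-gigabyte quota.user points at a handful of very large UIDs rather than at the number of users.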

Thanks,

Charles

On Fri, 7 Jul 2006, Charles Sprickman wrote:

Hello all,

I'm in the process of rolling out a new shell server and for numerous reasons have decided 6.x is the best fit (jail improvements, SMP improvements, 3Ware driver, pf). The shell server is within a jail, and the uids there are unique so that quotas remain sane. There are about 5000 active accounts using about 40GB of a 210GB partition. The quota.user file is about 4GB.

I just started work on getting quotas set up for everyone after rsyncing all the homedirs over from the old server. At first all seemed well, but then I ran into a few issues on subsequent rsyncs. I had people with large (1GB+) homedirs and quotas in the 1GB-4GB range, and as rsync was chowning the files to the users it was throwing errors about "quota exceeded". Here's a brief example that illustrates what I was seeing:

[EMAIL PROTECTED]/home/staff/micro/tmp]# quota micro
Disk quotas for user micro (uid 5315):
Filesystem   usage   quota   limit   grace   files   quota   limit grace
    /       1630026 3000000 3100000         13393       0       0
[EMAIL PROTECTED]/home/staff/micro/tmp]# chown micro index.html
chown: index.html: Disc quota exceeded
[EMAIL PROTECTED]/home/staff/micro/tmp]#

I know in the past when I've seen inconsistencies indicating that I needed a manual run of quotacheck, they would show up in the output of the quota command; i.e., the "quota" command would show the user had more usage than "du" would indicate. The above example is a bit odd - "quota" shows that he's well within his limits, but the kernel thinks otherwise.

Thinking it would be a good idea to stop the jails, turn off quotas, umount the partition, fsck it, mount it and then run quotacheck, I found more problems. My first run of quotacheck ran for a few minutes, reported many inconsistencies and then sat there for quite some time before spitting this out:

quotacheck: /jails/quota.user: seek failed: Invalid argument

Trying again, it reported the same inconsistencies then sat there for more than an hour taking up all the available CPU on the box until I killed it. The mtime on quota.user had not changed during the run.

Running it yet again now gives me this:

/jails:          fixed: inodes 27 -> 0  blocks 156 -> 0
quotacheck: /jails/quota.user: seek failed: Invalid argument
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
       /dev/twed0s1g (/jails)

For now I can live without quotas, but if there's anything I can test from -stable that might address this I'd like to try it. I'd say this thing is still a good month from going live since we have lots of dependency mess on the old box to clean up before cutting over.

Any ideas what's going on here? Is this related to the large number of users and the size of the partition? I've seen some of the discussions about snapshots + quotas, but that seems like an entirely different issue. For the time being I've killed "background_fsck" and "check_quotas" in rc.conf, and I'll avoid dumping that fs with the snapshot flag.

What other information can I provide to help better define where this bug lives?

Thanks,

Charles

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
