Re: About QUOTA support in stock kernel (resent)
Hi,

I would like to know why quota is not enabled in the stock kernel. As far as I remember, it has not been enabled since the FreeBSD 3.5 or FreeBSD 4 generation. Now, in FreeBSD 9.0, it still needs a kernel rebuild.

I have heard that quota has a performance issue (the GIANT lock).

Regards,
Patrick

--- On Sat, 12/22/12, Patrick Dung patrick_...@yahoo.com.hk wrote:

From: Patrick Dung patrick_...@yahoo.com.hk
Subject: About QUOTA support in stock kernel
To: freebsd-questi...@freebsd.org, freebsd hackers freebsd-hackers@freebsd.org
Date: Saturday, December 22, 2012, 1:35 AM

Hi,

I would like to know why quota is not enabled in the stock kernel. As far as I remember, it has not been enabled since the FreeBSD 3.5 or FreeBSD 4 generation. Now, in FreeBSD 9.0, it still needs a kernel rebuild.

I have heard that quota has a performance issue (the GIANT lock).

Regards,
Patrick

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: About QUOTA support in stock kernel (resent)
On Tue, Dec 25, 2012 at 09:34:30PM +0800, Patrick Dung wrote:

Hi, I would like to know why quota is not enabled in the stock kernel. As far as I remember, it has not been enabled since the FreeBSD 3.5 or FreeBSD 4 generation. Now, in FreeBSD 9.0, it still needs a kernel rebuild. I have heard that quota has a performance issue (the GIANT lock).

Enabling quota by default would cause a small overhead, such as one mutex acquire, for each inode and block alloc/dealloc, even for mounts without quotas enabled.

It might be reasonable to just enable it now. Unless somebody provides valid objections, and provided I do not forget, I will do it in a week for HEAD.
Re: About QUOTA support in stock kernel (resent)
On 25 December 2012 10:07, Konstantin Belousov kostik...@gmail.com wrote:

Enabling quota by default would cause a small overhead, such as one mutex acquire, for each inode and block alloc/dealloc, even for mounts without quotas enabled.

Why is this, and can it be avoided (for mounts without quotas)?

It might be reasonable to just enable it now. Unless somebody provides valid objections, and provided I do not forget, I will do it in a week for HEAD.

--
Eitan Adler
Re: About QUOTA support in stock kernel (resent)
On Tue, Dec 25, 2012 at 10:23:26AM -0500, Eitan Adler wrote:

On 25 December 2012 10:07, Konstantin Belousov kostik...@gmail.com wrote:

Enabling quota by default would cause a small overhead, such as one mutex acquire, for each inode and block alloc/dealloc, even for mounts without quotas enabled.

Why is this, and can it be avoided (for mounts without quotas)?

Because the system has to check whether quota is enabled before it can decide to do the accounting.

It might be reasonable to just enable it now. Unless somebody provides valid objections, and provided I do not forget, I will do it in a week for HEAD.

--
Eitan Adler
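The overhead described above can be pictured with a small userland model. This is only a sketch under assumptions: `demo_mount`, `demo_alloc_block_locked`, and `demo_alloc_block_flagged` are hypothetical names, a pthread mutex stands in for the kernel mutex, and nothing here is taken from the actual quota code. The first variant takes the mutex on every allocation just to learn whether quotas are on; the second reads an atomic hint first and locks only when the mount has quotas enabled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-mount state; not the real kernel mount structure. */
struct demo_mount {
	pthread_mutex_t	lock;		/* protects quota_on and usage */
	bool		quota_on;	/* authoritative, read under lock */
	atomic_bool	quota_hint;	/* lock-free copy of quota_on */
	int64_t		usage;		/* blocks charged to the quota */
	unsigned long	lock_acquires;	/* instrumentation for the demo */
};

static void
demo_mount_init(struct demo_mount *mp, bool quota_on)
{
	assert(mp != NULL);
	pthread_mutex_init(&mp->lock, NULL);
	mp->quota_on = quota_on;
	atomic_init(&mp->quota_hint, quota_on);
	mp->usage = 0;
	mp->lock_acquires = 0;
}

/* Variant 1: always lock, then check whether there is anything to account. */
static void
demo_alloc_block_locked(struct demo_mount *mp)
{
	pthread_mutex_lock(&mp->lock);
	mp->lock_acquires++;
	if (mp->quota_on)
		mp->usage++;
	pthread_mutex_unlock(&mp->lock);
}

/* Variant 2: consult the hint first; lock only if quotas look enabled. */
static void
demo_alloc_block_flagged(struct demo_mount *mp)
{
	if (!atomic_load(&mp->quota_hint))
		return;
	pthread_mutex_lock(&mp->lock);
	mp->lock_acquires++;
	if (mp->quota_on)		/* re-check under the lock */
		mp->usage++;
	pthread_mutex_unlock(&mp->lock);
}
```

On a mount with quotas off, the first variant pays one lock acquire per allocation and the second pays none. Whether the real kernel can take such a shortcut safely depends on details this thread does not go into.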
Re: looking for someone to fix humanize_number (test cases included)
On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:

Date: Sun, 23 Dec 2012 00:32:20 -0800
From: John-Mark Gurney j...@funkthat.com
To: hack...@freebsd.org
Subject: looking for someone to fix humanize_number (test cases included)
Message-ID: 20121223083220.gl1...@funkthat.com
Content-Type: text/plain; charset=us-ascii

I'm looking for a person who is interested in fixing up humanize_number.

The other day I copied some data from a 7.2-R box to a 9.1-stable box and did a du -shc and a du -skc to check the results... I noticed the -h run dropped from 11M to 10M, which I thought was weird... Then I looked at the results from the -k run, but the new machine had a larger result (I copied from UFS to ZFS)... It turns out that humanize_number was broken when doing rounding... No longer does humanize_number round up at .5 or more of the prefix..

So I decided to write a test program to test the output, and now I'm even more surprised by the output... Neither 7.2-R nor 10-current give what I expect are the correct results...

Feel free to take a look at the test program posted to: http://people.freebsd.org/~jmg/humanize_numbers/

The .c contains what I think the output should be.

I'm testing on 7.3R (yes, I know, I know, should be on 8 or 9) and see similar results as to rounding problems; see below on the others.

So far the bugs I know of:
1) rounding is incorrect (started this whole search)
2) buffer calculation is incorrect in some cases, index 11 should fit but doesn't
3) some cases zero is returned though it isn't zero, more like 0T for 512 G (indexes 16, 17, 22, 23)

I think these last are caused by integer wraparound and truncation in the integer constant calculations of your test program, once you get beyond 1G. Even though it's an anachronism in these days of 3TB disks and 8GB RAM in laptops, int is still 32 bits in most C implementations, giving a maximum range of +/- 2G, and constant calculations are done with int by default.

Your points 1 and 4 seem correct, at first glance.

I've tweaked the test slightly to correct those cases - trailing L doesn't do it, you must also prefix the constant value with (int64_t) - and it fixes the dimensionless 0s for 16, 17, 22, and 23, and the buffer error for 11 (caused because it probably comes out with some really weird value after truncation.)

There's another brain-blip bug which took me a couple minutes of staring at - your test skips over peta- and expects exa- (E) to come after tera-. Fixing that by replacing 1 E and 2 E with 1 P and 2 P corrects a couple more errors.

I'm left with index 1-11 all showing one less than expected (0 K for 1 K, and so on to 1 T for 2 T), and 25 and 27 showing the same problem - so at least it's down to just the rounding problem.

There's actually another problem implicit in the results from the rounding problem - I think it should never yield 0 M instead of 512 K; for that matter, I would think anything up to 999 K (divisor 1000) or 1023 K (divisor 1024) should be represented with the smaller unit, not as 1 M.

4) man page is missing required sys/types.h include

I'll work to get the code into the tree once we get it in a good state.

No promises as I'm chronically oversubscribed but it's intrigued me and I'll take a look.

Please cc me as I'm not subscribed to -hackers.

Done.

-- Clifton

--
Clifton Royston -- clift...@iandicomputing.com / clift...@volcano.org
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
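The wraparound described above can be reproduced in isolation. The following is a minimal sketch, not the test program from the thread: `demo_512g_narrow` and `demo_512g_wide` are hypothetical names, and the narrow version uses `uint32_t` rather than `int` so that the wraparound is well defined (overflowing a signed `int` is undefined behaviour in C). It shows 512 G collapsing to 0 in 32-bit arithmetic, which may be why the 512 G entries came out as dimensionless zeros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 512 G computed step by step in 32-bit unsigned arithmetic.
 * Unsigned arithmetic wraps modulo 2^32 by definition.
 */
static uint32_t
demo_512g_narrow(void)
{
	uint32_t v = 512;

	v *= 1024;	/* 512 K = 2^19 */
	v *= 1024;	/* 512 M = 2^29 */
	v *= 1024;	/* 512 G = 2^39, which is 0 modulo 2^32 */
	return (v);
}

/*
 * The same product with the first operand cast to int64_t, so every
 * multiplication is carried out in 64 bits.
 */
static int64_t
demo_512g_wide(void)
{
	return ((int64_t)512 * 1024 * 1024 * 1024);
}
```

A trailing `L` only widens the constant where `long` is 64 bits; on a platform with a 32-bit `long` it changes nothing, which is consistent with the observation that the `(int64_t)` cast was needed.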
Re: looking for someone to fix humanize_number (test cases included)
On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote:

On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:

From: John-Mark Gurney j...@funkthat.com
To: hack...@freebsd.org
Subject: looking for someone to fix humanize_number (test cases included)

I'm looking for a person who is interested in fixing up humanize_number.
...
So I decided to write a test program to test the output, and now I'm even more surprised by the output... Neither 7.2-R nor 10-current give what I expect are the correct results...

Feel free to take a look at the test program posted to: http://people.freebsd.org/~jmg/humanize_numbers/

The .c contains what I think the output should be.

I'm testing on 7.3R (yes, I know, I know, should be on 8 or 9) and see similar results as to rounding problems; see below on the others.

So far the bugs I know of:
1) rounding is incorrect (started this whole search)
...
3) some cases zero is returned though it isn't zero, more like 0T for 512 G (indexes 16, 17, 22, 23)

I think these last are caused by integer wraparound and truncation in the integer constant calculations of your test program, once you get beyond 1G.
...
There's another brain-blip bug which took me a couple minutes of staring at - your test skips over peta- and expects exa- (E) to come after tera-. Fixing that by replacing 1 E and 2 E with 1 P and 2 P corrects a couple more errors.

I'm left with index 1-11 all showing one less than expected (0 K for 1 K, and so on to 1 T for 2 T), and 25 and 27 showing the same problem - so at least it's down to just the rounding problem.

There's actually another problem implicit in the results from the rounding problem - I think it should never yield 0 M instead of 512 K; for that matter, I would think anything up to 999 K (divisor 1000) or 1023 K (divisor 1024) should be represented with the smaller unit, not as 1 M.

Having looked more closely at your test, I now see that it forces the current behavior by setting the buffer length to 4, leaving room for only 3 characters - so that part is reasonable.

I also realized that the flags and scale fields in the structure initialization in the test code are swapped, which seemed to explain some problems. However, switching the order to the correct one, so that the flags were actually used, revealed a lot more problems, for instance:

mismatch on index 1, got: 500, expected 1 K. (correct!)
mismatch on index 2, got: 500, expected 1 M.
mismatch on index 3, got: 500, expected 1 G.
...
mismatch on index 7, got: 150, expected 2 K.
mismatch on index 8, got: 150, expected 2 M.
...

I now question whether it's working correctly with any flags other than 0.

The man page states:

The len argument must be at least 4 plus the length of suffix, in order to ensure a useful result is generated into buffer.

which this satisfies but in fact larger sizes don't seem to be adequate either; for example with a 6 char buffer:

mismatch on index 1, got: 500, expected 1 K. (correct!)
mismatch on index 2, got: 5, expected 1 M.
mismatch on index 3, got: 5, expected 1 G.
...
mismatch on index 11, got: 15000, expected 2 P.
mismatch on index 13, got: 512 , expected 1 K. (correct!)
mismatch on index 14, got: 52428, expected 1 M.
mismatch on index 15, got: 53687, expected 1 G.
...

I am bemused.

-- Clifton

--
Clifton Royston -- clift...@iandicomputing.com / clift...@volcano.org
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
Re: looking for someone to fix humanize_number (test cases included)
On Tue, Dec 25, 2012 at 08:23:55AM -1000, Clifton Royston wrote:

On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote:

On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:

From: John-Mark Gurney j...@funkthat.com
To: hack...@freebsd.org
Subject: looking for someone to fix humanize_number (test cases included)

I'm looking for a person who is interested in fixing up humanize_number.
...
So I decided to write a test program to test the output, and now I'm even more surprised by the output... Neither 7.2-R nor 10-current give what I expect are the correct results...
...
I am bemused.

I correct myself: the function works fine, and there are no bugs I could find, though it's clear the man page could emphasize the correct usage a bit more.

I had to read the source several times and start on debugging it before I understood the correct usage of the flag values with the scale and flags parameters, despite the man page stating:

The following flags may be passed in scale:

    HN_AUTOSCALE     Format the buffer using the lowest multiplier possible.

    HN_GETSCALE      Return the prefix index number (the number of times number must be divided to fit) instead of formatting it to the buffer.

The following flags may be passed in flags:

    HN_DECIMAL       If the final result is less than 10, display it using one digit.
    ...
    HN_DIVISOR_1000  Divide number with 1000 instead of 1024.

That is, certain flags must be passed in flags and others must only be passed in scale - a bit counter-intuitive. Also, scale == 0 is clearly not interpreted as AUTOSCALE, but I am not yet clear how it is being handled - it seems somewhat like AUTOSCALE but not identical.

When the test program constant table is updated to pass the scale flags as specified, as well as fixing the bugs mentioned in the previous emails, it all passes except for the one (intentional?) inconsistency that k is used in place of K if HN_DECIMAL is enabled.

The bug in the transfer speed results which prompted this inquiry suggests that perhaps some clients of humanize_number in the codebase are also passing the scale parameters incorrectly. I would propose accepting HN_AUTOSCALE and HN_GETSCALE in the flags field (they don't overlap with other values) while continuing to accept them in the scale field for backwards compatibility.

Trivial diff below.

-- Clifton

--
--- /usr/src/lib/libutil/humanize_number.c 2010-12-28 09:36:31.0 -1000
+++ humanize_number.c 2012-12-25 09:36:36.0 -1000
@@ -54,7 +54,7 @@
     const char *suffix, int scale, int flags)
 {
 	const char *prefixes, *sep;
-	int	b, i, r, maxscale, s1, s2, sign;
+	int	b, i, r, maxscale, s1, s2, sign, autoscale, getscale;
 	int64_t	divisor, max;
 	size_t	baselen;

@@ -84,8 +84,10 @@
 #define	SCALE2PREFIX(scale)	(&prefixes[(scale) << 1])
 	maxscale = 7;

+	autoscale = (flags | scale) & HN_AUTOSCALE;
+	getscale = (flags | scale) & HN_GETSCALE;
 	if (scale >= maxscale &&
-	    (scale & (HN_AUTOSCALE | HN_GETSCALE)) == 0)
+	    (autoscale | getscale) == 0)
 		return (-1);

 	if (buf == NULL || suffix == NULL)
@@ -114,7 +116,7 @@
 	if (len < baselen + 1)
 		return (-1);

-	if (scale & (HN_AUTOSCALE | HN_GETSCALE)) {
+	if (autoscale | getscale) {
 		/* See if there is additional columns can be used. */
 		for (max = 100, i = len - baselen; i-- > 0;)
 			max *= 10;
@@ -127,7 +129,7 @@
 		for (i = 0; bytes >= max - 50 && i < maxscale; i++)
 			bytes /= divisor;

-		if (scale & HN_GETSCALE)
+		if (getscale)
 			return (i);
 	} else
 		for (i = 0; i < scale && i < maxscale; i++)
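The idea behind the proposal can be shown on its own, outside libutil. This is a minimal sketch under assumptions: the `DEMO_HN_*` constants, `struct demo_mode`, and `demo_pick_mode` are hypothetical names invented for illustration, and the numeric values are placeholders rather than a statement about what `<libutil.h>` defines. It only demonstrates that OR-ing the two arguments before masking lets a caller pass the mode bits through either one, provided those bits do not collide with any other flag, as the message says.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder constants for this sketch only; real code should use the
 * definitions from <libutil.h> rather than these assumed values.
 */
#define DEMO_HN_DECIMAL		0x01
#define DEMO_HN_GETSCALE	0x10
#define DEMO_HN_AUTOSCALE	0x20

struct demo_mode {
	bool	autoscale;
	bool	getscale;
};

/* Accept the mode bits from either argument by merging before masking. */
static struct demo_mode
demo_pick_mode(int scale, int flags)
{
	struct demo_mode m;

	m.autoscale = ((flags | scale) & DEMO_HN_AUTOSCALE) != 0;
	m.getscale = ((flags | scale) & DEMO_HN_GETSCALE) != 0;
	return (m);
}
```

With this shape, a caller that puts the autoscale bit in `flags` by mistake gets the same mode as one that puts it in `scale`, while a call that sets neither bit is still treated as a fixed-scale request.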
Re: looking for someone to fix humanize_number (test cases included)
On 25 December 2012 14:46, Clifton Royston clift...@volcano.org wrote:

I correct myself: the function works fine, and there are no bugs I could find, though it's clear the man page could emphasize the correct usage a bit more.

Can you submit a diff to the man page as well? I figure if you got confused at least 10 others got even more confused.

--
Eitan Adler
Re: FreeBSD for serious performance?
Which device drivers? We can't fix problems we don't know about.

ata(4) completely hung the system for 19 minutes (at which point I manually intervened, see the PR), probably an infinite loop.
http://www.freebsd.org/cgi/query-pr.cgi?pr=170675

siis(4) and ahci(4) have also caused data loss, presumably by blocking interrupts for too long.

Improving these drivers would be wonderful. But better yet, can we please find a way to fix the underlying problem?

When a device driver handles an interrupt, it needs to block further interrupts while it modifies its data structures. Otherwise another interrupt coming in might cause it to mangle the data. Right?

But! Why does it need to block interrupts for everything? Why does a disk driver need to block interrupts from Ethernet? Why does Ethernet need to block Firewire? Why does Firewire need to block USB? And so on.

Can't the disk driver block just its own interrupts and leave the other devices alone? That way, when some device driver writer puts in DELAY(TOO_LONG), at least the other devices will still work.

Alternately, why couldn't the data structures be protected with a mutex? Then the drivers shouldn't have to block even themselves.

Alternately, why can't drivers have a polling option? Yes, the extra overhead of polling sucks, but losing incoming data sucks a lot more. I am not suggesting that polling should be the default, just an option for those who need it.

Alternately, some method I haven't thought of.

Current machines can have multiple disks, multiple Ethernets, multiple pretty-much-any-device, multiple CPUs, etc. etc. We have an SMP kernel to juggle those multiple CPUs. But we still have this absurd bottleneck where the device drivers bring everything to a screeching halt every time an interrupt happens. And if the driver has a bug, or thinks there is a problem and decides to keep DELAY()ing over and over, the entire machine just locks up and stays locked up, often forever.

It isn't just me. I have seen quite a few threads where other people are having the same problem. This needs to be fixed.

(Fixing this is at *least* a Usenix paper.)
Re: FreeBSD for serious performance?
Hi,

If the driver is doing something daft like DELAY(x) in a fast interrupt handler, which would lead to that behaviour, it should be fixed. If it's doing a DELAY(x) in a critical section, it should be fixed. Otherwise, a DELAY(x) in a driver only chews CPU; the scheduler can preempt that. I don't agree with this behaviour, but it's possible.

Now, it's quite likely you hit some kind of ata(4) bug which kept it in a tight loop without some kind of "too many errors; bailing" behaviour. I'm not an ata driver person; I have no idea why it's doing that.

The driver shouldn't be disabling interrupts for other devices. That happens in critical sections and when doing lock operations. ata(4) doesn't call critical_* in the driver code. So it was likely just spinning in some high priority loop that nothing lower-priority could really do anything about.

The next time it happens, please break into the debugger and grab some debugging output. "show alllocks" and "ps" should be a good couple of things to start with.

Alternately - please find a currently actively maintained SATA chipset. (Or alternatively - step up and help migrate the nvidia chipset support out of ata(4).)

Adrian
Re: About QUOTA support in stock kernel (resent)
I am curious whether other operating systems have this performance impact.

Could we have some workaround, or does it need some code improvement? For example: do the checking/accounting only if the specific mount point has quota enabled.

Regards,
Patrick

--- On Tue, 12/25/12, Konstantin Belousov kostik...@gmail.com wrote:

From: Konstantin Belousov kostik...@gmail.com
Subject: Re: About QUOTA support in stock kernel (resent)
To: Eitan Adler li...@eitanadler.com
Cc: Patrick Dung patrick_...@yahoo.com.hk, freebsd hackers freebsd-hackers@freebsd.org, f...@freebsd.org
Date: Tuesday, December 25, 2012, 11:29 PM

On Tue, Dec 25, 2012 at 10:23:26AM -0500, Eitan Adler wrote:

On 25 December 2012 10:07, Konstantin Belousov kostik...@gmail.com wrote:

Enabling quota by default would cause a small overhead, such as one mutex acquire, for each inode and block alloc/dealloc, even for mounts without quotas enabled.

Why is this, and can it be avoided (for mounts without quotas)?

Because the system has to check whether quota is enabled before it can decide to do the accounting.

It might be reasonable to just enable it now. Unless somebody provides valid objections, and provided I do not forget, I will do it in a week for HEAD.

--
Eitan Adler