Re: About QUOTA support in stock kernel (resent)

2012-12-25 Thread Patrick Dung
Hi,

I would like to know why quota is not enabled in the stock kernel.

As I remember, it has not been enabled since the FreeBSD 3.5 or FreeBSD 4 generation.
Now, in FreeBSD 9.0, it still needs a kernel rebuild.

I have heard that quota has a performance issue (the Giant lock).

Regards,
Patrick



--- On Sat, 12/22/12, Patrick Dung patrick_...@yahoo.com.hk wrote:

From: Patrick Dung patrick_...@yahoo.com.hk
Subject: About QUOTA support in stock kernel
To: freebsd-questi...@freebsd.org, freebsd hackers freebsd-hackers@freebsd.org
Date: Saturday, December 22, 2012, 1:35 AM

Hi,

I would like to know why quota is not enabled in the stock kernel.

As I remember, it has not been enabled since the FreeBSD 3.5 or FreeBSD 4 generation.
Now, in FreeBSD 9.0, it still needs a kernel rebuild.

I have heard that quota has a performance issue (the Giant lock).

Regards,
Patrick
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: About QUOTA support in stock kernel (resent)

2012-12-25 Thread Konstantin Belousov
On Tue, Dec 25, 2012 at 09:34:30PM +0800, Patrick Dung wrote:
> Hi,
>
> I would like to know why quota is not enabled in the stock kernel.
>
> As I remember, it has not been enabled since the FreeBSD 3.5 or FreeBSD 4 generation.
> Now, in FreeBSD 9.0, it still needs a kernel rebuild.
>
> I have heard that quota has a performance issue (the Giant lock).

Enabling quota by default would cause a small overhead, such as one mutex
acquisition for each inode and block alloc/dealloc, even for mounts without
quotas enabled.

Maybe it is reasonable to just enable it now. Unless somebody provides
valid objections, and if I do not forget, I will do it in a week for HEAD.




Re: About QUOTA support in stock kernel (resent)

2012-12-25 Thread Eitan Adler
On 25 December 2012 10:07, Konstantin Belousov kostik...@gmail.com wrote:
> Enabling quota by default would cause a small overhead, such as one mutex
> acquisition for each inode and block alloc/dealloc, even for mounts without
> quotas enabled.

Why is this, and can it be avoided (for mounts without quotas)?

> Maybe it is reasonable to just enable it now. Unless somebody provides
> valid objections, and if I do not forget, I will do it in a week for HEAD.



-- 
Eitan Adler


Re: About QUOTA support in stock kernel (resent)

2012-12-25 Thread Konstantin Belousov
On Tue, Dec 25, 2012 at 10:23:26AM -0500, Eitan Adler wrote:
> On 25 December 2012 10:07, Konstantin Belousov kostik...@gmail.com wrote:
> > Enabling quota by default would cause a small overhead, such as one mutex
> > acquisition for each inode and block alloc/dealloc, even for mounts without
> > quotas enabled.
>
> Why is this, and can it be avoided (for mounts without quotas)?
Because the system has to check whether quota is enabled in order to do
the accounting.

>
> > Maybe it is reasonable to just enable it now. Unless somebody provides
> > valid objections, and if I do not forget, I will do it in a week for HEAD.
>
> --
> Eitan Adler




Re: looking for someone to fix humanize_number (test cases included)

2012-12-25 Thread Clifton Royston
On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:
> Date: Sun, 23 Dec 2012 00:32:20 -0800
> From: John-Mark Gurney j...@funkthat.com
> To: hack...@freebsd.org
> Subject: looking for someone to fix humanize_number (test cases included)
> Message-ID: 20121223083220.gl1...@funkthat.com
> Content-Type: text/plain; charset=us-ascii
>
> I'm looking for a person who is interested in fixing up humanize_number.
>
> The other day I copied some data from a 7.2-R box to a 9.1-stable box
> and did a du -shc and a du -skc to check the results...  I noticed the -h
> run dropped from 11M to 10M, which I thought was weird...  Then I looked
> at the results from the -k run, but the new machine had a larger result
> (I copied from UFS to ZFS)...  It turns out that humanize_number was
> broken when doing rounding...  No longer does humanize_number round up
> at .5 or more of the prefix.
>
> So I decided to write a test program to test the output, and now I'm even
> more surprised by the output...  Neither 7.2-R nor 10-current give what
> I expect are the correct results...
>
> Feel free to take a look at the test program posted to:
> http://people.freebsd.org/~jmg/humanize_numbers/
>
> The .c contains what I think the output should be.
>
  I'm testing on 7.3R (yes, I know, I know, should be on 8 or 9) and
see similar results as to rounding problems; see below on the others.

> So far the bugs I know of:
> 1) rounding is incorrect (started this whole search)
> 2) buffer calculation is incorrect in some cases, index 11 should fit
> but doesn't
> 3) some cases zero is returned though it isn't zero, more like 0T for 512 G
> (indexes 16, 17, 22, 23)

  I think these last two are caused by integer wraparound and truncation in
the integer constant calculations of your test program, once you get beyond
1G.  Even though it's an anachronism in these days of 3TB disks and 8GB RAM
in laptops, int is still 32 bits in most C implementations, giving a maximum
range of +/- 2G, and arithmetic on unsuffixed integer constants is done in
int by default.  Your points 1 and 4 seem correct, at first glance.

  I've tweaked the test slightly to correct those cases - a trailing L
doesn't do it; you must also prefix the constant value with (int64_t) - and
that fixes the dimensionless 0s for 16, 17, 22, and 23, and the buffer error
for 11 (probably caused by some really weird value coming out after
truncation).

  There's another brain-blip bug which took me a couple of minutes of
staring at - your test skips over peta- and expects exa- (E) to come after
tera-.  Fixing that by replacing 1 E and 2 E with 1 P and 2 P
corrects a couple more errors.  I'm left with indexes 1-11 all showing one
less than expected (0 K for 1 K, and so on to 1 T for 2 T), and 25
and 27 showing the same problem - so at least it's down to just the rounding
problem.
  
  There's actually another problem implicit in the results from the rounding
problem - I think it should never yield 0 M instead of 512 K; for that
matter, I would think anything up to 999 K (divisor 1000) or 1023 K
(divisor 1024) should be represented with the smaller unit, not as 1 M.


> 4) man page is missing required sys/types.h include
>
> I'll work to get the code into the tree once we get it in a good state.
 
  No promises as I'm chronically oversubscribed but it's intrigued me and
I'll take a look.

> Please cc me as I'm not subscribed to -hackers.

  Done.

  -- Clifton

-- 
   Clifton Royston  --  clift...@iandicomputing.com / clift...@volcano.org
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: looking for someone to fix humanize_number (test cases included)

2012-12-25 Thread Clifton Royston
On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote:
> On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:
> > From: John-Mark Gurney j...@funkthat.com
> > To: hack...@freebsd.org
> > Subject: looking for someone to fix humanize_number (test cases included)
> >
> > I'm looking for a person who is interested in fixing up humanize_number.
...
> > So I decided to write a test program to test the output, and now I'm even
> > more surprised by the output...  Neither 7.2-R nor 10-current give what
> > I expect are the correct results...
> >
> > Feel free to take a look at the test program posted to:
> > http://people.freebsd.org/~jmg/humanize_numbers/
> >
> > The .c contains what I think the output should be.
> >
>   I'm testing on 7.3R (yes, I know, I know, should be on 8 or 9) and
> see similar results as to rounding problems; see below on the others.
>
> > So far the bugs I know of:
> > 1) rounding is incorrect (started this whole search)
...
> > 3) some cases zero is returned though it isn't zero, more like 0T for 512 G
> > (indexes 16, 17, 22, 23)
>
>   I think these last two are caused by integer wraparound and truncation in
> the integer constant calculations of your test program, once you get beyond
> 1G.
...
>   There's another brain-blip bug which took me a couple of minutes of
> staring at - your test skips over peta- and expects exa- (E) to come after
> tera-.  Fixing that by replacing 1 E and 2 E with 1 P and 2 P
> corrects a couple more errors.  I'm left with indexes 1-11 all showing one
> less than expected (0 K for 1 K, and so on to 1 T for 2 T), and 25
> and 27 showing the same problem - so at least it's down to just the rounding
> problem.
>
>   There's actually another problem implicit in the results from the rounding
> problem - I think it should never yield 0 M instead of 512 K; for that
> matter, I would think anything up to 999 K (divisor 1000) or 1023 K
> (divisor 1024) should be represented with the smaller unit, not as 1 M.

  Having looked more closely at your test, I now see that it forces the
current behavior by setting the buffer length to 4, leaving room for
only 3 characters - so that part is reasonable.  

  I also realized that the flags and scale fields in the structure
initialization in the test code are swapped, which seemed to explain
some problems.  However, switching the order to the correct one, so
that the flags were actually used, revealed a lot more problems, for 
instance:

mismatch on index 1, got: 500, expected 1 K. (correct!)
mismatch on index 2, got: 500, expected 1 M.
mismatch on index 3, got: 500, expected 1 G.
...
mismatch on index 7, got: 150, expected 2 K.
mismatch on index 8, got: 150, expected 2 M.
...

 I now question whether it's working correctly with any flags other
than 0.  The man page states:

  The len argument must be at least 4 plus the length of suffix, in
   order to ensure a useful result is generated into buffer. 

which this satisfies, but in fact larger sizes don't seem to be adequate
either; for example, with a 6-char buffer:

mismatch on index 1, got: 500, expected 1 K. (correct!)
mismatch on index 2, got: 5, expected 1 M.
mismatch on index 3, got: 5, expected 1 G.
...
mismatch on index 11, got: 15000, expected 2 P.
mismatch on index 13, got: 512 , expected 1 K. (correct!)
mismatch on index 14, got: 52428, expected 1 M.
mismatch on index 15, got: 53687, expected 1 G.
...

  I am bemused.
  -- Clifton

-- 
   Clifton Royston  --  clift...@iandicomputing.com / clift...@volcano.org
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: looking for someone to fix humanize_number (test cases included)

2012-12-25 Thread Clifton Royston
On Tue, Dec 25, 2012 at 08:23:55AM -1000, Clifton Royston wrote:
> On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote:
> > On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:
> > > From: John-Mark Gurney j...@funkthat.com
> > > To: hack...@freebsd.org
> > > Subject: looking for someone to fix humanize_number (test cases included)
> > >
> > > I'm looking for a person who is interested in fixing up humanize_number.
> ...
> > > So I decided to write a test program to test the output, and now I'm even
> > > more surprised by the output...  Neither 7.2-R nor 10-current give what
> > > I expect are the correct results...
> ...
>
>   I am bemused.

  I correct myself: the function works fine, and there are no bugs I
could find, though it's clear the man page could emphasize the correct
usage a bit more.

  I had to read the source several times and start on debugging it
before I understood the correct usage of the flag values with the scale
and flags parameters, despite the man page stating:

     The following flags may be passed in scale:

           HN_AUTOSCALE     Format the buffer using the lowest multiplier
                            possible.
           HN_GETSCALE      Return the prefix index number (the number of
                            times number must be divided to fit) instead of
                            formatting it to the buffer.

     The following flags may be passed in flags:

           HN_DECIMAL       If the final result is less than 10, display it
                            using one digit.
     ...
           HN_DIVISOR_1000  Divide number with 1000 instead of 1024.

  That is, certain flags must be passed in flags and others must only
be passed in scale - a bit counter-intuitive.  Also, scale == 0 is
clearly not interpreted as AUTOSCALE, but I am not yet clear how it is
being handled - it seems somewhat like AUTOSCALE but not identical.

  When the test program constant table is updated to pass the scale
flags as specified, as well as fixing the bugs mentioned in the
previous emails, it all passes except for the one (intentional?)
inconsistency that k is used in place of K if HN_DECIMAL is
enabled.

  The bug in the transfer speed results which prompted this inquiry
suggests that perhaps some clients of humanize_number in the codebase
are also passing the scale parameters incorrectly.  I would propose
accepting HN_AUTOSCALE and HN_GETSCALE in the flags field (they don't
overlap with other values) while continuing to accept them in the scale
field for backwards compatibility.  Trivial diff below.

  -- Clifton

-- 
--- /usr/src/lib/libutil/humanize_number.c	2010-12-28 09:36:31.0 -1000
+++ humanize_number.c	2012-12-25 09:36:36.0 -1000
@@ -54,7 +54,7 @@
     const char *suffix, int scale, int flags)
 {
 	const char *prefixes, *sep;
-	int	b, i, r, maxscale, s1, s2, sign;
+	int	b, i, r, maxscale, s1, s2, sign, autoscale, getscale;
 	int64_t	divisor, max;
 	size_t	baselen;
 
@@ -84,8 +84,10 @@
 #define	SCALE2PREFIX(scale)	(&prefixes[(scale) << 1])
 	maxscale = 7;
 
+	autoscale = (flags | scale) & HN_AUTOSCALE;
+	getscale  = (flags | scale) & HN_GETSCALE;
 	if (scale >= maxscale &&
-	    (scale & (HN_AUTOSCALE | HN_GETSCALE)) == 0)
+	    (autoscale | getscale) == 0)
 		return (-1);
 
 	if (buf == NULL || suffix == NULL)
@@ -114,7 +116,7 @@
 	if (len < baselen + 1)
 		return (-1);
 
-	if (scale & (HN_AUTOSCALE | HN_GETSCALE)) {
+	if (autoscale | getscale) {
 		/* See if there is additional columns can be used. */
 		for (max = 100, i = len - baselen; i-- > 0;)
 			max *= 10;
@@ -127,7 +129,7 @@
 		for (i = 0; bytes >= max - 50 && i < maxscale; i++)
 			bytes /= divisor;
 
-		if (scale & HN_GETSCALE)
+		if (getscale)
 			return (i);
 	} else
 		for (i = 0; i < scale && i < maxscale; i++)


Re: looking for someone to fix humanize_number (test cases included)

2012-12-25 Thread Eitan Adler
On 25 December 2012 14:46, Clifton Royston clift...@volcano.org wrote:
>   I correct myself: the function works fine, and there are no bugs I
> could find, though it's clear the man page could emphasize the correct
> usage a bit more.

Can you submit a diff to the man page as well? I figure if you got
confused at least 10 others got even more confused.



-- 
Eitan Adler


Re: FreeBSD for serious performance?

2012-12-25 Thread Dieter BSD
> Which device drivers?  We can't fix problems we don't know about.

ata(4) completely hung the system for 19 minutes (at which point
I manually intervened, see the PR), probably an infinite loop.

http://www.freebsd.org/cgi/query-pr.cgi?pr=170675

siis(4) and ahci(4) have also caused data loss, presumably by
blocking interrupts for too long.

Improving these drivers would be wonderful. But better yet,
can we please find a way to fix the underlying problem?

When a device driver handles an interrupt, it needs to block
further interrupts while it modifies its data structures. Otherwise
another interrupt coming in might cause it to mangle the data.
Right? But! Why does it need to block interrupts for everything?
Why does a disk driver need to block interrupts from Ethernet?
Why does Ethernet need to block Firewire? Why does Firewire
need to block USB? And so on. Can't the disk driver block just
its own interrupts and leave the other devices alone?

That way, when some device driver writer puts in DELAY(TOO_LONG),
at least the other devices will still work.

Alternatively, why couldn't the data structures be protected with
a mutex? Then the drivers shouldn't have to block even themselves.

Alternatively, why can't drivers have a polling option?
Yes, the extra overhead of polling sucks, but losing incoming
data sucks a lot more. I am not suggesting that polling should
be the default, just an option for those who need it.

Alternatively, some method I haven't thought of.

Current machines can have multiple disks, multiple Ethernets,
multiple pretty-much-any-device, multiple CPUs, etc. etc.
We have an SMP kernel to juggle those multiple CPUs. But we still
have this absurd bottleneck where the device drivers bring
everything to a screeching halt every time an interrupt happens.
And if the driver has a bug, or thinks there is a problem and
decides to keep DELAY()ing over and over, the entire machine
just locks up and stays locked up, often forever.

It isn't just me. I have seen quite a few threads where other
people are having the same problem.

This needs to be fixed.

(Fixing this is at *least* a Usenix paper.)

Re: FreeBSD for serious performance?

2012-12-25 Thread Adrian Chadd
Hi,

If the driver is doing something daft like DELAY(x) in a fast
interrupt handler which would lead to that behaviour, it should be
fixed.

If it's doing a DELAY(x) in a critical section, it should be fixed.

Otherwise, a DELAY(x) in a driver only chews CPU; the scheduler can
preempt that. I don't agree with this behaviour, but it's possible.

Now, it's quite likely you hit some kind of ata(4) bug which kept it
in a tight loop without some kind of "too many errors; bailing"
behaviour. I'm not an ata driver person; I have no idea why it's doing
that.

The driver shouldn't be disabling interrupts for other devices. That
happens in critical sections and when doing lock operations. ata(4)
doesn't call critical_* in the driver code. So it was likely just spinning
in some high-priority loop that nothing lower-priority could really do
anything about.

The next time it happens, please break into the debugger and grab some
debugging output. "show alllocks" and "ps" should be a good couple of
things to start with.

Alternatively, please find a SATA chipset that is currently actively maintained.

(Or alternatively, step up and help migrate the nvidia chipset
support out of ata(4).)



Adrian


Re: About QUOTA support in stock kernel (resent)

2012-12-25 Thread Patrick Dung
I am curious whether other operating systems have this performance impact.

Could we have a workaround, or does this need some code improvement?
For example:
do the checking/accounting only if the specific mount point has quota
enabled, etc.

Regards,
Patrick

--- On Tue, 12/25/12, Konstantin Belousov kostik...@gmail.com wrote:

From: Konstantin Belousov kostik...@gmail.com
Subject: Re: About QUOTA support in stock kernel (resent)
To: Eitan Adler li...@eitanadler.com
Cc: Patrick Dung patrick_...@yahoo.com.hk, freebsd hackers freebsd-hackers@freebsd.org, f...@freebsd.org
Date: Tuesday, December 25, 2012, 11:29 PM

On Tue, Dec 25, 2012 at 10:23:26AM -0500, Eitan Adler wrote:
> On 25 December 2012 10:07, Konstantin Belousov kostik...@gmail.com wrote:
> > Enabling quota by default would cause a small overhead, such as one mutex
> > acquisition for each inode and block alloc/dealloc, even for mounts without
> > quotas enabled.
>
> Why is this, and can it be avoided (for mounts without quotas)?
Because the system has to check whether quota is enabled in order to do
the accounting.

>
> > Maybe it is reasonable to just enable it now. Unless somebody provides
> > valid objections, and if I do not forget, I will do it in a week for HEAD.
>
> --
> Eitan Adler