Re: PING: fix ARG_MAX

2005-09-21 Thread Christopher Faylor
On Wed, Sep 21, 2005 at 07:24:39AM -0600, Eric Blake wrote:
>According to Christopher Faylor on 9/20/2005 10:05 AM:
>>AFAICT, we're not talking about defaults.  We're talking about the
>>optimum setting.
>>
>>Your change to xargs doesn't permit me to go beyond 32K.  Personally,
>>I'd like to be able to override that.
>
>So would I.  See below.
>
>>I have a similar test which shows noticeable improvement when going
>>from 32K to 64K and minuscule-but-still-there improvements after that:
>
>Was this benchmark run on a modified xargs, or did you still suffer
>from the 32k limit?

It was a modified xargs and a modified cygwin to allow command line
lengths > 1M.  I would think that the noticeable timing differences you
see between 32768 and 262144 make it pretty clear that xargs was
actually using these sizes.

An unmodified xargs would have given errors if I attempted to use a
larger limit - hence my request to be allowed to use larger sizes.

cgf


Re: PING: fix ARG_MAX

2005-09-21 Thread Eric Blake

According to Christopher Faylor on 9/20/2005 10:05 AM:
> AFAICT, we're not talking about defaults.  We're talking about the
> optimum setting.
> 
> Your change to xargs doesn't permit me to go beyond 32K.  Personally,
> I'd like to be able to override that.

So would I.  See below.

> 
> I have a similar test which shows noticeable improvement when going from
> 32K to 64K and minuscule-but-still-there improvements after that:

Was this benchmark run on a modified xargs, or did you still suffer from
the 32k limit?  xargs truncates the -s arg down to what it is told is the
system limit; use the undocumented xargs --show-limits to prove that you
are getting the buffer size you are requesting.
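
One way to check which buffer size is really in effect, offered as a sketch rather than anything posted in this thread: each echo invocation prints exactly one line, so counting output lines counts the command lines xargs built. The helper name count_runs and the 2000-item input are illustrative assumptions.

```sh
#!/bin/sh
# count_runs SIZE: feed 2000 short arguments to "xargs -s SIZE echo" and
# print how many times echo was invoked (one output line per invocation).
count_runs() {
    i=1
    while [ "$i" -le 2000 ]; do
        echo "$i"
        i=$((i + 1))
    done | xargs -s "$1" echo | wc -l | tr -d ' '
}
```

A lower count for a larger SIZE suggests the request is being honoured; identical counts for two sizes above a suspected cap suggest that -s is being truncated.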

> 
> I am not really interested in providing a non-standard interface which
> would ultimately end up being used just by xargs.  That would mean that
> we're adding an interface to cygwin so that a UNIX program could work
> better with non-cygwin programs.  I think I've been pretty consistent in
> stating that I want to encumber cygwin as little as possible when it
> comes to accommodating non-cygwin programs.

POSIX allows extensions to sysconf and pathconf for a reason, but I can
understand if you are reluctant to add _PC_ARG_MAX.

> 
> If you want to keep the 32K limit, that's ok with me.  I'd just ask that
> you make it possible to override it.

My current findutils release just bypasses the _SC_ARG_MAX check
altogether with a hard-coded 32k upper limit to -s, without touching the
code that defaults to 128k (since xargs automatically trims its default
down to the results of its _SC_ARG_MAX check as needed).  But my next
release of findutils, after cygwin 1.5.19 is out (where all cygwin
processes and not just cygexec mount points get the larger cygwin arg
limits), will change the default from 128k to 32k, but use the normal
_SC_ARG_MAX as the upper limit of -s.  So maybe instead of having
_SC_ARG_MAX return 1 meg, you should make it even larger, since cygwin
processes really can pass more than 1 meg.
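
The planned behaviour described above (default of 32k, with -s capped only by _SC_ARG_MAX) can be sketched roughly as follows. This is an illustration, not the findutils code: pick_size is a hypothetical helper, LIMIT stands in for the sysconf(_SC_ARG_MAX) result, and the sketch ignores the environment size and headroom that a real xargs also subtracts.

```sh
#!/bin/sh
# pick_size LIMIT [REQUESTED]: choose the buffer size for xargs.
# LIMIT stands in for sysconf(_SC_ARG_MAX); 32768 is the planned default
# when no -s value is requested. The result never exceeds LIMIT.
pick_size() {
    limit=$1
    requested=${2:-32768}
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"        # trim the request down to the system limit
    else
        echo "$requested"
    fi
}
```

For example, pick_size "$(getconf ARG_MAX)" 65536 would use the running system's limit as the cap.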
> 
> But, then, I suspect that this wasn't overrideable when I was providing
> xargs either so you can feel free to ignore my request.

Correct, your earlier releases of xargs could not exceed your hardcoded
ARG_MAX limitation either.

--
Life is short - so eat dessert first!

Eric Blake [EMAIL PROTECTED]


Re: PING: fix ARG_MAX

2005-09-20 Thread Christopher Faylor
On Tue, Sep 20, 2005 at 06:43:20AM -0600, Eric Blake wrote:
>According to Christopher Faylor on 9/19/2005 8:31 AM:
>>If this is really true, then the findutils configury should be
>>attempting some kind of timing which finds that magic point where it
>>should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
>>it is in its best interests to ignore it because someone thinks that
>>the cost of processing each argument outweighs the benefits of forking
>>fewer tests.
>
>POSIX allows xargs to have a default size (currently, xargs defaults to
>128k unless otherwise constrained by _SC_ARG_MAX), and that -s can
>change that size to anything within the range permitted by _SC_ARG_MAX.

AFAICT, we're not talking about defaults.  We're talking about the
optimum setting.

Your change to xargs doesn't permit me to go beyond 32K.  Personally,
I'd like to be able to override that.

>>Given that forking is much more expensive on cygwin than on
>>other systems I really don't see how you can use this argument anyway
>>and, IMO, it doesn't make much sense on standard UNIX either.  If you
>>create more processes via fork you are invoking the OS and incurring
>>context switches.  You're still processing the same number of arguments
>>but you're just going to the OS to handle them more often.  I don't see
>>how that's ever a win.
>
>In isolation, no.  But what else you do with the arguments also
>consumes time: the text processing xargs does to parse them into
>chunks, and the invoked utility's processing of its argv.  Also, lots
>of data tends to imply more page faults, which can be as expensive as
>context switches anyway.

Context switches also imply page faults.

>> I'm willing to be proven wrong by hard data but I think that you and the
>> findutils mailing list shouldn't be making assumptions without data to
>> back them up.
>
>Did you not read the thread on bug-findutils?  Bob Proulx proposed a test
>that shows that there is NO MEASURABLE DIFFERENCE in a simple xargs run
>beyond a certain -s:
>http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00038.html

No, I didn't read a thread in another mailing list.  Thank you for
providing references.

>Then I repeated the test on cygwin, and found similar results:
>http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00039.html
>
>There comes a point where, even when all xargs is doing is invoking echo,
>the cost of passing that much information through pipes overtakes the
>cost of forks.

I have a similar test which shows noticeable improvement when going from
32K to 64K and minuscule-but-still-there improvements after that:

#!/bin/sh
export TIMEFORMAT='real %3lR  user %3lU  sys %3lS'
for i in 20480 32768 65536 131072 262144 524288 1048576 2097152 4194304; do
 time /bin/bash -c "/bin/head -n15 /tmp/files | /bin/xargs -s$i echo >/dev/null"
done

timing 20480: real 0m12.448s  user 0m18.408s  sys 0m7.223s
timing 32768: real 0m8.448s  user 0m12.811s  sys 0m4.890s
timing 65536: real 0m5.191s  user 0m8.472s  sys 0m3.085s
timing 131072: real 0m4.318s  user 0m5.908s  sys 0m1.665s
timing 262144: real 0m3.833s  user 0m4.841s  sys 0m1.213s
timing 524288: real 0m3.566s  user 0m3.900s  sys 0m1.078s
timing 1048576: real 0m3.478s  user 0m3.564s  sys 0m0.665s
timing 2097152: real 0m3.417s  user 0m3.039s  sys 0m0.821s
timing 4194304: real 0m3.395s  user 0m3.370s  sys 0m0.823s

/tmp/files is the output of 'find /' on my system.

I prefer my test because it measures the clock time of the entire
operation rather than just the amount of time taken by xargs. YMMV.

What I think you can take away from this is that you can't make
assumptions about an optimal size that will work for every system.
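
To put numbers on the diminishing returns in the table above, here is a small sketch that computes the percentage drop in "real" time between two sizes. The helper name pct_drop is an assumption for illustration; the inputs are copied from the timings quoted above.

```sh
#!/bin/sh
# pct_drop OLD NEW: percentage reduction in wall-clock time when going
# from OLD seconds to NEW seconds, printed with one decimal place.
pct_drop() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

# "real" seconds from the table: 32768 -> 65536, then 2097152 -> 4194304.
pct_drop 8.448 5.191
pct_drop 3.417 3.395
```

On these figures the first doubling saves well over a third of the wall-clock time, while the last one saves under one percent.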

>However, I am also keen on providing a more reasonable -s behavior in
>xargs.  If cygwin were to have pathconf(filename, _PC_ARG_MAX), where a
>PATH search were done when filename does not contain '/', then pathconf
>could return 32k on Windows processes, and unlimited (or an actual known
>limit) for cygwin processes, so that xargs can then allow unlimited -s
>sizes for cygwin processes but cap windows processes at 32k and never
>encounter the E2BIG.

I am not really interested in providing a non-standard interface which
would ultimately end up being used just by xargs.  That would mean that
we're adding an interface to cygwin so that a UNIX program could work
better with non-cygwin programs.  I think I've been pretty consistent in
stating that I want to encumber cygwin as little as possible when it
comes to accommodating non-cygwin programs.

If you want to keep the 32K limit, that's ok with me.  I'd just ask that
you make it possible to override it.

But, then, I suspect that this wasn't overrideable when I was providing
xargs either so you can feel free to ignore my request.

cgf


Re: PING: fix ARG_MAX

2005-09-20 Thread Eric Blake

According to Christopher Faylor on 9/19/2005 8:31 AM:
> If this is really true, then the findutils configury should be
> attempting some kind of timing which finds that magic point where it
> should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
> it is in its best interests to ignore it because someone thinks that the
> cost of processing each argument outweighs the benefits of forking fewer
> tests.

POSIX allows xargs to have a default size (currently, xargs defaults to
128k unless otherwise constrained by _SC_ARG_MAX), and that -s can change
that size to anything within the range permitted by _SC_ARG_MAX.

> 
> Given that forking is much more expensive on cygwin than on
> other systems I really don't see how you can use this argument anyway
> and, IMO, it doesn't make much sense on standard UNIX either.  If you
> create more processes via fork you are invoking the OS and incurring
> context switches.  You're still processing the same number of arguments
> but you're just going to the OS to handle them more often.  I don't see
> how that's ever a win.

In isolation, no.  But what else you do with the arguments also consumes
time: the text processing xargs does to parse them into chunks, and the
invoked utility's processing of its argv.  Also, lots of data tends to
imply more page faults, which can be as expensive as context switches
anyway.

> 
> I'm willing to be proven wrong by hard data but I think that you and the
> findutils mailing list shouldn't be making assumptions without data to
> back them up.

Did you not read the thread on bug-findutils?  Bob Proulx proposed a test
that shows that there is NO MEASURABLE DIFFERENCE in a simple xargs run
beyond a certain -s:
http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00038.html

Then I repeated the test on cygwin, and found similar results:
http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00039.html

There comes a point where, even when all xargs is doing is invoking echo,
the cost of passing that much information through pipes overtakes the
cost of forks.

However, I am also keen on providing a more reasonable -s behavior in
xargs.  If cygwin were to have pathconf(filename, _PC_ARG_MAX), where a
PATH search were done when filename does not contain '/', then pathconf
could return 32k on Windows processes, and unlimited (or an actual known
limit) for cygwin processes, so that xargs can then allow unlimited -s
sizes for cygwin processes but cap windows processes at 32k and never
encounter the E2BIG.
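
The lookup proposed above can be sketched in shell terms, purely as an illustration of the idea: there is no such interface in cygwin, arg_limit_for is a hypothetical name, and treating "the file mentions cygwin1.dll" as the test for a cygwin program is an assumption made for this sketch.

```sh
#!/bin/sh
# arg_limit_for NAME: print a hypothetical per-executable argument limit.
# Assumption: a file that mentions cygwin1.dll is treated as a cygwin
# program; anything else is treated as a plain Windows program.
arg_limit_for() {
    case $1 in
        */*) exe=$1 ;;                               # contains '/': use as given
        *)   exe=$(command -v "$1") || return 1 ;;   # otherwise search PATH
    esac
    if grep -q 'cygwin1\.dll' "$exe" 2>/dev/null; then
        echo unlimited       # cygwin process: no 32k cap
    else
        echo 32768           # Windows process: 32k command-line cap
    fi
}
```

A caller such as xargs could then cap -s per target program instead of applying one worst-case limit to everything. Note that command -v returns a bare name for shell builtins, which this sketch does not handle.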

--
Life is short - so eat dessert first!

Eric Blake [EMAIL PROTECTED]


Re: PING: fix ARG_MAX

2005-09-19 Thread Christopher Faylor
On Mon, Sep 19, 2005 at 10:31:01AM -0400, Christopher Faylor wrote:
>On Mon, Sep 12, 2005 at 10:09:55PM -0600, Eric Blake wrote:
>>Also, the argument brought up on the findutils mailing list was that
>>beyond a certain size, the cost of processing each argument starts to
>>outweigh the benefits of forking fewer tasks, to the point that the
>>difference between a 32k ARG_MAX vs.  a 1M ARG_MAX falls in the noise
>>when the same amount of data is divided by xargs to as few runs as
>>possible, so a 32k limit is not really penalizing cygwin apps.
>
>If this is really true, then the findutils configury should be
>attempting some kind of timing which finds that magic point where it
>should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
>it is in its best interests to ignore it because someone thinks that the
>cost of processing each argument outweighs the benefits of forking fewer
>tests.
s/tests/tasks/

cgf


Re: PING: fix ARG_MAX

2005-09-19 Thread Christopher Faylor
On Mon, Sep 12, 2005 at 10:09:55PM -0600, Eric Blake wrote:
>Also, the argument brought up on the findutils mailing list was that
>beyond a certain size, the cost of processing each argument starts to
>outweigh the benefits of forking fewer tasks, to the point that the
>difference between a 32k ARG_MAX vs.  a 1M ARG_MAX falls in the noise
>when the same amount of data is divided by xargs to as few runs as
>possible, so a 32k limit is not really penalizing cygwin apps.

If this is really true, then the findutils configury should be
attempting some kind of timing which finds that magic point where it
should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
it is in its best interests to ignore it because someone thinks that the
cost of processing each argument outweighs the benefits of forking fewer
tests.

Given that forking is much more expensive on cygwin than on
other systems I really don't see how you can use this argument anyway
and, IMO, it doesn't make much sense on standard UNIX either.  If you
create more processes via fork you are invoking the OS and incurring
context switches.  You're still processing the same number of arguments
but you're just going to the OS to handle them more often.  I don't see
how that's ever a win.

I'm willing to be proven wrong by hard data but I think that you and the
findutils mailing list shouldn't be making assumptions without data to
back them up.

cgf


Re: PING: fix ARG_MAX

2005-09-12 Thread Eric Blake

According to Corinna Vinschen on 9/12/2005 9:22 AM:
>>Even with your recent patches to make cygwin programs receive longer command
>>lines, whether or not they are mounted cygexec, ARG_MAX should still reflect
>>the worst case limit so that programs (like xargs) that use ARG_MAX will work
>>reliably even when invoking non-cygwin programs that are really bound by the
>>32k limit.
> 
> 
> I had a short talk with Chris and we both agree that it doesn't make
> much sense to go down to the lowest limit just to accommodate
> non-Cygwin applications.  Users of those apps can easily use xargs -s
> so why penalize Cygwin apps?

Well, for now, xargs in findutils-4.2.25-2 is already hardcoded to 32k
max; attempting to use -s to get a larger value will fail, because of the
POSIX rules placed on xargs.  If, on the other hand, cygwin added
pathconf(_PC_ARG_MAX) as a legal extension to POSIX, then xargs could use
its preferred 128k default when calling cygwin apps, while using 32k for
windows apps without even requiring users to supply -s; not to mention the
fact that -s could then be used to obtain larger command lines than even
the default 128k for cygwin apps.  With that extension in place,
sysconf(_SC_ARG_MAX) at 32k is not much of a limit for applications that
know about cygwin's extension.

Also, the argument brought up on the findutils mailing list was that
beyond a certain size, the cost of processing each argument starts to
outweigh the benefits of forking fewer tasks, to the point that the
difference between a 32k ARG_MAX vs. a 1M ARG_MAX falls in the noise when
the same amount of data is divided by xargs to as few runs as possible, so
a 32k limit is not really penalizing cygwin apps.

But since I have not provided a patch for pathconf(_PC_ARG_MAX), and I do
not have copyright assignment, I will be understanding if 1.5.19 is
released with _SC_ARG_MAX still broken in the corner cases.  Just be aware
that xargs will remain at its hardcoded 32k limit unless it can find a way
to query cygwin whether a particular executable can be given a larger limit.

--
Life is short - so eat dessert first!

Eric Blake [EMAIL PROTECTED]


Re: PING: fix ARG_MAX

2005-09-12 Thread Corinna Vinschen
Eric,

On Sep 10 14:55, Eric Blake wrote:
> Eric Blake  byu.net> writes:
> 
> Just making sure this patch didn't fall through the cracks...
> 
> > 
> > 2005-09-06  Eric Blake   byu.net>
> > 
> > * include/limits.h (ARG_MAX): New limit.
> > * sysconf.cc (sysconf): _SC_ARG_MAX: Use it.
> 
> Even with your recent patches to make cygwin programs receive longer command
> lines, whether or not they are mounted cygexec, ARG_MAX should still reflect
> the worst case limit so that programs (like xargs) that use ARG_MAX will work
> reliably even when invoking non-cygwin programs that are really bound by the
> 32k limit.

I had a short talk with Chris and we both agree that it doesn't make
much sense to go down to the lowest limit just to accommodate
non-Cygwin applications.  Users of those apps can easily use xargs -s
so why penalize Cygwin apps?


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat, Inc.