Re: PING: fix ARG_MAX
On Wed, Sep 21, 2005 at 07:24:39AM -0600, Eric Blake wrote:
>According to Christopher Faylor on 9/20/2005 10:05 AM:
>>AFAICT, we're not talking about defaults.  We're talking about the
>>optimum setting.
>>
>>Your change to xargs doesn't permit me to go beyond 32K.  Personally,
>>I'd like to be able to override that.
>
>So would I.  See below.
>
>>I have a similar test which shows noticeable improvement when going
>>from 32K to 64K and minuscule-but-still-there improvements after that:
>
>Was this benchmark run on a modified xargs, or did you still suffer
>from the 32k limit?

It was a modified xargs and a modified cygwin that allowed command-line
lengths > 1M.  The fact that you see noticeable timing differences from
32768 to 262144 makes it pretty clear that xargs was actually using
these sizes.

An unmodified xargs would have given errors if I had attempted to use a
larger limit - hence my request to be allowed to use larger sizes.

cgf
Re: PING: fix ARG_MAX
According to Christopher Faylor on 9/20/2005 10:05 AM:
> AFAICT, we're not talking about defaults.  We're talking about the
> optimum setting.
>
> Your change to xargs doesn't permit me to go beyond 32K.  Personally,
> I'd like to be able to override that.

So would I.  See below.

> I have a similar test which shows noticeable improvement when going
> from 32K to 64K and minuscule-but-still-there improvements after that:

Was this benchmark run on a modified xargs, or did you still suffer from
the 32k limit?  xargs truncates the -s argument down to what it is told
is the system limit; use the undocumented 'xargs --show-limits' to prove
that you are getting the buffer size you are requesting.

> I am not really interested in providing a non-standard interface which
> would ultimately end up being used just by xargs.  That would mean that
> we're adding an interface to cygwin so that a UNIX program could work
> better with non-cygwin programs.  I think I've been pretty consistent
> in stating that I want to encumber cygwin as little as possible when it
> comes to accommodating non-cygwin programs.

POSIX allows extensions to sysconf and pathconf for a reason, but I can
understand if you are reluctant to add _PC_ARG_MAX.

> If you want to keep the 32K limit, that's ok with me.  I'd just ask
> that you make it possible to override it.

My current findutils release just bypasses the _SC_ARG_MAX check
altogether with a hard-coded 32k upper limit to -s, without touching the
code that defaults to 128k (since xargs automatically trims its default
down to the result of its _SC_ARG_MAX check as needed).  But my next
release of findutils, after cygwin 1.5.19 is out (where all cygwin
processes, and not just cygexec mount points, get the larger cygwin arg
limits), will change the default from 128k to 32k, but use the normal
_SC_ARG_MAX as the upper limit of -s.

So maybe instead of having _SC_ARG_MAX return 1 meg, you should make it
even larger, since cygwin processes really can pass more than 1 meg.

> But, then, I suspect that this wasn't overrideable when I was providing
> xargs either so you can feel free to ignore my request.

Correct, your earlier releases of xargs could not exceed your hardcoded
ARG_MAX limitation either.

--
Life is short - so eat dessert first!
Eric Blake             [EMAIL PROTECTED]
Re: PING: fix ARG_MAX
On Tue, Sep 20, 2005 at 06:43:20AM -0600, Eric Blake wrote:
>According to Christopher Faylor on 9/19/2005 8:31 AM:
>>If this is really true, then the findutils configury should be
>>attempting some kind of timing which finds that magic point where it
>>should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
>>it is in its best interests to ignore it because someone thinks that
>>the cost of processing each argument outweighs the benefits of forking
>>fewer tests.
>
>POSIX allows xargs to have a default size (currently, xargs defaults to
>128k unless otherwise constrained by _SC_ARG_MAX), and -s can change
>that size to anything within the range permitted by _SC_ARG_MAX.

AFAICT, we're not talking about defaults.  We're talking about the
optimum setting.

Your change to xargs doesn't permit me to go beyond 32K.  Personally,
I'd like to be able to override that.

>>Given that the cost of forking is much more expensive on cygwin than on
>>other systems, I really don't see how you can use this argument anyway
>>and, IMO, it doesn't make much sense on standard UNIX either.  If you
>>create more processes via fork you are invoking the OS and incurring
>>context switches.  You're still processing the same number of arguments
>>but you're just going to the OS to handle them more often.  I don't see
>>how that's ever a win.
>
>In isolation, no.  But it is what else you are doing with the arguments
>- the text processing of xargs to parse them into chunks, and the
>invoked utility's processing of its argv - that also consumes time.
>Also, lots of data tends to imply more page faults, which can be as
>expensive as context switches anyway.

Context switches also imply page faults.

>>I'm willing to be proven wrong by hard data, but I think that you and
>>the findutils mailing list shouldn't be making assumptions without data
>>to back them up.
>
>Did you not read the thread on bug-findutils?  Bob Proulx proposed a
>test that shows that there is NO MEASURABLE DIFFERENCE between a simple
>xargs beyond a certain -s:
>http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00038.html

No, I didn't read a thread on another mailing list.  Thank you for
providing references.

>Then I repeated the test on cygwin, and found similar results:
>http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00039.html
>
>There comes a point, even when all xargs is doing is invoking echo,
>where the cost of passing that much information through pipes does
>overtake the cost of forks.

I have a similar test which shows noticeable improvement when going from
32K to 64K and minuscule-but-still-there improvements after that:

#!/bin/sh
export TIMEFORMAT='real %3lR user %3lU sys %3lS'
for i in 20480 32768 65536 131072 262144 524288 1048576 2097152 4194304; do
  time /bin/bash -c "/bin/head -n15 /tmp/files | /bin/xargs -s$i echo >/dev/null"
done

timing 20480:   real 0m12.448s  user 0m18.408s  sys 0m7.223s
timing 32768:   real 0m8.448s   user 0m12.811s  sys 0m4.890s
timing 65536:   real 0m5.191s   user 0m8.472s   sys 0m3.085s
timing 131072:  real 0m4.318s   user 0m5.908s   sys 0m1.665s
timing 262144:  real 0m3.833s   user 0m4.841s   sys 0m1.213s
timing 524288:  real 0m3.566s   user 0m3.900s   sys 0m1.078s
timing 1048576: real 0m3.478s   user 0m3.564s   sys 0m0.665s
timing 2097152: real 0m3.417s   user 0m3.039s   sys 0m0.821s
timing 4194304: real 0m3.395s   user 0m3.370s   sys 0m0.823s

/tmp/files is the output of 'find /' on my system.  I prefer my test
because it measures the clock time of the entire operation rather than
just the amount of time taken by xargs.  YMMV.

What I think you can take away from this is that you can't make
assumptions about an optimal size that will work for every system.

>However, I am also keen on providing a more reasonable -s behavior in
>xargs.  If cygwin were to have pathconf(filename, _PC_ARG_MAX), where a
>PATH search were done when filename does not contain '/', then pathconf
>could return 32k for Windows processes, and unlimited (or an actual
>known limit) for cygwin processes, so that xargs could then allow
>unlimited -s sizes for cygwin processes but cap Windows processes at
>32k and never encounter E2BIG.

I am not really interested in providing a non-standard interface which
would ultimately end up being used just by xargs.  That would mean that
we're adding an interface to cygwin so that a UNIX program could work
better with non-cygwin programs.  I think I've been pretty consistent in
stating that I want to encumber cygwin as little as possible when it
comes to accommodating non-cygwin programs.

If you want to keep the 32K limit, that's ok with me.  I'd just ask that
you make it possible to override it.

But, then, I suspect that this wasn't overrideable when I was providing
xargs either so you can feel free to ignore my request.

cgf
Re: PING: fix ARG_MAX
According to Christopher Faylor on 9/19/2005 8:31 AM:
> If this is really true, then the findutils configury should be
> attempting some kind of timing which finds that magic point where it
> should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
> it is in its best interests to ignore it because someone thinks that
> the cost of processing each argument outweighs the benefits of forking
> fewer tasks.

POSIX allows xargs to have a default size (currently, xargs defaults to
128k unless otherwise constrained by _SC_ARG_MAX), and -s can change
that size to anything within the range permitted by _SC_ARG_MAX.

> Given that the cost of forking is much more expensive on cygwin than on
> other systems, I really don't see how you can use this argument anyway
> and, IMO, it doesn't make much sense on standard UNIX either.  If you
> create more processes via fork you are invoking the OS and incurring
> context switches.  You're still processing the same number of arguments
> but you're just going to the OS to handle them more often.  I don't see
> how that's ever a win.

In isolation, no.  But it is what else you are doing with the arguments
- the text processing of xargs to parse them into chunks, and the
invoked utility's processing of its argv - that also consumes time.
Also, lots of data tends to imply more page faults, which can be as
expensive as context switches anyway.

> I'm willing to be proven wrong by hard data but I think that you and
> the findutils mailing list shouldn't be making assumptions without data
> to back them up.

Did you not read the thread on bug-findutils?  Bob Proulx proposed a
test that shows that there is NO MEASURABLE DIFFERENCE between a simple
xargs beyond a certain -s:
http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00038.html

Then I repeated the test on cygwin, and found similar results:
http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00039.html

There comes a point, even when all xargs is doing is invoking echo,
where the cost of passing that much information through pipes does
overtake the cost of forks.

However, I am also keen on providing a more reasonable -s behavior in
xargs.  If cygwin were to have pathconf(filename, _PC_ARG_MAX), where a
PATH search were done when filename does not contain '/', then pathconf
could return 32k for Windows processes, and unlimited (or an actual
known limit) for cygwin processes, so that xargs could then allow
unlimited -s sizes for cygwin processes but cap Windows processes at
32k and never encounter E2BIG.

--
Life is short - so eat dessert first!
Eric Blake             [EMAIL PROTECTED]
Re: PING: fix ARG_MAX
On Mon, Sep 19, 2005 at 10:31:01AM -0400, Christopher Faylor wrote:
>On Mon, Sep 12, 2005 at 10:09:55PM -0600, Eric Blake wrote:
>>Also, the argument brought up on the findutils mailing list was that
>>beyond a certain size, the cost of processing each argument starts to
>>outweigh the benefits of forking fewer tasks, to the point that the
>>difference between a 32k ARG_MAX vs. a 1M ARG_MAX falls in the noise
>>when the same amount of data is divided by xargs into as few runs as
>>possible, so a 32k limit is not really penalizing cygwin apps.
>
>If this is really true, then the findutils configury should be
>attempting some kind of timing which finds that magic point where it
>should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
>it is in its best interests to ignore it because someone thinks that
>the cost of processing each argument outweighs the benefits of forking
>fewer tests.

tasks

cgf
Re: PING: fix ARG_MAX
On Mon, Sep 12, 2005 at 10:09:55PM -0600, Eric Blake wrote:
>Also, the argument brought up on the findutils mailing list was that
>beyond a certain size, the cost of processing each argument starts to
>outweigh the benefits of forking fewer tasks, to the point that the
>difference between a 32k ARG_MAX vs. a 1M ARG_MAX falls in the noise
>when the same amount of data is divided by xargs into as few runs as
>possible, so a 32k limit is not really penalizing cygwin apps.

If this is really true, then the findutils configury should be
attempting some kind of timing which finds that magic point where it
should be ignoring _SC_ARG_MAX.  It shouldn't be vaguely assuming that
it is in its best interests to ignore it because someone thinks that the
cost of processing each argument outweighs the benefits of forking fewer
tests.

Given that the cost of forking is much more expensive on cygwin than on
other systems, I really don't see how you can use this argument anyway
and, IMO, it doesn't make much sense on standard UNIX either.  If you
create more processes via fork you are invoking the OS and incurring
context switches.  You're still processing the same number of arguments
but you're just going to the OS to handle them more often.  I don't see
how that's ever a win.

I'm willing to be proven wrong by hard data, but I think that you and
the findutils mailing list shouldn't be making assumptions without data
to back them up.

cgf
Re: PING: fix ARG_MAX
According to Corinna Vinschen on 9/12/2005 9:22 AM:
>> Even with your recent patches to make cygwin programs receive longer
>> command lines, whether or not they are mounted cygexec, ARG_MAX should
>> still reflect the worst-case limit so that programs (like xargs) that
>> use ARG_MAX will work reliably even when invoking non-cygwin programs
>> that are really bound by the 32k limit.
>
> I had a short talk with Chris and we both agree that it doesn't make
> much sense to go down to the lowest limit just to accommodate
> non-Cygwin applications.  Users of those apps can easily use xargs -s,
> so why penalize Cygwin apps?

Well, for now, xargs in findutils-4.2.25-2 is already hardcoded to a 32k
max; attempting to use -s to get a larger value will fail, because of
the POSIX rules placed on xargs.

If, on the other hand, cygwin added pathconf(_PC_ARG_MAX) as a legal
extension to POSIX, then xargs could use its preferred 128k default when
calling cygwin apps, while using 32k for windows apps, without even
requiring users to supply -s; not to mention the fact that -s could then
be used to obtain larger command lines than even the default 128k for
cygwin apps.  With that extension in place, sysconf(_SC_ARG_MAX) at 32k
is not much of a limit for applications that know about cygwin's
extension.

Also, the argument brought up on the findutils mailing list was that
beyond a certain size, the cost of processing each argument starts to
outweigh the benefits of forking fewer tasks, to the point that the
difference between a 32k ARG_MAX vs. a 1M ARG_MAX falls in the noise
when the same amount of data is divided by xargs into as few runs as
possible, so a 32k limit is not really penalizing cygwin apps.

But since I have not provided a patch for pathconf(_PC_ARG_MAX), and I
do not have copyright assignment, I will be understanding if 1.5.19 is
released with _SC_ARG_MAX still broken in the corner cases.  Just be
aware that xargs will remain at its hardcoded 32k limit unless it can
find a way to query cygwin whether a particular executable can be given
a larger limit.

--
Life is short - so eat dessert first!
Eric Blake             [EMAIL PROTECTED]
Re: PING: fix ARG_MAX
Eric,

On Sep 10 14:55, Eric Blake wrote:
> Eric Blake byu.net> writes:
> > Just making sure this patch didn't fall through the cracks...
> >
> > > 2005-09-06  Eric Blake  byu.net>
> > >
> > >   * include/limits.h (ARG_MAX): New limit.
> > >   * sysconf.cc (sysconf): _SC_ARG_MAX: Use it.
>
> Even with your recent patches to make cygwin programs receive longer
> command lines, whether or not they are mounted cygexec, ARG_MAX should
> still reflect the worst-case limit so that programs (like xargs) that
> use ARG_MAX will work reliably even when invoking non-cygwin programs
> that are really bound by the 32k limit.

I had a short talk with Chris and we both agree that it doesn't make
much sense to go down to the lowest limit just to accommodate non-Cygwin
applications.  Users of those apps can easily use xargs -s, so why
penalize Cygwin apps?

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat, Inc.