Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
On Nov 9 21:14, Eric Blake wrote:
> According to Corinna Vinschen on 11/9/2009 7:05 AM:
> > This part of the testcase
> >
> >   data2 = (char *) malloc (2 * pagesize);
> >   if (!data2)
> >     return 1;
> >   data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
> >   if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
> >                      MAP_PRIVATE | MAP_FIXED, fd, 0L))
> >     return 1;
> >
> > is bad.  The chance that the address of data2 is not usable for mmap on
> > Windows/Cygwin is 100%.
>
> But in testing this further, I discovered that you CAN do:
>
> data2 = mmap(...);
> munmap (data2,...);
> mmap (data2, ... MAP_FIXED)
>
> and get success on cygwin.

Yes, but basically only if you unmap the entire mmaped region.  See below.

> So I will be updating autoconf accordingly,
> based on the STD below.  Unfortunately, it looks like I also found a hole
> in cygwin.  Consider this (borrowing heavily from the autoconf test that I
> am fixing):
> [...]
> This test behaves differently on Linux than on cygwin; on Linux, both
> './foo' and './foo 1' give status 0, but on cygwin, './foo' gives status
> 6, and only './foo 1' succeeds.  In other words, the second mmap fails if
> there is no intermediate munmap.
>
> POSIX apparently allows cygwin's behavior:
>
> "If MAP_FIXED is set, mmap() may return MAP_FAILED and set errno to
> [EINVAL]. If a MAP_FIXED request is successful, the mapping established by
> mmap() replaces any previous mappings for the pages in the range
> [pa,pa+len) of the process."
>
> However, since we already have to maintain a list of mappings in order to
> implement fork(), it seems like it would be easy to fix cygwin to
> implicitly munmap anything that would otherwise be in the way of a
> subsequent MAP_FIXED request, rather than blindly calling
> NtMapViewOfSection and failing because of the overlap, so that we could be
> even more like Linux behavior.

That's tricky and bound to fail.

The problem is that, in Windows, you can't partially munmap an mmap'ed
region.  NtUnmapViewOfSection only allows unmapping an entire section.
So, with the bookkeeping in Cygwin you can re-use a partially unmapped
region of anonymous memory to map new anonymous memory, but you can't
reuse a partially unmapped region to mmap another file at this point in
memory, nor even the same file with just another offset.

The only way around this problem would be to always map files and
anonymous memory in single 64K chunks, so that every page of a map can
actually be unmapped at the OS level.  But in that case the process of
allocating memory is not atomic anymore, so we get the other potential
problem of not being able to fulfill a request because another thread
has called VirtualAlloc one way or the other.

> > That's why I think we need at least two tests in autoconf, a generic
> > mmap test and a mmap test for the "mmap private/shared fixed at
> > somewhere already mapped" case, if an application actually insists on
> > using that.
>
> In the case of the autoconf test, I think a single test is still
> sufficient, once it is fixed to be portable to what POSIX requires.

One problem is actually grep, which started the entire discussion.  It
really uses malloc/mmap(MAP_FIXED), along the lines of what the HAVE_MMAP
test tests.  Fortunately, grep doesn't fail if mmap returns an error, so
it doesn't hurt.  Of course it would be nice if grep used mmap in a more
portable way.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
[please limit replies about the patch itself to autoconf-patches]

According to Corinna Vinschen on 11/9/2009 7:05 AM:
> This part of the testcase
>
>   data2 = (char *) malloc (2 * pagesize);
>   if (!data2)
>     return 1;
>   data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
>   if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
>                      MAP_PRIVATE | MAP_FIXED, fd, 0L))
>     return 1;
>
> is bad.  The chance that the address of data2 is not usable for mmap on
> Windows/Cygwin is 100%.  The problem here is that the generic HAVE_MMAP
> test tests one certain feature, which is not usable on Windows, and which
> is non-portable.

MAP_FIXED appears to be more portable when the fixed address was obtained
from a previous mmap call.  Therefore, this patch fixes the macro and also
makes configure failures pinpoint more accurately why the test declares
failure.

I don't have access to HP-UX 11, which is another platform where
AC_FUNC_MMAP was failing; I would appreciate it if someone else could see
whether this makes a difference there.  But I have verified that this now
sets HAVE_MMAP for cygwin 1.5.x and cygwin 1.7 where the old version
failed, and that it does not change behavior on Linux or OpenBSD.

--
Don't work too hard, make some time for fun as well!

Eric Blake             e...@byu.net

From fb1f28a2ff2c688e63dc97ece7fde86e16864491 Mon Sep 17 00:00:00 2001
From: Eric Blake
Date: Mon, 9 Nov 2009 21:45:00 -0700
Subject: [PATCH] Fix AC_FUNC_MMAP for cygwin.

* lib/autoconf/functions.m4 (AC_FUNC_MMAP): Make the test more
portable: Actually check for <sys/param.h>, and only use MAP_FIXED
on an address previously returned from mmap.
* THANKS: Update.
Reported by Corinna Vinschen.

Signed-off-by: Eric Blake
---
 ChangeLog                 |    9 +++
 NEWS                      |    3 ++
 lib/autoconf/functions.m4 |   55 ++--
 3 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4d028c0..77e9d4e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2009-11-09  Eric Blake
+
+        Fix AC_FUNC_MMAP for cygwin.
+        * lib/autoconf/functions.m4 (AC_FUNC_MMAP): Make the test more
+        portable: Actually check for <sys/param.h>, and only use MAP_FIXED
+        on an address previously returned from mmap.
+        * THANKS: Update.
+        Reported by Corinna Vinschen.
+
 2009-11-04  Eric Blake

         Redocument AS_DIRNAME, even with its flaws.
diff --git a/NEWS b/NEWS
index 9e7e64c..86a0c3f 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,9 @@ GNU Autoconf NEWS - User visible changes.
    longer mistakenly select a 32-bit type on some compilers (bug present
    since macros were introduced in 2.59c).

+** The AC_FUNC_MMAP macro has been fixed to be portable to systems like
+   Cygwin (bug present since macro was introduced in 2.0).
+
 ** The following documented autotest macros are new:

    AT_CHECK_EUNIT
diff --git a/lib/autoconf/functions.m4 b/lib/autoconf/functions.m4
index 946a646..6b6e7fc 100644
--- a/lib/autoconf/functions.m4
+++ b/lib/autoconf/functions.m4
@@ -1186,9 +1186,9 @@ AU_ALIAS([AM_FUNC_MKTIME], [AC_FUNC_MKTIME])
 # ------------
 AN_FUNCTION([mmap], [AC_FUNC_MMAP])
 AC_DEFUN([AC_FUNC_MMAP],
-[AC_CHECK_HEADERS(stdlib.h unistd.h)
-AC_CHECK_FUNCS(getpagesize)
-AC_CACHE_CHECK(for working mmap, ac_cv_func_mmap_fixed_mapped,
+[AC_CHECK_HEADERS_ONCE([stdlib.h unistd.h sys/param.h])
+AC_CHECK_FUNCS([getpagesize])
+AC_CACHE_CHECK([for working mmap], [ac_cv_func_mmap_fixed_mapped],
 [AC_RUN_IFELSE([AC_LANG_SOURCE([AC_INCLUDES_DEFAULT]
 [[/* malloc might have been renamed as rpl_malloc. */
 #undef malloc
@@ -1224,11 +1224,6 @@ char *malloc ();

 /* This mess was copied from the GNU getpagesize.h.  */
 #ifndef HAVE_GETPAGESIZE
-/* Assume that all systems that can run configure have sys/param.h.  */
-# ifndef HAVE_SYS_PARAM_H
-#  define HAVE_SYS_PARAM_H 1
-# endif
-
 # ifdef _SC_PAGESIZE
 #  define getpagesize() sysconf(_SC_PAGESIZE)
 # else /* no _SC_PAGESIZE */
@@ -1264,7 +1259,7 @@ main ()
 {
   char *data, *data2, *data3;
   int i, pagesize;
-  int fd;
+  int fd, fd2;

   pagesize = getpagesize ();

@@ -1277,27 +1272,41 @@ main ()
   umask (0);
   fd = creat ("conftest.mmap", 0600);
   if (fd < 0)
-    return 1;
+    return 2;
   if (write (fd, data, pagesize) != pagesize)
-    return 1;
+    return 3;
   close (fd);

+  /* Next, check that the tail of a page is zero-filled.  File must have
+     non-zero length, otherwise we risk SIGBUS for entire page.  */
+  fd2 = open ("conftest.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
According to Corinna Vinschen on 11/9/2009 7:05 AM:
> This part of the testcase
>
>   data2 = (char *) malloc (2 * pagesize);
>   if (!data2)
>     return 1;
>   data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
>   if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
>                      MAP_PRIVATE | MAP_FIXED, fd, 0L))
>     return 1;
>
> is bad.  The chance that the address of data2 is not usable for mmap on
> Windows/Cygwin is 100%.

But in testing this further, I discovered that you CAN do:

data2 = mmap(...);
munmap (data2,...);
mmap (data2, ... MAP_FIXED)

and get success on cygwin.  So I will be updating autoconf accordingly,
based on the STD below.  Unfortunately, it looks like I also found a hole
in cygwin.  Consider this (borrowing heavily from the autoconf test that I
am fixing):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

int
main (int argc, char **argv)
{
  char *data, *data2, *data3;
  int i, pagesize;
  int fd, fd2;

  pagesize = getpagesize ();

  /* First, make a file with some known garbage in it. */
  data = (char *) malloc (pagesize);
  if (!data)
    return 1;
  for (i = 0; i < pagesize; ++i)
    *(data + i) = rand ();
  umask (0);
  fd = creat ("conftest.mmap", 0600);
  if (fd < 0)
    return 2;
  if (write (fd, data, pagesize) != pagesize)
    return 3;
  close (fd);

  /* Next, check that a page is zero-filled if not backed by a file.  */
  fd2 = open ("conftest.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd2 < 0)
    return 11;
  data2 = "";
  if (write (fd2, data2, 1) != 1)
    return 12;
  else
    /* We expect mmap to succeed, but reads to give SIGBUS, since
       mapped region is an entire page beyond bounds of mapped file.  */
    ;
  data2 = mmap (0, pagesize, PROT_READ | PROT_WRITE, MAP_SHARED, fd2, 0L);
  if (data2 == MAP_FAILED)
    return 14;
  printf ("mapped %p\n", data2);
  for (i = 0; i < pagesize; ++i)
    if (*(data2 + i))
      {
        printf ("%p, %x\n", data2 + i, *(data2 + i));
        return 15;
      }
  close (fd2);
  /* With any command-line argument, unmap before the MAP_FIXED call.  */
  if (argc > 1)
    munmap (data2, pagesize);

  /* Next, try to mmap the file at a fixed address which already has
     something else allocated at it.  If we can, also make sure that
     we see the same garbage.  */
  fd = open ("conftest.mmap", O_RDWR);
  if (fd < 0)
    return 4;
  if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_FIXED, fd, 0L))
    return 6;
  for (i = 0; i < pagesize; ++i)
    if (*(data + i) != *(data2 + i))
      {
        printf ("%p, exp %x, got %x\n", data2 + i, *(data + i),
                *(data2 + i));
        return 7;
      }

  /* Finally, make sure that changes to the mapped area do not
     percolate back to the file as seen by read().  (This is a bug on
     some variants of i386 svr4.0.)  */
  for (i = 0; i < pagesize; ++i)
    *(data2 + i) = *(data2 + i) + 1;
  data3 = (char *) malloc (pagesize);
  if (!data3)
    return 8;
  if (read (fd, data3, pagesize) != pagesize)
    return 9;
  for (i = 0; i < pagesize; ++i)
    if (*(data + i) != *(data3 + i))
      return 10;
  close (fd);
  return 0;
}

This test behaves differently on Linux than on cygwin; on Linux, both
'./foo' and './foo 1' give status 0, but on cygwin, './foo' gives status
6, and only './foo 1' succeeds.  In other words, the second mmap fails if
there is no intermediate munmap.

POSIX apparently allows cygwin's behavior:

"If MAP_FIXED is set, mmap() may return MAP_FAILED and set errno to
[EINVAL]. If a MAP_FIXED request is successful, the mapping established by
mmap() replaces any previous mappings for the pages in the range
[pa,pa+len) of the process."

However, since we already have to maintain a list of mappings in order to
implement fork(), it seems like it would be easy to fix cygwin to
implicitly munmap anything that would otherwise be in the way of a
subsequent MAP_FIXED request, rather than blindly calling
NtMapViewOfSection and failing because of the overlap, so that we could be
even more like Linux behavior.

> That's why I think we need at least two tests in autoconf, a generic
> mmap test and a mmap test for the "mmap private/shared fixed at
> somewhere already mapped" case, if an application actually insists on
> using that.

In the case of the autoconf test, I think a single test is still
sufficient, once it is fixed to be portable to what POSIX requires.
gnulib provides a more interesting test, for whether MAP_ANON works.
http://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/mmap-anon.m4

--
Eric Blake             e...@byu.net
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Nov 9 10:22, aputerguy wrote:
> My only remaining question is can we assume that this bug (or bad coding) is
> grep-specific or is it likely to rear its head in other core *nix utilities
> that use UTF-8?

Who knows?  Nobody is immune against creating bad code, right?

Corinna
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Corinna writes:
> I'm glad to read that, but I only debugged the problem.  The Fedora
> fix was applied by Chris.

Well it works for me too and as the OP of the problem, I extend my thanks
to both of you and all the others who helped in debugging and coming up
with such a quick fix.

My only remaining question is can we assume that this bug (or bad coding)
is grep-specific or is it likely to rear its head in other core *nix
utilities that use UTF-8?
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
On Nov 9 05:50, Eric Blake wrote:
> According to Corinna Vinschen on 11/9/2009 4:59 AM:
> > MAP_FIXED
> > [...]
> > If the specified address cannot be used, mmap() will fail.  Because
> > requiring a fixed address for a mapping is less portable, the use of
> > this option is discouraged.
>
> It's an upstream issue now ;)
>
> The problem is that I need some more advice from the cygwin list on how
> best to fix the test to pass on cygwin by default.  I'm hoping to release
> autoconf 2.65 this week, so a speedy fix to help this issue go away before
> the release would be extra nice.

This part of the testcase

  data2 = (char *) malloc (2 * pagesize);
  if (!data2)
    return 1;
  data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
  if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_FIXED, fd, 0L))
    return 1;

is bad.  The chance that the address of data2 is not usable for mmap on
Windows/Cygwin is 100%.  The problem here is that the generic HAVE_MMAP
test tests one certain feature, which is not usable on Windows, and which
is non-portable.  So, on Cygwin this test always fails, and all
applications using this test in good faith will never use mmap on Cygwin,
just because the single case of "mmap private fixed at somewhere already
mapped" doesn't work.

In fact, most applications don't need this case.  And grep wouldn't need
it either, since the method used in grep would also work if the area
hadn't been malloced before, if it just used the address returned by mmap
as the buffer.

That's why I think we need at least two tests in autoconf, a generic
mmap test and a mmap test for the "mmap private/shared fixed at
somewhere already mapped" case, if an application actually insists on
using that.

Corinna
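The alternative suggested in this message, using the address returned by mmap as the buffer instead of forcing a malloc'ed address with MAP_FIXED, might look like the sketch below. The function name map_file_anywhere is made up for illustration and this is not grep's code; it only shows the general shape of the portable call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first LEN bytes of PATH as a private,
   writable copy and let the system choose the address.  Returns the
   mapping, or MAP_FAILED on error.  */
void *
map_file_anywhere (const char *path, size_t len)
{
  void *buf;
  int fd = open (path, O_RDONLY);

  if (fd < 0)
    return MAP_FAILED;

  /* No address hint and no MAP_FIXED: whatever mmap returns is the
     buffer.  A private mapping only needs the file open for reading,
     even when PROT_WRITE is requested.  */
  buf = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

  /* The mapping stays valid after the descriptor is closed.  */
  close (fd);
  return buf;
}
```

The caller reads from the returned pointer and releases it with munmap (buf, len) when done. Nothing here depends on a particular address being free.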
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
According to Corinna Vinschen on 11/9/2009 4:59 AM:
>>> I just found that the latest autoconf *still* has this broken test
>>> for mmap, which basically calls
>>>
>>>   data2 = malloc (size);
>>>   mmap(data2, ...);
>>>
>>> Why has this test never been fixed?  Chuck?
>> ...err, 'cause I didn't realize it was a problem.  I see that cygport has
>> hidden this for years:
>>
>> # AC_HAVE_MMAP fails despite a working mmap, so we force this to yes
>> # (see http://www.cygwin.com/ml/cygwin/2004-09/msg00741.html
>> # and following thread for details)
>> export ac_cv_func_mmap_fixed_mapped=yes;
>>
>> NTTAWWT, but it never triggered my "gee I ought to fix that" reflex.  I
>> agree this should be fixed, but I'm leery of changing an autoconf test
>> without knowing how that change will affect the other 9,236 platforms
>
> The problem in this testcase is the fact that it calls malloc, then
> computes the next page-aligned free address after the mallocated area
> and then tries to mmap to this address with MAP_FIXED set.  Sure, this
> *might* work, and it works on most systems, but there's no reason at all
> to *expect* that it works since it only works by chance.  The memory
> addresses can be taken by anything and to require that an arbitrary
> fixed address is available to mmap is just plain wrong.  From the
> Linux man page:
>
>   MAP_FIXED
>   [...]
>   If the specified address cannot be used, mmap() will fail.  Because
>   requiring a fixed address for a mapping is less portable, the use of
>   this option is discouraged.
>
> Since autoconf is supposed to help applications to be more portable,
> it's not really feasible, IMHO, that autoconf requires a non-portable
> feature to work.
>
> It's frustrating that mmap() and even mmap(MAP_FIXED)
> works fine on Cygwin, just not in the non-portable way it's tested
> in the autoconf test.  Maybe we need two mmap tests in autoconf, one
> for mmap in general, the other for MAP_FIXED issues.
>
>> I think this is an issue for the autoconf list as a whole.  Would you --
>> or Eric -- care to raise it there?  Especially as you seemed to have
>> quite strong feelings about it back in 2004:
>> http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html
>
> I had hoped that you, as the autoconf maintainer, would put this
> upstream...

It's an upstream issue now ;)

The problem is that I need some more advice from the cygwin list on how
best to fix the test to pass on cygwin by default.  I'm hoping to release
autoconf 2.65 this week, so a speedy fix to help this issue go away before
the release would be extra nice.

--
Eric Blake             e...@byu.net
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
On Nov 8 14:07, Charles Wilson wrote:
> Corinna Vinschen wrote:
> > On Nov 8 14:56, Corinna Vinschen wrote:
> >> Btw., the check for mmap in grep's configure file is broken.  It tries
> >> to mmap to a fixed address formerly allocated via malloc().  This doesn't
> >> work on Windows.  An autoconf run with a newer version of autoconf would
> >> be nice.
> >
> > I just found that the latest autoconf *still* has this broken test
> > for mmap, which basically calls
> >
> >   data2 = malloc (size);
> >   mmap(data2, ...);
> >
> > Why has this test never been fixed?  Chuck?
>
> ...err, 'cause I didn't realize it was a problem.  I see that cygport has
> hidden this for years:
>
> # AC_HAVE_MMAP fails despite a working mmap, so we force this to yes
> # (see http://www.cygwin.com/ml/cygwin/2004-09/msg00741.html
> # and following thread for details)
> export ac_cv_func_mmap_fixed_mapped=yes;
>
> NTTAWWT, but it never triggered my "gee I ought to fix that" reflex.  I
> agree this should be fixed, but I'm leery of changing an autoconf test
> without knowing how that change will affect the other 9,236 platforms

The problem in this testcase is the fact that it calls malloc, then
computes the next page-aligned free address after the mallocated area
and then tries to mmap to this address with MAP_FIXED set.  Sure, this
*might* work, and it works on most systems, but there's no reason at all
to *expect* that it works since it only works by chance.  The memory
addresses can be taken by anything and to require that an arbitrary
fixed address is available to mmap is just plain wrong.  From the
Linux man page:

  MAP_FIXED
  [...]
  If the specified address cannot be used, mmap() will fail.  Because
  requiring a fixed address for a mapping is less portable, the use of
  this option is discouraged.

Since autoconf is supposed to help applications to be more portable,
it's not really feasible, IMHO, that autoconf requires a non-portable
feature to work.

It's frustrating that mmap() and even mmap(MAP_FIXED) work fine on
Cygwin, just not in the non-portable way it's tested in the autoconf
test.  Maybe we need two mmap tests in autoconf, one for mmap in
general, the other for MAP_FIXED issues.

> I think this is an issue for the autoconf list as a whole.  Would you --
> or Eric -- care to raise it there?  Especially as you seemed to have
> quite strong feelings about it back in 2004:
> http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html

I had hoped that you, as the autoconf maintainer, would put this
upstream...

Corinna
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Nov 8 18:41, Jim Reisert AD1C wrote:
> Corinna, the new grep works super great - thanks!

I'm glad to read that, but I only debugged the problem.  The Fedora
fix was applied by Chris.

Corinna
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Corinna, the new grep works super great - thanks!

--
Jim Reisert AD1C, http://www.ad1c.us
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
Corinna Vinschen wrote: > On Nov 8 14:56, Corinna Vinschen wrote: >> Btw., the check for mmap in grep's configure file is broken. It tries >> to mmap to a fixed address formerly allocated via malloc(). This doesn't >> work on Windows. An autoconf run with a newer version of autoconf would >> be nice. > > I just found that the latest autoconf *still* has this broken test > for mmap, which basically calls > > data2 = malloc (size); > mmap(data2, ...); > > Why has this test never been fixed? Chuck? ...err, 'cause I didn't realize it was a problem. I see that cygport has hidden this for years: # AC_HAVE_MMAP fails despite a working mmap, so we force this to yes # (see http://www.cygwin.com/ml/cygwin/2004-09/msg00741.html # and following thread for details) export ac_cv_func_mmap_fixed_mapped=yes; NTTAWWT, but it never triggered my "gee I ought to fix that" reflex. I agree this should be fixed, but I'm leery of changing an autoconf test without knowing how that change will affect the other 9,236 platforms that may depend on the current behavior, esp. given my current (lack of) knowledge about how mmap is *supposed* to work in the various MAP_* modes. I think this is an issue for the autoconf list as a whole. Would you -- or Eric -- care to raise it there? Especially as you seemed to have quite strong feelings about it back in 2004: http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html > The mmap test is crap. How can an application expect to be able to > access just about every address together with MAP_FIXED? > > Consequentially MapViewOfFileEx returns error 487 in these cases, > "Attempt to access invalid address." > > That's just another example of a crappy autoconf mmap test. -- Chuck -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Sun, Nov 08, 2009 at 12:27:29PM -0600, Yaakov (Cygwin/X) wrote:
>On 08/11/2009 07:56, Corinna Vinschen wrote:
>> Btw., the check for mmap in grep's configure file is broken.  It tries
>> to mmap to a fixed address formerly allocated via malloc().  This doesn't
>> work on Windows.  An autoconf run with a newer version of autoconf would
>> be nice.
>
>You said the same thing over five years ago: :-)
>
>http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html
>
>AFAIK the autoconf test has not changed since then.  cygport sets
>ac_cv_func_mmap_fixed_mapped=yes for this very reason.

FWIW, this test doesn't really matter to my build of grep since it is not
used when cross-compiling.

cgf
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On 08/11/2009 07:56, Corinna Vinschen wrote:
> Btw., the check for mmap in grep's configure file is broken.  It tries
> to mmap to a fixed address formerly allocated via malloc().  This doesn't
> work on Windows.  An autoconf run with a newer version of autoconf would
> be nice.

You said the same thing over five years ago: :-)

http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html

AFAIK the autoconf test has not changed since then.  cygport sets
ac_cv_func_mmap_fixed_mapped=yes for this very reason.

Yaakov
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
On Sun, Nov 08, 2009 at 10:51:56AM -0500, Ralph Hempel wrote:
>Corinna Vinschen wrote:
>> On Nov 8 14:56, Corinna Vinschen wrote:
>>> Btw., the check for mmap in grep's configure file is broken.  It tries
>>> to mmap to a fixed address formerly allocated via malloc().  This doesn't
>>> work on Windows.  An autoconf run with a newer version of autoconf would
>>> be nice.
>>
>> I just found that the latest autoconf *still* has this broken test
>> for mmap, which basically calls
>>
>>   data2 = malloc (size);
>>   mmap(data2, ...);
>>
>> Why has this test never been fixed?  Chuck?
>
>I can't answer that question but this thread points out very important
>lessons in debugging specifically and projects in general.
>
>1. Easily reproducible test cases are critical to getting someone
>interested in fixing your problem.
>
>2. Having the good fortune to have somebody run the test case and
>duplicate the problem helps a bit more.
>
>3. Having that person challenge the assumptions under which the code
>has been working for YEARS without a complaint helps a bit more.
>
>4. Having that person do a great analysis that shows why the problem
>exists helps even more.
>
>5. Going even one step further and trying to figure out why the
>problem has existed for years and what else might be wrong is
>just the icing on the cake.
>
>Bravo Corinna - on a Sunday no less...

6. Googling for the problem is always a good thing to do.

Once it was clear that this was a character set issue in grep it was
easy enough to find a fix since it was already in a couple of linux bug
trackers.

cgf
Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
Corinna Vinschen wrote:
> On Nov 8 14:56, Corinna Vinschen wrote:
>> Btw., the check for mmap in grep's configure file is broken.  It tries
>> to mmap to a fixed address formerly allocated via malloc().  This doesn't
>> work on Windows.  An autoconf run with a newer version of autoconf would
>> be nice.
>
> I just found that the latest autoconf *still* has this broken test
> for mmap, which basically calls
>
>   data2 = malloc (size);
>   mmap(data2, ...);
>
> Why has this test never been fixed?  Chuck?

I can't answer that question but this thread points out very important
lessons in debugging specifically and projects in general.

1. Easily reproducible test cases are critical to getting someone
interested in fixing your problem.

2. Having the good fortune to have somebody run the test case and
duplicate the problem helps a bit more.

3. Having that person challenge the assumptions under which the code
has been working for YEARS without a complaint helps a bit more.

4. Having that person do a great analysis that shows why the problem
exists helps even more.

5. Going even one step further and trying to figure out why the
problem has existed for years and what else might be wrong is
just the icing on the cake.

Bravo Corinna - on a Sunday no less...

Cheers, Ralph
Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)
On Nov 8 14:56, Corinna Vinschen wrote:
> Btw., the check for mmap in grep's configure file is broken.  It tries
> to mmap to a fixed address formerly allocated via malloc().  This doesn't
> work on Windows.  An autoconf run with a newer version of autoconf would
> be nice.

I just found that the latest autoconf *still* has this broken test
for mmap, which basically calls

  data2 = malloc (size);
  mmap(data2, ...);

Why has this test never been fixed?  Chuck?

Corinna
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Nov 8 11:30, Corinna Vinschen wrote:
> On Nov 7 15:26, aputerguy wrote:
> >
> > Changing LC_ALL also solved the problem for me.
> > But it begs the question of how many other basic and take-for-granted
> > functions might be affected by this apparent UTF-8 slowdown. And again,
> > we are not talking about some minor overhead, we are talking about a
> > slowdown of 1500X or 150,000%
>
> Yeah, that's really still strange to me.  In my testing, the multibyte
> to widechar conversion performed by grep in case of UTF-8 took only
> 1.5 up to 4 seconds for 10 times the number of input lines as in your
> case.  It still puzzles me where the time is wasted in grep.

Got it.  The problem is this.  Grep reads the file in chunks larger than the page size.  The page size is 64K on Cygwin.  For each buffer read into memory, the grepbuf() function calls the execute() function as long as it returns a match.  The execute() function calls check_multibyte_string() for the entire buffer(!), then calls kwsexec() to find a match.  If a match has been found, it frees the memory allocated by check_multibyte_string() and returns to grepbuf().  Then grepbuf() will call execute() again with the pointers into the buffer moved to the next line.

Let's make an example.  Assume the buffer is 100K, which is not unusual when running under Cygwin.  Assume further that the file consists of 100,000 lines with the text "The quick brown fox jumped over the lazy dog".  Each line is 45 bytes, so the buffer contains somewhat more than 2200 lines.  Now let's search for the expression "dog".

The first call to execute() will call check_multibyte_string() for the entire buffer of 100000 bytes.  Then it finds a match in the first line, frees the check_multibyte_string() memory and returns to grepbuf().  grepbuf() calls execute() with the start pointer moved to the second line in the buffer, so execute() calls check_multibyte_string() for the remainder of the buffer, which is 99955 bytes.  It finds a match in the first remaining line, frees the check_multibyte_string() buffer, returns to grepbuf(), which calls execute(), which calls check_multibyte_string() with a buffer of 99910 bytes, and so on...

Every invocation of check_multibyte_string() calls mbrtowc() in a loop for the entire buffer given as argument.  In our example, that means that mbrtowc() is called (hold your breath) 111,161,115 times for each 100K of input file!  No wonder grep takes 3 or 4 minutes to grep this very example on Cygwin.

I really think there's some room for optimization left in this algorithm.

Btw., the check for mmap in grep's configure file is broken.  It tries to mmap to a fixed address formerly allocated via malloc().  This doesn't work on Windows.  An autoconf run with a newer version of autoconf would be nice.

Corinna
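The call count quoted in the analysis can be reproduced with a few lines of arithmetic. The sketch below is a model of the rescanning pattern, not grep's actual code: it assumes one mbrtowc() call per byte (true for pure ASCII input), a 100000-byte buffer, and 45-byte lines that all match. The function names `calls_with_rescan` and `calls_single_pass` are hypothetical.

```c
#include <stddef.h>

/* Model of the behaviour described above: after every matching line the
 * rest of the buffer, from the next line to the end, is converted again.
 * Assumes one mbrtowc() call per byte, i.e. pure ASCII input. */
unsigned long long calls_with_rescan(size_t buf_size, size_t line_len)
{
    unsigned long long calls = 0;

    if (line_len == 0)
        return 0;                       /* avoid an endless loop */

    for (size_t start = 0; start < buf_size; start += line_len)
        calls += buf_size - start;      /* 100000 + 99955 + 99910 + ... */

    return calls;
}

/* For comparison: converting the buffer once and reusing the result
 * costs one call per byte, independent of the number of matches. */
unsigned long long calls_single_pass(size_t buf_size)
{
    return buf_size;
}
```

With `buf_size = 100000` and `line_len = 45`, `calls_with_rescan` sums 2223 terms and returns 111161115, the figure given above, while `calls_single_pass` returns 100000. The cost grows with the square of the number of matching lines per buffer, which fits the observation that zero-match searches stay fast.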
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Nov 7 15:26, aputerguy wrote:
>
> Changing LC_ALL also solved the problem for me.
> But it begs the question of how many other basic and take-for-granted
> functions might be affected by this apparent UTF-8 slowdown. And again,
> we are not talking about some minor overhead, we are talking about a
> slowdown of 1500X or 150,000%

Yeah, that's really still strange to me.  In my testing, the multibyte to widechar conversion performed by grep in case of UTF-8 took only 1.5 up to 4 seconds for 10 times the number of input lines as in your case.  It still puzzles me where the time is wasted in grep.

Corinna
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Changing LC_ALL also solved the problem for me.

But it begs the question of how many other basic and take-for-granted functions might be affected by this apparent UTF-8 slowdown. And again, we are not talking about some minor overhead, we are talking about a slowdown of 1500X or 150,000%.

As a North American English speaker, UTF-8 is not that important to me and certainly not worth such a heavy overhead price.

Also, while I don't have 'pcregrep' installed on my machine, it is interesting that 'sed' is not affected.
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Jim Reisert wrote:
>On Fri, Nov 6, 2009 at 7:12 AM, Cooper, Karl (US SSA) wrote:
>
>> Corinna Vinschen wrote:
>>> Or try LANG=C.ASCII since LANG=C will still return UTF-8 as charset
>>> when calling nl_langinfo(CHARSET).
>>
>> Yes, this solves it:
>>
>> $ time LC_ALL=C.ASCII grep dog testfile | wc
>>  100000  900000 4500000
>>
>> real    0m0.359s
>> user    0m0.279s
>> sys     0m0.232s
>
> I just tried this on my system, I routinely grep groups of files
> containing 100K lines.  I was *astounded* how fast "grep" is after
> setting LC_ALL=C.ASCII !

The second run of grep is usually much faster due to disk buffering.
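To see which character set a given locale setting selects, and therefore whether grep would take its multibyte path, the charset can be queried directly. This is a minimal sketch assuming a POSIX system; `charset_for_locale` is a hypothetical helper name. Note that the item constant defined by POSIX in <langinfo.h> is CODESET.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch the process to the named locale and return
 * the charset name it reports, or NULL if the locale is not accepted.
 * The returned string may be overwritten by later locale calls. */
const char *charset_for_locale(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;

    return nl_langinfo(CODESET);
}
```

Printing `charset_for_locale("C")` and `charset_for_locale("C.ASCII")` from a small `main` shows what each setting resolves to on a given system. According to the messages quoted above, Cygwin 1.7 reports UTF-8 for the former; other platforms will differ and may not accept "C.ASCII" at all.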
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
OK. Here is a simple test case:

  X=100000
  while [ $X -gt 0 ] ; do
    echo "The quick brown fox jumped over the lazy dog"
    let X=X-1
  done > testfile

  time grep dog testfile | wc

Cygwin 1.5:
  real    0m0.219s
  user    0m0.232s
  sys     0m0.045s

Cygwin 1.7:
  real    7m46.575s
  user    7m14.138s
  sys     0m0.076s

While using sed on Cygwin 1.5, I get the reasonable result:

  time sed -ne /dog/p testfile | wc

  real    0m1.229s
  user    0m1.202s
  sys     0m0.046s
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>aputerguy wrote:
>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>> (on a 2nd machine).
>
>I've seen nasty behavior with grep that isn't cygwin specific.  Try
>"pcregrep" and see if you have the same issue.
>
>I found it to be about ~100 times faster under _some_ searches though
>2-3x is more typical.  The gnu re-parser isn't real efficient under
>some circumstances.
>
>If you find a big difference, you might also want to report it to the
>bug-g...@gnu.org mailing list, but last time I did, they told me
>"that's the way it is" due to some posix conformance thing...

The fact that it behaves differently between Cygwin 1.5 and 1.7 would suggest that this isn't a grep problem.  That's why I asked for a test case.

cgf
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
aputerguy wrote:
> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
> (on a 2nd machine).

I've seen nasty behavior with grep that isn't cygwin specific.  Try "pcregrep" and see if you have the same issue.

I found it to be about 100 times faster under _some_ searches, though 2-3x is more typical.  The gnu re-parser isn't real efficient under some circumstances.

If you find a big difference, you might also want to report it to the bug-g...@gnu.org mailing list, but last time I did, they told me "that's the way it is" due to some posix conformance thing...

-l
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
On Thu, Nov 05, 2009 at 03:27:07PM -0800, aputerguy wrote:
>
>Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>(on a 2nd machine).
>
>The following cases show how grep under 1.7 grinds to a halt as the number
>of matches increases.
>
>The data 'testfile' is a plain text file of the acl's of all the 108,000
>files on my Windoze computer.
>
>Note since the machines are different, compare relative times across cases
>rather than the times between the two machines.

We'll need an actual test case if you want us to track it down.

cgf
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
aputerguy wrote:
> The data 'testfile' is a plain text file of the acl's of all the 108,000
> files on my Windoze computer.

So, the "find | xargs" trick worked then, did it?  :-)

cheers,
DaveK
1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Running grep on a 20MB file with ~100,000 matches takes an incredible almost 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5 (on a 2nd machine).

The following cases show how grep under 1.7 grinds to a halt as the number of matches increases.

The data 'testfile' is a plain text file of the ACLs of all the 108,000 files on my Windoze computer.

Note: since the machines are different, compare relative times across cases rather than the times between the two machines.

Case 1] Zero matches
  time grep "sfsdfdsfds" testfile | wc
        0       0       0

  Cygwin 1.5
  real    0m0.093s
  user    0m0.092s
  sys     0m0.030s

  Cygwin 1.7
  real    0m1.353s
  user    0m1.342s
  sys     0m0.062s

Case 2] One match
  time grep ".lesshst" testfile | wc
        1       3      29

  Cygwin 1.5 (~same as zero matches)
  real    0m0.234s
  user    0m0.091s
  sys     0m0.061s

  Cygwin 1.7 (~same as zero matches)
  real    0m1.499s
  user    0m1.404s
  sys     0m0.046s

Case 3] ~1400 matches
  time grep ".bin" testfile | wc
     1439    5661   71067

  Cygwin 1.5 (~same as zero matches)
  real    0m0.110s
  user    0m0.076s
  sys     0m0.077s

  Cygwin 1.7 (~6x zero matches case)
  real    0m7.537s
  user    0m7.341s
  sys     0m0.045s

Case 4] ~16000 matches
  time grep "Documents and Settings" testfile | wc
    15824  131573 1918500

  Cygwin 1.5 (~same as zero matches)
  real    0m0.437s
  user    0m0.092s
  sys     0m0.092s

  Cygwin 1.7 (~50x zero matches)
  real    1m14.491s
  user    1m8.904s
  sys     0m0.031s

Case 5] ~100,000 matches
  time grep "# file" testfile | wc
   106988  510944 7930558

  Cygwin 1.5 (~1.5x zero matches)
  real    0m0.475s
  user    0m0.154s
  sys     0m0.201s

  Cygwin 1.7 (~350x zero matches)
  real    7m51.771s
  user    7m16.810s
  sys     0m0.062s

Case 6] Test that nothing is wrong with file system reads or 'wc'
  time cat testfile | wc
   966300 1821815 20426592

  Cygwin 1.5 (approx same time as grepping zero matches)
  real    0m0.344s
  user    0m0.201s
  sys     0m0.186s

  Cygwin 1.7 (approx same time as grepping zero matches)
  real    0m1.662s
  user    0m1.373s
  sys     0m0.138s