Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-10 Thread Corinna Vinschen
On Nov  9 21:14, Eric Blake wrote:
 According to Corinna Vinschen on 11/9/2009 7:05 AM:
  This part of the testcase
  
    data2 = (char *) malloc (2 * pagesize);
    if (!data2)
      return 1;
    data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
    if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_FIXED, fd, 0L))
      return 1;
  
  is bad.  The chance that the address of data2 is not usable for mmap on
  Windows/Cygwin is 100%.
 
 But in testing this further, I discovered that you CAN do:
 
 data2 = mmap(...);
 munmap (data2,...);
 mmap (data2, ... MAP_FIXED)
 
 and get success on cygwin.

Yes, but basically only if you unmap the entire mmaped region.  See
below.
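
For reference, here is a minimal sketch of the sequence Eric describes, written with plain POSIX calls only. The names remap_fixed_at_released_address and demo_remap are hypothetical, and the sketch assumes a file-backed private mapping. As noted above, on Cygwin this is only expected to work when the entire earlier mapping is released, not just part of it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map LEN bytes of FD at an address of the system's choosing, release
   that whole mapping, then map FD again with MAP_FIXED at the same
   address.  Returns the new mapping, or MAP_FAILED.  */
void *
remap_fixed_at_released_address (int fd, size_t len)
{
  /* Step 1: let mmap pick an address that is known to be usable.  */
  void *addr = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (addr == MAP_FAILED)
    return MAP_FAILED;

  /* Step 2: unmap the entire region, not just part of it.  */
  if (munmap (addr, len) != 0)
    return MAP_FAILED;

  /* Step 3: request exactly that address again.  Another thread may
     have claimed it in the meantime, so failure is still possible.  */
  void *again = mmap (addr, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_FIXED, fd, 0);

  /* A successful MAP_FIXED request must return the requested address.  */
  assert (again == MAP_FAILED || again == addr);
  return again;
}

/* Hypothetical usage example: back the mapping with a one-page
   temporary file.  Returns 0 on success, -1 on any failure.  */
int
demo_remap (void)
{
  char path[] = "/tmp/remap-demo-XXXXXX";
  long pagesize = sysconf (_SC_PAGESIZE);
  int fd = mkstemp (path);

  if (fd < 0)
    return -1;
  unlink (path);                /* the file disappears once fd is closed */
  if (pagesize <= 0 || ftruncate (fd, (off_t) pagesize) != 0)
    {
      close (fd);
      return -1;
    }

  char *p = remap_fixed_at_released_address (fd, (size_t) pagesize);
  close (fd);                   /* an existing mapping survives close() */
  if (p == MAP_FAILED)
    return -1;

  int ok = (p[0] == 0);         /* ftruncate extended the file with zeros */
  munmap (p, (size_t) pagesize);
  return ok ? 0 : -1;
}
```

Even where the remap succeeds, nothing prevents another thread from grabbing the address between munmap and the second mmap, so a caller still needs a fallback path.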

   So I will be updating autoconf accordingly,
 based on the STD below.  Unfortunately, it looks like I also found a hole
 in cygwin.  Consider this (borrowing heavily from the autoconf test that I
 am fixing):
 [...]
 This test behaves differently on Linux than on cygwin; on Linux, both
 './foo' and './foo 1' give status 0, but on cygwin, './foo' gives status
 6, and only './foo 1' succeeds.  In other words, the second mmap fails if
 there is no intermediate munmap.
 
 POSIX apparently allows cygwin's behavior:
 
 If MAP_FIXED is set, mmap() may return MAP_FAILED and set errno to
 [EINVAL]. If a MAP_FIXED request is successful, the mapping established by
 mmap() replaces any previous mappings for the pages in the range
 [pa,pa+len) of the process.
 
 However, since we already have to maintain a list of mappings in order to
 implement fork(), it seems like it would be easy to fix cygwin to
 implicitly munmap anything that would otherwise be in the way of a
 subsequent MAP_FIXED request, rather than blindly calling
 NtMapViewOfSection and failing because of the overlap, so that we could
 match Linux behavior even more closely.

That's tricky and bound to fail.  The problem is that, in Windows,
you can't munmap an mmap'ed region only partially.  NtUnmapViewOfSection
can only unmap an entire mapped view.  So, with the bookkeeping in
Cygwin, you can re-use a partially unmapped region of anonymous
memory to map new anonymous memory, but you can't reuse a partially
unmapped region to mmap another file at this point in memory, or
even the same file at a different offset.
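
To make the distinction concrete, here is a hypothetical sketch (not Cygwin's actual bookkeeping code) of how an implementation might classify each existing mapping against a new MAP_FIXED range. Only the FULLY_COVERED case could be handled by unmapping whole views first, which is what an implicit munmap would need; PARTIALLY_COVERED is the case that, per the explanation above, cannot be handled for file mappings.

```c
#include <assert.h>
#include <stdint.h>

/* How an existing mapping relates to a requested MAP_FIXED range.  */
enum overlap
{
  NO_OVERLAP,          /* the request does not touch this mapping */
  FULLY_COVERED,       /* the whole view lies inside the request */
  PARTIALLY_COVERED    /* only part of the view lies inside the request */
};

/* Classify the existing mapping [MAP_START, MAP_END) against the
   requested range [REQ_START, REQ_END).  Both ranges are half-open.  */
enum overlap
classify_overlap (uintptr_t map_start, uintptr_t map_end,
                  uintptr_t req_start, uintptr_t req_end)
{
  assert (map_start < map_end && req_start < req_end);

  if (map_end <= req_start || req_end <= map_start)
    return NO_OVERLAP;
  if (req_start <= map_start && map_end <= req_end)
    return FULLY_COVERED;      /* could be unmapped as a whole first */
  return PARTIALLY_COVERED;    /* would need a partial unmap of the view */
}
```

For example, a request for [0x20000, 0x30000) against an existing view [0x10000, 0x30000) is PARTIALLY_COVERED, because the view would have to be split.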

The only way around this problem would be to always map files and anonymous
memory in single 64K chunks, so that every page of a mapping can
actually be unmapped at the OS level.  But in that case allocating
memory is no longer atomic, so we get the other potential problem of
not being able to fulfill a request because another thread has called
VirtualAlloc, directly or indirectly, in the meantime.

  That's why I think we need at least two tests in autoconf: a generic
  mmap test, and an mmap test for the case of a private or shared
  MAP_FIXED mapping at an already mapped address, for applications
  that actually insist on using that.
 
 In the case of the autoconf test, I think a single test is still
 sufficient, once it is fixed to be portable to what POSIX requires.

One problem is actually grep, which started the entire discussion.  It
really uses malloc/mmap(MAP_FIXED), along the lines of what the HAVE_MMAP
test tests.  Fortunately, grep doesn't fail if mmap returns an error, so
it doesn't hurt.  Of course it would be nice if grep used mmap in
a more portable way.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-09 Thread Corinna Vinschen
On Nov  8 18:41, Jim Reisert AD1C wrote:
 Corinna, the new grep works super great - thanks!

I'm glad to read that, but I only debugged the problem.  The Fedora
fix was applied by Chris.


Corinna




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-09 Thread Corinna Vinschen
On Nov  8 14:07, Charles Wilson wrote:
 Corinna Vinschen wrote:
  On Nov  8 14:56, Corinna Vinschen wrote:
  Btw., the check for mmap in grep's configure file is broken.  It tries
  to mmap to a fixed address formerly allocated via malloc().  This doesn't
  work on Windows.  An autoconf run with a newer version of autoconf would
  be nice.
  
  I just found that the latest autoconf *still* has this broken test
  for mmap, which basically calls
  
data2 = malloc (size);
mmap(data2, ...);
  
  Why has this test never been fixed?  Chuck?
 
 ...err, 'cause I didn't realize it was a problem. I see that cygport has
 hidden this for years:
 
 # AC_HAVE_MMAP fails despite a working mmap, so we force this to yes
 # (see http://www.cygwin.com/ml/cygwin/2004-09/msg00741.html
 # and following thread for details)
 export ac_cv_func_mmap_fixed_mapped=yes;
 
 NTTAWWT, but it never triggered my "gee, I ought to fix that" reflex. I
 agree this should be fixed, but I'm leery of changing an autoconf test
 without knowing how that change will affect the other 9,236 platforms

The problem in this testcase is the fact that it calls malloc, then
computes the first page-aligned address inside the malloc'ed area,
and then tries to mmap to this address with MAP_FIXED set.  Sure, this
*might* work, and it works on most systems, but there's no reason at all
to *expect* it to work, since it only works by chance.  Those memory
addresses can be taken by anything, and requiring an arbitrary
fixed address to be available to mmap is just plain wrong.  From the
Linux man page:

MAP_FIXED
  [...]
  If the specified address cannot be used, mmap() will fail.  Because
  requiring a fixed address for a mapping is less portable, the use of
  this option is discouraged.

Since autoconf is supposed to help applications to be more portable,
it's not really acceptable, IMHO, for autoconf to require a non-portable
feature to work.
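
The address arithmetic criticized here can be sketched in isolation. This hypothetical helper reproduces the test's rounding step. Because the test mallocs 2 * pagesize bytes, the rounded address plus one page always stays inside the malloc'ed block, but nothing in that computation says the range is free for mmap to use.

```c
#include <assert.h>
#include <stdint.h>

/* Round ADDR up to the next multiple of PAGESIZE, which must be a
   power of two.  An already page-aligned ADDR is returned unchanged.
   This mirrors the expression used in the autoconf test:
     data2 += (pagesize - ((long int) data2 & (pagesize - 1)))
              & (pagesize - 1);  */
uintptr_t
round_up_to_page (uintptr_t addr, uintptr_t pagesize)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  uintptr_t offset = addr & (pagesize - 1);             /* bytes past the boundary */
  uintptr_t gap = (pagesize - offset) & (pagesize - 1); /* 0 if already aligned */
  return addr + gap;
}
```

With Cygwin's 64K page size, round_up_to_page (0x10001, 0x10000) yields 0x20000, an address in the middle of the heap block rather than one mmap chose.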

It's frustrating that mmap() and even mmap(MAP_FIXED)
work fine on Cygwin, just not in the non-portable way they're tested
in the autoconf test.  Maybe we need two mmap tests in autoconf, one
for mmap in general, the other for MAP_FIXED issues.

 I think this is an issue for the autoconf list as a whole.  Would you --
 or Eric -- care to raise it there?  Especially as you seemed to have
 quite strong feelings about it back in 2004:
 http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html

I had hoped that you, as the autoconf maintainer, would put this
upstream...


Corinna




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-09 Thread Eric Blake
According to Corinna Vinschen on 11/9/2009 4:59 AM:
 I just found that the latest autoconf *still* has this broken test
 for mmap, which basically calls

   data2 = malloc (size);
   mmap(data2, ...);

 Why has this test never been fixed?  Chuck?
 ...err, 'cause I didn't realize it was a problem. I see that cygport has
 hidden this for years:

 # AC_HAVE_MMAP fails despite a working mmap, so we force this to yes
 # (see http://www.cygwin.com/ml/cygwin/2004-09/msg00741.html
 # and following thread for details)
 export ac_cv_func_mmap_fixed_mapped=yes;

 NTTAWWT, but it never triggered my "gee, I ought to fix that" reflex. I
 agree this should be fixed, but I'm leery of changing an autoconf test
 without knowing how that change will affect the other 9,236 platforms
 
 The problem in this testcase is the fact that it calls malloc, then
 computes the first page-aligned address inside the malloc'ed area,
 and then tries to mmap to this address with MAP_FIXED set.  Sure, this
 *might* work, and it works on most systems, but there's no reason at all
 to *expect* it to work, since it only works by chance.  Those memory
 addresses can be taken by anything, and requiring an arbitrary
 fixed address to be available to mmap is just plain wrong.  From the
 Linux man page:
 
 MAP_FIXED
   [...]
   If the specified address cannot be used, mmap() will fail.  Because
   requiring a fixed address for a mapping is less portable, the use of
   this option is discouraged.
 
 Since autoconf is supposed to help applications to be more portable,
 it's not really acceptable, IMHO, for autoconf to require a non-portable
 feature to work.
 
 It's frustrating that mmap() and even mmap(MAP_FIXED)
 work fine on Cygwin, just not in the non-portable way they're tested
 in the autoconf test.  Maybe we need two mmap tests in autoconf, one
 for mmap in general, the other for MAP_FIXED issues.
 
 I think this is an issue for the autoconf list as a whole.  Would you --
 or Eric -- care to raise it there?  Especially as you seemed to have
 quite strong feelings about it back in 2004:
 http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html
 
 I had hoped that you, as the autoconf maintainer, would put this
 upstream...

It's an upstream issue now ;)

The problem is that I need some more advice from the cygwin list on how
best to fix the test to pass on cygwin by default.  I'm hoping to release
autoconf 2.65 this week, so a speedy fix to help this issue go away before
the release would be extra nice.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-09 Thread Corinna Vinschen
On Nov  9 05:50, Eric Blake wrote:
 According to Corinna Vinschen on 11/9/2009 4:59 AM:
  MAP_FIXED
[...]
If the specified address cannot be used, mmap() will fail.  Because
requiring a fixed address for a mapping is less portable, the use of
this option is discouraged.
 
 It's an upstream issue now ;)
 
 The problem is that I need some more advice from the cygwin list on how
 best to fix the test to pass on cygwin by default.  I'm hoping to release
 autoconf 2.65 this week, so a speedy fix to help this issue go away before
 the release would be extra nice.

This part of the testcase

  data2 = (char *) malloc (2 * pagesize);
  if (!data2)
    return 1;
  data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
  if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_FIXED, fd, 0L))
    return 1;

is bad.  The chance that the address of data2 is not usable for mmap on
Windows/Cygwin is 100%.  The problem here is that the generic HAVE_MMAP
test tests one particular feature, which is not usable on Windows and
is non-portable.  So, on Cygwin this test always fails, and all applications
using this test in good faith will never use mmap on Cygwin, just because
the single case of a private MAP_FIXED mapping at an already mapped address
doesn't work.  In fact, most applications don't need this case.
And grep wouldn't need it either, since the method used in grep would
work just as well without the prior malloc, if grep simply used
the address returned by mmap as its buffer.
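
The suggestion that grep could simply use whatever address mmap returns might look like the following minimal sketch. It assumes a read-only scan of the start of a file; load_buffer, release_buffer and demo_load are hypothetical names, not functions from grep, and the malloc + pread fallback reflects the point that grep already copes with mmap failing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return a buffer holding the first LEN bytes of FD (LEN > 0).  The
   buffer comes from mmap with no address hint when possible, so the
   system picks a usable address; otherwise it falls back to
   malloc + pread.  *MAPPED records which path was taken.  */
char *
load_buffer (int fd, size_t len, int *mapped)
{
  assert (len > 0);

  void *p = mmap (NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  if (p != MAP_FAILED)
    {
      *mapped = 1;
      return p;
    }

  /* mmap failed: read the data the ordinary way.  */
  *mapped = 0;
  char *buf = malloc (len);
  if (buf == NULL)
    return NULL;
  for (size_t got = 0; got < len; )
    {
      ssize_t n = pread (fd, buf + got, len - got, (off_t) got);
      if (n <= 0)                 /* error or unexpected end of file */
        {
          free (buf);
          return NULL;
        }
      got += (size_t) n;
    }
  return buf;
}

/* Release a buffer obtained from load_buffer.  */
void
release_buffer (char *buf, size_t len, int mapped)
{
  if (mapped)
    munmap (buf, len);
  else
    free (buf);
}

/* Hypothetical usage example on a small temporary file.
   Returns 0 on success, -1 on failure.  */
int
demo_load (void)
{
  char path[] = "/tmp/load-demo-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);
  if (write (fd, "dog\n", 4) != 4)
    {
      close (fd);
      return -1;
    }

  int mapped;
  char *buf = load_buffer (fd, 4, &mapped);
  close (fd);
  if (buf == NULL)
    return -1;

  int ok = (buf[0] == 'd' && buf[3] == '\n');
  release_buffer (buf, 4, mapped);
  return ok ? 0 : -1;
}
```

No MAP_FIXED is involved at all, so the approach does not depend on any particular address being free.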

That's why I think we need at least two tests in autoconf: a generic
mmap test, and an mmap test for the case of a private or shared
MAP_FIXED mapping at an already mapped address, for applications
that actually insist on using that.


Corinna




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-09 Thread aputerguy

Corinna writes:
 I'm glad to read that, but I only debugged the problem.  The Fedora
 fix was applied by Chris.

Well, it works for me too, and as the OP of the problem, I extend my thanks to
both of you and all the others who helped in debugging and coming up with
such a quick fix.

My only remaining question: can we assume that this bug (or bad coding) is
grep-specific, or is it likely to rear its head in other core *nix utilities
that use UTF-8?



Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-09 Thread Corinna Vinschen
On Nov  9 10:22, aputerguy wrote:
 My only remaining question: can we assume that this bug (or bad coding) is
 grep-specific, or is it likely to rear its head in other core *nix utilities
 that use UTF-8?

Who knows?  Nobody is immune against creating bad code, right?


Corinna




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-09 Thread Eric Blake
According to Corinna Vinschen on 11/9/2009 7:05 AM:
 This part of the testcase
 
   data2 = (char *) malloc (2 * pagesize);
   if (!data2)
     return 1;
   data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
   if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_FIXED, fd, 0L))
     return 1;
 
 is bad.  The chance that the address of data2 is not usable for mmap on
 Windows/Cygwin is 100%.

But in testing this further, I discovered that you CAN do:

data2 = mmap(...);
munmap (data2,...);
mmap (data2, ... MAP_FIXED)

and get success on cygwin.  So I will be updating autoconf accordingly,
based on the STD below.  Unfortunately, it looks like I also found a hole
in cygwin.  Consider this (borrowing heavily from the autoconf test that I
am fixing):

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int
main (int argc, char **argv)
{
  char *data, *data2, *data3;
  int i, pagesize;
  int fd, fd2;

  pagesize = getpagesize ();
  /* First, make a file with some known garbage in it. */
  data = (char *) malloc (pagesize);
  if (!data)
    return 1;
  for (i = 0; i < pagesize; ++i)
    *(data + i) = rand ();
  umask (0);
  fd = creat ("conftest.mmap", 0600);
  if (fd < 0)
    return 2;
  if (write (fd, data, pagesize) != pagesize)
    return 3;
  close (fd);

  /* Next, check that a page is zero-filled if not backed by a file.  */
  fd2 = open ("conftest.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd2 < 0)
    return 11;
  data2 = "";
  if (write (fd2, data2, 1) != 1)
    return 12;
  else
    /* We expect mmap to succeed, but reads to give SIGBUS, since mapped
       region is an entire page beyond bounds of mapped file.  */
    ;
  data2 = mmap (0, pagesize, PROT_READ | PROT_WRITE, MAP_SHARED, fd2, 0L);
  if (data2 == MAP_FAILED)
    return 14;
  printf ("mapped %p\n", (void *) data2);
  for (i = 0; i < pagesize; ++i)
    if (*(data2 + i))
      {
        printf ("%p, %x\n", (void *) (data2 + i), *(data2 + i));
        return 15;
      }
  close (fd2);
  if (argc > 1)
    munmap (data2, pagesize);

  /* Next, try to mmap the file at a fixed address which already has
     something else allocated at it.  If we can, also make sure that
     we see the same garbage.  */
  fd = open ("conftest.mmap", O_RDWR);
  if (fd < 0)
    return 4;
  if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_FIXED, fd, 0L))
    return 6;
  for (i = 0; i < pagesize; ++i)
    if (*(data + i) != *(data2 + i))
      {
        printf ("%p, exp %x, got %x\n", (void *) (data2 + i),
                *(data + i), *(data2 + i));
        return 7;
      }

  /* Finally, make sure that changes to the mapped area do not
     percolate back to the file as seen by read().  (This is a bug on
     some variants of i386 svr4.0.)  */
  for (i = 0; i < pagesize; ++i)
    *(data2 + i) = *(data2 + i) + 1;
  data3 = (char *) malloc (pagesize);
  if (!data3)
    return 8;
  if (read (fd, data3, pagesize) != pagesize)
    return 9;
  for (i = 0; i < pagesize; ++i)
    if (*(data + i) != *(data3 + i))
      return 10;
  close (fd);
  return 0;
}

This test behaves differently on Linux than on cygwin; on Linux, both
'./foo' and './foo 1' give status 0, but on cygwin, './foo' gives status
6, and only './foo 1' succeeds.  In other words, the second mmap fails if
there is no intermediate munmap.

POSIX apparently allows cygwin's behavior:

If MAP_FIXED is set, mmap() may return MAP_FAILED and set errno to
[EINVAL]. If a MAP_FIXED request is successful, the mapping established by
mmap() replaces any previous mappings for the pages in the range
[pa,pa+len) of the process.

However, since we already have to maintain a list of mappings in order to
implement fork(), it seems like it would be easy to fix cygwin to
implicitly munmap anything that would otherwise be in the way of a
subsequent MAP_FIXED request, rather than blindly calling
NtMapViewOfSection and failing because of the overlap, so that we could
match Linux behavior even more closely.

 That's why I think we need at least two tests in autoconf: a generic
 mmap test, and an mmap test for the case of a private or shared
 MAP_FIXED mapping at an already mapped address, for applications
 that actually insist on using that.

In the case of the autoconf test, I think a single test is still
sufficient, once it is fixed to be portable to what POSIX requires.

gnulib provides a more interesting test, for whether MMAP_ANON works.
http://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/mmap-anon.m4


Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-09 Thread Eric Blake
[please limit replies about the patch itself to autoconf-patches]

According to Corinna Vinschen on 11/9/2009 7:05 AM:
 This part of the testcase
 
   data2 = (char *) malloc (2 * pagesize);
   if (!data2)
     return 1;
   data2 += (pagesize - ((long int) data2 & (pagesize - 1))) & (pagesize - 1);
   if (data2 != mmap (data2, pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_FIXED, fd, 0L))
     return 1;
 
 is bad.  The chance that the address of data2 is not usable for mmap on
 Windows/Cygwin is 100%.  The problem here is that the generic HAVE_MMAP
 test tests one certain feature, which is not usable on Windows, and which
 is non-portable.

MAP_FIXED appears to be more portable when the fixed address was obtained
from a previous mmap call.  Therefore, this patch fixes the macro, and also
makes configure failures easier to diagnose by returning distinct exit
statuses for different failing checks.  I don't have access to HP-UX 11,
which is another platform where AC_FUNC_MMAP was failing; I would appreciate
it if someone else could check whether this makes a difference there.  But I
have verified that this now sets HAVE_MMAP for cygwin 1.5.x and cygwin 1.7
where the old version failed, and that it does not change behavior on Linux
or OpenBSD.

From fb1f28a2ff2c688e63dc97ece7fde86e16864491 Mon Sep 17 00:00:00 2001
From: Eric Blake e...@byu.net
Date: Mon, 9 Nov 2009 21:45:00 -0700
Subject: [PATCH] Fix AC_FUNC_MMAP for cygwin.

* lib/autoconf/functions.m4 (AC_FUNC_MMAP): Make the test more
portable: Actually check for sys/param.h, and only use MAP_FIXED
on an address previously returned from mmap.
* THANKS: Update.
Reported by Corinna Vinschen.

Signed-off-by: Eric Blake e...@byu.net
---
 ChangeLog                 |    9 +++
 NEWS                      |    3 ++
 lib/autoconf/functions.m4 |   55 ++--
 3 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4d028c0..77e9d4e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2009-11-09  Eric Blake  e...@byu.net
+
+   Fix AC_FUNC_MMAP for cygwin.
+   * lib/autoconf/functions.m4 (AC_FUNC_MMAP): Make the test more
+   portable: Actually check for sys/param.h, and only use MAP_FIXED
+   on an address previously returned from mmap.
+   * THANKS: Update.
+   Reported by Corinna Vinschen.
+
 2009-11-04  Eric Blake  e...@byu.net

Redocument AS_DIRNAME, even with its flaws.
diff --git a/NEWS b/NEWS
index 9e7e64c..86a0c3f 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,9 @@ GNU Autoconf NEWS - User visible changes.
longer mistakenly select a 32-bit type on some compilers (bug present
since macros were introduced in 2.59c).

+** The AC_FUNC_MMAP macro has been fixed to be portable to systems like
+   Cygwin (bug present since macro was introduced in 2.0).
+
 ** The following documented autotest macros are new:
AT_CHECK_EUNIT

diff --git a/lib/autoconf/functions.m4 b/lib/autoconf/functions.m4
index 946a646..6b6e7fc 100644
--- a/lib/autoconf/functions.m4
+++ b/lib/autoconf/functions.m4
@@ -1186,9 +1186,9 @@ AU_ALIAS([AM_FUNC_MKTIME], [AC_FUNC_MKTIME])
 # 
 AN_FUNCTION([mmap], [AC_FUNC_MMAP])
 AC_DEFUN([AC_FUNC_MMAP],
-[AC_CHECK_HEADERS(stdlib.h unistd.h)
-AC_CHECK_FUNCS(getpagesize)
-AC_CACHE_CHECK(for working mmap, ac_cv_func_mmap_fixed_mapped,
+[AC_CHECK_HEADERS_ONCE([stdlib.h unistd.h sys/param.h])
+AC_CHECK_FUNCS([getpagesize])
+AC_CACHE_CHECK([for working mmap], [ac_cv_func_mmap_fixed_mapped],
 [AC_RUN_IFELSE([AC_LANG_SOURCE([AC_INCLUDES_DEFAULT]
 [[/* malloc might have been renamed as rpl_malloc. */
 #undef malloc
@@ -1224,11 +1224,6 @@ char *malloc ();

 /* This mess was copied from the GNU getpagesize.h.  */
 #ifndef HAVE_GETPAGESIZE
-/* Assume that all systems that can run configure have sys/param.h.  */
-# ifndef HAVE_SYS_PARAM_H
-#  define HAVE_SYS_PARAM_H 1
-# endif
-
 # ifdef _SC_PAGESIZE
 #  define getpagesize() sysconf(_SC_PAGESIZE)
 # else /* no _SC_PAGESIZE */
@@ -1264,7 +1259,7 @@ main ()
 {
   char *data, *data2, *data3;
   int i, pagesize;
-  int fd;
+  int fd, fd2;

   pagesize = getpagesize ();

@@ -1277,27 +1272,41 @@ main ()
   umask (0);
   fd = creat ("conftest.mmap", 0600);
   if (fd < 0)
-    return 1;
+    return 2;
   if (write (fd, data, pagesize) != pagesize)
-    return 1;
+    return 3;
   close (fd);

+  /* Next, check that the tail of a page is zero-filled.  File must have
+     non-zero length, otherwise we risk SIGBUS for entire page.  */
+  fd2 = open 

Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-08 Thread Corinna Vinschen
On Nov  7 15:26, aputerguy wrote:
 
 Changing LC_ALL also solved the problem for me.
 But it raises the question of how many other basic and taken-for-granted
 functions might be affected by this apparent UTF-8 slowdown. And again,
 we are not talking about some minor overhead, we are talking about a slowdown
 of 1500X or 150,000%

Yeah, that's really still strange to me.  In my testing, the multibyte
to widechar conversion performed by grep in case of UTF-8 took only
1.5 to 4 seconds for 10 times the number of input lines as in your
case.  It still puzzles me where the time is wasted in grep.


Corinna




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-08 Thread Corinna Vinschen
On Nov  8 11:30, Corinna Vinschen wrote:
 On Nov  7 15:26, aputerguy wrote:
  
  Changing LC_ALL also solved the problem for me.
  But it raises the question of how many other basic and taken-for-granted
  functions might be affected by this apparent UTF-8 slowdown. And again,
  we are not talking about some minor overhead, we are talking about a slowdown
  of 1500X or 150,000%
 
 Yeah, that's really still strange to me.  In my testing, the multibyte
 to widechar conversion performed by grep in case of UTF-8 took only
 1.5 to 4 seconds for 10 times the number of input lines as in your
 case.  It still puzzles me where the time is wasted in grep.

Got it.  The problem is this.

Grep reads the file in chunks of at least pagesize bytes.  Pagesize is 64K
on Cygwin.  For each buffer read into memory, the grepbuf() function calls
the execute() function as long as it returns a match.

The execute() function calls check_multibyte_string() for the entire
buffer(!), then calls kwsexec() to find a match.  If a match has been
found, it frees the memory allocated by check_multibyte_string()
and returns to grepbuf.  Then grepbuf() will call execute again with
the pointers into the buffer moved to the next line.

Let's make an example.  Assume the buffer is 100K, which is not unusual
when running under Cygwin.  Assume further that the file consists of
100,000 lines with the text "The quick brown fox jumped over the lazy dog".
Each line is 45 bytes (44 characters plus the newline), so the buffer
contains somewhat more than 2200 lines.  Now let's search for the
expression "dog".

The first call to execute will call check_multibyte_string() for the
entire buffer of 100,000 bytes.  Then it finds a match in the first line,
frees the check_multibyte_string() memory and returns to grepbuf.
grepbuf calls execute with the start pointer moved to the second line in
the buffer, so execute() calls check_multibyte_string() for the remainder
of the buffer, which is 99955 bytes.  It finds a match in the first line,
frees the check_multibyte_string buffer, returns to grepbuf, which calls
execute, which calls check_multibyte_string() with a buffer of 99910
bytes, and so on...

Every invocation of check_multibyte_string() calls mbrtowc() in a loop
for the entire buffer given as argument.  In our example, that means
that mbrtowc() is called (hold your breath)

  111,161,115

times for each 100K of input file!  No wonder grep takes 3 or 4 minutes
to grep this very example on Cygwin.
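
The figure can be reproduced with a small model of that access pattern. This is a hypothetical sketch, not grep's code: it assumes one mbrtowc() call per byte (plain ASCII input), a 100,000-byte buffer, 45-byte lines, and a rescan of the rest of the buffer on every execute() call, and compares that with converting the buffer once.

```c
#include <assert.h>

/* Total mbrtowc() calls if every execute() call re-runs
   check_multibyte_string() over the rest of the buffer, with one
   call per byte and one execute() call per line.  */
unsigned long long
calls_with_rescan (unsigned long long bufsize, unsigned long long linelen)
{
  assert (linelen > 0);

  unsigned long long total = 0;
  /* execute() starts at offsets 0, linelen, 2 * linelen, ...  */
  for (unsigned long long start = 0; start < bufsize; start += linelen)
    total += bufsize - start;   /* bytes from START to the buffer end */
  return total;
}

/* Total calls if the buffer is converted only once.  */
unsigned long long
calls_single_pass (unsigned long long bufsize)
{
  return bufsize;
}
```

Under these assumptions, calls_with_rescan (100000, 45) evaluates to 111161115, the number given above, versus 100000 for a single pass; the rescanning cost grows quadratically with the number of lines per buffer.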

I really think there's some room for optimization left in this algorithm.

Btw., the check for mmap in grep's configure file is broken.  It tries
to mmap to a fixed address formerly allocated via malloc().  This doesn't
work on Windows.  An autoconf run with a newer version of autoconf would
be nice.


Corinna




Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-08 Thread Corinna Vinschen
On Nov  8 14:56, Corinna Vinschen wrote:
 Btw., the check for mmap in grep's configure file is broken.  It tries
 to mmap to a fixed address formerly allocated via malloc().  This doesn't
 work on Windows.  An autoconf run with a newer version of autoconf would
 be nice.

I just found that the latest autoconf *still* has this broken test
for mmap, which basically calls

  data2 = malloc (size);
  mmap(data2, ...);

Why has this test never been fixed?  Chuck?


Corinna




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-08 Thread Ralph Hempel

Corinna Vinschen wrote:

On Nov  8 14:56, Corinna Vinschen wrote:

Btw., the check for mmap in grep's configure file is broken.  It tries
to mmap to a fixed address formerly allocated via malloc().  This doesn't
work on Windows.  An autoconf run with a newer version of autoconf would
be nice.


I just found that the latest autoconf *still* has this broken test
for mmap, which basically calls

  data2 = malloc (size);
  mmap(data2, ...);

Why has this test never been fixed?  Chuck?


I can't answer that question but this thread points out very important
lessons in debugging specifically and projects in general.

1. Easily reproducible test cases are critical to getting someone
   interested in fixing your problem.

2. Having the good fortune to have somebody run the test case and
   duplicate the problem helps a bit more.

3. Having that person challenge the assumptions under which the code
   has been working for YEARS without a complaint helps a bit more.

4. Having that person do a great analysis that shows why the problem
   exists helps even more.

5. Going even one step further and trying to figure out why the
   problem has existed for years and what else might be wrong is
   just the icing on the cake.

Bravo Corinna - on a Sunday no less...

Cheers, Ralph




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-08 Thread Christopher Faylor
On Sun, Nov 08, 2009 at 10:51:56AM -0500, Ralph Hempel wrote:
Corinna Vinschen wrote:
 On Nov  8 14:56, Corinna Vinschen wrote:
 Btw., the check for mmap in grep's configure file is broken.  It tries
 to mmap to a fixed address formerly allocated via malloc().  This doesn't
 work on Windows.  An autoconf run with a newer version of autoconf would
 be nice.
 
 I just found that the latest autoconf *still* has this broken test
 for mmap, which basically calls
 
   data2 = malloc (size);
   mmap(data2, ...);
 
 Why has this test never been fixed?  Chuck?

I can't answer that question but this thread points out very important
lessons in debugging specifically and projects in general.

1. Easily reproducible test cases are critical to getting someone
interested in fixing your problem.

2. Having the good fortune to have somebody run the test case and
duplicate the problem helps a bit more.

3. Having that person challenge the assumptions under which the code
has been working for YEARS without a complaint helps a bit more.

4. Having that person do a great analysis that shows why the problem
exists helps even more.

5. Going even one step further and trying to figure out why the
problem has existed for years and what else might be wrong is
just the icing on the cake.

Bravo Corinna - on a Sunday no less...

6. Googling for the problem is always a good thing to do.

Once it was clear that this was a character set issue in grep, it was
easy enough to find a fix, since it was already in a couple of Linux bug
trackers.

cgf




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-08 Thread Yaakov (Cygwin/X)

On 08/11/2009 07:56, Corinna Vinschen wrote:

Btw., the check for mmap in grep's configure file is broken.  It tries
to mmap to a fixed address formerly allocated via malloc().  This doesn't
work on Windows.  An autoconf run with a newer version of autoconf would
be nice.


You said the same thing over five years ago: :-)

http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html

AFAIK the autoconf test has not changed since then.  cygport sets 
ac_cv_func_mmap_fixed_mapped=yes for this very reason.



Yaakov




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-08 Thread Christopher Faylor
On Sun, Nov 08, 2009 at 12:27:29PM -0600, Yaakov (Cygwin/X) wrote:
On 08/11/2009 07:56, Corinna Vinschen wrote:
 Btw., the check for mmap in grep's configure file is broken.  It tries
 to mmap to a fixed address formerly allocated via malloc().  This doesn't
 work on Windows.  An autoconf run with a newer version of autoconf would
 be nice.

You said the same thing over five years ago: :-)

http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html

AFAIK the autoconf test has not changed since then.  cygport sets 
ac_cv_func_mmap_fixed_mapped=yes for this very reason.

FWIW, this test doesn't really matter to my build of grep since it
is not used when cross-compiling.

cgf




Re: Broken autoconf mmap test (was Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file)

2009-11-08 Thread Charles Wilson
Corinna Vinschen wrote:
 On Nov  8 14:56, Corinna Vinschen wrote:
 Btw., the check for mmap in grep's configure file is broken.  It tries
 to mmap to a fixed address formerly allocated via malloc().  This doesn't
 work on Windows.  An autoconf run with a newer version of autoconf would
 be nice.
 
 I just found that the latest autoconf *still* has this broken test
 for mmap, which basically calls
 
   data2 = malloc (size);
   mmap(data2, ...);
 
 Why has this test never been fixed?  Chuck?

...err, 'cause I didn't realize it was a problem. I see that cygport has
hidden this for years:

# AC_HAVE_MMAP fails despite a working mmap, so we force this to yes
# (see http://www.cygwin.com/ml/cygwin/2004-09/msg00741.html
# and following thread for details)
export ac_cv_func_mmap_fixed_mapped=yes;

NTTAWWT, but it never triggered my "gee, I ought to fix that" reflex. I
agree this should be fixed, but I'm leery of changing an autoconf test
without knowing how that change will affect the other 9,236 platforms
that may depend on the current behavior, esp. given my current (lack of)
knowledge about how mmap is *supposed* to work in the various MAP_* modes.

I think this is an issue for the autoconf list as a whole.  Would you --
or Eric -- care to raise it there?  Especially as you seemed to have
quite strong feelings about it back in 2004:
http://www.cygwin.com/ml/cygwin/2004-09/msg00753.html
 The mmap test is crap.  How can an application expect to be able to
 access just about every address together with MAP_FIXED?
 
 Consequently MapViewOfFileEx returns error 487 in these cases,
 "Attempt to access invalid address."
 
 That's just another example of a crappy autoconf mmap test.

--
Chuck




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-08 Thread Jim Reisert AD1C

Corinna, the new grep works super great - thanks!

--
Jim Reisert AD1C, jjreis...@alum.mit.edu, http://www.ad1c.us




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-07 Thread Richard Foulk
Jim Reisert wrote:

On Fri, Nov 6, 2009 at 7:12 AM, Cooper, Karl (US SSA)
karl.coo...@baesystems.com wrote:

 Corinna Vinschen wrote:
 Or try LANG=C.ASCII since LANG=C will still return UTF-8 as charset
 when calling nl_langinfo(CODESET).

 Yes, this solves it:

 $ time LC_ALL=C.ASCII grep dog testfile | wc
  10  90 450

 real    0m0.359s
 user    0m0.279s
 sys     0m0.232s


 I just tried this on my system; I routinely grep groups of files
 containing 100K lines.  I was *astounded* at how fast grep is after
 setting LC_ALL=C.ASCII!


The second run of grep is usually much faster due to disk buffering.







Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-07 Thread aputerguy

Changing LC_ALL also solved the problem for me.
But it begs the question of how many other basic and taken-for-granted
functions might be affected by this apparent UTF-8 slowdown. And again, we
are not talking about some minor overhead; we are talking about a slowdown
of 1500X, or 150,000%.

As a North American English speaker, UTF-8 is not that important to me and
certainly not worth such a heavy overhead price.

Also, while I don't have 'pcregrep' installed on my machine, it is interesting
that 'sed' is not affected.





1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-05 Thread aputerguy

Running grep on a 20MB file with ~100,000 matches takes an incredible almost
8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
(on a 2nd machine).

The following cases show how grep under 1.7 grinds to a halt as the number
of matches increases.

The data 'testfile' is a plain text file of the ACLs of all the 108,000
files on my Windoze computer.

Note: since the machines are different, compare relative times across cases
rather than absolute times between the two machines.

Case 1] Zero matches
time grep sfsdfdsfds testfile | wc
 0   0   0

Cygwin 1.5
real    0m0.093s
user    0m0.092s
sys     0m0.030s

Cygwin 1.7
real    0m1.353s
user    0m1.342s
sys     0m0.062s

Case 2] One match
time grep .lesshst testfile | wc
  1   3  29

Cygwin 1.5 (~same as zero matches)
real    0m0.234s
user    0m0.091s
sys     0m0.061s

Cygwin 1.7 (~same as zero matches)
real    0m1.499s
user    0m1.404s
sys     0m0.046s

Case 3] ~1400 matches
time grep .bin testfile | wc
   1439    5661   71067

Cygwin 1.5 (~same as zero matches)
real    0m0.110s
user    0m0.076s
sys     0m0.077s

Cygwin 1.7 (~6x zero matches)
real    0m7.537s
user    0m7.341s
sys     0m0.045s

Case 4] ~16000 matches
time grep "Documents and Settings" testfile | wc
  15824  131573 1918500

Cygwin 1.5 (~same as zero matches)
real    0m0.437s
user    0m0.092s
sys     0m0.092s

Cygwin 1.7 (~50x zero matches)
real    1m14.491s
user    1m8.904s
sys     0m0.031s


Case 5] ~100,000 matches
time grep "# file" testfile | wc
 106988  510944 7930558

Cygwin 1.5 (~1.5x zero matches)

real    0m0.475s
user    0m0.154s
sys     0m0.201s

Cygwin 1.7 (~350x zero matches)
real    7m51.771s
user    7m16.810s
sys     0m0.062s

Case 6] Test that nothing is wrong with file system reads or 'wc'
time cat testfile | wc
 966300 1821815 20426592

Cygwin 1.5 (approx same time as grepping zero matches)
real    0m0.344s
user    0m0.201s
sys     0m0.186s

Cygwin 1.7 (approx same time as grepping zero matches)
real    0m1.662s
user    0m1.373s
sys     0m0.138s








Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-05 Thread Dave Korn
aputerguy wrote:

 The data 'testfile' is a plain text file of the ACLs of all the 108,000
 files on my Windoze computer.

  So the find | xargs trick worked, then, did it? :-)

cheers,
  DaveK




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-05 Thread Christopher Faylor
On Thu, Nov 05, 2009 at 03:27:07PM -0800, aputerguy wrote:

Running grep on a 20MB file with ~100,000 matches takes an incredible almost
8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
(on a 2nd machine).

The following cases show how grep under 1.7 grinds to a halt as the number
of matches increases.

The data 'testfile' is a plain text file of the ACLs of all the 108,000
files on my Windoze computer.

Note: since the machines are different, compare relative times across cases
rather than absolute times between the two machines.

We'll need an actual test case if you want us to track it down.

cgf




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-05 Thread Linda Walsh

aputerguy wrote:

Running grep on a 20MB file with ~100,000 matches takes an incredible almost
8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
(on a 2nd machine).

---
I've seen nasty behavior with grep that isn't Cygwin-specific.  Try
pcregrep and see if you have the same issue.


I found it to be about 100 times faster under _some_ searches,
though 2-3x is more typical.  The GNU regex parser isn't very
efficient under some circumstances.


If you find a big difference, you might also want to report
it to the bug-g...@gnu.org mailing list, but last time I did,
they told me that's the way it is due to some posix conformance
thing...

-l




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-05 Thread Christopher Faylor
On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
aputerguy wrote:
 Running grep on a 20MB file with ~100,000 matches takes an incredible almost
 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
 (on a 2nd machine).

I've seen nasty behavior with grep that isn't Cygwin-specific.  Try
pcregrep and see if you have the same issue.

I found it to be about 100 times faster under _some_ searches, though
2-3x is more typical.  The GNU regex parser isn't very efficient under
some circumstances.

If you find a big difference, you might also want to report it to the
bug-g...@gnu.org mailing list, but last time I did, they told me
that's the way it is due to some posix conformance thing...

The fact that it behaves differently between Cygwin 1.5 and 1.7 would
suggest that this isn't a grep problem.

That's why I asked for a test case.

cgf




Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

2009-11-05 Thread aputerguy

OK. Here is a simple test case:


X=10
while [ $X -gt 0 ] ; do echo "The quick brown fox jumped over the lazy dog" ; let X=X-1; done > testfile

time grep dog testfile | wc

Cygwin 1.5:
real    0m0.219s
user    0m0.232s
sys     0m0.045s

Cygwin 1.7:
real    7m46.575s
user    7m14.138s
sys     0m0.076s

While using sed on Cygwin 1.5, I get the reasonable result:
time sed -ne /dog/p testfile | wc

real    0m1.229s
user    0m1.202s
sys     0m0.046s



