Re: non portable sed scripts

2006-05-19 Thread Ralf Wildenhues
Hi Tim,

* Tim Rice wrote on Fri, May 19, 2006 at 06:57:48PM CEST:
> 
> Autoconf version 2.59c
> 
> I had an opportunity to run a configure script generated with 2.59c
> (ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.95.tar.gz) and found
> that it failed.

> config.status: creating config.h
> UX:sed: ERROR: Command garbled: HAVE_DECL_STRNDUP\)[   (].*$,\1define\2 0 ,

I'm pretty sure the syntax of the sed script is ok.  It's probably that
your sed has a length restriction.  We check the 99 commands limit, but
not the 2000 characters limit any more...  :-/

This particular sed script seems to have around 7200 characters before
the failing line though.  So what's the limit on your system?

To get at the individual sed scripts for config.h:
- in config.status, search for lines
    sed -f "$tmp/defines.sed" "$tmp/out..." >"$tmp/out..."
  and rename the file "$tmp/defines.sed" afterwards, so it's not
  overwritten by the next file.

- Then, watch
    sh -x ./config.status -d
  and look at the sed scripts.

Boy, I hope this isn't so difficult to fix, with all that reworking done
to the config.status code.

Cheers, and thanks for reporting this!
Ralf




Re: non portable sed scripts

2006-05-19 Thread Ralf Wildenhues
* Ralf Wildenhues wrote on Fri, May 19, 2006 at 08:37:06PM CEST:
> * Tim Rice wrote on Fri, May 19, 2006 at 06:57:48PM CEST:
> > 
> > Autoconf version 2.59c

> > config.status: creating config.h
> > UX:sed: ERROR: Command garbled: HAVE_DECL_STRNDUP\)[ 
> > (].*$,\1define\2 0 ,
> 
> I'm pretty sure the syntax of the sed script is ok.  It's probably that
> your sed has a length restriction.  We check the 99 commands limit, but
> not the 2000 characters limit any more...  :-/
*snip*

That should've been 4000 characters, as in autoconf.texi.

Sorry about that.




Re: non portable sed scripts

2006-05-19 Thread Paul Eggert
Thanks for the bug report.  I suspect that the sed usage is portable
but that we are running into some limitation of your 'sed'
implementation.

Here is some further information that you can send that will help us
debug this.  (I don't have access to your platform so I can't debug
the problem directly.)

Which operating system are you using?  What does the shell command
"uname -a" output?  How about the shell command "type sed" or "which
sed"?

Please try patching your config.status file as follows:

--- config.status~  2006-05-19 12:02:29.0 -0700
+++ config.status   2006-05-19 12:04:44.0 -0700
@@ -1013,6 +1013,7 @@ ${ac_dA}HAVE_DECL_STRTOUL$ac_dB${ac_dC}1
 ${ac_dA}HAVE_DECL_STRTOULL$ac_dB${ac_dC}1$ac_dD
 ${ac_dA}HAVE_DECL_TTYNAME$ac_dB${ac_dC}1$ac_dD
 CEOF
+cp "$tmp/defines.sed" myscript.sed
 sed -f "$tmp/defines.sed" $ac_file_inputs >"$tmp/out1"
 # First, check the format of the line:
 cat >"$tmp/defines.sed" 

Re: non portable sed scripts

2006-05-19 Thread Paul Eggert
Ralf Wildenhues <[EMAIL PROTECTED]> writes:

> We check the 99 commands limit, but
> not the [4000] characters limit any more...  :-/

But the 4000-character limit is documented by Autoconf to be a limit
on the length of lines of sed's input data, not a limit on the total
size of the sed script.

However, it turns out that we check the 99 commands limit incorrectly,
as the sed script in question contains 100 commands.  I installed this
patch.  It's conceivable that this patch fixes the problem; it'd be
nice to test this.

I found what appear to be some other off-by-one issues that cause
98-line scripts instead of 99, but these are not bugs so I didn't fix
them.  Also, that code is too hairy (does anybody understand it other
than its author? I sure don't) so I didn't want to mess with it.

In the long run we're probably better off finding a working 'sed' than
continuing to cater to broken ones.

2006-05-19  Paul Eggert  <[EMAIL PROTECTED]>

* lib/autoconf/status.m4 (_AC_OUTPUT_HEADER): Fix off-by-one bug
that caused config.status to generate 100-command sed scripts; the
portable limit is 99.

--- lib/autoconf/status.m4  19 May 2006 04:14:13 -  1.102
+++ lib/autoconf/status.m4  19 May 2006 21:01:31 -
@@ -660,7 +660,7 @@ echo 's/ $//
 [s,^[   #]*u.*,/* & */,]' >>conftest.defines
 
 # Break up conftest.defines:
-ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 3)
+ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 4)
 
 # First sed command is: sed -f defines.sed $ac_file_inputs >"$tmp/out1"
 # Second one is:sed -f defines.sed "$tmp/out1" >"$tmp/out2"




Re: non portable sed scripts

2006-05-19 Thread Ralf Wildenhues
* Ralf Wildenhues wrote on Fri, May 19, 2006 at 08:42:46PM CEST:
> > 
> > I'm pretty sure the syntax of the sed script is ok.  It's probably that
> > your sed has a length restriction.

Well, Autoconf-2.59 has been using 38 lines per here document:
| # Maximum number of lines to put in a shell here document.
| # This variable seems obsolete.  It should probably be removed, and
| # only ac_max_sed_lines should be used.
| : ${ac_max_here_lines=38}

lib/autoconf/status.m4:
|   sed ${ac_max_here_lines}q conftest.defines >>$CONFIG_STATUS

2.59c uses 99 commands now (for CONFIG_HEADERS substitutions).

Neither has been computing the number of characters in the sed script.
So, probably 2.59 has just been lucky to produce short enough scripts.

FWIW, we may easily generate scripts that are too long for CONFIG_FILES as
well now: the 2.59 method did 48 substitutions per script (because it used
two commands per substitution), now we do 96.  I've seen a couple of
real-world examples with more than 4000 bytes with 2.59c.

FWIW, on Solaris, /usr/ucb/sed errors out with scripts longer than 6810
characters, /usr/xpg4/bin/sed segfaults at some point (don't remember
the exact number), /bin/sed allows much longer scripts.  My testing on
Solaris always had /bin early in $PATH, thus did not expose this issue.

Cheers,
Ralf
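
The script-size limits described above could be probed with something
like the following.  It is a minimal sketch, not code from Autoconf:
make_sed_script and sed_accepts are hypothetical helper names, and it
assumes mktemp and $((...)) arithmetic are available, which is not true
of every old shell.  The command count stays a parameter so that it can
be kept at or below 99 while the byte count grows.

```shell
# make_sed_script N LEN: print N substitution commands, one per line,
# each with a pattern of LEN "x" characters (LEN + 6 bytes per line).
make_sed_script () {
  pat=$(printf "%${2}s" '' | tr ' ' x)
  i=0
  while [ "$i" -lt "$1" ]; do
    echo "s/${pat}/y/"
    i=$((i + 1))
  done
}

# sed_accepts SED N LEN: succeed if SED runs such a script from a file
# and produces the expected output.
sed_accepts () {
  script=$(mktemp) || return 1
  make_sed_script "$2" "$3" > "$script"
  input=$(printf "%${3}s" '' | tr ' ' x)
  # The first command rewrites the whole input to "y"; the rest are no-ops.
  out=$(echo "$input" | "$1" -f "$script" 2>/dev/null)
  status=$?
  rm -f "$script"
  [ "$status" -eq 0 ] && [ "$out" = y ]
}
```

For example, "sed_accepts /usr/ucb/sed 90 80" would feed that sed a
90-command script of 7740 bytes.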




Re: non portable sed scripts

2006-05-19 Thread Tim Rice
On Fri, 19 May 2006, Ralf Wildenhues wrote:

> Hi Tim,
> 
> * Tim Rice wrote on Fri, May 19, 2006 at 06:57:48PM CEST:
> > 
> > Autoconf version 2.59c
> > 
> > I had an opportunity to run a configure script generated with 2.59c
> > (ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.95.tar.gz) and found
> > that it failed.
> 
> > config.status: creating config.h
> > UX:sed: ERROR: Command garbled: HAVE_DECL_STRNDUP\)[ 
> > (].*$,\1define\2 0 ,
> 
> I'm pretty sure the syntax of the sed script is ok.  It's probably that
> your sed has a length restriction.  We check the 99 commands limit, but
> not the 2000 characters limit any more...  :-/
> 
> This particular sed script seems to have around 7200 characters before
> the failing line though.  So what's the limit on your system?

You may well be right.
Unknown, I don't have source. :-(

> 
> To get at the individual sed scripts for config.h:
> - in config.status, search for lines
> sed -f "$tmp/defines.sed" "$tmp/out..." >"$tmp/out..."
>   and rename the file "$tmp/defines.sed" afterwards, so it's not
>   overwritten by the next file.
> 
> - Then, watch
> sh -x ./config.status -d

Hmm, "sh -x ./config.status -d" generates a good config.h
The config.h generated at configure time had just the first line.

>   and look at the sed scripts.

They look fine.

[hours and much testing later]

It must be some obscure shell bug.
If I "ksh configure" it works fine.

To answer Paul's question, this is UnixWare 7.1.1
I don't see the problem on my UnixWare 7.1.4 box.
Even my old UnixWare 2.03 box does OK.

I'd say it's not a bug. Sorry for the noise.
 
> Boy, I hope this isn't so difficult to fix, with all that reworking done
> to the config.status code.
> 
> Cheers, and thanks for reporting this!
> Ralf
> 

-- 
Tim Rice                Multitalents    (707) 887-1469
[EMAIL PROTECTED]






Re: non portable sed scripts

2006-05-19 Thread Ralf Wildenhues
Hi Paul,

* Paul Eggert wrote on Fri, May 19, 2006 at 11:04:52PM CEST:
> Ralf Wildenhues <[EMAIL PROTECTED]> writes:
> 
> > We check the 99 commands limit, but
> > not the [4000] characters limit any more...  :-/
> 
> But the 4000-character limit is documented by Autoconf to be a limit
> on the length of lines of sed's input data, not a limit on the total
> size of the sed script.

D'oh.  I misread this section long ago and have always thought that
wrongly since...

So then the total limit of the script size I found on Solaris (described
in that other mail in this thread that was pending for some hours)
really is a new issue.

> However, it turns out that we check the 99 commands limit incorrectly,
> as the sed script in question contains 100 commands.  I installed this
> patch.  It's conceivable that this patch fixes the problem; it'd be
> nice to test this.

All my testing of seds a couple of months ago showed that labels do not
count as commands.  If you have a sed where this is different, then I'd
like to know.

> I found what appear to be some other off-by-one issues that cause
> 98-line scripts instead of 99, but these are not bugs so I didn't fix
> them.  Also, that code is too hairy (does anybody understand it other
> than its author? I sure don't) so I didn't want to mess with it.

It's pretty hairy.  I don't want to claim that I fully understand it,
but I did fix a couple of bugs in it.

> In the long run we're probably better off finding a working 'sed' than
> continuing to cater to broken ones.

FWIW, I agree.  Actually, it would probably be best to kill the
quadratic complexity in the number of AC_SUBSTs incurred by a sed script
and replace it by an awk script that searches for @[a-zA-Z][a-zA-Z0-9]*@
in the .in file and uses that variable name as key in a hash.

Cheers,
Ralf
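
The awk idea above might look roughly like this.  It is a sketch under
stated assumptions, not anything from Autoconf: the function name
subst_with_awk and the "NAME=value" input format are invented for
illustration, the awk is assumed to support match() with RSTART and
RLENGTH (so not a pre-POSIX awk), and the values file is assumed to be
non-empty.  The pattern is the one quoted in the message; real
substitution variables also contain underscores, so it would need
extending.

```shell
# subst_with_awk VALUES TEMPLATE
# VALUES holds "NAME=value" lines; TEMPLATE contains @NAME@ placeholders.
# Each template line is scanned once, with a hash lookup per placeholder.
subst_with_awk () {
  awk '
    # First file: load NAME=value pairs into an associative array.
    NR == FNR {
      eq = index($0, "=")
      if (eq > 1) val[substr($0, 1, eq - 1)] = substr($0, eq + 1)
      next
    }
    # Second file: replace each @NAME@ whose NAME is a known key.
    {
      out = ""
      rest = $0
      while (match(rest, /@[a-zA-Z][a-zA-Z0-9]*@/)) {
        name = substr(rest, RSTART + 1, RLENGTH - 2)
        if (name in val) {
          out = out substr(rest, 1, RSTART - 1) val[name]
          rest = substr(rest, RSTART + RLENGTH)
        } else {
          # Unknown name: copy "@NAME" and rescan from the trailing "@".
          out = out substr(rest, 1, RSTART + RLENGTH - 2)
          rest = substr(rest, RSTART + RLENGTH - 1)
        }
      }
      print out rest
    }
  ' "$1" "$2"
}
```

Because values are concatenated rather than passed through sub(), no
escaping of "&" or backslashes in the values is needed.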




Re: non portable sed scripts

2006-05-21 Thread Paul Eggert
Ralf Wildenhues <[EMAIL PROTECTED]> writes:

> So then the total limit of the script size I found on Solaris (described
> in that other mail in this thread that was pending for some hours)
> really is a new issue.

If it's just Solaris, we should be able to work around it by using
AC_PROG_SED, as it should check for that bug (it currently doesn't,
but it should).

> All my testing of seds a couple of months ago showed that labels do not
> count as commands.

Sorry, I didn't know that.  I guess we can undo my patch then,
but add a comment.

Here's a first cut to do this.  I haven't tested it and I assume
it's not the full job, but it should give you an idea.

Index: doc/autoconf.texi
===
RCS file: /cvsroot/autoconf/autoconf/doc/autoconf.texi,v
retrieving revision 1.1018
diff -p -u -r1.1018 autoconf.texi
--- doc/autoconf.texi   20 May 2006 05:39:03 -  1.1018
+++ doc/autoconf.texi   21 May 2006 07:45:40 -
@@ -3632,9 +3632,8 @@ is found, and otherwise to @samp{:} (do 
 @defmac AC_PROG_SED
 @acindex{PROG_SED}
 @ovindex SED
-Set output variable @code{SED} to a Sed implementation on @env{PATH} that
-truncates as few characters as possible.  If @sc{gnu} Sed is found,
-use that instead.
+Set output variable @code{SED} to a Sed implementation on
+@env{PATH} that conforms to Posix without arbitrary length limits.
 @end defmac
 
 @defmac AC_PROG_YACC
@@ -13207,9 +13206,12 @@ them.
 Unicos 9 @command{sed} loops endlessly on patterns like @samp{.*\n.*}.
 
 Sed scripts should not use branch labels longer than 8 characters and
-should not contain comments.  HP-UX sed has a limit of 99 commands and
+should not contain comments.  HP-UX sed has a limit of 99 commands
+(not counting @samp{:} commands) and
 48 labels, which can not be circumvented by using more than one script
 file.  It can execute up to 19 reads with the @samp{r} command per cycle.
+Solaris @command{/usr/ucb/sed} does not allow scripts longer than 6810
+bytes, and its @command{/usr/xpg4/bin/sed} dumps core with long scripts.
 
 Avoid redundant @samp{;}, as some @command{sed} implementations, such as
 [EMAIL PROTECTED] 1.4.2's, incorrectly try to interpret the second
Index: lib/autoconf/programs.m4
===
RCS file: /cvsroot/autoconf/autoconf/lib/autoconf/programs.m4,v
retrieving revision 1.54
diff -p -u -r1.54 programs.m4
--- lib/autoconf/programs.m4  19 May 2006 08:11:27 -  1.54
+++ lib/autoconf/programs.m4  21 May 2006 07:45:40 -
@@ -812,9 +812,15 @@ adjust the code.])
 # as few characters as possible.  Prefer GNU sed if found.
 AC_DEFUN([AC_PROG_SED],
 [AC_CACHE_CHECK([for a sed that does not truncate output], ac_cv_path_SED,
-[_AC_PATH_PROG_FEATURE_CHECK(SED, [sed gsed],
+[dnl ac_script should contain more than 99 commands and more than
+ dnl 6810 bytes, to catch limits in Solaris 8 /usr/ucb/sed.
+ ac_script=s/aaa/b/
+ for ac_i in 1 2 3 4 5 6 7; do
+   ac_script="$ac_script$as_nl$ac_script"
+ done
+ _AC_PATH_PROG_FEATURE_CHECK(SED, [sed gsed],
[_AC_FEATURE_CHECK_LENGTH([ac_path_SED], [ac_cv_path_SED],
-   ["$ac_path_SED" -e 's/a$//'])])])
+   ["$ac_path_SED" -e "$ac_script"])])])
  SED="$ac_cv_path_SED"
  AC_SUBST([SED])
 ])# AC_PROG_SED
Index: lib/autoconf/status.m4
===
RCS file: /cvsroot/autoconf/autoconf/lib/autoconf/status.m4,v
retrieving revision 1.103
diff -p -u -r1.103 status.m4
--- lib/autoconf/status.m4  19 May 2006 21:02:10 -  1.103
+++ lib/autoconf/status.m4  21 May 2006 07:45:40 -
@@ -303,7 +303,7 @@ AC_DEFUN([AC_CONFIG_FILES], [_AC_CONFIG_
 # _AC_SED_CMD_LIMIT
 # -
 # Evaluate to an m4 number equal to the maximum number of commands to put
-# in any single sed program.
+# in any single sed program, not counting ":" commands.
 #
 # Some seds have small command number limits, like on Digital OSF/1 and HP-UX.
 m4_define([_AC_SED_CMD_LIMIT],
@@ -660,7 +660,7 @@ echo 's/ $//
 [s,^[   #]*u.*,/* & */,]' >>conftest.defines
 
 # Break up conftest.defines:
-ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 4)
+ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 3)
 
 # First sed command is: sed -f defines.sed $ac_file_inputs >"$tmp/out1"
 # Second one is:sed -f defines.sed "$tmp/out1" >"$tmp/out2"




Re: non portable sed scripts

2006-05-21 Thread Ralf Wildenhues
Hi Paul,

* Paul Eggert wrote on Sun, May 21, 2006 at 09:46:32AM CEST:
> Ralf Wildenhues <[EMAIL PROTECTED]> writes:
> 
> > So then the total limit of the script size I found on Solaris (described
> > in that other mail in this thread that was pending for some hours)
> > really is a new issue.
> 
> If it's just Solaris, we should be able to work around it by using
> AC_PROG_SED, as it should check for that bug (it currently doesn't,
> but it should).

I think I have this figured out now (took me way too long :-( ),
but I need a while to write it all down, and I need to go back to
Libtool to fix a 5-year-old bug (the one that led to LT_AC_PROG_SED
in the first place) in a different (right) way first.

Short story: Libtool has always (wanted to) prefer /usr/xpg4/bin/sed
over /bin/sed on Solaris, stating that the latter doesn't cope as well
with long lines.  Well, it copes worse than the xpg4 one with
_incomplete_ lines (without final newlines), which libtool likes to
create at times.  But the test for sed has been "fixed" along the way not
to use incomplete lines, so it wouldn't exclude /bin/sed anyway ...  and
then anyway libtool just needs to put its $NL2SP | $SED | $SP2NL
workaround in place everywhere so that this doesn't matter any more.

Then, /usr/xpg4/bin/sed doesn't really expose a small script length
limit; rather, it segfaults on the CONFIG_HEADERS script created by the
"Torturing config.status" test, but it works with simpler scripts of
the same size.  I have not analyzed in detail the characteristics when
this segfault triggers.

I tested /usr/ucb/sed again.  It turns out, the 6810 bytes for it isn't
fixed.  With a script that your proposed test generates, the border ends
at about 6635 characters.  If you use one less substitution, 6644
characters are ok.  White space before a command does not count, neither
does a `;' separating commands.  Labels (`:' commands) and their
arguments do not count, neither do jumps `b' or conditional jumps `t'.
An escape character (backslash) in a regex does not count.  The limit
cannot be circumvented by splitting the script into several files
(although the length of the representation of 2 scripts may not exactly
be the sum of the lengths of the individual representations; I did not
check that).  For too long scripts, the error message is:
  sed: Too much command text: [...]

My conclusion from these observations is that there is a fixed buffer
size for some internal representation of the command text, which has a
constant overhead per command (possibly with a per-command constant),
plus the (internal representation of the) arguments.  I have not
attempted to measure the overhead per `s' command or any other constants
here exactly.


I will post another message with an actual patch, and more technical
comments to it; this one is messy and long enough already.

Cheers,
Ralf




Re: non portable sed scripts

2006-05-21 Thread Ralf Wildenhues
Hi Paul, Tim,

* Paul Eggert wrote on Sun, May 21, 2006 at 09:46:32AM CEST:
> Ralf Wildenhues <[EMAIL PROTECTED]> writes:
> 
> > So then the total limit of the script size I found on Solaris (described
> > in that other mail in this thread that was pending for some hours)
> > really is a new issue.
> 
> If it's just Solaris, we should be able to work around it by using
> AC_PROG_SED, as it should check for that bug (it currently doesn't,
> but it should).

Well, my guess is still that (at least one sed on) UnixWare has similar
issues, but I have no idea whether this UnixWare has several sed
implementations, and one of them is usable, or whether it is common to
have GNU sed as add-on installed and in $PATH.

> > All my testing of seds a couple of months ago showed that labels do not
> > count as commands.
> 
> Sorry, I didn't know that.  I guess we can undo my patch then,
> but add a comment.
> 
> Here's a first cut to do this.  I haven't tested it and I assume
> it's not the full job, but it should give you an idea.

Changes w.r.t. your initial patch (thanks BTW!):

- we need to use a script file to avoid overrunning the command line
  length limit on w32 systems (and also to avoid making the test
  really crawl there, even more than it does already),
- we may not use more than 99 commands in the script we test with:
  otherwise, on HP-UX, we won't find any suitable sed.
- remove the script file afterwards, and (try to) unset the long
  variable,
- document that we may still end up with a limited sed, if we don't find
  any better, but that we error out if we don't find any that works,
- use ${SED-sed} in the config.status code.  Require AC_PROG_SED from
  AC_OUTPUT if needed.  (I don't like to expand it unconditionally, that
  would cause a significant testsuite runtime overhead.)
- bugfix in _AC_FEATURE_CHECK_LENGTH

There are some caveats to be aware of here: AC_PROG_SED originates
from Libtool.  And there is a history to the Libtool macro.  Libtool
actually plays well with the name space (LT_AC_PROG_SED in 1.5.22, or a
test of m4_defined AC_PROG_SED in CVS Libtool), but if someone is
clever enough to write
  if $some_condition; then
    AC_PROG_LIBTOOL
  fi
  # ...
  AC_OUTPUT

then you may end up with unset $SED.  This is one reason for using
${SED-sed}, and _not_ initializing $SED in config.status if we have
not called that macro, and not setting the variable in config.status
from within AC_PROG_SED.  Libtool-1.5.10 had a bug in its sed check,
but since we don't use its cache variable $lt_cv_path_SED, we don't
fall prey to that.  OTOH, we had better make Libtool-2.0's test match
what we come up with here.

The patch below does not attempt to save us from the segfault of Solaris
/usr/xpg4/bin/sed.  I could try to mimic the script from the torture
test more so that it gets exposed, but I'm not sure it's useful.

Do you think this patch should be applied?  I'm not sure it's worth it.

I think we'd need a fuller run of tests on various systems to ensure
this doesn't break down anywhere: it would leave the user with an
unusable package, even if the package in question does not come anywhere
close to the tested limits.  The fact that AC_PROG_SED errors out at
times makes this patch quite scary to me, in fear of trapping the GNU
sed package once it uses Autoconf-2.60.

So IMHO I wouldn't mind if only the bugfix and documentation parts of
the patch below were committed.


Tim, could you do us a favor and test this patch?  It should apply to
Autoconf-2.59c or CVS Autoconf, and the interesting test output would
be in tests/testsuite.log after

  env TESTSUITEFLAGS='-k config.status -k AC_PROG_SED' make -e check

Cheers,
Ralf

2006-05-21  Paul Eggert  <[EMAIL PROTECTED]>
and Ralf Wildenhues  <[EMAIL PROTECTED]>

* lib/autoconf/programs.m4 (AC_PROG_SED): Catch script length
limits in Solaris 8 /usr/ucb/sed by testing a long script.
(_AC_FEATURE_CHECK_LENGTH): Do not remove `conftest.*', but only
the files that this macro actually generates, to keep the file
`conftest.sed' created by AC_PROG_SED.
* lib/autoconf/status.m4 (_AC_OUTPUT_FILES_PREPARE): Use
`${SED-sed}' for substitutions at config.status time.
(_AC_OUTPUT_FILE, _AC_OUTPUT_HEADER): Likewise.
(_AC_OUTPUT_REQUIRE_SED): New macro.
(AC_OUTPUT): Call it.
(_AC_OUTPUT_FILE): Fix typo in comment.
(_AC_OUTPUT_CONFIG_STATUS): Initialize `$SED' in config.status
if AC_PROG_SED has been called.
* doc/autoconf.texi (Particular Programs): Update description of
AC_PROG_SED.
(Limitations of Usual Tools) <sed>: Mention script length
limitations with some of the sed implementations on Solaris.
* NEWS: Update.

2006-05-21  Paul Eggert  <[EMAIL PROTECTED]>

Undo this change, and add comment about not counting labels:

* lib/autoconf/status.m4 (_AC_OUTPUT_HEADER): Fix off-by-one bug
t

Re: non portable sed scripts

2006-05-21 Thread Paul Eggert
Ralf Wildenhues <[EMAIL PROTECTED]> writes:

> So IMHO I wouldn't mind if only the bugfix and documentation parts of
> the patch below were committed.

Thanks very much for the analysis.  I agree with your conclusion for
Autoconf 2.60.  But we definitely need to revisit this afterwards,
since my impression is that sed size-related gotchas are becoming more
and more common (see, for example,
,
which has exactly this problem and which closely preceded your message
in my email inbox).

Switching to awk sounds like a win to me, as awk is more expressive
than sed is.  In the old days this would have been problematic due to
the incompatibilities between traditional and POSIX Awk, but nowadays
we can assume an almost-POSIX-compliant awk, if we run AC_PROG_AWK.




Re: non portable sed scripts

2006-05-22 Thread Stepan Kasal
Hello,

> Ralf Wildenhues <[EMAIL PROTECTED]> writes:
> > So IMHO I wouldn't mind if only the bugfix and documentation parts of
> > the patch below were committed.

Ralf, I'm not sure what you mean by ``the bugfix'' but I guess it
might be this:
(_AC_FEATURE_CHECK_LENGTH): Do not remove `conftest.*', but only
the files that this macro actually generates, to keep the file
and I agree that it is harmless and should be committed.
(BTW, I do not see the change in your patch.)

I share your opinion that committing all the AC_PROG_SED & Co. would
be dangerous and would require a lot of testing.

I think we should lower ac_max_sed_lines artificially to work around the
problem which started this thread.

I'm attaching a patch, which uses the value of 60, which usually leads to
sed scripts of size < 5000.  This should be reasonably safe.
Or should we go to the old value of 38?

Since most packages use only one config header, we don't have to be
sad that its creation has been slowed down.

What do you think about this hack?

On Sun, May 21, 2006 at 10:51:01PM -0700, Paul Eggert wrote:
> Switching to awk sounds like a win to me, [...]. [...] but nowadays
> we can assume an almost-POSIX-compliant awk, if we run AC_PROG_AWK.

I also would like to use awk in config.status.
But I think that AC_PROG_AWK shouldn't be used directly here:
- AC_PROG_AWK looks for an implementation which is as POSIX-compliant
  as possible
- the ``config awk'' can be any reasonable awk, i.e. we only have to
  avoid Solaris' /bin/awk

I was told that there are concerns that the /usr/xpg4/bin
implementations are less debugged and less eagerly fixed than the
/bin ones.  Perhaps we should just check whether `awk' behaves sanely
and switch to `nawk' if it doesn't.

Have a nice day,
Stepan
Index: lib/autoconf/status.m4
===
RCS file: /cvsroot/autoconf/autoconf/lib/autoconf/status.m4,v
retrieving revision 1.103
diff -u -r1.103 status.m4
--- lib/autoconf/status.m4  19 May 2006 21:02:10 -  1.103
+++ lib/autoconf/status.m4  22 May 2006 08:47:23 -
@@ -660,7 +660,10 @@
 [s,^[   #]*u.*,/* & */,]' >>conftest.defines
 
 # Break up conftest.defines:
-ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 4)
+ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 3)
+# Well, to work around problems with the size of the script, use a smaller
+# limit:
+ac_max_sed_lines=60
 
 # First sed command is: sed -f defines.sed $ac_file_inputs >"$tmp/out1"
 # Second one is:sed -f defines.sed "$tmp/out1" >"$tmp/out2"


Re: non portable sed scripts

2006-05-22 Thread Tim Rice
On Mon, 22 May 2006, Ralf Wildenhues wrote:

> Well, my guess is still that (at least one sed on) UnixWare has similar
> issues, but I have no idea whether this UnixWare has several sed
> implementations, and one of them is usable, or whether it is common to
> have GNU sed as add-on installed and in $PATH.

UnixWare only has one sed. But there are so many open source projects out
there that use GNU-specific features of sed that it is common for
people doing development on UnixWare to install a GNU sed.

> Tim, could you do us a favor and test this patch?  It should apply to
> Autoconf-2.59c or CVS Autoconf, and the interesting test output would
> be in tests/testsuite.log after
> 
>   env TESTSUITEFLAGS='-k config.status -k AC_PROG_SED' make -e check

Looks good on my UnixWare 7.1.4 box.
...
/usr/bin/posix/sh ./testsuite -k config.status -k AC_PROG_SED
## -- ##
## GNU Autoconf 2.59c test suite. ##
## -- ##

Testing config.status.

 81: Torturing config.status   ok

Testing autoconf/programs macros.

203: AC_PROG_SED   ok

## - ##
## Test results. ##
## - ##

All 2 tests were successful.
...

Not so good on my OpenServer 5.0.4 box
...
gmake[2]: Entering directory `/usr/local/src/gnu/autoconf-2.59c/tests'
/bin/ksh ./testsuite -k config.status -k AC_PROG_SED
/testsuite[900]: : is not an identifier
gmake[2]: *** [check-local] Error 1
...

BTW, This hunk failed on autoconf-2.59c as it was already 3.
> -ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 4)
> +ac_max_sed_lines=m4_eval(_AC_SED_CMD_LIMIT - 3)


-- 
Tim Rice                Multitalents    (707) 887-1469
[EMAIL PROTECTED]






Re: non portable sed scripts

2006-05-22 Thread Tim Rice
On Sun, 21 May 2006, Paul Eggert wrote:

> Switching to awk sounds like a win to me, as awk is more expressive
> than sed is.  In the old days this would have been problematic due to
> the incompatibilities between traditional and POSIX Awk, but nowadays
> we can assume an almost-POSIX-compliant awk, if we run AC_PROG_AWK.

Switching to awk sounds scary to me. It's quite easy to break
awk on OpenServer.

I've got some info on limits of awk from a SCO engineer on OpenServer
& UnixWare that may be useful if you switch to awk.

OSR5 awk has an input record limit of 3K bytes.
UW7 awk has an input record limit of 5K
And the awk string limit -- 401 limit in OSR5, 5K limit in UW7.

-- 
Tim Rice                Multitalents    (707) 887-1469
[EMAIL PROTECTED]






Re: non portable sed scripts

2006-05-23 Thread Ralf Wildenhues
[ Cc:ing bug-autoconf again ]

* Tim Rice wrote on Tue, May 23, 2006 at 04:13:34AM CEST:
> On Mon, 22 May 2006, Ralf Wildenhues wrote:
> 
> > > Next I tried
> > >   CONFIG_SHELL=/bin/sh /bin/sh  \
> > >   /opt/src/gnu/coreutils-5.95/configure \
> > >   CONFIG_SHELL=/bin/sh
> > > Again a valid config.h and no error.
> > > That was all on my UnixWare 7.1.1 box.
> > 
> > Please try again with /usr/bin/posix/sh as shell; that's what the shell
> > selection algorithm of 2.59c will select.
> 
> Yes that fails. /usr/bin/posix/sh is a symbolic link to /u95/bin/sh which
> is hard linked to /u95/bin/ksh. /usr/bin/ksh is a symbolic link to
> /u95/bin/ksh.
> 
> Testing with /usr/bin/ksh fails too.
> I've attached a snip of the output of a "/usr/bin/posix/sh -x" test.

Thanks.  This snippet shows that it's the shell which actually generates
a broken script:

| + cat
| + 1> ./conf24563-17529/defines.sed 0<<
[...]
| s,^\([ ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_NANOSLEEP\)[   
 (].*$,\1define\2 0 ,
| s,^\([ ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_REALLOC\)[  
(].*$,\1define\2 1 ,
| s,^\([ ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_STPCPY\)[   
(].*$,\1define\2 0 ,
| HAVE_DECL_STRNDUP\)[   (].*$,\1define\2 0 ,
| s,^\([ ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_STRNLEN\)[  
(].*$,\1define\2 0 ,
[...]

So I assume we have an incarnation of a bug similar to this one
(quoting `info Autoconf "Here-Documents"'):

|Many older shells (including the Bourne shell) implement
| here-documents inefficiently.  And some shells mishandle large
| here-documents: for example, Solaris `dtksh', which is derived from
| Korn shell version M-12/28/93d, mishandles variable expansion that
| occurs on 1024-byte buffer boundaries within a here-document.  Users
| can generally fix these problems by using a faster or more reliable
| shell, e.g., by using the command `CONFIG_SHELL=/bin/bash /bin/bash
| ./configure' rather than plain `./configure'.

Hmm.  This may actually present a regression on this system: the 2.59
shell selection algorithm would probably(?) have selected /bin/sh as
shell, whereas, due to changes we did because of OSF, /usr/bin/posix/sh
is preferred now.

I hope we get away with this.  The reduction of ac_max_sed_lines Paul
just installed may just save us, hopefully.  Otherwise, I don't see much
choice other than to suggest passing a more reliable shell.

Cheers,
Ralf




Re: non portable sed scripts

2006-05-23 Thread Stepan Kasal
Hello,

On Tue, May 23, 2006 at 10:43:22AM +0200, Ralf Wildenhues wrote:
> | s,^\([   ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_NANOSLEEP\)[   
>  (].*$,\1define\2 0 ,
> | s,^\([   ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_REALLOC\)[  
> (].*$,\1define\2 1 ,
> | s,^\([   ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_STPCPY\)[   
> (].*$,\1define\2 0 ,
> | HAVE_DECL_STRNDUP\)[ (].*$,\1define\2 0 ,
> | s,^\([   ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_STRNLEN\)[  
> (].*$,\1define\2 0 ,
> [...]
> 
> So I assume we have an incarnation of a bug similar to this one
> (quoting `info Autoconf "Here-Documents"'):
> 
> |Many older shells (including the Bourne shell) implement
> | here-documents inefficiently.  And some shells mishandle large
> | here-documents: for example, Solaris `dtksh', which is derived from
> | Korn shell version M-12/28/93d, mishandles variable expansion that
> | occurs on 1024-byte buffer boundaries within a here-document.  Users
> | can generally fix these problems by using a faster or more reliable
> | shell, e.g., by using the command `CONFIG_SHELL=/bin/bash /bin/bash
> | ./configure' rather than plain `./configure'.

you are so bright, Ralf!

> [...]  Otherwise, I don't see much
> choice other than to suggest passing a more reliable shell.

Of course there is a general solution: we can actively test the shell
for this problem, in the ``detect better shell'' routine.
But this will enlarge the generated script by a kilobyte :-O

Have a nice day,
Stepan




Re: non portable sed scripts

2006-05-23 Thread Stepan Kasal
Hello,

On Tue, May 23, 2006 at 02:33:42PM +0200, Stepan Kasal wrote:
> you are so bright, Ralf!

this doesn't sound nice, I'm afraid.

I wanted to say that it was really clever to notice that

> > | s,^\([ ]*#[]*\)[^  ]*\([   ][  ]*HAVE_DECL_STPCPY\)[   
> > (].*$,\1define\2 0 ,
> > | HAVE_DECL_STRNDUP\)[   (].*$,\1define\2 0 ,

is actually a malformed sed script.  And even more clever was to
connect it with that bug in the docs.

Well done! Thanks.

Stepan




Re: non portable sed scripts

2006-05-23 Thread Tim Rice
On Tue, 23 May 2006, Ralf Wildenhues wrote:

> > > Please try again with /usr/bin/posix/sh as shell; that's what the shell
> > > selection algorithm of 2.59c will select.
> > 
> > Yes that fails. /usr/bin/posix/sh is a symbolic link to /u95/bin/sh which
> > is hard linked to /u95/bin/ksh. /usr/bin/ksh is a symbolic link to
> > /u95/bin/ksh.
> > 
[snip]
> So I assume we have an incarnation of a bug similar to this one
> (quoting `info Autoconf "Here-Documents"'):
> 
> |Many older shells (including the Bourne shell) implement
> | here-documents inefficiently.  And some shells mishandle large
> | here-documents: for example, Solaris `dtksh', which is derived from
> | Korn shell version M-12/28/93d, mishandles variable expansion that
> | occurs on 1024-byte buffer boundaries within a here-document.  Users
> | can generally fix these problems by using a faster or more reliable
> | shell, e.g., by using the command `CONFIG_SHELL=/bin/bash /bin/bash
> | ./configure' rather than plain `./configure'.

I'd say the identical bug.
...
$ what /usr/bin/ksh | grep -i version
Version M-12/28/93e-SCO
...

-- 
Tim Rice                Multitalents            (707) 887-1469
[EMAIL PROTECTED]






Re: non portable sed scripts

2006-05-23 Thread Paul Eggert
Ralf Wildenhues <[EMAIL PROTECTED]> writes:

> the 2.59 shell selection algorithm would probably(?) have selected
> /bin/sh as shell, whereas, due to changes we did because of OSF,
> /usr/bin/posix/sh is preferred now.

Ouch.  Good catch.

> I hope we get away with this.

I don't think we will, since the bug occurs every 1024 bytes, and many
define.sed scripts are longer than that.

I installed this patch, which works around this particular problem by
not using shell expansion at all in the here-documents used to create
defines.sed.  However, other instances of this problem lurk in
AC_LANG_SOURCE(C), _AC_INIT_HELP, _AC_DEFINE_Q, AC_LANG_CONFTEST,
_AC_OUTPUT_FILES_PREPARE, _AC_OUTPUT_FILE, and
_AC_OUTPUT_CONFIG_STATUS, with the last 3 being the most worrisome.
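
For readers less familiar with the distinction, a minimal sketch of a
here-document with and without shell expansion follows.  It is an
illustration only, not the code from the patch; the function names and the
sample variable are made up.

```shell
name=HAVE_DECL_STRNDUP

# Unquoted delimiter: the shell expands ${name} while reading the document.
emit_expanded ()
{
  cat <<EOF
#define ${name} 0
EOF
}

# Quoted delimiter: the body is copied verbatim, no expansion takes place,
# so the buggy expansion code is never exercised.
emit_literal ()
{
  cat <<\EOF
#define ${name} 0
EOF
}

# One way to still get the value in: expand it outside the here-document.
emit_spliced ()
{
  printf '#define %s 0\n' "$name"
}
```

emit_expanded and emit_spliced both print `#define HAVE_DECL_STRNDUP 0`,
while emit_literal prints the text `#define ${name} 0` unchanged.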

Perhaps Tim could check whether this patch fixes his problem?
If not, other patches are probably also needed.

I just now noticed that this patch removes the undocumented
ac_word_regexp var.  That was a fairly recent addition, though (June
2005), and I couldn't find evidence in Google of other packages using
it.

2006-05-23  Paul Eggert  <[EMAIL PROTECTED]>

* lib/autoconf/status.m4 (_AC_OUTPUT_HEADER): Don't use shell
expansion in the here-documents used by config.status, as that
runs afoul of the Korn shell version M-12/28/93d bug described in
the Autoconf manual, and this in turn causes a Coreutils 5.95 build to
fail as described by Tim Rice and diagnosed by Ralf Wildenhues in
.

--- lib/autoconf/status.m4  23 May 2006 08:27:32 -  1.106
+++ lib/autoconf/status.m4  23 May 2006 23:30:57 -  1.108
@@ -601,27 +601,6 @@ m4_define([_AC_OUTPUT_HEADER],
   #
   # CONFIG_HEADER
   #
-
-  # These sed commands are passed to sed as "A NAME B PARAMS C VALUE D", where
-  # NAME is the cpp macro being defined, VALUE is the value it is being given.
-  # PARAMS is the parameter list in the macro definition--in most cases, it's
-  # just an empty string.
-  #
-dnl Quote, for the `[ ]' and `define'.
-[  ac_dA='s,^\([#]*\)[^ ]*\([   ]*'
-  ac_dB='\)[(].*,\1define\2'
-  ac_dC=' '
-  ac_dD=' ,']
-dnl ac_dD used to contain `;t' at the end, but that was both slow and incorrect.
-dnl 1) Since the script must be broken into chunks containing 100 commands,
-dnl the extra command meant extra calls to sed.
-dnl 2) The code was incorrect: in the unusual case where a symbol has multiple
-dnl different AC_DEFINEs, the last one should be honored.
-dnl
-dnl ac_dB works because every line has a space appended.  ac_dD reinserts
-dnl the space, because some symbol may have been AC_DEFINEd several times.
-
-  [ac_word_regexp=[_$as_cr_Letters][_$as_cr_alnum]*]
 _ACEOF
 
 # Transform confdefs.h into a sed script `conftest.defines', that
@@ -637,6 +616,26 @@ echo 's/$/ /' >conftest.defines
 dnl
 dnl Quote, for `[ ]' and `define'.
 [ac_word_re=[_$as_cr_Letters][_$as_cr_alnum]*
+# These sed commands are passed to sed as "A NAME B PARAMS C VALUE D", where
+# NAME is the cpp macro being defined, VALUE is the value it is being given.
+# PARAMS is the parameter list in the macro definition--in most cases, it's
+# just an empty string.
+ac_dA='s,^\\([  #]*\\)[^]*\\([  ]*'
+ac_dB='\\)[ (].*,\\1define\\2'
+ac_dC=' '
+ac_dD=' ,']
+dnl ac_dD used to contain `;t' at the end, but that was both slow and incorrect.
+dnl 1) Since the script must be broken into chunks containing 100 commands,
+dnl the extra command meant extra calls to sed.
+dnl 2) The code was incorrect: in the unusual case where a symbol has multiple
+dnl different AC_DEFINEs, the last one should be honored.
+dnl
+dnl ac_dB works because every line has a space appended.  ac_dD reinserts
+dnl the space, because some symbol may have been AC_DEFINEd several times.
+dnl
+dnl The first use of ac_dA has a space prepended, so that the second
+dnl use does not match the initial 's' of $ac_dA.
+[
 uniq confdefs.h |
   sed -n '
t rset
@@ -646,9 +645,8 @@ uniq confdefs.h |
d
:ok
s/[\\&,]/\\&/g
-   s/[\\$`]/\\&/g
-   s/^\('"$ac_word_re"'\)\(([^()]*)\)[  ]*\(.*\)/${ac_dA}\1$ac_dB\2${ac_dC}\3$ac_dD/p
-   s/^\('"$ac_word_re"'\)[  ]*\(.*\)/${ac_dA}\1$ac_dB${ac_dC}\2$ac_dD/p
+   s/^\('"$ac_word_re"'\)\(([^()]*)\)[  ]*\(.*\)/ '"$ac_dA"'\1'"$ac_dB"'\2'"${ac_dC}"'\3'"$ac_dD"'/p
+   s/^\('"$ac_word_re"'\)[  ]*\(.*\)/'"$ac_dA"'\1'"$ac_dB$ac_dC"'\2'"$ac_dD"'/p
   ' >>conftest.defines
 ]
 # Remove the space that was appended to ease matching.
@@ -682,12 +680,14 @@ while :
 do
   # Write a here document:
   dnl Quote, for the `[ ]' and `define'.
-  echo ['# First, check the format of the line:
-cat >"$tmp/defines.sed" <>$CONFIG_STATUS <<_ACEOF
+# First, check the format of the line:
+cat >"\$tmp/defines.sed" <<\\CEOF
+/^[ ]*#[]*undef[][  ]*$ac_word_re[  ]*\$/b def
+/^[ ]*#[]*define[ 

Re: non portable sed scripts

2006-05-25 Thread Tim Rice
On Tue, 23 May 2006, Paul Eggert wrote:

> I installed this patch, which works around this particular problem by
> not using shell expansion at all in the here-documents used to create
> defines.sed.  However, other instances of this problem lurk in
> AC_LANG_SOURCE(C), _AC_INIT_HELP, _AC_DEFINE_Q, AC_LANG_CONFTEST,
> _AC_OUTPUT_FILES_PREPARE, _AC_OUTPUT_FILE, and
> _AC_OUTPUT_CONFIG_STATUS, with the last 3 being the most worrisome.
> 
> Perhaps Tim could check whether this patch fixes his problem?
> If not, other patches are probably also needed.

I built autoconf from CVS pulled May 24 15:30 US/Pacific, ran
autoconf in the coreutils source tree, ran configure on my 7.1.1 box.
This time it worked fine.

Good work.

-- 
Tim Rice                Multitalents            (707) 887-1469
[EMAIL PROTECTED]






braced variable expansion in here documents (was: non portable sed scripts)

2006-05-24 Thread Ralf Wildenhues
Hi Paul,

* Paul Eggert wrote on Wed, May 24, 2006 at 01:32:18AM CEST:
> Ralf Wildenhues <[EMAIL PROTECTED]> writes:
> 
> > I hope we get away with this.
> 
> I don't think we will, since the bug occurs every 1024 bytes, and many
> define.sed scripts are longer than that.
> 
> I installed this patch, which works around this particular problem by
> not using shell expansion at all in the here-documents used to create
> defines.sed.

We've further analyzed this off-list now, thanks to Stepan and Tim for
insistence and help!

The original post about this issue I found here:
http://lists.gnu.org/archive/html/bug-autoconf/2002-03/msg00056.html
The corresponding patch here:
http://lists.gnu.org/archive/html/autoconf-patches/2002-03/msg00052.html

Let's look at this closely.  The script below (first attachment) tries
different types of substitutions in here documents.  The output of the
script for Solaris 2.6 dtksh (second attachment), and, with different
bounds, for UnixWare 7.1.1 /usr/bin/posix/sh (third attachment) can be
seen below.

What do we see?
- The failure is always connected with the position of the closing brace
  `}' of the substitution: using parameter substitutions without braces
  avoids it.

- Some failures are silent, some come with shell errors: using one-byte
  variable names prevents silent failures.

- Command substitutions `cmd` work fine.

- The UnixWare shell exposes this bug only at a later stage (we tried
  ranges of 1000 to 3100, and 4000 to 5000 for $i in the script).

Further, the same bug also happens for dtksh on a UnixWare 7.1.1 box
(with the lower limit already).
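
Going by these observations, a here-document can be written so that it
contains no closing brace at all.  The following is only a sketch of that
idea under the assumptions above; emit_unbraced and foo_or_default are
made-up names, and it has not been run on the affected shells.

```shell
long_var=value
unset foo

# Resolve the default outside the here-document and keep the names
# unbraced inside it.  foo_or_default is a hypothetical helper variable.
foo_or_default=${foo-XXX}

emit_unbraced ()
{
  cat <<EOF
long_var is $long_var
foo is $foo_or_default
sub is `echo ok`
EOF
}
```

Note that dropping the braces is only safe when the character following
the variable name cannot be part of a name; otherwise the text has to be
rearranged or the value spliced in by other means.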

So I propose the following doc patch (last attachment).  And I propose
that your patch be reverted and replaced by something mildly less ugly
(Stepan, did you want to propose a patch to this effect?).

Cheers,
Ralf


x.sh
Description: Bourne shell script
ts[1]: {}: bad substitution
Breakage at i=1014 with ${long_var}
Silent breakage at i=1015 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=1015 with ${foo-XXX}
Silent breakage at i=1016 with ${long_var}
Silent breakage at i=1016 with ${foo-XXX}
Silent breakage at i=1017 with ${long_var}
Silent breakage at i=1017 with ${foo-XXX}
Silent breakage at i=1018 with ${long_var}
Silent breakage at i=1018 with ${foo-XXX}
Silent breakage at i=1019 with ${long_var}
Silent breakage at i=1019 with ${foo-XXX}
Silent breakage at i=1020 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=1021 with ${x}
Silent breakage at i=1021 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=2038 with ${long_var}
Silent breakage at i=2039 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=2039 with ${foo-XXX}
Silent breakage at i=2040 with ${long_var}
Silent breakage at i=2040 with ${foo-XXX}
Silent breakage at i=2041 with ${long_var}
Silent breakage at i=2041 with ${foo-XXX}
Silent breakage at i=2042 with ${long_var}
Silent breakage at i=2042 with ${foo-XXX}
Silent breakage at i=2043 with ${long_var}
Silent breakage at i=2043 with ${foo-XXX}
Silent breakage at i=2044 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=2045 with ${x}
Silent breakage at i=2045 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=3062 with ${long_var}
Silent breakage at i=3063 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=3063 with ${foo-XXX}
Silent breakage at i=3064 with ${long_var}
Silent breakage at i=3064 with ${foo-XXX}
Silent breakage at i=3065 with ${long_var}
Silent breakage at i=3065 with ${foo-XXX}
Silent breakage at i=3066 with ${long_var}
Silent breakage at i=3066 with ${foo-XXX}
Silent breakage at i=3067 with ${long_var}
Silent breakage at i=3067 with ${foo-XXX}
Silent breakage at i=3068 with ${long_var}
ts[1]: {}: bad substitution
Breakage at i=3069 with ${x}
Silent breakage at i=3069 with ${long_var}
...
ts: line 1: {}: bad substitution
Breakage at i=4086 with ${long_var}
Silent breakage at i=4087 with ${long_var}
ts: line 1: {}: bad substitution
Breakage at i=4087 with ${foo-XXX}
Silent breakage at i=4088 with ${long_var}
Silent breakage at i=4088 with ${foo-XXX}
Silent breakage at i=4089 with ${long_var}
Silent breakage at i=4089 with ${foo-XXX}
Silent breakage at i=4090 with ${long_var}
Silent breakage at i=4090 with ${foo-XXX}
Silent breakage at i=4091 with ${long_var}
Silent breakage at i=4091 with ${foo-XXX}
Silent breakage at i=4092 with ${long_var}
ts: line 1: {}: bad substitution
Breakage at i=4093 with ${x}
Silent breakage at i=4093 with ${long_var}
...
* doc/autoconf.texi (Here-Documents): We now know more about
the variable expansion in here documents bug.
Thanks to Tim Rice and Stepan Kasal.

Index: doc/autoconf.texi
===
RCS file: /cvsroot/autoconf/autoconf/doc/autoconf.texi,v
retrieving revision 1.1021
diff -u -r1.1021 autoconf.texi
--- doc/autoconf.texi   22 May 2006 17:27:50 -  1.1021
+++ doc/autoconf.texi   25 May 2006 05:58:44 -
@