Re: [PATCH] Command line parsing of wc with genparse

2007-07-22 Thread Michael Geng
On Mon, Jul 16, 2007 at 09:55:47PM +0200, Michael Geng wrote:
> On Mon, Jul 16, 2007 at 10:02:13AM +0200, Jim Meyering wrote:
> > [EMAIL PROTECTED] (Michael Geng) wrote:
> > > I released version 0.6.6 of genparse which supports
> > > internationalization now. A link to the download page
> > > and to the updated documentation can be accessed from
> > > the genparse project page http://genparse.sourceforge.net/.
> > >
> > > I also prepared a patch which demonstrates how genparse
> > > could be used for parsing the command line arguments of
> > > the wc command. The patch is relative to the coreutils version
> > > 6.9. In order to compile the coreutils after you applied
> > > the patch you will need genparse version 0.6.6 in your path.
> > >
> > > Does this approach appear reasonable to you? What else should
> > > be improved on genparse?
> > 
> > I took a quick look only at the patch (not at the code genparse generates)
> > and spotted some minor problems:
> > 
> > > +  if (cmdline.version)
> > > +{
> > > +  version_etc (stdout, program_name, GNU_PACKAGE, VERSION, 
> > AUTHORS,
> > 
> > Here, it should be PROGRAM_NAME, not program_name.

I changed it to PROGRAM_NAME. Thanks.

> > > diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp
> > > --- coreutils-6.9.orig/src/wc.gp  1970-01-01 01:00:00.0 
> > +0100
> > > +++ coreutils-6.9/src/wc.gp   2007-07-10 18:17:33.0 +0200
> > > @@ -0,0 +1,24 @@
> > ...
> > > +Report bugs to <[EMAIL PROTECTED]>.
> > > +#usage_end
> > 
> > s/fileutils/coreutils/
> > or, perhaps better: PACKAGE_BUGREPORT

Again I released a new version of genparse (0.6.7) which now allows to include
string macros into the usage() function. So genparse is now able to use the 
PACKAGE_BUGREPORT macro.

> > Did you ensure that wc still passes "make check"?
> 
> I did this now - and I found a bug: wc -l doesn't work because I didn't assign
> the print_lines variable. That's not a bug in genparse but in my hand written
> code in wc.c. But after fixing this make check still fails because the usage()
> function doesn't print a proper PACKAGE_BUGREPORT as you also found.

I 'm attaching an updated patch which passes make check. You need genpare 
version
0.6.7 in your path in order to compile coreutils after you applied the patch.

> > Make sure valgrind wc ... shows no leaks, too.
> > In particular, test that with the --files0-from=F option.

valgrind wc --files0-from=F tells me something about 3 not-freed blocks.
But it gives exactly the same message if I'm executing valgrind wc 
--files0-from=F
from a coreutils build from the original sources of version 6.9. Is there a 
known memory leak with the --files0-from=F option of wc?

I tried vagrind wc ... with some other options but it didn't report any errors
except for the --files0-from option.

> > Is there any significant difference in performance (I doubt it) or binary 
> > size?

I inspected the wc which was generated using genparse with the original one and
found that the .text segment increased by 16 bytes. BTW, how can I build the
coreutils without debug information? configure CFLAGS="-O2 -g0" doesn't
remove debug information completely. Would you build coreutils differently for 
comparing binary file sizes? How exactly would you compare binary file sizes? 
I think comparing the .text segment size is not enough.

> > I suggest you also adapt a program that takes integer arguments.
> > Can genparse handle arguments of all different types, from int
> > through uintmax_t?

Presently genparse is able to handle int, float, char, string or flag types.
No unsigned int for example. That's another extension which I will have to
add.

> > 
> > tail looks like a good candidate.  It also had an optional argument:
> > --follow[={name|descriptor}]
> > 
> > There are quite a few programs in the coreutils that have unusual
> > argument-parsing constraints, so it's hard to point to a small subset
> > and say "if you adapt just these few", then that should be proof enough.
> > I'm afraid that if you're interested in having coreutils convert, you'll
> > have to do most of the work, and then demonstrate that there's no
> > resulting regression (and presumably cleaner code).
> > 
> > For now, let's start with tail, and maybe "ls" if you're ambitious :)

That's probably a good path to move on, but now that I started with the wc 
command I wanted to get this one working first. 

> > Another detail:
> > IMHO, the contents of your wc.gp file should reside in wc.c.
> > Might as well leave it in a delimited comment in place of the usage
> > function you're eliminating, then have a makefile rule use sed or awk
> > to extract it into wc.gp.

Could be done this way.

Michael
diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am
--- coreutils-6.9.orig/src/Makefile.am  2007-03-20 08:24:27.0 +0100
+++ coreutils-6.9/src/Makefile.am   2007-07-10 18:17:23.00

Re: [PATCH] Command line parsing of wc with genparse

2007-07-16 Thread Michael Geng
On Mon, Jul 16, 2007 at 10:02:13AM +0200, Jim Meyering wrote:
> [EMAIL PROTECTED] (Michael Geng) wrote:
> > I released version 0.6.6 of genparse which supports
> > internationalization now. A link to the download page
> > and to the updated documentation can be accessed from
> > the genparse project page http://genparse.sourceforge.net/.
> >
> > I also prepared a patch which demonstrates how genparse
> > could be used for parsing the command line arguments of
> > the wc command. The patch is relative to the coreutils version
> > 6.9. In order to compile the coreutils after you applied
> > the patch you will need genparse version 0.6.6 in your path.
> >
> > Does this approach appear reasonable to you? What else should
> > be improved on genparse?
> 
> I took a quick look only at the patch (not at the code genparse generates)
> and spotted some minor problems:
> 
> > +  if (cmdline.version)
> > +{
> > +  version_etc (stdout, program_name, GNU_PACKAGE, VERSION, AUTHORS,
> 
> Here, it should be PROGRAM_NAME, not program_name.
> ...
> > diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp
> > --- coreutils-6.9.orig/src/wc.gp1970-01-01 01:00:00.0 
> +0100
> > +++ coreutils-6.9/src/wc.gp 2007-07-10 18:17:33.0 +0200
> > @@ -0,0 +1,24 @@
> ...
> > +Report bugs to <[EMAIL PROTECTED]>.
> > +#usage_end
> 
> s/fileutils/coreutils/
> or, perhaps better: PACKAGE_BUGREPORT
> 
> Did you ensure that wc still passes "make check"?

I did this now - and I found a bug: wc -l doesn't work because I didn't assign
the print_lines variable. That's not a bug in genparse but in my hand written
code in wc.c. But after fixing this make check still fails because the usage()
function doesn't print a proper PACKAGE_BUGREPORT as you also found.

> Make sure valgrind wc ... shows no leaks, too.
> In particular, test that with the --files0-from=F option.
> Is there any significant difference in performance (I doubt it) or binary 
> size?
> 
> I suggest you also adapt a program that takes integer arguments.
> Can genparse handle arguments of all different types, from int
> through uintmax_t?
> 
> tail looks like a good candidate.  It also had an optional argument:
> --follow[={name|descriptor}]
> 
> There are quite a few programs in the coreutils that have unusual
> argument-parsing constraints, so it's hard to point to a small subset
> and say "if you adapt just these few", then that should be proof enough.
> I'm afraid that if you're interested in having coreutils convert, you'll
> have to do most of the work, and then demonstrate that there's no
> resulting regression (and presumably cleaner code).
> 
> For now, let's start with tail, and maybe "ls" if you're ambitious :)
> 
> Another detail:
> IMHO, the contents of your wc.gp file should reside in wc.c.
> Might as well leave it in a delimited comment in place of the usage
> function you're eliminating, then have a makefile rule use sed or awk
> to extract it into wc.gp.

I will need some time to think about all the issues you mentioned and
find solutions. I appreciate your constructive commments! 

Michael


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: [PATCH] Command line parsing of wc with genparse

2007-07-16 Thread Jim Meyering
[EMAIL PROTECTED] (Michael Geng) wrote:
> I released version 0.6.6 of genparse which supports
> internationalization now. A link to the download page
> and to the updated documentation can be accessed from
> the genparse project page http://genparse.sourceforge.net/.
>
> I also prepared a patch which demonstrates how genparse
> could be used for parsing the command line arguments of
> the wc command. The patch is relative to the coreutils version
> 6.9. In order to compile the coreutils after you applied
> the patch you will need genparse version 0.6.6 in your path.
>
> Does this approach appear reasonable to you? What else should
> be improved on genparse?

I took a quick look only at the patch (not at the code genparse generates)
and spotted some minor problems:

> +  if (cmdline.version)
> +{
> +  version_etc (stdout, program_name, GNU_PACKAGE, VERSION, AUTHORS,

Here, it should be PROGRAM_NAME, not program_name.
...
> diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp
> --- coreutils-6.9.orig/src/wc.gp  1970-01-01 01:00:00.0 +0100
> +++ coreutils-6.9/src/wc.gp   2007-07-10 18:17:33.0 +0200
> @@ -0,0 +1,24 @@
...
> +Report bugs to <[EMAIL PROTECTED]>.
> +#usage_end

s/fileutils/coreutils/
or, perhaps better: PACKAGE_BUGREPORT

Did you ensure that wc still passes "make check"?
Make sure valgrind wc ... shows no leaks, too.
In particular, test that with the --files0-from=F option.
Is there any significant difference in performance (I doubt it) or binary size?

I suggest you also adapt a program that takes integer arguments.
Can genparse handle arguments of all different types, from int
through uintmax_t?

tail looks like a good candidate.  It also had an optional argument:
--follow[={name|descriptor}]

There are quite a few programs in the coreutils that have unusual
argument-parsing constraints, so it's hard to point to a small subset
and say "if you adapt just these few", then that should be proof enough.
I'm afraid that if you're interested in having coreutils convert, you'll
have to do most of the work, and then demonstrate that there's no
resulting regression (and presumably cleaner code).

For now, let's start with tail, and maybe "ls" if you're ambitious :)

Another detail:
IMHO, the contents of your wc.gp file should reside in wc.c.
Might as well leave it in a delimited comment in place of the usage
function you're eliminating, then have a makefile rule use sed or awk
to extract it into wc.gp.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


[PATCH] Command line parsing of wc with genparse

2007-07-15 Thread Michael Geng
Hi,

I released version 0.6.6 of genparse which supports 
internationalization now. A link to the download page
and to the updated documentation can be accessed from
the genparse project page http://genparse.sourceforge.net/.

I also prepared a patch which demonstrates how genparse 
could be used for parsing the command line arguments of 
the wc command. The patch is relative to the coreutils version 
6.9. In order to compile the coreutils after you applied 
the patch you will need genparse version 0.6.6 in your path.

Does this approach appear reasonable to you? What else should 
be improved on genparse? 

Michael
diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am
--- coreutils-6.9.orig/src/Makefile.am  2007-03-20 08:24:27.0 +0100
+++ coreutils-6.9/src/Makefile.am   2007-07-10 18:17:23.0 +0200
@@ -363,3 +363,7 @@
| grep -Ev -f $$t &&\
  { echo 'the above variables should have static scope' 1>&2;   \
exit 1; } || :
+
+wc_SOURCES = wc_clp.c wc.c
+wc_clp.c: wc.gp
+   genparse --longmembers --internationalize -o wc_clp wc.gp
diff -u -N -r coreutils-6.9.orig/src/wc.c coreutils-6.9/src/wc.c
--- coreutils-6.9.orig/src/wc.c 2007-03-18 22:36:43.0 +0100
+++ coreutils-6.9/src/wc.c  2007-07-10 18:49:27.0 +0200
@@ -24,13 +24,13 @@
 #include 
 #include 
 
-#include "system.h"
 #include "error.h"
 #include "inttostr.h"
 #include "quote.h"
 #include "readtokens0.h"
 #include "safe-read.h"
 #include "wcwidth.h"
+#include "wc_clp.h"
 
 #if !defined iswspace && !HAVE_ISWSPACE
 # define iswspace(wc) \
@@ -77,60 +77,6 @@
   struct stat st;
 };
 
-/* For long options that have no equivalent short option, use a
-   non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
-enum
-{
-  FILES0_FROM_OPTION = CHAR_MAX + 1
-};
-
-static struct option const longopts[] =
-{
-  {"bytes", no_argument, NULL, 'c'},
-  {"chars", no_argument, NULL, 'm'},
-  {"lines", no_argument, NULL, 'l'},
-  {"words", no_argument, NULL, 'w'},
-  {"files0-from", required_argument, NULL, FILES0_FROM_OPTION},
-  {"max-line-length", no_argument, NULL, 'L'},
-  {GETOPT_HELP_OPTION_DECL},
-  {GETOPT_VERSION_OPTION_DECL},
-  {NULL, 0, NULL, 0}
-};
-
-void
-usage (int status)
-{
-  if (status != EXIT_SUCCESS)
-fprintf (stderr, _("Try `%s --help' for more information.\n"),
-program_name);
-  else
-{
-  printf (_("\
-Usage: %s [OPTION]... [FILE]...\n\
-  or:  %s [OPTION]... --files0-from=F\n\
-"),
- program_name, program_name);
-  fputs (_("\
-Print newline, word, and byte counts for each FILE, and a total line if\n\
-more than one FILE is specified.  With no FILE, or when FILE is -,\n\
-read standard input.\n\
-  -c, --bytesprint the byte counts\n\
-  -m, --charsprint the character counts\n\
-  -l, --linesprint the newline counts\n\
-"), stdout);
-  fputs (_("\
-  --files0-from=Fread input from the files specified by\n\
-   NUL-terminated names in file F\n\
-  -L, --max-line-length  print the length of the longest line\n\
-  -w, --wordsprint the word counts\n\
-"), stdout);
-  fputs (HELP_OPTION_DESCRIPTION, stdout);
-  fputs (VERSION_OPTION_DESCRIPTION, stdout);
-  printf (_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
-}
-  exit (status);
-}
-
 /* FILE is the name of the file (or NULL for standard input)
associated with the specified counters.  */
 static void
@@ -580,6 +526,7 @@
   char *files_from = NULL;
   struct fstatus *fstatus;
   struct Tokens tok;
+  struct arg_t cmdline;
 
   initialize_main (&argc, &argv);
   program_name = argv[0];
@@ -589,44 +536,27 @@
 
   atexit (close_stdout);
 
-  print_lines = print_words = print_chars = print_bytes = false;
-  print_linelength = false;
   total_lines = total_words = total_chars = total_bytes = max_line_length = 0;
 
-  while ((optc = getopt_long (argc, argv, "clLmw", longopts, NULL)) != -1)
-switch (optc)
-  {
-  case 'c':
-   print_bytes = true;
-   break;
-
-  case 'm':
-   print_chars = true;
-   break;
-
-  case 'l':
-   print_lines = true;
-   break;
-
-  case 'w':
-   print_words = true;
-   break;
-
-  case 'L':
-   print_linelength = true;
-   break;
-
-  case FILES0_FROM_OPTION:
-   files_from = optarg;
-   break;
-
-  case_GETOPT_HELP_CHAR;
-
-  case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
-
-  default:
-   usage (EXIT_FAILURE);
-  }
+  Cmdline(&cmdline, argc, argv);
+  print_bytes  = cmdline.bytes;
+  print_chars  = cmdline.lines;
+  print_words  = cmdline.words;
+  print_linelength = cmdline.max_line_length;
+  files_from   = cmdline.files0_from;
+  print_bytes  = cmdline.bytes;
+  print_bytes  = cmdline.bytes;
+  print_bytes  = cmdline.bytes;
+
+  if (cmdline.help)
+