Re: [PATCH] Command line parsing of wc with genparse
On Mon, Jul 16, 2007 at 09:55:47PM +0200, Michael Geng wrote: > On Mon, Jul 16, 2007 at 10:02:13AM +0200, Jim Meyering wrote: > > [EMAIL PROTECTED] (Michael Geng) wrote: > > > I released version 0.6.6 of genparse which supports > > > internationalization now. A link to the download page > > > and to the updated documentation can be accessed from > > > the genparse project page http://genparse.sourceforge.net/. > > > > > > I also prepared a patch which demonstrates how genparse > > > could be used for parsing the command line arguments of > > > the wc command. The patch is relative to the coreutils version > > > 6.9. In order to compile the coreutils after you applied > > > the patch you will need genparse version 0.6.6 in your path. > > > > > > Does this approach appear reasonable to you? What else should > > > be improved on genparse? > > > > I took a quick look only at the patch (not at the code genparse generates) > > and spotted some minor problems: > > > > > + if (cmdline.version) > > > +{ > > > + version_etc (stdout, program_name, GNU_PACKAGE, VERSION, > > AUTHORS, > > > > Here, it should be PROGRAM_NAME, not program_name. I changed it to PROGRAM_NAME. Thanks. > > > diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp > > > --- coreutils-6.9.orig/src/wc.gp 1970-01-01 01:00:00.0 > > +0100 > > > +++ coreutils-6.9/src/wc.gp 2007-07-10 18:17:33.0 +0200 > > > @@ -0,0 +1,24 @@ > > ... > > > +Report bugs to <[EMAIL PROTECTED]>. > > > +#usage_end > > > > s/fileutils/coreutils/ > > or, perhaps better: PACKAGE_BUGREPORT Again I released a new version of genparse (0.6.7) which now allows to include string macros into the usage() function. So genparse is now able to use the PACKAGE_BUGREPORT macro. > > Did you ensure that wc still passes "make check"? > > I did this now - and I found a bug: wc -l doesn't work because I didn't assign > the print_lines variable. That's not a bug in genparse but in my hand written > code in wc.c. But after fixing this make check still fails because the usage() > function doesn't print a proper PACKAGE_BUGREPORT as you also found. I 'm attaching an updated patch which passes make check. You need genpare version 0.6.7 in your path in order to compile coreutils after you applied the patch. > > Make sure valgrind wc ... shows no leaks, too. > > In particular, test that with the --files0-from=F option. valgrind wc --files0-from=F tells me something about 3 not-freed blocks. But it gives exactly the same message if I'm executing valgrind wc --files0-from=F from a coreutils build from the original sources of version 6.9. Is there a known memory leak with the --files0-from=F option of wc? I tried vagrind wc ... with some other options but it didn't report any errors except for the --files0-from option. > > Is there any significant difference in performance (I doubt it) or binary > > size? I inspected the wc which was generated using genparse with the original one and found that the .text segment increased by 16 bytes. BTW, how can I build the coreutils without debug information? configure CFLAGS="-O2 -g0" doesn't remove debug information completely. Would you build coreutils differently for comparing binary file sizes? How exactly would you compare binary file sizes? I think comparing the .text segment size is not enough. > > I suggest you also adapt a program that takes integer arguments. > > Can genparse handle arguments of all different types, from int > > through uintmax_t? Presently genparse is able to handle int, float, char, string or flag types. No unsigned int for example. That's another extension which I will have to add. > > > > tail looks like a good candidate. It also had an optional argument: > > --follow[={name|descriptor}] > > > > There are quite a few programs in the coreutils that have unusual > > argument-parsing constraints, so it's hard to point to a small subset > > and say "if you adapt just these few", then that should be proof enough. > > I'm afraid that if you're interested in having coreutils convert, you'll > > have to do most of the work, and then demonstrate that there's no > > resulting regression (and presumably cleaner code). > > > > For now, let's start with tail, and maybe "ls" if you're ambitious :) That's probably a good path to move on, but now that I started with the wc command I wanted to get this one working first. > > Another detail: > > IMHO, the contents of your wc.gp file should reside in wc.c. > > Might as well leave it in a delimited comment in place of the usage > > function you're eliminating, then have a makefile rule use sed or awk > > to extract it into wc.gp. Could be done this way. Michael diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am --- coreutils-6.9.orig/src/Makefile.am 2007-03-20 08:24:27.0 +0100 +++ coreutils-6.9/src/Makefile.am 2007-07-10 18:17:23.00
Re: [PATCH] Command line parsing of wc with genparse
On Mon, Jul 16, 2007 at 10:02:13AM +0200, Jim Meyering wrote: > [EMAIL PROTECTED] (Michael Geng) wrote: > > I released version 0.6.6 of genparse which supports > > internationalization now. A link to the download page > > and to the updated documentation can be accessed from > > the genparse project page http://genparse.sourceforge.net/. > > > > I also prepared a patch which demonstrates how genparse > > could be used for parsing the command line arguments of > > the wc command. The patch is relative to the coreutils version > > 6.9. In order to compile the coreutils after you applied > > the patch you will need genparse version 0.6.6 in your path. > > > > Does this approach appear reasonable to you? What else should > > be improved on genparse? > > I took a quick look only at the patch (not at the code genparse generates) > and spotted some minor problems: > > > + if (cmdline.version) > > +{ > > + version_etc (stdout, program_name, GNU_PACKAGE, VERSION, AUTHORS, > > Here, it should be PROGRAM_NAME, not program_name. > ... > > diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp > > --- coreutils-6.9.orig/src/wc.gp1970-01-01 01:00:00.0 > +0100 > > +++ coreutils-6.9/src/wc.gp 2007-07-10 18:17:33.0 +0200 > > @@ -0,0 +1,24 @@ > ... > > +Report bugs to <[EMAIL PROTECTED]>. > > +#usage_end > > s/fileutils/coreutils/ > or, perhaps better: PACKAGE_BUGREPORT > > Did you ensure that wc still passes "make check"? I did this now - and I found a bug: wc -l doesn't work because I didn't assign the print_lines variable. That's not a bug in genparse but in my hand written code in wc.c. But after fixing this make check still fails because the usage() function doesn't print a proper PACKAGE_BUGREPORT as you also found. > Make sure valgrind wc ... shows no leaks, too. > In particular, test that with the --files0-from=F option. > Is there any significant difference in performance (I doubt it) or binary > size? > > I suggest you also adapt a program that takes integer arguments. > Can genparse handle arguments of all different types, from int > through uintmax_t? > > tail looks like a good candidate. It also had an optional argument: > --follow[={name|descriptor}] > > There are quite a few programs in the coreutils that have unusual > argument-parsing constraints, so it's hard to point to a small subset > and say "if you adapt just these few", then that should be proof enough. > I'm afraid that if you're interested in having coreutils convert, you'll > have to do most of the work, and then demonstrate that there's no > resulting regression (and presumably cleaner code). > > For now, let's start with tail, and maybe "ls" if you're ambitious :) > > Another detail: > IMHO, the contents of your wc.gp file should reside in wc.c. > Might as well leave it in a delimited comment in place of the usage > function you're eliminating, then have a makefile rule use sed or awk > to extract it into wc.gp. I will need some time to think about all the issues you mentioned and find solutions. I appreciate your constructive commments! Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of wc with genparse
[EMAIL PROTECTED] (Michael Geng) wrote: > I released version 0.6.6 of genparse which supports > internationalization now. A link to the download page > and to the updated documentation can be accessed from > the genparse project page http://genparse.sourceforge.net/. > > I also prepared a patch which demonstrates how genparse > could be used for parsing the command line arguments of > the wc command. The patch is relative to the coreutils version > 6.9. In order to compile the coreutils after you applied > the patch you will need genparse version 0.6.6 in your path. > > Does this approach appear reasonable to you? What else should > be improved on genparse? I took a quick look only at the patch (not at the code genparse generates) and spotted some minor problems: > + if (cmdline.version) > +{ > + version_etc (stdout, program_name, GNU_PACKAGE, VERSION, AUTHORS, Here, it should be PROGRAM_NAME, not program_name. ... > diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp > --- coreutils-6.9.orig/src/wc.gp 1970-01-01 01:00:00.0 +0100 > +++ coreutils-6.9/src/wc.gp 2007-07-10 18:17:33.0 +0200 > @@ -0,0 +1,24 @@ ... > +Report bugs to <[EMAIL PROTECTED]>. > +#usage_end s/fileutils/coreutils/ or, perhaps better: PACKAGE_BUGREPORT Did you ensure that wc still passes "make check"? Make sure valgrind wc ... shows no leaks, too. In particular, test that with the --files0-from=F option. Is there any significant difference in performance (I doubt it) or binary size? I suggest you also adapt a program that takes integer arguments. Can genparse handle arguments of all different types, from int through uintmax_t? tail looks like a good candidate. It also had an optional argument: --follow[={name|descriptor}] There are quite a few programs in the coreutils that have unusual argument-parsing constraints, so it's hard to point to a small subset and say "if you adapt just these few", then that should be proof enough. I'm afraid that if you're interested in having coreutils convert, you'll have to do most of the work, and then demonstrate that there's no resulting regression (and presumably cleaner code). For now, let's start with tail, and maybe "ls" if you're ambitious :) Another detail: IMHO, the contents of your wc.gp file should reside in wc.c. Might as well leave it in a delimited comment in place of the usage function you're eliminating, then have a makefile rule use sed or awk to extract it into wc.gp. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
[PATCH] Command line parsing of wc with genparse
Hi, I released version 0.6.6 of genparse which supports internationalization now. A link to the download page and to the updated documentation can be accessed from the genparse project page http://genparse.sourceforge.net/. I also prepared a patch which demonstrates how genparse could be used for parsing the command line arguments of the wc command. The patch is relative to the coreutils version 6.9. In order to compile the coreutils after you applied the patch you will need genparse version 0.6.6 in your path. Does this approach appear reasonable to you? What else should be improved on genparse? Michael diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am --- coreutils-6.9.orig/src/Makefile.am 2007-03-20 08:24:27.0 +0100 +++ coreutils-6.9/src/Makefile.am 2007-07-10 18:17:23.0 +0200 @@ -363,3 +363,7 @@ | grep -Ev -f $$t &&\ { echo 'the above variables should have static scope' 1>&2; \ exit 1; } || : + +wc_SOURCES = wc_clp.c wc.c +wc_clp.c: wc.gp + genparse --longmembers --internationalize -o wc_clp wc.gp diff -u -N -r coreutils-6.9.orig/src/wc.c coreutils-6.9/src/wc.c --- coreutils-6.9.orig/src/wc.c 2007-03-18 22:36:43.0 +0100 +++ coreutils-6.9/src/wc.c 2007-07-10 18:49:27.0 +0200 @@ -24,13 +24,13 @@ #include #include -#include "system.h" #include "error.h" #include "inttostr.h" #include "quote.h" #include "readtokens0.h" #include "safe-read.h" #include "wcwidth.h" +#include "wc_clp.h" #if !defined iswspace && !HAVE_ISWSPACE # define iswspace(wc) \ @@ -77,60 +77,6 @@ struct stat st; }; -/* For long options that have no equivalent short option, use a - non-character as a pseudo short option, starting with CHAR_MAX + 1. */ -enum -{ - FILES0_FROM_OPTION = CHAR_MAX + 1 -}; - -static struct option const longopts[] = -{ - {"bytes", no_argument, NULL, 'c'}, - {"chars", no_argument, NULL, 'm'}, - {"lines", no_argument, NULL, 'l'}, - {"words", no_argument, NULL, 'w'}, - {"files0-from", required_argument, NULL, FILES0_FROM_OPTION}, - {"max-line-length", no_argument, NULL, 'L'}, - {GETOPT_HELP_OPTION_DECL}, - {GETOPT_VERSION_OPTION_DECL}, - {NULL, 0, NULL, 0} -}; - -void -usage (int status) -{ - if (status != EXIT_SUCCESS) -fprintf (stderr, _("Try `%s --help' for more information.\n"), -program_name); - else -{ - printf (_("\ -Usage: %s [OPTION]... [FILE]...\n\ - or: %s [OPTION]... --files0-from=F\n\ -"), - program_name, program_name); - fputs (_("\ -Print newline, word, and byte counts for each FILE, and a total line if\n\ -more than one FILE is specified. With no FILE, or when FILE is -,\n\ -read standard input.\n\ - -c, --bytesprint the byte counts\n\ - -m, --charsprint the character counts\n\ - -l, --linesprint the newline counts\n\ -"), stdout); - fputs (_("\ - --files0-from=Fread input from the files specified by\n\ - NUL-terminated names in file F\n\ - -L, --max-line-length print the length of the longest line\n\ - -w, --wordsprint the word counts\n\ -"), stdout); - fputs (HELP_OPTION_DESCRIPTION, stdout); - fputs (VERSION_OPTION_DESCRIPTION, stdout); - printf (_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT); -} - exit (status); -} - /* FILE is the name of the file (or NULL for standard input) associated with the specified counters. */ static void @@ -580,6 +526,7 @@ char *files_from = NULL; struct fstatus *fstatus; struct Tokens tok; + struct arg_t cmdline; initialize_main (&argc, &argv); program_name = argv[0]; @@ -589,44 +536,27 @@ atexit (close_stdout); - print_lines = print_words = print_chars = print_bytes = false; - print_linelength = false; total_lines = total_words = total_chars = total_bytes = max_line_length = 0; - while ((optc = getopt_long (argc, argv, "clLmw", longopts, NULL)) != -1) -switch (optc) - { - case 'c': - print_bytes = true; - break; - - case 'm': - print_chars = true; - break; - - case 'l': - print_lines = true; - break; - - case 'w': - print_words = true; - break; - - case 'L': - print_linelength = true; - break; - - case FILES0_FROM_OPTION: - files_from = optarg; - break; - - case_GETOPT_HELP_CHAR; - - case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS); - - default: - usage (EXIT_FAILURE); - } + Cmdline(&cmdline, argc, argv); + print_bytes = cmdline.bytes; + print_chars = cmdline.lines; + print_words = cmdline.words; + print_linelength = cmdline.max_line_length; + files_from = cmdline.files0_from; + print_bytes = cmdline.bytes; + print_bytes = cmdline.bytes; + print_bytes = cmdline.bytes; + + if (cmdline.help) +