On Monday 08 March 2010 20:56:03 Rob Landley wrote:
> I've already written a toybox version, since it was easier for me to write
> that from scratch than try to wrestle with the busybox one. The new toybox
> command is 98 lines long (1737 bytes of source) and the existing busybox
> one is 206 lines (4974 bytes of source) in current git.

By the way, I'm not sure how much of the "easier" in writing a new wc for toybox was me, and how much was the toybox infrastructure.

I'm in the weird position of not really wanting to continue toybox as a separate project that's vastly out-resourced by the busybox development community, but also finding working on busybox incredibly clumsy and tedious compared to working on toybox. What I'd really like to do is port the toybox infrastructure over to busybox, if you guys are interested.

I'll describe the process of creating the new wc command to give you guys a feel for it. (A previous attempt of mine to document all this is at http://landley.net/code/toybox/code.html by the way.)

Each toybox command is a single C file. Adding a new command to toybox involves adding a new file to the toys directory. That's it. I don't touch any makefiles or headers or anything; the rest is entirely generated by the build script, which scans the toys/*.c files and constructs the other files at build time. The generic infrastructure has no specific knowledge of the actual commands.

To start a new command, I cd into the "toys" subdirectory of my toybox source code and "cp hello.c wc.c". The "hello" command is an example which has all the basic plumbing a command needs (actually way more than a simple hello world needs) so it can act as a convenient skeleton for new commands.

Note that I call them "commands" rather than "applets" because this isn't Java. It's a command line, not an applet line.

The toybox hello.c looks like:

    /* vi: set sw=4 ts=4:
     *
     * hello.c - A hello world program.
     *
     * Copyright 2006 Rob Landley <r...@landley.net>
     *
     * Not in SUSv4.
     * http://www.opengroup.org/onlinepubs/9699919799/utilities/

    USE_HELLO(NEWTOY(hello, "e...@d*c#b:a", TOYFLAG_USR|TOYFLAG_BIN))

    config HELLO
        bool "hello"
        default n
        help
          A hello world program. You don't need this.

          Mostly used as an example/skeleton file for adding new commands,
          occasionally nice to test kernel booting via "init=/bin/hello".
    */

    #include "toys.h"

    // Hello doesn't use these, they're here for example/skeleton purposes.

    DEFINE_GLOBALS(
        char *b_string;
        long c_number;
        struct arg_list *d_list;
        long e_count;

        int more_globals;
    )

    #define TT this.hello

    void hello_main(void)
    {
        printf("Hello world\n");
    }

But most of that's example boilerplate for skeleton purposes. All it _really_ needs is:

    /* hello.c - A hello world program.

    USE_HELLO(NEWTOY(hello, NULL, TOYFLAG_USR|TOYFLAG_BIN))

    config HELLO
        bool "hello"
        default n
        help
          A hello world program. You don't need this.
    */

    #include "toys.h"

    void hello_main(void)
    {
        printf("Hello world\n");
    }

Each toybox command starts with a specially formatted comment that contains the command line options, usage info, and kconfig blob for menuconfig. The command's help text (spit out by the "help" command, as well as by the command itself if run with unintelligible options) is also extracted from the kconfig help text, so I don't have to describe the same thing twice.

The first few comment lines (the ones starting with an asterisk) are normal comment lines that don't get parsed by anything. The convention is to put a description, copyright notice, and link to the relevant standard (if any) there, but it's really just a comment.

The USE_XXX(NEWTOY(XXX)) line defines the command name, command line options, and install location of each command. At compile time a sed invocation collects this line from every toys/*.c file into "generated/newtoys.h", which is then #included to set up the command array toy_exec() searches (see main.c at the top level).

The USE_XXX() macro chops its contents out if the relevant config option isn't enabled (just like the one I added to busybox back in 2006). There's a SKIP_XXX() too but it's not used much. So this line is always copied into generated/newtoys.h, but only _used_ if the relevant config entry is enabled.

The NEWTOY() macro takes three arguments: command name, option string, and install location. If you'd like one command to have multiple names there's also an OLDTOY() macro, which takes four arguments: the new name, the original name, command options the new name understands (which can differ from the other name, but they are washed through the same main() function), and install location.

The install location is used if you give the "toybox" multiplexer any option beginning with a dash. Currently for defconfig, it outputs:

    ./toybox -?
    bin/basename usr/bin/bzcat bin/cat usr/bin/catv usr/sbin/chroot
    usr/sbin/chvt bin/cksum usr/bin/count bin/cp usr/sbin/df bin/dirname
    bin/dmesg bin/echo bin/false bin/help usr/bin/mdev bin/mkfifo sbin/mkswap
    bin/nc bin/netcat usr/bin/nice sbin/oneit usr/bin/patch bin/pwd bin/rmdir
    usr/bin/seq usr/bin/setsid bin/sh usr/bin/sha1sum bin/sleep usr/bin/sort
    bin/sync bin/tee bin/touch bin/toysh bin/true bin/tty bin/uname
    usr/sbin/useradd usr/bin/wget usr/bin/which usr/bin/yes

A trivial script can go through that output and install the appropriate symlinks to the "toybox" binary, something like:

    for i in $(./toybox -); do ln -s /bin/toybox $i; done

You can run ./toybox without any arguments to get the list of commands without the paths prepended, to install all the links in the same directory. (Yes you can do "toybox cat filename" too, none of the command names start with a dash.)

That leaves the middle argument to NEWTOY(), which is the command line option string. This is the biggest difference between toybox and busybox: the option parsing logic is completely different, and so automated you can largely ignore it. However, I'm going to explain it here in more detail than you probably really need to know. :)

I wrote my own option parser (lib/args.c, which does _not_ call getopt() so was net smaller than busybox's last I checked).
It's automatically called before the command's main() function is ever run, using the option string supplied by NEWTOY() to parse the command line options and fill out global variables with the appropriate values.

You can disable this automatic option parsing (and call it manually if you like) by passing NULL in as the option string in NEWTOY(). That is also how you specify that the command takes no options, so the option parsing can get compiled out if nobody's using it. See main.c for details.

The command_main() functions return void and take no arguments; instead you use global variables. The main one is the global "toys", which looks like this:

    extern struct toy_context {
        struct toy_list *which;  // Which entry in toy_list is this one?
        int exitval;             // Value error_exit feeds to exit()
        char **argv;             // Original command line arguments
        unsigned optflags;       // Command line option flags from get_optflags()
        char **optargs;          // Arguments left over from get_optflags()
        int optc;                // Count of optargs
        int exithelp;            // Should error_exit print a usage message first?
        int old_umask;           // Old umask preserved by TOYFLAG_UMASK
    } toys;

toys.optflags is filled out by the option parsing logic with the command line flags seen this run. exitval defaults to 0 but can be changed by other stuff (such as any of the functions that exit with an error, or by setting it manually before returning from main()).

optargs[] contains the arguments left over after option parsing. (So for "ls -l file1 file2 file3", optargs[0] would be file1, optargs[2] would be file3, and optargs[3] would be NULL.) optc is the equivalent of argc for optargs.

argv[] is the unprocessed argument list, kept around since we can't free it anyway and there are a couple of times you might want to know. (Such as if you passed NULL as the option string to NEWTOY().)

The other interesting global is "this", which is a union of structures containing all your global variables for each command.
That's initialized by the DEFINE_GLOBALS() macro a bit further down in the file, which lists the global variables for this file. The contents become a structure in a union of all such structures for each command, which can be accessed as "this.commandname" (in this case, "this.wc").

The #define TT this.wc is a shortcut so we can say TT.name instead of this.wc.name for each of our globals. (I should make the #define TT automatic as part of DEFINE_GLOBALS() or something, but haven't figured out how yet. Alas, you can't have a macro resolve to a preprocessor directive.)

If there are no global variables used by this command, you can omit the DEFINE_GLOBALS() block entirely. But if the command line parsing saves results to any variables, you need to list them at the start of the DEFINE_GLOBALS() block:

  1) In order (from right to left).
  2) All of them are long/pointer size. (4 bytes on 32-bit, 8 on 64-bit.)

The options are numbered from right to left because that way anybody familiar with binary can work out the flag values in their head: "The option string has abcdefg, command line is -adg, that's 1001001... that's 64+8+1". Whereas if you number them the other way, you have to reverse them in your head to work out the values. (This means you add new options to the beginning of the option string to avoid renumbering the others.)

So if I had an option string "ab:d#" the options are numbered d=1, b=2, a=3 (ignore the non-letters for that), giving flag values 1, 2, and 4, and the associated globals block could look like:

    DEFINE_GLOBALS(
        long value_for_d;
        char *value_for_b;

        int any_other_globals;
    )

The appended : means "takes a string argument" (just like in getopt), the appended # means "takes a number argument". Said arguments are saved into the global block, right to left becoming top to bottom. By convention, I put a blank line between globals filled out by the option parsing logic and globals that are just globals used by the code.

Note that all of the globals are initialized to zero to start with, and then the option parsing logic can set the first few to other values, but any that aren't initialized by the option parsing logic (including ones that _could_ be, but that option wasn't used this time) are still reliably zeroed.

That pretty much gets us through all the boilerplate, and in fact is probably way more info than you'd really need to know to implement the wc command.

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds