Toybox infrastructure.

Rob Landley Tue, 09 Mar 2010 18:44:18 -0800

On Monday 08 March 2010 20:56:03 Rob Landley wrote:
> I've already written a toybox version, since it was easier for me to write
> that from scratch than try to wrestle with the busybox one.  The new toybox
> command is 98 lines long (1737 bytes of source) and the existing busybox
> one is 206 lines (4974 bytes of source) in current git.


By the way, I'm not sure how much of the "easier" in writing a new wc for 
toybox was me, and how much was the toybox infrastructure.  I'm in the weird 
position of not really wanting to continue toybox as a separate project 
that's vastly out-resourced by the busybox development community, but also 
finding working on busybox incredibly clumsy and tedious compared to working on 
toybox.

What I'd really like to do is port the toybox infrastructure over to busybox, 
if you guys are interested.  I'll describe the process of creating the new wc 
command to give you guys a feel for it.  (A previous attempt of mine to 
document all this is at http://landley.net/code/toybox/code.html by the way.)

Each toybox command is a single C file.  Adding a new command to toybox 
involves adding a new file to the toys directory.  That's it.  I don't touch 
any makefiles or headers or anything, the rest is entirely generated by the 
build script, which scans the toys/*.c files and constructs the other files at 
build time.  The generic infrastructure has no specific knowledge of the actual 
commands.

To start a new command, I cd into the "toys" subdirectory of my toybox source 
code and "cp hello.c wc.c".  The "hello" command is an example which has all 
the basic plumbing a command needs (actually way more than a simple hello 
world needs) so it can act as a convenient skeleton for new commands.  Note 
that I call them "commands" rather than "applets" because this isn't java.  
It's a command line, not an applet line.

The toybox hello.c looks like:

/* vi: set sw=4 ts=4:
 *
 * hello.c - A hello world program.
 *
 * Copyright 2006 Rob Landley <r...@landley.net>
 *
 * Not in SUSv4.
 * http://www.opengroup.org/onlinepubs/9699919799/utilities/

USE_HELLO(NEWTOY(hello, "e...@d*c#b:a", TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
        bool "hello"
        default n
        help
          A hello world program.  You don't need this.

          Mostly used as an example/skeleton file for adding new commands,
          occasionally nice to test kernel booting via "init=/bin/hello".
*/

#include "toys.h"

// Hello doesn't use these, they're here for example/skeleton purposes.

DEFINE_GLOBALS(
        char *b_string;
        long c_number;
        struct arg_list *d_list;
        long e_count;

        int more_globals;
)

#define TT this.hello

void hello_main(void)
{
        printf("Hello world\n");
}

But most of that's example boilerplate for skeleton purposes.  All it _really_ 
needs is:

/* hello.c - A hello world program.

USE_HELLO(NEWTOY(hello, NULL, TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
        bool "hello"
        default n
        help
          A hello world program.  You don't need this.
*/

#include "toys.h"

void hello_main(void)
{
        printf("Hello world\n");
}

Each toybox command starts with a specially formatted comment that contains 
the command line options, usage info, and kconfig blob for menuconfig.  The 
command's help text (spit out by the "help" command, as well as by the command 
itself if run with unintelligible options) is also extracted from the kconfig 
help text, so I don't have to describe the same thing twice.

The first few comment lines (the ones starting with an asterisk) are normal 
comment lines that don't get parsed by anything.  The convention is to put a 
description, copyright notice, and link to the relevant standard (if any) 
there, but it's really just a comment.

The USE_XXX(NEWTOY(XXX)) line defines the command name, command line options, 
and install location of each command.  At compile time a sed invocation 
collects this line from every toys/*.c file into "generated/newtoys.h", 
which is then #included to set up the command array toy_exec() searches (see 
main.c at the top level).

The USE_XXX() macro chops its contents out if the relevant config option isn't 
enabled (just like I added to busybox back in 2006).  There's a SKIP_XXX() too 
but it's not used much.  So this line is always copied into 
generated/newtoys.h, but only _used_ if the relevant config entry is enabled.
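
A minimal sketch of the USE_XXX() idea, not the actual toybox headers.  It 
assumes the config step produces one macro per command that either passes its 
arguments through or swallows them.  The struct toy_entry, the cut-down 
NEWTOY(), the toy_find() helper, and the "lwc" option string are hypothetical 
stand-ins for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: generated from the config, with hello enabled and wc disabled. */
#define USE_HELLO(...) __VA_ARGS__   /* enabled: keep the contents */
#define USE_WC(...)                  /* disabled: chop the contents out */

/* Hypothetical cut-down table entry; a real one would carry more fields. */
struct toy_entry {
    const char *name;
    const char *options;
};

/* Hypothetical simplified NEWTOY(): record the name and option string only. */
#define NEWTOY(name, opts, flags) { #name, opts },

/* Both lines are always present in the source, only enabled ones survive. */
static const struct toy_entry toy_table[] = {
    USE_HELLO(NEWTOY(hello, NULL, 0))
    USE_WC(NEWTOY(wc, "lwc", 0))
    { NULL, NULL }                   /* terminator */
};

/* Linear search of the table by command name, NULL if not compiled in. */
static const struct toy_entry *toy_find(const char *name)
{
    const struct toy_entry *t;

    assert(name != NULL);
    for (t = toy_table; t->name; t++)
        if (!strcmp(t->name, name)) return t;
    return NULL;
}
```

With these definitions toy_find("hello") returns the hello entry and 
toy_find("wc") returns NULL, because the wc line was removed by the 
preprocessor before the compiler ever saw it.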

The NEWTOY() macro takes three arguments: command name, option string, and 
install location.  If you'd like one command to have multiple names there's 
also an OLDTOY() macro, which takes four arguments: the new name, the original 
name, command options the new name understands (which can differ from the other 
name, but they are washed through the same main() function), and install 
location.

The install location is used if you give the "toybox" multiplexer any option 
beginning with a dash.  Currently for defconfig, it outputs:

  ./toybox -?
  bin/basename usr/bin/bzcat bin/cat usr/bin/catv usr/sbin/chroot
  usr/sbin/chvt bin/cksum usr/bin/count bin/cp usr/sbin/df bin/dirname
  bin/dmesg bin/echo bin/false bin/help usr/bin/mdev bin/mkfifo sbin/mkswap
  bin/nc bin/netcat usr/bin/nice sbin/oneit usr/bin/patch bin/pwd bin/rmdir
  usr/bin/seq usr/bin/setsid bin/sh usr/bin/sha1sum bin/sleep usr/bin/sort
  bin/sync bin/tee bin/touch bin/toysh bin/true bin/tty bin/uname
  usr/sbin/useradd usr/bin/wget usr/bin/which usr/bin/yes

A trivial script can go through that output and install the appropriate 
symlinks to the "toybox" binary, something like:

  for i in $(./toybox -); do ln -s /bin/toybox $i; done

You can run ./toybox without any arguments to get the list of commands without 
the paths prepended, to install all the links in the same directory.  (Yes you 
can do "toybox cat filename" too, none of the command names start with a dash.)

That leaves the middle argument to NEWTOY(), which is the command line option 
string.  This is the biggest difference between toybox and busybox, the option 
parsing logic is completely different, and so automated you can largely ignore 
it.  However, I'm going to explain it here in more detail than you probably 
really need to know. :)

I wrote my own option parser (lib/args.c, which does _not_ call getopt() so 
was net smaller than busybox's last I checked).  It's automatically called 
before the command's main() function is ever run, using the option string 
supplied by NEWTOY() to parse the command line options and fill out global 
variables with the appropriate values.  You can disable this automatic option 
parsing (and call it manually if you like) by passing NULL in as the option 
string in NEWTOY(), which is also how you specify you take no arguments so 
that the option parsing can get compiled out if nobody's using it.  See main.c 
for details.

The command_main() functions return void and take no arguments, instead you 
use global variables.  The main one is the global "toys", which looks like 
this:

extern struct toy_context {
        struct toy_list *which;  // Which entry in toy_list is this one?
        int exitval;             // Value error_exit feeds to exit()
        char **argv;             // Original command line arguments
        unsigned optflags;       // Command line option flags from get_optflags()
        char **optargs;          // Arguments left over from get_optflags()
        int optc;                // Count of optargs
        int exithelp;            // Should error_exit print a usage message first?
        int old_umask;           // Old umask preserved by TOYFLAG_UMASK
} toys;

toys.optflags is filled out by the option parsing logic with the command line 
flags seen this run.  exitval defaults to 0 but can be changed by other stuff 
(such as any of the functions that exit with an error, or by setting it 
manually before returning from main()).  optargs[] contains the arguments left 
over after option parsing.  (So for "ls -l file1 file2 file3", optargs[0] would 
be file1, optargs[2] would be file3, and optargs[3] would be NULL.)  optc is 
the equivalent of argc for optargs.  argv[] is the unprocessed argument list, 
kept around since we can't free it anyway and there are a couple of times you 
might want to know.  (Such as if you passed NULL as the option string to 
NEWTOY().)
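
As a rough illustration of how a command body might read these fields, here 
is a sketch that uses a cut-down local copy of the structure rather than the 
real toys.h.  FLAG_l and demo_main() are hypothetical names made up for this 
example:

```c
#include <assert.h>
#include <stdio.h>

/* Cut-down local copy of the fields used below, not the real declaration. */
struct toy_context {
    int exitval;        /* value handed to exit() */
    unsigned optflags;  /* flags seen on the command line */
    char **optargs;     /* arguments left over after option parsing */
    int optc;           /* count of optargs */
} toys;

#define FLAG_l 1        /* hypothetical: bit for a rightmost "l" option */

/* Hypothetical command: print leftover arguments, one per line with -l. */
void demo_main(void)
{
    int i;

    assert(toys.optc == 0 || toys.optargs != NULL);

    for (i = 0; i < toys.optc; i++)
        printf("%s%c", toys.optargs[i], (toys.optflags & FLAG_l) ? '\n' : ' ');

    /* No return value: report failure through the global instead. */
    if (!toys.optc) toys.exitval = 1;
}
```

The point is only the calling convention: nothing is passed in or returned, 
so the command reads its input from the global and leaves its exit status 
there.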

The other interesting global is "this", which is a union of structures 
containing all your global variables for each command.  That's initialized by 
the DEFINE_GLOBALS() macro a bit further down in the file, which lists the 
global variables for this file.  The contents become a structure in a union of 
all such structures for each command, which can be accessed as 
"this.commandname" (for wc, "this.wc").  The #define TT this.wc is a 
shortcut so we can say TT.name for each global "name" we have.  (I should make 
the #define TT automatic as part of DEFINE_GLOBALS() or something, but haven't 
figured out how yet.  Alas, you can't have a macro resolve to a preprocessor 
directive.)

If there are no global variables used by this command, you can omit the 
DEFINE_GLOBALS() block entirely.  But if the command line parsing saves 
results to any variables, you need to list them at the start of the 
DEFINE_GLOBALS() block:

  1) In order (from right to left).
  2) All of them are long/pointer size.  (4 bytes on 32-bit, 8 on 64-bit.)

The options are numbered from right to left because that way anybody familiar 
with binary can work out the flag values in their head: "The option string has 
abcdefg, command line is -adg, that's 1001001... that's 64+8+1".  Whereas if 
you number them the other way, you have to reverse them in your head to work 
out the values.  (This means you add new options to the beginning of the 
string to avoid renumbering the others.)
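
The arithmetic can be sketched as a small function.  This is not the code in 
lib/args.c; flag_bit() is a hypothetical helper, and it assumes that only 
letters are options while any other character in the string is a suffix that 
does not take up a bit:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return the flag bit for option letter opt, numbering from the right so  */
/* the last letter in optstring is 1, the one before it 2, then 4, etc.    */
/* Returns 0 if opt is not in the string.                                  */
static unsigned flag_bit(const char *optstring, char opt)
{
    unsigned bit = 1;
    size_t i;

    assert(optstring != NULL);

    i = strlen(optstring);
    while (i--) {
        char c = optstring[i];

        /* Assumption: non-letters are argument-type suffixes, skip them. */
        if (!isalpha((unsigned char)c)) continue;
        if (c == opt) return bit;
        bit <<= 1;
    }

    return 0;
}
```

For the string "abcdefg" this gives 64 for a, 8 for d, and 1 for g, so the 
command line -adg yields 64+8+1 = 73, which is 1001001 in binary.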

So if I had an option string "ab:d#" the options are numbered d=1, b=2, a=3 
from the right (ignore the non-letters for that), giving flag values 1, 2, and 
4, and the associated globals block could look like:

  DEFINE_GLOBALS(
    long value_for_d;
    char *value_for_b;

    int any_other_globals;
  )

The appended : means "takes a string argument" (just like in getopt), the 
appended # means "takes a number argument".  Said arguments are saved into the 
global block, right to left becoming top to bottom.
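
One way to picture "right to left becoming top to bottom" is to treat the 
start of the globals block as a row of long-sized slots.  The sketch below is 
not the real parser: slot_for() and store_slot() are hypothetical helpers, 
only the ':' and '#' suffixes described above are handled, and it assumes a 
pointer is the same size as a long (as on typical 32-bit and 64-bit Linux 
targets):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The globals block from the "ab:d#" example, written as a plain struct. */
struct demo_globals {
    long value_for_d;
    char *value_for_b;

    int any_other_globals;
};

/* Which slot (0 = first field) receives the argument of option opt?  */
/* Scans right to left, returns -1 if opt takes no argument.          */
static int slot_for(const char *optstring, char opt)
{
    int slot = 0;
    size_t i = strlen(optstring);

    while (i--) {
        char c = optstring[i];

        if (c != ':' && c != '#') continue;  /* only suffixes claim a slot */
        if (i == 0) break;                   /* suffix with no option letter */
        if (optstring[i - 1] == opt) return slot;
        slot++;
    }

    return -1;
}

/* Store one parsed argument into the given slot of the globals block. */
static void store_slot(void *globals, int slot, char type, const char *arg)
{
    /* Assumption: sizeof(char *) == sizeof(long), so slots line up. */
    char *where = (char *)globals + (size_t)slot * sizeof(long);

    assert(slot >= 0);

    if (type == '#') {
        long n = strtol(arg, NULL, 10);      /* number argument */
        memcpy(where, &n, sizeof(n));
    } else if (type == ':') {
        memcpy(where, &arg, sizeof(arg));    /* string argument: keep pointer */
    }
}
```

With a zeroed struct demo_globals, storing "42" for d and "text" for b fills 
value_for_d and value_for_b in that order and leaves any_other_globals at 
zero.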

By convention, I put a space between globals filled out by the option parsing 
logic and globals that are just globals used by the code.  Note that all of 
the globals are initialized to zero to start with, and then the option parsing 
logic can set the first few to other values, but any that aren't initialized by 
the option parsing logic (including ones that _could_ but that option wasn't 
used this time) are still reliably zeroed.

That pretty much gets us through all the boilerplate, and in fact is probably 
way more info than you'd really need to know to implement the wc command.

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
