On Wed, Feb 22, 2012 at 01:00:39AM +0000, Stuart Henderson wrote:
> On 2012/02/21 19:15, Woodchuck wrote:
> > >Fix:
> >     surround the nitems macro with #if define(_KERNEL) and
> >     let any non-kernel code that has come to depend on it
> >     choke.
> 
> The userland program/s relying on this should be defining it
> themselves, but breaking the tree is not an acceptable way to
> get them fixed.

I agree with both sentiments.  I meant my remark about letting
non-kernel code choke to suggest a (brutal) way to judge the effects
of surrounding nitems with appropriate ifdefs.  I'll be doing a
"make build" later on today to judge these effects.  A couple of
casual glances from some grep results suggest they should be zero
to minor.  Effects within OpenBSD's kernel should probably be
reported as a bug, ("breach of defensive programming bug") my
impression is that _KERNEL should always be defined for such
compilations.

I *grok fully* the OpenBSD mindset ("take it or leave it", "this
project is for the amusement of its authors", "show me the code",
"rtfm", "rtfFAQ", "do not break happy code", etc), and *fully approve
it*.  I *share* that attitude towards my own stuff.  I *encourage*
that attitude in others, and cite OpenBSD as an example of how an
uncompromising application of that attitude can result in a very
interesting, useful and attractive result, without which I would
feel lost and alone in a world of cruft and bugs.  Hence my pleasure
at finding an opportunity to report a subtle bug and its likely
fix.

In 3rd party code (existing ports and completely alien code) we are
confronted with many problems.  How I came across the nitems issue
is illustrative.

I am collaborating on porting an application to OpenBSD that is of
significant interest to radio operators, worldwide, amateurs and
potentially others.  It is fldigi, and implements various digital
radio transmission and reception modes through the audio facilities
of PeeCees, Macs and certain other architectures to eventually
include tablet, small Soekris-like systems, and so on.  Besides
needing fairly decent access to the audio subsystem, it also needs
to communicate with the radio transceiver, chiefly through RS-232
or similar (USB) channels.  The program, obviously, runs in real
time, and performs some heavy (FFT) calculations to synthesize
signals to be transmitted, and to analyze those received, much like
the work done in a software-defined radio transceiver, it is typically
compute-bound. It consists of about 120,000 lines of code, and is
written in C++, what I would term a medium-sized program.  It is
intended to be simultaenously mathematically sophisticated and
user-friendly, the users consisting of advanced amateur radio
operators, engineers and suchlike but also computer-semiliterate
emergency services operators; it has, therefore, a GUI interface.
It is intended to be as bullet-proof as possible, and to function
in fairly hostile environments. It is ambitious, and it is succeeding
pretty well.  It is natural to want to port it to OpenBSD, to make
use of the well known benefits of OpenBSD.  It is open-source.
Its homepage is at  http://www.w1hkj.com/Fldigi.html

Fldigi also wants to communicate over TCP/IP using XML to exchange
certain data with various remote sites.  To do that it uses
xlmrpc-c, an OpenBSD port.  Device access and net access are not
unreasonable tasks for 3rd party software. Getting system-specific
definitions and declarations is legitimate for such a program. It
is legitimate for applications to include sys/*.h, and it is
legitimate to expect that code that compiles and runs on Linux,
FreeBSD, NetBSD, and yea even unto Cygwin and native Windoze, to
compile on OpenBSD.  ("principle of least surprise"). After usual
tweakage, of course.

Fldigi nowhere includes sys/param.h.  It does include netdb.h, and
other members of sys/*.  Fldigi's control panel (for X11), is
implemented using FLTK, which is ported to OpenBSD, although the
port is out of date.  The xmlrpc-c software is, at best, dubious.
Such is life.

The bug only surfaces in fldigi when xml is configured into the
compilation; however the bug appears in a file included from the
FLTK port, namely "Fl.H", where nitems() is defined as a public
method for some C++ class.  Along the lines of 
        int nitems() { return nitems_; }

IMO, the usual C++ cruft, but there it is.  A curse on C++ for
including executable code in included files, but that Petri dish was
dropped long ago, alas.

Analysis reveals that this is an interaction of included files
from xmlrpc-c and FLTK, the xmlrpc-c included files include netdb.h,
which brings in sys/param.h.  It is legitimate and necessary for
an application implementing XML to include netdb.h. I have not
yet determined why the failing user routine includes FLTK files,
but I suspect that that, is a legitimate but not anecessary thing.
It is (probably) due to unfortunate design choices, chiefly due
to convenience in C++, which encourages a sort of "include mania"
that I am sure many find frightening and the source of much grief.
My point is that "we" (programmers, users, porters) are largely
stuck with the world as it is, and the terrors of C++ are now
part of it.  And I have a workaround for this specific case,
an inclusion in user source of "#undef nitems" at a point after
the implicit inclusion of sys/param.h and before the implicit
inclusion of Fl.H (from FLTK).  Such a solution is not generally
possible, however.

The problem with the nitems bug is that the compiler provides no hint
of where nitems might be defined, as it is a preprocessor symbol,
and the preprocessor doesn't care (much) about C or C++ syntactic
niceties.  So to track it down, I had to go on a snipe hunt;  I
know how to do this sort of thing, so I did it.  I expected some
sort of cruel interaction between various C++ "include files",
which nest to a depth of about 5 in the vicinity of the problem.
There are 5-6 external pacakages involved.

My disappointment is that these results are the exact opposite of
what I would have expected -- I would have expected the compilation
to fail because OpenBSD had *not* defined some innovative local feature,
and I could say, "Behold the purity of the True Source, which defineth
no cruft." and supply a patch to the user code that defines nitems.

I will continue analyzing the bug and report any results.

Addendum:  files in /usr/include that *directly* include sys/param.h

User code referencing any of these will be vulnerable to the bug.

[djv@aemilia ~ 0:65]$ find /usr/include/ -name "*.h" | xargs -n100 grep -l 
sys/param.h
/usr/include/altq/altq_var.h
/usr/include/arpa/inet.h
/usr/include/arpa/nameser.h
/usr/include/dev/ic/aic79xx_openbsd.h
/usr/include/dev/ic/aic7xxx_openbsd.h
/usr/include/dev/ic/ar5xxx.h
/usr/include/dev/ic/isp_openbsd.h
/usr/include/dev/ic/pdqvar.h
/usr/include/dev/ic/smc93cx6var.h
/usr/include/dev/ofw/openfirm.h
/usr/include/dev/pci/drm/drmP.h
/usr/include/dev/pci/if_bnxreg.h
/usr/include/dev/pci/if_em.h
/usr/include/dev/pci/if_ixgb.h
/usr/include/dev/pci/ixgbe.h
/usr/include/dev/raidframe/rf_configure.h
/usr/include/dev/raidframe/rf_threadstuff.h
/usr/include/dev/raidframe/rf_types.h
/usr/include/g++/i386-unknown-openbsd5.0/bits/c++config.h
/usr/include/i386/cpu.h
/usr/include/i386/db_machdep.h
/usr/include/net/pfvar.h
/usr/include/netdb.h
/usr/include/netmpls/mpls.h
/usr/include/nnpfs/nnpfs_config.h
/usr/include/nnpfs/nnpfs_locl.h
/usr/include/openssl/e_os.h
/usr/include/resolv.h
/usr/include/rpcsvc/bootparam_prot.h
/usr/include/sndio.h
/usr/include/sys/localedef.h
/usr/include/sys/mbuf.h
/usr/include/sys/radioio.h

On my very lightly populated system:

[djv@aemilia ~ 0:66]$ find /usr/local/include/ -name "*.h" | xargs -n100 grep 
-l sys/param.h
/usr/local/include/tcl8.5/tclInt.h
/usr/local/include/tcl8.5/tclUnixPort.h
/usr/local/include/python2.7/pyconfig.h
/usr/local/include/Xm/Xmos_r.h
/usr/local/include/Xm/Xmpoll.h

In /usr/ports (including pobj), there are on my system 97 instances 
of files referencing sys/param.h on that system.

in src:

[djv@aemilia ~ 0:75]$ find /usr/src -name "*.h" | grep -v /usr/src/sys | \
        xargs -n100 grep -l sys/param.h  | wc
     129     129    5284

These counts do not include indirect inclusion of sys/param.h (such as from
netdb.h and certain others).

This bug is pernicious because it can also generate perfectly good
C and C++ code, with no warning to the user of a macro expansion being
performed.  nitems shouldn't have leaked out of the kernel, but there
it is.  I don't think it's documented anywhere, but could be wrong.
It's a long fine manual to read.  It is very interesting to look at
the results of 
   $ find /usr/share -type f | xargs -n100 grep nitems
               which I leave as an exercise for the reader.


73,

Dave AB3NR

Reply via email to