Hi!

----

I am back from my vacation (or better: emergency babysitting) and am
only halfway through all the emails which queued up in my inbox. I will
now try to summarise and explain, in no particular order, a few things
and background issues about shells (apologies for the text below, it's
5:25 AM here and as a result the email may be weirder than usual):


1. Some comments on shell history (see
http://mail.opensolaris.org/pipermail/opensolaris-code/2007-March/004621.html
for a similar comment in a different context)
Originally Unix used the "Thompson shell" and then came the "Bourne
shell" as a major improvement (replacing its predecessor as "/bin/sh").
Without the Unix wars the "Korn shell" (based on korn shell spec 88)
would likely have been the successor (at least some platforms such as
AIX did replace the Bourne shell with ksh) and later the "new Korn
shell" (based on korn shell spec 93) would have followed as "/bin/sh" in
Unix.

The "Unix wars" ruined this opportunity and triggered the problems we
currently have with /usr/bin/sh in Solaris (the story goes back at least
twelve years in the Sun bug database).

For Linux there was the problem that the original Korn shell was not
"open", so at some point both "pdksh" and "bash" were developed. "pdksh"
tried to emulate "ksh88" (almost none of the "ksh93" features are
supported) while "bash" draws many of its features from the original
Bourne shell, ksh88 and (later) the POSIX shell standard. bash3 includes
several features from ksh93, too, and as a result many scripts which use
bash3-specific features also run under ksh93. Later the Linux people
(and LSB) standardized on "bash", mainly because the "pdksh" project was
half-dead, reducing the "competition" to exactly one entry ("bash";
ksh93 wasn't available under an open source license until many years
later).
In a parallel evolution the POSIX people developed their "shell standard
spec", primarily based on ksh88 with a few bits of Bourne, such as the
function syntax which doesn't allow a separate scope. This is different
from ksh88 and bash1/bash2/bash3; only ksh93 implements this "fine
print" of the spec correctly, making it far more compatible with the
original Bourne shell than ksh88 or bash*. The spec was a major
influence on ksh93 and later versions of bash.


2. Why does some software (e.g. shell scripts or system calls like
|popen()| which call /usr/bin/sh to get a shell) fail with Solaris's
/sbin/sh ?
A lot of open source software fails not because it expects "bash"
features (sometimes called "bashisms"), as some people have claimed
here; usually it fails because basic functionality required by the POSIX
shell standard (such as "$(...)" expansion, arithmetic expressions,
options for the "test" builtin etc.) is missing. This is supported by
almost every bug filed in Sun's bug DB about failures caused by syntax
errors or missing features. The "original" Bourne shell in Solaris is
simply too old: it predates the POSIX shell standard and was never
updated to support newer syntax or constructs. Other things weren't
fixed either, e.g. support for multibyte locales (where one character is
represented by multiple bytes, or better: a variable number of bytes),
like ja_JP.PCK or zh_CN.GB18030, was tacked on later, which still causes
pain for users of those locales.
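
To illustrate the kind of constructs meant above, here is a minimal
sketch (the variable names are made up for this example, and "-e" is
only assumed to be one of the "test" options in question). A POSIX shell
runs it as-is; a pre-POSIX Bourne shell is expected to give up at the
first "$(...)":

```shell
# Command substitution with $(...), which nests without backslash escapes.
msg=$(echo "nested: $(echo inner)")

# Arithmetic expansion, no external "expr" needed.
answer=$((6 * 7))

# "test -e": true if the path exists, whatever its type.
if [ -e / ]; then
    root_exists=yes
else
    root_exists=no
fi

echo "$msg / $answer / $root_exists"
```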


3. Which shell should replace the "original" Bourne shell as /sbin/sh ?
IMO the discussion should not be "bash" vs. "Bourne" vs. "ksh93"; a
better discussion would be about updating /sbin/sh (/usr/bin/sh is just
a symlink which points to /sbin/sh in Solaris >= 10, i.e. there is no
longer a separate, statically linked /sbin/sh and a dynamically linked
/usr/bin/sh) to a shell which conforms to the POSIX shell standard. This
is the same goal many Linux and *BSD distributions try to reach; AFAIK
many of them have configured "bash" to run in "POSIX conformance mode"
when started as /bin/sh or use alternative solutions (e.g. Debian allows
various POSIX-like alternatives to be used).

AFAIK we have two main choices:
a. "bash"
    or
b. "ksh93"

I don't want to go into all the details but...
[a] ... may be the choice if we want 100% compatibility with Linux in
all cases. If this choice is selected we need something like a
"bash-integration" project to get "bash" into a shape where it would be
good enough to fit into Solaris as the "core shell". Remember that
"ksh93r" (Korn shell based on spec 1993, version 'r') was very good to
begin with (as the basis for the ksh93-integration project) but we still
needed more than _three_ man-years (i.e. three people working for one
year) to "overhaul" the whole shell and fix most of the
Solaris-/i18n-/l10n-/usability-/performance-/conformance-/etc.-bugs. A
similar effort would be required for "bash" and may need the same amount
of man-years. I only had a quick look; AFAIK someone from the standards
folks at Sun needs to run "bash" against the test suites and check what
needs to be done (i.e. the same procedure we followed for the
"ksh93-integration" project, which resulted in more than twelve months
of mad bughunting). At least some parts, like the i18n support for
non-UTF-8 locales, need a complete rework (otherwise someone has to
explain to the Chinese and Japanese governments why Indiana can't meet
their requirements (IMO the results of such a discussion are best
observed from _behind_ a rock)).
Finally we need lots of work to hunt down all the scripts (or create an
automated way to do the "hunt") which break when "bash" replaces the
Bourne shell. For example, all scripts which use Bourne-style function
syntax need to be checked. In short: one of the major differences
between "bash"/"ksh88" and "ksh93" is the way Bourne/POSIX-style
functions are handled (see
http://svn.genunix.org/repos/on/branches/ksh93/gisburn/prototype005/usr/src/lib/libshell/common/COMPATIBILITY).
"ksh93" follows the POSIX standard precisely (which follows Bourne
function behaviour) while "bash"/"ksh88" have different scope behaviour.
While this makes "ksh93" slightly incompatible with "ksh88" in some
cases, it makes it more compatible with the Bourne shell (and conversely
makes "bash" slightly incompatible with the Bourne shell).

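The function scope difference described under [a] can be seen with a
small sketch like the one below (the function and variable names are
invented for this example). Note that "typeset" is a ksh/bash extension
which the original Bourne shell does not have, so this only compares how
the newer shells treat a Bourne/POSIX-style "name()" function:

```shell
# Bourne/POSIX-style function definition that declares a variable
# with typeset.
set_mode() {
    typeset mode="inner"
}

mode="outer"
set_mode
echo "mode=$mode"
# ksh93: prints "mode=inner" (name() functions share the caller's scope,
#        so typeset acts on the caller's variable)
# bash:  prints "mode=outer" (typeset makes the variable local to the
#        function)
```

A script that relies on one of the two results silently changes its
behaviour when the shell behind /sbin/sh is swapped.
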
[b] ... may be easier if Solaris and/or Indiana aim more for the POSIX
shell standard, since "ksh93" is closer to the POSIX shell standard than
"bash" in POSIX mode (this is one reason why the standards folks at Sun
would like to check whether they can replace /usr/xpg4/bin/sh with
"ksh93") and we already have a project (the "ksh93-integration" project)
which is now more or less "done" with the first putback. Further
advantages of "ksh93":
- It is faster.
- It is more feature-rich. Some have stated that the floating-point math
is the "killer feature", but IMO builtin commands, unlimited array size,
unlimited variable string length, arrays with strings as index
("associative arrays", very useful to manage lists and other data) and
variable trees ("compound variables") are the features which matter more
in the real world (floating-point math is still very useful for some
applications).
- It is extensible ("ksh93" has an API to load builtin
commands/functions/etc. on demand to extend its functionality).
- It has very good i18n support (to the point that function and variable
names may contain non-ASCII characters, e.g. function+variable names
written in Japanese aren't a problem anymore... :-) ).
- It avoids |fork()|+|exec()| if possible. This is a feature unique to
"ksh93"; the Bourne shell, "ksh88" and "bash" all create new child
processes for subshells etc.

The last point is very important: first, |fork()| is a very expensive
operation (new Unix process), and second, |exec()| may seriously harm
the scalability of a whole system. To explain this briefly: |exec()| has
to tear down the address space of the current process, which requires a
cross-call to all CPUs in the system (note: this is a hardware
implementation issue and can't be fixed in the kernel). Now imagine a
SF25k with 144 cores or a "Victoria Falls" machine with 128 or 256
(virtual) CPUs... or imagine an 8-socket machine with 16 cores per
socket and 16 threads per core: that is 2048 threads in a single machine
(which can count as 2048 virtual cores in a "sun4v" machine; yes, I've
read TheRegister.co.uk), which requires 2048 cross-calls per _single_
|exec()| call.
In short: a simple shell script which triggers too many
|fork()|+|exec()| calls can starve a whole enterprise machine (there
were several customer escalations in the past about exactly this
"|fork()|+|exec()| storm on large enterprise machine" issue). As a
result a shell in Solaris should avoid doing this, otherwise it is on a
Titanic-style collision course with Sun's multicore strategy.
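
Independent of which shell is chosen, scripts can reduce the number of
|fork()|+|exec()| calls themselves by using POSIX shell expansions
instead of small external commands. A minimal sketch (the path and
variable names are only examples):

```shell
path="/usr/src/lib/libshell/common/COMPATIBILITY"

# External commands: each call costs a fork()+exec() of a separate
# process (unless the shell provides the command as a builtin):
#   name=$(basename "$path")
#   dir=$(dirname "$path")
#   n=$(expr "$n" + 1)

# Same results with expansions evaluated inside the shell itself:
name=${path##*/}        # strip everything up to the last "/"
dir=${path%/*}          # strip the last path component
n=0
for item in a b c; do
    n=$((n + 1))        # arithmetic expansion instead of "expr"
done

echo "$name $dir $n"
```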


4. What needs to be done to replace /sbin/sh with "ksh93" ?
Technically, replacing /sbin/sh with ksh93 is possible _now_; in fact
the early ksh93-integration code prototypes and ARC drafts delivered a
copy of ksh93 as /sbin/ksh93 and provided a build switch to install it
as /sbin/sh, too. The feature was removed from PSARC 2006/550 because
there was no _immediate_ demand for it at that time, but putting this
back for Indiana or any other OpenSolaris distribution shouldn't be a
big problem (less than a day of work and a few days of testing (limited
a bit by the performance of my Ultra5)). The problem is that one of the
project leads (Indiana, BeleniX etc.) needs to say "we need it";
otherwise such a change (i.e. delivering a /sbin/ksh93 and a build
switch to install ksh93 as /sbin/sh) won't pass the ARC.

The only real-world problems (yes, I know that people can always write
artificial tests to dig out more artificial incompatibilities, but
real-world scripts don't do that) we encountered in the last 14 months
were:
- The bash/ksh93 "unset" builtin returns a non-zero exit code if the
variable which should be "unset" does not exist, while the Bourne shell
just returns a zero exit (="success") code in all cases. This is a
POSIX conformance problem of /sbin/sh, since the POSIX standard (see
http://www.opengroup.org/onlinepubs/000095399/utilities/unset.html)
says:
-- snip --
EXIT STATUS
    0
       All name operands were successfully unset.
   >0
       At least one name could not be unset.
-- snip --
Only one bug of this kind was ever found in Solaris (hidden in the
OS/Net build system) which we corrected with
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6540124

- The bash/ksh93 "set" builtin returns a zero exit code in all cases
while the Bourne shell returns the exit code of the previous command.
Again a violation of the POSIX shell spec which says in
http://www.opengroup.org/onlinepubs/000095399/utilities/set.html
-- snip --
EXIT STATUS
    Zero.
-- snip --
So far we have found only one instance of this problem, which was
corrected with
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6551716

Those are all the incompatibilities between ksh93 (or better: the POSIX
shell standard, since all the issues listed above are required by the
POSIX shell standard) and the Bourne shell which have shown up in
real-world scripts so far (all of them were easy to identify and
correct, for example CR #6551716 was fixed in less than a week,
including code review, RTI, off-topic discussions and all the other
paperwork).
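
A sketch of how a script can be written so that it does not depend on
either behaviour of "unset" and "set" described above (the variable name
SCRATCH_VAR and the use of "false" as the failing command are just
placeholders, this is not the actual fix from the two CRs):

```shell
# 1. "unset": do not let the control flow depend on its exit status.
unset SCRATCH_VAR || :       # ":" is the POSIX no-op builtin

# 2. "set": save the exit status of a command right away, before
#    calling "set", instead of reading $? afterwards.
status=0
false || status=$?           # stand-in for a real command that fails
set -- alpha beta            # whatever "set" returns, $status is kept
if [ "$status" -ne 0 ]; then
    echo "previous command failed with status $status"
fi
```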

For a migration of /sbin/sh to ksh93 (David Comay already wrote a nice
email about the required procedure) we need to find a way to identify
the shell scripts which may break. One way may be to use ksh93's shell
script compiler ("shcomp", which compiles ksh93 shell scripts into a
bytecode which is then executed by "ksh93", sort of a "javac" for shell
scripts) as some kind of "shell lint" to detect these incompatibilities.
It already has some checking code which can easily be enhanced to check
for the two conditions above (e.g. "unset" or "set" followed by a $?
test etc.). Other items which need to be checked may include the old
Bourne pipe syntax (e.g. $ echo "foo" ^ cat #) and builtins like "chdir"
etc. (they may not be used in modern scripts but are documented in the
Bourne shell manual page).
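
To give an idea of what such a check looks for, here is a crude
stand-alone heuristic. It is NOT shcomp's checking code and the function
name is made up; it only flags lines which use $? directly after a line
starting with "set" or "unset" (blank lines in between are skipped), so
it will miss cases and report false positives:

```shell
# Hypothetical helper: report "$?" used on the line following a
# "set"/"unset" command. Works on one or more script files.
scan_exit_status_use() {
    awk '
        # The previous non-blank line was set/unset and this one reads $?
        prev_is_set && /\$\?/ {
            printf "%s:%d: $? used after set/unset\n", FILENAME, FNR
        }
        # Blank lines do not change what the "previous command" was
        /^[ \t]*$/ { next }
        # Remember whether this line starts with set or unset
        { prev_is_set = ($0 ~ /^[ \t]*(set|unset)([ \t]|$)/) }
    ' "$@"
}
```

Usage would be something like: scan_exit_status_use /path/to/script.sh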

AFAIK that's all that needs to be done for now to do the switch (from
the technical side).

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
indiana-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss
