Hi!

----

I am back from my vacation (or better: emergency babysitting) and am only halfway through all the emails which queued up in my inbox. I am now trying to summarise and explain, in no particular order, a few things and background issues about shells (apologies for the text below, it's 5:25 AM here and as a result the email may be weirder than usual):

1. Some comments on shell history (see http://mail.opensolaris.org/pipermail/opensolaris-code/2007-March/004621.html for a similar comment in a different context)

Originally Unix used the "Thompson shell" and then came the "Bourne shell" as a major improvement (replacing its predecessor in "/bin/sh"). Without the Unix wars the "Korn shell" (based on korn shell spec 88) would likely have been the successor (at least some platforms such as AIX did replace the Bourne shell with ksh) and later the "new korn shell" (based on korn shell spec 93) would have followed as "/bin/sh" in Unix. The "Unix wars" ruined this opportunity and triggered the problems we currently have with /usr/bin/sh in Solaris (the story goes back at least twelve years in the Sun bug database).

For Linux there was the problem that the original korn shell was not "open", and at some point both "pdksh" and "bash" were developed. "pdksh" tried to emulate "ksh88" (i.e. almost none of the "ksh93" features are supported) while "bash" draws many of its features from the original Bourne shell, ksh88 and (later) the POSIX shell standard (bash3 includes several features from ksh93, too; as a result many scripts which use bash3-specific features run under ksh93, too). Later the Linux people (and LSB) standardized on "bash", mainly because the "pdksh" project was half-dead, reducing the "competition" to exactly one entry ("bash"; ksh93 wasn't available under an open source license until many years later).

In a parallel evolution the POSIX people developed their "shell standard spec", primarily based on ksh88 with a few bits of Bourne, such as the function syntax which doesn't allow a separate scope. This is different from ksh88 and bash1/bash2/bash3; only ksh93 implements this "fine print" of the spec correctly, making it far more compatible with the original Bourne shell than ksh88 or bash*. The POSIX spec was a major influence on ksh93 and later versions of bash.

2. Why does some software (e.g. shell scripts or library calls like |popen()| which call /usr/bin/sh to get a shell) fail with Solaris's /sbin/sh ?

Much open source software doesn't fail because it expects "bash" features (sometimes called "bashisms") as some people have claimed here; usually it fails because basic functionality (such as "$(...)" expansion, arithmetic expressions, options for the "test" builtin etc.) required by the POSIX shell standard is missing (and this is supported by almost every bug filed in Sun's bug DB about failures caused by syntax errors or missing features). The "original" Bourne shell in Solaris is simply too old: it predates the POSIX shell standard and was never updated to support newer syntax or constructs. Nor were other things fixed, e.g. support for multibyte locales was tacked on later, which still causes pain for users of multibyte locales (i.e. locales where one character is represented by multiple bytes, or better: a variable number of bytes) like ja_JP.PCK or zh_CN.GB18030.

3. Which shell should replace the "original" Bourne shell as /sbin/sh ?

IMO the discussion should not be "bash" vs. "Bourne" vs. "ksh93"; a better discussion would be to think about an update of /sbin/sh (/usr/bin/sh is just a symlink which points to /sbin/sh in Solaris >= 10, i.e. there is no longer a separate, statically linked /sbin/sh and a dynamically linked /usr/bin/sh) to a shell which conforms to the POSIX shell standard. This is the same goal many Linux and *BSD distributions try to reach; AFAIK many of them have configured "bash" to run in "POSIX conformance mode" when started as /bin/sh or use alternative solutions (e.g. Debian allows various POSIX-like alternatives to be used).

AFAIK we have two main choices:
a. "bash" or
b. "ksh93"

I don't want to go into all the details but...

[a] ... may be the choice if we want 100% compatibility with Linux in all cases. If this choice is selected we need something like a "bash-integration" project to get "bash" into a shape where it would be good enough to fit into Solaris as "core shell". Remember that "ksh93r" (korn shell based on spec 1993, version 'r') was very good to begin with (as basis for the ksh93-integration project) but we still needed more than _three_ man-years (i.e. three people working for one year) to "overhaul" the whole shell and fix most of the Solaris-/i18n-/l10n-/usability-/performance-/conformance-/etc. bugs. A similar effort would be required for "bash" and may need the same amount of man-years. I only took a quick look; AFAIK someone from the standards folks at Sun needs to run "bash" against the test suites and check what needs to be done (i.e. the same procedure we followed for the "ksh93-integration" project, which resulted in more than twelve months of mad bughunting). At least some parts like the i18n support for non-UTF-8 locales need a complete rework (otherwise someone has to explain to the Chinese and Japanese governments why Indiana can't match their requirements (IMO the results of such a discussion are best observed from _behind_ a rock)).

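Going back to section 2 for a moment: as a rough illustration of the kind of POSIX constructs meant there ("$(...)" expansion, arithmetic expressions, "test" options), here is a minimal sketch which any POSIX-conforming /bin/sh should accept. The function name posix_demo and the values are made up for this example, and "test -e" is only picked as one POSIX-specified option; the text above does not spell out which options the old shell lacks.

```shell
# posix_demo is a hypothetical helper, only for illustration.
posix_demo() {
    # "$(...)" command substitution; it nests without backslash escapes
    dir=$(dirname "$(printf '%s\n' /usr/bin/sh)")

    # arithmetic expansion
    sum=$((6 * 7))

    # "test -e" (does the path exist?), written with the "[" form
    if [ -e / ]; then
        exists=yes
    else
        exists=no
    fi

    printf '%s %s %s\n' "$dir" "$sum" "$exists"
}
```

Calling posix_demo prints "/usr/bin 42 yes" in a POSIX shell; a shell without these constructs would be expected to stop with a syntax error instead.
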
Finally we need lots of work to hunt down all the scripts which break when "bash" replaces the Bourne shell (or to create an automated way to do the "hunt"). For example, all scripts which use Bourne-style function syntax need to be checked. In short: one of the major differences between "bash"/"ksh88" and "ksh93" is the way Bourne/POSIX-style functions are handled (see http://svn.genunix.org/repos/on/branches/ksh93/gisburn/prototype005/usr/src/lib/libshell/common/COMPATIBILITY). "ksh93" follows the POSIX standard precisely (which follows Bourne function behaviour) while "bash"/"ksh88" have different scope behaviour. While this makes "ksh93" slightly incompatible with "ksh88" in some cases, it makes it more compatible with the Bourne shell (and conversely makes "bash" slightly incompatible with the Bourne shell).

[b] ... may be easier if Solaris and/or Indiana aim more for the POSIX shell standard, since "ksh93" is closer to the POSIX shell standard than "bash" in POSIX mode (this is one reason why the standards folks at Sun would like to check whether they can replace /usr/xpg4/bin/sh with "ksh93") and we already have a project (the "ksh93-integration" project) which is now more or less "done" with the first putback.

Other advantages of "ksh93":
- It is faster.
- It is more feature-rich. Some have stated that the floating-point math is the "killer feature", but IMO features like builtin commands, unlimited array size, unlimited variable string length, arrays with strings as index ("associative arrays", very useful to manage lists and other data) and variable trees ("compound variables") matter more in the real world (floating-point math is still very useful for some applications).
- It is extensible (e.g. "ksh93" has an API to load builtin commands/functions/etc. on demand to extend its functionality).
- It has very good i18n support, up to the point that function and variable names may contain non-ASCII characters (e.g. function+variable names written in Japanese aren't a problem anymore... :-) ).
- It avoids things like |fork()|+|exec()| if possible. This is a feature unique to "ksh93"; Bourne shell, "ksh88" and "bash" all create new child processes for subshells etc.

The last point is very important: first, the |fork()| is a very expensive operation (new Unix process), and second, the |exec()| may seriously harm the scalability of a whole system. I'll try to explain this briefly: |exec()| has to tear down the address space of the current process, which requires a cross-call to all CPUs in the system (note: this is a hardware implementation issue and can't be fixed in the kernel). Now imagine a SF25k with 144 cores or a "Victoria Falls" machine with 128 or 256 (virtual) CPUs... or imagine an 8-socket machine with 16 cores per socket and 16 threads per core - that is 2048 threads in a single machine (which can count as 2048 virtual cores in a "sun4v" machine; yes, I've read TheRegister.co.uk), which requires 2048 cross-calls per _single_ |exec()| call. In short: a simple shell script which triggers too many |fork()|+|exec()| calls can starve a whole enterprise machine (there were several customer escalations in the past about exactly this "|fork()|+|exec()| storm on a large enterprise machine" issue). As a result a shell in Solaris should avoid doing this, otherwise it is on a Titanic-style collision course with Sun's multicore strategy.

4. What needs to be done to replace /sbin/sh with "ksh93" ?

Technically replacing /sbin/sh with ksh93 is possible _now_; in fact the early ksh93-integration code prototypes and ARC drafts delivered a copy of ksh93 as /sbin/ksh93 and provided a build switch to install it as /sbin/sh, too.
The feature was removed from PSARC 2006/550 because there was no _immediate_ demand for it at that time, but putting this stuff back for Indiana or any other OpenSolaris distribution shouldn't be a big problem (less than a day of work and a few days of testing, limited a bit by the performance of my Ultra5). The problem is that one of the project leads (Indiana, BeleniX etc.) needs to say "we need it" - otherwise such a change (i.e. delivering a /sbin/ksh93 and a build switch to install ksh93 as /sbin/sh) won't pass the ARC.

The only real-world problems (yes, I know that people can always write artificial tests to dig out more artificial incompatibilities but real-world scripts don't do that) we encountered in the last 14 months were:

- The bash/ksh93 "unset" builtin returns a non-zero exit code if the variable which should be "unset" does not exist, while the Bourne shell just returns a zero exit code (="success") in all cases. This is a problem with the POSIX conformance of /sbin/sh since the POSIX standard (see http://www.opengroup.org/onlinepubs/000095399/utilities/unset.html) says:
-- snip --
EXIT STATUS
 0    All name operands were successfully unset.
>0    At least one name could not be unset.
-- snip --
Only one bug of this kind was ever found in Solaris (hidden in the OS/Net build system), which we corrected with http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6540124

- The bash/ksh93 "set" builtin returns a zero exit code in all cases while the Bourne shell returns the exit code of the previous command. Again a violation of the POSIX shell spec, which says in http://www.opengroup.org/onlinepubs/000095399/utilities/set.html
-- snip --
EXIT STATUS
Zero.
-- snip --
Until today we only found one instance of this problem, which was corrected with http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6551716

These are all the incompatibilities between ksh93 (or better: the POSIX shell standard, since all issues listed above are required by it) and the Bourne shell which have shown up in real-world scripts so far. All of them were easy to identify and correct; for example CR #6551716 was fixed in less than a week, including code review, RTI, offtopic discussions and all the other paperwork.

For a migration of /sbin/sh to ksh93 (David Comay already wrote a nice email about the required procedure) we need to find a way to identify the shell scripts which may break. For example, one way may be to use ksh93's shell script compiler ("shcomp", which compiles ksh93 shell scripts into a bytecode which is then executed by "ksh93", sort of a "javac" for shell scripts) as some kind of "shell lint" to detect these incompatibilities. It already has some checking code which can easily be enhanced to check for the two conditions above (e.g. "unset" or "set" followed by a $? test etc.). Other items which need to be checked may include the old Bourne pipe syntax, e.g.
$ echo "foo" ^ cat #
and builtins like "chdir" etc. (they may not be used in modern scripts but are documented in the Bourne shell manual page).

AFAIK that's all that needs to be done for now to do the switch (from the technical side).

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
indiana-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss
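
The "unset"/"set" followed by a $? test check from section 4 can be approximated with a very crude, text-based sketch like the one below. This is NOT how shcomp works and not part of ksh93; it is only a line-oriented heuristic using awk, and the function name flag_status_checks is made up for this example. It will produce false positives (it does not parse the shell grammar), so it is at best a first filter for scripts worth a manual look.

```shell
# flag_status_checks is a hypothetical helper, not an existing tool.
# It prints "LINENO: text" for every line which references $? either
# on the same line as a "set"/"unset" command or on the line directly
# after one. Purely textual, so expect false positives.
flag_status_checks() {
    awk '
    {
        # "set" or "unset" as a command word (line start or after ; & |)
        is_cmd = ($0 ~ /(^|[ \t;&|])(un)?set([ \t]|$)/)

        # does this line look at the exit status?
        uses_status = ($0 ~ /\$\?/)

        if (uses_status && (is_cmd || prev_cmd))
            printf "%d: %s\n", NR, $0

        # remember for the next line
        prev_cmd = is_cmd
    }' "$@"
}
```

Usage would be something like "flag_status_checks /path/to/script.sh"; every reported line then needs a human to decide whether the script really depends on the Bourne-specific exit code.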
