*Synopsis*: ksh93 hangs in situations that ksh handles okay
CR 6631006 changed on Oct 28 2009 by <User 1-5HNZ8F>
=== Field ============ === New Value ============= === Old Value =============
Commit to Fix in Build snv_128                     snv_127
Fixed in Build         snv_128
Program Management     Fix Integrated into Source  New Defect
Status                 8-Fix Available             7-Fix in Progress
====================== =========================== ===========================
*Change Request ID*: 6631006
*Synopsis*: ksh93 hangs in situations that ksh handles okay
Product: solaris
Category: shell
Subcategory: korn93
Type: Defect
Subtype:
Status: 8-Fix Available
Substatus:
Priority: 2-High
Introduced In Release: solaris_nevada
Introduced In Build: snv_72
Responsible Engineer: <User 1-7MTUEB>
Keywords: oss-request, oss-sponsor
=== *Description* ============================================================
This morning some of the elements in my $PATH were inaccessible due to
an offline NFS server. When I logged in, my GNOME terminal window
didn't give me a shell prompt. When I entered a ^C, the window went
away.
I then logged in as root and hid /usr/bin/ksh93, so that my login
scripts would use /usr/bin/ksh instead of /usr/bin/ksh93. I then
logged in as myself, and my GNOME terminal window gave me the
expected prompt.
I then tried running ksh93 by hand; it was indeed stuck trying to
access one of the inaccessible directories:
athyra$ truss -p 8666
stat("/ws/onnv-tools/onbld/bin", 0xFFFFFFFF7FFFE588) (sleeping...)
This appears to be hard to recover from, since one usually needs a
functional shell before one can change one's shell.
ksh93 needs to be at least as robust as the Solaris ksh in
circumstances like this before it can replace the Solaris ksh.
And there's some question in my mind whether ksh93 should be the
default root shell if it hangs in situations like this. (Though I
suppose it's questionable practice for root to have NFS directories in
its PATH. So maybe this isn't a critical issue.)
*** (#1 of 2): 2007-11-16 18:04:11 GMT+00:00 <User 1-5Q-12482>
[dep, 15Apr2009]
This is especially bad considering ksh93 is installed as /bin/sh,
which means every system(3C) call will hang on startup, whether or not
the command depends on PATH resolution beyond known local entries
(usually first in one's path for this reason).
*** (#2 of 2): 2009-04-16 00:44:28 GMT+00:00 <User 1-5Q-4224>
=== *Public Comments* ========================================================
Are you able to reproduce it with build 111?
*** (#1 of 6): 2009-04-16 07:32:20 GMT+00:00 <User 1-1SURPB>
[dep, 16Apr2009]
ksh93 appears to have the same behavior on build 112.
*** (#2 of 6): 2009-04-16 20:14:59 GMT+00:00 <User 1-5Q-4224>
[dep, 13Aug2009]
(In response to an unnecessarily non-public comment claiming this has
something to do with fancy stuck filesystem detection in Sun's ksh88,
and that somehow caching file descriptors and using openat will
magically solve the problem.)
This is *NOT* a matter of Sun's ksh detecting stuck filesystems.
This is a matter of ksh93 scanning your entire path on startup,
whereas Sun's ksh (and more importantly, sh) simply did not.
Period.
My PATH:
; echo $PATH
/home/dep/private/bin:/home/dep/bin/i386:/home/dep/bin:/usr/bin:/usr/sbin:/usr/openwin/bin:/usr/sfw/bin:/ws/onnv-tools/SUNWspro/SS11/bin:/ws/onnv-tools/SUNWspro/SOS8/bin:/ws/onnv-tools/onbld/bin:/ws/onnv-tools/onbld/bin/i386:/usr/ccs/bin:/usr/java/bin
Eliminate effect of dot files:
; mkdir /tmp/foo
; HOME=/tmp/foo
stats and opens from ksh (or /usr/xpg4/bin/sh):
; truss -t stat,open ksh
stat64("/usr/bin/ksh", 0x08047608) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libc.so.1", 0x08046E08) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
stat64("/home/dep", 0x080476F0) = 0
stat64(".", 0x08047780) = 0
stat64("/home/dep", 0x08047720) = 0
stat64(".", 0x080477B0) = 0
stat64("/home/dep", 0x08047720) = 0
stat64(".", 0x080477B0) = 0
open64("", O_RDWR|O_APPEND|O_CREAT, 0600) Err#2 ENOENT
open64("/tmp/sh827332.1", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
$
stats and opens from sh:
; truss -t stat,open sh
stat64("/sbin/sh", 0x08047610) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libc.so.1", 0x08046E10) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
$
stats and opens from ksh93:
; truss -t stat,open ksh93
stat64("/usr/bin/ksh93", 0x08047604) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libc.so.1", 0x08046E04) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
open("/proc/self/auxv", O_RDONLY) = 3
stat("/usr/bin/amd64/ksh93", 0xFFFFFD7FFFDFF550) = 0
open("/var/ld/64/ld.config", O_RDONLY) Err#2 ENOENT
stat("/lib/64/libc.so.1", 0xFFFFFD7FFFDFE9F0) = 0
open("/lib/64/libc.so.1", O_RDONLY) = 3
stat("/lib/64/libshell.so.1", 0xFFFFFD7FFFDFEBC0) Err#2 ENOENT
stat("/usr/lib/64/libshell.so.1", 0xFFFFFD7FFFDFEBC0) = 0
open("/usr/lib/64/libshell.so.1", O_RDONLY) = 3
stat("/lib/64/libcmd.so.1", 0xFFFFFD7FFFDFE730) Err#2 ENOENT
stat("/usr/lib/64/libcmd.so.1", 0xFFFFFD7FFFDFE730) = 0
open("/usr/lib/64/libcmd.so.1", O_RDONLY) = 3
stat("/lib/64/libast.so.1", 0xFFFFFD7FFFDFE2A0) Err#2 ENOENT
stat("/usr/lib/64/libast.so.1", 0xFFFFFD7FFFDFE2A0) = 0
open("/usr/lib/64/libast.so.1", O_RDONLY) = 3
stat("/lib/64/libm.so.2", 0xFFFFFD7FFFDFE730) = 0
open("/lib/64/libm.so.2", O_RDONLY) = 3
stat("/dev/null", 0xFFFFFD7FFFDFF180) = 0
stat("/home/dep", 0xFFFFFD7FFFDFF110) = 0
stat(".", 0xFFFFFD7FFFDFF190) = 0
stat("/home/dep/private/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/home/dep/private/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/home/dep/bin/i386", 0xFFFFFD7FFFDFF4C0) = 0
open("/home/dep/bin/i386/.paths", O_RDONLY) Err#2 ENOENT
stat("/home/dep/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/home/dep/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/sbin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/sbin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/openwin/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/openwin/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/sfw/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/sfw/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/SUNWspro/SS11/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/SUNWspro/SS11/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/SUNWspro/SOS8/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/SUNWspro/SOS8/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/onbld/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/onbld/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/ws/onnv-tools/onbld/bin/i386", 0xFFFFFD7FFFDFF4C0) = 0
open("/ws/onnv-tools/onbld/bin/i386/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/ccs/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/ccs/bin/.paths", O_RDONLY) Err#2 ENOENT
stat("/usr/java/bin", 0xFFFFFD7FFFDFF4C0) = 0
open("/usr/java/bin/.paths", O_RDONLY) Err#2 ENOENT
open("/etc/ksh.kshrc", O_RDONLY) = 3
open("/tmp/foo/.kshrc", O_RDONLY) Err#2 ENOENT
open("", O_RDWR|O_APPEND|O_CREAT, 0600) Err#2 ENOENT
open("/tmp/astv6s.919", O_RDWR|O_APPEND|O_CREAT, 0600) = 3
Received signal #18, SIGCLD, in waitid() [caught]
siginfo: SIGCLD CLD_EXITED pid=827335 status=0x0000
<email address omitted>:/home/dep$
As you can see, even though nothing actually made use of my PATH,
ksh93 performed a stat and open for each PATH element. This
preliminary scan of the path is costly and unnecessary, and makes
ksh93 unusable in many situations.
Even bash doesn't do this (it searches PATH to find itself, but only
uses as much as it needs):
; truss -t open,stat bash
stat64("/usr/bin/bash", 0x08047608) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat64("/lib/libcurses.so.1", 0x08046E08) = 0
open("/lib/libcurses.so.1", O_RDONLY) = 3
stat64("/lib/libsocket.so.1", 0x08046E08) = 0
open("/lib/libsocket.so.1", O_RDONLY) = 3
stat64("/lib/libnsl.so.1", 0x08046E08) = 0
open("/lib/libnsl.so.1", O_RDONLY) = 3
stat64("/lib/libdl.so.1", 0x08046E08) = 0
open("/lib/libdl.so.1", O_RDONLY) = 3
stat64("/lib/libc.so.1", 0x08046E08) = 0
open("/lib/libc.so.1", O_RDONLY) = 3
open64("/dev/tty", O_RDWR|O_NONBLOCK) = 3
stat64("/dev/pts/0", 0x08047810) = 0
open64("/var/run/name_service_door", O_RDONLY) = 3
stat64("/home/dep", 0x08047690) = 0
stat64(".", 0x08047720) = 0
stat64(".", 0x080476C0) = 0
stat64("/home/dep/private/bin/bash", 0x080475C0) Err#2 ENOENT
stat64("/home/dep/bin/i386/bash", 0x080475C0) Err#2 ENOENT
stat64("/home/dep/bin/bash", 0x080475C0) Err#2 ENOENT
stat64("/usr/bin/bash", 0x080475C0) = 0
stat64("/usr/bin/bash", 0x080475E0) = 0
open64("/tmp/foo/.bashrc", O_RDONLY) Err#2 ENOENT
open64("", O_RDONLY) Err#2 ENOENT
open("/home/dep/.terminfo/x/xterm", O_RDONLY) Err#2 ENOENT
open("/usr/share/lib/terminfo//x/xterm", O_RDONLY) = 4
Received signal #20, SIGWINCH [caught]
stat64("/tmp/foo/.inputrc", 0x08046ED0) Err#2 ENOENT
stat64("/etc/inputrc", 0x08046ED0) Err#2 ENOENT
Received signal #20, SIGWINCH [caught]
bash-3.2$
Moreover, none of zsh, csh, or tcsh scans the PATH on startup. This
is a ksh93-only phenomenon.
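A rough way to preview what this startup scan would touch for a given PATH
is sketched here. It only models the pattern visible in the ksh93 trace (one
stat per PATH element, followed by an open of .paths inside it) and is not
ksh93's actual code; the function name startup_probes is made up for
illustration.

```sh
# startup_probes [PATHLIST]   (hypothetical helper, not part of any shell)
# Print the filesystem accesses suggested by the ksh93 trace, in order.
# The body runs in a subshell so the IFS and noglob changes do not leak.
startup_probes() (
    IFS=:
    set -f                          # no globbing while splitting the list
    for dir in ${1-$PATH}; do
        [ -n "$dir" ] || dir=.      # an empty element means the current directory
        printf 'stat %s\n' "$dir"
        printf 'open %s/.paths\n' "$dir"
    done
)
```

Running startup_probes "$PATH" lists every directory that would have to answer
before a prompt appears, which makes it easy to spot the NFS-backed entries.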
*** (#3 of 6): 2009-08-13 21:07:10 GMT+00:00 <User 1-5Q-4224>
Update from Roland:
1. The original ksh88i build from the AT&T sources behaves the same way as
ksh93 version 's' and scans the PATH at startup. That's why I _guessed_ that
someone has modified Solaris's ksh88 to behave differently. (As a side effect,
neither Solaris ksh88 nor the derived /usr/xpg4/bin/sh conforms to POSIX/SUS if
they no longer check for this; see [2].)
2. The POSIX/SUS standard _requires_ that shells scan all elements of PATH when
they try to find a command. This even happens for builtin commands when they
are bound to a specific PATH, since such bound builtins are only allowed to be
executed if there is a matching file in the filesystem.
3. The results of the PATH scan are allowed to be cached. That's why we're
going to switch to |openat()| on the directories in PATH at the time when PATH
is set/changed for one of the next ksh93 versions (but first we need to
complete ksh93-integration update2). If there is a way to detect stuck NFS
filesystems, we're going to add the matching code with that version.
*** (#4 of 6): 2009-09-17 10:54:08 GMT+00:00 <User 1-5Q-6276>
POSIX/SUS says (definition of PATH) that:
"The list shall be searched from beginning to end, applying the filename
to each prefix, until an executable file with the specified name and
appropriate execution permissions is found."
So, the shell doesn't need to scan all elements of PATH.
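A minimal sketch of the early-exit search described in the quoted text,
assuming a plain colon-separated list. The name find_in_path is hypothetical,
and this is an illustration of the rule, not the lookup code of any shell
discussed here. Note that a trailing empty PATH element is dropped by the
field splitting used, which a real implementation would have to handle.

```sh
# find_in_path NAME [PATHLIST]   (hypothetical helper)
# Walk PATHLIST from beginning to end and stop at the first directory that
# holds an executable regular file called NAME.
find_in_path() (
    name=$1
    IFS=:
    set -f                          # no globbing while splitting the list
    for dir in ${2-$PATH}; do
        [ -n "$dir" ] || dir=.      # an empty element means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0                  # first match wins; later elements are never probed
        fi
    done
    exit 1                          # not found anywhere in the list
)
```

With local directories listed first, a lookup such as find_in_path ls returns
before any later, possibly NFS-backed, element is touched.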
*** (#5 of 6): 2009-09-17 18:34:44 GMT+00:00 <User 1-5Q-4028>
Copying the evaluation to public comments here, so Roland can read it.
====
There are two scenarios in which the shell can hang when an NFS path is present
in the PATH variable.
*) When ksh93 is invoked, it does a stat on every directory present in the
PATH variable and tries to open a .paths file in each. If PATH contains an NFS
directory that is not reachable, the ksh93 shell hangs.
1 86632 open:entry /usr/openwin/bin/.paths
libc.so.1`__open_syscall+0xa
libc.so.1`open+0x137
libshell.so.1`path_chkpaths+0xcc
libshell.so.1`path_addcomp+0x3f2
libshell.so.1`path_addpath+0xc1
libshell.so.1`path_init+0x70
libshell.so.1`path_opentype+0x51
libshell.so.1`path_open+0xb
libshell.so.1`sh_source+0x30
libshell.so.1`sh_main+0x43f
ksh93`main+0x52
ksh93`0x400ccc
*) When a command is executed under ksh93, it does a stat on the file in all
the directories in PATH. This can also cause a hang if the NFS file server is
not reachable.
# dtrace -n 'syscall::*stat*:entry /execname=="ksh93"/{
trace(copyinstr(arg0));}'
dtrace: description 'syscall::*stat*:entry ' matched 15 probes
CPU ID FUNCTION:NAME
1 86658 stat:entry /opt/SUNWspro/bin/ls
1 86658 stat:entry /usr/X11R6/bin/ls
1 86658 stat:entry /usr/dt/bin/ls
1 86658 stat:entry /usr/local/bin/ls
1 86658 stat:entry /usr/bin/ls
1 86774 lstat:entry /usr/bin/ls
====
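To find out which PATH element is the one that hangs, each directory can be
probed with a time limit from a shell that still works. The sketch below is
only that: dir_responds and report_stuck_path are made-up names, and it
assumes the blocked directory test can be interrupted by a signal. On a mount
where it cannot, the wait on the probe blocks as well.

```sh
# dir_responds DIR SECONDS   (hypothetical helper)
# Succeed if a directory test on DIR completes, and is true, within SECONDS.
dir_responds() {
    [ -d "$1" ] >/dev/null 2>&1 &   # the test that may block on a dead server
    probe=$!
    # Watchdog: kill the probe if it is still running after SECONDS.
    ( sleep "$2"; kill -9 "$probe" ) >/dev/null 2>&1 &
    watchdog=$!
    status=0
    wait "$probe" || status=$?      # non-zero if killed or not a directory
    kill "$watchdog" >/dev/null 2>&1
    wait "$watchdog" 2>/dev/null || :
    return "$status"
}

# report_stuck_path [PATHLIST]   (hypothetical helper)
# Print every element that is missing or does not answer within two seconds.
report_stuck_path() (
    IFS=:
    set -f                          # no globbing while splitting the list
    for dir in ${1-$PATH}; do
        [ -n "$dir" ] || dir=.
        dir_responds "$dir" 2 || printf 'no answer or not a directory: %s\n' "$dir"
    done
)
```

Calling report_stuck_path with no argument checks the current PATH; the
elements it prints are the ones to drop or move to the end of PATH.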
*** (#6 of 6): 2009-10-07 10:50:51 GMT+00:00 <User 1-5Q-5197>
=== *Workaround* =============================================================
=== *Additional Details* =====================================================
Targeted Release: solaris_nevada
Commit To Fix In Build: snv_128
Fixed In Build: snv_128
Integrated In Build:
Verified In Build:
See Also: 6437624, 6793763
Duplicate of:
Hooks:
Hook1:
Hook2:
Hook3:
Hook4:
Hook5: <email address omitted>
Hook6: <email address omitted>
Program Management: Fix Integrated into Source
Root Cause: Other - see Research Activity
Fix Affects Documentation: No
Fix Affects Localization: No
=== *History* ================================================================
Date Submitted: 2007-11-16 18:04:11 GMT+00:00
Submitted By: <User 1-5Q-12482>
Status Changed     Date Updated                   Updated By
3-Accepted         2008-08-20 22:57:39 GMT+00:00  <User 1-5Q-5151>
2-Incomplete       2009-04-16 07:32:19 GMT+00:00  <User 1-1SURPB>
3-Accepted         2009-04-16 20:14:59 GMT+00:00  <User 1-5Q-4224>
5-Cause Known      2009-10-06 09:35:50 GMT+00:00  <User 1-GN0KC>
7-Fix in Progress  2009-10-23 17:23:50 GMT+00:00  <User 1-7MTUEB>
8-Fix Available    2009-10-28 18:23:28 GMT+00:00  <User 1-5HNZ8F>
=== *Service Request* ========================================================
Impact: Significant
Functionality: Secondary
Severity: 3
Product Name: solaris
Product Release: solaris_nevada
Product Build:
Operating System: snv_77
Hardware: ultrasparc
Submitted Date: 2007-11-16 18:04:11 GMT+00:00
=== *Service Request* ========================================================
Impact: Significant
Functionality: Primary
Severity: 2
Product Name: solaris
Product Release: solaris_nevada
Product Build: snv_110
Operating System: snv_110
Hardware: generic
Submitted Date: 2009-04-16 00:44:28 GMT+00:00
=== *Service Request* ========================================================
Impact: Critical
Functionality: Primary
Severity: 1
Product Name: solaris
Product Release: solaris_nevada
Product Build: snv_122
Operating System: snv_122
Hardware: generic
Submitted Date: 2009-09-15 17:34:12 GMT+00:00
=== *Multiple Release (MR) Cluster* - 0 ======================================