Martijn Dekker <mart...@inlv.org> wrote:

> Op 18-06-20 om 14:11 schreef Joerg Schilling:
> > Is my impression correct that you did not use the modifications from Redhat
> > people that introduced non-portability, many bugs and a slowdown?
>
> Yes.

This is really good news.

> I don't know much about malloc, but what you describe sounds rather 
> dodgy. It evidently works though, so I'm leaving well enough alone.

I was just making an example as these redhat people seem to use a static code 
analyzer that emits warnings for things that do not apply to the AT&T 
implementation contained in the ksh93 tree.

> It may turn out to be an advantage for this bugfix project that my 
> primary expertise is in the shell language itself, whereas my C skills 
> are at an intermediate level. I am simply not well-versed enough in C to 
> even think about touching things like malloc. On the other hand, if some 
> change breaks the shell language, I am likely to notice it quickly.

So your expertise seems to be better than the one from the other people
that tried to work on ksh93. An important skill is to know when your skills
are not sufficient. This is why I told these other people to wait for at least 
6 years before making important changes. This is what I did when I started to 
work on the Bourne Shell before I was sure I could enhance the parser and 
interpreter without destroying things.

David Korn is a mathematician and he thinks different. He designed supersonic 
wings before he joined AT&T. This makes it even harder to understand his 
highly optimized code.

> There's also a policy of documenting in every commit message not only 
> what it fixes, but also how it fixes it. This helps improve everyone's 
> understanding of this very poorly commented codebase, while preserving 
> this documentation for future generations without depending on GitHub in 
> perpetuity. The idea is that 'git blame' will lead you straight to the 
> rationale for every change.

I will have a look at it.

I offered the redhat people to make code review for them, but they ignored my 
comments :-(

> > ++  It still uses the mature and highly portable ksh93
> >     build environment.
>
> I have no intention of changing the build system, but I would question 
> your praise for it. In my experience, it is currently very brittle. It 
> failed to build on modern Macs before I fixed it for that. It currently 
> fails to build on at least Solaris (see below), NetBSD, as well as AIX 
> and QNX (those latter two as hosted by polarhome.com).

I have no idea why it fails, but you may be interested to know that it is based 
on the same basic idea as the Schily Makefile system. It started a year after 
the Schily Makefilesystem and both implementations are based on the idea 
that a leaf Makefile should basically only list the source files and some 
project specific -Ddefines and the build system should know how to build a 
library or a command from it on the various target platforms.

The way this concept is implemented is different.

The Schily Makefile system implements the needed extertize in the "make 
language", while the ast system enhanced make to "nmake" and the knowledge 
there is compiled into nmake.

The Schily system only compiles a bootstrap smake binary because the other 
portable make program "gmake" does only work correctly on a very limited 
number of target platforms.

> And if I restore nmake (which got removed along with the other 
> superfluous non-ksh AST utilities), it fails to build on even more 

This is interesting. I thought the bootstrap code is only to compile nmake
and after that you need nmake....

> In any case, I don't even know where to begin thinking about changing 
> build systems, and there is community support for keeping the current 
> one. So it is just going to have to be fixed instead... somehow.
>
> Anyone who cares about and understands this build sytem, PLEASE send 
> patches or pull requests that make it build on systems it currently 
> fails on.
>
> > ++  It still compiles out of the box on Solaris
>
> That's amazing to me, because for me it consistently fails to compile on 
> the pristine Solaris 11.3 and 11.4 evaluation VMs as downloaded directly 
> from Oracle. The problem is that the 'dll' iffe feature test fails to 
> write the Dllscan_t structure and a few others to a generated header 
> file. The build then fails because Dllscan_t is an unknown identifier.

I compiled on OpenSolaris, so it may be that the changes from Oracle after 
August 2010 did cause this to fail.

> > -   It does not compile/use libshell.
>
> Well, it could have fooled me:
>
> $ find arch -name libshell\*
> arch/darwin.i386-64/lib/libshell.a.old
> arch/darwin.i386-64/lib/libshell.a
> arch/darwin.i386-64/src/cmd/ksh93/libshell.a

OK, thanks for the hint.

The file xec.c is at usr/src/lib/libshell/common/sh on OpenSolaris.
This is why I thought there is no libshell.

> > -   Using the additional compile options "-O -fast -xspace", it
> >     is still 2.5% slower than the OpenSolaris ksh93.
>
> It would be great if you could help isolate the commit that caused the 
> slowdown.
>
> The most obvious candidates are commits that introduce calls to 
> sh_subfork() as workarounds for various fatal defects in the 
> non-forking/virtual subshell mechanism.

I would guess that the lazy loading and the use of shared libs is also
important. This may reduce the total size of the ksh93 process while 
executing shell scripts and as a result increase the performance of a 
fork() or vfork() call.

> Correctness trumps speed, so removing those workarounds would require 
> implementing a corresponding fix for the non-forking subshell mechanism 
> instead -- without introducing regressions.

The question is whether this is important here.
I replaced a lot of () in my fork of autoconf by {} in order to avoid
the need for forking. So this should not affect my test case too much.
I did this after I could compile on IRIX, where the shell is as if time did 
stop in 1982. So I could check where () is needed in order to work around 
bugs n the early Bourne Shell.

> If you can figure out which one of those workarounds slows it down that 
> much for you, then that would tell me something about which one to 
> prioritise.
>
> Although -- another potential candidate is commmit 8b07d2a0 which 
> removed a bad use of memccpy() on potentially overlapping buffers in 
> sfputr() in libast. Imagine their comment actually blamed memccpy() for 
> being broken on ia64 while memccpy(3) clearly states that overlapping 
> buffers cause undefined behaviour! Fixing that (by removing the bad use 
> of memccpy, so the plain copying loop is always used) made various 
> crashing bugs on the Mac disappear into a poof of correctness. On my 
> system I can't detect any slower performance as a result, but it might 
> be different on Solaris.

I need to check when I have time for that.

> > Also note that OpenSlaris implements kernel support for compiled ksh93
> > scripts. Is this still supported?
>
> I haven't the foggiest.

ksh93 is able to read a shell script and to store the binary output from the 
parser in a file with a specific magic number. Solaris knows about this and
calls ksh93 the right way from the kernel to execute such a file.

> We're not touching the structure at all, we're just fixing bugs.
> Many, many, many bugs.

This is good news.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'

Reply via email to