Hi again, a few questions back, and a few early comments on the general idea (I’ve not yet had the time to look at the patch itself):
Assume we have mksh running on your EBCDIC environment. Let me ask a
few questions about this sort of environment, coupled with my guesses
about it.

- the scripts themselves are 'iconv'd to EBCDIC?
- stuff like print/printf \x4F is expected to output '|' not 'O'?
- what about \u20AC? UTF-8? UTF-EBCDIC?
- keyboard input is in EBCDIC?
- is there anything that allows Unicode input?

Daniel Richard G. dixit:

>conditionalized in the code. Primarily, EBCDIC has the normal
>[0-9A-Za-z] characters beyond 0x80, so it is not possible to set the
>high bit for signalling purposes---which mksh seems to do a lot of.

Indeed. You probably refer to the variable substitution stuff (where
'#'|0x80 is used for ${foo##bar}) and possibly the MAGIC stuff in
extglobs @(foo|bar). That’s all legacy; I think it can go
unconditionally.

>* Added clauses for TARGET_OS == "OS/390"

Is OS/390 always an EBCDIC environment?

>* '\012\015' != '\n\r' on this platform, so use the latter

Agreed. I think I can change most of the C code to use the char
versions, i.e. '\n' and '@' instead of 0x0A or 0x40. I will have to
see about Build.sh, though.

Looking at the patch (editing this eMail later), it seems you have
conditionalised those things nicely. That is good! *BUT* some things
in Build.sh – at least the $lfcr thing – are dependent on the *host*
OS, not the *target* OS. So, as far as I see, we will require two
checks:

• host OS (the machine running Build.sh): ASCII-based or EBCDIC?
• target OS (the machine running the mksh/lksh binary created):
  z/OS ASCII, z/OS EBCDIC, or anything else?

Remember mksh is cross-buildable, so we’ll need to come up with
(compile-time) checks for all of those.

>* When compiling with -g, xlc produces a .dbg file alongside each
>  object file, so clean those up

Good.

>* NSIG is, amazingly, not #defined on this platform. Sure would be
>  nice if the fancy logic that calculates NSIG could conditionally
>  #define it, rather than a TARGET_OS conditional...
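To illustrate both points above – telling the host charset apart and why the bit-7 flag trick breaks on EBCDIC – here is a minimal sketch. These are hypothetical macro names, not mksh code, and the EBCDIC codepoints cited assume code page 1047:

```c
/*
 * Hypothetical sketch, not mksh code. In EBCDIC-1047, 'A' is 0xC1
 * and '|' is 0x4F; in ASCII, 'A' is 0x41 and '|' is 0x7C. Checking
 * a character-literal value thus distinguishes the charsets, even
 * at preprocessing time (#if 'A' == 0xC1).
 */
#define HOST_IS_EBCDIC	('A' == 0xC1)

/*
 * The legacy trick of ORing 0x80 into a character to mark it as an
 * operator (e.g. '#'|0x80 for ${foo##bar}) relies on no ordinary
 * character having bit 7 set. That holds for ASCII but not for
 * EBCDIC, where 'A' == 0xC1 would look "flagged" already.
 */
#define FLAG_BIT	0x80
#define FLAGGED(c)	((unsigned char)(c) | FLAG_BIT)
#define IS_FLAGGED(c)	(((unsigned char)(c) & FLAG_BIT) != 0)
```

On an ASCII host, FLAGGED('#') is 0xA3 and IS_FLAGGED('A') is false; on an EBCDIC host, IS_FLAGGED('A') would be a false positive, which is why the flag-bit scheme has to go before the code can be charset-clean.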
:-) No, a TARGET_OS conditional is probably good here, as you cannot
really guess NSIG – as you noticed, you seem to have 37 signals, the
highest number of which is 39. The best way to determine NSIG is to
look at the libc sources, followed by looking at libc binaries (e.g.
determine the size of sys_siglist).

>* Check whether "nroff -c" is supported---the system I'm using has
>  GNU nroff 1.17, which doesn't have -c

Ah, I see. Then yes, this does make sense in the GNU case. (MirBSD
has AT&T nroff.)

>* On this platform, xlc -qflag=... takes only one suboption, not two

Hm. Is this a platform thing, a compiler version thing, etc.?

>* Some special flags are needed for xlc on z/OS that are not needed
>  on AIX, like to make missing #include files an error instead of a
>  warning (!). Conversely, most of those AIX xlc flags are not
>  recognized

Can we get this sorted out so it continues working on AIX? I no
longer have access to an AIX machine, unfortunately. (Later:
conditionalised, looks good.)

>* Added a note that EBCDIC has \047 as the escape character
>  rather than \033

Do EBCDIC systems still use ANSI escapes (like ESC [ 0 m)?

>+++ check.pl
>
>* I was getting a parse error with an expected-exit value of
>  "e != 0", and adding \d to this regex fixed things... this wasn't
>  breaking for other folks?

No. I don’t pretend to know enough Perl to even understand that. But
I think I see the problem, looking at the two regexps: I think that
“+-=” is expanded as a range, which includes the digits in ASCII.
I’ll have to go through the CVS and .tgz history and see whether this
was intentional or an accidental fuckup.

On that note, does anyone have / know of a complete set of pdsh and
pdksh history?

>+++ check.t
>
>* The "cd-pe" test fails on this system (perhaps it should be
>  disabled?) and the directories were not getting cleaned up properly

That fails on many systems. Sure, we can disable it. What is $^O in
Perl on your platform?
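The range suspicion above can be checked numerically. A small sketch (illustrative only; the function name is made up) of why the class “+-=” accidentally matches digits on ASCII but would not on EBCDIC:

```c
/*
 * Illustrative sketch of the check.pl suspicion: inside a regex
 * character class, "+-=" parses as the range '+' .. '='. In ASCII
 * that is 0x2B .. 0x3D, which happens to contain all the digits
 * (0x30 .. 0x39) – so the class matched digits by accident. On
 * EBCDIC-1047, '+' is 0x4E and '=' is 0x7E while the digits sit at
 * 0xF0 .. 0xF9, outside that range, which would explain why an
 * explicit \d was needed there.
 */
static int
in_plus_to_equals_range(unsigned char c)
{
	return (c >= '+' && c <= '=');
}
```

On an ASCII host, in_plus_to_equals_range('0') through '9' are all true, while letters fall outside the range.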
>* If compiling in ASCII mode, #define _ENHANCED_ASCII_EXT so that as
>  many C/system calls are switched to ASCII as possible (this is
>  something I was experimenting with, but it's not how most people
>  would be building/using mksh on this system)

So it’s possible to use ASCII on the system, but atypical?

>* Define symbols for some common character/string escape literals so
>  we can swap them out easily

OK.

>* Because EBCDIC characters like 'A' will have a negative value if
>  signed chars are being used, #define the ORD() macro so we can
>  always get an integer value in [0, 255]

Huh, Pascal anyone? :)

>+++ edit.c (back to patch order)

Here’s where we start going into Unicode land. This file is the one
that assumes UTF-8 the most.

>* I don't understand exactly what is_mfs() is used for, but I'm
>  pretty sure we can't do the & 0x80 with EBCDIC (note that e.g.
>  'A' == 0xC1)

Motion separator. It’s simply assumed that, when you e.g. jump
forwards word-wise, anything with bit 7 set is Unicode and to be
jumped over, as we don’t have iswprint() et al. (but we may get that
eventually; yes igli, you can rejoice, but your complaining about the
missing [[:alpha:]] is not the reason).

>* Don't know much about XFUNC_VALUE(), but that & 0x7F looks
>  un-kosher for EBCDIC

No, that’s actually fine: that’s an enum (with < 128 values), and the
high bit is used here to swallow a trailing tilde, like in ANSI Del
(^[[3~).

>I will be happy to provide further testing and answer any questions
>as needed.

OK. This is just a start. I’ll add the… hopefully not discouraging…
comments now. As I’ve said, I really like the enthusiasm, and
absolutely want you to continue with this. There is just one very big
thing: one of mksh’s biggest strengths is that it’s consistent across
*all* platforms.

An analogy, to help understand: I don’t know how much you know about
Microsoft Windows, but native code there usually uses CR+LF (\r\n) as
line separators.
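The ORD() idea described in the quoted item can be sketched like this (an assumption about the patch's intent, not its actual definition):

```c
/*
 * A minimal sketch of the ORD() idea (hypothetical; the macro in
 * the actual patch may differ): with signed chars, 'A' == 0xC1 on
 * EBCDIC is a negative value, so casting through unsigned char
 * maps any character to an integer in [0, 255] before it is used
 * as a table index or compared against byte values.
 */
#define ORD(c)	((int)(unsigned char)(c))
```

This is indeed reminiscent of Pascal's ord(): the point is that ORD(c) is always non-negative, so it is safe as an array subscript even where plain char is signed.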
There are Unix-like environments for it (Cygwin, and the much better
Interix/SFU/SUA, and the less-well-working UWIN and PW32), and you
can compile mksh for those; mksh will then behave as on Unix, i.e.
require LF-only (\n) line endings. Someone has started to port mksh
to the native WinAPI; that port is not 100% compatible with mksh,
just “similar”, and uses it as base code. That implementation then
can use CR+LF.

By definition, mksh does all its I/O in binary mode (not “text” mode,
so no CR+LF or (old Macintosh) CR-only line endings), and in the
UTF-8 encoding of 16-bit Unicode as charset.

I’ve got a suggestion for you here, though. Most of it depends on
some answers to the questions I had above; this is just an initial
rough draft, to be discussed.

I’ll merge most of the EBCDIC- and z/OS-related changes. A future
mksh release will compile for z/OS in ASCII mode out of the box, and
pass all of its tests there, if at all possible. Even if this is not
how a typical z/OS user would use mksh, this should be easy.

You’ll be the maintainer of something we call mksh/zOS, or something
like that (or mksh/EBCDIC), which has a separate $KSH_VERSION string.
I was thinking either “@(#)EBCDIC MKSH R…” or “@(#) Z/OS MKSH R…”,
with LKSH instead of MKSH for builds with -L, or just one string, and
you decide on whether you want “POSIX arithmetics” mode always
enabled or not (the main compile-time difference of lksh) – but, why
remove the flexibility. I’d also ask Michael Langguth to make
mksh/Win32 fit this scheme (i.e. use something like “@(#) WIN32
MKSH”, depending on what we agree on; currently, mksh/Win32 is based
on mksh R39, so it didn’t have lksh yet). Details of this can be
hashed out later.
We can have this in a range of varieties:

• you’ll ship mksh-ebcdic-*.tgz files from a separate repository
• I’ll ship them, from a separate repository
• we develop this in the same repo, in a separate branch
• or it could be a bunch of #ifdefs

To be honest, I’d prefer looking at the amount of ifdefs before
agreeing to the latter, though. mksh/Win32 is also separately
developed; while the code is close to “main” mksh, there *is* a
patch, part of which I’d prefer not to ship in mksh-R*.tgz itself.
(But keeping the delta small is a good aim.) This also allows for
different development tempo and release schedules.

As I said, I’ll gladly add “not-hurting” portability to EBCDIC to the
main code, e.g. remove the use of |0x80 as flag magic. (I’ll come up
with something, probably after the R51 release though. I’ve got
ideas.) But mksh uses UTF-8, and my plans for it will only make this
worse, e.g. I’m planning to make some code use 16-bit Unicode
internally (though part of *this* may make EBCDIC easier again).

I cannot commit to keep supporting EBCDIC systems, due to lack of
resources: my own time, skills (I have no experience with
nōn-ASCII-based systems) and lack of such machines. Do you think you
can help me out there and invest a little time (a few hours per
month, I guess) and maintain a port of mksh to EBCDIC-based systems
(or even just z/OS) for a long-ish time? Do you think you can, or
want to, develop this separately, merging changes back and forth? (I
can, of course, do most of the changes-merging work, but you will
have to be there to deal with EBCDIC-specific facets.) This is all
volunteer work, so I’ll understand if you cannot or don’t want to
commit to something long-lasting like this either.
But from the two messages you already sent, I presume you have got
some kind of interest ☺

Legalities: I just request that anything I merge is licenced under
The MirOS Licence¹; I don’t require anything like copyright
assignment or the like, and I don’t even impose any licencing terms
on the derivates (like mksh/Win32), but I prefer they use a BSD-style
licencing scheme for the whole. (Michael said he’s planning to
publish the whole Win32-portability library under BSD-ish terms as
well.)

① I have once, on an OSI mailing list, stated requirements for a
  successor to The MirOS Licence. I don’t believe it will come to a
  successor being written, but should there be one, I’d be happy to
  be able to switch the licence to it. Those requirements mostly are:
  lawyer-written, and also applying to neighbouring rights such as
  database law (in some EU countries). I wish the licence to be
  tailored to EU (mostly .de, as I live there) law, to protect all
  involved (authors, contributors, licensors, licensees), but to be
  usable internationally as far as that’s possible. I don’t really
  wish to touch the topic of patent licences, but let it be
  understood that an implicit patent grant is included.

Urgh. I’m rambling again. Sorry about that.

bye,
//mirabilos
-- 
18:47⎜<mirabilos:#!/bin/mksh> well channels… you see, I see
    everything in the same window anyway
18:48⎜<xpt:#!/bin/mksh> i know, you have some kind of telnet with
    automatic pong
18:48⎜<mirabilos:#!/bin/mksh> haha, yes :D
18:49⎜<mirabilos:#!/bin/mksh> though that's more tinyirc – sirc is
    more comfy