Hi again, a few questions back, and a few early comments on the general idea (I’ve not yet had the time to look at the patch itself):
Assume we have mksh running on your EBCDIC environment. Let me ask a
few questions about this sort of environment, coupled with my guesses
about it.

- the scripts themselves are 'iconv'd to EBCDIC?
- stuff like print/printf \x4F is expected to output '|' not 'O'?
- what about \u20AC? UTF-8? UTF-EBCDIC?
- keyboard input is in EBCDIC?
- is there anything that allows Unicode input?

Daniel Richard G. dixit:

>conditionalized in the code. Primarily, EBCDIC has the normal
>[0-9A-Za-z] characters beyond 0x80, so it is not possible to set the
>high bit for signalling purposes---which mksh seems to do a lot of.

Indeed. You probably refer to the variable substitution stuff (where
'#'|0x80 is used for ${foo##bar}) and possibly the MAGIC stuff in
extglobs @(foo|bar). That’s all legacy; I think it can go
unconditionally.

>* Added clauses for TARGET_OS == "OS/390"

Is OS/390 always an EBCDIC environment?

>* '\012\015' != '\n\r' on this platform, so use the latter

Agreed. I think I can change most of the C code to use the char
versions, i.e. '\n' and '@' instead of 0x0A or 0x40. I will have to
see about Build.sh, though.

Looking at the patch (editing this eMail later), it seems you have
conditionalised those things nicely. That is good! *BUT* some things
in Build.sh – at least the $lfcr thing – are dependent on the *host*
OS, not the *target* OS. So, as far as I see, we will require two
checks:

• host OS (the machine running Build.sh): ASCII-based or EBCDIC?
• target OS (the machine running the mksh/lksh binary created):
  z/OS ASCII, z/OS EBCDIC, or anything else?

Remember mksh is cross-buildable, so we’ll need to come up with
(compile-time) checks for all of those.

>* When compiling with -g, xlc produces a .dbg file alongside each
>  object file, so clean those up

Good.

>* NSIG is, amazingly, not #defined on this platform. Sure would be
>  nice if the fancy logic that calculates NSIG could conditionally
>  #define it, rather than a TARGET_OS conditional...
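To illustrate both points above – telling the host charset apart and why the bit-7 flag trick breaks on EBCDIC – here is a minimal sketch. These are hypothetical macro names, not mksh code, and the EBCDIC codepoints cited assume code page 1047:

```c
/*
 * Hypothetical sketch, not mksh code. In EBCDIC-1047, 'A' is 0xC1
 * and '|' is 0x4F; in ASCII, 'A' is 0x41 and '|' is 0x7C. Checking
 * a character-literal value thus distinguishes the charsets, even
 * at preprocessing time (#if 'A' == 0xC1).
 */
#define HOST_IS_EBCDIC	('A' == 0xC1)

/*
 * The legacy trick of ORing 0x80 into a character to mark it as an
 * operator (e.g. '#'|0x80 for ${foo##bar}) relies on no ordinary
 * character having bit 7 set. That holds for ASCII but not for
 * EBCDIC, where 'A' == 0xC1 would look "flagged" already.
 */
#define FLAG_BIT	0x80
#define FLAGGED(c)	((unsigned char)(c) | FLAG_BIT)
#define IS_FLAGGED(c)	(((unsigned char)(c) & FLAG_BIT) != 0)
```

On an ASCII host, FLAGGED('#') is 0xA3 and IS_FLAGGED('A') is false; on an EBCDIC host, IS_FLAGGED('A') would be a false positive, which is why the flag-bit scheme has to go before the code can be charset-clean.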
:-) No, a TARGET_OS conditional is probably good here, as you cannot
really guess NSIG – as you noticed, you seem to have 37 signals, the
highest number of which is 39. The best way to determine NSIG is to
look at the libc sources, followed by looking at libc binaries (e.g.
determine the size of sys_siglist).

>* Check whether "nroff -c" is supported---the system I'm using has
>  GNU nroff 1.17, which doesn't have -c

Ah, I see. Then yes, this does make sense in the GNU case. (MirBSD
has AT&T nroff.)

>* On this platform, xlc -qflag=... takes only one suboption, not two

Hm. Is this a platform thing, a compiler version thing, etc.?

>* Some special flags are needed for xlc on z/OS that are not needed
>  on AIX, like to make missing #include files an error instead of a
>  warning (!). Conversely, most of those AIX xlc flags are not
>  recognized

Can we get this sorted out so it continues working on AIX? I no
longer have access to an AIX machine, unfortunately. (Later:
conditionalised, looks good.)

>* Added a note that EBCDIC has \047 as the escape character
>  rather than \033

Do EBCDIC systems still use ANSI escapes (like ESC [ 0 m)?

>+++ check.pl
>
>* I was getting a parse error with an expected-exit value of
>  "e != 0", and adding \d to this regex fixed things... this wasn't
>  breaking for other folks?

No. I don’t pretend to know enough Perl to even understand that. But
I think I see the problem, looking at the two regexps: I think that
“+-=” is expanded as a range, which includes the digits in ASCII.
I’ll have to go through the CVS and .tgz history and see whether this
was intentional or an accidental fuckup.

On that note, does anyone have / know of a complete set of pdsh and
pdksh history?

>+++ check.t
>
>* The "cd-pe" test fails on this system (perhaps it should be
>  disabled?) and the directories were not getting cleaned up properly

That fails on many systems. Sure, we can disable it. What is $^O in
Perl on your platform?
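The range suspicion above can be checked numerically. A small sketch (illustrative only; the function name is made up) of why the class “+-=” accidentally matches digits on ASCII but would not on EBCDIC:

```c
/*
 * Illustrative sketch of the check.pl suspicion: inside a regex
 * character class, "+-=" parses as the range '+' .. '='. In ASCII
 * that is 0x2B .. 0x3D, which happens to contain all the digits
 * (0x30 .. 0x39) – so the class matched digits by accident. On
 * EBCDIC-1047, '+' is 0x4E and '=' is 0x7E while the digits sit at
 * 0xF0 .. 0xF9, outside that range, which would explain why an
 * explicit \d was needed there.
 */
static int
in_plus_to_equals_range(unsigned char c)
{
	return (c >= '+' && c <= '=');
}
```

On an ASCII host, in_plus_to_equals_range('0') through '9' are all true, while letters fall outside the range.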
>* If compiling in ASCII mode, #define _ENHANCED_ASCII_EXT so that as
>  many C/system calls are switched to ASCII as possible (this is
>  something I was experimenting with, but it's not how most people
>  would be building/using mksh on this system)

So it’s possible to use ASCII on the system, but atypical?

>* Define symbols for some common character/string escape literals so
>  we can swap them out easily

OK.

>* Because EBCDIC characters like 'A' will have a negative value if
>  signed chars are being used, #define the ORD() macro so we can
>  always get an integer value in [0, 255]

Huh, Pascal anyone? :)

>+++ edit.c (back to patch order)

Here’s where we start going into Unicode land. This file is the one
that assumes UTF-8 the most.

>* I don't understand exactly what is_mfs() is used for, but I'm
>  pretty sure we can't do the & 0x80 with EBCDIC (note that e.g.
>  'A' == 0xC1)

Motion separator. It’s simply assumed that, when you e.g. jump
forwards word-wise, anything with bit 7 set is Unicode and to be
jumped over, as we don’t have iswprint() et al. (but we may get that
eventually; yes igli, you can rejoice, but your complaining about the
missing [[:alpha:]] is not the reason).

>* Don't know much about XFUNC_VALUE(), but that & 0x7F looks
>  un-kosher for EBCDIC

No, that’s actually fine: that’s an enum (with < 128 values), and the
high bit is used here to swallow a trailing tilde, like in ANSI Del
(^[[3~).

>I will be happy to provide further testing and answer any questions
>as needed.

OK. This is just a start. I’ll add the… hopefully not discouraging…
comments now. As I’ve said, I really like the enthusiasm, and
absolutely want you to continue with this. There is just one very big
thing: one of mksh’s biggest strengths is that it’s consistent across
*all* platforms.

An analogy, to help understand: I don’t know how much you know about
Microsoft Windows, but native code there usually uses CR+LF (\r\n) as
line separators.
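The ORD() idea described in the quoted item can be sketched like this (an assumption about the patch's intent, not its actual definition):

```c
/*
 * A minimal sketch of the ORD() idea (hypothetical; the macro in
 * the actual patch may differ): with signed chars, 'A' == 0xC1 on
 * EBCDIC is a negative value, so casting through unsigned char
 * maps any character to an integer in [0, 255] before it is used
 * as a table index or compared against byte values.
 */
#define ORD(c)	((int)(unsigned char)(c))
```

This is indeed reminiscent of Pascal's ord(): the point is that ORD(c) is always non-negative, so it is safe as an array subscript even where plain char is signed.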
There are Unix-like environments for it (Cygwin, and the much better
Interix/SFU/SUA, and the less-well-working UWIN and PW32), and you
can compile mksh for those; mksh will then behave as on Unix, i.e.
require LF-only (\n) line endings. Someone has started to port mksh
to the native WinAPI; that port is not 100% compatible with mksh,
just “similar”, and uses it as base code. That implementation then
can use CR+LF.

By definition, mksh does all its I/O in binary mode (not “text” mode,
so no CR+LF or (old Macintosh) CR-only line endings), and in the
UTF-8 encoding of 16-bit Unicode as charset.

I’ve got a suggestion for you here, though. Most of it depends on
some answers to the questions I had above; this is just an initial
rough draft, to be discussed.

I’ll merge most of the EBCDIC- and z/OS-related changes. A future
mksh release will compile for z/OS in ASCII mode out of the box, and
pass all of its tests there, if at all possible. Even if this is not
how a typical z/OS user would use mksh, this should be easy.

You’ll be the maintainer of something we call mksh/zOS, or something
like that (or mksh/EBCDIC), which has a separate $KSH_VERSION string.
I was thinking either “@(#)EBCDIC MKSH R…” or “@(#) Z/OS MKSH R…”,
with LKSH instead of MKSH for builds with -L, or just one string, and
you decide on whether you want “POSIX arithmetics” mode always
enabled or not (the main compile-time difference of lksh) – but, why
remove the flexibility. I’d also ask Michael Langguth to make
mksh/Win32 fit this scheme (i.e. use something like “@(#) WIN32
MKSH”, depending on what we agree on; currently, mksh/Win32 is based
on mksh R39, so it didn’t have lksh yet). Details of this can be
hashed out later.
We can have this in a range of varieties:

• you’ll ship mksh-ebcdic-*.tgz files from a separate repository
• I’ll ship them, from a separate repository
• we develop this in the same repo, in a separate branch
• or it could be a bunch of #ifdefs

To be honest, I’d prefer looking at the amount of ifdefs before
agreeing to the latter, though. mksh/Win32 is also separately
developed; while the code is close to “main” mksh, there *is* a
patch, part of which I’d prefer not to ship in mksh-R*.tgz itself.
(But keeping the delta small is a good aim.) This also allows for
different development tempo and release schedules.

As I said, I’ll gladly add “not-hurting” portability to EBCDIC to the
main code, e.g. remove the use of |0x80 as flag magic. (I’ll come up
with something, probably after the R51 release though. I’ve got
ideas.) But mksh uses UTF-8, and my plans for it will only make this
worse, e.g. I’m planning to make some code use 16-bit Unicode
internally (though part of *this* may make EBCDIC easier again).

I cannot commit to keep supporting EBCDIC systems, due to lack of
resources: my own time, skills (I have no experience with
nōn-ASCII-based systems) and lack of such machines. Do you think you
can help me out there and invest a little time (a few hours per
month, I guess) and maintain a port of mksh to EBCDIC-based systems
(or even just z/OS) for a long-ish time? Do you think you can, or
want to, develop this separately, merging changes back and forth? (I
can, of course, do most of the changes-merging work, but you will
have to be there to deal with EBCDIC-specific facets.) This is all
volunteer work, so I’ll understand if you cannot or don’t want to
commit to something long-lasting like this either.
But from the two messages you already sent, I presume you have got
some kind of interest ☺

Legalities: I just request that anything I merge is licenced under
The MirOS Licence¹; I don’t require anything like copyright
assignment or the like, and I don’t even impose any licencing terms
on the derivates (like mksh/Win32), but I prefer they use a BSD-style
licencing scheme for the whole. (Michael said he’s planning to
publish the whole Win32-portability library under BSD-ish terms as
well.)

① I have once, on an OSI mailing list, stated requirements for a
  successor to The MirOS Licence. I don’t believe it will come to a
  successor being written, but should there be one, I’d be happy to
  be able to switch the licence to it. Those requirements mostly are:
  lawyer-written, and also applying to neighbouring rights such as
  database law (in some EU countries). I wish the licence to be
  tailored to EU (mostly .de, as I live there) law, to protect all
  involved (authors, contributors, licensors, licensees), but to be
  usable internationally as far as that’s possible. I don’t really
  wish to touch the topic of patent licences, but let it be
  understood that an implicit patent grant is included.

Urgh. I’m rambling again. Sorry about that.

bye,
//mirabilos
-- 
18:47⎜<mirabilos:#!/bin/mksh> well channels… you see, I see
    everything in the same window anyway
18:48⎜<xpt:#!/bin/mksh> i know, you have some kind of telnet with
    automatic pong
18:48⎜<mirabilos:#!/bin/mksh> haha, yes :D
18:49⎜<mirabilos:#!/bin/mksh> though that's more tinyirc – sirc is
    more comfy