On Sat, 2017 Apr 22 23:26+0000, Thorsten Glaser wrote:
>
> >Oh, so you mean like if(c=='[') and such? That is certainly
> >reasonable. The program would be tied to the compile-time codepage no
> >worse than most other programs.
>
> Right. So either something like -DMKSH_EBCDIC_CP=1047 or limiting
> EBCDIC support to precisely one codepage.

I don't think the former sort of directive should be necessary. There is
enough auto-conversion magic going on that it should be possible to
piggyback on it, so that everything "just works" when you compile the code.
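
For instance, the compiler's own encoding of character literals can
answer the "which character set?" question without any -D flag. Below is
a minimal sketch. It assumes only that EBCDIC codepages such as IBM-1047
and IBM-037 put 'A' at 0xC1, while ASCII puts it at 0x41. All of the
names are hypothetical:

```c
#include <stdbool.h>

/* Probe the execution character set via the compiler's own literals.
   The (unsigned char) cast keeps the test meaningful even on a host
   where 'A' would come out negative (signed EBCDIC chars). */
enum {
    HOST_IS_EBCDIC = ((unsigned char)'A' == 0xC1),
    HOST_IS_ASCII  = ((unsigned char)'A' == 0x41)
};

/* Letters are contiguous only in ASCII-like sets. In EBCDIC there are
   gaps after 'I' and 'R' (e.g. 0xCA..0xD0 lies between 'I' and 'J'),
   so 'Z' - 'A' is 40 there instead of 25. */
static bool letters_contiguous(void)
{
    return ('Z' - 'A') == 25 && ('z' - 'a') == 25;
}
```

A build can branch on HOST_IS_EBCDIC in ordinary C code. This is more
reliable than using `#if`, because the value of character constants in
preprocessor expressions is implementation-defined.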

> >(If you could do everything in terms of character literals, without
> >depending on constructs like if(c>='A'&&c<='Z'), your code would be
> >pretty much EBCDIC-proof.)
> 
> Yesss… but…
> 
> ① not all characters are in every codepage, and

True, but the ASCII characters should be a given. (Some older EBCDIC
codepages lack certain common characters; I forget which ones, but no
one will want to use those anyway.)

> ② I need strictly monotonous ordering for all 256 possible octets
>   for e.g. sorting strings in some cases and for [a-z] ranges

That sounds no worse than what is usually done for LC_COLLATE and
such...
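
One way to get a strict order over all 256 octets is to compare through
a rank table, so that sorting and [a-z] ranges mean the same thing on
every host. The following is a sketch under stated assumptions. The
table `rank[]` and the helper names are hypothetical. Identity is
correct on ASCII/Latin-1 hosts. An EBCDIC build would fill the table
with an EBCDIC-to-Latin-1 mapping instead.

```c
#include <stdbool.h>

/* Hypothetical collation table: rank[b] is octet b's position in the
   desired total order. Any permutation of 0..255 with rank[0] == 0
   gives a strict order in which prefixes sort first. */
static unsigned char rank[256];

static void rank_init_identity(void)
{
    for (int i = 0; i < 256; i++)
        rank[i] = (unsigned char)i;
}

/* strcmp()-like comparison, ordered by rank[] instead of raw octets. */
static int rank_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (int)rank[*p] - (int)rank[*q];
}

/* Bracket range such as [a-z], decided in rank space: with an EBCDIC
   host and a Latin-1 rank table, the gap bytes between 'i' and 'j'
   fall outside the range as they should. */
static bool in_range(int c, int lo, int hi)
{
    unsigned char r = rank[(unsigned char)c];

    return rank[(unsigned char)lo] <= r && r <= rank[(unsigned char)hi];
}
```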

> OK, I can live with that, so I just need to swap the conversion tables
> I got (which map 15 to NEL and 25 to LF).

Always thought it was funny that it's the weirdo mainframe platform
that has a proper "newline" character instead of pressing LF into
service as one  ^_^

> >    #pragma convert("ISO8859-1")
> […]
> >That may or may not be useful. Of course, the pragma would need to be
>
> Interesting, but I can’t think of where that would be useful at the
> moment. But good to know.
>
> Hmm. Can this be used to construct the table?
>
> Something like running this at configure time:
> 
> #include <stdio.h>
>
> int main(void) {
>       int i = 1;
> 
>       printf("#pragma convert(\"ISO8859-1\")\n");
>       printf("static const unsigned char map[] = \"");
>       while (i <= 255)
>               printf("%c", i++);
>       printf("\";\n");
>       return 0;
> }
> 
> And then feed its output into the compiling, and have
> some code generating the reverse map like:
> 
>       i = 0;
>       while (i < 255) {
>               revmap[map[i]] = i + 1;
>               ++i;
>       }
> 
> But this reeks of fragility compared with supporting a known-good hand-
> edited set of codepages.

Wouldn't it be easier just to use etoa() or atoe()? I don't think
explicit hand-edited tables should be needed for EBCDIC, unless you're
already maintaining those for other encodings.
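
If a table does get generated along the lines of the quoted sketch, a
cheap guard against the fragility is to build the reverse map and check
that the forward map is a permutation at the same time. Below is a
minimal sketch. It assumes the same layout as above (255 entries, with
map[i] holding the converted value of octet i + 1). The function name is
hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Builds revmap[] from map[] and returns false if map[] is not a
   permutation of 1..255 (a zero entry or a duplicate), which is exactly
   the kind of silent breakage a generated table risks. */
static bool build_revmap(const unsigned char map[255],
                         unsigned char revmap[256])
{
    bool seen[256];

    memset(seen, 0, sizeof seen);
    revmap[0] = 0;                  /* NUL maps to itself */
    for (int i = 0; i < 255; i++) {
        unsigned char v = map[i];

        if (v == 0 || seen[v])
            return false;           /* hole or duplicate: not a bijection */
        seen[v] = true;
        revmap[v] = (unsigned char)(i + 1);
    }
    return true;                    /* every slot 1..255 was filled once */
}
```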

> (Not to say we can’t do this manually once in order to actually _get_
> those mappings.)

Certainly the above code would need either some tweaking or some
massaging of its output, so that the awkward characters (especially '"'
and the backslash, plus any byte the compiler takes as a line ending)
don't throw off the compiler.
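
Here is a rough sketch of that tweak. It is a variant of the quoted
generator that escapes '"' and the backslash, and writes the host's '\n'
and '\r' as named escapes so that they cannot split the literal. It
relies on two assumptions that would need checking against the z/OS
compiler docs. The first is that #pragma convert translates simple
escapes such as \" the same way it translates plain characters. The
second is that no other byte is treated as a line ending. The function
name is hypothetical:

```c
#include <stdio.h>

/* Emit the map[] definition for octets 1..255, keeping the string
   literal well-formed. */
static void emit_map_source(FILE *out)
{
    fputs("#pragma convert(\"ISO8859-1\")\n", out);
    fputs("static const unsigned char map[] = \"", out);
    for (int i = 1; i <= 255; i++) {
        if (i == '"' || i == '\\')
            fprintf(out, "\\%c", i);    /* backslash-escape it */
        else if (i == '\n')
            fputs("\\n", out);          /* raw newline would end the line */
        else if (i == '\r')
            fputs("\\r", out);
        else
            fputc(i, out);              /* everything else verbatim */
    }
    fputs("\";\n", out);
}
```

At configure time it would be called as emit_map_source(stdout), with
the output redirected into a header.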

> >Let me know if I can help any more!
>
> Okay, sure, thanks. I must admit I’m not actively working on this
> still but I’m considering making a separate branch on which we can try
> things until they work, then merge it back.

I'm happy to test iterations of this, as long as it doesn't need much
diagnosing...

> But first, the character class changes themselves. That turned out to
> be quite a bit more effort than I had estimated and will keep me busy
> for another longish hacking session. Ugh. Oh well. But on the plus
> side, this will make support much nicer as *all* constructs like
> “(c >= '0' && c <= '9')” will go away and even the OS/2 TEXTMODE line
> endings (where CR+LF is also supported) need less cpp hackery.

Sounds great! That'll certainly make EBCDIC easier to deal with.
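
One table-driven shape for those character classes stays independent of
the codepage. Fill the class table at startup from string literals, so
that the compiler's execution character set decides where each bit
lands. Below is a minimal sketch. The class names and helpers are
hypothetical. The real set would follow mksh's needs:

```c
#include <stdbool.h>

/* Hypothetical class bits. */
enum {
    CC_DIGIT = 1 << 0,
    CC_UPPER = 1 << 1,
    CC_LOWER = 1 << 2,
    CC_SPACE = 1 << 3
};

static unsigned char cclass[256];

/* Mark every character of a literal with a class bit. The literal is
   encoded in the execution character set, so the same source yields
   correct tables on ASCII and EBCDIC hosts. */
static void cclass_mark(const char *chars, unsigned char bit)
{
    for (; *chars != '\0'; chars++)
        cclass[(unsigned char)*chars] |= bit;
}

static void cclass_init(void)
{
    cclass_mark("0123456789", CC_DIGIT);
    cclass_mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CC_UPPER);
    cclass_mark("abcdefghijklmnopqrstuvwxyz", CC_LOWER);
    cclass_mark(" \t\n\v\f\r", CC_SPACE);
}

/* Replaces tests like (c >= '0' && c <= '9'). The cast keeps negative
   (signed) char values from indexing outside the table. */
static bool cc_is(int c, unsigned char bits)
{
    return (cclass[(unsigned char)c] & bits) != 0;
}
```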

I might suggest looking at Gnulib, specifically lib/c-ctype.h, for
inspiration. I helped them get their ctype implementation in order on
z/OS (and at one point we were even trying to deal with *signed* EBCDIC
chars, where 'A' has a negative value!), and it works solidly now.
They've got a good design for dealing with non-ASCII weirdness; they
were clearly thinking of that from the start.
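
The core idea, roughly, is to classify by matching against character
literals rather than ranges. Here is a minimal sketch in that style. The
macro and function names are illustrative, not Gnulib's actual API. It
assumes c holds the value of a plain char, so the signed-EBCDIC case
above would need extra handling:

```c
#include <stdbool.h>

/* Case-label lists; each omits its final colon so it can be written as
   "LIT_DIGIT:" inside a switch. No ordering or contiguity is assumed. */
#define LIT_DIGIT \
    case '0': case '1': case '2': case '3': case '4': \
    case '5': case '6': case '7': case '8': case '9'

#define LIT_UPPER \
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G': \
    case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': \
    case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U': \
    case 'V': case 'W': case 'X': case 'Y': case 'Z'

#define LIT_LOWER \
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': \
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n': \
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u': \
    case 'v': case 'w': case 'x': case 'y': case 'z'

static bool lit_isdigit(int c)
{
    switch (c) {
    LIT_DIGIT:
        return true;
    default:
        return false;
    }
}

static bool lit_isalpha(int c)
{
    switch (c) {
    LIT_UPPER:
    LIT_LOWER:
        return true;
    default:
        return false;
    }
}
```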


Happy hacking,


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
