Re: POSIX character classes

2017-03-24 Thread Thorsten Glaser
Martijn Dekker dixit:

>> Can I get by making them match ASCII only even in UTF-8 mode?
>
>IMHO, that would defeat their primary purpose, namely locale-dependent
>class matching, so no, not really. :)
>
>If Greeks or Russians (or Germans, for that matter) can't count on
>[:upper:] matching an upper case letter in their alphabets, then I'd say

There are no alphabets in UTF-8, only global Unicode.

>> Strictly speaking, POSIX requires only support for the C locale,
>[...]
>
>Yes, but on systems supporting other locales (e.g. UTF-8), it would not
>be conforming for character classes to match ASCII only. You either
>support UTF-8 or you don't.

For POSIX purposes, we really don’t, as we use our own routines
to read and write multibyte characters and handle them as wide
characters internally. We _really_ cannot use POSIX locales in
mksh at all. So if a system has 32-bit wchar_t and supports the
Unicode astral planes, mksh isn’t conforming in UTF-8 mode there
either. (POSIX does not, however, demand UTF-8 or Unicode support
at all, only the C locale, so that’s okay.)


The question was more whether [[:upper:]] matching [A-Z] would
be more useful than not matching anything at all.

bye,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2


[Bug 1675842] [NEW] ^O vs. modified command lines

2017-03-24 Thread Thorsten Glaser
Public bug reported:

$ echo a
$ echo b
 should put “echo b” into the buffer (like 
) but doesn’t.

** Affects: mksh
 Importance: Wishlist
 Assignee: Thorsten Glaser (mirabilos)
 Status: Triaged

https://bugs.launchpad.net/bugs/1675842


Re: POSIX character classes (was Re: pipes and sub-shells)

2017-03-24 Thread Martijn Dekker
Op 23-03-17 om 22:02 schreef Thorsten Glaser:
> Martijn Dekker dixit:
> 
>> * BUG_NOCHCLASS: POSIX-mandated character [:classes:] within bracket
>> [expressions] are not supported in glob patterns.
> 
> I really really REALLY hate that this will make mksh really big.
> We’re talking about 36K .rodata even without titlecase conversion
> and BMP-only (16-bit Unicode) here.

I sympathise.

Even fnmatch(3) is not compliant on all systems; the BSDs don't seem to
have caught up yet. :(  I don't suppose using that is an option in any
case because of mksh's extended globbing functionality.

Is adding 36k really that much in 2017? On my system, the current
development binary of mksh is 283k after stripping when built with -O2,
235k with -Os. Adding 36k would make it 319k/271k, still quite small.

If that's too much, I guess you should continue to not support them. The
reason modernish detects BUG_NOCHCLASS is not to make some sort of
statement, but to enable programs using the library to easily check for
the presence of the issue and implement alternative methods (such as
falling back to external commands, or just matching ASCII only without
character classes).
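
As a rough sketch of the kind of check and fallback described above, a
script could probe the running shell and use an explicit ASCII list when
character classes are missing. The helper names has_chclass and is_upper
are made up for illustration; they are not modernish's actual interface,
and the fallback is deliberately ASCII-only.

```shell
# Hypothetical helper (not part of modernish or mksh): succeeds if this
# shell's pattern matching understands [:upper:] inside a bracket
# expression. Without support, the pattern is read as the bracket
# expression "[[:upper:]" followed by a literal "]", which cannot match
# the single character "A".
has_chclass() {
	case A in
	([[:upper:]]) return 0 ;;
	esac
	return 1
}

# Hypothetical helper: succeeds if $1 is exactly one upper-case letter.
# Uses the (locale-dependent) character class where the shell has it,
# and an explicit ASCII list otherwise.
is_upper() {
	if has_chclass; then
		case $1 in
		([[:upper:]]) return 0 ;;
		esac
	else
		case $1 in
		([ABCDEFGHIJKLMNOPQRSTUVWXYZ]) return 0 ;;
		esac
	fi
	return 1
}
```

The letters are spelled out in the fallback rather than written as a
range, since ranges such as A-Z are only well defined in the C locale.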

> Can I get by making them match ASCII only even in UTF-8 mode?

IMHO, that would defeat their primary purpose, namely locale-dependent
class matching, so no, not really. :)

If Greeks or Russians (or Germans, for that matter) can't count on
[:upper:] matching an upper case letter in their alphabets, then I'd say
for them it would be better to have no support than broken support.

> Strictly speaking, POSIX requires only support for the C locale,
[...]

Yes, but on systems supporting other locales (e.g. UTF-8), it would not
be conforming for character classes to match ASCII only. You either
support UTF-8 or you don't.

- M.



Re: pipes and sub-shells

2017-03-24 Thread Jean Delvare
Hi Martijn,

On Thu, 23 Mar 2017 16:49:33 +0100, Martijn Dekker wrote:
> Op 23-03-17 om 10:49 schreef Jean Delvare:
> > Apparently it requires a more recent version of mksh than we are
> > shipping:
> > 
> > $ echo $KSH_VERSION
> > @(#)MIRBSD KSH R50 2014/06/29 openSUSE
> 
> That version is quite ancient, so you should consider upgrading it to
> the latest. FYI, modernish 

I know, we are working on upgrading to version 50f. Upgrading to the
latest version on a released product is not an option; the risk of
regression is too high.

> currently detects the following bugs on it that are relevant for
> cross-shell programming. All except BUG_LNNOALIAS, BUG_LNNOEVAL and
> BUG_NOCHCLASS have been fixed in the current release. The former two are
> fixed in current cvs. The latter is a design decision from Thorsten that
> is nonetheless a bug in POSIX terms.

The only bugs I care about are the ones which my customers complain
about. Which sometimes are not even bugs ;-)

-- 
Jean Delvare
SUSE L3 Support