Re: [ast-users] ksh93 double byte space handling

lijo george Sun, 30 Apr 2017 12:32:16 -0700

So I guess the observed behaviour is not a bug but intended behaviour.

It's interesting that this used to work in the old ksh88 version, which
might have been due to a less complicated parsing mechanism.
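
For anyone who hits the same thing, here is a minimal sketch (not taken
from ksh itself) of how one might locate and replace the offending bytes
in a script. It assumes the character is U+3000 IDEOGRAPHIC SPACE, encoded
in UTF-8 as e3 80 80 as in the od output quoted below. The names
IDEOGRAPHIC_SPACE, find_ideographic_space and replace_ideographic_space are
hypothetical helpers made up for illustration.

```shell
#!/bin/sh
# Sketch only: the names below are hypothetical helpers, not part of ksh.

# UTF-8 encoding of U+3000 IDEOGRAPHIC SPACE, built from octal escapes
# so that this file itself stays plain ASCII.
IDEOGRAPHIC_SPACE=$(printf '\343\200\200')

# Print the number of every line in file $1 that contains U+3000.
find_ideographic_space() {
    n=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        case $line in
            *"$IDEOGRAPHIC_SPACE"*) printf '%s\n' "$n" ;;
        esac
    done < "$1"
}

# Write file $1 to stdout with every U+3000 replaced by an ASCII space.
replace_ideographic_space() {
    sed "s/$IDEOGRAPHIC_SPACE/ /g" "$1"
}
```

With the space.ksh quoted below, "find_ideographic_space space.ksh" should
print 5, the line holding "done". Note that the replacement is blind: it
would also change a U+3000 inside a quoted string, where the character is
legitimate, so it is probably safer to review the reported lines by hand.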


Thanks,
Lijo



On Wed, Apr 26, 2017 at 12:57 AM, Richard Hamilton <rlham...@gmail.com>
wrote:

> I'm going to consider this _without_ looking at the ksh source, because
> mortals will at most look at documentation (and because documentation
> should be accurate enough that they shouldn't _have_ to look at source).
>
> My very cursory reading of the man page* is a bit ambiguous whether that
> should work:
>
>        A blank is a tab or a space.  An identifier is a sequence of
>        letters, digits, or underscores starting with a letter or
>        underscore.  Identifiers are used as components of variable
>        names.  A vname is a sequence of one or more identifiers
>        separated by a . and optionally preceded by a ..  Vnames are
>        used as function and variable names.  A word is a sequence of
>        characters from the character set defined by the current
>        locale, excluding non-quoted metacharacters.
>
> "A blank is a tab or a space" is more restrictive than "A word is a
> sequence of characters from the character set defined by the current
> locale, excluding non-quoted metacharacters".  And if I try a vertical
> tab, formfeed, or carriage return (all plain ASCII characters classified as
> white space by isspace(3)) before "done", I get the same error.  So it
> looks like the more restrictive interpretation holds: only tabs and the
> basic space character are acceptable in the code as white space.  Of
> course, anything should be ok in a quoted string (except whatever closes
> the quotes); or rather, anything except a null byte, which does NOT work**
> (ksh isn't perl - the latter goes out of its way to tolerate just about
> anything).
>
> However, I wouldn't do it, even if it should work, because that makes it
> only work in an appropriate (UTF-8) locale; it would certainly be an error
> in the C locale regardless.  If it were me, I would confine anything not
> sensible in the C locale to quoted string constants; one does NOT want
> code that does nasty things depending on what locale is in use.
>
> * ${.sh.version} on my Mac is Version AJM 93u+ 2012-08-01, which I gather
> is reasonably current. :-)
>
> ** the following produces an interesting error:
>
> 0000000    #   !       /   b   i   n   /   k   s   h  \n  \n   e   c   h
> 0000020    o       "  \0   t   e   s   t   i   n   g   "  \n
> 0000035
> $ ./tryme.ksh
> ./tryme.ksh: syntax error at line 3: `zero byte' unexpected
>
>
>
>
> On Tue, Apr 25, 2017 at 8:42 AM, lijo george <george.l...@gmail.com>
> wrote:
>
>>
>> Thanks for the suggestion Philippe.
>> But I'm a bit confused, though: isn't "0xe3 0x80 0x80" the UTF-8
>> representation of the space character?
>>
>>
>> Thanks,
>> Lijo
>>
>> On Tue, Apr 25, 2017 at 5:49 PM, Philippe Bergheaud <
>> philippe.berghe...@fr.ibm.com> wrote:
>>
>>> > The attached testscript has a leading double byte space separator
>>> > before the for loop closing "done" keyword. This fails with a syntax
>>> > error while parsing.
>>> >
>>> > Is it a bug or is it expected behaviour?
>>> >
>>> > I've tried it with ksh93u+  and ksh93v- versions on a Solaris setup.
>>> > bash and zsh also fails, hence I'm thinking it might not be a bug,
>>> > but could someone please confirm this.
>>> >
>>> > Here's a sample output.
>>> >
>>> > root@S11_3_SRU:~# echo $LANG
>>> > ja_JP.UTF-8
>>> > root@S11_3_SRU:~# cat space.ksh
>>> > #!/bin/ksh
>>> > for i in 1 2
>>> > do
>>> > echo $i
>>> > done   # leading  double byte space character
>>> > root@S11_3_SRU:~# od -xc space.ksh
>>> > 0000000    2321    2f62    696e    2f6b    7368    0a66    6f72    2069
>>> >            #   !   /   b   i   n   /   k   s   h  \n   f   o   r       i
>>> > 0000020    2069    6e20    3120    320a    646f    0a65    6368    6f20
>>> >                i   n       1       2  \n   d   o  \n   e   c   h   o
>>> > 0000040    2469    0ae3    8080    646f    6e65    0a00
>>> >            $   i  \n 343 200 200   d   o   n   e  \n
>>> You should remove the (invisible) character 0343 (0xe3), before the two
>>> spaces.
>>>
>>> Philippe
>>
>>
>>
>> _______________________________________________
>> ast-users mailing list
>> ast-users@lists.research.att.com
>> http://lists.research.att.com/mailman/listinfo/ast-users
>>
>>
>

