I'm going to consider this _without_ looking at the ksh source, because
mortals will at most look at documentation (and because documentation
should be accurate enough that they shouldn't _have_ to look at source).
From a very cursory reading, the man page* is a bit ambiguous about
whether that should work:
    A blank is a tab or a space. An identifier is a sequence of letters,
    digits, or underscores starting with a letter or underscore.
    Identifiers are used as components of variable names. A vname is a
    sequence of one or more identifiers separated by a . and optionally
    preceded by a .. Vnames are used as function and variable names. A
    word is a sequence of characters from the character set defined by
    the current locale, excluding non-quoted metacharacters.
"A blank is a tab or a space" is more restrictive than "A word is a
sequence of characters from the character set defined by the current
locale, excluding non-quoted metacharacters". And if I put a vertical
tab, formfeed, or carriage return (all plain ASCII characters that
isspace(3) classifies as white space) before "done", I get the same
error. So it looks like the more restrictive interpretation holds: only
tab and the basic space character are accepted as white space in code.
Of course, anything should be OK inside a quoted string (except whatever
closes the quotes); or rather, anything except a null byte, which does
NOT work** (ksh isn't perl, which goes out of its way to tolerate just
about anything).
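
For anyone who wants to repeat the white space test, here is a minimal
sketch. It is not the exact script I ran: the file names and the
SH_UNDER_TEST variable are made up for illustration, and it defaults to
plain sh so it runs where ksh isn't installed. It writes the same loop
twice, once with a tab and once with a vertical tab before "done", and
reports whether each version parses.

```shell
#!/bin/sh
# Sketch only: SH_UNDER_TEST is a hypothetical knob, e.g. SH_UNDER_TEST=ksh.
shell=${SH_UNDER_TEST:-sh}
tmpdir=$(mktemp -d) || exit 1

# printf turns \t and \v into the literal control characters.
printf 'for i in 1 2\ndo\necho $i\n\tdone\n' > "$tmpdir/tab.sh"
printf 'for i in 1 2\ndo\necho $i\n\vdone\n' > "$tmpdir/vtab.sh"

# Run one script and report "ok" if it ran cleanly, "error" otherwise.
try() {
	if "$shell" "$1" >/dev/null 2>&1
	then
		echo ok
	else
		echo error
	fi
}

tab_result=$(try "$tmpdir/tab.sh")
vtab_result=$(try "$tmpdir/vtab.sh")
echo "tab before done:          $tab_result"
echo "vertical tab before done: $vtab_result"
rm -rf "$tmpdir"
```

I would expect the tab version to run and the vertical tab version to
fail, since the vertical tab becomes part of the word instead of
separating it from "done".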
However, I wouldn't do it even if it did work, because the script would
then only work in an appropriate (UTF-8) locale; it would certainly be an
error in the C locale. If it were me, I would confine anything that
isn't sensible in the C locale to quoted string constants; one does NOT
want code that does nasty things depending on which locale is in use.
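
As a rough illustration of the quoted-string approach (a sketch under my
own assumptions, not something from the man page): the bytes 0xe3 0x80
0x80 are the UTF-8 encoding of U+3000 IDEOGRAPHIC SPACE, and inside
double quotes they are just data. The file name and the SH_UNDER_TEST
variable below are hypothetical.

```shell
#!/bin/sh
# Sketch only: SH_UNDER_TEST is a hypothetical knob, e.g. SH_UNDER_TEST=ksh.
shell=${SH_UNDER_TEST:-sh}
tmpdir=$(mktemp -d) || exit 1

# \343\200\200 are the octal escapes for the bytes 0xe3 0x80 0x80.
# Here they sit inside a double-quoted string, not in front of "done".
printf 'for i in 1 2\ndo\necho "$i\343\200\200x"\ndone\n' > "$tmpdir/quoted.sh"

# Force the C locale so the result does not depend on the caller's LANG.
quoted_lines=$(LC_ALL=C "$shell" "$tmpdir/quoted.sh" | wc -l | tr -d ' ')
echo "lines printed: $quoted_lines"
rm -rf "$tmpdir"
```

The loop should print one line per iteration whatever the locale is,
because the shell never has to decide whether those bytes are white
space.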
* ${.sh.version} on my Mac is Version AJM 93u+ 2012-08-01, which I gather
is reasonably current. :-)
** the following produces an interesting error:
0000000 # ! / b i n / k s h \n \n e c h
0000020 o " \0 t e s t i n g " \n
0000035
$ ./tryme.ksh
./tryme.ksh: syntax error at line 3: `zero byte' unexpected
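
A test file close to the one dumped above can be produced roughly like
this. It is only an approximation: the temporary path is made up, and
the result may differ from my file by a byte of white space.

```shell
#!/bin/sh
# Sketch only: writes an approximation of tryme.ksh into a temporary directory.
tmpdir=$(mktemp -d) || exit 1
script="$tmpdir/tryme.ksh"

# \0 in the printf format emits a literal null byte inside the quoted string.
printf '#!/bin/ksh\n\necho "\0testing"\n' > "$script"

# Keep only the null bytes and count them, to confirm one was written.
nul_count=$(tr -cd '\000' < "$script" | wc -c | tr -d ' ')
echo "null bytes in script: $nul_count"

# Dump the file the same way as above.
od -c "$script"
rm -rf "$tmpdir"
```

Running the generated file with ksh is left out here; other shells may
handle the null byte differently.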
On Tue, Apr 25, 2017 at 8:42 AM, lijo george <[email protected]> wrote:
>
> Thanks for the suggestion Philippe.
> But I'm a bit confused, though: isn't "0xe3 0x80 0x80" the UTF-8
> representation of the space character?
>
>
> Thanks,
> Lijo
>
> On Tue, Apr 25, 2017 at 5:49 PM, Philippe Bergheaud <
> [email protected]> wrote:
>
>> > The attached testscript has a leading double byte space separator
>> > before the for loop closing "done" keyword. This fails with a syntax
>> > error while parsing.
>> >
>> > Is it a bug or is it expected behaviour?
>> >
>> > I've tried it with ksh93u+ and ksh93v- versions on a Solaris setup.
>> > bash and zsh also fail, hence I'm thinking it might not be a bug,
>> > but could someone please confirm this?
>> >
>> > Here's a sample output.
>> >
>> > root@S11_3_SRU:~# echo $LANG
>> > ja_JP.UTF-8
>> > root@S11_3_SRU:~# cat space.ksh
>> > #!/bin/ksh
>> > for i in 1 2
>> > do
>> > echo $i
>> > done # leading double byte space character
>> > root@S11_3_SRU:~# od -xc space.ksh
>> > 0000000 2321 2f62 696e 2f6b 7368 0a66 6f72 2069
>> > # ! / b i n / k s h \n f o r i
>> > 0000020 2069 6e20 3120 320a 646f 0a65 6368 6f20
>> > i n 1 2 \n d o \n e c h o
>> > 0000040 2469 0ae3 8080 646f 6e65 0a00
>> > $ i \n 343 200 200 d o n e \n
>> You should remove the (invisible) character 0343 (0xe3), before the two
>> spaces.
>>
>> Philippe
>
>
>
> _______________________________________________
> ast-users mailing list
> [email protected]
> http://lists.research.att.com/mailman/listinfo/ast-users
>
>