Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread Paul Gilmartin
On Mon, 11 Sep 2017 19:24:55 +0100, David W Noon wrote:
>
>> 525 $ make
>> make: gcc-config: Command not found
>> make: gcc-config: Command not found
>
>Which operating system are you using?
>
BunsenLabs.  I'm giving up; not strongly motivated.

>> o I'm surprised that the fake text file survived network newline conversions.
>
>I concluded that Listserv was pretty dumb, so I felt that an attachment
>with a filename ending in .txt would survive.
>
It arrived with: Content-Transfer-Encoding: base64 which protects it
pretty well.  Don't know if you or your MUA elected that.

>> o .zip is timezone-ignorant.
>
>Yes, it's derived from an old MS-DOS/PC-DOS command and those systems
>did not know about timezones when PKZIP was written. The archive file
>format does not permit timezone data.
> 
Pax might have done better.  It should be supported by any UNIX-like OS
and most Windows archive extractors.

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread David W Noon
On Mon, 11 Sep 2017 11:49:29 -0500, Paul Gilmartin
(000433f07816-dmarc-requ...@listserv.ua.edu) wrote about "Re: More
musings about Unicode, UTF-8, etc." (in
<0662287305026690.wa.paulgboulderaim@listserv.ua.edu>):

> On Mon, 11 Sep 2017 16:41:18 +0100, David W Noon wrote:
>>
>> I have added these strings to my code and the results are the same as
>> yours. I suspect the rendering software does not handle CJK characters
>> very well in Indo-European locales.
>>
> I'm calling it a font problem: the CJK characters display double-width.

You are correct. I am using a fixed-pitch font, but it uses two character
cells for each CJK character.

[snip]
>> There is a Makefile included that can build the source code with either
>> GCC or Clang, using gmake. Those who use other C/C++ compilers will have
>> to work out their own build sequence.
>>
> Fails for me with:
> 
> 525 $ make
> make: gcc-config: Command not found
> make: gcc-config: Command not found

Which operating system are you using?

You should have received the gcc-config command as part of your GCC
toolchain(s). This command allows you to select from multiple versions
of GCC installed.

I developed the code on Gentoo Linux. Such a system can have 5 or 6 GCC
toolchains installed concurrently, so gcc-config is a must-have.

> make: Warning: File 'Unicode_test.cpp' has modification time 19042 s in the 
> future

I'm in the BST timezone, so I'm 5 hours ahead of NYC and 8 hours ahead
of LA/SF (and Redmond, WA, for that matter).
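
For scale, the 19042 s in that warning can be turned into hours and
minutes with a few lines of shell. This is only an arithmetic sketch; the
function name hms is made up for the illustration. How far in the future
a file appears presumably depends on the reader's offset from BST and on
how old the file was when it was extracted.

```shell
#!/bin/sh
# Convert a number of seconds into hours, minutes and seconds.
hms() {
    printf '%dh %dm %ds\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

hms 19042
```

So the skew reported by make is a little over five and a quarter hours.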

> /g++ -o Unicode_test -pipe -std=gnu++14 -Wall -Wextra -O2 
> -fomit-frame-pointer Unicode_test.cpp -Wl,--as-needed,--strip-all
> make: /g++: Command not found
> Makefile:13: recipe for target 'Unicode_test' failed
> make: *** [Unicode_test] Error 127

If you edit the Makefile to remove the shell subcommands that invoke
gcc-config and remove the slash separator, you should then access gcc
and g++ through your PATH environment variable.
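
A minimal sketch of that PATH lookup, for anyone without gcc-config. The
function name first_on_path is hypothetical, and whether the Makefile
honours a CXX variable passed on the command line is an assumption, so
the snippet only prints what it would run.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on PATH.
first_on_path() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# ASSUMPTION: the Makefile would accept the compiler through CXX.
CXX=$(first_on_path g++ clang++) \
    && echo "would run: make CXX=$CXX" \
    || echo "no C++ compiler found on PATH"
```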

> o I'm surprised that the fake text file survived network newline conversions.

I concluded that Listserv was pretty dumb, so I felt that an attachment
with a filename ending in .txt would survive.

> o .zip is timezone-ignorant.

Yes, it's derived from an old MS-DOS/PC-DOS command and those systems
did not know about timezones when PKZIP was written. The archive file
format does not permit timezone data.
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*



Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread Paul Gilmartin
On Mon, 11 Sep 2017 16:41:18 +0100, David W Noon wrote:
>
>I have added these strings to my code and the results are the same as
>yours. I suspect the rendering software does not handle CJK characters
>very well in Indo-European locales.
>
I'm calling it a font problem: the CJK characters display double-width.

>I am attaching the zip archive to this message as a fake text file.
>Rename it from Unicode_test.zip.txt to Unicode_test.zip and it should
>unzip in the usual manner. It contains a directory to hold all its
>files, so you can unzip it safely, without polluting another directory.
>
>There is a Makefile included that can build the source code with either
>GCC or Clang, using gmake. Those who use other C/C++ compilers will have
>to work out their own build sequence.
>
Fails for me with:

525 $ make
make: gcc-config: Command not found
make: gcc-config: Command not found
make: Warning: File 'Unicode_test.cpp' has modification time 19042 s in the 
future
/g++ -o Unicode_test -pipe -std=gnu++14 -Wall -Wextra -O2 -fomit-frame-pointer 
Unicode_test.cpp -Wl,--as-needed,--strip-all
make: /g++: Command not found
Makefile:13: recipe for target 'Unicode_test' failed
make: *** [Unicode_test] Error 127

o I'm surprised that the fake text file survived network newline conversions.

o .zip is timezone-ignorant.

-- gil



Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread David W Noon
On Sun, 10 Sep 2017 22:57:18 -0500, Paul Gilmartin
(000433f07816-dmarc-requ...@listserv.ua.edu) wrote about "Re: More
musings about Unicode, UTF-8, etc." (in
<6975604789454628.wa.paulgboulderaim@listserv.ua.edu>):

[snip]
> = zsh
> Привет мир.   +++
> Emmanuel Macron   +++
> 문재인   +++
> Enrique Peña Nieto+++
> Владимир Путин+++
> Donald Trump  +++
> 习近平   +++
> 
> = bash
> Привет мир.  +++
> Emmanuel Macron   +++
> 문재인 +++
> Enrique Peña Nieto   +++
> Владимир Путин+++
> Donald Trump  +++
> 习近平 +++
> ...
> zsh is not ideal, but still the best.

I have added these strings to my code and the results are the same as
yours. I suspect the rendering software does not handle CJK characters
very well in Indo-European locales.

I am attaching the zip archive to this message as a fake text file.
Rename it from Unicode_test.zip.txt to Unicode_test.zip and it should
unzip in the usual manner. It contains a directory to hold all its
files, so you can unzip it safely, without polluting another directory.

There is a Makefile included that can build the source code with either
GCC or Clang, using gmake. Those who use other C/C++ compilers will have
to work out their own build sequence.
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*


[Attachment Unicode_test.zip.txt: binary zip data omitted. Member names
visible in the archive: Unicode_test/Unicode_test.cpp,
Unicode_test/Makefile, Unicode_test/Unicode_test.geany,
Unicode_test/UTF8_test.c]


Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread David W Noon
On Mon, 11 Sep 2017 12:16:41 +0800, Timothy Sipples (sipp...@sg.ibm.com)
wrote about "Re: More musings about Unicode, UTF-8, etc." (in
<of77ca7f77.8afd0e88-on48258198.001747a1-48258198.00178...@notes.na.collabserv.com>):

> David Noon wrote:
>> The script that will never work is for bash (another UNIX shell).
> 
> I don't understand this sentence. Nor does Rocket Software, I assume:
> 
> http://www.rocketsoftware.com/zos-open-source/tools

My statement was about the script, not the shell.
The issue is the printf command.

bash uses an external command /usr/bin/printf. This, on my system, is
part of Linux's coreutils package. AFAIAA, there is no impetus to make
coreutils Unicode-aware.

In contrast, zsh uses a shell intrinsic for printf.
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

 



Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin
On Mon, 11 Sep 2017 12:16:41 +0800, Timothy Sipples wrote:

>David Noon wrote:
>>The script that will never work is for bash (another UNIX shell).
>
>I don't understand this sentence. Nor does Rocket Software, I assume:
>
>http://www.rocketsoftware.com/zos-open-source/tools
> 
In most of the Linux shells that David and I tried, printf is a shell
builtin.  Of these, only zsh seems to understand the variable-length
UTF-8 encoding.  /usr/bin/printf is UTF-8 ignorant also.

In my CJK examples, most of the glyphs are displayed double-width,
so it's as much a matter of terminal emulator behavior as of printf's
formatting.
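
As a sketch of one workaround for shells whose printf pads by bytes:
count code points by discarding UTF-8 continuation bytes (0x80-0xBF) and
add the padding by hand. The function names count_codepoints and pad_left
are invented for this illustration, and the approach deliberately ignores
double-width glyphs, so CJK lines would still be misaligned.

```shell
#!/bin/sh
# Count UTF-8 code points: delete continuation bytes (0x80-0xBF) and count
# the bytes that remain. LC_ALL=C keeps tr and wc byte-oriented.
count_codepoints() {
    echo $(( $(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c) ))
}

# Left-justify $1 in a field of $2 code points, then append "+++".
pad_left() {
    out=$1
    pad=$(( $2 - $(count_codepoints "$1") ))
    while [ "$pad" -gt 0 ]; do
        out="$out "
        pad=$((pad - 1))
    done
    printf '%s+++\n' "$out"
}

pad_left "Привет мир." 22
pad_left "Donald Trump" 22
```

It relies only on tr and wc, so it should not matter whether the shell's
own printf understands UTF-8.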

-- gil



Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Timothy Sipples
David Noon wrote:
>The script that will never work is for bash (another UNIX shell).

I don't understand this sentence. Nor does Rocket Software, I assume:

http://www.rocketsoftware.com/zos-open-source/tools


Timothy Sipples
IT Architect Executive, Industry Solutions, IBM z Systems, AP/GCG/MEA
E-Mail: sipp...@sg.ibm.com



Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin
On Mon, 11 Sep 2017 00:03:26 +0100, David W Noon wrote:
>
>I have been doing some experiments on rendering Unicode and determining
>the length of rendered text compared to its storage in bytes. I have
>used Paul Gilmartin's 3 lines of text as sample data.
>
>I have 4 programs/scripts, of which 3 work and 1 can never work. The
>working programs are in C and C++, plus a script for zsh (a UNIX shell).
>The script that will never work is for bash (another UNIX shell).
> 
And more.   This:

#! /bin/sh
# Run printf inside shell $I so that each shell's own printf is exercised.
printit() {
    "$I" -c "printf \"%-22s+++\n\" \"$1\""
}

doit() {
    echo; echo "= $I"
    printit "Привет мир."
    printit "Emmanuel Macron"
    printit "문재인"
    printit "Enrique Peña Nieto"
    printit "Владимир Путин"
    printit "Donald Trump"
    printit "习近平"
}
uname -a
for I in ash ksh dash csh tcsh zsh bash sh; do doit "$I"; done

... shows:
...
= zsh
Привет мир.   +++
Emmanuel Macron   +++
문재인   +++
Enrique Peña Nieto+++
Владимир Путин+++
Donald Trump  +++
习近平   +++

= bash
Привет мир.  +++
Emmanuel Macron   +++
문재인 +++
Enrique Peña Nieto   +++
Владимир Путин+++
Donald Trump  +++
习近平 +++
...
zsh is not ideal, but still the best.
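
The remaining misalignment under zsh is on the CJK lines, where each
glyph takes two cells. A rough sketch of padding by terminal cells
follows. It rests on a loud assumption: every character whose UTF-8 lead
byte is 0xE3-0xED (roughly U+3000-U+D7FF) is treated as double-width,
which is a heuristic and not the real East Asian Width data. The names
display_width and pad_cells are made up for the example.

```shell
#!/bin/sh
# Rough terminal-cell width of a UTF-8 string. LC_ALL=C keeps tr and wc
# byte-oriented whatever the caller's locale is.
display_width() {
    # Code points: every byte that is not a continuation byte (0x80-0xBF).
    cps=$(( $(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c) ))
    # ASSUMPTION: lead bytes 0xE3-0xED (about U+3000-U+D7FF) are double-width.
    wide=$(( $(printf '%s' "$1" | LC_ALL=C tr -cd '\343-\355' | LC_ALL=C wc -c) ))
    echo $((cps + wide))
}

# Left-justify $1 in a field of $2 terminal cells, then append "+++".
pad_cells() {
    out=$1
    pad=$(( $2 - $(display_width "$1") ))
    while [ "$pad" -gt 0 ]; do
        out="$out "
        pad=$((pad - 1))
    done
    printf '%s+++\n' "$out"
}

pad_cells "문재인" 22
pad_cells "习近平" 22
pad_cells "Donald Trump" 22
```

Whether the columns then line up still depends on the terminal and font
drawing those glyphs exactly two cells wide.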

-- gil



Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin
On Mon, 11 Sep 2017 00:03:26 +0100, David W Noon wrote:
>
>I have been doing some experiments on rendering Unicode and determining
>the length of rendered text compared to its storage in bytes. I have
>used Paul Gilmartin's 3 lines of text as sample data.
>
>I have 4 programs/scripts, of which 3 work and 1 can never work. The
>working programs are in C and C++, plus a script for zsh (a UNIX shell).
>The script that will never work is for bash (another UNIX shell).
>
Trying the following:
#! /bin/sh
# Run the same printf in each shell $I and compare the padding.
doit() {
    echo; echo "= $I"
    "$I" -c "printf \"%-22s+++\n\" \"Hello World.\""
    "$I" -c "printf \"%-22s+++\n\" \"Привет мир.\""
    "$I" -c "printf \"%-22s+++\n\" \"Bonjour le monde.\""
}
uname -a
for I in ash ksh dash csh tcsh zsh bash sh; do doit "$I"; done

on Linux RaspbPi-3-2700 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 
armv7l GNU/Linux

... only zsh gives a desirable result.

-- gil



Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Jack J. Woehr

Paul Gilmartin wrote:

I have been doing some experiments on rendering Unicode


Go Language.

--
Jack J. Woehr # Science is more than a body of knowledge. It's a way of
www.well.com/~jax # thinking, a way of skeptically interrogating the universe
www.softwoehr.com # with a fine understanding of human fallibility. - Carl Sagan



Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin
On Mon, 11 Sep 2017 00:03:26 +0100, David W Noon wrote:
>
>I have been doing some experiments on rendering Unicode and determining
>the length of rendered text compared to its storage in bytes. I have
>used Paul Gilmartin's 3 lines of text as sample data.
>
>I have 4 programs/scripts, of which 3 work and 1 can never work. The
>working programs are in C and C++, plus a script for zsh (a UNIX shell).
>The script that will never work is for bash (another UNIX shell).
>
>If anybody is interested I will post the code here in a zip archive. Any
>takers?
>
I doubt that LISTSERV will tolerate a zip archive.  But if you post,
Cc: my address above.

(zsh long ago was the default shell for MacOS.  It tried to be a
hybrid of sh and csh.  Are you using csh constructs?)

Thanks,
gil



More musings about Unicode, UTF-8, etc.

2017-09-10 Thread David W Noon
Hi folks,

I have been doing some experiments on rendering Unicode and determining
the length of rendered text compared to its storage in bytes. I have
used Paul Gilmartin's 3 lines of text as sample data.

I have 4 programs/scripts, of which 3 work and 1 can never work. The
working programs are in C and C++, plus a script for zsh (a UNIX shell).
The script that will never work is for bash (another UNIX shell).

If anybody is interested I will post the code here in a zip archive. Any
takers?
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

 
