
Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread Paul Gilmartin

On Mon, 11 Sep 2017 19:24:55 +0100, David W Noon wrote:
>
>> 525 $ make
>> make: gcc-config: Command not found
>> make: gcc-config: Command not found
>
>Which operating system are you using?
>
BunsenLabs.  I'm giving up; not strongly motivated.

>> o I'm surprised that the fake text file survived network newline conversions.
>
>I concluded that Listserv was pretty dumb, so I felt that an attachment
>with a filename ending in .txt would survive.
>
It arrived with: Content-Transfer-Encoding: base64 which protects it
pretty well.  Don't know if you or your MUA elected that.
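
A minimal sketch of why base64 protects the attachment, assuming a gateway that rewrites every line ending as CRLF. The payload bytes and the to_crlf helper are hypothetical, chosen only for illustration:

```python
import base64

# Hypothetical payload: binary data that happens to contain line-ending bytes.
payload = b"PK\x03\x04\r\n\x00\n\xff"

def to_crlf(data: bytes) -> bytes:
    """Mimic a gateway that normalizes every line ending to CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

# Sent raw, the payload is altered by the newline rewrite.
raw_survives = to_crlf(payload) == payload

# Sent as base64, only the line breaks of the encoded text are rewritten,
# and the decoder discards characters outside the base64 alphabet.
encoded = base64.encodebytes(payload)
decoded = base64.b64decode(to_crlf(encoded))
b64_survives = decoded == payload

print(raw_survives, b64_survives)
```

The raw copy is damaged because a bare LF inside the data gains a CR, while the base64 copy decodes to the original bytes.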

>> o .zip is timezone-ignorant.
>
>Yes, it's derived from an old MS-DOS/PC-DOS command and those systems
>did not know about timezones when PKZIP was written. The archive file
>format does not permit timezone data.
> 
Pax might have done better.   Should be supported by any UNIX-like OS and
most Windows archive extractors.

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread David W Noon

On Mon, 11 Sep 2017 11:49:29 -0500, Paul Gilmartin
(000433f07816-dmarc-requ...@listserv.ua.edu) wrote about "Re: More
musings about Unicode, UTF-8, etc." (in
<0662287305026690.wa.paulgboulderaim@listserv.ua.edu>):

> On Mon, 11 Sep 2017 16:41:18 +0100, David W Noon wrote:
>>
>> I have added these strings to my code and the results are the same as
>> yours. I suspect the rendering software does not handle CJK characters
>> very well in Indo-European locales.
>>
> I'm calling it a font problem: The CJK characters display double-width,

You are correct. I am using a fixed pitch font, but it uses 2 character
cells for the CJK characters.

[snip]
>> There is a Makefile included that can build the source code using either
>> GCC or CLANG using gmake. Those who use other C/C++ compilers will have
>> to work out their own build sequence.
>>
> Fails for me with:
> 
> 525 $ make
> make: gcc-config: Command not found
> make: gcc-config: Command not found

Which operating system are you using?

You should have received the gcc-config command as part of your GCC
toolchain(s). This command allows you to select from multiple versions
of GCC installed.

I developed the code on Gentoo Linux. Such a system can have 5 or 6 GCC
toolchains installed concurrently, so gcc-config is a must-have.

> make: Warning: File 'Unicode_test.cpp' has modification time 19042 s in the 
> future

I'm in the BST timezone, so I'm 5 hours ahead of NYC and 8 hours ahead
of LA/SF (and Redmond, WA, for that matter).
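
For scale, the skew in that warning can be converted to hours and minutes. This is only the arithmetic; reading the figure as a timezone offset less the time elapsed between saving the file and running make is an assumption, not something the warning itself states:

```python
# The offset reported by make, copied from the warning quoted above.
skew_seconds = 19042

hours, remainder = divmod(skew_seconds, 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{hours} h {minutes} min {seconds} s")
```

The result is a little over five hours, which is not a whole-hour zone difference on its own.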

> /g++ -o Unicode_test -pipe -std=gnu++14 -Wall -Wextra -O2 
> -fomit-frame-pointer Unicode_test.cpp -Wl,--as-needed,--strip-all
> make: /g++: Command not found
> Makefile:13: recipe for target 'Unicode_test' failed
> make: *** [Unicode_test] Error 127

If you edit the Makefile to remove the shell subcommands that invoke
gcc-config and remove the slash separator, you should then access gcc
and g++ through your PATH environment variable.

> o I'm surprised that the fake text file survived network newline conversions.

I concluded that Listserv was pretty dumb, so I felt that an attachment
with a filename ending in .txt would survive.

> o .zip is timezone-ignorant.

Yes, it's derived from an old MS-DOS/PC-DOS command and those systems
did not know about timezones when PKZIP was written. The archive file
format does not permit timezone data.
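
A short sketch of what that looks like in practice, using Python's standard zipfile module. The member name and timestamp are made up for illustration. Some archivers also write an optional extra field holding UTC times, which this sketch does not examine:

```python
import io
import zipfile

# Build a tiny archive in memory; the member name and timestamp are hypothetical.
buffer = io.BytesIO()
stamp = (2017, 9, 11, 16, 41, 18)  # year, month, day, hour, minute, second
with zipfile.ZipFile(buffer, "w") as archive:
    member = zipfile.ZipInfo("Unicode_test/example.txt", date_time=stamp)
    archive.writestr(member, "hello\n")

# Read it back: the timestamp is a bare six-field tuple with no UTC offset.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("Unicode_test/example.txt")

print(info.date_time)
```

Because the core header carries only wall-clock fields, an extractor has to assume a zone, usually its own.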
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*


Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread Paul Gilmartin

On Mon, 11 Sep 2017 16:41:18 +0100, David W Noon wrote:
>
>I have added these strings to my code and the results are the same as
>yours. I suspect the rendering software does not handle CJK characters
>very well in Indo-European locales.
>
I'm calling it a font problem: The CJK characters display double-width,

>I am attaching the zip archive to this message as a fake text file.
>Rename it from Unicode_test.zip.txt to Unicode_test.zip and it should
>unzip in the usual manner. It contains a directory to hold all its
>files, so you can unzip it safely, without polluting another directory.
>
>There is a Makefile included that can build the source code using either
>GCC or CLANG using gmake. Those who use other C/C++ compilers will have
>to work out their own build sequence.
>
Fails for me with:

525 $ make
make: gcc-config: Command not found
make: gcc-config: Command not found
make: Warning: File 'Unicode_test.cpp' has modification time 19042 s in the 
future
/g++ -o Unicode_test -pipe -std=gnu++14 -Wall -Wextra -O2 -fomit-frame-pointer 
Unicode_test.cpp -Wl,--as-needed,--strip-all
make: /g++: Command not found
Makefile:13: recipe for target 'Unicode_test' failed
make: *** [Unicode_test] Error 127

o I'm surprised that the fake text file survived network newline conversions.

o .zip is timezone-ignorant.

-- gil


Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread David W Noon

On Sun, 10 Sep 2017 22:57:18 -0500, Paul Gilmartin
(000433f07816-dmarc-requ...@listserv.ua.edu) wrote about "Re: More
musings about Unicode, UTF-8, etc." (in
<6975604789454628.wa.paulgboulderaim@listserv.ua.edu>):

[snip]
> = zsh
> Привет мир.   +++
> Emmanuel Macron   +++
> 문재인   +++
> Enrique Peña Nieto+++
> Владимир Путин+++
> Donald Trump  +++
> 习近平   +++
> 
> = bash
> Привет мир.  +++
> Emmanuel Macron   +++
> 문재인 +++
> Enrique Peña Nieto   +++
> Владимир Путин+++
> Donald Trump  +++
> 习近平 +++
> ...
> zsh is not ideal, but still the best.

I have added these strings to my code and the results are the same as
yours. I suspect the rendering software does not handle CJK characters
very well in Indo-European locales.

I am attaching the zip archive to this message as a fake text file.
Rename it from Unicode_test.zip.txt to Unicode_test.zip and it should
unzip in the usual manner. It contains a directory to hold all its
files, so you can unzip it safely, without polluting another directory.

There is a Makefile included that can build the source code using either
GCC or CLANG using gmake. Those who use other C/C++ compilers will have
to work out their own build sequence.
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*



Re: More musings about Unicode, UTF-8, etc.

2017-09-11 Thread David W Noon

On Mon, 11 Sep 2017 12:16:41 +0800, Timothy Sipples (sipp...@sg.ibm.com)
wrote about "Re: More musings about Unicode, UTF-8, etc." (in
<of77ca7f77.8afd0e88-on48258198.001747a1-48258198.00178...@notes.na.collabserv.com>):

> David Noon wrote:
>> The script that will never work is for bash (another UNIX shell).
> 
> I don't understand this sentence. Nor does Rocket Software, I assume:
> 
> http://www.rocketsoftware.com/zos-open-source/tools

My statement was about the script, not the shell.
The issue is the printf command.

bash uses an external command /usr/bin/printf. This, on my system, is
part of Linux's coreutils package. AFAIAA, there is no impetus to make
coreutils Unicode-aware.

In contrast, zsh uses a shell intrinsic for printf.
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

 


Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin

On Mon, 11 Sep 2017 12:16:41 +0800, Timothy Sipples wrote:

>David Noon wrote:
>>The script that will never work is for bash (another UNIX shell).
>
>I don't understand this sentence. Nor does Rocket Software, I assume:
>
>http://www.rocketsoftware.com/zos-open-source/tools
> 
In most of the Linux shells that David and I tried, printf is a shell
builtin.  Of these, only zsh seems to understand the variable-length
UTF-8 encoding.  /usr/bin/printf is UTF-8 ignorant also.

In my CJK examples, most of the glyphs are displayed double-width,
so it's as much a matter of terminal emulator behavior as of printf's
formatting.
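
To make the three competing notions of "length" concrete, here is a rough sketch in Python rather than shell. The display_width helper is hypothetical and simplified: it treats East Asian wide and fullwidth characters as two columns and everything else as one, ignoring combining marks and ambiguous-width characters:

```python
import unicodedata

def display_width(text: str) -> int:
    """Estimate terminal columns: wide/fullwidth East Asian characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

samples = ["Привет мир.", "Donald Trump", "문재인", "习近平"]
for text in samples:
    utf8_bytes = len(text.encode("utf-8"))   # what a byte-counting printf sees
    code_points = len(text)                  # what a character-counting printf sees
    columns = display_width(text)            # what the terminal actually occupies
    print(f"{text}: {utf8_bytes} bytes, {code_points} code points, {columns} columns")
```

A printf that pads by bytes under-pads the Cyrillic and CJK lines; one that pads by code points still under-counts the CJK lines, because each of those glyphs occupies two cells.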

-- gil


Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Timothy Sipples

David Noon wrote:
>The script that will never work is for bash (another UNIX shell).

I don't understand this sentence. Nor does Rocket Software, I assume:

http://www.rocketsoftware.com/zos-open-source/tools


Timothy Sipples
IT Architect Executive, Industry Solutions, IBM z Systems, AP/GCG/MEA
E-Mail: sipp...@sg.ibm.com


Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin

On Mon, 11 Sep 2017 00:03:26 +0100, David W Noon wrote:
>
>I have been doing some experiments on rendering Unicode and determining
>the length of rendered text compared to its storage in bytes. I have
>used Paul Gilmartin's 3 lines of text as sample data.
>
>I have 4 programs/scripts, of which 3 work and 1 can never work. The
>working programs are in C and C++, plus a script for zsh (a UNIX shell).
>The script that will never work is for bash (another UNIX shell).
> 
And more.   This:

#! /bin/sh
# Run printf inside the shell named by $I, left-justifying $1 in a 22-wide field.
printit() {
    "$I" -c "printf \"%-22s+++\n\" \"$1\""
}

# Print a header for shell $I, then format each sample string with it.
doit() {
    echo; echo "= $I"
    printit "Привет мир."
    printit "Emmanuel Macron"
    printit "문재인"
    printit "Enrique Peña Nieto"
    printit "Владимир Путин"
    printit "Donald Trump"
    printit "习近平"
}
uname -a
for I in ash ksh dash csh tcsh zsh bash sh; do doit "$I"; done

... shows:
...
= zsh
Привет мир.   +++
Emmanuel Macron   +++
문재인   +++
Enrique Peña Nieto+++
Владимир Путин+++
Donald Trump  +++
习近平   +++

= bash
Привет мир.  +++
Emmanuel Macron   +++
문재인 +++
Enrique Peña Nieto   +++
Владимир Путин+++
Donald Trump  +++
习近平 +++
...
zsh is not ideal, but still the best.
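
One way to get the columns to line up even for CJK text is to pad by estimated display width instead of by bytes or characters. The sketch below is in Python, not any of the shells tested above; display_width and pad_to_columns are hypothetical helpers, and the width estimate (two columns for East Asian wide or fullwidth characters, one otherwise) is a simplification that still depends on the terminal and font agreeing:

```python
import unicodedata

def display_width(text: str) -> int:
    """Estimate terminal columns: wide/fullwidth East Asian characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def pad_to_columns(text: str, columns: int) -> str:
    """Left-justify text in a field measured in terminal columns, like %-22s."""
    return text + " " * max(0, columns - display_width(text))

names = ["Привет мир.", "Emmanuel Macron", "문재인", "Donald Trump", "习近平"]
for name in names:
    print(pad_to_columns(name, 22) + "+++")
```

Text wider than the field is left unpadded, which matches how printf treats a minimum field width.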

-- gil


Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin

On Mon, 11 Sep 2017 00:03:26 +0100, David W Noon wrote:
>
>I have been doing some experiments on rendering Unicode and determining
>the length of rendered text compared to its storage in bytes. I have
>used Paul Gilmartin's 3 lines of text as sample data.
>
>I have 4 programs/scripts, of which 3 work and 1 can never work. The
>working programs are in C and C++, plus a script for zsh (a UNIX shell).
>The script that will never work is for bash (another UNIX shell).
>
Trying the following:
#! /bin/sh
# For the shell named by $I, print three sample strings left-justified in 22 columns.
doit() {
    echo; echo "= $I"
    "$I" -c "printf \"%-22s+++\n\" \"Hello World.\""
    "$I" -c "printf \"%-22s+++\n\" \"Привет мир.\""
    "$I" -c "printf \"%-22s+++\n\" \"Bonjour le monde.\""
}
uname -a
for I in ash ksh dash csh tcsh zsh bash sh; do doit "$I"; done

on Linux RaspbPi-3-2700 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 
armv7l GNU/Linux

... only zsh gives a desirable result.

-- gil


Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Jack J. Woehr


Paul Gilmartin wrote:

I have been doing some experiments on rendering Unicode


Go Language.

--
Jack J. Woehr # Science is more than a body of knowledge. It's a way of
www.well.com/~jax # thinking, a way of skeptically interrogating the universe
www.softwoehr.com # with a fine understanding of human fallibility. - Carl Sagan


Re: More musings about Unicode, UTF-8, etc.

2017-09-10 Thread Paul Gilmartin

On Mon, 11 Sep 2017 00:03:26 +0100, David W Noon wrote:
>
>I have been doing some experiments on rendering Unicode and determining
>the length of rendered text compared to its storage in bytes. I have
>used Paul Gilmartin's 3 lines of text as sample data.
>
>I have 4 programs/scripts, of which 3 work and 1 can never work. The
>working programs are in C and C++, plus a script for zsh (a UNIX shell).
>The script that will never work is for bash (another UNIX shell).
>
>If anybody is interested I will post the code here in a zip archive. Any
>takers?
>
I doubt that LISTSERV will tolerate a zip archive.  But if you post,
Cc: my address above.

(zsh long ago was the default shell for MacOS.  It tried to be a
hybrid of sh and csh.  Are you using csh constructs?)

Thanks,
gil


More musings about Unicode, UTF-8, etc.

2017-09-10 Thread David W Noon

Hi folks,

I have been doing some experiments on rendering Unicode and determining
the length of rendered text compared to its storage in bytes. I have
used Paul Gilmartin's 3 lines of text as sample data.

I have 4 programs/scripts, of which 3 work and 1 can never work. The
working programs are in C and C++, plus a script for zsh (a UNIX shell).
The script that will never work is for bash (another UNIX shell).

If anybody is interested I will post the code here in a zip archive. Any
takers?
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
david.w.n...@googlemail.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

 

