
Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Eryk Sun

On 11/13/22, Jessica Smith <12jessicasmit...@gmail.com> wrote:
> Consider the following code run in PowerShell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

If your applications and existing data files are compatible with using
UTF-8, then in Windows 10+ you can modify the administrative regional
settings in the control panel to force using UTF-8. In this case,
GetACP() and GetOEMCP() will return CP_UTF8 (65001), and the reserved
code page constants CP_ACP (0),  CP_OEMCP (1), CP_MACCP (2), and
CP_THREAD_ACP (3) will use CP_UTF8.

You can override this on a per-application basis via the
ActiveCodePage setting in the manifest:

https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage

In Windows 10, this setting only supports "UTF-8". In Windows 11, it
also supports "legacy" to allow old applications to run on a system
that's configured to use UTF-8.  Setting an explicit locale is also
supported in Windows 11, such as "en-US", with fallback to UTF-8 if
the given locale has no legacy code page.

Note that setting the system to use UTF-8 also affects the host
process for console sessions (i.e. conhost.exe or openconsole.exe),
since it defaults to using the OEM code page (UTF-8 in this case).
Unfortunately, a legacy read from the console host does not support
reading non-ASCII text as UTF-8. For example:

>>> os.read(0, 6)
SPĀM
b'SP\x00M\r\n'

This is a trivial bug in the console host, which stems from the fact
that UTF-8 is a multibyte encoding (1-4 bytes per code point), but for some
reason the console team at Microsoft still hasn't fixed it. You can
use chcp.com to set the console's input and output code pages to
something other than UTF-8 if you have to read non-ASCII input in a
legacy console app. By default, this problem doesn't affect Python's
sys.stdin, which internally uses wide-character ReadConsoleW() with
the system's native text encoding, UTF-16LE.
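
The multibyte point can be illustrated without a console. This is a minimal sketch, not a reproduction of the console host's behaviour: it only compares what the typed line from the example would look like as UTF-8 bytes with the bytes the legacy read returned. The variable names are illustrative.

```python
# What the typed line "SPAM" with A-macron, plus CR LF, looks like as UTF-8.
typed = "SP\u0100M\r\n"          # \u0100 is the non-ASCII character in the example
expected = typed.encode("utf-8")

# What the legacy console read returned in the example above.
observed = b"SP\x00M\r\n"

print(expected)                  # the non-ASCII character takes two bytes
print(len(typed), len(expected), len(observed))

# The observed bytes still decode, but the non-ASCII character is gone.
print(ascii(observed.decode("utf-8")))
```

A correct UTF-8 read would therefore need 7 bytes for this line, so a 6-byte buffer could not hold it in one call either.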
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Thomas Passin


On 11/13/2022 9:49 AM, Jessica Smith wrote:

> Consider the following code run in PowerShell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>
>
> Is this a known limitation of Windows + Unicode? I understand that
> using -X utf8 would fix this, or modifying various environment
> variables. But is this expected for a standard Python installation on
> Windows?
>
> Jessica



This also fails with the same error:

$ python -c "print('└')" |clip

Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Barry



> On 13 Nov 2022, at 14:52, Jessica Smith <12jessicasmit...@gmail.com> wrote:
> 
> Consider the following code run in PowerShell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>
>
> Is this a known limitation of Windows + Unicode? I understand that
> using -X utf8 would fix this, or modifying various environment
> variables. But is this expected for a standard Python installation on
> Windows?

Your other thread has a reply that explained this.
It is a problem with Windows and character sets.
You have to set things up to allow Unicode to work.

Barry

> 
> Jessica

Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Jessica Smith

Consider the following code run in PowerShell or cmd.exe:

$ python -c "print('└')"
└

$ python -c "print('└')" > test_file.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

Is this a known limitation of Windows + Unicode? I understand that
using -X utf8 would fix this, or modifying various environment
variables. But is this expected for a standard Python installation on
Windows?

Jessica
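
For reference, the failure can be reproduced in-process, since a redirected stdout is a text stream wrapping a byte stream with some encoding. The sketch below assumes the redirected stream was using cp1252, as the traceback suggests; `encode_like_stream` is a hypothetical helper, not part of Python. `sys.stdout.reconfigure()` exists from Python 3.7 onwards.

```python
import io
import sys

def encode_like_stream(text, encoding, errors="strict"):
    """Return the bytes a text stream with this encoding would write."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave the BytesIO alone when the wrapper goes away
    return data

try:
    encode_like_stream("\u2514", "cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True    # same error class as in the traceback

utf8_bytes = encode_like_stream("\u2514", "utf-8")
print(failed, utf8_bytes)

# One way to opt in from inside a script (Python 3.7+):
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
print("\u2514")
```

Outside the script, running with `python -X utf8`, or setting `PYTHONUTF8=1` or `PYTHONIOENCODING=utf-8` in the environment, should have a similar effect on the redirected output.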

[Python-announce] ANN: unicode 2.9

2022-06-03 Thread garabik-news-2005-05


unicode is a simple python command line utility that displays
properties for a given unicode character, or searches
unicode database for a given name.
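
As a rough idea of the kind of lookup involved, the standard library's `unicodedata` module exposes similar information. This is only a sketch using the stdlib; it is not how the `unicode` utility is implemented, and `describe` is a hypothetical helper.

```python
import unicodedata

def describe(char):
    """Return a few basic properties of a single character."""
    return {
        "codepoint": "U+%04X" % ord(char),
        "name": unicodedata.name(char, "<unnamed>"),
        "category": unicodedata.category(char),
    }

info = describe("\u2514")
print(info)

# Reverse direction: find a character from its exact name.
found = unicodedata.lookup("BOX DRAWINGS DOUBLE VERTICAL")
print(ascii(found))
```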

It was written with Linux in mind, but should work almost everywhere
(including MS Windows and MacOSX); a UTF-8 console is recommended.

˙pɹɐpuɐʇs əpoɔı̣uՈ əɥʇ ɟo əsn pəɔuɐʌpɐ
puɐ səldı̣ɔuı̣ɹd əɥʇ ɓuı̣ʇɐɹʇsuoɯəp looʇ ɔı̣ʇɔɐpı̣p ʇuəlləɔxə uɐ sı̣ ʇI
˙sʇuı̣odəpoɔ ʇuəɹəɟɟı̣p ʎləʇəldɯoɔ ɓuı̣sn əlı̣ɥʍ 'sɥdʎlɓ ɟo ɯɐəɹʇs ɹɐlı̣ɯı̣s
ʎllɐnsı̣ʌ  oʇuı̣ ʇxəʇ əɥʇ ʇɹəʌuoɔ oʇ pɹɐpuɐʇs əpoɔı̣uՈ əɥʇ ɟo ɹəʍod llnɟ
əɥʇ sʇı̣oldxə ʇɐɥʇ 'ʎʇı̣lı̣ʇn ,əpoɔɐɹɐd, oslɐ suı̣ɐʇuoɔ əɓɐʞɔɐd əɥ⊥

Changes since previous versions:
 * better handling of changes in data files

URL: http://kassiopeia.juls.savba.sk/~garabik/software/unicode.html

License: GPL v3

Installation: pip install unicode

-- 
 ---
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__garabik @ kassiopeia.juls.savba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Re: Printing Unicode strings in a list

2022-04-30 Thread Chris Angelico

On Sun, 1 May 2022 at 00:03, Vlastimil Brom  wrote:
> (Even the redundant u prefix from your python2 sample is apparently
> accepted, maybe for compatibility reasons.)

Yes, for compatibility reasons. It wasn't accepted in Python 3.0, but
3.3 re-added it to make porting easier. It doesn't do anything.

ChrisA
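
A quick check of the point about the prefix, assuming Python 3.3 or later (the variable names are illustrative):

```python
# In Python 3.3+ the u prefix is accepted and has no effect.
with_prefix = u"\u2551"
without_prefix = "\u2551"

print(with_prefix == without_prefix)
print(type(with_prefix) is type(without_prefix) is str)
```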

Re: Printing Unicode strings in a list

2022-04-30 Thread Vlastimil Brom

On Thu, 28 Apr 2022 at 13:33, Stephen Tucker wrote:
>
> Hi PythonList Members,
>
> Consider the following log from a run of IDLE:
>
> ==
>
> Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> on win32
> Type "copyright", "credits" or "license()" for more information.
> >>> print (u"\u2551")
> ║
> >>> print ([u"\u2551"])
> [u'\u2551']
> >>>
>
> ==
>
> Yes, I am still using Python 2.x - I have good reasons for doing so and
> will be moving to Python 3.x in due course.
>
> I have the following questions arising from the log:
>
> 1. Why does the second print statement not produce [ ║]  or ["║"] ?
>
> 2. Should the second print statement produce [ ║]  or ["║"] ?
>
> 3. Given that I want to print a list of Unicode strings so that their
> characters are displayed (instead of their Unicode codepoint definitions),
> is there a more Pythonic way of doing it than concatenating them into a
> single string and printing that?
>
> 4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?
>
> Thanks in anticipation.
>
> Stephen Tucker.

Hi,
I'm not sure whether I'm misunderstanding the 4th question or
the answers to it (it isn't clear to me whether the focus is on
character printing or on the quotation marks).
In either case, in Python 3 the character glyphs are printed in these
cases instead of the code point escape notation, cf.:
==
Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print ([u"\u2551"])
['║']
>>>
>>> print([u"\u2551"])
['║']
>>> print("\u2551")
║
>>> print("║")
║
>>> print(repr("\u2551"))
'║'
>>> print(ascii("\u2551"))
'\u2551'
>>>
==

(Even the redundant u prefix from your python2 sample is apparently
accepted, maybe for compatibility reasons.)

hth,
   vbr

Re: Printing Unicode strings in a list

2022-04-28 Thread Rob Cliffe via Python-list




On 28/04/2022 14:27, Stephen Tucker wrote:

> To Cameron Simpson,
>
> Thanks for your in-depth and helpful reply. I have noted it and will be
> giving it close attention when I can.
>
> The main reason why I am still using Python 2.x is that my colleagues are
> still using a GIS system that has a Python programmer's interface - and
> that interface uses Python 2.x.
>
> The team are moving to an updated version of the system whose Python
> interface is Python 3.x.
>
> However, I am expecting to retire over the next 8 months or so, so I do not
> need to be concerned with Python 3.x - my successor will be doing that.


Still, if you're feeling noble, you could start the work of making your 
code Python 3 compatible.😁

Best wishes
Rob Cliffe

Re: Printing Unicode strings in a list

2022-04-28 Thread Jon Ribbens via Python-list

On 2022-04-28, Stephen Tucker  wrote:
> Hi PythonList Members,
>
> Consider the following log from a run of IDLE:
>
> ==
>
> Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> on win32
> Type "copyright", "credits" or "license()" for more information.
> >>> print (u"\u2551")
> ║
> >>> print ([u"\u2551"])
> [u'\u2551']
> >>>
>
> ==
>
> Yes, I am still using Python 2.x - I have good reasons for doing so and
> will be moving to Python 3.x in due course.
>
> I have the following questions arising from the log:
>
> 1. Why does the second print statement not produce [ ║]  or ["║"] ?

print(x) implicitly calls str(x) to convert 'x' to a string for output.
lists don't have their own str converter, so fall back to repr instead,
which outputs '[', followed by the repr of each list item separated by
', ', followed by ']'.

> 2. Should the second print statement produce [ ║]  or ["║"] ?

There's certainly no obvious reason why it *should*, and pretty decent
reasons why it shouldn't (it would be a hybrid mess of Python-syntax
repr output and raw string output).

> 3. Given that I want to print a list of Unicode strings so that their
> characters are displayed (instead of their Unicode codepoint definitions),
> is there a more Pythonic way of doing it than concatenating them into a
> single string and printing that?

print(' '.join(list_of_strings)) is probably most common. I suppose you
could do print(*list_of_strings) if you like, but I'm not sure I'd call
it "pythonic" as I've never seen anyone do that (that doesn't mean of
course that other people haven't seen it done!) Personally I only tend
to use print() for debugging output.

> 4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

Yes.
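
The answers above can be sketched as follows, assuming Python 3 (the list contents and variable names are illustrative):

```python
box = ["\u2551", "\u2554"]

as_list = str(box)        # a list has no separate str(); this is its repr()
joined = " ".join(box)    # the items written out as plain text

# ascii() is used here only so the output is safe on any console encoding.
print(ascii(as_list))
print(ascii(joined))

# print(*box) passes each item as its own argument, so str() is applied
# to the items rather than to the list.
```

The join form also works unchanged in Python 2 for a list of unicode strings, which is probably why it is the more common idiom.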

Re: Printing Unicode strings in a list

2022-04-28 Thread Stephen Tucker

To Cameron Simpson,

Thanks for your in-depth and helpful reply. I have noted it and will be
giving it close attention when I can.

The main reason why I am still using Python 2.x is that my colleagues are
still using a GIS system that has a Python programmer's interface - and
that interface uses Python 2.x.

The team are moving to an updated version of the system whose Python
interface is Python 3.x.

However, I am expecting to retire over the next 8 months or so, so I do not
need to be concerned with Python 3.x - my successor will be doing that.

Stephen.


On Thu, Apr 28, 2022 at 2:07 PM Cameron Simpson  wrote:

> On 28Apr2022 12:32, Stephen Tucker  wrote:
> >Consider the following log from a run of IDLE:
> >==
> >
> >Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> >on win32
> >Type "copyright", "credits" or "license()" for more information.
> >>>> print (u"\u2551")
> >║
> >>>> print ([u"\u2551"])
> >[u'\u2551']
> >>>>
> >==
> >
> >Yes, I am still using Python 2.x - I have good reasons for doing so and
> >will be moving to Python 3.x in due course.
>
> Love to hear those reasons. Not suggesting that they are invalid.
>
> >I have the following questions arising from the log:
> >1. Why does the second print statement not produce [ ║]  or ["║"] ?
>
> Because print() prints the str() of each of its arguments, and str() of
> a list is the same as its repr(), which is a list of the repr()s of
> every item in the list. Repr of a Unicode string looks like what you
> have in Python 2.
>
> >2. Should the second print statement produce [ ║]  or ["║"] ?
>
> Well, to me its behaviour is correct. Do you _want_ to get your Unicode
> glyph? in quotes? That is up to you. But consider: what would be sane
> output if the list contained the string "], [3," ?
>
> >3. Given that I want to print a list of Unicode strings so that their
> >characters are displayed (instead of their Unicode codepoint definitions),
> >is there a more Pythonic way of doing it than concatenating them into a
> >single string and printing that?
>
> You could print them with empty separators:
>
> print(s1, s2, ..., sep='')
>
> To do that in Python 2 you need to:
>
> from __future__ import print_function
>
> at the top of your Python file. Then you'll have a Python 3 style print
> function. In Python 2, print is normally a statement and you don't need
> the brackets:
>
> print u"\u2551"
>
> but print() is genuinely better as a function anyway.
>
> >4. Does Python 3.x exhibit the same behaviour as Python 2.x in this
> respect?
>
> Broadly yes, except that all strings are Unicode strings and we don't
> bother with the leading "u" prefix.
>
> Cheers,
> Cameron Simpson 

Re: Printing Unicode strings in a list

2022-04-28 Thread Cameron Simpson

On 28Apr2022 12:32, Stephen Tucker  wrote:
>Consider the following log from a run of IDLE:
>==
>
>Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
>on win32
>Type "copyright", "credits" or "license()" for more information.
>>>> print (u"\u2551")
>║
>>>> print ([u"\u2551"])
>[u'\u2551']
>>>>
>==
>
>Yes, I am still using Python 2.x - I have good reasons for doing so and
>will be moving to Python 3.x in due course.

Love to hear those reasons. Not suggesting that they are invalid.

>I have the following questions arising from the log:
>1. Why does the second print statement not produce [ ║]  or ["║"] ?

Because print() prints the str() of each of its arguments, and str() of 
a list is the same as its repr(), which is a list of the repr()s of 
every item in the list. Repr of a Unicode string looks like what you 
have in Python 2.

>2. Should the second print statement produce [ ║]  or ["║"] ?

Well, to me its behaviour is correct. Do you _want_ to get your Unicode 
glyph? in quotes? That is up to you. But consider: what would be sane 
output if the list contained the string "], [3," ?

>3. Given that I want to print a list of Unicode strings so that their
>characters are displayed (instead of their Unicode codepoint definitions),
>is there a more Pythonic way of doing it than concatenating them into a
>single string and printing that?

You could print them with empty separators:

print(s1, s2, ..., sep='')

To do that in Python 2 you need to:

from __future__ import print_function

at the top of your Python file. Then you'll have a Python 3 style print 
function. In Python 2, print is normally a statement and you don't need 
the brackets:

print u"\u2551"

but print() is genuinely better as a function anyway.

>4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

Broadly yes, except that all strings are Unicode strings and we don't 
bother with the leading "u" prefix.

Cheers,
Cameron Simpson 
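
Two of the points above, sketched under the assumption of Python 3 (on Python 2 the `from __future__ import print_function` line mentioned above would be needed first). The list contents are illustrative; an in-memory stream stands in for the console so the result can be inspected.

```python
import io

items = ["], [3,", "\u2551"]

# repr() quoting keeps the list boundaries unambiguous, even for an
# item that itself looks like list syntax.
shown = str(items)
print(ascii(shown))

# With sep='' the items are written back to back as plain text.
out = io.StringIO()
print(*items, sep="", file=out)
written = out.getvalue()
print(ascii(written))
```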

Printing Unicode strings in a list

2022-04-28 Thread Stephen Tucker

Hi PythonList Members,

Consider the following log from a run of IDLE:

==

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
on win32
Type "copyright", "credits" or "license()" for more information.
>>> print (u"\u2551")
║
>>> print ([u"\u2551"])
[u'\u2551']
>>>

==

Yes, I am still using Python 2.x - I have good reasons for doing so and
will be moving to Python 3.x in due course.

I have the following questions arising from the log:

1. Why does the second print statement not produce [ ║]  or ["║"] ?

2. Should the second print statement produce [ ║]  or ["║"] ?

3. Given that I want to print a list of Unicode strings so that their
characters are displayed (instead of their Unicode codepoint definitions),
is there a more Pythonic way of doing it than concatenating them into a
single string and printing that?

4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

Thanks in anticipation.

Stephen Tucker.

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-04-07 Thread Anssi Saari

Dennis Lee Bieber  writes:

> On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico 
> declaimed the following:
>
>
>>That's jmf. Ignore him. He knows nothing about Unicode and is
>>determined to make everyone aware of that fact.
>>
>>He got blocked from the mailing list ages ago, and I don't think
>>anyone's regretted it.

>   Ah yes... Unfortunately, when gmane made the mirror read-only, I had to
> revert to comp.lang.python... and all the junk that gets in via that and
> Google Groups...

Hm. I just configured my news reader to send follow-ups to the mailing
list when that happened.

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-04-01 Thread Chris Angelico

On Fri, 1 Apr 2022 at 11:16, Dennis Lee Bieber  wrote:
>
> On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico 
> declaimed the following:
>
>
> >That's jmf. Ignore him. He knows nothing about Unicode and is
> >determined to make everyone aware of that fact.
> >
> >He got blocked from the mailing list ages ago, and I don't think
> >anyone's regretted it.
> >
> Ah yes... Unfortunately, when gmane made the mirror read-only, I had to
> revert to comp.lang.python... and all the junk that gets in via that and
> Google Groups...
>

Killfiles can help.

ChrisA

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-03-31 Thread Dennis Lee Bieber

On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico 
declaimed the following:


>That's jmf. Ignore him. He knows nothing about Unicode and is
>determined to make everyone aware of that fact.
>
>He got blocked from the mailing list ages ago, and I don't think
>anyone's regretted it.
>
Ah yes... Unfortunately, when gmane made the mirror read-only, I had to
revert to comp.lang.python... and all the junk that gets in via that and
Google Groups...


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-03-31 Thread Chris Angelico

On Fri, 1 Apr 2022 at 03:45, Dennis Lee Bieber  wrote:
>
> On Thu, 31 Mar 2022 00:36:10 -0700 (PDT), moi 
> declaimed the following:
>
> >>>> 'äÄöÖüÜ'.encode('utf-8')
> >b'\xc3\xa4\xc3\x84\xc3\xb6\xc3\x96\xc3\xbc\xc3\x9c'
> >>>> len('äÄöÖüÜ'.encode('utf-8'))
> >12
> >>>>
> >>>> ?
>
> Is there a question in there somewhere?
>
> Crystal ball is hazy...
>
>     However... Note that once you encode the Unicode literal, you have a
> BYTE string. There are 12 bytes in that binary -- it is NOT considered
> Unicode at that point (only when you decode it with the same CODEC will it
> be Unicode).
>

That's jmf. Ignore him. He knows nothing about Unicode and is
determined to make everyone aware of that fact.

He got blocked from the mailing list ages ago, and I don't think
anyone's regretted it.

ChrisA

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-03-31 Thread Dennis Lee Bieber

On Thu, 31 Mar 2022 00:36:10 -0700 (PDT), moi 
declaimed the following:

>>>> 'äÄöÖüÜ'.encode('utf-8')
>b'\xc3\xa4\xc3\x84\xc3\xb6\xc3\x96\xc3\xbc\xc3\x9c'
>>>> len('äÄöÖüÜ'.encode('utf-8'))
>12
>>>> 
>>>> ?

Is there a question in there somewhere?

Crystal ball is hazy...

However... Note that once you encode the Unicode literal, you have a
BYTE string. There are 12 bytes in that binary -- it is NOT considered
Unicode at that point (only when you decode it with the same CODEC will it
be Unicode).
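
The distinction can be checked directly. A small sketch, assuming Python 3; the escapes spell out the same six characters as in the quoted session:

```python
text = "\u00e4\u00c4\u00f6\u00d6\u00fc\u00dc"   # the six characters from the post
data = text.encode("utf-8")

print(len(text), type(text).__name__)    # code points in a str
print(len(data), type(data).__name__)    # bytes after encoding

# Decoding with the same codec gives the original string back.
round_trip = data.decode("utf-8")
print(round_trip == text)
```

Each of these characters lies outside ASCII but below U+0800, so each takes two bytes in UTF-8, which accounts for the length of 12.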


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

Re: ANN: unicode 2.8

2021-01-02 Thread Chris Angelico

On Sun, Jan 3, 2021 at 10:28 AM Terry Reedy  wrote:
> > And when implementing this, it was a no-brainer to include also the
> > brexit varian (verbatim).
>
> I assume you meant 'variation' and not Varian, the maker of scientific
> instruments.

I assumed simple typo for "variant"

ChrisA

Re: ANN: unicode 2.8

2021-01-02 Thread Terry Reedy

On 1/1/2021 3:48 PM, garabik-news-2005...@kassiopeia.juls.savba.sk wrote:

> Terry Reedy wrote:
>> On 12/31/2020 9:36 AM, garabik-news-2005...@kassiopeia.juls.savba.sk wrote:
>>> unicode is a simple python command line utility that displays
>>> properties for a given unicode character, or searches
>>> unicode database for a given name.
>> ...
>>> Changes since previous versions:
>>>
>>> * display ASCII table (either traditional with --ascii or the new
>>> EU–UK Trade and Cooperation Agreement version with --brexit-ascii)

The latter option implied to me that the agreement defines an
intentional variation on standard ASCII. I immediately wondered whether
they had changed the actual 7-bit ascii code, which would be egregiously
bad, or made yet another variation of 8-bit 'extended ascii', perhaps to
ensure inclusion both the pound and euro signs.

So I googled 'brexit ascii'. And was surprised to discover that there
is no such thing as 'brexit ascii', just yet another cock-up in text
preparation. (I have seen worse when a digital text of mine was mangled
during markup. Fortunately, I was allowed to read the page proofs. But
I still don't understand how spelling errors were introduced within
words I had spelled correctly.)

>> Are you reproducing it with bugs included?
>> How is that of any use to anyone?

I followed this with links to justify my claim and question:

A tweet linking the treaty annex page
https://twitter.com/thejsa_/status/1343291595899207681

A stackoverflow question and discussion of the bugs and oddities.
https://politics.stackexchange.com/questions/61178/why-does-the-eu-uk-trade-deal-have-the-7-bit-ascii-table-as-an-appendix

In the latter are mentions of other text, perhaps copy-pasted from the
1990s recommending the now deprecated SHA1 and referring to Netscape
Navigator 4 as a modern browser. Clearly, in the rush to finish, the
annex was not properly reviewed by current technical experts.

> Including the (correct) ASCII table has been a long term, low priority -
> I am using ascii(1) utility reasonably often and it makes sense to
> reproduce this functionality.
>
> And when implementing this, it was a no-brainer to include also the
> brexit varian (verbatim).

I assume you meant 'variation' and not Varian, the maker of scientific
instruments.

But why do you consider it a no-brainer to include nonsense in your
program and mislead people? People already have enough trouble dealing
with text coding.

> After all, given the blood and sweat and tears
> shed during the negotiations, I am sure each and every line of the
> Agreement has been combed and (re)negotiated over and over by experienced
> negotiators and verified an army of experts in the fields

What are we supposed to make of this? That you already knew that
'brexit-ascii' is nonsense?

--
Terry Jan Reedy


Re: ANN: unicode 2.8

2021-01-01 Thread garabik-news-2005-05

Terry Reedy  wrote:
> On 12/31/2020 9:36 AM, garabik-news-2005...@kassiopeia.juls.savba.sk wrote:
>> unicode is a simple python command line utility that displays
>> properties for a given unicode character, or searches
>> unicode database for a given name.
> ...
>> Changes since previous versions:
>> 
>>   * display ASCII table (either traditional with --ascii or the new
>> EU–UK Trade and Cooperation Agreement version with --brexit-ascii)
> 
> Are you reproducing it with bugs included?
> How is that of any use to anyone?

Including the (correct) ASCII table has been a long term, low priority -
I am using ascii(1) utility reasonably often and it makes sense to
reproduce this functionality.

And when implementing this, it was a no-brainer to include also the
brexit varian (verbatim). After all, given the blood and sweat and tears
shed during the negotiations, I am sure each and every line of the
Agreement has been combed and (re)negotiated over and over by experienced
negotiators and verified an army of experts in the fields 

-- 
 ---
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__garabik @ kassiopeia.juls.savba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Re: ANN: unicode 2.8

2020-12-31 Thread Terry Reedy


On 12/31/2020 9:36 AM, garabik-news-2005...@kassiopeia.juls.savba.sk wrote:

> unicode is a simple python command line utility that displays
> properties for a given unicode character, or searches
> unicode database for a given name.
>
> ...
>
> Changes since previous versions:
>
>   * display ASCII table (either traditional with --ascii or the new
> EU–UK Trade and Cooperation Agreement version with --brexit-ascii)


Are you reproducing it with bugs included?
How is that of any use to anyone?
A tweet linking the treaty annex page
https://twitter.com/thejsa_/status/1343291595899207681
A stackoverflow question and discussion of the bugs and oddities.
https://politics.stackexchange.com/questions/61178/why-does-the-eu-uk-trade-deal-have-the-7-bit-ascii-table-as-an-appendix

The likely answer is that the treaty writers copy-pasted from 
decades-old docs and could not be bothered to link to the actual ISO 
standard.


--
Terry Jan Reedy



ANN: unicode 2.8

2020-12-31 Thread garabik-news-2005-05

unicode is a simple python command line utility that displays
properties for a given unicode character, or searches
unicode database for a given name.

It was written with Linux in mind, but should work almost everywhere
(including MS Windows and MacOSX); a UTF-8 console is recommended.

˙pɹɐpuɐʇs əpoɔı̣uՈ əɥʇ ɟo əsn pəɔuɐʌpɐ
puɐ səldı̣ɔuı̣ɹd əɥʇ ɓuı̣ʇɐɹʇsuoɯəp looʇ ɔı̣ʇɔɐpı̣p ʇuəlləɔxə uɐ sı̣ ʇI
˙sʇuı̣odəpoɔ ʇuəɹəɟɟı̣p ʎləʇəldɯoɔ ɓuı̣sn əlı̣ɥʍ 'sɥdʎlɓ ɟo ɯɐəɹʇs ɹɐlı̣ɯı̣s
ʎllɐnsı̣ʌ  oʇuı̣ ʇxəʇ əɥʇ ʇɹəʌuoɔ oʇ pɹɐpuɐʇs əpoɔı̣uՈ əɥʇ ɟo ɹəʍod llnɟ
əɥʇ sʇı̣oldxə ʇɐɥʇ 'ʎʇı̣lı̣ʇn ,əpoɔɐɹɐd, oslɐ suı̣ɐʇuoɔ əɓɐʞɔɐd əɥ⊥

Changes since previous versions:

 * display ASCII table (either traditional with --ascii or the new
   EU–UK Trade and Cooperation Agreement version with --brexit-ascii)
 * minor bug fixes

URL: http://kassiopeia.juls.savba.sk/~garabik/software/unicode.html

License: GPL v3

Installation: pip install unicode

-- 
 ---
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__garabik @ kassiopeia.juls.savba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Re: Friday Finking: Beyond implementing Unicode

2020-06-17 Thread Terry Reedy


On 6/16/2020 7:45 PM, DL Neil via Python-list wrote:

> On 13/06/20 4:47 AM, Terry Reedy wrote:
>> There was a recent thread on python-ideas discussing this.  It started
>> with arrow characters.  There have been others.
>
> Am pleased to hear that it's neither 'new' nor 'way out there'...

The idea has been rejected multiple times, which puts you in good 
company (in a sense).

> Am not subscribed to that list. Went looking for its archives, but
> failed - there's no "ideas" on
> (https://mail.python.org/mailman/listinfo). Please send a pointer...


Try mailman3.
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Archive link on page.

--
Terry Jan Reedy



Re: Friday Finking: Beyond implementing Unicode

2020-06-16 Thread DL Neil via Python-list

>> There was a recent thread on python-ideas discussing this.  It started
>> with arrow characters.  There have been others.
>
> Am pleased to hear that it's neither 'new' nor 'way out there'...
>
> Am not subscribed to that list. Went looking for its archives, but
> failed - there's no "ideas" on
> (https://mail.python.org/mailman/listinfo). Please send a pointer...



Apologies!
Eventually remembered the second list of lists - the list of Python 
lists which are Python lists but not on the first list of Python 
lists... No wonder I'm dizzy!

--
Regards =dn

Re: Friday Finking: Beyond implementing Unicode

2020-06-16 Thread DL Neil via Python-list

On 13/06/20 5:11 AM, Dennis Lee Bieber wrote:

> On Fri, 12 Jun 2020 18:03:55 +1200, DL Neil via Python-list
> declaimed the following:
>
>> There is/was a language called "APL" (and yes the acronym means "A
>> Programming Language", and yes it started the craze, through "B" (and
>> BCPL), and yes, that brought us "C" - which you are more likely to have
>> heard about - and yes then there were DataSci folk, presumably more
>> numerate than literate, who thought the next letter to be "R". So, sad!?).
>
> R was preceded by S http://www.unige.ch/ses/sococ/cl/r/srdiff.e.html
> https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-are-the-differences-between-R-and-S_003f
> (which, with some scrolling, produces...

Oh dear, my sarcasm about being literately-challenged stands!

>> APL was hopelessly keyboard-unfriendly, requiring multiple key-presses
>> or 'over-typing' to produce those arithmetic-operator symbols -
>
> Not with a Tektronix APL terminal, and Xerox CP/V APL

Specific design-for-purpose - hardware/software integration!

>> remember, much of this was on mainframe 3270-style terminals, although
>> later PC-implementations have existed (I can't comment on how 'active'
>> any community might be). The over-typing was necessary to encode/produce
>> the APL symbols which don't exist on a standard typewriter keyboard. Ugh!
>
> Many implementations also allowed for a spelled out version for special
> characters... $RHO for example, for the greek letter rho.

To which my first reaction was "ugh!". However, I often prefer to have a
named constant in my Python code - instead of "magic numbers", eg

LINE_WIDTH = 79 # PEP-8 source-code characters per line

>> I'm glad to have limited my APL-exposure to only reading about it during
>> a 'Programming Languages' topic! (If you are 'into' functional
>> programming you may like to explore further)
>
> I used it as a 3-credit independent study in my senior year (1980). All
> I was after was a passing grade to complete the credits for graduation. I'm
> slightly ashamed to admit that my fanciest program turned that Tektronix
> storage display tube terminal into a glorified Etch-a-Sketch (terminal had
> X/Y scroll wheels that the APL implementation could read).

Hey, at least you gained access. I think my uni (when I was an u/grad)
only had one graphic terminal which was kept in the computer room and
thus only staff had access.

Our introduction to graphics (using FORTRAN) had to be shown using 80x24
character-based terminals (DEC VT-52s, from memory). Drawing shapes was
bad-enough, but demonstrations of rotation and translation became the
very definition of ugly!

I've been somewhat re-living those days, teaching myself how to play
with Pygame (not a 'work' activity!), and learning how to move entities
around on the screen (quite similar to HTML5, but sufficiently different
to give pause). That said, the learning of such basic "building-blocks",
four-plus decades ago, under-pins working in both/either/each, today!

--
Regards =dn

Re: Friday Finking: Beyond implementing Unicode

2020-06-16 Thread DL Neil via Python-list


On 13/06/20 4:47 AM, Terry Reedy wrote:
> On 6/12/2020 2:03 AM, DL Neil via Python-list wrote:
>> Unicode has given us access to a wealth of mathematical and other
>> symbols. Hardware and soft-/firm-ware flexibility enable us to move
>> beyond and develop new 'standards'. Do we have opportunities to make
>> computer programming more math-familiar and/or more
>> logically-expressive, and thus easier to learn and practice? Could we
>> develop Python to take advantage of these opportunities?
>>
>> ...
>>
>> Could we then also 'update' Python, to accept the wider range of
>> symbols instead/in-addition to those currently in-use?
>>
>> Would such even constitute 'a good idea'?
>
> There was a recent thread on python-ideas discussing this.  It started
> with arrow characters.  There have been others.



Am pleased to hear that it's neither 'new' nor 'way out there'...

Am not subscribed to that list. Went looking for its archives, but 
failed - there's no "ideas" on 
(https://mail.python.org/mailman/listinfo). Please send a pointer...

--
Regards =dn

Re: Friday Finking: Beyond implementing Unicode

2020-06-12 Thread Terry Reedy


On 6/12/2020 2:03 AM, DL Neil via Python-list wrote:
> Unicode has given us access to a wealth of mathematical and other
> symbols. Hardware and soft-/firm-ware flexibility enable us to move
> beyond and develop new 'standards'. Do we have opportunities to make
> computer programming more math-familiar and/or more
> logically-expressive, and thus easier to learn and practice? Could we
> develop Python to take advantage of these opportunities?
>
> ...
>
> Could we then also 'update' Python, to accept the wider range of symbols
> instead/in-addition to those currently in-use?
>
> Would such even constitute 'a good idea'?


There was a recent thread on python-ideas discussing this.  It started 
with arrow characters.  There have been others.



--
Terry Jan Reedy


Re: Friday Finking: Beyond implementing Unicode

2020-06-12 Thread Chris Angelico

On Fri, Jun 12, 2020 at 9:11 PM Elliott Roper  wrote:
>
> On 12 Jun 2020 at 09:47:04 BST, "moi"  wrote:
> i) Who cares?

Don't bother responding to him. He's somehow gotten the idea that
Python's Unicode support is broken, and he spews his vomit out onto
the newsgroup periodically. He's blocked from the mailing list, and
for good reason. Ignore him and save yourself the hassle.

ChrisA

Re: Friday Finking: Beyond implementing Unicode

2020-06-12 Thread Elliott Roper

On 12 Jun 2020 at 09:47:04 BST, "moi"  wrote:

> i) Today there are people who still do not understand this:
> 
> >>> 'Å'.encode('utf-8')
> b'\xc3\x85'
> >>> 'Å'.encode('utf-16-le')
> b'\xc5\x00'
> >>> 'Å'.encode('utf-32-le')
> b'\xc5\x00\x00\x00'
> 
> ii) On a Western European Windows, Py 3 is not even working
> correctly with the *characters* of the Windows-1252 coding
> scheme. (As I understand this issue, you may have the same
> problem on, let's say, an iso-8859-2 platform).
> 
> iii) When it works, I mean when it *by chance* works, the
> result is anything but satisfying:
> 
> >>> import timeit
> >>> timeit.timeit("s.encode('utf-8')", "s = 'Universität Zürich' * 1000")
> 50.9616764429
> >>> timeit.timeit("s.encode('utf-8')", "s = 'Universitat Zurich' * 1000")
> 2.488587845973
> >>>
> 
> 
> iv) ...
> v) ...
> vi) ...

i) Who cares?
ii) Breaking News. Windows is mired in backward compatibility.
iii) My 3 year old Mac is 5 times faster than that. Get over it.

Maths always made its greatest advances after notation improved.
Terseness and unambiguity are king.

You are looking backward.
DL Neil is looking forward. A long way forward. It won't be our generation,
our brains are already mis-wired.

-- 
To de-mung my e-mail address:- fsnospam$elliott$$
PGP Fingerprint: 1A96 3CF7 637F 896B C810  E199 7E5C A9E4 8E59 E248



Friday Finking: Beyond implementing Unicode

2020-06-11 Thread DL Neil via Python-list

Unicode has given us access to a wealth of mathematical and other 
symbols. Hardware and soft-/firm-ware flexibility enable us to move 
beyond and develop new 'standards'. Do we have opportunities to make 
computer programming more math-familiar and/or more 
logically-expressive, and thus easier to learn and practice? Could we 
develop Python to take advantage of these opportunities?


TL;DR? Skip to the last paragraphs/block...


Back in the ?good, old days, small eight-bit computers advanced beyond 
many of their predecessors, because we could begin to encode characters 
and "string" them together - as well as computing with numbers.


Initially, we used 7-bit ASCII code (on smaller machines - whereas IBM 
mainframes used EBCDIC, etc). ASCII gave us both upper- and lower-case 
letters, digits, special characters, and control codes. Later this was 
extended to 8 bits, eg as "Code Page 1252", whereby MSFT added more special 
characters, superscripts, fractions, currency symbols, and many ordinary 
and accented letters used in other Western European languages.
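
By way of illustration only (the helper name encodable is made up for this sketch, it is not from any library), the difference between the 7-bit and 8-bit code sets can be probed with Python's built-in codecs:

```python
def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(encodable("Zurich", "ascii"))    # plain 7-bit letters only
print(encodable("Zürich", "ascii"))    # 'ü' lies outside 7-bit ASCII
print(encodable("Zürich", "cp1252"))   # but is present in the 8-bit code page
print(encodable("└", "cp1252"))        # a box-drawing character is not
```

The last case is the same kind of failure that surfaces as a UnicodeEncodeError when output is sent through a cp1252 stream.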


Latterly, we have implemented Unicode, which seeks to include all of the 
world's scripts and languages and may employ multiple bytes per 
'character'. (simplification)
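
A small sketch of the "multiple bytes per character" point, assuming nothing beyond the standard codecs (byte_lengths is a name invented here for illustration):

```python
def byte_lengths(ch):
    """Return the encoded size of ch in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in ["A", "Å", "€", "😀"]:
    # Print the code point rather than the character itself, so the output
    # stays plain ASCII whatever the console encoding happens to be.
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

UTF-8 grows from one to four bytes as the code point rises, UTF-16 uses two or four, and UTF-32 always uses four.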


A massive effort went into Python (well done PyDevs!), and the adoption 
of Unicode in particular made Python 3 a less-than-seamless upgrade 
from Python 2. However, 'standing upon the shoulders of giants', we can 
now take advantage of Unicode both as an encoding for data files, and 
within the code of our own Python applications. We don't often see 
examples of the latter, eg


>>> π = 3.14159
>>> r = 1
>>> circumference = 2 * π * r
>>> print( circumference )
6.28318

>>> Empfänger = "dn"                # Addressee/recipient
>>> Straßenname = "Lansstraße"      # Street name
>>> Immobilien_Hausnummer = "42"    # Building/house number

(whilst the above is valid German, I have 'cheated' in order to add 
suitable characters - for the purposes of illustration to 
EN-monolinguals - apologies for any upset to your sense of "ordnung" - 
please consider the meaning of "42" to restore yourself...)
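
For anyone wanting to check which characters Python will accept in a name, a rough sketch follows. It relies on str.isidentifier() and on the NFKC normalisation of identifiers described in PEP 3131; the sample names are only examples:

```python
import unicodedata

candidates = ["π", "Straßenname", "Empfänger", "×", "2π"]
for text in candidates:
    # ascii() keeps the printed output free of non-ASCII characters.
    print(ascii(text), text.isidentifier())

# Identifiers are normalised (NFKC) when the source is parsed, so a name
# typed with the single-character ligature U+FB01 is stored as plain "fi".
namespace = {}
exec("\ufb01le = 1", namespace)
print(sorted(key for key in namespace if not key.startswith("__")))
print(unicodedata.normalize("NFKC", "\ufb01le") == "file")
```

Letters such as π and ß pass, whereas the multiplication sign is a symbol rather than a letter and is rejected as a name.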



However, we are still shackled to a history where an asterisk (*) is 
used as the multiplication symbol, because "x" was an ASCII letter. 
Similarly, we have the ** for an exponential operator, because we didn't 
have superscripts (per algebraic expression). Worse, we made "=" mean: 
'use the identifier to the left to represent the right-hand-side 
value-result', ie "Let" or "Set" - this despite left-to-right expression 
making it more logical to say: 'transfer this (left-side) value to the 
part on the right', ie 'give all of the chocolate cake to me', as well 
as 'robbing' us of the symbol's usual meaning of "equality" (in Python 
that had to become the "==" symbol). Don't let me get started on "!" 
(exclamation/surprise!) meaning "not"!
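
To make the complaint concrete: today's Python rejects the mathematical symbols outright. The sketch below shows that, and then a purely hypothetical work-around (the SYMBOLS table and both function names are inventions for this example, not any Python feature) which translates a few symbols into their ASCII operators before evaluating:

```python
# Hypothetical mapping from mathematical symbols to Python's ASCII operators.
SYMBOLS = str.maketrans({"×": "*", "÷": "/", "≠": "!=", "≤": "<=", "≥": ">="})

def rejects_symbol(expression):
    """Return True if Python refuses to compile the expression as written."""
    try:
        compile(expression, "<demo>", "eval")
    except SyntaxError:
        return True
    return False

def evaluate(expression):
    """Translate the symbols to ASCII operators, then evaluate the result.

    eval() is used only to keep the sketch short; it is not something to
    point at untrusted input.
    """
    return eval(expression.translate(SYMBOLS), {"__builtins__": {}})

print(rejects_symbol("2 × 3"))   # the raw symbol is a syntax error
print(evaluate("2 × 3"))
print(evaluate("7 ÷ 2"))
print(evaluate("3 ≠ 4"))
```

A blind character-for-character translation like this would also alter symbols inside string literals, so a serious attempt would need to work at the tokeniser level instead.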



There is/was a language called "APL" (and yes the acronym means "A 
Programming Language", and yes it started the craze, through "B" (and 
BCPL), and yes, that brought us "C" - which you are more likely to have 
heard about - and yes then there were DataSci folk, presumably more 
numerate than literate, who thought the next letter to be "R". So, sad!?).


The point of mentioning APL? It allowed the likes of:

AREA←PI×RADIUS⋆2

APL was hopelessly keyboard-unfriendly, requiring multiple key-presses 
or 'over-typing' to produce those arithmetic-operator symbols - 
remember, much of this was on mainframe 3270-style terminals, although 
later PC-implementations have existed (I can't comment on how 'active' 
any community might be). The over-typing was necessary to encode/produce 
the APL symbols which don't exist on a standard typewriter keyboard. Ugh!


I'm glad to have limited my APL-exposure to only reading about it during 
a 'Programming Languages' topic! (If you are 'into' functional 
programming you may like to explore further)



Turning now to "hardware" and the subtle 'limitations' it imposes upon us.

PC-users (see also Apple, and glass-keyboard users) have become wedded 
to the 'standard' 101~105-key "QWERTY"/"AZERTY"/etc keyboards (again, 
restricting myself to European languages - with due apologies). Yet, 
there exists a variety of ways to implement the 'standard', as well as a 
range of other keyboard layouts. Plus we have folk experimenting with 
SBCs, eg Raspberry Pi; learning how to interpret low-level hardware, ie 
key-presses and keyboard "arr
