Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Mikhail V
On 8 December 2016 at 19:45, Chris Angelico  wrote:

> At the moment, you're showing
> minor advantages to decimal, and other people are showing minor
> advantages to hex; but IMO nothing yet has been strong enough to
> justify the implementation of a completely new way to do things -
> remember, people have to understand *both* in order to read code.

If the arguments in the last post are not strong enough, I think
it will be very hard to make them any stronger. In my eyes, the
benefits in this case clearly outweigh the downsides.

And anyway, since I can now use f-strings to input characters,
we can probably all just relax.

And this:
   f"{65:c}{66:c}{67:c}" ,

actually looks significantly better than:
   "\d{65}\d{66}\d{67}",

And it covers the cases I was addressing with the proposal.
I am happy. +1000 to the developers, even if this is an "accidental" feature.
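A quick sketch of the f-string approach, assuming Python 3.6 or later: the `c` presentation type of the format mini-language turns an integer into the character with that code point, so decimal values can be written inline.

```python
# Python 3.6+: the "c" format type converts an int to the matching character
s = f"{65:c}{66:c}{67:c}"
print(s)  # ABC

# The same string spelled with chr() and with the existing hex escapes
assert s == chr(65) + chr(66) + chr(67) == "\x41\x42\x43"
```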
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Chris Angelico
On Fri, Dec 9, 2016 at 5:37 AM, Mikhail V  wrote:
>> You have to show
>> that decimal isn't just marginally better than hex; you have to show
>> that there are situations where the value of decimal character
>> literals is so great that it's worth forcing everyone to learn two
>> systems. And I'm not convinced you've even hit the first point.
>
> Frankly I don't fully understand your point here.

Let me clarify. When you construct a string, you can already use
escapes to represent characters:

"n\u0303" --> n followed by combining tilde

In order to be consistent with other languages, Python *has* to
support hexadecimal. Plus, Python has _already_ supported hex for some
time. To establish decimal as an alternative, you have to demonstrate
that it is worth having ANOTHER way to do this.

With completely green-field topics, you can debate the merits of one
notation against another, and the overall best one will win. But when
there's a well-established existing notation, you have to justify the
proliferation of notations. You have to show that your new format is
*so much* better than the existing one that it's worth adding it in
parallel. That's quite a high bar - not impossible, obviously, but you
need some very strong justification. At the moment, you're showing
minor advantages to decimal, and other people are showing minor
advantages to hex; but IMO nothing yet has been strong enough to
justify the implementation of a completely new way to do things -
remember, people have to understand *both* in order to read code.

ChrisA


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Mikhail V
On 8 December 2016 at 17:52, Chris Angelico  wrote:

> In the first place, many people have pointed out to you that Unicode
> *is* laid out best in hexadecimal.

OK, if it is intentionally aligned on a binary grid, then obviously
hex numbers will show some patterns; nobody argues with that.

And to be fair, take my examples for Cyrillic,
the range start points in hex vs decimal:

capitals:
U+0410 (decimal 1040)
lowercase:
U+0430 (decimal 1072)

So I need to remember only one number, 1040. Then, if I know there
are 32 letters (not counting Ё), I just add 1040 + 32 and get 1072,
which is the beginning of the lowercase range. There are of course
people who can efficiently add and subtract in hex in their heads,
but I am not one of them (guess who is in the minority here), and
there is no need to do it in this case. So if I know the distances
between ranges, I can do it all much more easily in my head.
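The arithmetic above can be checked in a couple of lines. This is a minimal sketch; the constant names are illustrative. It also shows that the hex spelling of the same offset is 0x20, so both notations land on U+0430 (1072).

```python
CAPS_START = 1040      # U+0410, CYRILLIC CAPITAL LETTER A
LETTERS = 32           # А..Я, not counting Ё

lower_start = CAPS_START + LETTERS
print(lower_start, hex(lower_start))   # 1072 0x430
assert lower_start == 0x0410 + 0x20    # the same offset written in hex

# The offset converts any basic capital to its lowercase form
print(chr(ord("Ж") + LETTERS))         # ж
```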

Not a strong argument?
To be more pedantic: if you know that the Russian alphabet has
exactly 33 letters, and not 32 as one might guess from the Unicode
table, you may also have noticed that the letter Ё is U+0401 and ё
is U+0451.

This means they are torn away from the other letters and do not
even lie in the range. In practice, if I want to filter against
code ranges, I need to check the values U+0401 and U+0451
separately. Is that not because someone decided to align the
alphabet this way? Alignment is not a bad idea, but it should not
contradict common sense.
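A minimal sketch of the range filter described above; `is_russian_letter` is a hypothetical helper name. The contiguous run covers А..я, and Ё/ё have to be checked separately because they sit outside it.

```python
# Ё (U+0401, 1025) and ё (U+0451, 1105) lie outside the contiguous А..я run
# (U+0410..U+044F, 1040..1103), so a range filter needs two extra checks.
YO_UPPER, YO_LOWER = 1025, 1105

def is_russian_letter(ch):
    """Hypothetical helper: True for the 33 upper- and 33 lowercase Russian letters."""
    code = ord(ch)
    return 1040 <= code <= 1103 or code in (YO_UPPER, YO_LOWER)

text = "Ёлка и ёж, tree"
print("".join(c for c in text if is_russian_letter(c)))
```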

> You have to show
> that decimal isn't just marginally better than hex; you have to show
> that there are situations where the value of decimal character
> literals is so great that it's worth forcing everyone to learn two
> systems. And I'm not convinced you've even hit the first point.

Frankly, I don't fully understand your point here. Everyone knows
decimal; the address of an element in a table is a number. In most
cases I don't need to learn it by heart, since it is already
known and written in some table on your PC.

Also, inputting characters by decimal code is a very common thing:
Alt key combos (Alt+0192) are very well established, and many
people *do* learn decimal code points by heart, including me. So
now it is you who wants me to learn two numbering systems for no
reason.
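A hedged note on the Alt-code example: on Windows, Alt+0 followed by a number typically selects a byte in the active ANSI code page (commonly Windows-1252 for Western locales), not necessarily the Unicode code point with that decimal value. For 192 the two agree; for 128..159 they differ. The sketch below assumes Windows-1252, and `alt_code_char` is an illustrative name.

```python
# Assumption: the Alt+0NNN number is interpreted in Windows-1252 (cp1252).
def alt_code_char(n, codepage="cp1252"):
    """Illustrative helper: decode the single byte value n in the given code page."""
    return bytes([n]).decode(codepage)

print(alt_code_char(192), chr(192))         # À À  (cp1252 and Unicode agree)
print(alt_code_char(128), repr(chr(128)))   # € '\x80'  (they differ in 128..159)
```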

And even with all that said, that is not the strongest argument.
Most important is that hex notation is ugly, and in this case
there is too little reason to introduce it into an algorithm that
just checks ranges and specific values. And for *specific single*
values it is absolutely irrelevant what alignment you have.
You just choose what is more readable and/or more common
for abstract numbers. But that is another big question, and the
current hex notation does not fall into the category
"more readable" anyway.


Mikhail

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread David Mertz
The Unicode Consortium reference entirely lacks decimal values in all their
tables. EVERYTHING is given solely in hex. I'm sure someone somewhere has
created a table with decimal values, but it's very rare.

We should not change Python syntax because exactly one user prefers decimal
representations. At most there can be an external library to cover strings
in whatever manner he wants. Why is octal being neglected for us old
fogeys?! 😏
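The external-library route David mentions could look roughly like this minimal sketch. `expand_decimal_escapes` is a hypothetical helper, not an existing package; it rewrites the proposed `\{NNNN}` syntax into characters at run time. The input is a raw string so that Python leaves the backslashes alone.

```python
import re

# Matches the proposed \{NNNN} syntax: a backslash, "{", decimal digits, "}"
_DECIMAL_ESCAPE = re.compile(r"\\\{(\d+)\}")

def expand_decimal_escapes(text):
    """Hypothetical helper: replace every \\{NNNN} in text with chr(NNNN)."""
    return _DECIMAL_ESCAPE.sub(lambda m: chr(int(m.group(1))), text)

s = expand_decimal_escapes(r"first cyrillic letters: \{1040}\{1041}\{1042}")
print(s)  # first cyrillic letters: АБВ
```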

On Dec 7, 2016 6:11 PM, "Mikhail V"  wrote:

> On 8 December 2016 at 01:57, Nick Timkovich wrote:
> >> hex notation is not so readable, and anyway decimal is kind of the
> >> standard way to represent numbers
> >
> >
> > Can you cite some examples of Unicode reference tables I can look up a
> > decimal number in? They seem rare; perhaps in a list as a secondary
> > column, but they're not organized/grouped decimally. Readability counts,
> > and introducing a competing syntax will make it harder for others to read.
>
> There were links to such a table in the previous discussion. Google
> "unicode table decimal" and the first link will be one.
> I think most online tables include decimals as well, usually as tuples
> of 8-bit decimals.
> Also, earlier the decimal code was the first column in most tables, but
> it somehow settled in people's minds that the hex reference should be
> preferred, for no solid reason IMO.
> One reason, I think, is the HTML standards, which started to use it in
> HTML files long ago and had much influence later, but one should
> understand that it is just for brevity in most cases. Another reason is
> that file viewers show hex by default, but that is just a misfortune;
> nothing besides brevity and 4-bit word alignment speaks for hex
> notation, unfortunately, at least in its current typeface.
> This was actually discussed in that thread.
> Many people also think they are cool hackers if they do everything in
> hex :)
> In some cases it is worth it, but not in this case IMO. Mainly for
> bitwise stuff, but then one should look into binary/trinary/quaternary
> representation depending on the nature of the operations and hardware.
>
> Yes, Unicode table pages do correspond to the hex reference, but that
> hardly plays any positive role for real applications. Most of the time I
> need to look in my code and perform number operations on *specific*
> ranges and codes, not on whole pages of the table. This could only play
> a role if I do low-level filtering of large files and want to filter
> data by a character's page, but that is the only positive thing I can
> think of, and I don't think it applies directly to Python.
>
> Imagine some cryptography exercise: you take 27 units, you just give
> them numbers (0..26) and you do calculations. Yes, you can view the
> results as hex numbers, but I don't, and most people don't and should
> not, since why? It is ugly and not readable.

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Random832
On Thu, Dec 8, 2016, at 11:06, Mikhail V wrote:
> Some sites do not provide any code conversion, but anyone can
> do it easily, and I have no problem generating a table
> programmatically.
> And I hope it is clear why most people stick to hex (I never argued
> with that, BTW), but it is mostly historical; it has nothing to do
> with "logical".

The problem is that there's a logic associated with how the character
sets are designed.

The character table works a lot better with rows of 16 than with rows of
10 or 20. In many blocks you get the uppercase letters lined up above
the lowercase letters, for example. And if your rows are 16 (or 32,
though that doesn't work as well for unicode because e.g. the Cyrillic
basic set А-Я/а-я starts from 0x410), then your row and column labels
work better in hex because you've lined up 0x40 above 0x50 and 0x60,
which share the last digit, unlike 64/80/96, and the whole row (or half
the row for 32) shares all but the last digit.

And those values differ by only one bit, too. Even if we were to
arrange the characters themselves in rows of 10/20, so that you've got
30 or 40 characters in an "alphabet row", you'd have to add or subtract
to change the case, whereas many early character sets were designed to
make this possible by changing a single bit, for bit-paired keyboards.
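A minimal sketch of the bit-pairing described above: in ASCII, upper- and lowercase letters differ only in bit 0x20, so one XOR toggles case. The basic Cyrillic block is not bit-paired the same way. Р..Я start at U+0420, which already has bit 0x20 set, so there the offset has to be added rather than toggled. `toggle_ascii_case` is an illustrative name.

```python
def toggle_ascii_case(ch):
    """Illustrative helper: flip bit 0x20; only meaningful for ASCII letters."""
    return chr(ord(ch) ^ 0x20)

assert toggle_ascii_case("A") == "a" and toggle_ascii_case("z") == "Z"
print(bin(ord("A")), bin(ord("a")))   # 0b1000001 0b1100001

# Cyrillic: adding 0x20 works across the whole А..Я range...
assert chr(ord("Р") + 0x20) == "р"    # U+0420 -> U+0440
# ...but XOR does not, because U+0420 already has bit 0x20 set
assert chr(ord("Р") ^ 0x20) != "р"    # gives U+0400 instead
```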

What looks better?

Hex:
АБВГДЕЖЗИЙКЛМНОП
РСТУФХЦЧШЩЪЫЬЭЮЯ
абвгдежзийклмноп
рстуфхцчшщъыьэюя

Decimal:
АБВГДЕЖЗИЙКЛМНОПРСТУ
ФХЦЧШЩЪЫЬЭЮЯабвгдежз
ийклмнопрстуфхцчшщъы
ьэюя

And it's only luck that the uppercase Russian alphabet starts at the
beginning of a line. The ASCII section with the English alphabet looks
like this in decimal:
<=>?@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_`abc
defghijklmnopqrstuvw
xyz

compared to this in hex:
@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
`abcdefghijklmno
pqrstuvwxyz
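The two layouts above can be generated rather than typed. In this sketch, `layout` is an illustrative helper that splits a code-point range into rows whose first code point is a multiple of the chosen width, which is how both the hex (16) and decimal (20) tables are arranged.

```python
def layout(start, stop, width):
    """Illustrative helper: characters in [start, stop) as rows starting at multiples of width."""
    rows = []
    for base in range(start - start % width, stop, width):
        lo, hi = max(base, start), min(base + width, stop)
        rows.append("".join(map(chr, range(lo, hi))))
    return rows

for row in layout(0x410, 0x450, 16):   # Cyrillic А..я in hex-aligned rows
    print(row)
for row in layout(60, 123, 20):        # ASCII '<'..'z' in decimal-aligned rows
    print(row)
```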


> There is just a tendency to repeat what the majority does, and
> that is not always good; this case would be an example.

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Chris Angelico
On Fri, Dec 9, 2016 at 3:06 AM, Mikhail V  wrote:
> Results for "unicode table" in google:
>
> Top Result # 2:
> www.utf8-chartable.de/
>
> Top Result # 4:
> http://www.tamasoft.co.jp/en/general-info/index.html

Both of those show hex first, and decimal as an additional feature.

> Some sites do not provide any code conversion, but anyone can
> do it easily, and I have no problem generating a table programmatically.
> And I hope it is clear why most people stick to hex (I never argued with that, BTW),
> but it is mostly historical; it has nothing to do with "logical". There is
> just a tendency to repeat what the majority does, and that is not always
> good; this case would be an example.

In the first place, many people have pointed out to you that Unicode
*is* laid out best in hexadecimal. (Another example: umop apisdn ?!
are ¿¡, which are ?! with one high bit set.) But in the second place,
"what the majority does" actually IS a strong argument. It's called
consistency. Why is "\r" a carriage return? Wouldn't it be more
logical to use "\c" for that? Except that EVERYONE uses \r for it. And
the one time in my life that I found "\123" to mean "{" rather than
"S", it was a great frustration for me:

http://rosuav.blogspot.com.au/2012/12/i-want-my-octal.html

And that's the choice between decimal and *octal*, which is a far less
well known base than hex is. I would still prefer octal, because it's
consistent.
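Both claims in this message can be checked directly in Python. A backslash followed by up to three octal digits is an octal escape, so "\123" is "S"; reading 123 as decimal would give "{". Setting the high bit 0x80 maps ? and ! to ¿ and ¡.

```python
# Octal escape: "\123" is chr(0o123) == chr(83)
assert "\123" == chr(0o123) == chr(83) == "S"
# The decimal reading of 123 would have been "{"
assert chr(123) == "{"

# Setting the high bit maps ?! to their inverted forms
assert chr(ord("?") | 0x80) == "¿"   # 0x3F -> 0xBF
assert chr(ord("!") | 0x80) == "¡"   # 0x21 -> 0xA1
```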

So because of consistency, Python needs to support "\u0303" to mean
COMBINING TILDE, and any competing notation has to be in addition to
that. Can you justify the confusion of sometimes working with hex and
sometimes decimal? It's a pretty high bar to attain. You have to show
that decimal isn't just marginally better than hex; you have to show
that there are situations where the value of decimal character
literals is so great that it's worth forcing everyone to learn two
systems. And I'm not convinced you've even hit the first point.

ChrisA

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Emanuel Barry
> From: Mikhail V
> Sent: Thursday, December 08, 2016 11:07 AM
> Subject: Re: [Python-ideas] Input characters in strings by decimals (Was:
> Proposal for default character representation)
> No, I don't need to specify "unicode table *decimal*".
> 
> Results for "unicode table" in google:
> 
> Top Result # 2:
> www.utf8-chartable.de/
> 
> Top Result # 4:
> http://www.tamasoft.co.jp/en/general-info/index.html

Except that both of these websites show you hexadecimal notation.

> And I hope it is clear why most people stick to hex (I never argued with that,
> BTW), but it is mostly historical; it has nothing to do with "logical".

That's not true. Characters are sorted by ranges. For example, I know that
everything below 0x20 is a control code, uppercase ASCII letters start at 0x41
(0x40 is '@') and lowercase ASCII letters start at 0x61 (where 0x60 is '`');
that is trivial to remember. I also know that ASCII only uses the lower half
of a byte's range, up to 0x7f (just below 0x80, half of 0x100). For instance,
the first letter of my name is 0xc9, and anyone can tell at a glance, without
knowing my name or what the letter is, that it's not ASCII.

Also, as far as I know, lowercase letters (ASCII or not) begin some multiple
of 0x10 after the beginning of the uppercase letters (0x20 for ASCII or
latin-1). As such, since I know that 'É' is 0xc9, I can know, without even
looking, that 0xe9 is 'é'. That would be a lot trickier in decimal to
remember and get right. As an aside, and I don't know this by heart, various
sets of characters begin at fixed points, and knowing those points (when you
need to work with specific sets of characters) can be very useful.
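A minimal sketch of the two observations above: ASCII code points are below 0x80, and in ASCII and Latin-1 the lowercase letter sits 0x20 above its uppercase form. `is_ascii_code` is an illustrative name; Python 3.7 later added `str.isascii()` for the first check.

```python
def is_ascii_code(code):
    """Illustrative helper: ASCII is 0x00..0x7F, i.e. the byte's high bit is clear."""
    return code < 0x80

e_acute_upper = ord("É")
print(hex(e_acute_upper), e_acute_upper)   # 0xc9 201
assert not is_ascii_code(e_acute_upper)

# In ASCII and Latin-1, lowercase = uppercase + 0x20
assert chr(e_acute_upper + 0x20) == "é"    # 0xe9
assert chr(ord("A") + 0x20) == "a"
```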

If you look at a website (https://unicode-table.com/ seems good), you can
even select ranges of characters, which conveniently end up being multiples
of 0x10 (or 16 in decimal). If your point is "it's easier to work with
numbers ending with 0", then you'll be pleased to know that character sets
are actually designed so that, using hexadecimal notation, you're dealing
with numbers ending with 0! Doing this using decimal notation is clunky at
best.

Yours,
\xc9manuel


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Mikhail V
On 8 December 2016 at 15:46, Alexandre Brault  wrote:

>>> Can you cite some examples of Unicode reference tables I can look up a
>>> decimal number in? They seem rare; perhaps in a list as a secondary column,
>>> but they're not organized/grouped decimally. Readability counts, and
>>> introducing a competing syntax will make it harder for others to read.
>> There were links to such a table in the previous discussion. Google
>> "unicode table decimal" and the first link will be one.
>> I think most online tables include decimals as well, usually as tuples
>> of 8-bit decimals.

> The fact that you need to specify "unicode table *decimal*" in your
> search, and that even then around half of the top results give the table
> in hex, to me illustrates quite well how much of a minority opinion
> "writing unicode characters in decimal is more logical" is.

No, I don't need to specify "unicode table *decimal*".

Results for "unicode table" in google:

Top Result # 2:
www.utf8-chartable.de/

Top Result # 4:
http://www.tamasoft.co.jp/en/general-info/index.html

Some sites do not provide any code conversion, but anyone can
do it easily, and I have no problem generating a table programmatically.
And I hope it is clear why most people stick to hex (I never argued with that, BTW),
but it is mostly historical; it has nothing to do with "logical". There is
just a tendency to repeat what the majority does, and that is not always
good; this case would be an example.
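Generating such a table programmatically is indeed short. This minimal sketch prints hex label, decimal value, character, and Unicode name side by side. `code_table` is an illustrative name; the names come from the standard `unicodedata` module.

```python
import unicodedata

def code_table(start, stop):
    """Illustrative helper: yield (hex label, decimal, char, name) for code points in [start, stop)."""
    for code in range(start, stop):
        ch = chr(code)
        yield f"U+{code:04X}", code, ch, unicodedata.name(ch, "<unnamed>")

for hex_label, dec, ch, name in code_table(0x0410, 0x0414):
    print(f"{hex_label}  {dec:5d}  {ch}  {name}")
```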


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Alexandre Brault
On 2016-12-07 09:07 PM, Mikhail V wrote:
> On 8 December 2016 at 01:57, Nick Timkovich  wrote:
>>> hex notation is not so readable, and anyway decimal is kind of the
>>> standard way to represent numbers
>>
>> Can you cite some examples of Unicode reference tables I can look up a
>> decimal number in? They seem rare; perhaps in a list as a secondary column,
>> but they're not organized/grouped decimally. Readability counts, and
>> introducing a competing syntax will make it harder for others to read.
> There were links to such a table in the previous discussion. Google
> "unicode table decimal" and the first link will be one.
> I think most online tables include decimals as well, usually as tuples
> of 8-bit decimals.
The fact that you need to specify "unicode table *decimal*" in your
search, and that even then around half of the top results give the table
in hex, to me illustrates quite well how much of a minority opinion
"writing unicode characters in decimal is more logical" is.


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Victor Stinner
FYI you can also get a character by its name:

>>> import unicodedata
>>> unicodedata.name(chr(1040))
'CYRILLIC CAPITAL LETTER A'
>>> "\N{CYRILLIC CAPITAL LETTER A}"
'А'
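The reverse direction also exists in the standard library: `unicodedata.lookup()` maps a name to its character, and `ord()` then gives the code point in whatever base you print it.

```python
import unicodedata

ch = unicodedata.lookup("CYRILLIC CAPITAL LETTER A")   # name -> character
print(ord(ch), hex(ord(ch)))                           # 1040 0x410
assert ch == "\N{CYRILLIC CAPITAL LETTER A}" == chr(1040) == "\u0410"
```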

Victor

2016-12-08 0:52 GMT+01:00 Mikhail V :
> In a past discussion about inputting and printing characters,
> I proposed decimal notation instead of hex.
> Since the discussion got lost in off-topic talk, I'll try to
> summarise my idea better.
>
> I use ASCII only for code input (there are good reasons for that).
> Here I'll use Python 3.6 and Windows 7, so I can use print() with Unicode
> directly, and it now works in the system console.
>
> Suppose I am just starting to program and want to do some character manipulation.
> The very first thing I would probably start with is a simple output of
> Latin and Cyrillic capital letters:
>
> caps_lat = ""
> for o in range(65, 91):
>     caps_lat = caps_lat + chr(o)
> print(caps_lat)
>
> caps_cyr = ""
> for o in range(1040, 1072):
>     caps_cyr = caps_cyr + chr(o)
> print(caps_cyr)
>
>
> Which prints:
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
>
>
> Say I now want to input something directly in code:
>
> s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042)
>
> This works fine and looks clean. However, it is not very convenient
> because of all the typing, and if I generate such strings, it
> adds a bit more complexity. But in general it is fine, and I use this
> method currently.
>
> =
> Proposal: I would want to have a possibility to input it *by decimals*:
>
> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
> or:
> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>
> =
>
> This is more compact and does not seem to contradict the
> current Python escape characters in string literals, since
> a backslash starts some kind of escape in most cases.
>
> For me, most important is that this way I would avoid
> any hex numbers in strings, which I find very good
> for readability, and it is very convenient for me since I use decimals
> for processing everywhere (and encourage everyone to do so).
>
> So this is my proposal, any comments on this are appreciated.
>
>
> PS:
>
> Currently Python 3 supports these in addition to \x:
> (from https://docs.python.org/3/howto/unicode.html)
> """
> If you can’t enter a particular character in your editor or want to keep
> the source code ASCII-only for some reason, you can also use escape
> sequences in string literals.
>
> >>> "\N{GREEK CAPITAL LETTER DELTA}"  # Using the character name
> >>> "\u0394"  # Using a 16-bit hex value
> >>> "\U00000394"  # Using a 32-bit hex value
>
> """
> So I have many possibilities, and all of them strangely contradict
> my image of intuitive and readable. Well, using the character name is
> readable, but seriously not much of a practical solution for input,
> though it could be very useful for printing a character's description.
>
>
> Mikhail
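For comparison, a more idiomatic sketch of the loops in the quoted proposal builds each string with `str.join` instead of repeated concatenation. It prints the same two lines.

```python
caps_lat = "".join(chr(o) for o in range(65, 91))       # A..Z
caps_cyr = "".join(chr(o) for o in range(1040, 1072))   # А..Я (without Ё)
print(caps_lat)
print(caps_cyr)
```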

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-08 Thread Paul Moore
On 7 December 2016 at 23:52, Mikhail V  wrote:
> Proposal: I would want to have a possibility to input it *by decimals*:
>
> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
> or:
> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>
> =
>
> This is more compact and does not seem to contradict the
> current Python escape characters in string literals, since
> a backslash starts some kind of escape in most cases.
>
> For me, most important is that this way I would avoid
> any hex numbers in strings, which I find very good
> for readability, and it is very convenient for me since I use decimals
> for processing everywhere (and encourage everyone to do so).
>
> So this is my proposal, any comments on this are appreciated.

-1. We already have plenty of ways to specify characters in
strings[1], we don't need another.

If readability is what matters to you, and you (unlike many others)
consider hex to be unreadable, use the \N{...} form.
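The existing spellings can be checked side by side. This minimal sketch (Python 3.6+ for the f-string line) shows that the name escape, the 16-bit and 32-bit hex escapes, `chr()` with a decimal or hex argument, and the `c` format type all produce GREEK CAPITAL LETTER DELTA (U+0394, decimal 916). Note that `\U` takes exactly eight hex digits.

```python
delta = "\N{GREEK CAPITAL LETTER DELTA}"   # by name
assert delta == "\u0394"                   # 16-bit hex escape (4 digits)
assert delta == "\U00000394"               # 32-bit hex escape (exactly 8 digits)
assert delta == chr(916) == chr(0x394)     # chr() accepts any int literal
assert delta == f"{916:c}"                 # f-string "c" format type (3.6+)
print(delta, ord(delta))                   # Δ 916
```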

Paul

[1] Including (ab)using f-strings to hide the use of chr().


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Greg Ewing

Mikhail V wrote:

> first Ethan said it will never be implemented, and it turns out
> it has already been implemented.


Only by accident -- I don't think anyone anticipated that
f-strings would be used that way!

--
Greg


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
On 8 December 2016 at 05:39, Random832  wrote:
> On Wed, Dec 7, 2016, at 22:06, Mikhail V wrote:
>> So you were hooked on hex from the beginning, I see ;)
>> I, on the contrary, back in the dark times when I was learning
>> programming (in C), always oriented myself by decimal codes,
>> and I don't regret it now.
>
> C doesn't support decimal in string literals either, only octal and hex
> (incidentally octal seems to have been much more common in the
> environments where C was first invented). I can think of one context
> where decimal is used for characters, actually, now that I think about
> it. ANSI/ISO standards for 8-bit character sets often use a 'split'
> decimal format (e.g. DEL = 7/15 rather than 0x7F or 127).


That is true, it does not support decimals in string literals,
but I don't remember (it was more than 10 years ago) using
anything but decimals for text processing in C. Normally I would
load a file into memory, iterate over the bytes, compare the
values, and so on. And it is very foggy in my memory, but at that
time most ASCII tables included decimals, and they normally stood
in the first column; but I may be wrong now, I have to google
some original tables.
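For readers coming from C, the same byte-level loop in Python 3 can be sketched as follows: iterating over a `bytes` object yields plain integers, so comparisons can use decimal (or hex) literals directly. The sample data is made up for illustration.

```python
data = b"Hello, World 123"   # illustrative data; in practice e.g. open(path, "rb").read()

uppercase = 0
for b in data:               # each b is an int in 0..255
    if 65 <= b <= 90:        # 'A'..'Z' (0x41..0x5A)
        uppercase += 1
print(uppercase)             # 2
```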

Jeez, how positively this thread turned out: first Ethan said it
will never be implemented, and it turns out it has already been
implemented. Christmas magic.


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Random832
On Wed, Dec 7, 2016, at 22:06, Mikhail V wrote:
> So you were hooked on hex from the beginning, I see ;)
> I, on the contrary, back in the dark times when I was learning
> programming (in C), always oriented myself by decimal codes,
> and I don't regret it now.

C doesn't support decimal in string literals either, only octal and hex
(incidentally octal seems to have been much more common in the
environments where C was first invented). I can think of one context
where decimal is used for characters, actually, now that I think about
it. ANSI/ISO standards for 8-bit character sets often use a 'split'
decimal format (e.g. DEL = 7/15 rather than 0x7F or 127).
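The 'split' notation is simply the two hex digits of the byte written in decimal: the column is the high nibble and the row is the low nibble. A minimal sketch, with an illustrative helper name:

```python
def column_row(code):
    """Illustrative helper: format an 8-bit code in the split 'column/row' notation."""
    column, row = divmod(code, 16)
    return f"{column}/{row}"

print(column_row(127))    # 7/15  (DEL, 0x7F)
print(column_row(0x41))   # 4/1   ('A')
```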


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Jonathan Goble
On Wed, Dec 7, 2016 at 10:45 PM, Mikhail V  wrote:
> Big, big thanks! I didn't know about this feature, even though I had
> googled a lot for "input characters as decimals". So it was just added?
> More evidence that Python rules!

Yes, f-strings are a new feature in Python 3.6, which is currently in
the release candidate stage. The final release of 3.6.0 (and thus the
first stable release with this feature) is scheduled for December 16.


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
On 8 December 2016 at 03:32, Matthias welp  wrote:
> Dear Mikhail,
>
> With python3.6 you can use format strings to get very close to your
> desired behaviour:
>
> f"{48:c}" == "0"
> f"{:c}" == chr()
>
> It works with variables too:
>
> charvalue = 48
> f"{charvalue:c}" == chr(charvalue) # == "0"
>

Waaa! This works!

>
> I hope this helps solve your apparent usability problem.

Big, big thanks! I didn't know about this feature, even though I had
googled a lot for "input characters as decimals". So it was just added?
More evidence that Python rules!

I'll rewrite some code; I hope it won't cause any side issues.

Mikhail


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
On 8 December 2016 at 03:36, Alexander Belopolsky
 wrote:
>
> On Wed, Dec 7, 2016 at 9:07 PM, Mikhail V  wrote:
>>
>> it somehow settled in
>> people's minds that the hex reference should be preferred, for no solid reason
>> IMO.
>
> I may be showing my age, but all the facts that I remember about ASCII codes
> are in hex:
>
> 1. SPACE is 0x20 followed by punctuation symbols.
> 2. Decimal digits start at 0x30 with '0' = 0x30, '1' = 0x31, ...
> 3. @ is 0x40 followed by upper-case letter: 'A' = 0x41, 'B' = 0x42, ...
> 4. Lower-case letters are offset by 0x20 from the uppercase ones: 'a' =
> 0x61, 'b' = 0x62, ...
>
> Unicode is also organized around hexadecimal codes with various scripts
> positioned in sections that start at round hexadecimal numbers.  For example
> Cyrillic is at 0x0400 through 0x4FF.
>
> The only decimal fact I remember about Unicode is that the largest
> code-point is 1114111 - a palindrome!

As an aside, I've just noticed that in my example:
s = "first cyrillic letters: \{1040}\{1041}\{1042}"
s = "first cyrillic letters: \u0410\u0411\u0412"

the hex and decimal codes are made up of the same digits. Such a
peculiar coincidence...

So you were hooked on hex from the beginning, I see ;)
I, on the contrary, back in the dark times when I was learning
programming (in C), always oriented myself by decimal codes,
and I don't regret it now.


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Alexander Belopolsky
On Wed, Dec 7, 2016 at 9:07 PM, Mikhail V  wrote:
>
> it somehow settled in
> people's minds that the hex reference should be preferred, for no solid
> reason IMO.

I may be showing my age, but all the facts that I remember about ASCII
codes are in hex:

1. SPACE is 0x20 followed by punctuation symbols.
2. Decimal digits start at 0x30 with '0' = 0x30, '1' = 0x31, ...
3. @ is 0x40 followed by upper-case letter: 'A' = 0x41, 'B' = 0x42, ...
4. Lower-case letters are offset by 0x20 from the uppercase ones: 'a' =
0x61, 'b' = 0x62, ...
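The four ASCII facts above can be checked directly. The last line shows one practical consequence of fact 2: the low nibble of a digit's code is its numeric value.

```python
assert ord(" ") == 0x20                        # 1. SPACE
assert ord("0") == 0x30 and ord("1") == 0x31   # 2. digits start at 0x30
assert ord("@") == 0x40 and ord("A") == 0x41   # 3. '@' then the capitals
assert ord("a") - ord("A") == 0x20             # 4. lowercase offset

# Fact 2 in practice: the low nibble of a digit's code is its value
print([ord(c) & 0x0F for c in "2016"])         # [2, 0, 1, 6]
```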

Unicode is also organized around hexadecimal codes with various scripts
positioned in sections that start at round hexadecimal numbers.  For
example, Cyrillic is at 0x0400 through 0x4FF
(http://unicode.org/charts/PDF/U0400.pdf).
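Because the Cyrillic block starts on a round hex boundary, a block-membership test is also just "the high byte of the code point is 0x04". A minimal sketch, with an illustrative helper name; the decimal bounds are 1024..1279.

```python
CYRILLIC_FIRST, CYRILLIC_LAST = 0x0400, 0x04FF   # decimal 1024..1279

def in_cyrillic_block(ch):
    """Illustrative helper: True if ch lies in the Cyrillic block U+0400..U+04FF."""
    return CYRILLIC_FIRST <= ord(ch) <= CYRILLIC_LAST

# Equivalent to checking that the high byte is 0x04
assert all(in_cyrillic_block(chr(c)) == (c >> 8 == 0x04) for c in range(0x10000))
print(in_cyrillic_block("Ж"), in_cyrillic_block("Z"))   # True False
```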

The only decimal fact I remember about Unicode is that the largest
code-point is 1114111 - a palindrome!
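The largest code point is exposed as `sys.maxunicode` (on Python 3.3 and later), which makes the palindrome easy to confirm.

```python
import sys

print(sys.maxunicode, hex(sys.maxunicode))   # 1114111 0x10ffff
digits = str(sys.maxunicode)
assert digits == digits[::-1]                # reads the same backwards
```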

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Matthias welp
Dear Mikhail,

With python3.6 you can use format strings to get very close to your
desired behaviour:

f"{48:c}" == "0"
f"{n:c}" == chr(n)  # for any integer code point n

It works with variables too:

charvalue = 48
f"{charvalue:c}" == chr(charvalue) # == "0"


This is only 1 character of overhead plus 1 extra character per formatted
char compared to your example. And as a bonus you can use hex
literals (f"{0x30:c}" == "0") and any other integer literal you might want.
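Collected into one runnable snippet (Python 3.6+), the examples above behave as described; the last lines apply the same `:c` format spec to the Cyrillic string from the original proposal. This is a quick sketch, not an exhaustive test:

```python
# Python 3.6+: the ":c" format spec turns an integer into the character with that code point.
charvalue = 48
assert f"{48:c}" == "0"
assert f"{charvalue:c}" == chr(charvalue) == "0"
assert f"{0x30:c}" == "0"   # any integer literal works, hex included

# The Cyrillic example from the proposal, written with decimal codes:
s = f"first cyrillic letters: {1040:c}{1041:c}{1042:c}"
print(s)
assert s == "first cyrillic letters: \u0410\u0411\u0412"
```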

I don't see the added value of making character escapes in a non-default
way only (chars escaped + 1) bytes shorter, with the added maintenance
and development cost.

I think that you can do a lot with f-strings, and using the built-in formatting
options you can already get the behaviour you want in Python 3.6, months
earlier than the next opportunity (Python 3.7).

Check out the formatting options for integers and other built-in types here:
https://docs.python.org/3.6/library/string.html#format-specification-mini-language

I hope this helps solve your apparent usability problem.


-Matthias



Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
On 8 December 2016 at 01:52, MRAB  wrote:
> On 2016-12-07 23:52, Mikhail V wrote:
...
>> =
>> Proposal: I would want to have a possibility to input it *by decimals*:
>>
>> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
>> or:
>> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>>
>> =
>>

> It's usually the case that escapes are \ followed by an ASCII-range letter
> or digit; \ followed by anything else makes it a literal, even if it's a
> metacharacter, e.g. " terminates a string that starts with ", but \" is a
> literal ", so I don't like \{...}.
>
> Perl doesn't have \u... or \U..., it has \x{...} instead, and Python already
> has \N{...}, so:
>
> s = "first cyrillic letters: \d{1040}\d{1041}\d{1042}"
>
> might be better,

I like this, and I agree it corresponds to the current style better.

> but I'm still -1 because hex is usual when referring to
> Unicode codepoints.

:-(


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
On 8 December 2016 at 01:57, Nick Timkovich  wrote:
>> hex notation not so readable and anyway decimal is kind of standard way to
>> represent numbers
>
>
> Can you cite some examples of Unicode reference tables I can look up a
> decimal number in? They seem rare; perhaps in a list as a secondary column,
> but they're not organized/grouped decimally. Readability counts, and
> introducing a competing syntax will make it harder for others to read.

There were links to such tables in the previous discussion. Google
"unicode table decimal" and it will be the first link.
I think most online tables include decimals as well, usually as tuples
of 8-bit decimals.
Also, the decimal code used to be the first column in most tables, but
it somehow settled in people's minds that the hex reference should be
preferred, for no solid reason IMO.
One reason, I think, is the HTML standards, which started using it in HTML
files long ago and had much influence later, but one should understand
that it is just for brevity in most cases. Another reason is that file
viewers show hex by default, but that is just a misfortune: nothing besides
brevity and 4-bit word alignment speaks for hex notation, unfortunately,
at least in its current typeface.
This was actually discussed in that thread.
Many people also think they are cool hackers if they do everything in hex :)
In some cases it is worth it, but not in this case IMO. It is mainly useful
for bitwise stuff, but then one should look into binary/trinary/quaternary
representation depending on the nature of the operations and hardware.
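The message does not spell out what "tuples of 8-bit decimals" means. Assuming it refers to the UTF-8 bytes of a character written in decimal, as some online tables list them, they can be produced like this (the helper name `utf8_decimal_bytes` is hypothetical):

```python
# Assumption: "tuples of 8-bit decimals" = the UTF-8 encoding of a character, byte by byte, in decimal.
def utf8_decimal_bytes(ch: str) -> tuple:
    """Return the UTF-8 bytes of a single character as a tuple of decimal ints."""
    return tuple(ch.encode("utf-8"))

for ch in ("A", "\u0410"):
    print(ord(ch), utf8_decimal_bytes(ch))
# prints:
# 65 (65,)
# 1040 (208, 144)
```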

Yes, the pages of the Unicode table correspond to the hex reference,
but that hardly plays any positive role in real applications. Most of
the time I need to look at my code and perform number operations on
*specific* ranges and codes, not on whole pages of the table. It could
only play a role if I do low-level filtering of large files and want to
filter out data by the character's page, but that is the only positive
thing I can think of, and I don't think it is directly relevant to Python.
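The "page" filtering mentioned above amounts to looking at the high bits of a code point. A minimal sketch, where the helper name `chars_on_page` and the 256-code-point page size are illustrative assumptions:

```python
def chars_on_page(text: str, page: int) -> str:
    """Keep only the characters whose code point lies on the given 256-code-point page."""
    # ord(ch) >> 8 drops the low two hex digits, i.e. the position within the page.
    return "".join(ch for ch in text if ord(ch) >> 8 == page)

sample = "abc \u0410\u0411\u0412 xyz"
print(chars_on_page(sample, 0x04))  # page 0x04 is Cyrillic, U+0400..U+04FF
print(chars_on_page(sample, 0x00))  # prints: abc  xyz
```

In decimal, the same test reads `ord(ch) // 256 == 4`.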

Imagine some cryptography exercise: you take 27 units, just give
them numbers (0..26), and do calculations. Yes, you can view the results
as hex numbers, but I don't, and most people don't and should not, since
why would they? It is ugly and not readable.
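The exercise described above might look like the following sketch. It assumes a 27-symbol alphabet of space plus A-Z numbered 0..26 and uses a simple shift (Caesar) cipher; the alphabet and the function names are illustrative assumptions:

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 27 units, numbered 0..26
N = len(ALPHABET)

def encode(text: str) -> list:
    """Map each symbol to its number 0..26."""
    return [ALPHABET.index(ch) for ch in text]

def decode(numbers: list) -> str:
    """Map numbers 0..26 back to symbols."""
    return "".join(ALPHABET[n] for n in numbers)

def shift(numbers: list, key: int) -> list:
    """Caesar-style shift modulo 27."""
    return [(n + key) % N for n in numbers]

plain = encode("HELLO WORLD")
cipher = shift(plain, 5)
print(plain)           # prints: [8, 5, 12, 12, 15, 0, 23, 15, 18, 12, 4]
print(cipher)          # prints: [13, 10, 17, 17, 20, 5, 1, 20, 23, 17, 9]
print(decode(cipher))  # prints: MJQQTEATWQI
assert decode(shift(cipher, -5)) == "HELLO WORLD"
```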


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Nick Timkovich
>
> hex notation not so readable and anyway decimal is kind of standard way to
> represent numbers


Can you cite some examples of Unicode reference tables I can look up a
decimal number in? They seem rare; perhaps in a list as a secondary column,
but they're not organized/grouped decimally. Readability counts, and
introducing a competing syntax will make it harder for others to read.

Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Terry Reedy

On 12/7/2016 7:22 PM, Mikhail V wrote:
> On 8 December 2016 at 01:13, Nick Timkovich  wrote:
>> Out of curiosity, why do you prefer decimal values to refer to Unicode code
>> points? Most references, http://unicode.org/charts/PDF/U0400.pdf (official)
>> or https://en.wikibooks.org/wiki/Unicode/Character_reference/-0FFF ,
>> prefer to refer to them by hexadecimal as the planes and ranges are broken
>> up by hex values.
>
> Well, there was a huge discussion in October; see the subject name.
> I just didn't want it to go in that direction again.
> In short: hex notation is not so readable, decimal is anyway the
> standard way to represent numbers, and I treat a string as a number array
> when I am processing it, so hex is simply redundant and not needed for me.


I sympathize with your preference, but ... Perhaps the hex numbers would 
bother you less if you thought of them as 'serial numbers'.  It is 
standard for 'serial numbers' to include letters.  It is also common for 
digit-letter serial numbers to have meaningful fields, as do the hex 
versions of Unicode serial numbers.  The decimal versions are 
meaningless except as strict sequencers.
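A small sketch of reading such "fields": converting between the decimal code and the U+ notation, and splitting the code point into a high byte (the 256-code-point row) and a low byte (the position in it). The helper `fields` and this particular row/cell split are illustrative assumptions:

```python
def fields(cp: int) -> tuple:
    """Split a code point into (cp // 256, cp % 256): row and position within the row."""
    return divmod(cp, 0x100)

cp = 1040
print(f"U+{cp:04X}")    # prints: U+0410
print(fields(cp))       # prints: (4, 16), i.e. row 0x04 (Cyrillic), position 0x10
print(int("0410", 16))  # prints: 1040
```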


--
Terry Jan Reedy




Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread MRAB

On 2016-12-07 23:52, Mikhail V wrote:

...
> =
> Proposal: I would want to have a possibility to input it *by decimals*:
>
> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
> or:
> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>
> =

It's usually the case that escapes are \ followed by an ASCII-range 
letter or digit; \ followed by anything else makes it a literal, even if 
it's a metacharacter, e.g. " terminates a string that starts with ", but 
\" is a literal ", so I don't like \{...}.


Perl doesn't have \u... or \U..., it has \x{...} instead, and Python 
already has \N{...}, so:


s = "first cyrillic letters: \d{1040}\d{1041}\d{1042}"

might be better, but I'm still -1 because hex is usual when referring to 
Unicode codepoints.
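Python has no `\d{...}` escape in string literals. As a rough sketch of the semantics being discussed, a hypothetical helper could expand such sequences at runtime from a raw string; the name `expand_decimal_escapes` and the exact pattern are assumptions, not an existing API:

```python
import re

# Matches a literal backslash, 'd', and an ASCII decimal number in braces, e.g. \d{1040}.
_DECIMAL_ESCAPE = re.compile(r"\\d\{([0-9]+)\}")

def expand_decimal_escapes(text: str) -> str:
    """Replace each \\d{N} in text with chr(N), where N is a decimal code point."""
    return _DECIMAL_ESCAPE.sub(lambda m: chr(int(m.group(1))), text)

s = expand_decimal_escapes(r"first cyrillic letters: \d{1040}\d{1041}\d{1042}")
print(s)
```

Because `chr()` does the conversion, values above 1114111 raise `ValueError`, and the input must be a raw string (or use doubled backslashes); a real literal escape would need support in the interpreter's string-literal parsing.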



Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Ethan Furman

On 12/07/2016 03:52 PM, Mikhail V wrote:
> In a past discussion about inputting and printing characters,
> I was proposing decimal notation instead of hex.
> Since the discussion got lost in off-topic talk, I'll try to
> summarise my idea better.


While the discussion did range far and wide, one thing that was fairly constant 
is that the benefit of adding one more way to represent unicode characters is 
not worth the work involved to make it happen; and that using hexadecimal to 
reference unicode characters is nearly universal.

To sum up:  even if you wrote all the code yourself, it would not be accepted.

--
~Ethan~


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
On 8 December 2016 at 01:13, Nick Timkovich  wrote:
> Out of curiosity, why do you prefer decimal values to refer to Unicode code
> points? Most references, http://unicode.org/charts/PDF/U0400.pdf (official)
> or https://en.wikibooks.org/wiki/Unicode/Character_reference/-0FFF ,
> prefer to refer to them by hexadecimal as the planes and ranges are broken
> up by hex values.

Well, there was a huge discussion in October; see the subject name.
I just didn't want it to go in that direction again.
In short: hex notation is not so readable, decimal is anyway the
standard way to represent numbers, and I treat a string as a number array
when I am processing it, so hex is simply redundant and not needed for me.

Mikhail


Re: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Nick Timkovich
Out of curiosity, why do you prefer decimal values to refer to Unicode code
points? Most references, http://unicode.org/charts/PDF/U0400.pdf (official)
or https://en.wikibooks.org/wiki/Unicode/Character_reference/-0FFF ,
prefer to refer to them by hexadecimal as the planes and ranges are broken
up by hex values.


[Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

2016-12-07 Thread Mikhail V
In a past discussion about inputting and printing characters,
I was proposing decimal notation instead of hex.
Since the discussion got lost in off-topic talk, I'll try to
summarise my idea better.

I use ASCII only for code input (there are good reasons for that).
Here I'll use Python 3.6 on Windows 7, so I can use print() with Unicode
directly, and it now works in the system console.

Suppose I am just starting to program and want to do some character manipulation.
The very first thing I would probably start with is a simple output of
Latin and Cyrillic capital letters:

caps_lat = ""
for o in range(65, 91):      # 65..90 are 'A'..'Z'
    caps_lat = caps_lat + chr(o)
print(caps_lat)

caps_cyr = ""
for o in range(1040, 1072):  # 1040..1071 are the Cyrillic capitals
    caps_cyr = caps_cyr + chr(o)
print(caps_cyr)


Which prints:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
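For comparison, the same two strings can be built without repeated concatenation by using `str.join`; this is one idiomatic alternative, not something proposed in the thread:

```python
# Same output as the loops above, built with str.join over a generator of characters.
caps_lat = "".join(chr(o) for o in range(65, 91))      # 'A'..'Z'
caps_cyr = "".join(chr(o) for o in range(1040, 1072))  # U+0410..U+042F
print(caps_lat)
print(caps_cyr)
```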


Say, I want now to input something direct in code:

s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042)

Which works fine and looks clean. However, it is not very convenient
because of all the typing, and also, if I generate such strings,
it adds a bit more complexity. But in general it is fine, and I use this
method currently.

=
Proposal: I would want to have a possibility to input it *by decimals*:

s = "first cyrillic letters: \{1040}\{1041}\{1042}"
or:
s = "first cyrillic letters: \(1040)\(1041)\(1042)"

=

This is more compact and does not seem to contradict the current
Python escape sequences in string literals much, since a backslash
starts some kind of escape in most cases.

For me, the most important thing is that this way I would avoid
any presence of hex numbers in strings, which I find very good
for readability, and it is very convenient for me since I use decimals
for processing everywhere (and encourage everyone to do so).

So this is my proposal, any comments on this are appreciated.


PS:

Currently Python 3 supports these in addition to \x:
(from https://docs.python.org/3/howto/unicode.html)
"""
If you can’t enter a particular character in your editor or want to keep
the source code ASCII-only for some reason, you can also use escape
sequences in string literals.

>>> "\N{GREEK CAPITAL LETTER DELTA}"  # Using the character name
>>> "\u0394"  # Using a 16-bit hex value
>>> "\U00000394"  # Using a 32-bit hex value

"""
So I have many possibilities, and all of them strangely contradict
my image of intuitive and readable. Well, using the character name is readable,
but seriously not much of a practical solution for input, though it could be
very useful for printing the description of a character.
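For reference, the escape forms quoted above all produce the same character, and so do the decimal routes available today. Note that `\U` needs exactly eight hex digits, and `\x` takes exactly two, so it only reaches U+00FF:

```python
import unicodedata

delta = "\N{GREEK CAPITAL LETTER DELTA}"  # by character name
assert delta == "\u0394"                   # 16-bit hex escape (exactly 4 digits)
assert delta == "\U00000394"               # 32-bit hex escape (exactly 8 digits)
assert delta == chr(916)                   # decimal code point via chr()
assert delta == f"{916:c}"                 # decimal code point via f-string (3.6+)
assert "\x41" == "A"                       # \x covers only U+0000..U+00FF
print(ord(delta), hex(ord(delta)), unicodedata.name(delta))
# prints: 916 0x394 GREEK CAPITAL LETTER DELTA
```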


Mikhail