Re: .title() - annoying mistake

2021-03-22 Thread Gene Heskett
On Monday 22 March 2021 22:31:27 Richard Damon wrote:

> On 3/22/21 4:20 PM, Christian Gollwitzer wrote:
> > On 22.03.21 at 16:03, Robert Latest wrote:
> >> Chris Angelico wrote:
> >>> Cool thing is, nobody in Python needs to maintain anything here.
> >>
> >> That's great because I'm actually having trouble with sending log
> >> messages over the socket connection you helped me with, would you
> >> mind having a look?
> >
> > You misunderstood the "here".
> >
> > (native German as well, but I think it means "for keeping .title()
> > up to date")
> >
> > I agree with Chris that .title() can be useful for "title-casing" a
> > single character, whatever that means. It should be documented,
> > though, that it is not suitable to title-case a string, not even in
> > English.
> >
> > Christian
>
> Part of the difficulty in getting clear documentation for this is that
> it IS somewhat complicated. You title-case a string by title-casing
> the first character of each word (and the primary issue that
> started this was the definition of a 'word'), and lower-casing the
> rest of the word. The complication comes in that title-casing a
> character 99.99% of the time doesn't give you a character in
> title-case, but more often in upper-case (or uncased). There are
> apparently only 31 actual characters that are 'Title-Case'. Thus the
> action of 'title-casing' a character is a bit strange to describe,
> especially to people used to simple languages which don't have any of
> the characters that cause the issue.
>
> We don't seem to have a problem with upper() not always returning a
> true 'upper-case', like for '1', because we realize that many characters
> don't have case, so it gets largely just assumed we know that. For
> title case, the fact that almost all characters do NOT have a
> 'title-case' form makes things a bit more awkward.
>
> Yes, perhaps the documentation could be made a bit more clear. I do
> note that the documentation for str.capitalize() does talk about using
> actual title case characters if the first character is a digraph.
> Something like that might make sense in the description of str.title()
>
> Note that str.istitle() doesn't actually check if the character is a
> real 'title case' character, but that the string follows a rule
> similar to what str.title() produces. I am not positive that its
> description exactly matches what .title() produces, but it is close, and
> the way it is written, "Python's".istitle() is False, as the s at the
> end needs to be upper case to satisfy it: ' is uncased, so the next
> cased character must be upper case.
>
> --
> Richard Damon

I am as tired of this thread as anybody here. To me, it must be capable
of subbing in a considerably more calligraphic, attention-getting font, and a
bit larger, in order for it to be worth its space on the drive. We can do
that in openoffice and its ilk with a bit of bother, but it can be done.
Learning how to use this just adds more bother because you can do it by
hand quicker than looking up the man page for this. OTOH, in the average
title, that should be restricted to the first character only.

However, we've beat this horse to death, so it is not going to rear up 
and win the next Derby. So dig a hole and bury it please.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: .title() - annoying mistake

2021-03-22 Thread Richard Damon
On 3/22/21 4:20 PM, Christian Gollwitzer wrote:
> On 22.03.21 at 16:03, Robert Latest wrote:
>> Chris Angelico wrote:
>>> Cool thing is, nobody in Python needs to maintain anything here.
>>
>> That's great because I'm actually having trouble with sending log
>> messages over
>> the socket connection you helped me with, would you mind having a look?
>
> You misunderstood the "here".
>
> (native German as well, but I think it means "for keeping .title() up
> to date")
>
> I agree with Chris that .title() can be useful for "title-casing" a
> single character, whatever that means. It should be documented,
> though, that it is not suitable to title-case a string, not even in
> English.
>
> Christian

Part of the difficulty in getting clear documentation for this is that
it IS somewhat complicated. You title-case a string by title-casing
the first character of each word (and the primary issue that
started this was the definition of a 'word'), and lower-casing the rest
of the word. The complication comes in that title-casing a character
99.99% of the time doesn't give you a character in title-case, but more
often in upper-case (or uncased). There are apparently only 31 actual
characters that are 'Title-Case'. Thus the action of 'title-casing' a
character is a bit strange to describe, especially to people used to
simple languages which don't have any of the characters that cause the
issue.

We don't seem to have a problem with upper() not always returning a true
'upper-case', like for '1', because we realize that many characters don't
have case, so it gets largely just assumed we know that. For title case,
the fact that almost all characters do NOT have a 'title-case' form
makes things a bit more awkward.

Yes, perhaps the documentation could be made a bit more clear. I do note
that the documentation for str.capitalize() does talk about using actual
title case characters if the first character is a digraph. Something
like that might make sense in the description of str.title()

Note that str.istitle() doesn't actually check if the character is a
real 'title case' character, but that the string follows a rule similar
to what str.title() produces. I am not positive that its description
exactly matches what .title() produces, but it is close, and the way it is
written, "Python's".istitle() is False, as the s at the end needs to be
upper case to satisfy it: ' is uncased, so the next cased character must
be upper case.
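
The points above can be checked from the interpreter. The following is only a sketch: it assumes that "actual title-case characters" means code points in Unicode general category Lt, and the count it prints depends on the Unicode version bundled with the running Python, so no figure is asserted here.

```python
import sys
import unicodedata

# Collect every code point whose general category is "Lt" (Letter, titlecase).
# Treating "Lt" as the definition of a true title-case character is an assumption.
titlecase_chars = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Lt"
]
print(len(titlecase_chars))  # depends on the bundled Unicode version

# The apostrophe is uncased, so str.title() starts a new "word" after it.
print("Python's".title())    # Python'S
print("Python's".istitle())  # False
print("Python'S".istitle())  # True
```

The last three lines show that istitle() follows the same word rule as title(): the string passes only when the cased character after the apostrophe is upper case.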

-- 
Richard Damon



Re: .title() - annoying mistake

2021-03-22 Thread dn via Python-list
On 23/03/2021 10.00, Avi Gross via Python-list wrote:
> Speaking for myself, I am beyond tired of this topic, however informative
> parts have been.

+1


> I will say it is irrational to try to impose rationality across all possible
> languages, let alone people like me who often combine 3 or more languages in
> a single sentence when talking to others like myself with a dynamic to
> nonsensical grammar. Try capitalizing someone's name when they insist on
> being known by a purely numerical name like ..., or just 7 of 9 or even
> !@zq. 

Further reading: "Falsehoods Programmers Believe About Names"
(https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/)

...
> I have seen setups where a programmer makes every imaginable function they
> can think of but at some later point, some profiling of actual usage shows
> that 80% of them are NEVER used. Often, that is because nobody reads all the
> documentation to find out if it even exists or there is a simple workaround.
> If the only thing bothering you is that a small list of words like FBI comes
> out wrong, it is simple enough to write a function that post-processes the
> result of title() and changes those words back.

Pareto principle ~ 80:20 rule, or should that be written "80/20 rule" or
maybe "80/20 Rule"...


Python gives you the choice to use (or not use) many facilities. You may
also choose to rename such facilities, or to re-use Python's own names
to customise functionality. You have complete freedom to use Python in
any way(s) you see fit. Thus:-

Freedom
noun
UK  /ˈfriː.dəm/ US  /ˈfriː.dəm/
 the condition or right of being able or allowed to do, say, think, etc.
whatever you want to, without being controlled or limited:
Cambridge Dictionary

-- 
Regards,
=dn


RE: .title() - annoying mistake

2021-03-22 Thread Avi Gross via Python-list
Speaking for myself, I am beyond tired of this topic, however informative
parts have been.

I will say it is irrational to try to impose rationality across all possible
languages, let alone people like me who often combine 3 or more languages in
a single sentence when talking to others like myself with a dynamic to
nonsensical grammar. Try capitalizing someone's name when they insist on
being known by a purely numerical name like ..., or just 7 of 9 or even
!@zq. 

So, yes, a purist might want a truly comprehensive piece of code larger than
say all the code in Microsoft WORD, that first figures out what language
might be used and uses locale and guesswork to even try to narrow down that
the words are meant to be treated as they are in Bavaria and see if the use
is informal or say adhering to the rules for submitting in APA format or
whatever. We are talking about an extremely hard problem that will still
likely make mistakes if I am from Bavaria but submitting a science fiction
article mimicking what I imagine might be the way it will be written in
3010.

So I suggest we take the source for the title function and rename it
something enlightening like changeCaseRandomMethod42dInitialUpperElseLower()
so it can be used mysteriously by anyone that wants. Then deprecate the use
of title() while keeping that as some form of alias for the fantastic new
name. Anyone wishing to have title mean anything else, feel free to redefine
it.

But that jokingly does not mean there is no room for improvement. As an
example, people often would like words like "FBI" to be left alone and not
changed to "Fbi" or in some other contexts, not be flagged as a spelling
error let alone corrected. I suspect the same or related functions might
accept an optional argument like maintainAllCAPS=TRUE so FBI is left alone,
albeit LOTUS123 might become Lotus123 or not.

I have seen setups where a programmer makes every imaginable function they
can think of but at some later point, some profiling of actual usage shows
that 80% of them are NEVER used. Often, that is because nobody reads all the
documentation to find out if it even exists or there is a simple workaround.
If the only thing bothering you is that a small list of words like FBI comes
out wrong, it is simple enough to write a function that post-processes the
result of title() and changes those words back.
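
A minimal sketch of such a post-processing function follows. The name title_preserving and its keep parameter are hypothetical, invented for illustration; nothing like them exists in the standard library.

```python
import re

def title_preserving(text, keep=("FBI",)):
    """Apply str.title(), then restore the listed words to their given spelling.

    Hypothetical helper: 'keep' is a small list of words (e.g. acronyms)
    that should come out exactly as written in the list.
    """
    titled = text.title()
    for word in keep:
        # word.title() is what title() produced for this word, e.g. "Fbi".
        pattern = r"\b%s\b" % re.escape(word.title())
        titled = re.sub(pattern, word, titled)
    return titled

print(title_preserving("the fbi report"))  # The FBI Report
```

The word-boundary anchors keep the substitution from touching longer words that merely start with the same letters.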

If you designed a brand new language core today, you may indeed want to
leave title() out of the core but include it as an optional module, perhaps
with a more descriptive name.

Tell me if the base should have functions called camelCase(), PascalCase(),
arrayHungarianCase() or underscore_case() 


-----Original Message-----
From: Python-list  On
Behalf Of Christian Gollwitzer
Sent: Monday, March 22, 2021 4:21 PM
To: python-list@python.org
Subject: Re: .title() - annoying mistake

On 22.03.21 at 16:03, Robert Latest wrote:
> Chris Angelico wrote:
>> Cool thing is, nobody in Python needs to maintain anything here.
> 
> That's great because I'm actually having trouble with sending log 
> messages over the socket connection you helped me with, would you mind
> having a look?

You misunderstood the "here".

(native German as well, but I think it means "for keeping .title() up to
date")

I agree with Chris that .title() can be useful for "title-casing" a single
character, whatever that means. It should be documented, though, that it is
not suitable to title-case a string, not even in English.

Christian


Re: .title() - annoying mistake

2021-03-22 Thread Christian Gollwitzer

On 22.03.21 at 16:03, Robert Latest wrote:
> Chris Angelico wrote:
>> Cool thing is, nobody in Python needs to maintain anything here.
>
> That's great because I'm actually having trouble with sending log messages over
> the socket connection you helped me with, would you mind having a look?

You misunderstood the "here".

(native German as well, but I think it means "for keeping .title() up to
date")


I agree with Chris that .title() can be useful for "title-casing" a 
single character, whatever that means. It should be documented, though, 
that it is not suitable to title-case a string, not even in English.


Christian


Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Benjamin Schollnick wrote:

> I’m sorry, but it’s as if he’s arguing for the sake of arguing.  It’s
> starting to feel very unproductive, and unnecessary.

That was never five minutes just now!

robert


Re: .title() - annoying mistake

2021-03-22 Thread Chris Angelico
On Tue, Mar 23, 2021 at 5:16 AM Karen Shaeffer via Python-list
 wrote:
>
> Hi Chris,
> Thanks for your comment.
>
> > Python doesn't work with UTF-8 encoded code points; it works with
> > Unicode code points. Are you looking for something that checks whether
> > something is a palindrome, or locates palindromes within it?
> >
> > def is_palindrome(txt):
> >     return txt == txt[::-1]
> >
> > Easy.
>
> Of course, it's easy. It's a pythonic idiom! But it doesn’t work. And you know
> that. You even explained a few reasons why it doesn’t work below. There are
> many more instances of strings that do not work. Here are two:
>
> idx = 6    A man, a plan, a canal: Panama    is_palindrome() = False
> idx = 17   ab́cdeedcb́a    is_palindrome() = False
>
> The palindrome isn’t worth any more time. It isn’t even a good example.
>
> In my experience processing unstructured, multilingual text, you encounter a 
> wide array of variances in both the text and in the encoding details, 
> including outright errors. You have to account for all of them, because 
> 99.99% of that text is valuable to you.
>
> The key idea: If you care about the details, working with unstructured 
> multi-lingual text is complicated. There are no easy solutions.
>
>
> >
> > Efficiently finding substring palindromes would be a bit harder, but
> > that'd be true even if you restricted it to ASCII. The advantage of
> > Python's way of doing it is that, if you have a method that would work
> > with ASCII bytes, the exact same thing will work with a Unicode
> > string.
> >
> > There's another big wrinkle not touched here, and that's what to do
> > with combining characters. Python makes it easy to normalize text as
> > much as is possible, and an NFC normalization would help a lot, but
> > it's not going to do everything. So you may want to first define a
> > proper way to split a string into whatever you're defining a character
> > to be, and that's a very difficult problem, regardless of programming
> > language. For example, Arabic text changes in visual shape when
> > letters are next to each other, and Greek has two different forms for
> > the letter sigma (U+03C2 and U+03C3) - should those distinctions
> > affect palindromminess? What about ligatures - is U+FB01 "ﬁ" a single
> > character, or should it be matched by "if" on the other end?
> >
> > What part of this is trivial in Go?
>
> Go is simpler than Python. Both languages have the capabilities to solve any 
> text processing problem. I’m still learning Go, so I can’t really say more.
>
> Personally, I like Python for text processing. You can usually get 
> satisfactory results very quickly for most of the input space. And if you 
> don’t care about all the gotchas, then you are good to go.
>
> I have no more time for this. Thanks for your comment. I learned a little 
> reading the long thread dealing with .title(). (chuckles ;)
>

Hey, you're the one who brought up palindrome testing as a difficult
problem in Python :) Your post implied that it was easier in Go, and I
can't see that that's possible.
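
For the two failing strings quoted above, a rough sketch of a more tolerant check is given below. It rests on stated assumptions: a "character" is taken to be a base code point plus any combining marks that follow it, comparison is case-insensitive, and anything that is not alphanumeric is ignored. It does not attempt full grapheme clustering, and it does nothing about ligatures or the two sigma forms.

```python
import unicodedata

def clusters(text):
    """Split text into base characters, each followed by its combining marks.

    A simplification: real grapheme clustering (UAX #29) has more rules.
    """
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if out and unicodedata.combining(ch):
            out[-1] += ch   # attach the mark to the preceding base character
        else:
            out.append(ch)
    return out

def is_palindrome(text):
    # Case-fold, then keep only clusters whose base character is alphanumeric.
    units = [c for c in clusters(text.casefold()) if c[0].isalnum()]
    return units == units[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("ab\u0301cdeedcb\u0301a"))           # True
```

The second string contains b followed by U+0301 COMBINING ACUTE ACCENT, which has no precomposed form, so NFC alone cannot help and the plain txt[::-1] comparison fails on it.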

ChrisA


Re: .title() - annoying mistake

2021-03-22 Thread Karen Shaeffer via Python-list
Hi Chris,
Thanks for your comment.

> Python doesn't work with UTF-8 encoded code points; it works with
> Unicode code points. Are you looking for something that checks whether
> something is a palindrome, or locates palindromes within it?
> 
> def is_palindrome(txt):
>     return txt == txt[::-1]
> 
> Easy.

Of course, it's easy. It's a pythonic idiom! But it doesn’t work. And you know
that. You even explained a few reasons why it doesn’t work below. There are
many more instances of strings that do not work. Here are two:

idx = 6    A man, a plan, a canal: Panama    is_palindrome() = False
idx = 17   ab́cdeedcb́a    is_palindrome() = False

The palindrome isn’t worth any more time. It isn’t even a good example.

In my experience processing unstructured, multilingual text, you encounter a 
wide array of variances in both the text and in the encoding details, including 
outright errors. You have to account for all of them, because 99.99% of that 
text is valuable to you.

The key idea: If you care about the details, working with unstructured 
multi-lingual text is complicated. There are no easy solutions.


> 
> Efficiently finding substring palindromes would be a bit harder, but
> that'd be true even if you restricted it to ASCII. The advantage of
> Python's way of doing it is that, if you have a method that would work
> with ASCII bytes, the exact same thing will work with a Unicode
> string.
> 
> There's another big wrinkle not touched here, and that's what to do
> with combining characters. Python makes it easy to normalize text as
> much as is possible, and an NFC normalization would help a lot, but
> it's not going to do everything. So you may want to first define a
> proper way to split a string into whatever you're defining a character
> to be, and that's a very difficult problem, regardless of programming
> language. For example, Arabic text changes in visual shape when
> letters are next to each other, and Greek has two different forms for
> the letter sigma (U+03C2 and U+03C3) - should those distinctions
> affect palindromminess? What about ligatures - is U+FB01 "ﬁ" a single
> character, or should it be matched by "if" on the other end?
> 
> What part of this is trivial in Go?

Go is simpler than Python. Both languages have the capabilities to solve any 
text processing problem. I’m still learning Go, so I can’t really say more.

Personally, I like Python for text processing. You can usually get satisfactory 
results very quickly for most of the input space. And if you don’t care about 
all the gotchas, then you are good to go.

I have no more time for this. Thanks for your comment. I learned a little 
reading the long thread dealing with .title(). (chuckles ;)

Humbly,
Karen




Re: .title() - annoying mistake

2021-03-22 Thread Benjamin Schollnick
>> That's not true for digraphs where there is a third, distinct and
>> different "title" case. I think the doc should state that the initial
>> character is converted to titlecase. A parenthetical statement that
>> titlecase is usually but not always equal to uppercase would be nice,
>> but the current statement is obsolete and not correct in all, um...
>> cases.
> 
> Fair enough, but the trouble is that getting too pedantic in a
> docstring just makes it read like IBM documentation. :)

And conversely, it actually makes the docs harder to keep updated, because you’ll
need to document each and every edge case, and then need to know when one of
those edge cases breaks, etc.

The core concept is documented, and it’s pretty straight-forward.  

I’m sorry, but it’s as if he’s arguing for the sake of arguing.  It’s starting 
to feel very unproductive, and unnecessary.

- Benjamin




Re: .title() - annoying mistake

2021-03-22 Thread Benjamin Schollnick
>> I guess it depends on what you mean by "character". In my mind, the
>> first character of string s is s[1], and I would then expect that
>> 
>> s.title()[1] == s[1].upper()
>> 
> 
> I presume you mean [0], but no, that's not the case. A single
> character can titlecase to two characters, or to a single character
> that isn't the same as if you uppercase or lowercase it. See examples
> in previous post.

Or Kanji, etc.  Where a single character can represent more than one in a 
different unicode standard, as I understand. 

- Benjamin



Re: .title() - annoying mistake

2021-03-22 Thread Chris Angelico
On Tue, Mar 23, 2021 at 2:28 AM Grant Edwards  wrote:
> The document for str.title() states that the initial character of each
> word is converted to uppercase.  My point is that for characters that
> remain single characters regardless of case that means (to me) that
>
>   s.title()[0] == s[0].upper()
>
> or for a single character string
>
>   s.title() == s.upper()
>
> That's not true for digraphs where there is a third, distinct and
> different "title" case. I think the doc should state that the initial
> character is converted to titlecase. A parenthetical statement that
> titlecase is usually but not always equal to uppercase would be nice,
> but the current statement is obsolete and not correct in all, um...
> cases.

Fair enough, but the trouble is that getting too pedantic in a
docstring just makes it read like IBM documentation. :)

ChrisA


Re: .title() - annoying mistake

2021-03-22 Thread Grant Edwards
On 2021-03-22, Chris Angelico  wrote:
> On Tue, Mar 23, 2021 at 1:18 AM Grant Edwards  
> wrote:
>
>> I guess it depends on what you mean by "character". In my mind, the
>> first character of string s is s[1], and I would then expect that
>>
>> s.title()[1] == s[1].upper()
>>
> I presume you mean [0],

Yes.

> but no, that's not the case. A single character can titlecase to two
> characters, or to a single character that isn't the same as if you
> uppercase or lowercase it. See examples in previous post.

The document for str.title() states that the initial character of each
word is converted to uppercase.  My point is that for characters that
remain single characters regardless of case that means (to me) that

  s.title()[0] == s[0].upper()

or for a single character string

  s.title() == s.upper()

That's not true for digraphs where there is a third, distinct and
different "title" case. I think the doc should state that the initial
character is converted to titlecase. A parenthetical statement that
titlecase is usually but not always equal to uppercase would be nice,
but the current statement is obsolete and not correct in all, um...
cases.
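
The claim can be tested directly. In this small sketch the helper name title_equals_upper is made up for illustration; the sample characters are the ones already discussed in this thread.

```python
def title_equals_upper(s):
    """Return True when title-casing s gives the same result as upper-casing it."""
    return s.title() == s.upper()

samples = {
    "a": "LATIN SMALL LETTER A",
    "\u01c6": "LATIN SMALL LETTER DZ WITH CARON (digraph)",
    "\u00df": "LATIN SMALL LETTER SHARP S",
    "\ufb01": "LATIN SMALL LIGATURE FI",
}
for ch, name in samples.items():
    print("U+%04X %-45s %s" % (ord(ch), name, title_equals_upper(ch)))
```

Only the plain letter satisfies the equality; the digraph has a distinct titlecase code point, and the other two expand to two characters with different casing.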

--
Grant




Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Chris Angelico wrote:
> Cool thing is, nobody in Python needs to maintain anything here.

That's great because I'm actually having trouble with sending log messages over
the socket connection you helped me with, would you mind having a look?

Regards,
robert


Re: .title() - annoying mistake

2021-03-22 Thread Chris Angelico
On Tue, Mar 23, 2021 at 1:18 AM Grant Edwards  wrote:
>
> On 2021-03-21, MRAB  wrote:
>
> IMO, the doc is wrong.
>
> >> Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
> >> But maybe there's a better way to word it for both of them.
> >
> > Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64
> > bit (AMD64)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> help(str.title)
> > Help on method_descriptor:
> >
> > title(self, /)
> >  Return a version of the string where each word is titlecased.
> >
> >  More specifically, words start with uppercased characters and
> >  all remaining cased characters have lower case.
> >
> > '\N{LATIN CAPITAL LETTER DZ}', '\N{LATIN SMALL LETTER DZ}' and '\N{LATIN
> > CAPITAL LETTER D WITH SMALL LETTER Z}' are all digraphs, so is it
> > correct to say that .title() uppercases the first character? Kind of.
>
> I guess it depends on what you mean by "character". In my mind, the
> first character of string s is s[1], and I would then expect that
>
> s.title()[1] == s[1].upper()
>

I presume you mean [0], but no, that's not the case. A single
character can titlecase to two characters, or to a single character
that isn't the same as if you uppercase or lowercase it. See examples
in previous post.

ChrisA


Re: .title() - annoying mistake

2021-03-22 Thread Grant Edwards
On 2021-03-21, MRAB  wrote:

IMO, the doc is wrong.

>> Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
>> But maybe there's a better way to word it for both of them.
> 
> Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 
> bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> help(str.title)
> Help on method_descriptor:
>
> title(self, /)
>  Return a version of the string where each word is titlecased.
>
>  More specifically, words start with uppercased characters and
>  all remaining cased characters have lower case.
>
> '\N{LATIN CAPITAL LETTER DZ}', '\N{LATIN SMALL LETTER DZ}' and '\N{LATIN 
> CAPITAL LETTER D WITH SMALL LETTER Z}' are all digraphs, so is it 
> correct to say that .title() uppercases the first character? Kind of.

I guess it depends on what you mean by "character". In my mind, the
first character of string s is s[1], and I would then expect that

s.title()[1] == s[1].upper()

--
Grant




Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Karsten Hilbert wrote:
> and life with that wart.

Perfectly willing to as long as everybody agrees it's a wart ;-)

robert


Re: .title() - annoying mistake

2021-03-22 Thread Chris Angelico
On Tue, Mar 23, 2021 at 1:01 AM Robert Latest via Python-list
 wrote:
> I don't mind .title() being in Python. I would very much mind to be the person
> in charge of maintaining it and having to port it into new versions of Python,
> always keeping an eye on the evolution of Unicode or other standards (see
> above).
>
> It probably just comes down to me not being able to conjure up a single
> sensible use case for .title() as well as the whole concept of "title casing"
> in the context of a programming language.
>
> > The neat thing about Unicode is
>
> [many things]
>
> > The documentation sometimes shorthands things with terms like "upper
> > case" and "lower case", but that's partly because being pedantically
> > correct in a docstring doesn't actually help anything, and the code
> > itself IS correct.
>
> ...but hard to maintain and useless. I just love to hate .title() ;-)
>

Cool thing is, nobody in Python needs to maintain anything here. It's
all built on top of the published Unicode data files. It's simply a
matter of updating to the latest version of the Unicode standard! :)
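
To see which version of the Unicode data a given interpreter was built with, the standard unicodedata module exposes it. A short sketch, in which the variable name version is just for illustration:

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
version = unicodedata.unidata_version
print(version)

# Case mappings such as the titlecase form of U+01C6 come from that data.
print("U+%04X" % ord("\u01c6".title()))  # U+01C5
```

The printed version differs between Python releases, which is why results for recently added characters can change after an upgrade.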

ChrisA


Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Chris Angelico wrote:
> There are a small number of characters which, when case folded, become
> more than one character. The sharp S from German behaves thusly:
>
> >>> "ß".upper(), "ß".lower(), "ß".casefold(), "ß".title()
> ('SS', 'ß', 'ss', 'Ss')

Now we're getting somewhere. I'm a native German speaker and I can tell you
that this doesn't happen in the real world, simply because 'ß' never appears at
the beginning of a word and thus is never "title cased". The only occurrence of
uppercase 'ß' is in all-caps text, which Python handles properly:

>>> 'bißchen'.upper()
'BISSCHEN'

...that is, properly until 2008, when the capital 'ß' was officially introduced
into German orthography:

https://en.wikipedia.org/wiki/%C3%9F#Capital_form
"Traditionally, ß did not have a capital form, although some type designers
introduced de facto capitalized variants of ß. In 2017, the Council for German
Orthography ultimately adopted capital ß, ẞ, into German orthography, ending a
long orthographic debate.[3] [...] The capital variant (U+1E9E ẞ LATIN CAPITAL
LETTER SHARP S) was encoded by ISO 10646 in 2008." So Python 3.6.8 is about 12
years behind.

As a German I also appreciate the reduced occurrence of the letter combination
'SS'.

That said, the concept of "title casing" doesn't even exist in German. Titles
are spelt just like any regular sentence. I know only two definitions of the
concept "title case":

1) From Wikipedia
"Title case or headline case is a style of capitalization used for rendering
the titles of published works or works of art in English. [...]"

2) From Python (paraphrased):
"Perform an arbitrary (but defined) operation on the characters of a string."

I don't mind .title() being in Python. I would very much mind to be the person
in charge of maintaining it and having to port it into new versions of Python,
always keeping an eye on the evolution of Unicode or other standards (see
above).

It probably just comes down to me not being able to conjure up a single
sensible use case for .title() as well as the whole concept of "title casing"
in the context of a programming language.

> The neat thing about Unicode is 

[many things]

> The documentation sometimes shorthands things with terms like "upper
> case" and "lower case", but that's partly because being pedantically
> correct in a docstring doesn't actually help anything, and the code
> itself IS correct.

...but hard to maintain and useless. I just love to hate .title() ;-)

robert


Re: .title() - annoying mistake

2021-03-22 Thread Chris Angelico
On Mon, Mar 22, 2021 at 9:21 PM Robert Latest via Python-list
 wrote:
>
> Chris Angelico wrote:
> > If you still, after all these posts, have not yet understood that
> > title-casing *a single character* is a significant thing,
>
> I must admit I have no idea what title-casing even is, but I'm eager to learn.

Now that's an attitude I like to see :)

> The documentation only talks about "words" and "first characters" and
> "remaining characters." So a single character gets converted to uppercase,
> whatever that may mean in the context of .title(). The .upper() method is
> different in that it only applies to "cased" characters, so .title() may or
> may not work differently on single characters.
>

There are a small number of characters which, when case folded, become
more than one character. The sharp S from German behaves thusly:

>>> "ß".upper(), "ß".lower(), "ß".casefold(), "ß".title()
('SS', 'ß', 'ss', 'Ss')
>>> "ẞ".upper(), "ẞ".lower(), "ẞ".casefold(), "ẞ".title()
('ẞ', 'ß', 'ss', 'ẞ')

Serbian has another, although it can often be written with two
individual characters:

>>> "Ǆ".upper(), "Ǆ".lower(), "Ǆ".casefold(), "Ǆ".title()
('Ǆ', 'ǆ', 'ǆ', 'ǅ')
>>> ["U+%04X" % ord(x) for x in _]
['U+01C4', 'U+01C6', 'U+01C6', 'U+01C5']

Even in text that's in the Latin script (the one we use with English),
there are some ligatures that behave differently when titlecased:

>>> "ﬁ".upper(), "ﬁ".lower(), "ﬁ".casefold(), "ﬁ".title()
('FI', 'ﬁ', 'fi', 'Fi')
>>> [" ".join("U+%04X" % ord(c) for c in x) for x in _]
['U+0046 U+0049', 'U+FB01', 'U+0066 U+0069', 'U+0046 U+0069']

Each of these inputs is a single character; some of them have
single-character outputs (and in the case of U+01C5, that's a specific
character that is exclusively titlecased), others have multiple.

The neat thing about Unicode is that you don't have to worry about
exactly which characters behave in which ways. You get methods that do
precisely what you need, as long as you choose the right method. For
case insensitive comparisons, there's casefold(), which is most
commonly the same as lower(), but not always; to find out if
something's a digit, use isdigit(); to fracture something into lines,
use splitlines(). They're all aware of the entire Unicode range, and
they'll reliably work even if future versions of Unicode introduce
more characters (although you might have to wait for Python to be
updated).
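As a small illustration of the three methods just mentioned (a sketch added for clarity, not part of the original post; the sample strings are chosen only for the demonstration):

```python
# Case-insensitive comparison: lower() is not always enough.
a, b = "Straße", "STRASSE"
print(a.lower() == b.lower())        # the sharp S survives lower()
print(a.casefold() == b.casefold())  # casefold() expands it to "ss"

# isdigit() knows digits outside ASCII, e.g. ARABIC-INDIC DIGIT THREE.
print("\u0663".isdigit())

# splitlines() also splits on U+2028 LINE SEPARATOR, not just "\n".
print("first\u2028second".splitlines())
```

None of these calls needs a table of special characters in the calling code; the Unicode data shipped with the interpreter decides.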

The documentation sometimes shorthands things with terms like "upper
case" and "lower case", but that's partly because being pedantically
correct in a docstring doesn't actually help anything, and the code
itself IS correct.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: .title() - annoying mistake

2021-03-22 Thread Benjamin Schollnick
Robert,

I certainly see your point.  

> My only issue is that I completely fail to see how this function would be
> useful enough to warrant the inclusion into the *core* of a general-purpose
> language, including its misleading name.

But that’s where we have to disagree.

Sure, the name could be a bit less “misleading”. I guess we could petition to 
have it renamed to:

def nonlanguage_aware_title or def title_but_not_language_aware

But I’m sorry, the majority of us haven’t even realized that title 
wasn’t language aware.   That has to tell you something (mainly that we’ve 
never hit these problems).  

Now that doesn’t diminish your issue here.  

But you’re a programmer, and people have already suggested some ways to work 
around this with RE.

The other Python-specific option might be to actually SPLIT on the ` symbol, 
title each segment returned from split, and then JOIN them together again.

Regarding the issue of THE in an actual title, I’d suggest that RE would be the 
best solution for that; it should be easy enough to come up with an RE that’ll 
work for “xxx The xxx” and convert it to “xxx the xxx”.  
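A rough sketch of the regular-expression approach, for illustration only: the names simple_title and SMALL_WORDS are hypothetical, the word list is deliberately tiny, and the pattern only handles ASCII letters with an optional apostrophe part.

```python
import re

# Hypothetical list of words to keep lowercase inside a title.
SMALL_WORDS = {"a", "an", "and", "of", "the"}

def simple_title(text):
    """Capitalize each word, keeping apostrophes inside words intact."""
    def fix(match):
        word = match.group(0)
        # Leave small words lowercase unless they start the string.
        if match.start() > 0 and word.lower() in SMALL_WORDS:
            return word.lower()
        return word.capitalize()
    # A "word" here is ASCII letters, optionally followed by 'letters.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?", fix, text)

print(simple_title("they're bill's friends"))
print(simple_title("lord of the rings"))
```

This is English-only and ASCII-only by construction, which is exactly the limitation under discussion.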

> The fact that the function's behavior
> is correctly documented doesn't make its very existence less bewildering to 
> me.

Okay, I don’t recall you actually accepting how large a problem you are 
asking title to address.

The standard library is not going to fix and solve everything; like Apple, I 
would expect them to target ~90% of the functionality, and then let people grow 
and expand on its functionality.  (Sort of like my CSV wrapper)

title is roughly comparable to PHP’s ucwords.

Wow, perl takes the prize though, it’s called ucwords.

$foo = 'hello world!';
$foo = ucwords($foo); // Hello World!

$bar = 'HELLO WORLD!';
$bar = ucwords($bar); // HELLO WORLD!
$bar = ucwords(strtolower($bar)); // Hello World!

perl appears to only manipulate the characters that it’s changing, so if the 
string is already uppercase, then it’ll just look like the entire string is 
uppercase when it’s done.  

So honestly that’s a mostly useless function if you need to pass a lowercase 
string into it.

What’s in common here?  both perl and php also seem to take the ` as a word 
separator.  

Neither of them appear to be language aware.

> Consider this function:
> 
> def add_seventeen(n): '''Return n with 16.8 added''' return n + 16.8
> 
> It's like .title(): It does almost the thing its name suggests, it is 
> correctly
> documented, it is useful to anybody who happens to want 16.8 added to numbers,
> and it might erroneously be used by someone who wants exactly 17 added and
> didn't bother to read the docs.

And would be laughed at significantly.  

I agree with your example, but I don’t consider it to be equivalent.  

title does at first, and second glance work as expected.  You have 
significantly higher expectations than we do, evidently.

At minimum, what I would suggest is come up with some test case examples, and 
toss that over to the python dev team.  Show them that this isn’t working as 
you expect, and see what they say.  

While there might be a few of the python developers on this list, I doubt it.  
Instead bring this up in a productive manner.  Show the harm, and then show a 
solution.  A Proposed change, an alternative framework.

We can’t solve the problem, but if you present it properly, then maybe you can 
be part of the solution.

> 
>> And as I mentioned the sheer amount of work that would be needed would
>> probably cover a phd dissertation or more…  It’s a huge problem set to
>> respect one language, let alone all languages.  
> 
> And that's why I believe that such a function should be delegated to a natural
> language-processing library and not the core of Python.

I have to disagree.  While I might expect a better version in a natural 
language processing library, every programming language that I’m aware of 
except for BASIC, offers an equivalent to title, as long as they offer an 
equivalent to upper and lower.

Why should python not offer title in light of this?

> said, I doubt that .title() would make it into Python today if it weren't 
> there
> already. I'm having fun with this.

Ah, so while being a bit serious, I’m reading a bit too much into this.  

At this point, it’s become an interesting thought experiment for you.  

Good luck,

- Benjamin




Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Chris Angelico wrote:
> If you still, after all these posts, have not yet understood that
> title-casing *a single character* is a significant thing,

I must admit I have no idea what title-casing even is, but I'm eager to learn.
The documentation only talks about "words" and "first characters" and
"remaining characters." So a single character gets converted to uppercase,
whatever that may mean in the context of .title(). The .upper() method is
different in that it only applies to "cased" characters, so .title() may or may
not work differently on single characters. 

robert


Re: .title() - annoying mistake

2021-03-22 Thread Chris Angelico
On Mon, Mar 22, 2021 at 8:26 PM Robert Latest via Python-list
 wrote:
>
> Benjamin Schollnick wrote:
> >
> >> I agree with everything you say. Especially the open source part. But
> >> wouldn't you agree that .title() with all its arbitrary specificity to
> >> appear in the very core of a general purpose language is quite an oddity?
> >
> > No, because it book ends the issue.
> >
> > Upper - Converts everything to uppercase Lower - Converts everything to
> > lowercase
> >
> > Title - Covers the cases in-between upper/lower.
>
> My only issue is that I completely fail to see how this function would be
> useful enough to warrant the inclusion into the *core* of a general-purpose
> language, including its misleading name. The fact that the function's behavior
> is correctly documented doesn't make its very existence less bewildering to 
> me.
> Consider this function:
>
> def add_seventeen(n): '''Return n with 16.8 added''' return n + 16.8
>
> It's like .title(): It does almost the thing its name suggests, it is 
> correctly
> documented, it is useful to anybody who happens to want 16.8 added to numbers,
> and it might erroneously be used by someone who wants exactly 17 added and
> didn't bother to read the docs.

If you still, after all these posts, have not yet understood that
title-casing *a single character* is a significant thing, then please
do not continue to complain about language design. Without this
method, how do you correctly title-case one character? What do you
use? upper()? lower()? casefold()?

> BTW I have no beef with the person who invented .title() nor with anybody who
> uses it. I know that everybody can join the Python development community and
> propose the removal of .title() and the inclusion of add_seventeen(). That
> said, I doubt that .title() would make it into Python today if it weren't 
> there
> already. I'm having fun with this.
>

I'm glad you're having fun, because being wrong can be a lot of unfun sometimes.

I'm done arguing with you. You're a brick wall and you refuse to
comprehend Unicode. The method was not invented for ASCII.

ChrisA


Re: .title() - annoying mistake

2021-03-22 Thread Karsten Hilbert
On Mon, Mar 22, 2021 at 09:22:51AM +, Robert Latest via Python-list wrote:

> >> I agree with everything you say. Especially the open source part. But
> >> wouldn't you agree that .title() with all its arbitrary specificity to
> >> appear in the very core of a general purpose language is quite an oddity?
> >
> > No, because it book ends the issue.
> >
> > Upper - Converts everything to uppercase Lower - Converts everything to
> > lowercase
> >
> > Title - Covers the cases in-between upper/lower.
>
> My only issue is that I completely fail to see how this function would be
> useful enough to warrant the inclusion into the *core* of a general-purpose
> language, including its misleading name. The fact that the function's behavior
> is correctly documented doesn't make its very existence less bewildering to 
> me.

Its naming may be unfortunate, its existence may be
bewildering. However, it's now there, and for historical
reasons, supposedly.

It won't be removed or renamed in all likelihood. The best
one can do is to suggest a documentation patch (not fix)
like so:

The algorithm uses a simple language-independent

[...] and context-naive, not locale related, [...]

definition of a word [...]

and live with that wart.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Grant Edwards wrote:
> On 2021-03-20, Robert Latest via Python-list  wrote:
>> Mats Wichmann wrote:
>>> The problem is that there isn't a standard for title case,
>>
>> The problem is that we owe the very existence of the .title() method to too
>> much weed being smoked during Python development. It makes specific
>> assumptions about a specific use case of one specific language. It doesn't
>> get more idiotic, frankly.
>
> Ah, you've never used PHP then?
>
> I haven't checked but it's a fair bet that PHP has 3-4 different built-in
> ways to do it, and they're all broken in interestingly unpredictable ways.

I believe that 100%. PHP is the reason I switched to Python/WSGI, and I'm
loving it. 

robert


Re: .title() - annoying mistake

2021-03-22 Thread Robert Latest via Python-list
Benjamin Schollnick wrote:
>
>> I agree with everything you say. Especially the open source part. But
>> wouldn't you agree that .title() with all its arbitrary specificity to
>> appear in the very core of a general purpose language is quite an oddity?
>
> No, because it book ends the issue.
>
> Upper - Converts everything to uppercase Lower - Converts everything to
> lowercase
>
> Title - Covers the cases in-between upper/lower.  

My only issue is that I completely fail to see how this function would be
useful enough to warrant the inclusion into the *core* of a general-purpose
language, including its misleading name. The fact that the function's behavior
is correctly documented doesn't make its very existence less bewildering to me.
Consider this function:

def add_seventeen(n): '''Return n with 16.8 added''' return n + 16.8

It's like .title(): It does almost the thing its name suggests, it is correctly
documented, it is useful to anybody who happens to want 16.8 added to numbers,
and it might erroneously be used by someone who wants exactly 17 added and
didn't bother to read the docs.

> And as I mentioned the sheer amount of work that would be needed would
> probably cover a phd dissertation or more…  It’s a huge problem set to
> respect one language, let alone all languages.  

And that's why I believe that such a function should be delegated to a natural
language-processing library and not the core of Python.

> So the only answer is to not respect the languages, and leave that up to a
> later implementation or for someone else to assist in adding in support.

And that too.

> Heck, how do we prevent it from titlecasing abbreviations?  (This is plain
> text not XML….  If it was XML it would be easy!)

And that too.

BTW I have no beef with the person who invented .title() nor with anybody who
uses it. I know that everybody can join the Python development community and
propose the removal of .title() and the inclusion of add_seventeen(). That
said, I doubt that .title() would make it into Python today if it weren't there
already. I'm having fun with this.

robert


Re: .title() - annoying mistake

2021-03-21 Thread Chris Angelico
On Mon, Mar 22, 2021 at 2:00 PM Richard Damon  wrote:
> Basically, titlecasing a word IS making the first letter upper case and
> the rest lower case UNLESS the first letter is one of the 31 digraphs
> which have a special titlecase version, then that is used for the first
> letter. That gets pretty wordy for an explanation string.

It title cases. I don't understand what's a problem here. With
str.casefold(), there's no detailed explanation of how it's usually
equivalent to lowercasing - it just says that it returns a string
suitable for caseless comparisons. Unicode defines many things about
characters, and often the differences don't matter to a large subset
of those characters, but the differences exist for a reason, and lying
in the docstring isn't going to help anything.

It doesn't "uppercase unless it's one of this small group of
characters". It "title cases".

ChrisA


Re: .title() - annoying mistake

2021-03-21 Thread Richard Damon
On 3/21/21 10:28 PM, Chris Angelico wrote:
> On Mon, Mar 22, 2021 at 12:26 PM Richard Damon wrote:
>> On 3/21/21 7:31 PM, MRAB wrote:
>>> On 2021-03-21 22:30, Chris Angelico wrote:
>>>> On Mon, Mar 22, 2021 at 9:04 AM Grant Edwards wrote:
>>>>> On 2021-03-21, Chris Angelico wrote:
>>>>>> On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list wrote:
>>>>>>> I wonder if .title() properly capitalizes titles in any language.
>>>>>>> It doesn't in English (nor does it purport to), so it begs the
>>>>>>> question why it is there in the first place. German and Spanish
>>>>>>> don't have any special capitalization rules for titles; I don't
>>>>>>> know about any other languages.
>>>>>>>
>>>>>> It correctly title-cases a single character, as has been pointed out
>>>>>> already.
>>>>> Not according to the docs. The doc states that .title() converts the
>>>>> first character in each "word" to _upper_ case. Is the doc
>>>>> wrong?
>>>>>
>>>>> If you want titlecase, then you should call str.capitalize() which
>>>>> (again according to the doc) converts the first character to _title_
>>>>> case (starting in v3.8).
>>>>>
>>>> Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
>>>> But maybe there's a better way to word it for both of them.
>>>>
>>> Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928
>>> 64 bit (AMD64)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> help(str.title)
>>> Help on method_descriptor:
>>>
>>> title(self, /)
>>>     Return a version of the string where each word is titlecased.
>>>
>>>     More specifically, words start with uppercased characters and all
>>>     remaining cased characters have lower case.
>>>
>>> '\N{LATIN CAPITAL LETTER DZ}', '\N{LATIN SMALL LETTER DZ}' and
>>> '\N{LATIN CAPITAL LETTER D WITH SMALL LETTER Z}' are all digraphs, so
>>> is it correct to say that .title() uppercases the first character?
>>> Kind of.
>> I think the clarification calling them upper cased characters is close
>> enough considering that there are only 31 title cased characters, all
>> digraphs.
>>
> But it's wrong, and it would lead people to the exact error of
> thinking that it's the same as upper() on str[0] and lower() on the
> rest.
>
> ChrisA

If it didn't mention that it was generating a 'titlecase', that could be
an argument. But for 99.99% of characters, title casing is identical to
upper casing (and that character IS called the upper case). Only for the
31 listed digraphs does it mean something else: the titlecase version of
the digraph, in which the first 'letter' is like the upper case of its
equivalent and the second 'letter' is like the lower case of its
equivalent.

Basically, titlecasing a word IS making the first letter upper case and
the rest lower case UNLESS the first letter is one of the 31 digraphs
which have a special titlecase version, in which case that is used for
the first letter. That gets pretty wordy for an explanation string.
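For anyone who wants to check the count, here is a minimal sketch that lists the characters Python's own Unicode database places in the general category "Lt" (titlecase letter). The variable name titlecase_chars is just for illustration, and the exact result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

# Collect every code point whose general category is "Lt".
titlecase_chars = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Lt"
]

print(len(titlecase_chars))
for ch in titlecase_chars:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Printing the names makes it easy to see which characters are affected without memorizing the list.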

-- 
Richard Damon



Re: .title() - annoying mistake

2021-03-21 Thread Chris Angelico
On Mon, Mar 22, 2021 at 12:26 PM Richard Damon  wrote:
>
> On 3/21/21 7:31 PM, MRAB wrote:
> > On 2021-03-21 22:30, Chris Angelico wrote:
> >> On Mon, Mar 22, 2021 at 9:04 AM Grant Edwards
> >>  wrote:
> >>>
> >>> On 2021-03-21, Chris Angelico  wrote:
> >>>> On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list wrote:
> >>>>> I wonder if .title() properly capitalizes titles in any language.
> >>>>> It doesn't in English (nor does it purport to), so it begs the
> >>>>> question why it is there in the first place. German and Spanish
> >>>>> don't have any special capitalization rules for titles; I don't
> >>>>> know about any other languages.
> >>>>
> >>>> It correctly title-cases a single character, as has been pointed out
> >>>> already.
> >>>
> >>> Not according to the docs. The doc states that .title() converts the
> >>> first character in each "word" to _upper_ case. Is the doc
> >>> wrong?
> >>>
> >>> If you want titlecase, then you should call str.capitalize() which
> >>> (again according to the doc) converts the first character to _title_
> >>> case (starting in v3.8).
> >>>
> >>
> >> Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
> >> But maybe there's a better way to word it for both of them.
> >>
> > Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928
> > 64 bit (AMD64)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> help(str.title)
> > Help on method_descriptor:
> >
> > title(self, /)
> > Return a version of the string where each word is titlecased.
> >
> > More specifically, words start with uppercased characters and all
> > remaining
> > cased characters have lower case.
> >
> > '\N{LATIN CAPITAL LETTER DZ}', '\N{LATIN SMALL LETTER DZ}' and
> > '\N{LATIN CAPITAL LETTER D WITH SMALL LETTER Z}' are all digraphs, so
> > is it correct to say that .title() uppercases the first character?
> > Kind of.
>
> I think the clarification calling them upper cased characters is close
> enough considering that there are only 31 title cased characters, all
> digraphs.
>

But it's wrong, and it would lead people to the exact error of
thinking that it's the same as upper() on str[0] and lower() on the
rest.
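To make the difference concrete, here is a small sketch comparing the "upper() on the first character, lower() on the rest" idea with what .title() does. The helper naive_title is hypothetical and exists only for this comparison; the sample words are arbitrary.

```python
def naive_title(word):
    """Upper-case the first character and lower-case the rest."""
    return word[:1].upper() + word[1:].lower()

# U+01C6 (small dz digraph with caron) and U+00DF (sharp s) as first letters.
for word in ("\u01c6emal", "\u00dfeta", "hello"):
    print(ascii(naive_title(word)), ascii(word.title()))
```

For plain ASCII words the two agree; for the digraph and the sharp S they do not, which is the error being warned about.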

ChrisA


Re: .title() - annoying mistake

2021-03-21 Thread Richard Damon
On 3/21/21 7:31 PM, MRAB wrote:
> On 2021-03-21 22:30, Chris Angelico wrote:
>> On Mon, Mar 22, 2021 at 9:04 AM Grant Edwards
>>  wrote:
>>>
>>> On 2021-03-21, Chris Angelico  wrote:
>>>> On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list wrote:
>>>>> I wonder if .title() properly capitalizes titles in any language.
>>>>> It doesn't in English (nor does it purport to), so it begs the
>>>>> question why it is there in the first place. German and Spanish
>>>>> don't have any special capitalization rules for titles; I don't
>>>>> know about any other languages.
>>>>
>>>> It correctly title-cases a single character, as has been pointed out
>>>> already.
>>>
>>> Not according to the docs. The doc states that .title() converts the
>>> first character in each "word" to _upper_ case. Is the doc
>>> wrong?
>>>
>>> If you want titlecase, then you should call str.capitalize() which
>>> (again according to the doc) converts the first character to _title_
>>> case (starting in v3.8).
>>>
>>
>> Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
>> But maybe there's a better way to word it for both of them.
>>
> Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928
> 64 bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> help(str.title)
> Help on method_descriptor:
>
> title(self, /)
>     Return a version of the string where each word is titlecased.
>
>     More specifically, words start with uppercased characters and all
>     remaining cased characters have lower case.
>
> '\N{LATIN CAPITAL LETTER DZ}', '\N{LATIN SMALL LETTER DZ}' and
> '\N{LATIN CAPITAL LETTER D WITH SMALL LETTER Z}' are all digraphs, so
> is it correct to say that .title() uppercases the first character?
> Kind of.

I think the clarification calling them upper cased characters is close
enough considering that there are only 31 title cased characters, all
digraphs.

-- 
Richard Damon



Re: .title() - annoying mistake

2021-03-21 Thread MRAB

On 2021-03-21 22:30, Chris Angelico wrote:
> On Mon, Mar 22, 2021 at 9:04 AM Grant Edwards wrote:
>> On 2021-03-21, Chris Angelico wrote:
>>> On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list wrote:
>>>> I wonder if .title() properly capitalizes titles in any language.
>>>> It doesn't in English (nor does it purport to), so it begs the
>>>> question why it is there in the first place. German and Spanish
>>>> don't have any special capitalization rules for titles; I don't
>>>> know about any other languages.
>>>
>>> It correctly title-cases a single character, as has been pointed out
>>> already.
>>
>> Not according to the docs. The doc states that .title() converts the
>> first character in each "word" to _upper_ case. Is the doc
>> wrong?
>>
>> If you want titlecase, then you should call str.capitalize() which
>> (again according to the doc) converts the first character to _title_
>> case (starting in v3.8).
>
> Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
> But maybe there's a better way to word it for both of them.

Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64
bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> help(str.title)
Help on method_descriptor:

title(self, /)
    Return a version of the string where each word is titlecased.

    More specifically, words start with uppercased characters and all
    remaining cased characters have lower case.

'\N{LATIN CAPITAL LETTER DZ}', '\N{LATIN SMALL LETTER DZ}' and '\N{LATIN
CAPITAL LETTER D WITH SMALL LETTER Z}' are all digraphs, so is it
correct to say that .title() uppercases the first character? Kind of.



Re: .title() - annoying mistake

2021-03-21 Thread Chris Angelico
On Mon, Mar 22, 2021 at 9:04 AM Grant Edwards  wrote:
>
> On 2021-03-21, Chris Angelico  wrote:
> > On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list 
> >  wrote:
> >
> >> I wonder if .title() properly capitalizes titles in any language. It 
> >> doesn't in
> >> English (nor does it purport to), so it begs the question why it is there 
> >> in
> >> the first place. German and Spanish don't have any special capitalization 
> >> rules
> >> for titles; I don't know about any other languages.
> >>
> >
> > It correctly title-cases a single character, as has been pointed out
> > already.
>
> Not according to the docs. The doc states that .title() converts the
> first character in each "word" to _upper_ case. Is the doc
> wrong?
>
> If you want titlecase, then you should call str.capitalize() which
> (again according to the doc) converts the first character to _title_
> case (starting in v3.8).
>

Hmm, maybe it's different in 3.10, but the docs I'm seeing look fine.
But maybe there's a better way to word it for both of them.

ChrisA


Re: .title() - annoying mistake

2021-03-21 Thread Benjamin Schollnick
>> Heck, how do we prevent it from titlecasing abbreviations?  (This is plain 
>> text not XML….  If it was XML it would be easy!)
> 
> We haven't managed to teach humans how to do that, so I doubt we'll
> ever teach a simple standard library function to do it.
> 
> *cough*XMLHttpRequest*cough*

True, but I was thinking that it would be easy in XML because the XML/HTML 
would be able to have metadata helping to define the components in the text 
stream.  

And this is probably more of a pseduo-html, but that’s because I do more work 
in HTML then I do in pure XML.  XML I usually just tell the system to output, 
instead of craft by hand…  


blah blah this is a sentence that has the abbreviation 
NASA in it.


But that doesn’t help title, because it only handles pure plaintext.  

As many people have pointed out, or I think they meant to point out, there is 
no place in the text string to put metadata that would help assist parsing the 
string.  By definition the text can’t have metadata, since it’s plaintext.  
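Since the string itself cannot carry that metadata, one workaround is to pass it alongside the text. The following is only a sketch under that assumption; title_keep and its keep parameter are hypothetical names, and words are split on single spaces, so an abbreviation followed by punctuation would not be recognized.

```python
def title_keep(text, keep):
    """Title-case each space-separated word except those listed in keep."""
    words = []
    for word in text.split(" "):
        # Words named in the side-channel set are passed through untouched.
        words.append(word if word in keep else word.title())
    return " ".join(words)

print(title_keep("this sentence mentions NASA today", {"NASA"}))
```

The caller still has to know which words are abbreviations; the function only avoids damaging them.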

- Benjamin




Re: .title() - annoying mistake

2021-03-21 Thread Grant Edwards
On 2021-03-21, Chris Angelico  wrote:
> On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list 
>  wrote:
>
>> I wonder if .title() properly capitalizes titles in any language. It doesn't 
>> in
>> English (nor does it purport to), so it begs the question why it is there in
>> the first place. German and Spanish don't have any special capitalization 
>> rules
>> for titles; I don't know about any other languages.
>>
>
> It correctly title-cases a single character, as has been pointed out
> already.

Not according to the docs. The doc states that .title() converts the
first character in each "word" to _upper_ case. Is the doc
wrong?

If you want titlecase, then you should call str.capitalize() which
(again according to the doc) converts the first character to _title_
case (starting in v3.8).

--
Grant




Re: .title() - annoying mistake

2021-03-21 Thread Grant Edwards
On 2021-03-20, Alan Bawden  wrote:
> Sibylle Koczian  writes:
>
>On 20.03.2021 at 09:34, Alan Bawden wrote:
>>
>> When you write that code to capitalize your book titles, you should be
>> calling .title() rather than .upper() if you are doing it right.
>>
>But that's exactly what he's doing, with a result which is documented, but
>not really satisfactory.
>
> Sorry, what I wrote was ambiguous.  What I _meant_ was that when you
> replace x.title() with my_title(x) , then in the definition of my_title
> you will be calling both .lower() and .title() on individual characters,
> but you will probably _never_ be calling .upper().

Does Python have any way to convert a character to titlecase? The
documentation for .title() explicitly states that it will convert a
character to upper case, not to title case.

--
Grant






Re: .title() - annoying mistake

2021-03-21 Thread Grant Edwards
On 2021-03-20, Robert Latest via Python-list  wrote:
> Mats Wichmann wrote:
>> The problem is that there isn't a standard for title case,
>
> The problem is that we owe the very existence of the .title() method to too
> much weed being smoked during Python development. It makes specific 
> assumptions
> about a specific use case of one specific language. It doesn't get more
> idiotic, frankly.

Ah, you've never used PHP then?

I haven't checked but it's a fair bet that PHP has 3-4 different
built-in ways to do it, and they're all broken in interestingly
unpredictable ways.

--
Grant





Re: .title() - annoying mistake

2021-03-21 Thread Richard Damon
On 3/21/21 2:39 PM, Benjamin Schollnick wrote:
>> I agree with everything you say. Especially the open source part. But 
>> wouldn't
>> you agree that .title() with all its arbitrary specificity to appear in the
>> very core of a general purpose language is quite an oddity?
> No, because it book ends the issue.
>
> Upper - Converts everything to uppercase
> Lower - Converts everything to lowercase
>
> Title - Covers the cases in-between upper/lower.  
>
> I’ll agree that if title was to respect language definitions, that there 
> would be a problem here…  But it specifically states otherwise.
>
> And as I mentioned the sheer amount of work that would be needed would 
> probably cover a phd dissertation or more…  It’s a huge problem set to 
> respect one language, let alone all languages.  
>
> So the only answer is to not respect the languages, and leave that up to a 
> later implementation or for someone else to assist in adding in support.
>
> Heck, how do we prevent it from titlecasing abbreviations?  (This is plain 
> text not XML….  If it was XML it would be easy!)
>
>   - Benjamin

One important thing to remember is that there ARE a few characters that
are themselves 'Title case', so we can't live with just upper and lower.
These all are 'digraphs', i.e. they look like two letters, but the glyph
acts as a single character for many purposes. One example that has
been given is ǲ (the single character U+01F2), which is different from
Dz (two separate letters).

-- 
Richard Damon



Re: .title() - annoying mistake

2021-03-21 Thread Chris Angelico
On Mon, Mar 22, 2021 at 5:40 AM Benjamin Schollnick
 wrote:
> Heck, how do we prevent it from titlecasing abbreviations?  (This is plain 
> text not XML….  If it was XML it would be easy!)

We haven't managed to teach humans how to do that, so I doubt we'll
ever teach a simple standard library function to do it.

*cough*XMLHttpRequest*cough*

ChrisA


Re: .title() - annoying mistake

2021-03-21 Thread Benjamin Schollnick

> I agree with everything you say. Especially the open source part. But wouldn't
> you agree that .title() with all its arbitrary specificity to appear in the
> very core of a general purpose language is quite an oddity?

No, because it book ends the issue.

Upper - Converts everything to uppercase
Lower - Converts everything to lowercase

Title - Covers the cases in-between upper/lower.  

I’ll agree that if title was to respect language definitions, that there would 
be a problem here…  But it specifically states otherwise.

And as I mentioned the sheer amount of work that would be needed would probably 
cover a phd dissertation or more…  It’s a huge problem set to respect one 
language, let alone all languages.  

So the only answer is to not respect the languages, and leave that up to a 
later implementation or for someone else to assist in adding in support.

Heck, how do we prevent it from titlecasing abbreviations?  (This is plain text 
not XML….  If it was XML it would be easy!)

- Benjamin




Re: .title() - annoying mistake

2021-03-21 Thread Paul Bryan
The topic of titles is complex, and would be significant undertaking to
automate. It's not only highly language-dependent, it's also based on
the subject work itself, and subject to guidelines of those charged
with indexing such works.

MusicBrainz guidelines:
https://wiki.musicbrainz.org/Style/Titles
https://wiki.musicbrainz.org/Style#Language_specific_guidelines

Wikipedia guidelines:
https://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(capitalization)
https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Titles#Capital_letters

Addressing such complexities isn't going to be baked into the
simplistic `str.title` method. As demonstrated by the OP, it will
almost certainly come up short, even in the simplest use case. I
suggest the best approach then is to find (or write) a module that
addresses the specific use case, not try to address such shortcomings
in `str`.

Paul

On Sun, 2021-03-21 at 23:01 +1100, Chris Angelico wrote:
> On Sun, Mar 21, 2021 at 10:31 PM Robert Latest via Python-list
>  wrote:
> > 
> > Chris Angelico wrote:
> > > On Sun, Mar 21, 2021 at 4:31 AM Robert Latest via Python-list
> > >  wrote:
> > > > 
> > > > Mats Wichmann wrote:
> > > > > The problem is that there isn't a standard for title case,
> > > > 
> > > > The problem is that we owe the very existence of the .title()
> > > > method to too much weed being smoked during Python development.
> > > > It makes specific assumptions about a specific use case of one
> > > > specific language. It doesn't get more idiotic, frankly.
> > > > 
> > > 
> > > The problem is that you haven't read the documentation :) It very
> > > carefully does NOT define itself by language, and its behaviour is
> > > identical regardless of the language used.
> > 
> > The documentation says: "The algorithm uses a simple
> > language-independent definition of a word as groups of consecutive
> > letters..."
> > 
> > Yes, I get that. But the purpose it (improperly) serves only makes
> > sense in the English language.
> 
> Why? Do titles not exist in other languages? Does no other language
> capitalize words in book or other titles?
> 
> ChrisA



Re: .title() - annoying mistake

2021-03-21 Thread Chris Angelico
On Mon, Mar 22, 2021 at 2:16 AM Robert Latest via Python-list
 wrote:
>
> Chris Angelico wrote:
> > On Sun, Mar 21, 2021 at 10:31 PM Robert Latest via Python-list
> > wrote:
> >> Yes, I get that. But the purpose it (improperly) serves only makes sense in
> >> the English language.
> >
> > Why? Do titles not exist in other languages? Does no other language
> > capitalize words in book or other titles?
>
> I wonder if .title() properly capitalizes titles in any language. It doesn't in
> English (nor does it purport to), so it begs the question why it is there in
> the first place. German and Spanish don't have any special capitalization rules
> for titles; I don't know about any other languages.
>

It correctly title-cases a single character, as has been pointed out
already. Attempting to do this with upper() or lower() will give
incorrect results. So if you want to define language-specific rules
(maybe with a regex) for splitting into words and subwords, you can
then use title() and lower() to perform the actual changes.

Treat it as a building-block rather than as a magical "do what I want"
function, and it is incredibly useful, and in fact, is the only way to
be correct.
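
The building-block idea can be sketched roughly as follows. This is only
an illustration: the regex and the names `WORD`, `title_word`, and
`title_by_words` are made up for this example, and the word rule
(letters, optionally joined by apostrophes) is an assumption that would
need changing per language.

```python
import re

# Hypothetical word rule: runs of letters, optionally joined by an
# apostrophe (ASCII or typographic). Swap this for your language's rule.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def title_word(word):
    # title() on the first character picks the Unicode titlecase form;
    # lower() handles the rest of the word.
    return word[:1].title() + word[1:].lower()


def title_by_words(text):
    # Apply title_word() to every match of the word rule.
    return WORD.sub(lambda m: title_word(m.group(0)), text)


print(title_by_words("they're bill's friends"))
```

With this particular rule "l'homme" comes out as "L'homme", which is
exactly the kind of decision the word-splitting step has to make.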

ChrisA


Re: .title() - annoying mistake

2021-03-21 Thread Robert Latest via Python-list
Chris Angelico wrote:
> On Sun, Mar 21, 2021 at 10:31 PM Robert Latest via Python-list
> wrote:
>> Yes, I get that. But the purpose it (improperly) serves only makes sense in
>> the English language.
>
> Why? Do titles not exist in other languages? Does no other language
> capitalize words in book or other titles?

I wonder if .title() properly capitalizes titles in any language. It doesn't in
English (nor does it purport to), so it begs the question why it is there in
the first place. German and Spanish don't have any special capitalization rules
for titles; I don't know about any other languages. 

robert


Re: .title() - annoying mistake

2021-03-21 Thread Robert Latest via Python-list
Benjamin Schollnick wrote:
>
> I’m sorry Robert, but just because it doesn’t meet your requirements, doesn’t
> mean it’s useless.
>
> I use .title to normalize strings for data comparison, all the time.  It’s a
> perfect alternative to using .upper or .lower.
>
> Right in the documentation, it specifically acknowledges .title working in
> foreign character sets. 
>
> So while I feel for the fact that it doesn’t meet your requirements, please
> keep in mind, it does meet other people’s requirements.
>
> As with everything here, it’s open source.  If you feel that there should be
> a revised version that does meet your requirements, create it, or gather a
> bunch of people and go the route of SCANDIR and open-source it, and petition
> that it be moved into the standard library.
>
> Since this seems to be bugging you this much, come up with a solution.  

I agree with everything you say. Especially the open source part. But wouldn't
you agree that .title() with all its arbitrary specificity to appear in the
very core of a general purpose language is quite an oddity?

robert


Re: .title() - annoying mistake

2021-03-21 Thread Richard Damon
On 3/21/21 8:19 AM, Albert-Jan Roskam wrote:
>On 20 Mar 2021 23:47, Cameron Simpson  wrote:
>
>  On 20Mar2021 12:53, Sibylle Koczian  wrote:
>  >Am 20.03.2021 um 09:34 schrieb Alan Bawden:
>  >>The real reason Python strings support a .title() method is surely
>  >>because Unicode supports upper, lower, _and_ title case letters, and
>  >>tells you how to map between them. [...]
>  >>
>  >But that's exactly what he's doing, with a result which is documented,
>  >but not really satisfactory.
>
>
>This would be a good
>start: 
> https://apastyle.apa.org/style-grammar-guidelines/capitalization/title-case
>It could be locale-dependent. What I also don't like about .title() is
>that it messes up abbreviations ("Oecd")

The built in title() function is basically an intentionally 80%
solution. It handles the simple cases simply, and if you might have the
more complicated cases, you have to handle that yourself because to
specify what the 'right' answer would be is basically impossible to do
in general (because there are conflicting definitions, and some things
require context beyond what just its input provides).

-- 
Richard Damon



Re: .title() - annoying mistake

2021-03-21 Thread Albert-Jan Roskam
   On 20 Mar 2021 23:47, Cameron Simpson  wrote:

 On 20Mar2021 12:53, Sibylle Koczian  wrote:
 >Am 20.03.2021 um 09:34 schrieb Alan Bawden:
 >>The real reason Python strings support a .title() method is surely
 >>because Unicode supports upper, lower, _and_ title case letters, and
 >>tells you how to map between them. [...]
 >>
 >But that's exactly what he's doing, with a result which is documented,
 >but not really satisfactory.

   
   This would be a good
   start: 
https://apastyle.apa.org/style-grammar-guidelines/capitalization/title-case
   It could be locale-dependent. What I also don't like about .title() is
   that it messes up abbreviations ("Oecd")
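
The abbreviation problem is easy to reproduce. The sketch below shows
one possible workaround, under the assumption that any word already
written in all capitals is an abbreviation; `title_keep_acronyms` is a
hypothetical helper, not something from the standard library.

```python
def title_keep_acronyms(text):
    # Assumption: words of two or more characters that are already
    # all-uppercase are abbreviations and should be left untouched.
    words = []
    for word in text.split(" "):
        if len(word) > 1 and word.isupper():
            words.append(word)
        else:
            words.append(word.title())
    return " ".join(words)


print("the OECD report".title())
print(title_keep_acronyms("the OECD report"))
```

This breaks down as soon as the whole input is in capitals, which is one
more reason the general problem needs context that the string alone does
not carry.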


Re: .title() - annoying mistake

2021-03-21 Thread Chris Angelico
On Sun, Mar 21, 2021 at 10:31 PM Robert Latest via Python-list
 wrote:
>
> Chris Angelico wrote:
> > On Sun, Mar 21, 2021 at 4:31 AM Robert Latest via Python-list
> > wrote:
> >>
> >> Mats Wichmann wrote:
> >> > The problem is that there isn't a standard for title case,
> >>
> >> The problem is that we owe the very existence of the .title() method to too
> >> much weed being smoked during Python development. It makes specific
> >> assumptions about a specific use case of one specific language. It doesn't
> >> get more idiotic, frankly.
> >>
> >
> > The problem is that you haven't read the documentation :) It very carefully
> > does NOT define itself by language, and its behaviour is identical regardless
> > of the language used.
>
> The documentation says: "The algorithm uses a simple language-independent
> definition of a word as groups of consecutive letters..."
>
> Yes, I get that. But the purpose it (improperly) serves only makes sense in the
> English language.

Why? Do titles not exist in other languages? Does no other language
capitalize words in book or other titles?

ChrisA


Re: .title() - annoying mistake

2021-03-21 Thread Benjamin Schollnick
>> The problem is that you haven't read the documentation :) It very carefully
>> does NOT define itself by language, and its behaviour is identical regardless
>> of the language used.
> 
> The documentation says: "The algorithm uses a simple language-independent
> definition of a word as groups of consecutive letters..."
> 
> Yes, I get that. But the purpose it (improperly) serves only makes sense in the
> English language. Which is also the reason they called it title() and not
> capitalize_words(). Frankly, I can't think of any situation where this method
> would have any use -- in any language, including English. It is just a
> completely arbitrary feature, as would be a function that capitalizes only the
> last letter of each word.

I’m sorry Robert, but just because it doesn’t meet your requirements, doesn’t 
mean it’s useless.

I use .title to normalize strings for data comparison, all the time.  It’s a 
perfect alternative to using .upper or .lower.

Right in the documentation, it specifically acknowledges .title working in 
foreign character sets. 

So while I feel for the fact that it doesn’t meet your requirements, please keep 
in mind, it does meet other people’s requirements.

As with everything here, it’s open source.  If you feel that there should be a 
revised version that does meet your requirements, create it, or gather a bunch of 
people and go the route of SCANDIR and open-source it, and petition that it be 
moved into the standard library.

Since this seems to be bugging you this much, come up with a solution.  

I suspect the problem you are going to have is that in effect you’ll be 
creating a multi-language parser, even worse, you may have to add nameparsing 
into this.

Good luck.

- Benjamin





Re: .title() - annoying mistake

2021-03-21 Thread Robert Latest via Python-list
Chris Angelico wrote:
> On Sun, Mar 21, 2021 at 4:31 AM Robert Latest via Python-list
> wrote:
>>
>> Mats Wichmann wrote:
>> > The problem is that there isn't a standard for title case,
>>
>> The problem is that we owe the very existence of the .title() method to too
>> much weed being smoked during Python development. It makes specific
>> assumptions about a specific use case of one specific language. It doesn't
>> get more idiotic, frankly.
>>
>
> The problem is that you haven't read the documentation :) It very carefully
> does NOT define itself by language, and its behaviour is identical regardless
> of the language used.

The documentation says: "The algorithm uses a simple language-independent
definition of a word as groups of consecutive letters..."

Yes, I get that. But the purpose it (improperly) serves only makes sense in the
English language. Which is also the reason they called it title() and not
capitalize_words(). Frankly, I can't think of any situation where this method
would have any use -- in any language, including English. It is just a
completely arbitrary feature, as would be a function that capitalizes only the
last letter of each word.

robert


Re: .title() - annoying mistake

2021-03-20 Thread Cameron Simpson
On 20Mar2021 23:18, Jon Ribbens  wrote:
>On 2021-03-20, Cameron Simpson  wrote:
>> Not to mention that the .title method _predates_ Python's use of Unicode
>> in strings.
>
>Well, it predates Python's use of Unicode in the default string type,
>but not Python's use of Unicode in strings.
>
>https://github.com/python/cpython/commit/4c08d554b9009899780a5e003d6bbeb5413906ee

Thank you for this correction. Cheers, Cameron


Re: .title() - annoying mistake

2021-03-20 Thread Jon Ribbens via Python-list
On 2021-03-20, Cameron Simpson  wrote:
> On 20Mar2021 12:53, Sibylle Koczian  wrote:
>>Am 20.03.2021 um 09:34 schrieb Alan Bawden:
>>>The real reason Python strings support a .title() method is surely
>>>because Unicode supports upper, lower, _and_ title case letters, and
>>>tells you how to map between them. [...]
>>>
>>But that's exactly what he's doing, with a result which is documented, 
>>but not really satisfactory.
>
> Not to mention that the .title method _predates_ Python's use of Unicode 
> in strings.

Well, it predates Python's use of Unicode in the default string type,
but not Python's use of Unicode in strings.

https://github.com/python/cpython/commit/4c08d554b9009899780a5e003d6bbeb5413906ee


Re: .title() - annoying mistake

2021-03-20 Thread Cameron Simpson
On 20Mar2021 12:53, Sibylle Koczian  wrote:
>Am 20.03.2021 um 09:34 schrieb Alan Bawden:
>>The real reason Python strings support a .title() method is surely
>>because Unicode supports upper, lower, _and_ title case letters, and
>>tells you how to map between them. [...]
>>
>But that's exactly what he's doing, with a result which is documented, 
>but not really satisfactory.

Not to mention that the .title method _predates_ Python's use of Unicode 
in strings.

Cheers,
Cameron Simpson 


Re: .title() - annoying mistake

2021-03-20 Thread Alan Bawden
Sibylle Koczian  writes:

   Am 20.03.2021 um 09:34 schrieb Alan Bawden:
   >
   > When you write that code to capitalize your book titles, you should be
   > calling .title() rather than .upper() if you are doing it right.
   >
   But that's exactly what he's doing, with a result which is documented, but
   not really satisfactory.

Sorry, what I wrote was ambiguous.  What I _meant_ was that when you
replace x.title() with my_title(x) , then in the definition of my_title
you will be calling both .lower() and .title() on individual characters,
but you will probably _never_ be calling .upper().
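
A minimal sketch of what such a my_title might look like, assuming for
illustration that words are simply delimited by whitespace (a real
version would need a smarter rule). Note that it calls only .title()
and .lower() on single characters and never .upper():

```python
def my_title(text):
    out = []
    start_of_word = True
    for ch in text:
        if ch.isspace():
            # Whitespace ends the current word (assumed word rule).
            start_of_word = True
            out.append(ch)
        elif start_of_word:
            # Titlecase, not uppercase, the first character of a word.
            out.append(ch.title())
            start_of_word = False
        else:
            out.append(ch.lower())
    return "".join(out)


print(ascii(my_title("bill's \u01f1UNGLA")))
```

For the "DZ" digraph the first character becomes '\u01f2' ("Dz"), which
.upper() would never produce.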

-- 
Alan Bawden


Re: .title() - annoying mistake

2021-03-20 Thread Chris Angelico
On Sun, Mar 21, 2021 at 4:31 AM Robert Latest via Python-list
 wrote:
>
> Mats Wichmann wrote:
> > The problem is that there isn't a standard for title case,
>
> The problem is that we owe the very existence of the .title() method to too
> much weed being smoked during Python development. It makes specific assumptions
> about a specific use case of one specific language. It doesn't get more
> idiotic, frankly.
>

The problem is that you haven't read the documentation :) It very
carefully does NOT define itself by language, and its behaviour is
identical regardless of the language used. Notably, it doesn't care
what script you're using, and will happily upper/lowercase letters in
a variety of different scripts, even in a single string.
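
A quick illustration of the mixed-script point; the sample string is
made up for this example:

```python
mixed = "hello мир ελληνικά"  # Latin, Cyrillic, and Greek in one string
titled = mixed.title()
print(titled)
```

Each word gets its first letter titlecased regardless of which script it
is written in.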

ChrisA


Re: .title() - annoying mistake

2021-03-20 Thread Robert Latest via Python-list
Mats Wichmann wrote:
> The problem is that there isn't a standard for title case,

The problem is that we owe the very existence of the .title() method to too
much weed being smoked during Python development. It makes specific assumptions
about a specific use case of one specific language. It doesn't get more
idiotic, frankly.

robert


Re: .title() - annoying mistake

2021-03-20 Thread Sibylle Koczian

Am 20.03.2021 um 09:34 schrieb Alan Bawden:
> The real reason Python strings support a .title() method is surely
> because Unicode supports upper, lower, _and_ title case letters, and
> tells you how to map between them.  Consider:
>
>    >>> '\u01f1'.upper()
>    '\u01f1'
>
> This is the "DZ" character.
>
>    >>> '\u01f1'.lower()
>    '\u01f3'
>
> This is the "dz" character.
>
>    >>> '\u01f1'.title()
>    '\u01f2'
>
> This is the "Dz" character.
>
> When you write that code to capitalize your book titles, you should be
> calling .title() rather than .upper() if you are doing it right.

But that's exactly what he's doing, with a result which is documented,
but not really satisfactory.





Re: .title() - annoying mistake

2021-03-20 Thread David Kolovratník
On Sat, Mar 20, 2021 at 04:34:02AM -0400, Alan Bawden wrote:
> The real reason Python strings support a .title() method is surely
> because Unicode supports upper, lower, _and_ title case letters, and
> tells you how to map between them.  Consider:
> 
>    >>> '\u01f1'.upper()
>    '\u01f1'
> 
> This is the "DZ" character.
> 
>    >>> '\u01f1'.lower()
>    '\u01f3'
> 
> This is the "dz" character.
> 
>    >>> '\u01f1'.title()
>    '\u01f2'
> 
> This is the "Dz" character.
> 
> When you write that code to capitalize your book titles, you should be
> calling .title() rather than .upper() if you are doing it right.
It would be great to read this reasoning in the documentation.

Cheers,
David


Re: .title() - annoying mistake

2021-03-20 Thread Alan Bawden
The real reason Python strings support a .title() method is surely
because Unicode supports upper, lower, _and_ title case letters, and
tells you how to map between them.  Consider:

   >>> '\u01f1'.upper()
   '\u01f1'

This is the "DZ" character.

   >>> '\u01f1'.lower()
   '\u01f3'

This is the "dz" character.

   >>> '\u01f1'.title()
   '\u01f2'

This is the "Dz" character.

When you write that code to capitalize your book titles, you should be
calling .title() rather than .upper() if you are doing it right.

-- 
Alan Bawden


Re: .title() - annoying mistake

2021-03-20 Thread Abdur-Rahmaan Janhangeer
Greetings,

Missing the 'S renders, in my opinion, the method unfit to be included
in its current form.

It is a call to improve it if possible. We wonder why in the first place
such a method exists. If indeed it intends to capitalise all first letters,
putting others in lowercase, the ' is too common a case to be ignored.

The intent was:

"Return a titlecased version of the string where words start with an
uppercase character and the remaining characters are lowercase."

But the ' is a case acknowledged by the docs:

"The algorithm uses a simple language-independent definition of a word as
groups of consecutive letters. The definition works in many contexts but it
means that apostrophes in contractions and possessives form word boundaries,
which may not be the desired result:
...
A workaround for apostrophes can be constructed using regular expressions:
..."
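
For reference, the workaround elided above looks roughly like this in
the str.title() documentation (reproduced from memory, so treat the
exact regex as approximate). It only knows about ASCII letters:

```python
import re


def titlecase(s):
    # Treat letters plus an optional apostrophe-suffix as one word and
    # capitalize the whole match at once.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)


print("they're bill's friends".title())
print(titlecase("they're bill's friends."))
```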

Why not do it if you know people miss it?

Kind Regards,

Abdur-Rahmaan Janhangeer
about | blog | github
Mauritius


Re: .title() - annoying mistake

2021-03-19 Thread Cameron Simpson
On 19Mar2021 23:47, Abdur-Rahmaan Janhangeer  wrote:
>At least i'd expect what it pretends to do
>even if not following English.
>
>Missing ' is a weird behaviour, i get it
>they skipped every non letter

I think the lesson here is that .title() doesn't even do English 
language title capitalisation, and you shouldn't expect it to. That's a 
context dependent vocabulary dependent thing and there's plenty of 
variations on how you might like to do it. So expecting a naive context 
free method to do what you want is unwise.

The .title method does something which resembles English capitalisation 
in simple cases. Rather than naming it .simplistic_if_you_are_lucky it 
was named .title because in some simple cases it does do what you might 
want: _read_ its specification and recognise that that is _not_ a 
natural language capable function, but a simple lexical function which 
toggles the leading character of a simple approximation of "words".

_If_ you're going to use it with natural language, its simplicity 
implies that the user needs to do some preparsing of the text to decide 
where this function can be used to get correct results.

Don't get hung up that it didn't do what you want, recognise that it 
does something simple and work with that limitation. Or make your own, 
likely as part of a more complex library with deeper understanding of 
language.

Cheers,
Cameron Simpson 


Re: .title() - annoying mistake

2021-03-19 Thread Terry Reedy

On 3/19/2021 6:17 PM, Thomas Jollans wrote:

> From a quick scan of my (medium-sized) bookshelf, most publishers seem
> to agree that the thing to do is set the title in all caps.


In my quick perusal, this is more true of 'popular' works, whereas
'academic' works are more likely to use titlecase.  The title on the
(optional) inner flyleaf and title page is more likely to be titlecase.
For the US Library of Congress catalog entry, the convention seems to
be to capitalize the first word and proper names (like 'C' and 'Python'),
as in a sentence, but with no '.'.


Overall, the only rule is no rule.


--
Terry Jan Reedy



Re: .title() - annoying mistake

2021-03-19 Thread Alan Gauld via Python-list
On 19/03/2021 19:58, Dan Stromberg wrote:

> In high school, I was taught that English has multiple capitalization
> rulesets to choose among for titles.
> 

And, of course, there are multiple forms of English.
English English is very different from US English.

And even in the UK there are (a few, minor) differences between
usage in Scotland compared to England, for example.

Trying to write programming languages to follow natural
language semantics and grammar is an exercise in frustration.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: .title() - annoying mistake

2021-03-19 Thread Chris Angelico
On Sat, Mar 20, 2021 at 9:19 AM Thomas Jollans  wrote:
>  From a quick scan of my (medium-sized) bookshelf, most publishers seem
> to agree that the thing to do is set the title in all caps.

The rule also applies to mentions of titles in the middles of
sentences, such as if I were to refer to "Alice's Adventures in
Wonderland" as part of this discussion.

ChrisA


Re: .title() - annoying mistake

2021-03-19 Thread Thomas Jollans

On 19/03/2021 20:33, dn via Python-list wrote:
> On 20/03/2021 07.49, Grant Edwards wrote:
>> On 2021-03-19, MRAB  wrote:
>>> You want English "man's" to become "Man's", but French "l'homme" to
>>> become "L'Homme". It's language-dependent.
>>
>> In English, certain words are not capitalized in titles unless they're
>> the first word in the title (short articles and prepositions), and
>> .title() doesn't get that right either:
>>
>> >>> "the man in the grey flannel suit".title()
>> 'The Man In The Grey Flannel Suit'
>>
>> should be
>>
>> 'The Man in the Grey Flannel Suit'
>
> To be fair, aren't book-titles* a (formalised) sub-set of the English
> language?


From a quick scan of my (medium-sized) bookshelf, most publishers seem 
to agree that the thing to do is set the title in all caps. Of the few 
(English-language) books I have with any lower-case letters on the spine 
at all, most seem to follow the same general sort of rules for what 
words to capitalize, but ‘Last Chance To See’, capital T, by Douglas 
Adams, is one exception.


Of the others, I noticed that ‘The Life and Times of The 
Thunderbolt Kid’ by Bill Bryson has an interesting choice of 
capitalization which is perfectly logical but certainly not the only option.






Re: .title() - annoying mistake

2021-03-19 Thread dn via Python-list
On 20/03/2021 06.17, Chris Angelico wrote:
> On Sat, Mar 20, 2021 at 4:01 AM Abdur-Rahmaan Janhangeer
>  wrote:
>>
>> It's about unnecessary capitalisation for a common use case
>> in English.
>>
>> You can see it in action on my site:
>> https://www.compileralchemy.com/#articles
>>
>> see 24.
>>
> 
> If you want something that's designed for English, get something
> that's designed for English. The string method is deliberately
> language-agnostic and simple.


There is an (unintended) 'psychological trap' here - that the
documentation is in English, but that the feature's application is
language-agnostic.


Perhaps the bigger trap is "SODD"! (Stack Overflow-Driven Development -
dn patent pending) Specifically: from where do we learn, how authoritative
(or otherwise) are our source(s), and how effective is that
learning process?

Recently, my grumpy-old-man status rose a notch, when I received a
'Finxter' email-advertisement offering a course in "all" of the Python
built-in functions. The advert claimed that this new offering was the
first time such a course& had been offered. Quite aside from such
teaching being contrary to theories of learning (other than rote), IMHO,
this claim seemed specious. A casual web-search rapidly revealed the
falsehood, by reminding me of a Udemy offering& and TechVidvan's& -
amongst others. NB such evidence making the advertisement illegal in
this jurisdiction (albeit not many others)!

Applying such concerns to our str.title() conversation, a quick-fire
web-search quickly offers W3School's entry&, which is typical of the
genre (that I have conflated under the title "SODD").

If you'd care to repeat the experiment (& refs below), you will find
that this web-page (and its ilk) offer a quick way to remind oneself of
the syntax and purpose of the function, as needed. However, its brief
description is insufficient for learning - and totally-inappropriate
when it comes to helping the OP! The 'psychological trap' inherent in
these is the fallacy of apparent sufficiency - that the 'answer' might
be complete, and therefore the impression that there is nothing left to
learn. Whither their value?


Compare that/them with the Python documentation& (per @Paul's post). Not
only do we have the 'quick reminder' capability, but also the warning -
and further a sample work-around. Here we have "authority" and
"completeness" (and an open-source philosophy of enabling improvement).

The 'docs' team work very hard on our behalf. Not only do they deserve
our 'support' as readers, but such is also a more worthwhile investment
in our own learning!


BTW when programming, I will keep the Python Documentation web-page as
an open tab in my web-browser, precisely to facilitate such rapid
look-ups/reminders. Although, these days, editors/IDEs probably satisfy
the majority of such needs. In addition, the DuckDuckGo search engine
offers a "bang lookup" (short-cut) to the Python docs search page, ie
"!py .title()" realises
"https://docs.python.org/3/search.html?q=.title()" and thus gives
pointers to str.title() plus the bytes and turtle methods of the same
name - additionally to str.istitle() which is the inspection 'companion'
to the 'do it' function we've been discussing!

As many GPS-users have found to their cost, placing your reliance upon a
tool whose objective is 'convenience', can lead you "down the garden path"&!


& Web.Refs:
https://academy.finxter.com/university/python-built-in-functions-every-python-coder-must-know/
(https://www.udemy.com/course/the-python-built-in-function-tutorial-series/)
(https://techvidvan.com/tutorials/python-built-in-functions/)
https://www.w3schools.com/python/ref_string_title.asp
https://docs.python.org/3.9/library/stdtypes.html#str.title
https://idioms.thefreedictionary.com/lead+down+the+garden+path
-- 
Regards,
=dn


Re: .title() - annoying mistake

2021-03-19 Thread Dan Stromberg
On Fri, Mar 19, 2021 at 11:51 AM Grant Edwards 
wrote:

> On 2021-03-19, MRAB  wrote:
> > On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
> >> Aie sorry,
> >>
> >> Did not know it targeted the non-English speakers.
> >>
> > You want English "man's" to become "Man's", but French "l'homme" to
> > become "L'Homme". It's language-dependent.
>
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
>
> should be
>
> 'The Man in the Grey Flannel Suit'
>

In high school, I was taught that English has multiple capitalization
rulesets to choose among for titles.


Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
At least i'd expect what it pretends to do
even if not following English.

Missing ' is a weird behaviour, i get it
they skipped every non letter


On Fri, 19 Mar 2021, 22:50 Grant Edwards,  wrote:

>
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
>
> should be
>
> 'The Man in the Grey Flannel Suit'
>


Re: .title() - annoying mistake

2021-03-19 Thread Mats Wichmann

On 3/19/21 12:49 PM, Grant Edwards wrote:
> On 2021-03-19, MRAB  wrote:
>> On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
>>> Aie sorry,
>>>
>>> Did not know it targeted the non-English speakers.
>>
>> You want English "man's" to become "Man's", but French "l'homme" to
>> become "L'Homme". It's language-dependent.
>
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
>
> should be
>
> 'The Man in the Grey Flannel Suit'


The problem is that there isn't a standard for title case, which I 
understood to be specifically for English, following the rules that 
Grant mentions ("certain words are not capitalized"), but on looking it 
up it turns out that the various style guides (some of which we don't 
get to mention here without stirring up controversy) each have their own 
interpretations of what it is.  And Python doesn't do any of those: 
Python does what is documented.


So possibly the choice to call it titlecase is the source of the confusion?


Re: .title() - annoying mistake

2021-03-19 Thread dn via Python-list
On 20/03/2021 07.49, Grant Edwards wrote:
> On 2021-03-19, MRAB  wrote:
>> You want English "man's" to become "Man's", but French "l'homme" to 
>> become "L'Homme". It's language-dependent.
> 
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
> 
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
> 
> should be
> 
> 'The Man in the Grey Flannel Suit'


To be fair, aren't book-titles* a (formalised) sub-set of the English
language?

https://www.librarianshipstudies.com/2018/12/anglo-american-cataloguing-rules-aacr.html

* plays, movies, ...

See also people's/family-names which have been anglicised or
transliterated...
-- 
Regards,
=dn


Re: .title() - annoying mistake

2021-03-19 Thread Grant Edwards
On 2021-03-19, MRAB  wrote:
> On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
>> Aie sorry,
>> 
>> Did not know it targeted the non-English speakers.
>> 
> You want English "man's" to become "Man's", but French "l'homme" to 
> become "L'Homme". It's language-dependent.

In English, certain words are not capitalized in titles unless they're
the first word in the title (short articles and prepositions), and
.title() doesn't get that right either:

>>> "the man in the grey flannel suit".title()
'The Man In The Grey Flannel Suit'

should be

'The Man in the Grey Flannel Suit'
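A rough sketch of the English-style rule described above, for anyone who wants to roll their own. The SMALL_WORDS set and the english_title name are made up for illustration; real style guides disagree on which words belong in the list, so treat it as an assumption and not a standard.

```python
# Hypothetical list of "small" words; style guides differ on the exact set.
SMALL_WORDS = {"a", "an", "the", "and", "but", "or", "nor",
               "for", "in", "on", "at", "to", "of", "by"}

def english_title(text):
    """Capitalize each whitespace-separated word, except small words
    that are not the first word of the title."""
    out = []
    for i, word in enumerate(text.split()):
        lower = word.lower()
        if i != 0 and lower in SMALL_WORDS:
            out.append(lower)
        else:
            # Upper-case only the first character, lower-case the rest.
            out.append(word[:1].upper() + word[1:].lower())
    return " ".join(out)

print(english_title("the man in the grey flannel suit"))
```

Splitting on whitespace also means an apostrophe never starts a new word here, but hyphens and punctuation are not handled at all.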

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! I'm definitely not in Omaha!



RE: .title() - annoying mistake

2021-03-19 Thread Schachner, Joseph
I agree.  If the documentation notes this issue, and the (possibly new) Python 
user has to replace .title() with a different function that uses a regular 
expression and a lambda function to work around the issue, then perhaps it's 
time for a proposal to address this.  Perhaps there needs to be an optional 
argument to .title() which, if supplied, tells it to do the workaround.  Or, 
perhaps better, capitalize only the first word and subsequent words that are 
preceded by whitespace.  That should solve "Someone's Apostrophe" and 
"Hyphenated-expressions". Someone who looks into this should check whether the 
second part of a hyphenated expression needs to be capitalized. 
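For what it's worth, the "only after whitespace" idea can be sketched in a few lines without changing .title() itself. The function name title_whitespace is hypothetical; this is one possible reading of the proposal, not an existing str method or option.

```python
import re

def title_whitespace(text):
    """Capitalize each run of non-whitespace characters as a unit,
    so apostrophes and hyphens never start a new word."""
    # str.capitalize() upper-cases the first character and lower-cases the rest.
    return re.sub(r"\S+", lambda m: m.group(0).capitalize(), text)

print(title_whitespace("someone's apostrophe"))
print(title_whitespace("hyphenated-expressions work"))
```

Unlike string.capwords(), which splits and re-joins with single spaces by default, the regular expression leaves the original whitespace untouched. Note that the second half of a hyphenated word stays lower-case with this approach.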

--- Joseph S.   



-Original Message-
From: Abdur-Rahmaan Janhangeer  
Sent: Friday, March 19, 2021 11:02 AM
To: Paul Bryan 
Cc: Python 
Subject: Re: .title() - annoying mistake

Thanks very much!

That's annoying. You have to roll your own solution!

Kind Regards,

Abdur-Rahmaan Janhangeer
about <https://compileralchemy.github.io/> | blog 
<https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ>
Mauritius



Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
On Fri, 19 Mar 2021, 22:07 MRAB,  wrote:

> You want English "man's" to become "Man's", but French "l'homme" to
> become "L'Homme". It's language-dependent.
>

Ah, it depends on the language (English, I guess).

Thanks



Re: .title() - annoying mistake

2021-03-19 Thread MRAB

On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
> Aie sorry,
>
> Did not know it targeted the non-English speakers.

You want English "man's" to become "Man's", but French "l'homme" to 
become "L'Homme". It's language-dependent.
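To make the point concrete, here is what the built-in method does with both cases. Nothing is assumed beyond str.title() itself; the sample strings are the ones from this thread.

```python
samples = ["man's", "l'homme", "Python's usage"]

# str.title() treats the apostrophe as a word boundary in every language.
results = {s: s.title() for s in samples}

for original, titled in results.items():
    print(f"{original!r} -> {titled!r}")
```

The same rule produces the wanted result for the French example and the unwanted one for the English possessive, so no single language-independent rule can satisfy both.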



Re: .title() - annoying mistake

2021-03-19 Thread Chris Angelico
On Sat, Mar 20, 2021 at 4:46 AM Karen Shaeffer via Python-list
 wrote:
>
>
>
> > On Mar 19, 2021, at 9:42 AM, Grant Edwards  
> > wrote:
> >
> > On 2021-03-19, Skip Montanaro  wrote:
> >>>
> >>> That's annoying. You have to roll your own solution!
> >>>
> >>
> >> Certainly seems like a known issue:
> >>
> >> https://bugs.python.org/issue12737
> >
> > While that is an issue with string.title(), I don't see how it's
> > related to what the OP is reporting. Issue 12737 is about Unicode
> > combining marks.
>
> Hi,
> I’ve been frustrated by my experiences processing unstructured multilingual 
> text with Python. I’ve always assumed this was due to my insufficient 
> experience with Python 3 text processing. I’ve recently begun coding in 
> Go (I also continue to code in Python), and Go has an exceptionally crisp 
> and clear capacity to process unstructured multilingual UTF-8 encoded text.
>
> In just a few days of working with text processing in Go, using the book “The 
> Go Programming Language” by Donovan and Kernighan, along with the Go language 
> specification and other free online help, I have acquired a clear and crisp 
> understanding of how to work effectively with unstructured, multilingual 
> UTF-8 encoded text (and emojis) and any Unicode code point, even invalid 
> Unicode code points.
>
> To see some of these issues first hand, write a palindrome detector that 
> works with any sequence of UTF-8 encoded code points, including invalid code 
> points. I’m sure it can be done in Python, although I’ve not done it. It’s a 
> trivial exercise in Go.
>
> I’m not bashing Python here. I will continue to code with Python. It’s an 
> exceptional language and community. Just commenting on my experience.
>

Python doesn't work with UTF-8 encoded code points; it works with
Unicode code points. Are you looking for something that checks whether
something is a palindrome, or locates palindromes within it?

def is_palindrome(txt):
    return txt == txt[::-1]

Easy.

Efficiently finding substring palindromes would be a bit harder, but
that'd be true even if you restricted it to ASCII. The advantage of
Python's way of doing it is that, if you have a method that would work
with ASCII bytes, the exact same thing will work with a Unicode
string.
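A small illustration of that point, as a sketch only: the same slicing-based function accepts both bytes and str, but the answer can differ once non-ASCII text is encoded, because reversing UTF-8 bytes scrambles multi-byte sequences.

```python
def is_palindrome(seq):
    # Works for any sequence that supports slicing and equality (str, bytes, list).
    return seq == seq[::-1]

word = "\u00e9t\u00e9"  # "été" with precomposed é (U+00E9)

print(is_palindrome(b"abcba"))               # ASCII bytes
print(is_palindrome(word))                   # Unicode code points
print(is_palindrome(word.encode("utf-8")))   # raw UTF-8 bytes of the same text
```

The last call returns False even though the text is a palindrome at the code point level, which is why comparing decoded strings is usually the safer choice.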

There's another big wrinkle not touched here, and that's what to do
with combining characters. Python makes it easy to normalize text as
much as is possible, and an NFC normalization would help a lot, but
it's not going to do everything. So you may want to first define a
proper way to split a string into whatever you're defining a character
to be, and that's a very difficult problem, regardless of programming
language. For example, Arabic text changes in visual shape when
letters are next to each other, and Greek has two different forms for
the letter sigma (U+03C2 and U+03C3) - should those distinctions
affect palindromeness? What about ligatures - is U+FB01 "fi" a single
character, or should it be matched by "if" on the other end?
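As a sketch of the normalization step mentioned above: unicodedata.normalize() is in the standard library, and the helper name is_palindrome_normalized is made up here. It only addresses combining sequences that have a precomposed form (NFC) or compatibility characters such as ligatures (NFKC); it does not solve the general "what is a character" problem.

```python
import unicodedata

def is_palindrome_normalized(txt, form="NFC"):
    """Normalize first, then compare code points front to back."""
    norm = unicodedata.normalize(form, txt)
    return norm == norm[::-1]

# "été" written with e + COMBINING ACUTE ACCENT (U+0301) instead of U+00E9.
decomposed = "e\u0301te\u0301"

print(decomposed == decomposed[::-1])                 # raw code points
print(is_palindrome_normalized(decomposed))           # after NFC composition
print(is_palindrome_normalized("\ufb01if", "NFKC"))   # ligature expanded to "fi"
```

NFC leaves the U+FB01 ligature alone, so whether it should match "if" comes down to choosing NFKC, which is a policy decision and not something the language can make for you.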

What part of this is trivial in Go?

ChrisA


Re: .title() - annoying mistake

2021-03-19 Thread Karen Shaeffer via Python-list


> On Mar 19, 2021, at 9:42 AM, Grant Edwards  wrote:
> 
> On 2021-03-19, Skip Montanaro  wrote:
>>> 
>>> That's annoying. You have to roll your own solution!
>>> 
>> 
>> Certainly seems like a known issue:
>> 
>> https://bugs.python.org/issue12737
> 
> While that is an issue with string.title(), I don't see how it's
> related to what the OP is reporting. Issue 12737 is about Unicode
> combining marks.

Hi,
I’ve been frustrated by my experiences processing unstructured multilingual 
text with Python. I’ve always assumed this was due to my insufficient 
experience with Python 3 text processing. I’ve recently begun coding in Go 
(I also continue to code in Python), and Go has an exceptionally crisp and 
clear capacity to process unstructured multilingual UTF-8 encoded text.

In just a few days of working with text processing in Go, using the book “The 
Go Programming Language” by Donovan and Kernighan, along with the Go language 
specification and other free online help, I have acquired a clear and crisp 
understanding of how to work effectively with unstructured, multilingual UTF-8 
encoded text (and emojis) and any Unicode code point, even invalid Unicode 
code points.

To see some of these issues first hand, write a palindrome detector that works 
with any sequence of UTF-8 encoded code points, including invalid code points. 
I’m sure it can be done in Python, although I’ve not done it. It’s a trivial 
exercise in Go.

I’m not bashing Python here. I will continue to code with Python. It’s an 
exceptional language and community. Just commenting on my experience.

humbly,
Karen



Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Aie sorry,

Did not know it targeted the non-English speakers.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius


Re: .title() - annoying mistake

2021-03-19 Thread Chris Angelico
On Sat, Mar 20, 2021 at 4:01 AM Abdur-Rahmaan Janhangeer
 wrote:
>
> It's about unnecessary capitalisation for a common use case
> in English.
>
> You can see it in action on my site:
> https://www.compileralchemy.com/#articles
>
> see 24.
>

If you want something that's designed for English, get something
that's designed for English. The string method is deliberately
language-agnostic and simple.

ChrisA


Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
It's about unnecessary capitalisation for a common use case
in English.

You can see it in action on my site:
https://www.compileralchemy.com/#articles

see 24.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius


Re: .title() - annoying mistake

2021-03-19 Thread Grant Edwards
On 2021-03-19, Skip Montanaro  wrote:
>>
>> That's annoying. You have to roll your own solution!
>>
>
> Certainly seems like a known issue:
>
> https://bugs.python.org/issue12737

While that is an issue with string.title(), I don't see how it's
related to what the OP is reporting. Issue 12737 is about Unicode
combining marks. The OP's problem is related to the apostrophe used to
form a possessive.

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! Civilization is fun! Anyway, it keeps me busy!!



Re: .title() - annoying mistake

2021-03-19 Thread Skip Montanaro
>
> That's annoying. You have to roll your own solution!
>

Certainly seems like a known issue:

https://bugs.python.org/issue12737

That issue was opened in 2011.

Skip


Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Thanks very much!

That's annoying. You have to roll your own solution!

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius



Re: .title() - annoying mistake

2021-03-19 Thread Paul Bryan
From https://docs.python.org/3.9/library/stdtypes.html#str.title:

> The algorithm uses a simple language-independent definition of a word
> as groups of consecutive letters. The definition works in many
> contexts but it means that apostrophes in contractions and
> possessives form word boundaries, which may not be the desired result

The link above includes a workaround for apostrophes.
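For convenience, the documented workaround is along these lines (reproduced from memory of the 3.9 docs, so check the link for the exact text). It treats a run of ASCII letters, optionally followed by an apostrophe and more letters, as one word.

```python
import re

def titlecase(s):
    # A "word" is letters, optionally followed by an apostrophe and more letters.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)

print(titlecase("Python's usage"))
print(titlecase("they're bill's friends."))
```

Because the pattern only covers A-Z and a-z, it is an English-oriented fix and leaves non-ASCII letters untouched.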

Paul

On Fri, 2021-03-19 at 18:43 +0400, Abdur-Rahmaan Janhangeer wrote:
> Greetings list,
> 
> See this:
> 
> >>> "Python's usage".title()
> "Python'S Usage"
> 
> It should have been Python's Usage
> 
> Why capitalise the S?
> 
> Kind Regards,
> 
> Abdur-Rahmaan Janhangeer
> Mauritius



.title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Greetings list,

See this:

>>> "Python's usage".title()
"Python'S Usage"

It should have been Python's Usage

Why capitalise the S?

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius