Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Greetings,

Mishandling the 's makes the method, in my opinion, unfit to be included
in its current form.

It is a call to improve it if possible. We wonder why such a method
exists in the first place. If it indeed intends to capitalise all first
letters, putting the others in lowercase, the ' is too common a case to
be ignored.

The intent was:

"Return a titlecased version of the string where words start with an
uppercase character and the remaining characters are lowercase."

But the ' is a case acknowledged by the docs:

"The algorithm uses a simple language-independent definition of a word as
groups of consecutive letters. The definition works in many contexts but it
means that apostrophes in contractions and possessives form word boundaries,
which may not be the desired result:
...
A workaround for apostrophes can be constructed using regular expressions:
..."

Why not do it if you know people miss it?

Kind Regards,

Abdur-Rahmaan Janhangeer
about  | blog

github 
Mauritius
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: .title() - annoying mistake

2021-03-19 Thread Cameron Simpson
On 19Mar2021 23:47, Abdur-Rahmaan Janhangeer  wrote:
>At least i'd expect what it pretends to do
>even if not following English.
>
>Missing ' is a weird behaviour, i get it
>they skipped every non-letter

I think the lesson here is that .title() doesn't even do English 
language title capitalisation, and you shouldn't expect it to. That's a 
context dependent vocabulary dependent thing and there's plenty of 
variations on how you might like to do it. So expecting a naive context 
free method to do what you want is unwise.

The .title method does something which resembles English capitalisation 
in simple cases. Rather than naming it .simplistic_if_you_are_lucky it 
was named .title because in some simple cases it does do what you might 
want: _read_ its specification and recognise that it is _not_ a
natural-language-capable function, but a simple lexical function which
toggles the leading character of a simple approximation of "words".
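
As a rough illustration of that "simple lexical function", the documented
rule can be approximated in a few lines. This is only a sketch under
assumptions: naive_title is a hypothetical name, it uses str.isalpha() as
its idea of a "letter", and CPython's real implementation differs in detail
(it works on cased characters and titlecase mappings). Treat it as a model
of the behaviour, not the actual algorithm.

```python
def naive_title(s):
    """Approximate str.title(): uppercase a letter that follows a
    non-letter, lowercase a letter that follows another letter.
    (Hypothetical helper, for illustration only.)"""
    out = []
    prev_is_letter = False
    for ch in s:
        if ch.isalpha():
            # A letter starts a new "word" only if the previous
            # character was not a letter; an apostrophe counts as a break.
            out.append(ch.lower() if prev_is_letter else ch.upper())
            prev_is_letter = True
        else:
            out.append(ch)
            prev_is_letter = False
    return "".join(out)


print(naive_title("python's usage"))   # Python'S Usage
print("python's usage".title())        # Python'S Usage
```

Seen this way, the capital S is not a special case that was missed: the
apostrophe is simply one more non-letter, so the "s" after it starts a new
"word".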

_If_ you're going to use it with natural language, its simplicity 
implies that the user needs to do some preparsing of the text to decide 
where this function can be used to get correct results.

Don't get hung up that it didn't do what you want, recognise that it 
does something simple and work with that limitation. Or make your own, 
likely as part of a more complex library with deeper understanding of 
language.

Cheers,
Cameron Simpson 


Re: .title() - annoying mistake

2021-03-19 Thread Terry Reedy

On 3/19/2021 6:17 PM, Thomas Jollans wrote:

> From a quick scan of my (medium-sized) bookshelf, most publishers seem
> to agree that the thing to do is set the title in all caps.


In my quick perusal, this is more true of 'popular' works, whereas
'academic' works are more likely to use titlecase.  The title on the
(optional) inner flyleaf and title page is more likely to be titlecase.
For the US Library of Congress catalog entry, the convention seems to
be to capitalize the first word and proper names, like 'C' and 'Python',
like a sentence, but with no '.'.


Overall, the only rule is no rule.


--
Terry Jan Reedy



Re: .title() - annoying mistake

2021-03-19 Thread Alan Gauld via Python-list
On 19/03/2021 19:58, Dan Stromberg wrote:

> In high school, I was taught that English has multiple capitalization
> rulesets to choose among for titles.
> 

And, of course, there are multiple forms of English.
English English is very different from US English.

And even in the UK there are (a few, minor) differences between
usage in Scotland compared to England, for example.

Trying to write programming languages to follow natural
language semantics and grammar is an exercise in frustration.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: .title() - annoying mistake

2021-03-19 Thread Chris Angelico
On Sat, Mar 20, 2021 at 9:19 AM Thomas Jollans  wrote:
>  From a quick scan of my (medium-sized) bookshelf, most publishers seem
> to agree that the thing to do is set the title in all caps.

The rule also applies to mentions of titles in the middles of
sentences, such as if I were to refer to "Alice's Adventures in
Wonderland" as part of this discussion.

ChrisA


Re: .title() - annoying mistake

2021-03-19 Thread Thomas Jollans

On 19/03/2021 20:33, dn via Python-list wrote:
> On 20/03/2021 07.49, Grant Edwards wrote:
>> On 2021-03-19, MRAB wrote:
>>> You want English "man's" to become "Man's", but French "l'homme" to
>>> become "L'Homme". It's language-dependant.
>>
>> In English, certain words are not capitalized in titles unless they're
>> the first word in the title (short articles and prepositions), and
>> .title() doesn't get that right either:
>>
>> >>> "the man in the grey flannel suit".title()
>> 'The Man In The Grey Flannel Suit'
>>
>> should be
>>
>> 'The Man in the Grey Flannel Suit'
>
> To be fair, aren't book-titles* a (formalised) sub-set of the English
> language?


From a quick scan of my (medium-sized) bookshelf, most publishers seem 
to agree that the thing to do is set the title in all caps. Of the few 
(English-language) books I have with any lower-case letters on the spine 
at all, most seem to follow the same general sort of rules for what 
words to capitalize, but ‘Last Chance To See’, capital T, by Douglas 
Adams, is one exception.


Of the others, I noticed that ‘The Life and Times of The
Thunderbolt Kid’ by Bill Bryson has an interesting choice of
capitalization which is perfectly logical but certainly not the only option.






Re: .title() - annoying mistake

2021-03-19 Thread dn via Python-list
On 20/03/2021 06.17, Chris Angelico wrote:
> On Sat, Mar 20, 2021 at 4:01 AM Abdur-Rahmaan Janhangeer
>  wrote:
>>
>> It's about unnecessary capitalisation for a common use case
>> in English.
>>
>> You can see it in action on my site:
>> https://www.compileralchemy.com/#articles
>>
>> see 24.
>>
> 
> If you want something that's designed for English, get something
> that's designed for English. The string method is deliberately
> language-agnostic and simple.


There is an (unintended) 'psychological trap' here - that the
documentation is in English, but that the feature's application is
language-agnostic.


Perhaps the bigger trap is "SODD"! (Stack Overflow-Driven Development -
dn patent pending) Specifically: where do we learn from, how authoritative
(or otherwise) are our sources, and how effective is that
learning-process?

Recently, my grumpy-old-man status rose a notch, when I received a
'Finxter' email-advertisement offering a course in "all" of the Python
built-in functions. The advert claimed that this new offering was the
first time such a course& had been offered. Quite aside from such
teaching being contrary to theories of learning (other than rote), IMHO,
this claim seemed specious. A casual web-search rapidly revealed the
falsehood, by reminding me of a Udemy offering& and TechVidvan's& -
amongst others. NB such evidence making the advertisement illegal in
this jurisdiction (albeit not many others)!

Applying such concerns to our str.title() conversation, a quick-fire
web-search quickly offers W3Schools' entry&, which is typical of the
genre (that I have conflated under the title "SODD").

If you'd care to repeat the experiment (& refs below), you will find
that this web-page (and its ilk) offer a quick way to remind oneself of
the syntax and purpose of the function, as needed. However, its brief
description is insufficient for learning - and totally-inappropriate
when it comes to helping the OP! The 'psychological trap' inherent in
these is the fallacy of apparent sufficiency - that the 'answer' might
be complete, and therefore the impression that there is nothing left to
learn. Whither their value?


Compare that/them with the Python documentation& (per @Paul's post). Not
only do we have the 'quick reminder' capability, but also the warning -
and further a sample work-around. Here we have "authority" and
"completeness" (and an open-source philosophy of enabling improvement).

The 'docs' team work very hard on our behalf. Not only do they deserve
our 'support' as readers, but such is also a more worthwhile investment
in our own learning!


BTW when programming, I will keep the Python Documentation web-page as
an open tab in my web-browser, precisely to facilitate such rapid
look-ups/reminders. Although, these days, editors/IDEs probably satisfy
the majority of such needs. In addition, the DuckDuckGo search engine
offers a "bang lookup" (short-cut) to the Python docs search page, ie
"!py .title()" realises
"https://docs.python.org/3/search.html?q=.title()" and thus gives
pointers to str.title() plus the bytes and turtle methods of the same
name - additionally to str.istitle() which is the inspection 'companion'
to the 'do it' function we've been discussing!
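
A small sketch of that 'companion' relationship, using only the two
built-in string methods named above (the variable names are purely
illustrative). It shows that str.istitle() follows the same lexical rule
as str.title(), so it accepts the output the OP dislikes and rejects the
spelling the OP wanted:

```python
wanted = "Python's Usage"             # what the OP expected
produced = "python's usage".title()   # what str.title() returns

print(produced)             # Python'S Usage
print(produced.istitle())   # True: every letter after a non-letter is uppercase
print(wanted.istitle())     # False: the lowercase "s" follows an apostrophe
```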

As many GPS-users have found to their cost, placing your reliance upon a
tool whose objective is 'convenience', can lead you "down the garden path"&!


& Web.Refs:
https://academy.finxter.com/university/python-built-in-functions-every-python-coder-must-know/
(https://www.udemy.com/course/the-python-built-in-function-tutorial-series/)
(https://techvidvan.com/tutorials/python-built-in-functions/)
https://www.w3schools.com/python/ref_string_title.asp
https://docs.python.org/3.9/library/stdtypes.html#str.title
https://idioms.thefreedictionary.com/lead+down+the+garden+path
-- 
Regards,
=dn


Re: .title() - annoying mistake

2021-03-19 Thread Dan Stromberg
On Fri, Mar 19, 2021 at 11:51 AM Grant Edwards 
wrote:

> On 2021-03-19, MRAB  wrote:
> > On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
> >> Aie sorry,
> >>
> >> Did not know it targeted the non-English speakers.
> >>
> > You want English "man's" to become "Man's", but French "l'homme" to
> > become "L'Homme". It's language-dependant.
>
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
>
> should be
>
> 'The Man in the Grey Flannel Suit'
>

In high school, I was taught that English has multiple capitalization
rulesets to choose among for titles.


Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
At least i'd expect what it pretends to do
even if not following English.

Missing ' is a weird behaviour, i get it
they skipped every non-letter


On Fri, 19 Mar 2021, 22:50 Grant Edwards,  wrote:

>
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
>
> should be
>
> 'The Man in the Grey Flannel Suit'
>


Re: .title() - annoying mistake

2021-03-19 Thread Mats Wichmann

On 3/19/21 12:49 PM, Grant Edwards wrote:
> On 2021-03-19, MRAB wrote:
>> On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
>>> Aie sorry,
>>>
>>> Did not know it targeted the non-English speakers.
>>
>> You want English "man's" to become "Man's", but French "l'homme" to
>> become "L'Homme". It's language-dependant.
>
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
>
> should be
>
> 'The Man in the Grey Flannel Suit'


The problem is that there isn't a standard for title case, which I 
understood to be specifically for English, following the rules that 
Grant mentions ("certain words are not capitalized"), but on looking it 
up it turns out that the various style guides (some of which we don't 
get to mention here without stirring up controversy) each have their own 
interpretations of what it is.  And Python doesn't do any of those: 
Python does what is documented.


So possibly the choice to call it titlecase is the source of the confusion?


Re: .title() - annoying mistake

2021-03-19 Thread dn via Python-list
On 20/03/2021 07.49, Grant Edwards wrote:
> On 2021-03-19, MRAB  wrote:
>> You want English "man's" to become "Man's", but French "l'homme" to 
>> become "L'Homme". It's language-dependant.
> 
> In English, certain words are not capitalized in titles unless they're
> the first word in the title (short articles and prepositions), and
> .title() doesn't get that right either:
>
> >>> "the man in the grey flannel suit".title()
> 'The Man In The Grey Flannel Suit'
> 
> should be
> 
> 'The Man in the Grey Flannel Suit'


To be fair, aren't book-titles* a (formalised) sub-set of the English
language?

https://www.librarianshipstudies.com/2018/12/anglo-american-cataloguing-rules-aacr.html

* plays, movies, ...

See also people's/family-names which have been anglicised or
transliterated...
-- 
Regards,
=dn


Re: .title() - annoying mistake

2021-03-19 Thread Grant Edwards
On 2021-03-19, MRAB  wrote:
> On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:
>> Aie sorry,
>> 
>> Did not know it targeted the non-English speakers.
>> 
> You want English "man's" to become "Man's", but French "l'homme" to 
> become "L'Homme". It's language-dependant.

In English, certain words are not capitalized in titles unless they're
the first word in the title (short articles and prepositions), and
.title() doesn't get that right either:

>>> "the man in the grey flannel suit".title()
'The Man In The Grey Flannel Suit'

should be

'The Man in the Grey Flannel Suit'
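
One way the "small words" convention could be sketched, as an illustration
only: english_title and SMALL_WORDS are hypothetical names, the word list
is an arbitrary assumption rather than any style guide's rule, and the
function splits on whitespace so that apostrophes stay inside words.

```python
# Assumed list of words left lowercase unless they come first;
# real style guides disagree on exactly which words belong here.
SMALL_WORDS = frozenset(
    ["a", "an", "and", "as", "at", "but", "by", "for",
     "in", "nor", "of", "on", "or", "the", "to"]
)


def english_title(s, small_words=SMALL_WORDS):
    """Capitalize each whitespace-separated word, except small words
    that are not the first word. Runs of whitespace collapse to one space."""
    result = []
    for index, word in enumerate(s.split()):
        lowered = word.lower()
        if index > 0 and lowered in small_words:
            result.append(lowered)
        else:
            # Uppercase only the first character; lowercase the rest.
            result.append(lowered[:1].upper() + lowered[1:])
    return " ".join(result)


print(english_title("the man in the grey flannel suit"))
# The Man in the Grey Flannel Suit
```

Some conventions also capitalize the last word, or long prepositions; the
sketch deliberately ignores those variations.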

-- 
Grant Edwards               grant.b.edwards        Yow! I'm definitely not
                                  at               in Omaha!
                              gmail.com



RE: .title() - annoying mistake

2021-03-19 Thread Schachner, Joseph
I agree.  If the documentation notes this issue, and the (possibly new) Python
user has to replace .title() with a different function that uses a regular
expression and a lambda function to work around the issue, then perhaps it's
time for a proposal to address this.  Perhaps there needs to be an optional
argument to .title() which, if supplied, tells it to do the workaround.  Or,
perhaps better, only capitalize the first word and subsequent words that are
preceded by white space.  That should solve "Someone's Apostrophe" and
"Hyphenated-expressions". Someone who looks into this should check if the
second part of a hyphenated expression needs to be capitalized.
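
A minimal sketch of the whitespace-only idea, assuming nothing beyond the
standard library. title_on_whitespace is a hypothetical name; the standard
library's string.capwords() already does much the same thing, except that
it also collapses runs of whitespace.

```python
import re
import string


def title_on_whitespace(s):
    """Capitalize each run of non-whitespace characters, so only
    whitespace (not apostrophes or hyphens) separates words."""
    return re.sub(r"\S+", lambda mo: mo.group(0).capitalize(), s)


print(title_on_whitespace("someone's apostrophe"))     # Someone's Apostrophe
print(title_on_whitespace("hyphenated-expressions"))   # Hyphenated-expressions
print(string.capwords("someone's apostrophe"))         # Someone's Apostrophe
```

Because str.capitalize() only looks at the first character of each run, a
word that begins with punctuation (an opening quote, say) is left without a
capital letter.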

--- Joseph S.   



-----Original Message-----
From: Abdur-Rahmaan Janhangeer  
Sent: Friday, March 19, 2021 11:02 AM
To: Paul Bryan 
Cc: Python 
Subject: Re: .title() - annoying mistake

Thanks very much!

That's annoying. You have to roll your own solution!

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius



Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
On Fri, 19 Mar 2021, 22:07 MRAB,  wrote:

> You want English "man's" to become "Man's", but French "l'homme" to
> become "L'Homme". It's language-dependant.
>

Ah depends on a language (English i guess).

Thanks



Re: .title() - annoying mistake

2021-03-19 Thread MRAB

On 2021-03-19 17:19, Abdur-Rahmaan Janhangeer wrote:

> Aie sorry,
>
> Did not know it targeted the non-English speakers.

You want English "man's" to become "Man's", but French "l'homme" to 
become "L'Homme". It's language-dependant.



Re: .title() - annoying mistake

2021-03-19 Thread Chris Angelico
On Sat, Mar 20, 2021 at 4:46 AM Karen Shaeffer via Python-list
 wrote:
>
>
>
> > On Mar 19, 2021, at 9:42 AM, Grant Edwards  
> > wrote:
> >
> > On 2021-03-19, Skip Montanaro  wrote:
> >>>
> >>> That's annoying. You have to roll your own solution!
> >>>
> >>
> >> Certainly seems like a known issue:
> >>
> >> https://bugs.python.org/issue12737
> >
> > While that is an issue with string.title(), I don't see how it's
> > related to what the OP is reporting. Issue 12737 is about Unicode
> > combining marks.
>
> Hi,
> I’ve been frustrated by my experiences processing unstructured multilingual 
> text with python. I’ve always assumed this was due to my insufficient 
> experience with python (3) text processing. I’ve recently begun coding with 
> Go. (I also continue to code in Python) And Go has exceptionally crisp and 
> clear capacity to process unstructured multilingual utf-8 encoded text.
>
> In just a few days of working with text processing in Go, using the book “The 
> Go Programming Language” by Donovan and Kernighan, along with the Go language 
> specification and other free online help, I have acquired a clear and crisp 
> understanding of how to work effectively with unstructured, multilingual 
> utf-8 encoded text (and emojis) and any unicode code point — even invalid 
> unicode code points.
>
> To see some of these issues first hand, write a palindrome detector that 
> works with any sequence of utf-8 encoded code points, including invalid code 
> points. I’m sure it can be done in python, although I’ve not done it. It’s a 
> trivial exercise in Go.
>
> I’m not bashing Python here. I will continue to code with python. It’s an
> exceptional language and community. Just commenting on my experience.
>

Python doesn't work with UTF-8 encoded code points; it works with
Unicode code points. Are you looking for something that checks whether
something is a palindrome, or locates palindromes within it?

def is_palindrome(txt):
    return txt == txt[::-1]

Easy.

Efficiently finding substring palindromes would be a bit harder, but
that'd be true even if you restricted it to ASCII. The advantage of
Python's way of doing it is that, if you have a method that would work
with ASCII bytes, the exact same thing will work with a Unicode
string.

There's another big wrinkle not touched here, and that's what to do
with combining characters. Python makes it easy to normalize text as
much as is possible, and an NFC normalization would help a lot, but
it's not going to do everything. So you may want to first define a
proper way to split a string into whatever you're defining a character
to be, and that's a very difficult problem, regardless of programming
language. For example, Arabic text changes in visual shape when
letters are next to each other, and Greek has two different forms for
the letter sigma (U+03C2 and U+03C3) - should those distinctions
affect palindromminess? What about ligatures - is U+FB01 "fi" a single
character, or should it be matched by "if" on the other end?
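
To make the combining-character wrinkle concrete, here is a small sketch
using the standard unicodedata module. is_palindrome_nfc is a hypothetical
name, and NFC normalization is only the partial remedy described above: it
helps when a precomposed form exists, and does nothing for the sigma or
ligature questions.

```python
import unicodedata


def is_palindrome_nfc(txt):
    """Compare code points after NFC normalization, so that a base letter
    plus combining mark is treated as one unit where a precomposed
    character exists."""
    normalized = unicodedata.normalize("NFC", txt)
    return normalized == normalized[::-1]


# "été" written with combining acute accents (U+0301) after each "e".
decomposed = "e\u0301te\u0301"

print(decomposed == decomposed[::-1])   # False: reversing detaches the accents
print(is_palindrome_nfc(decomposed))    # True: NFC yields three code points
```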

What part of this is trivial in Go?

ChrisA


Re: .title() - annoying mistake

2021-03-19 Thread Karen Shaeffer via Python-list


> On Mar 19, 2021, at 9:42 AM, Grant Edwards  wrote:
> 
> On 2021-03-19, Skip Montanaro  wrote:
>>> 
>>> That's annoying. You have to roll your own solution!
>>> 
>> 
>> Certainly seems like a known issue:
>> 
>> https://bugs.python.org/issue12737
> 
> While that is an issue with string.title(), I don't see how it's
> related to what the OP is reporting. Issue 12737 is about Unicode
> combining marks.

Hi,
I’ve been frustrated by my experiences processing unstructured multilingual 
text with python. I’ve always assumed this was due to my insufficient 
experience with python (3) text processing. I’ve recently begun coding with Go. 
(I also continue to code in Python) And Go has exceptionally crisp and clear 
capacity to process unstructured multilingual utf-8 encoded text.

In just a few days of working with text processing in Go, using the book “The 
Go Programming Language” by Donovan and Kernighan, along with the Go language 
specification and other free online help, I have acquired a clear and crisp 
understanding of how to work effectively with unstructured, multilingual utf-8 
encoded text (and emojis) and any unicode code point — even invalid unicode 
code points.

To see some of these issues first hand, write a palindrome detector that works 
with any sequence of utf-8 encoded code points, including invalid code points. 
I’m sure it can be done in python, although I’ve not done it. It’s a trivial 
exercise in Go.

I’m not bashing Python here. I will continue to code with python. It’s an
exceptional language and community. Just commenting on my experience.

humbly,
Karen



Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Aie sorry,

Did not know it targeted the non-English speakers.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius


Re: .title() - annoying mistake

2021-03-19 Thread Chris Angelico
On Sat, Mar 20, 2021 at 4:01 AM Abdur-Rahmaan Janhangeer
 wrote:
>
> It's about unnecessary capitalisation for a common use case
> in English.
>
> You can see it in action on my site:
> https://www.compileralchemy.com/#articles
>
> see 24.
>

If you want something that's designed for English, get something
that's designed for English. The string method is deliberately
language-agnostic and simple.

ChrisA


Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
It's about unnecessary capitalisation for a common use case
in English.

You can see it in action on my site:
https://www.compileralchemy.com/#articles

see 24.

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius


Re: .title() - annoying mistake

2021-03-19 Thread Grant Edwards
On 2021-03-19, Skip Montanaro  wrote:
>>
>> That's annoying. You have to roll your own solution!
>>
>
> Certainly seems like a known issue:
>
> https://bugs.python.org/issue12737

While that is an issue with string.title(), I don't see how it's
related to what the OP is reporting. Issue 12737 is about Unicode
combining marks. The OP's problem is related to the apostrophe used to
form a possessive.

-- 
Grant Edwards               grant.b.edwards        Yow! Civilization is fun!
                                  at               Anyway, it keeps me busy!!
                              gmail.com



Re: .title() - annoying mistake

2021-03-19 Thread Skip Montanaro
>
> That's annoying. You have to roll your own solution!
>

Certainly seems like a known issue:

https://bugs.python.org/issue12737

That issue was opened in 2011.

Skip


Re: .title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Thanks very much!

That's annoying. You have to roll your own solution!

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius



Re: .title() - annoying mistake

2021-03-19 Thread Paul Bryan
From https://docs.python.org/3.9/library/stdtypes.html#str.title:

> The algorithm uses a simple language-independent definition of a word
> as groups of consecutive letters. The definition works in many
> contexts but it means that apostrophes in contractions and
> possessives form word boundaries, which may not be the desired result

The link above includes a workaround for apostrophes.
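
For readers without the page open, the workaround there is along these
lines. It is reproduced here as a sketch, so the linked documentation
remains the authoritative version. It treats a run of ASCII letters,
optionally followed by an apostrophe and more letters, as one word:

```python
import re


def titlecase(s):
    # Match a word, optionally followed by an apostrophe and more letters,
    # and capitalize the whole match as one unit.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)


print(titlecase("they're bill's friends."))   # They're Bill's Friends.
print(titlecase("python's usage"))            # Python's Usage
```

Note that the character class only covers ASCII letters, so accented or
non-Latin text needs a different pattern.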

Paul

On Fri, 2021-03-19 at 18:43 +0400, Abdur-Rahmaan Janhangeer wrote:
> Greetings list,
> 
> See this:
> 
> >>> "Python's usage".title()
> "Python'S Usage"
> 
> It should have been Python's Usage
> 
> Why capitalise the S?
> 
> Kind Regards,
> 
> Abdur-Rahmaan Janhangeer
> Mauritius



.title() - annoying mistake

2021-03-19 Thread Abdur-Rahmaan Janhangeer
Greetings list,

See this:

>>> "Python's usage".title()
"Python'S Usage"

It should have been Python's Usage

Why capitalise the S?

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius