Dave,

Your goal is to compare titles, and the list of replacements needed can be 
endless if you allow the text to contain anything but ASCII.

Have you considered stripping things out instead? I mean removing everything 
that is not ASCII in the first place, and perhaps also removing extra 
punctuation like single quotes or question marks and redundant white space, 
and then comparing the skeletons of the two titles?
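
As a rough sketch of that skeleton idea, using only the standard library (the 
function name skeleton is made up for illustration, and what counts as "extra" 
punctuation is an assumption you would tune):

```python
import string

def skeleton(title):
    """Reduce a title to a bare ASCII skeleton for comparison (hypothetical helper)."""
    # Drop every character that is not ASCII.
    ascii_only = title.encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation such as quotes and question marks.
    no_punct = ascii_only.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of white space and trim the ends.
    return " ".join(no_punct.split())

print(skeleton("Don\u2019t  Stop   Believin'?"))  # Dont Stop Believin
```

Note that this throws away accented letters entirely, so "Café" becomes "Caf". 
Whether that is acceptable depends on your titles.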

And even if that fails, could you measure how different the two are and 
tolerate titles that are off by, say, one letter? Then again, "My desert" 
matching "My Dessert" might not be a valid match, with one being a song about 
an arid environment and the other about food you don't need!
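
One common way to measure "off by one letter" is the Levenshtein edit distance. 
The sketch below is a plain textbook version, not a tuned library routine; the 
names edit_distance and close_enough and the default tolerance of 1 are my own 
choices for illustration:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def close_enough(a, b, tolerance=1):
    # Compare case-insensitively and accept a small number of edits.
    return edit_distance(a.casefold(), b.casefold()) <= tolerance

print(close_enough("My desert", "My Dessert"))  # True, which may be a false match
```

As the desert/dessert pair shows, a tolerance of even one letter will accept 
some pairs that a human would call different songs.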

Your seemingly simple need can expand into a fairly complex project. There may 
be many ideas on how to deal with it, but none perfect enough to catch all 
cases, as even a trained human has to make judgment calls at times and will 
not always match what other humans do. We have examples like the TV show 
"NUMB3RS", which used a perfectly valid digit 3 to stand for an "E" yet is 
often written as NUMBERS when I look it up. You have obvious cases where 
titles of songs contain ligatures like "œ", which will not compare equal to a 
title where it is written out as "oe". So comparing is quite complex, and the 
best you might do is a heuristic.
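
For ligatures like that, one heuristic is a hand-made translation table. The 
entries below are only examples I picked, not a complete list, and the name 
expand_ligatures is hypothetical:

```python
# Hypothetical, hand-picked table; extend it as new cases turn up in your data.
LIGATURES = str.maketrans({"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"})

def expand_ligatures(title):
    # str.translate returns a new string with each listed character expanded.
    return title.translate(LIGATURES)

print(expand_ligatures("Sacré Cœur"))  # Sacré Coeur
```

A table like this cannot safely handle NUMB3RS, since a 3 is usually just a 3, 
so cases like that need their own rule or a manual exception list.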

Unicode has many symbols that are almost the same, or that look identical in 
one font but not in another. There are libraries of functions that allow some 
kinds of comparisons or conversions that you could look into, but the gain for 
you may not be worth it. Nothing stops a person from naming a song any way 
they want. I speak many languages and often see a song re-titled in the local 
language, using the local alphabet, often mixed with another.
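
One such library is the standard unicodedata module. A minimal sketch, assuming 
you are willing to lose accents altogether (strip_accents is a name I made up):

```python
import unicodedata

def strip_accents(text):
    # NFKD splits an accented letter into a base letter plus combining marks,
    # and replaces compatibility characters (such as the single-character
    # "fi" ligature U+FB01) with their plain equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Caf\u00e9"))    # Cafe
print(strip_accents("\ufb01nale"))   # finale
```

Normalization does not help with everything: "œ", for example, has no Unicode 
decomposition and comes through unchanged, and letters from other alphabets 
are left as they are.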

Your original question is perhaps now many questions, depending on what you 
choose. You started by wanting to know how to compare, and it is moving on to 
how to delete parts, make substitutions or use regular expressions, and it can 
get worse. You can, for example, take a string, identify the words within it, 
and create a regular expression that allows, between the words, any run of 
zero or more non-word characters such as spaces, tabs, punctuation or 
non-ASCII, so that song titles with the same words in the same sequence match 
no matter what is between them. The possibilities are endless, but consider 
some of the techniques used by programs that parse text and suggest alternate 
spellings, or even programs like Google Translate that can take a sentence and 
then suggest you may mean a slightly altered sentence with one word changed to 
fit better.
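
The regular-expression idea above could be sketched as follows. I am assuming 
that a "word" means a run of ASCII letters and digits and that everything else 
counts as filler; words_pattern is a hypothetical helper name:

```python
import re

def words_pattern(title):
    # Take runs of ASCII letters and digits as the words of the title.
    words = re.findall(r"[A-Za-z0-9]+", title)
    # Between words, allow any run (possibly empty) of other characters,
    # which covers spaces, tabs, punctuation and anything non-ASCII.
    gap = r"[^A-Za-z0-9]*"
    body = gap.join(re.escape(word) for word in words)
    return re.compile(gap + body + gap, re.IGNORECASE)

pattern = words_pattern("Don't Stop Believin'")
print(bool(pattern.fullmatch("Don\u2019t stop -- believin")))  # True
print(bool(pattern.fullmatch("Don't Stop Me Now")))            # False
```

Because the gap may be empty, "Dont" also matches "Don't" here, which may or 
may not be what you want.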

You need to decide what you want to deal with and what will be mis-classified 
by your program. Some of us have suggested folding the case of the words, but 
that means a song about a dark skinned person in Poland called "Black Polish" 
would match a song about keeping your shoes dark called "black polish". So I 
keep repeating that it is very hard, or frankly impossible, to catch every 
case I can imagine, and the many I can't!
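
For reference, case folding in Python is the str.casefold() method. A small 
sketch (same_ignoring_case is just an illustrative name) showing both the 
benefit and the false match described above:

```python
def same_ignoring_case(a, b):
    # casefold() is a more aggressive lower() intended for caseless matching.
    return a.casefold() == b.casefold()

print("Black Polish" == "black polish")                    # False
print(same_ignoring_case("Black Polish", "black polish"))  # True
print(same_ignoring_case("Straße", "STRASSE"))             # True
```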

But the emphasis here is not your overall problem. It is about whether and how 
the computer language called Python, and perhaps some add-on modules, can be 
used to solve each smaller need, such as recognizing a pattern or replacing 
text. It can do quite a bit, but only when the specification of the problem is 
exact.
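
On the narrow question of replacing text, here is a sketch of something like 
the normalizeCharacters method you describe below. The table entries and the 
names REPLACEMENTS and normalize_characters are my own examples. The call in 
your message fails because str.replace() takes the old and new text as its 
first two arguments, so the third argument, 'a', lands in the optional count 
slot, which must be an integer.

```python
# Hypothetical replacement table; the entries are only examples.
REPLACEMENTS = {
    "\u2019": "'",   # right single quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

def normalize_characters(text):
    # str.replace(old, new) returns a new string, so rebind the name each time.
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text

my_string = "Hello"
my_new_string = my_string.replace("e", "a")
print(my_new_string)                      # Hallo
print(my_string)                          # Hello (the original is unchanged)
print(normalize_characters("Don\u2019t"))  # Don't
```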




-----Original Message-----
From: Dave <d...@looktowindward.com>
To: python-list@python.org
Sent: Wed, Jun 8, 2022 5:09 am
Subject: Re: How to replace characters in a string?

Hi,

Thanks for this! 

So, is there a copy function/method that returns a MutableString like in 
Objective-C? I’ve solved this problem before in a number of languages like 
Objective-C and AppleScript.

Basically there is a set of common characters that need “normalizing” and I 
have a method that replaces them in a string, so:

myString = [myString normalizeCharacters];

Would return a new string with all the “common” replacements applied.

Since the following gives an error :

myString = 'Hello'
myNewstring = myString.replace(myString,'e','a')

TypeError: 'str' object cannot be interpreted as an integer

I can’t see of a way to do this in Python? 

All the Best
Dave


> On 8 Jun 2022, at 10:14, Chris Angelico <ros...@gmail.com> wrote:
> 
> On Wed, 8 Jun 2022 at 18:12, Dave <d...@looktowindward.com> wrote:
> 
>> I tried the but it doesn’t seem to work?
>> myCompareFile1 = ascii(myTitleName)
>> myCompareFile1.replace("\u2019", "'")
> 
> Strings in Python are immutable. When you call ascii(), you get back a
> new string, but it's one that has actual backslashes and such in it.
> (You probably don't need this step, other than for debugging; check
> the string by printing out the ASCII version of it, but stick to the
> original for actual processing.) The same is true of the replace()
> method; it doesn't change the string, it returns a new string.
> 
>>>> word = "spam"
>>>> print(word.replace("sp", "h"))
> ham
>>>> print(word)
> spam
> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list
