Re: How to test characters of a string

2022-06-08 Thread Dave
Hi,

I’ve found you also need to take care of multiple-disk CD releases. These have 
a format of

“1-01 Track Name”
“2-02 Track Name”

Meaning Disk 1, Track 1; Disk 2, Track 2.

Also A and B Sides (from Vinyl LPs)

“A1-Track Name”
“B2-Track Name”

Side A, Track 1, etc.

Cheers
Dave
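
A minimal sketch of how the disk/track and vinyl-side prefixes described above might be split from the title. The pattern and the `split_prefix` name are illustrative assumptions, not code from the thread, and the pattern only covers the two formats quoted above.

```python
import re

# Hypothetical pattern: either "<disk>-<2-digit track> " or "<side><track>-".
PREFIX = re.compile(
    r"""^(?:
        (?P<disk>\d+)-(?P<track>\d{2})\s+       # e.g. 1-01 Track Name
      | (?P<side>[A-Z])(?P<side_track>\d+)-     # e.g. A1-Track Name
    )""",
    re.VERBOSE,
)

def split_prefix(name):
    """Return (prefix_info, title); prefix_info is None when nothing matched."""
    m = PREFIX.match(name)
    if m is None:
        return None, name
    # Keep only the groups that took part in the match.
    info = {k: v for k, v in m.groupdict().items() if v is not None}
    return info, name[m.end():]
```

A plain title such as "44 Minutes" is left alone because it has no hyphen after the leading number.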


> On 8 Jun 2022, at 19:36, Dennis Lee Bieber  wrote:
> 
> On Wed, 8 Jun 2022 01:53:26 +0000 (UTC), Avi Gross 
> declaimed the following:
> 
> 
>> 
>> So is it necessary to insist on an exact pattern of two digits followed by a 
>> space? 
>> 
>> 
>> That would fail on "44 Minutes", "40 Oz. Dream", "50 Mission Cap", "50 Ways 
>> to Say Goodbye", "99 Ways to Die" 
>> 
>> It looks to me like you need to compare TWICE just in case. If it matches in 
>> the original (perhaps with some normalization of case and whitespace), fine. 
>> If not will they match if one or both have something to remove as a prefix 
>> such as "02 ". And if you are comparing items where the same song is in two 
>> different numeric sequences on different disks, ...
> 
>   I suspect the OP really needs to extract the /track number/ from the
> ID3 information, and (converting to a 2-digit formatted string) see if the
> file name begins with that track number... The format of those
> filenames appears to be that generated by some software when ripping CDs to
> MP3s -- for example:
> 
> -=-=-
> c:\Music\Roger Miller\All Time Greatest Hits>dir
> Volume in drive C is OS
> Volume Serial Number is 4ACC-3CB4
> 
> Directory of c:\Music\Roger Miller\All Time Greatest Hits
> 
> 04/11/2022  05:06 PM    <DIR>          .
> 04/11/2022  05:06 PM    <DIR>          ..
> 07/26/2018  11:20 AM 4,493,279 01 Dang Me.mp3
> 07/26/2018  11:20 AM 5,072,414 02 Chug-A-Lug.mp3
> 07/26/2018  11:20 AM 4,275,844 03 Do-Wacka-Do.mp3
> 07/26/2018  11:20 AM 4,284,208 04 In the Summertime.mp3
> 07/26/2018  11:20 AM 6,028,730 05 King of the Road.mp3
> 07/26/2018  11:20 AM 4,662,182 06 You Can't Roller Skate in a Buffalo Herd.mp3
> 07/26/2018  11:20 AM 5,624,704 07 Engine, Engine #9.mp3
> 07/26/2018  11:20 AM 5,002,492 08 One Dyin' and a Buryin'.mp3
> 07/26/2018  11:21 AM 6,799,224 09 Last Word in Lonesome Is Me.mp3
> 07/26/2018  11:21 AM 5,637,230 10 Kansas City Star.mp3
> 07/26/2018  11:21 AM 4,656,910 11 England Swings.mp3
> 07/26/2018  11:21 AM 5,836,638 12 Husbands and Wives.mp3
> 07/26/2018  11:21 AM 5,470,216 13 I've Been a Long Time Leavin'.mp3
> 07/26/2018  11:21 AM 6,230,236 14 Walkin' in the Sunshine.mp3
> 07/26/2018  11:21 AM 6,416,060 15 Little Green Apples.mp3
> 07/26/2018  11:21 AM 9,794,442 16 Me and Bobby McGee.mp3
> 07/26/2018  11:22 AM 7,330,642 17 Where Have All the Average People Gone.mp3
> 07/26/2018  11:22 AM 7,334,752 18 South.mp3
> 07/26/2018  11:22 AM 6,981,924 19 Tomorrow Night in Baltimore.mp3
> 07/26/2018  11:22 AM 9,353,872 20 River in the Rain.mp3
>  20 File(s)    121,285,999 bytes
>   2 Dir(s)  295,427,198,976 bytes free
> 
> c:\Music\Roger Miller\All Time Greatest Hits>
> -=-=-
> 
>   Untested (especially the ID3 "variable" -- substitute variables as
> needed to match the original code):
> 
> >>> id3Track = 2
> >>> track_number = "%2.2d " % id3Track
> >>> track_number
> '02 '
> >>> filename = "02 This is the life.mp3"
> >>> if filename.startswith(track_number):
> ...     nametitle = filename[3:]
> ... else:
> ...     nametitle = filename
> ...
> >>> if nametitle.endswith(".mp3"):
> ...     nametitle = nametitle[:-4]
> ...
> >>> nametitle
> 'This is the life'
> 
>   Handling ASCII ' and " vs Unicode "smart" quotes is a different matter.
> 
>   One may still run the risk of having a filename without a track number
> BUT having a number that just manages to match the track number. To account
> for that I'd suggest using the sequence:
> 
> * Strip extension (if filename.lower().endswith(".mp3"): ...)
> * Handle any Unicode/ASCII quotes in both filename AND ID3 track title
> * Compare filename and title.
> *     IF MATCHED -- done
> *     IF NOT MATCHED
> *         Format ID3 track number as shown above
> *         Compare filename to (formatted track number + track title)
> *             IF MATCHED -- done
> *             IF NOT MATCHED
> *                 Log full filename and ID3 track title/track number
>                   to a log for later examination.
> 
> 
> 
> -- 
>   Wulfraed Dennis Lee Bieber AF6VN
>   wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


PYLAUNCH_DEBUG not printing info

2022-06-08 Thread Richard David
Why am I not getting debug output on my Windows 10 machine?

C:\temp>\Windows\py.exe -0
 -V:3.11 *Python 3.11 (64-bit)
 -V:3.10  Python 3.10 (64-bit)

C:\temp>set PYLAUNCH_DEBUG=1

C:\temp>\Windows\py.exe
Python 3.11.0b3 (main, Jun  1 2022, 13:29:14) [MSC v.1932 64 bit (AMD64)] on 
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z



Re: How to replace characters in a string?

2022-06-08 Thread Greg Ewing

On 9/06/22 5:55 am, Dennis Lee Bieber wrote:


There are no mutable strings in Python.


If you really want a mutable sequence of characters, you can
use array.array, but you won't be able to use it directly in
place of a string in most contexts.

--
Greg
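
A short sketch of the array.array idea, for illustration only. It assumes the 'w' type code on Python 3.13 and later and falls back to the older, deprecated 'u' type code elsewhere; the variable names are made up for the example.

```python
import sys
from array import array

# 'w' exists from Python 3.13; earlier versions only offer the deprecated 'u'.
typecode = "w" if sys.version_info >= (3, 13) else "u"

chars = array(typecode, "Hello")   # mutable sequence of characters
chars[1] = "a"                     # in-place change, which str does not allow
chars.fromunicode(" there")        # append more characters

result = chars.tounicode()         # back to an ordinary, immutable str
print(result)
```

As the post says, the array itself cannot stand in for a str in most calls, so the conversion back with tounicode() is usually needed at the end.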



Re: How to replace characters in a string?

2022-06-08 Thread Greg Ewing

On 8/06/22 10:26 pm, Jon Ribbens wrote:

Here's a head-start on some characters you might want to translate,


Another possibility that might make sense in this case is to simply
strip out all punctuation before comparing. That would take care of
things being spelled with or without hyphens, commas, etc.

--
Greg
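
One way the punctuation-stripping idea might look, as a sketch rather than a recommendation. It assumes that "punctuation" means every character in a Unicode P* category, which covers both ASCII and "smart" quotes; the function names are hypothetical.

```python
import unicodedata

def strip_punctuation(text):
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )

def loosely_equal(a, b):
    """Compare two titles ignoring punctuation, case, and runs of whitespace."""
    def key(s):
        return " ".join(strip_punctuation(s).casefold().split())
    return key(a) == key(b)
```

Note that removing a hyphen joins the words on either side, so "Do-Wacka-Do" would match "DoWackaDo" but not "Do Wacka Do"; replacing punctuation with a space instead is the other obvious choice, with its own trade-offs for apostrophes.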


Re: How to replace characters in a string?

2022-06-08 Thread Dennis Lee Bieber
On Wed, 8 Jun 2022 11:09:05 +0200, Dave  declaimed
the following:

>Hi,
>
>Thanks for this! 
>
>So, is there a copy function/method that returns a MutableString like in 
>Objective-C? I’ve solved this problem before in a number of languages like 
>Objective-C and AppleScript.

There are no mutable strings in Python. Any operation manipulating a
string RETURNS A MODIFIED NEW STRING.

>myString = 'Hello'
>myNewstring = myString.replace(myString,'e','a’)
>

Please study the library reference manual -- it should be clear what
the various string methods can perform. (Hint: they are "methods", which
means whatever is before the . becomes the automatic "self" argument inside
the method.)

https://docs.python.org/3/library/stdtypes.html#string-methods

"""
str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the first
count occurrences are replaced.
"""

myNewstring = myString.replace("e", "a")

However... Please study
"""
static str.maketrans(x[, y[, z]])

This static method returns a translation table usable for
str.translate().

If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters (strings of length 1) to Unicode
ordinals, strings (of arbitrary lengths) or None. Character keys will then
be converted to ordinals.

If there are two arguments, they must be strings of equal length, and
in the resulting dictionary, each character in x will be mapped to the
character at the same position in y. If there is a third argument, it must
be a string, whose characters will be mapped to None in the result.
"""
"""
str.translate(table)

Return a copy of the string in which each character has been mapped
through the given translation table. The table must be an object that
implements indexing via __getitem__(), typically a mapping or sequence.
When indexed by a Unicode ordinal (an integer), the table object can do any
of the following: return a Unicode ordinal or a string, to map the
character to one or more other characters; return None, to delete the
character from the return string; or raise a LookupError exception, to map
the character to itself.

You can use str.maketrans() to create a translation map from
character-to-character mappings in different formats.

See also the codecs module for a more flexible approach to custom
character mappings.
"""

Hmmm, I'm out-of-date... I'm on v3.8 and .removeprefix() and
.removesuffix() (from v3.9) simplify my previous post... Instead of

if myString.lower().endswith(".mp3"): #lower() is a precaution for case
    myString = myString[:-4]

just use
myString = myString.lower().removesuffix(".mp3")
{note, you'll have to make the compare using .lower() on the other name
since this statement returns a lowercased version}
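
If the original letter case of the name should be kept, one alternative is to test the extension separately from the rest of the name. This is a sketch using os.path.splitext, which also works before Python 3.9; `strip_mp3` is a hypothetical helper name.

```python
import os.path

def strip_mp3(filename):
    """Remove a trailing .mp3 in any letter case, keeping the rest unchanged."""
    stem, ext = os.path.splitext(filename)
    return stem if ext.lower() == ".mp3" else filename
```

Names without the extension, or with a different one, are returned untouched.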


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


Re: How to test characters of a string

2022-06-08 Thread De ongekruisigde
On 2022-06-08, 2qdxy4rzwzuui...@potatochowder.com 
<2qdxy4rzwzuui...@potatochowder.com> wrote:
> On 2022-06-09 at 04:15:46 +1000,
> Chris Angelico  wrote:
>
>> On Thu, 9 Jun 2022 at 04:14, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>> >
>> > On 2022-06-09 at 03:18:56 +1000,
>> > Chris Angelico  wrote:
>> >
>> > > On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>> > > >
>> > > > On 2022-06-08 at 08:07:40 -,
>> > > > De ongekruisigde  wrote:
>> > > >
>> > > > > Depending on the problem a regular expression may be the much simpler
>> > > > > solution. I love them for e.g. text parsing and use them all the 
>> > > > > time.
>> > > > > Unrivaled when e.g. parts of text have to be extracted, e.g. from 
>> > > > > lines
>> > > > > like these:
>> > > > >
>> > > > >   root:x:0:0:System 
>> > > > > administrator:/root:/run/current-system/sw/bin/bash
>> > > > >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
>> > > > >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
>> > > > >   avahi:x:997:996:avahi-daemon privilege separation 
>> > > > > user:/var/empty:/run/current-system/sw/bin/nologin
>> > > > >   sshd:x:998:993:SSH privilege separation 
>> > > > > user:/var/empty:/run/current-system/sw/bin/nologin
>> > > > >   geoclue:x:999:998:Geoinformation 
>> > > > > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
>> > > > >
>> > > > > Compare a regexp solution like this:
>> > > > >
>> > > > >   >>> g = 
>> > > > > re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
>> > > > >   >>> print(g.groups())
>> > > > >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
>> > > > > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
>> > > > >
>> > > > > to the code one would require to process it manually, with all the 
>> > > > > edge
>> > > > > cases. The regexp surely reads much simpler (?).
>> > > >
>> > > > Uh...
>> > > >
>> > > > >>> import pwd # https://docs.python.org/3/library/pwd.html
>> > > > >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
>> > > > [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
>> > > > pw_gid=992, pw_gecos='Geoinformation service', 
>> > > > pw_dir='/var/lib/geoclue', pw_shell='/sbin/nologin')]
>> > >
>> > > That's great if the lines are specifically coming from your system's
>> > > own /etc/passwd, but not so much if you're trying to compare passwd
>> > > files from different systems, where you simply have the files
>> > > themselves.
>> >
>> > In addition to pwent to get specific entries from the local password
>> > database, POSIX has fpwent to get a specific entry from a stream that
>> > looks like /etc/passwd.  So even POSIX agrees that if you think you have
>> > to process this data manually, you're doing it wrong.  Python exposes
>> > neither function directly (at least not in the pwd module or the os
>> > module; I didn't dig around or check PyPI).
>> 
>> So.. we can go find some other way of calling fpwent, or we can
>> just parse the file ourselves. It's a very VERY simple format.
>
> If you insist:
>
> >>> s = 
> 'nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin'
> >>> print(s.split(':'))
> ['nm-iodine', 'x', '996', '57', '', '/var/empty', 
> '/run/current-system/sw/bin/nologin']
>
> Hesitantly, because this is the Python mailing list, I claim (a) ':' is
> simpler than r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$', and
> (b) string.split covers pretty much the same edge cases as re.search.

Ah, but you don't check that fields 2 and 3 (0-based) are numeric! But
agreed, it's not the best of examples.
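
For what it is worth, the numeric check can be added on top of split() without much effort. A sketch, assuming the seven-field layout shown in the examples; `PasswdEntry` and `parse_passwd_line` are hypothetical names.

```python
from typing import NamedTuple

class PasswdEntry(NamedTuple):
    name: str
    passwd: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str

def parse_passwd_line(line):
    """Split one passwd-style line and check that uid and gid are numeric."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, passwd, uid, gid, gecos, home, shell = fields
    # isdecimal() is False for empty strings and for anything int() rejects.
    if not (uid.isdecimal() and gid.isdecimal()):
        raise ValueError(f"non-numeric uid/gid in {line!r}")
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)
```

The result has labeled fields, so callers can write entry.uid instead of indexing by position.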


-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA


Re: How to test characters of a string

2022-06-08 Thread Dennis Lee Bieber
On Wed, 8 Jun 2022 01:53:26 +0000 (UTC), Avi Gross 
declaimed the following:


>
>So is it necessary to insist on an exact pattern of two digits followed by a 
>space? 
>
>
>That would fail on "44 Minutes", "40 Oz. Dream", "50 Mission Cap", "50 Ways to 
>Say Goodbye", "99 Ways to Die" 
>
>It looks to me like you need to compare TWICE just in case. If it matches in 
>the original (perhaps with some normalization of case and whitespace), fine. If 
>not will they match if one or both have something to remove as a prefix such 
>as "02 ". And if you are comparing items where the same song is in two 
>different numeric sequences on different disks, ...

I suspect the OP really needs to extract the /track number/ from the
ID3 information, and (converting to a 2-digit formatted string) see if the
file name begins with that track number... The format of those
filenames appears to be that generated by some software when ripping CDs to
MP3s -- for example:

-=-=-
c:\Music\Roger Miller\All Time Greatest Hits>dir
 Volume in drive C is OS
 Volume Serial Number is 4ACC-3CB4

 Directory of c:\Music\Roger Miller\All Time Greatest Hits

04/11/2022  05:06 PM    <DIR>          .
04/11/2022  05:06 PM    <DIR>          ..
07/26/2018  11:20 AM 4,493,279 01 Dang Me.mp3
07/26/2018  11:20 AM 5,072,414 02 Chug-A-Lug.mp3
07/26/2018  11:20 AM 4,275,844 03 Do-Wacka-Do.mp3
07/26/2018  11:20 AM 4,284,208 04 In the Summertime.mp3
07/26/2018  11:20 AM 6,028,730 05 King of the Road.mp3
07/26/2018  11:20 AM 4,662,182 06 You Can't Roller Skate in a Buffalo Herd.mp3
07/26/2018  11:20 AM 5,624,704 07 Engine, Engine #9.mp3
07/26/2018  11:20 AM 5,002,492 08 One Dyin' and a Buryin'.mp3
07/26/2018  11:21 AM 6,799,224 09 Last Word in Lonesome Is Me.mp3
07/26/2018  11:21 AM 5,637,230 10 Kansas City Star.mp3
07/26/2018  11:21 AM 4,656,910 11 England Swings.mp3
07/26/2018  11:21 AM 5,836,638 12 Husbands and Wives.mp3
07/26/2018  11:21 AM 5,470,216 13 I've Been a Long Time Leavin'.mp3
07/26/2018  11:21 AM 6,230,236 14 Walkin' in the Sunshine.mp3
07/26/2018  11:21 AM 6,416,060 15 Little Green Apples.mp3
07/26/2018  11:21 AM 9,794,442 16 Me and Bobby McGee.mp3
07/26/2018  11:22 AM 7,330,642 17 Where Have All the Average People Gone.mp3
07/26/2018  11:22 AM 7,334,752 18 South.mp3
07/26/2018  11:22 AM 6,981,924 19 Tomorrow Night in Baltimore.mp3
07/26/2018  11:22 AM 9,353,872 20 River in the Rain.mp3
  20 File(s)    121,285,999 bytes
   2 Dir(s)  295,427,198,976 bytes free

c:\Music\Roger Miller\All Time Greatest Hits>
-=-=-

Untested (especially the ID3 "variable" -- substitute variables as
needed to match the original code):

>>> id3Track = 2
>>> track_number = "%2.2d " % id3Track
>>> track_number
'02 '
>>> filename = "02 This is the life.mp3"
>>> if filename.startswith(track_number):
...     nametitle = filename[3:]
... else:
...     nametitle = filename
...
>>> if nametitle.endswith(".mp3"):
...     nametitle = nametitle[:-4]
...
>>> nametitle
'This is the life'

Handling ASCII ' and " vs Unicode "smart" quotes is a different matter.

One may still run the risk of having a filename without a track number
BUT having a number that just manages to match the track number. To account
for that I'd suggest using the sequence:

*   Strip extension (if filename.lower().endswith(".mp3"): ...)
*   Handle any Unicode/ASCII quotes in both filename AND ID3 track title
*   Compare filename and title.
*       IF MATCHED -- done
*       IF NOT MATCHED
*           Format ID3 track number as shown above
*           Compare filename to (formatted track number + track title)
*               IF MATCHED -- done
*               IF NOT MATCHED
*                   Log full filename and ID3 track title/track number
                    to a log for later examination.
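
The sequence above could be sketched roughly as follows. Like the snippet earlier in this post it is only an outline: the function names, the logger name and the small quote table are assumptions made for the example, and the ID3 title and track number are taken as already extracted.

```python
import logging
import os.path

log = logging.getLogger("trackcheck")   # hypothetical logger name

# Sample table only: maps "smart" quotes to their ASCII counterparts.
QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})

def normalise(text):
    return text.translate(QUOTES)

def filename_matches(filename, id3_title, id3_track):
    """Return True when the file name fits the ID3 title, with or without track prefix."""
    stem, ext = os.path.splitext(filename)
    if ext.lower() != ".mp3":
        stem = filename                       # no .mp3 extension to strip
    stem = normalise(stem)
    title = normalise(id3_title)

    if stem == title:                         # first compare: bare title
        return True
    if stem == ("%2.2d " % id3_track) + title:   # second compare: "02 " + title
        return True
    log.warning("no match: file=%r title=%r track=%r", filename, id3_title, id3_track)
    return False
```

Comparing the bare title first is what keeps a song such as "44 Minutes" from losing its leading number.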



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


Re: How to test characters of a string

2022-06-08 Thread De ongekruisigde
On 2022-06-08, 2qdxy4rzwzuui...@potatochowder.com 
<2qdxy4rzwzuui...@potatochowder.com> wrote:
> On 2022-06-08 at 08:07:40 -,
> De ongekruisigde  wrote:
>
>> Depending on the problem a regular expression may be the much simpler
>> solution. I love them for e.g. text parsing and use them all the time.
>> Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
>> like these:
>> 
>>   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
>>   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
>>   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
>>   avahi:x:997:996:avahi-daemon privilege separation 
>> user:/var/empty:/run/current-system/sw/bin/nologin
>>   sshd:x:998:993:SSH privilege separation 
>> user:/var/empty:/run/current-system/sw/bin/nologin
>>   geoclue:x:999:998:Geoinformation 
>> service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
>> 
>> Compare a regexp solution like this:
>> 
>>   >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
>>   >>> print(g.groups())
>>   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
>> '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
>> 
>> to the code one would require to process it manually, with all the edge
>> cases. The regexp surely reads much simpler (?).
>
> Uh...
>
> >>> import pwd # https://docs.python.org/3/library/pwd.html
> >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
> pw_shell='/sbin/nologin')]

Yeah... Well, it was just an example and it must be clear by now I'm not
a Python programmer.

-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA


Re: How to test characters of a string

2022-06-08 Thread 2QdxY4RzWzUUiLuE
On 2022-06-09 at 04:15:46 +1000,
Chris Angelico  wrote:

> On Thu, 9 Jun 2022 at 04:14, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> >
> > On 2022-06-09 at 03:18:56 +1000,
> > Chris Angelico  wrote:
> >
> > > On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> > > >
> > > > On 2022-06-08 at 08:07:40 -,
> > > > De ongekruisigde  wrote:
> > > >
> > > > > Depending on the problem a regular expression may be the much simpler
> > > > > solution. I love them for e.g. text parsing and use them all the time.
> > > > > Unrivaled when e.g. parts of text have to be extracted, e.g. from 
> > > > > lines
> > > > > like these:
> > > > >
> > > > >   root:x:0:0:System 
> > > > > administrator:/root:/run/current-system/sw/bin/bash
> > > > >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> > > > >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> > > > >   avahi:x:997:996:avahi-daemon privilege separation 
> > > > > user:/var/empty:/run/current-system/sw/bin/nologin
> > > > >   sshd:x:998:993:SSH privilege separation 
> > > > > user:/var/empty:/run/current-system/sw/bin/nologin
> > > > >   geoclue:x:999:998:Geoinformation 
> > > > > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> > > > >
> > > > > Compare a regexp solution like this:
> > > > >
> > > > >   >>> g = 
> > > > > re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
> > > > >   >>> print(g.groups())
> > > > >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> > > > > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> > > > >
> > > > > to the code one would require to process it manually, with all the 
> > > > > edge
> > > > > cases. The regexp surely reads much simpler (?).
> > > >
> > > > Uh...
> > > >
> > > > >>> import pwd # https://docs.python.org/3/library/pwd.html
> > > > >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> > > > [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> > > > pw_gid=992, pw_gecos='Geoinformation service', 
> > > > pw_dir='/var/lib/geoclue', pw_shell='/sbin/nologin')]
> > >
> > > That's great if the lines are specifically coming from your system's
> > > own /etc/passwd, but not so much if you're trying to compare passwd
> > > files from different systems, where you simply have the files
> > > themselves.
> >
> > In addition to pwent to get specific entries from the local password
> > database, POSIX has fpwent to get a specific entry from a stream that
> > looks like /etc/passwd.  So even POSIX agrees that if you think you have
> > to process this data manually, you're doing it wrong.  Python exposes
> > neither function directly (at least not in the pwd module or the os
> > module; I didn't dig around or check PyPI).
> 
> So.. we can go find some other way of calling fpwent, or we can
> just parse the file ourselves. It's a very VERY simple format.

If you insist:

>>> s = 'nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin'
>>> print(s.split(':'))
['nm-iodine', 'x', '996', '57', '', '/var/empty', 
'/run/current-system/sw/bin/nologin']

Hesitantly, because this is the Python mailing list, I claim (a) ':' is
simpler than r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$', and
(b) string.split covers pretty much the same edge cases as re.search.


Re: How to test characters of a string

2022-06-08 Thread Chris Angelico
On Thu, 9 Jun 2022 at 04:14, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-06-09 at 03:18:56 +1000,
> Chris Angelico  wrote:
>
> > On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> > >
> > > On 2022-06-08 at 08:07:40 -,
> > > De ongekruisigde  wrote:
> > >
> > > > Depending on the problem a regular expression may be the much simpler
> > > > solution. I love them for e.g. text parsing and use them all the time.
> > > > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > > > like these:
> > > >
> > > >   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> > > >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> > > >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> > > >   avahi:x:997:996:avahi-daemon privilege separation 
> > > > user:/var/empty:/run/current-system/sw/bin/nologin
> > > >   sshd:x:998:993:SSH privilege separation 
> > > > user:/var/empty:/run/current-system/sw/bin/nologin
> > > >   geoclue:x:999:998:Geoinformation 
> > > > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> > > >
> > > > Compare a regexp solution like this:
> > > >
> > > >   >>> g = 
> > > > re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
> > > >   >>> print(g.groups())
> > > >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> > > > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> > > >
> > > > to the code one would require to process it manually, with all the edge
> > > > cases. The regexp surely reads much simpler (?).
> > >
> > > Uh...
> > >
> > > >>> import pwd # https://docs.python.org/3/library/pwd.html
> > > >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> > > [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> > > pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
> > > pw_shell='/sbin/nologin')]
> >
> > That's great if the lines are specifically coming from your system's
> > own /etc/passwd, but not so much if you're trying to compare passwd
> > files from different systems, where you simply have the files
> > themselves.
>
> In addition to pwent to get specific entries from the local password
> database, POSIX has fpwent to get a specific entry from a stream that
> looks like /etc/passwd.  So even POSIX agrees that if you think you have
> to process this data manually, you're doing it wrong.  Python exposes
> neither function directly (at least not in the pwd module or the os
> module; I didn't dig around or check PyPI).

So.. we can go find some other way of calling fpwent, or we can
just parse the file ourselves. It's a very VERY simple format.

ChrisA
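
A sketch of the "parse it ourselves" route for the case of comparing passwd files taken from different systems. It assumes the file contents are already in memory as text; `load_passwd` and `diff_passwd` are hypothetical names, and no validation of the fields is attempted.

```python
def load_passwd(text):
    """Map user name -> list of fields for passwd-format text."""
    entries = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split(":")
        entries[fields[0]] = fields
    return entries

def diff_passwd(text_a, text_b):
    """Return the names whose entries differ or exist on one side only."""
    a, b = load_passwd(text_a), load_passwd(text_b)
    return sorted(
        name for name in a.keys() | b.keys() if a.get(name) != b.get(name)
    )
```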


Re: How to test characters of a string

2022-06-08 Thread 2QdxY4RzWzUUiLuE
On 2022-06-09 at 03:18:56 +1000,
Chris Angelico  wrote:

> On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> >
> > On 2022-06-08 at 08:07:40 -,
> > De ongekruisigde  wrote:
> >
> > > Depending on the problem a regular expression may be the much simpler
> > > solution. I love them for e.g. text parsing and use them all the time.
> > > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > > like these:
> > >
> > >   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> > >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> > >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> > >   avahi:x:997:996:avahi-daemon privilege separation 
> > > user:/var/empty:/run/current-system/sw/bin/nologin
> > >   sshd:x:998:993:SSH privilege separation 
> > > user:/var/empty:/run/current-system/sw/bin/nologin
> > >   geoclue:x:999:998:Geoinformation 
> > > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> > >
> > > Compare a regexp solution like this:
> > >
> > >   >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' 
> > > , s)
> > >   >>> print(g.groups())
> > >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> > > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> > >
> > > to the code one would require to process it manually, with all the edge
> > > cases. The regexp surely reads much simpler (?).
> >
> > Uh...
> >
> > >>> import pwd # https://docs.python.org/3/library/pwd.html
> > >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> > [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> > pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
> > pw_shell='/sbin/nologin')]
> 
> That's great if the lines are specifically coming from your system's
> own /etc/passwd, but not so much if you're trying to compare passwd
> files from different systems, where you simply have the files
> themselves.

In addition to pwent to get specific entries from the local password
database, POSIX has fpwent to get a specific entry from a stream that
looks like /etc/passwd.  So even POSIX agrees that if you think you have
to process this data manually, you're doing it wrong.  Python exposes
neither function directly (at least not in the pwd module or the os
module; I didn't dig around or check PyPI).

IMO, higher level functions to process such data are way better than a
[insert your own adjective/expletive here] regular expression that
collects the pieces into numbered groups rather than labeled fields.
Readability counts.

Yes, absolutely, use a regular expression when all else fails.  Don't
forget to handle all the edge cases!  (I assume that sane OSes preclude
colons in paths that are likely to come up in the local password
database, but I don't know what happens, e.g., when there's a reason for
GECOS to contain a colon.)
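
If a regular expression is used after all, named groups at least give labeled fields instead of numbered ones. A sketch, assuming the same seven-field layout as the examples upthread; the group names are borrowed from the pwd module's field names without the pw_ prefix, and `parse` is a hypothetical helper.

```python
import re

PASSWD_LINE = re.compile(
    r"^(?P<name>[^:]*):(?P<passwd>[^:]*):(?P<uid>\d+):(?P<gid>\d+):"
    r"(?P<gecos>[^:]*):(?P<home>[^:]*):(?P<shell>[^:]*)$"
)

def parse(line):
    """Return a dict of labeled fields, or None when the line does not fit."""
    m = PASSWD_LINE.match(line)
    return m.groupdict() if m else None
```

A line with an extra colon, for instance inside GECOS, simply fails to match here, so that edge case is reported rather than silently mis-split.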


Re: How to replace characters in a string?

2022-06-08 Thread Barry Scott


> On 8 Jun 2022, at 18:01, Dave  wrote:
> 
> Hi,
> 
> This is a tool I’m using on my own files to save me time. Basically, most 
> of the tracks were imported with different versions of iTunes over the years. 
> There are three problems:
> 
> 1.   File System characters are replaced (you can’t have ‘/‘ or ‘:’ in a file 
> name).
ok
> 2.   Smart Quotes were added at some point; these need to be replaced.
ok
> 3.   Other characters, due to the name being of non-English origin.
Why is this a problem? It's only if the chars are confusing/will not compare 
that there is something to fix.
All modern OSes allow Unicode filenames.

Barry
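
One case where two names look the same yet will not compare equal is Unicode normalisation: an accented letter can be stored as a single code point or as a base letter plus a combining mark. A minimal sketch of comparing after normalisation, with a made-up example name and a hypothetical `same_text` helper:

```python
import unicodedata

def same_text(a, b):
    """Compare after NFC normalisation so composed and decomposed forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "Beyonc\u00e9"        # e-acute as one code point
decomposed = "Beyonce\u0301"     # plain e followed by a combining acute accent
```

Here `composed == decomposed` is False while `same_text(composed, decomposed)` is True, which may be all the fixing such names need.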


> 
> If I find others I’ll add them.
> 
> I’m using MusicBrainz to do a fuzzy match and get the correct name.
> 
> it’s not perfect, but works for 99% of files which is good enough for me!
> 
> Cheers
> Dave
> 
> 
>> On 8 Jun 2022, at 18:23, Avi Gross via Python-list  
>> wrote:
>> 
>> Dave,
>> 
>> Your goal is to compare titles and there can be endless replacements needed 
>> if you allow the text to contain anything but ASCII.
>> 
>> Have you considered stripping out things instead? I mean remove lots of 
>> stuff that is not ASCII in the first place and perhaps also remove lots of 
>> extra punctuation like single quotes or question marks or redundant white 
>> space and compare the sort of skeletons of the two? 
>> 
>> And even if that fails, could you have a measure of how different they are 
>> and tolerate if they were say off by one letter albeit "My desert" matching 
>> "My Dessert" might not be a valid match with one being a song about an arid 
>> environment and the other about food you don't need!
>> 
>> Your seemingly simple need can expand into a fairly complex project. There 
>> may be many ideas on how to deal with it but not anything perfect enough to 
>> catch all cases as even a trained human may have to make decisions at times 
>> and not match what other humans do. We have examples like the TV show 
>> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is 
>> often written when I look it up as NUMBERS. You have obvious cases where 
>> titles of songs may contain composite symbols like "œ" which will not 
>> compare to one where it is written out as "oe" so the idea of comparing is 
>> quite complex and the best you might do is heuristic.
>> 
>> UNICODE has many symbols that are almost the same or even look the same or 
>> maybe in one font versus another. There are libraries of functions that 
>> allow some kinds of comparisons or conversions that you could look into but 
>> the gain for you may not be worth it. Nothing stops a person from naming a 
>> song any way they want and I speak many languages and often see a song 
>> re-titled in the local language and using the local alphabet mixed often 
>> with another.
>> 
>> Your original question is perhaps now many questions, depending on what you 
>> choose. You started by wanting to know how to compare and it is moving on to 
>> how to delete parts or make substitutions or use regular expressions and it 
>> can get worse. You can, for example, take a string and identify the words 
>> within it and create a regular expression that inserts sequences between the 
>> words that match any zero or one or more non-word characters such as spaces, 
>> tabs, punctuation or non-ASCII, so that song titles with the same words in a 
>> sequence match no matter what is between them. The possibilities are endless 
>> but consider some of the techniques that are used by some programs that 
>> parse text and suggest alternate spellings  or even programs like Google 
>> Translate that can take a sentence and then suggest you may mean a slightly 
>> altered sentence with one word changed to fit better. 
>> 
>> You need to decide what you want to deal with and what will be 
>> mis-classified by your program. Some of us have suggested folding the case 
>> of the words but that means a song about a dark skinned person in Poland 
>> called "Black Polish" would match a song about keeping your shoes dark with 
>> "black polish" so I keep repeating it is very hard or frankly impossible, to 
>> catch every case I can imagine and the many I can't!
>> 
>> But the emphasis here is not your overall problem. It is about whether and 
>> how the computer language called python, and perhaps some add-on modules, 
>> can be used to solve each smaller need such as recognizing a pattern or 
>> replacing text. It can do quite a bit but only when the specification of the 
>> problem is exact. 
>> 
>> 
>> 
>> 
>> -Original Message-
>> From: Dave 
>> To: python-list@python.org
>> Sent: Wed, Jun 8, 2022 5:09 am
>> Subject: Re: How to replace characters in a string?
>> 
>> Hi,
>> 
>> Thanks for this! 
>> 
>> So, is there a copy function/method that returns a MutableString like in 
>> objective-C? I’ve solved this problems before in a number of languages like 
>> Objective-C and AppleScript.
>> 
>> Basically there is a set of common cha

Re: How to test characters of a string

2022-06-08 Thread Chris Angelico
On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-06-08 at 08:07:40 -,
> De ongekruisigde  wrote:
>
> > Depending on the problem a regular expression may be the much simpler
> > solution. I love them for e.g. text parsing and use them all the time.
> > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > like these:
> >
> >   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> >   avahi:x:997:996:avahi-daemon privilege separation 
> > user:/var/empty:/run/current-system/sw/bin/nologin
> >   sshd:x:998:993:SSH privilege separation 
> > user:/var/empty:/run/current-system/sw/bin/nologin
> >   geoclue:x:999:998:Geoinformation 
> > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> >
> > Compare a regexp solution like this:
> >
> >   >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , 
> > s)
> >   >>> print(g.groups())
> >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> >
> > to the code one would require to process it manually, with all the edge
> > cases. The regexp surely reads much simpler (?).
>
> Uh...
>
> >>> import pwd # https://docs.python.org/3/library/pwd.html
> >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
> pw_shell='/sbin/nologin')]

That's great if the lines are specifically coming from your system's
own /etc/passwd, but not so much if you're trying to compare passwd
files from different systems, where you simply have the files
themselves.
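
For the "you simply have the files themselves" case, a minimal sketch that needs neither pwd nor a regular expression, assuming every line has exactly the seven colon-separated fields of the passwd format. The names PasswdEntry and parse_passwd_line are made up for this illustration; they are not part of any library.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One passwd-format record (hypothetical helper type)."""
    name: str
    passwd: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd-format line into its seven colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, passwd, uid, gid, gecos, home, shell = fields
    # uid and gid are numeric in the file, so convert them here.
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


entry = parse_passwd_line(
    "geoclue:x:999:998:Geoinformation service:"
    "/var/lib/geoclue:/run/current-system/sw/bin/nologin"
)
print(entry.name, entry.uid, entry.shell)
```

The same function can be applied to each line of a passwd file copied from another machine, which is the situation pwd cannot cover.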

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to test characters of a string

2022-06-08 Thread 2QdxY4RzWzUUiLuE
On 2022-06-08 at 08:07:40 -,
De ongekruisigde  wrote:

> Depending on the problem a regular expression may be the much simpler
> solution. I love them for e.g. text parsing and use them all the time.
> Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> like these:
> 
>   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
>   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
>   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
>   avahi:x:997:996:avahi-daemon privilege separation 
> user:/var/empty:/run/current-system/sw/bin/nologin
>   sshd:x:998:993:SSH privilege separation 
> user:/var/empty:/run/current-system/sw/bin/nologin
>   geoclue:x:999:998:Geoinformation 
> service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> 
> Compare a regexp solution like this:
> 
>   >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
>   >>> print(g.groups())
>   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> 
> to the code one would require to process it manually, with all the edge
> cases. The regexp surely reads much simpler (?).

Uh...

>>> import pwd # https://docs.python.org/3/library/pwd.html
>>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
[pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
pw_shell='/sbin/nologin')]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to test characters of a string

2022-06-08 Thread Barry Scott


> On 7 Jun 2022, at 23:24, Dave  wrote:
> 
> Yes, it was probably just a typo on my part.
> 
> I’ve now fixed the majority of cases but still got two strings that look 
> identical but fail to match, this time (again by 10cc), “I’m Mandy Fly Me”.
> 
> I’m putting money on it being a utf8 problem but I’m stuck on how to handle 
> it. It’s probably the single quote in I’m, although it has worked with other 
> songs.
> 
> Any ideas?

You can use difflib to give you a diff of the two strings:

:>>> import difflib
:>>> print('\n'.join(difflib.unified_diff('abc', 'adc')))
---

+++

@@ -1,3 +1,3 @@

 a
-b
+d
 c
:>>>

The docs talk about lines, but difflib works on any sequence. I use it a lot to
find differences within lines.
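
Applied to the two titles from this thread, a sketch along the same lines might look like the following. It uses difflib.SequenceMatcher rather than unified_diff, and unicodedata.name to spell out which characters differ. The variable names are only for illustration.

```python
import difflib
import unicodedata

tag_title = "I\u2019m Mandy Fly Me"   # title as stored in the ID3 tag
file_title = "I'm Mandy Fly Me"       # title taken from the file name

matcher = difflib.SequenceMatcher(None, tag_title, file_title, autojunk=False)

# Keep only the stretches where the two strings differ.
differences = [
    (tag_title[i1:i2], file_title[j1:j2])
    for op, i1, i2, j1, j2 in matcher.get_opcodes()
    if op != "equal"
]

for old, new in differences:
    old_names = [unicodedata.name(ch, "UNKNOWN") for ch in old]
    new_names = [unicodedata.name(ch, "UNKNOWN") for ch in new]
    print(f"{old!a} {old_names} -> {new!a} {new_names}")
```

Printing with !a gives the ASCII-escaped form, so look-alike characters cannot hide in the terminal font.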

Barry



> 
> All the Best
> Cheers
> Dave
> 
> Here is the whole function/method or whatever it’s called in Python:
> 
> 
> #
> #   checkMusicFiles
> #
> 
> def checkMusicFiles(theBaseMusicLibraryFolder):
>     myArtistDict = []
> 
>     #
>     #  Loop thru Artists Folder
>     #
>     myArtistsFoldlerList = getFolderList(theBaseMusicLibraryFolder)
>     myArtistCount = 0
>     for myArtistFolder in myArtistsFoldlerList:
>         print('Artist: ' + myArtistFolder)
>         #
>         #  Loop thru Albums Folder
>         #
>         myAlbumList = getFolderList(theBaseMusicLibraryFolder + myArtistFolder)
>         for myAlbum in myAlbumList:
>             print('Album: ' + myAlbum)
> 
>             #
>             #  Loop thru Tracks (Files) Folder
>             #
>             myAlbumPath = theBaseMusicLibraryFolder + myArtistFolder + '/' + myAlbum + '/'
>             myFilesList = getFileList(myAlbumPath)
>             for myFile in myFilesList:
>                 myFilePath = myAlbumPath + myFile
>                 myID3 = eyed3.load(myFilePath)
>                 if myID3 is None:
>                     continue
> 
>                 myArtistName = myID3.tag.artist
>                 if myArtistName is None:
>                     continue
> 
>                 myAlbumName = myID3.tag.album
>                 if myAlbumName is None:
>                     continue
> 
>                 myTitleName = myID3.tag.title
>                 if myTitleName is None:
>                     continue
> 
>                 myCompareFileName = myFile[0:-4]
>                 if myCompareFileName[0].isdigit() and myCompareFileName[1].isdigit():
>                     myCompareFileName = myFile[3:-4]
> 
>                 if myCompareFileName != myTitleName:
>                     myLength1 = len(myCompareFileName)
>                     myLength2 = len(myTitleName)
>                     print('File Name Mismatch - Artist: [' + myArtistName +
>                           ']  Album: [' + myAlbumName +
>                           ']  Track: [' + myTitleName +
>                           ']  File: [' + myCompareFileName + ']')
>                     if (myLength1 == myLength2):
>                         print('lengths match: ',myLength1)
>                     else:
>                         print('lengths mismatch: ',myLength1,'  ',myLength2)
> 
>                     print(' ')
> 
>     return myArtistsFoldlerList
> 
> 
> 
> 
> 
> 
>> On 8 Jun 2022, at 00:07, MRAB  wrote:
>> 
>> On 2022-06-07 21:23, Dave wrote:
>>> Thanks a lot for this! isDigit was the method I was looking for and 
>>> couldn’t find.
>>> I have another problem related to this, the following code uses the code 
>>> you just sent. I am getting a files ID3 tags using eyed3, this part seems 
>>> to work and I get expected values in this case myTitleName (Track name) is 
>>> set to “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock 
>>> Holiday” (File Name with the Track number prepended). The is digit test 
>>> works and myCompareFileName is set to  “Deadlock Holiday”, so they should 
>>> match, right?
>> OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by 10cc)?
>> 
>> [snip]
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Dave
Hi,

This is a tool I’m using on my own files to save me time. Basically, most of 
the tracks were imported with different versions of iTunes over the years. There 
are three problems:

1.   File system characters are replaced (you can’t have ‘/’ or ‘:’ in a file 
name).
2.   Smart quotes were added at some point; these need to be replaced.
3.   Other characters that come from names of non-English origin.

If I find others I’ll add them.

I’m using MusicBrainz to do a fuzzy match and get the correct name.

It’s not perfect, but it works for 99% of files, which is good enough for me!

Cheers
Dave


> On 8 Jun 2022, at 18:23, Avi Gross via Python-list  
> wrote:
> 
> Dave,
> 
> Your goal is to compare titles and there can be endless replacements needed 
> if you allow the text to contain anything but ASCII.
> 
> Have you considered stripping out things instead? I mean remove lots of stuff 
> that is not ASCII in the first place and perhaps also remove lots of extra 
> punctuation like single quotes or question marks or redundant white space and 
> compare the sort of skeletons of the two? 
> 
> And even if that fails, could you have a measure of how different they are 
> and tolerate if they were say off by one letter albeit "My desert" matching 
> "My Dessert" might not be a valid match with one being a song about an arid 
> environment and the other about food you don't need!
> 
> Your seemingly simple need can expand into a fairly complex project. There 
> may be many ideas on how to deal with it but not anything perfect enough to 
> catch all cases as even a trained human may have to make decisions at times 
> and not match what other humans do. We have examples like the TV show 
> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is 
> often written when I look it up as NUMBERS. You have obvious cases where 
> titles of songs may contain composite symbols like "œ" which will not compare 
> to one where it is written out as "oe" so the idea of comparing is quite 
> complex and the best you might do is heuristic.
> 
> UNICODE has many symbols that are almost the same or even look the same or 
> maybe in one font versus another. There are libraries of functions that allow 
> some kinds of comparisons or conversions that you could look into but the 
> gain for you may not be worth it. Nothing stops a person from naming a song 
> any way they want and I speak many languages and often see a song re-titled 
> in the local language and using the local alphabet mixed often with another.
> 
> Your original question is perhaps now many questions, depending on what you 
> choose. You started by wanting to know how to compare and it is moving on to 
> how to delete parts or make substitutions or use regular expressions and it 
> can get worse. You can, for example, take a string and identify the words 
> within it and create a regular expression that inserts sequences between the 
> words that match any zero or one or more non-word characters such as spaces, 
> tabs, punctuation or non-ASCII, so that song titles with the same words in a 
> sequence match no matter what is between them. The possibilities are endless 
> but consider some of the techniques that are used by some programs that parse 
> text and suggest alternate spellings  or even programs like Google Translate 
> that can take a sentence and then suggest you may mean a slightly altered 
> sentence with one word changed to fit better. 
> 
> You need to decide what you want to deal with and what will be mis-classified 
> by your program. Some of us have suggested folding the case of the words but 
> that means a song about a dark skinned person in Poland called "Black Polish" 
> would match a song about keeping your shoes dark with "black polish" so I 
> keep repeating it is very hard or frankly impossible, to catch every case I 
> can imagine and the many I can't!
> 
> But the emphasis here is not your overall problem. It is about whether and 
> how the computer language called python, and perhaps some add-on modules, can 
> be used to solve each smaller need such as recognizing a pattern or replacing 
> text. It can do quite a bit but only when the specification of the problem is 
> exact. 
> 
> 
> 
> 
> -Original Message-
> From: Dave 
> To: python-list@python.org
> Sent: Wed, Jun 8, 2022 5:09 am
> Subject: Re: How to replace characters in a string?
> 
> Hi,
> 
> Thanks for this! 
> 
> So, is there a copy function/method that returns a MutableString like in 
> objective-C? I’ve solved this problems before in a number of languages like 
> Objective-C and AppleScript.
> 
> Basically there is a set of common characters that need “normalizing” and I 
> have a method that replaces them in a string, so:
> 
> myString = [myString normalizeCharacters];
> 
> Would return a new string with all the “common” replacements applied.
> 
> Since the following gives an error :
> 
> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a’)
> 

Re: How to replace characters in a string?

2022-06-08 Thread Avi Gross via Python-list
Dave,

Your goal is to compare titles and there can be endless replacements needed if 
you allow the text to contain anything but ASCII.

Have you considered stripping out things instead? I mean remove lots of stuff 
that is not ASCII in the first place and perhaps also remove lots of extra 
punctuation like single quotes or question marks or redundant white space and 
compare the sort of skeletons of the two? 
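
A rough sketch of that "skeleton" idea, assuming it is acceptable to throw away everything except ASCII letters, digits and single spaces. The function name skeleton is made up here, and the choice of what to discard is only one of many possible ones.

```python
import re


def skeleton(title: str) -> str:
    """Reduce a title to lower-case ASCII letters and digits, single-spaced."""
    # Drop every non-ASCII character (curly quotes, accented letters, ...).
    ascii_only = title.encode("ascii", "ignore").decode("ascii")
    # Remove remaining punctuation, keeping letters, digits and whitespace.
    kept = re.sub(r"[^a-z0-9\s]", "", ascii_only.casefold())
    # Collapse runs of whitespace into one space and trim the ends.
    return " ".join(kept.split())


print(skeleton("I\u2019m Mandy Fly Me") == skeleton("I'm  Mandy Fly Me?"))  # True
```

Note that this deliberately loses information: accented letters vanish entirely, so two different titles can end up with the same skeleton.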

And even if that fails, could you have a measure of how different they are and 
tolerate if they were say off by one letter albeit "My desert" matching "My 
Dessert" might not be a valid match with one being a song about an arid 
environment and the other about food you don't need!
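
One way to get such a measure from the standard library is difflib.SequenceMatcher.ratio(), which returns a value between 0.0 and 1.0. This is only a sketch: the helper name similarity is invented for the example, and any cut-off for "close enough" would be a judgment call that has to be tuned on real data.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity score, ignoring case."""
    return difflib.SequenceMatcher(
        None, a.casefold(), b.casefold(), autojunk=False
    ).ratio()


# Scores very high although the titles mean different things.
print(similarity("My desert", "My Dessert"))
# Unrelated titles score much lower.
print(similarity("My desert", "Dreadlock Holiday"))
```

The first pair illustrates the warning above: a one-letter difference scores close to 1.0 even when the two titles are not the same song.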

Your seemingly simple need can expand into a fairly complex project. There may 
be many ideas on how to deal with it but not anything perfect enough to catch 
all cases as even a trained human may have to make decisions at times and not 
match what other humans do. We have examples like the TV show "NUMB3RS" that 
used a perfectly valid digit 3 to stand for an "E" but yet is often written 
when I look it up as NUMBERS. You have obvious cases where titles of songs may 
contain composite symbols like "œ" which will not compare to one where it is 
written out as "oe" so the idea of comparing is quite complex and the best you 
might do is heuristic.

UNICODE has many symbols that are almost the same or even look the same or 
maybe in one font versus another. There are libraries of functions that allow 
some kinds of comparisons or conversions that you could look into but the gain 
for you may not be worth it. Nothing stops a person from naming a song any way 
they want and I speak many languages and often see a song re-titled in the 
local language and using the local alphabet mixed often with another.
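
In Python the standard-library starting point for this is unicodedata.normalize. A small sketch, assuming NFKC normalization plus case folding is wanted (the helper name normalize_title is made up), shows both what it does and what it does not do:

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Apply NFKC normalization, then fold case."""
    return unicodedata.normalize("NFKC", title).casefold()


# Precomposed and combining forms of an accented letter become equal.
print(normalize_title("Cafe\u0301") == normalize_title("Caf\u00e9"))  # True
# The single-character "fi" ligature is expanded to two letters.
print(normalize_title("\ufb01re") == "fire")                          # True
# The "oe" ligature has no decomposition, so it is left alone.
print(normalize_title("c\u0153ur") == "coeur")                        # False
# The curly apostrophe is likewise not turned into a plain one.
print(normalize_title("I\u2019m") == "i'm")                           # False
```

So normalization handles some look-alikes, but cases such as "œ" or smart quotes still need an explicit replacement table.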

Your original question is perhaps now many questions, depending on what you 
choose. You started by wanting to know how to compare and it is moving on to 
how to delete parts or make substitutions or use regular expressions and it can 
get worse. You can, for example, take a string and identify the words within it 
and create a regular expression that inserts sequences between the words that 
match any zero or one or more non-word characters such as spaces, tabs, 
punctuation or non-ASCII, so that song titles with the same words in a sequence 
match no matter what is between them. The possibilities are endless but 
consider some of the techniques that are used by some programs that parse text 
and suggest alternate spellings  or even programs like Google Translate that 
can take a sentence and then suggest you may mean a slightly altered sentence 
with one word changed to fit better. 
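
The "same words, anything in between" idea can be sketched as follows. The helper name words_pattern is invented for the example, and it assumes that \w+ is a good enough definition of a word, which means "I'm" is treated as the two words "I" and "m".

```python
import re


def words_pattern(title: str) -> re.Pattern:
    """Build a pattern matching the same words with any non-word characters between."""
    words = re.findall(r"\w+", title)
    # re.escape is not strictly needed for \w+ matches, but keeps this safe
    # if the word definition is changed later.
    body = r"\W+".join(re.escape(word) for word in words)
    return re.compile(r"\W*" + body + r"\W*")


pattern = words_pattern("I'm Mandy, Fly Me")
print(bool(pattern.fullmatch("I\u2019m Mandy Fly Me")))     # True
print(bool(pattern.fullmatch("I'm Mandy - Fly Me!")))       # True
print(bool(pattern.fullmatch("I'm Mandy Fly You")))         # False
```

This version is case-sensitive on purpose; whether to add re.IGNORECASE is exactly the kind of decision discussed in the next paragraph.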

You need to decide what you want to deal with and what will be mis-classified 
by your program. Some of us have suggested folding the case of the words but 
that means a song about a dark skinned person in Poland called "Black Polish" 
would match a song about keeping your shoes dark with "black polish" so I keep 
repeating it is very hard or frankly impossible, to catch every case I can 
imagine and the many I can't!

But the emphasis here is not your overall problem. It is about whether and how 
the computer language called python, and perhaps some add-on modules, can be 
used to solve each smaller need such as recognizing a pattern or replacing 
text. It can do quite a bit but only when the specification of the problem is 
exact. 




-Original Message-
From: Dave 
To: python-list@python.org
Sent: Wed, Jun 8, 2022 5:09 am
Subject: Re: How to replace characters in a string?

Hi,

Thanks for this! 

So, is there a copy function/method that returns a MutableString like in 
objective-C? I’ve solved this problems before in a number of languages like 
Objective-C and AppleScript.

Basically there is a set of common characters that need “normalizing” and I 
have a method that replaces them in a string, so:

myString = [myString normalizeCharacters];

Would return a new string with all the “common” replacements applied.

Since the following gives an error :

myString = 'Hello'
myNewstring = myString.replace(myString,'e','a’)

TypeError: 'str' object cannot be interpreted as an integer

I can’t see of a way to do this in Python? 

All the Best
Dave


> On 8 Jun 2022, at 10:14, Chris Angelico  wrote:
> 
> On Wed, 8 Jun 2022 at 18:12, Dave  wrote:
> 
>> I tried the but it doesn’t seem to work?
>> myCompareFile1 = ascii(myTitleName)
>> myCompareFile1.replace("\u2019", "'")
> 
> Strings in Python are immutable. When you call ascii(), you get back a
> new string, but it's one that has actual backslashes and such in it.
> (You probably don't need this step, other than for debugging; check
> the string by printing out the ASCII version of it, but stick to the
> original for actual processing.) The same is true of the replace()
> method; it doesn't change the string, it returns a new string.
> 
> >>> word = "spam"
> >>> print(word.replace("sp", "h"))
> 

Re: How to replace characters in a string?

2022-06-08 Thread Jon Ribbens via Python-list
On 2022-06-08, Dave  wrote:
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString

> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when
> they come up by adding a line as so:
>
>     myNewString = theString.replace("\u2014", "]")  # just an example
>
> Which is what I was trying to achieve.

Here's a head-start on some characters you might want to translate,
mostly spaces, hyphens, quotation marks, and ligatures:

def unicode_translate(s):
    return s.translate({
        8192: ' ', 8193: ' ', 8194: ' ', 8195: ' ', 8196: ' ',
        8197: ' ', 198: 'AE', 8199: ' ', 8200: ' ', 8201: ' ',
        8202: ' ', 8203: '', 64258: 'fl', 8208: '-', 8209: '-',
        8210: '-', 8211: '-', 8212: '-', 8722: '-', 8216: "'",
        8217: "'", 8220: '"', 8221: '"', 64256: 'ff', 160: ' ',
        64260: 'ffl', 8198: ' ', 230: 'ae', 12288: ' ', 173: '',
        497: 'DZ', 498: 'Dz', 499: 'dz', 64259: 'ffi', 8230: '...',
        64257: 'fi', 64262: 'st'})

If you want to go further then the Unidecode package might be helpful:

https://pypi.org/project/Unidecode/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to test characters of a string

2022-06-08 Thread De ongekruisigde
On 2022-06-08, Christian Gollwitzer  wrote:
> Am 07.06.22 um 23:01 schrieb Christian Gollwitzer:
>
>>> In [3]: re.sub(r'^\d+\s*', '', s)
>>> Out[3]: 'Trinket'
>>>
>
> That RE does match what you intended to do, but not exactly what you 
> wrote in the OP. That would be '^\d\d.': start with exactly two digits 
> followed by any character.

Indeed but then I'd like '\d{2}' even better.
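
To make the difference between the two patterns concrete, here is a small comparison. The names LOOSE and STRICT are just labels for this sketch, and STRICT spells the "exactly two digits, then any character" pattern with \d{2}.

```python
import re

LOOSE = re.compile(r"^\d+\s*")    # any number of digits, then optional whitespace
STRICT = re.compile(r"^\d{2}.")   # exactly two digits, then any one character

for name in ("01 Trinket", "7Trinket"):
    print(repr(name), "->", repr(LOOSE.sub("", name)), repr(STRICT.sub("", name)))
```

Both patterns strip "01 " from the first name; only the loose one touches the second. Neither can tell a track number from a title that really starts with digits.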


>   Christian


-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Dave


> On 8 Jun 2022, at 11:25, Dave  wrote:
> 
>    myNewString = theString.replace("\u2014", "]")  # just an example


Oops! Make that

    myNewString = myNewString.replace("\u2014", "]")  # just an example
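
Put together, the corrected function would look roughly like this. The second replacement is still only the placeholder example from above, not a recommended mapping.

```python
def filterCommonCharacters(theString):
    # Each replace() returns a new string, so keep feeding the latest result
    # back in rather than starting again from theString.
    myNewString = theString.replace("\u2019", "'")
    myNewString = myNewString.replace("\u2014", "]")  # just an example
    return myNewString


print(filterCommonCharacters("I\u2019m Mandy Fly Me"))  # I'm Mandy Fly Me
```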
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to test characters of a string

2022-06-08 Thread De ongekruisigde
On 2022-06-08, Dave  wrote:
> I hate regEx and avoid it whenever possible, I’ve never found something that 
> was impossible to do without it.

I love regular expressions and use them where appropriate. Saves tons of
code and is often much more readable than the pages of code required to
do the same.

-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to test characters of a string

2022-06-08 Thread De ongekruisigde
On 2022-06-08, dn  wrote:
> On 08/06/2022 10.18, De ongekruisigde wrote:
>> On 2022-06-08, Christian Gollwitzer  wrote:
>>> Am 07.06.22 um 21:56 schrieb Dave:
>>>> It depends on the language I’m using, in Objective C, I’d use isNumeric, 
>>>> just wanted to know what the equivalent is in Python.
>>>>
>>>
>>> Your problem is also a typical case for regular expressions. You can 
>>> create an expression for "starts with any number of digits plus optional 
>>> whitespace" and then replace this with nothing:
>> 
>> Regular expressions are overkill for this and much slower than the
>> simple isdigit based solution.
>
> ...
>
>> Regular expressions are indeed extremely powerful and useful but I tend
>> to avoid them when there's a (faster) normal solution.
>
> Yes, simple solutions are (likely) easier to read.

Depending on the problem a regular expression may be the much simpler
solution. I love them for e.g. text parsing and use them all the time.
Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
like these:

  root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
  dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
  nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
  avahi:x:997:996:avahi-daemon privilege separation 
user:/var/empty:/run/current-system/sw/bin/nologin
  sshd:x:998:993:SSH privilege separation 
user:/var/empty:/run/current-system/sw/bin/nologin
  geoclue:x:999:998:Geoinformation 
service:/var/lib/geoclue:/run/current-system/sw/bin/nologin

Compare a regexp solution like this:

  >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
  >>> print(g.groups())
  ('geoclue', 'x', '999', '998', 'Geoinformation service', '/var/lib/geoclue', 
'/run/current-system/sw/bin/nologin')

to the code one would require to process it manually, with all the edge
cases. The regexp surely reads much simpler (?).


> RegEx-s are more powerful (and well worth learning for this reason), but
> are only 'readable' to those who use them frequently.
>
> Has either of you performed a timeit comparison?

No need: the isdigit solution doesn't require the overhead of a regex
processor.
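
For anyone who does want numbers, a sketch of such a comparison with timeit could look like this. The function names are invented for the example, the regular expression is precompiled, and no timings are claimed here since they depend on the machine and the Python version.

```python
import re
import timeit

PREFIX = re.compile(r"^\d\d.")


def strip_with_isdigit(name):
    """Drop a two-digit prefix and the character after it, using str.isdigit."""
    if len(name) > 2 and name[0].isdigit() and name[1].isdigit():
        return name[3:]
    return name


def strip_with_regex(name):
    """Do the same with a precompiled regular expression."""
    return PREFIX.sub("", name)


name = "01 Dreadlock Holiday"
# Check that both approaches agree before timing them.
assert strip_with_isdigit(name) == strip_with_regex(name) == "Dreadlock Holiday"

for func in (strip_with_isdigit, strip_with_regex):
    seconds = timeit.timeit(lambda: func(name), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```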

-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread De ongekruisigde
On 2022-06-08, Dave  wrote:
> Hi All,
>
> I decided to start a new thread as this really is a new subject.
>
> I've got two that appear to be identical, but fail to compare. After getting 
> the ascii encoding I see that they are indeed different, my question is how 
> can I replace the \u2019m with a regular single quote mark (or apostrophe)?

You're not facing this alone:

 https://changelog.complete.org/archives/9938-the-python-unicode-mess

Perhaps useful insights can be found at:

 https://realpython.com/python-encodings-guide/


-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Dan Stromberg
On Wed, Jun 8, 2022 at 1:11 AM Dave  wrote:

> I've got two that appear to be identical, but fail to compare. After
> getting the ascii encoding I see that they are indeed different, my
> question is how can I replace the \u2019m with a regular single quote mark
> (or apostrophe)?
>

Perhaps try https://pypi.org/project/Unidecode/ ?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Roel Schroeven

Op 8/06/2022 om 11:25 schreef Dave:

> Hi,
>
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString
>
> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when they come 
> up by adding a line as so:
>
>     myNewString = theString.replace("\u2014", "]")  # just an example
>
> Which is what I was trying to achieve.

When you have multiple replacements to do, there's an alternative for 
multiple replace calls: you can use theString.translate() with a 
translation map (which you can make yourself or make with 
str.maketrans()) to do all the replacements at once. Example:


    # Make a map that translates every character from the first string to the
    # corresponding character in the second string
    translation_map = str.maketrans("\u2019\u2014", "']")

    # All the replacements in one go
    myNewString = theString.translate(translation_map)

See:
    - https://docs.python.org/3.10/library/stdtypes.html#str.maketrans
    - https://docs.python.org/3.10/library/stdtypes.html#str.translate
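
When a character has to become several characters, or be deleted, the two-string form of str.maketrans() is not enough, but the single-dict form handles it. A small sketch, with a made-up function name and an arbitrary selection of characters:

```python
# Keys are single characters; values are replacement strings, or None to delete.
PUNCTUATION_TABLE = str.maketrans({
    "\u2018": "'",     # left single quotation mark
    "\u2019": "'",     # right single quotation mark
    "\u201c": '"',     # left double quotation mark
    "\u201d": '"',     # right double quotation mark
    "\u2026": "...",   # horizontal ellipsis: one character becomes three
    "\u00ad": None,    # soft hyphen: removed
})


def normalize_punctuation(text):
    """Apply all replacements in a single pass."""
    return text.translate(PUNCTUATION_TABLE)


print(normalize_punctuation("I\u2019m \u201cMandy\u201d\u2026"))  # I'm "Mandy"...
```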

--

"There is a theory which states that if ever anyone discovers exactly what the
Universe is for and why it is here, it will instantly disappear and be
replaced by something even more bizarre and inexplicable.
There is another theory which states that this has already happened."
-- Douglas Adams, The Restaurant at the End of the Universe

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Dave
Hi,

I misunderstood how it worked, basically I’ve added this function:

def filterCommonCharacters(theString):
    myNewString = theString.replace("\u2019", "'")
    return myNewString

Which returns a new string replacing the common characters.

This can easily be extended to include other characters as and when they come 
up by adding a line as so:

    myNewString = theString.replace("\u2014", "]")  # just an example

Which is what I was trying to achieve.

All the Best
Dave

> On 8 Jun 2022, at 11:17, Chris Angelico  wrote:
> 
> On Wed, 8 Jun 2022 at 19:13, Dave  wrote:
>> 
>> Hi,
>> 
>> Thanks for this!
>> 
>> So, is there a copy function/method that returns a MutableString like in 
>> objective-C? I’ve solved this problems before in a number of languages like 
>> Objective-C and AppleScript.
>> 
>> Basically there is a set of common characters that need “normalizing” and I 
>> have a method that replaces them in a string, so:
>> 
>> myString = [myString normalizeCharacters];
>> 
>> Would return a new string with all the “common” replacements applied.
>> 
>> Since the following gives an error :
>> 
>> myString = 'Hello'
>> myNewstring = myString.replace(myString,'e','a’)
>> 
>> TypeError: 'str' object cannot be interpreted as an integer
>> 
>> I can’t see of a way to do this in Python?
>> 
> 
> Not sure why you're passing the string as an argument as well as using
> it as the object you're calling a method on. All you should need to do
> is:
> 
> myString.replace('e', 'a')
> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Joel Goldstick
On Wed, Jun 8, 2022 at 5:22 AM Karsten Hilbert  wrote:
>
> Am Wed, Jun 08, 2022 at 11:09:05AM +0200 schrieb Dave:
>
> > myString = 'Hello'
> > myNewstring = myString.replace(myString,'e','a’)
>
> That won't work (last quote) but apart from that:
>
> myNewstring = myString.replace('e', 'a')
>
> Karsten
> --
> GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
> --
> https://mail.python.org/mailman/listinfo/python-list


Sorry if I'm not reading the nuances correctly, but it looks to me
that you failed to realize that string methods return results.  They
don't change the string in place:

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> str1 = "\u2019string with starting smart quote"
>>> str1
'’string with starting smart quote'
>>> new_str = str1.replace("\u2019","'")
>>> str1
'’string with starting smart quote'
>>> new_str
"'string with starting smart quote"
>>> repr(str1)
"'’string with starting smart quote'"
>>> repr(new_str)
'"\'string with starting smart quote"'
>>>

As you can see, str1 doesn't change, but when you 'replace' on it, the
result you want is returned to new_str

-- 
Joel Goldstick
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Karsten Hilbert
Am Wed, Jun 08, 2022 at 11:09:05AM +0200 schrieb Dave:

> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a’)

That won't work (last quote) but apart from that:

myNewstring = myString.replace('e', 'a')

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Chris Angelico
On Wed, 8 Jun 2022 at 19:13, Dave  wrote:
>
> Hi,
>
> Thanks for this!
>
> So, is there a copy function/method that returns a MutableString like in 
> objective-C? I’ve solved this problems before in a number of languages like 
> Objective-C and AppleScript.
>
> Basically there is a set of common characters that need “normalizing” and I 
> have a method that replaces them in a string, so:
>
> myString = [myString normalizeCharacters];
>
> Would return a new string with all the “common” replacements applied.
>
> Since the following gives an error :
>
> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a’)
>
> TypeError: 'str' object cannot be interpreted as an integer
>
> I can’t see of a way to do this in Python?
>

Not sure why you're passing the string as an argument as well as using
it as the object you're calling a method on. All you should need to do
is:

myString.replace('e', 'a')
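
For the record, the TypeError comes from the optional third parameter of str.replace(), which is a count. With three arguments, 'a' ends up in the count position. A short demonstration:

```python
myString = 'Hello'

# str.replace(old, new, count): the optional third argument is a number.
try:
    myString.replace(myString, 'e', 'a')   # 'a' lands in the count slot
except TypeError as error:
    print('TypeError:', error)

print(myString.replace('e', 'a'))       # Hallo
print('banana'.replace('a', 'o', 2))    # bonona (only the first two are replaced)
```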

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Dave
Hi,

Thanks for this! 

So, is there a copy function/method that returns a MutableString like in 
objective-C? I’ve solved this problems before in a number of languages like 
Objective-C and AppleScript.

Basically there is a set of common characters that need “normalizing” and I 
have a method that replaces them in a string, so:

myString = [myString normalizeCharacters];

Would return a new string with all the “common” replacements applied.

Since the following gives an error :

myString = 'Hello'
myNewstring = myString.replace(myString,'e','a’)

TypeError: 'str' object cannot be interpreted as an integer

I can’t see of a way to do this in Python? 

All the Best
Dave


> On 8 Jun 2022, at 10:14, Chris Angelico  wrote:
> 
> On Wed, 8 Jun 2022 at 18:12, Dave  wrote:
> 
>> I tried the but it doesn’t seem to work?
>> myCompareFile1 = ascii(myTitleName)
>> myCompareFile1.replace("\u2019", "'")
> 
> Strings in Python are immutable. When you call ascii(), you get back a
> new string, but it's one that has actual backslashes and such in it.
> (You probably don't need this step, other than for debugging; check
> the string by printing out the ASCII version of it, but stick to the
> original for actual processing.) The same is true of the replace()
> method; it doesn't change the string, it returns a new string.
> 
> >>> word = "spam"
> >>> print(word.replace("sp", "h"))
> ham
> >>> print(word)
> spam
> 
> ChrisA


Re: How to replace characters in a string?

2022-06-08 Thread Chris Angelico
On Wed, 8 Jun 2022 at 18:20, Dave  wrote:
>
> PS
>
> I’ve also tried:
> myCompareFile1 = myTitleName
> myCompareFile1.replace("\u2019", "'")
> myCompareFile2 = myCompareFileName
> myCompareFile2.replace("\u2019", "'")
> Which also doesn’t work; the replace itself works, but the comparison still 
> fails?
>

This is a great time to start exploring what actually happens when you
do "myCompareFile2 = myCompareFileName". I recommend doing some poking
around with strings (which are immutable), lists (which are mutable), and
tuples (which are immutable, but can contain mutable children).
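
As a starting point for that poking around, here is a small sketch (all the
variable names and sample values are made up for illustration) showing that
assignment binds a second name to the same object rather than copying it, and
how that plays out for a string, a list, and a tuple:

```python
title = "I\u2019m Mandy Fly Me"
alias = title                  # a second name for the same str object
print(alias is title)          # True

# replace() builds a new string; rebinding alias leaves title alone.
alias = alias.replace("\u2019", "'")
print(alias)                   # I'm Mandy Fly Me
print(title == alias)          # False

tracks = ["Dang Me"]
same_list = tracks             # again the same object, not a copy
same_list.append("Chug-A-Lug")
print(tracks)                  # ['Dang Me', 'Chug-A-Lug']

pair = ("album", tracks)       # a tuple's slots cannot be reassigned...
try:
    pair[0] = "other"
    tuple_assignment_failed = False
except TypeError:
    tuple_assignment_failed = True
print(tuple_assignment_failed) # True

pair[1].append("Do-Wacka-Do")  # ...but the list inside it can still change
print(len(tracks))             # 3
```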

ChrisA


Re: How to replace characters in a string?

2022-06-08 Thread Dave
PS

I’ve also tried:
myCompareFile1 = myTitleName
myCompareFile1.replace("\u2019", "'")
myCompareFile2 = myCompareFileName
myCompareFile2.replace("\u2019", "'")
Which also doesn’t work; the replace itself works, but the comparison still fails?


> On 8 Jun 2022, at 10:08, Dave  wrote:
> 
> Hi All,
> 
> I decided to start a new thread as this really is a new subject.
> 
> I've got two strings that appear to be identical, but fail to compare. After 
> getting the ascii encoding I see that they are indeed different; my question 
> is how can I replace the \u2019 with a regular single quote mark (or apostrophe)?
> 
> myCompareFile1 = ascii(myTitleName)
> myCompareFile2 = ascii(myCompareFileName)
> myCompareFile1: 'I\u2019m Mandy Fly Me'
> myCompareFile2: "I'm Mandy Fly Me"
> 
> I tried this, but it doesn’t seem to work?
> myCompareFile1 = ascii(myTitleName)
> myCompareFile1.replace("\u2019", "'")
> myCompareFile2 = ascii(myCompareFileName)
> myCompareFile2.replace("\u2019", "'")
> if myCompareFile1 != myCompareFile2:
>     print('myCompareFile1:', myCompareFile1)
>     print('myCompareFile2:', myCompareFile2)
>     myLength1 = len(myCompareFileName)
>     myLength2 = len(myTitleName)
>     print('File Name Mismatch - Artist: [' + myArtistName + ']  Album: [' +
>           myAlbumName + ']  Track: [' + myTitleName + ']  File: [' +
>           myCompareFileName + ']')
>     if myLength1 == myLength2:
>         print('lengths match: ', myLength1)
>     else:
>         print('lengths mismatch: ', myLength1, '  ', myLength2)
>     print(' ')
> Console:
> 
> myCompareFile1: 'I\u2019m Mandy Fly Me'
> myCompareFile2: "I'm Mandy Fly Me"
> 
> So it looks like the replace isn’t doing anything?
> 
> I’m an experienced developer but learning Python.
> 
> All the Best
> Dave
> 


Re: How to replace characters in a string?

2022-06-08 Thread Chris Angelico
On Wed, 8 Jun 2022 at 18:12, Dave  wrote:

> I tried this, but it doesn’t seem to work?
> myCompareFile1 = ascii(myTitleName)
> myCompareFile1.replace("\u2019", "'")

Strings in Python are immutable. When you call ascii(), you get back a
new string, but it's one that has actual backslashes and such in it.
(You probably don't need this step, other than for debugging; check
the string by printing out the ASCII version of it, but stick to the
original for actual processing.) The same is true of the replace()
method; it doesn't change the string, it returns a new string.

>>> word = "spam"
>>> print(word.replace("sp", "h"))
ham
>>> print(word)
spam
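
Applied to the comparison in question, that might look something like the
sketch below. The function name normalize_characters and the particular set of
replacements are assumptions for illustration (borrowing the name from the
Objective-C method mentioned elsewhere in the thread); str.maketrans and
str.translate are standard str methods and, like replace(), return a new
string:

```python
# Hypothetical table of "common" typographic characters and their ASCII
# stand-ins; extend it with whatever else turns up in the tags.
_REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_characters(text):
    """Return a new string with the replacements above applied."""
    return text.translate(_REPLACEMENTS)


title_name = "I\u2019m Mandy Fly Me"   # as read from the tag
file_name = "I'm Mandy Fly Me"         # as read from the file system

print(title_name == file_name)         # False
print(normalize_characters(title_name) == normalize_characters(file_name))  # True
```

The comparison is done on the original strings, not on the output of ascii(),
since ascii() is only useful for inspecting what is in them.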

ChrisA


How to replace characters in a string?

2022-06-08 Thread Dave
Hi All,

I decided to start a new thread as this really is a new subject.

I've got two strings that appear to be identical, but fail to compare. After 
getting the ascii encoding I see that they are indeed different; my question is 
how can I replace the \u2019 with a regular single quote mark (or apostrophe)?

myCompareFile1 = ascii(myTitleName)
myCompareFile2 = ascii(myCompareFileName)
myCompareFile1: 'I\u2019m Mandy Fly Me'
myCompareFile2: "I'm Mandy Fly Me"

I tried this, but it doesn’t seem to work?
myCompareFile1 = ascii(myTitleName)
myCompareFile1.replace("\u2019", "'")
myCompareFile2 = ascii(myCompareFileName)
myCompareFile2.replace("\u2019", "'")
if myCompareFile1 != myCompareFile2:
    print('myCompareFile1:', myCompareFile1)
    print('myCompareFile2:', myCompareFile2)
    myLength1 = len(myCompareFileName)
    myLength2 = len(myTitleName)
    print('File Name Mismatch - Artist: [' + myArtistName + ']  Album: [' +
          myAlbumName + ']  Track: [' + myTitleName + ']  File: [' +
          myCompareFileName + ']')
    if myLength1 == myLength2:
        print('lengths match: ', myLength1)
    else:
        print('lengths mismatch: ', myLength1, '  ', myLength2)
    print(' ')
Console:

myCompareFile1: 'I\u2019m Mandy Fly Me'
myCompareFile2: "I'm Mandy Fly Me"

So it looks like the replace isn’t doing anything?

I’m an experienced developer but learning Python.

All the Best
Dave




