Re: How to test characters of a string
Hi,

I've found you also need to take care of multi-disc CD releases. These use the format "1-01 Track Name", "2-02 Track Name", meaning Disc 1, Track 1 and Disc 2, Track 2.

There are also A and B sides (from vinyl LPs): "A1-Track Name", "B2-Track Name", meaning Side A, Track 1, and so on.

Cheers
Dave

> On 8 Jun 2022, at 19:36, Dennis Lee Bieber wrote:
>
> [...]
> I suspect the OP really needs to extract the /track number/ from the
> ID3 information, and (converting to a 2digit formatted string) see if the
> file name begins with that track number...
> [...]

-- 
https://mail.python.org/mailman/listinfo/python-list
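The prefix styles mentioned in this message could be recognised with something like the following. This is only a rough sketch: the pattern, the group names and the helper `split_track_prefix` are made up for illustration, and a title such as "44 Minutes" still looks like a track prefix, so the result should be checked against the ID3 track number rather than trusted on its own.

```python
import re

# Hypothetical pattern for the three prefix styles discussed in the thread:
#   "01 Title"      plain two-digit track number
#   "1-01 Title"    disc number, dash, two-digit track number
#   "A1-Title"      vinyl side (A or B), track number, dash
_PREFIX = re.compile(
    r"""^(?:
        (?P<disc>\d+)-(?P<dtrack>\d{2})\s+    # 1-01 Track Name
      | (?P<side>[AB])(?P<strack>\d+)-\s*     # A1-Track Name
      | (?P<track>\d{2})\s+                   # 01 Track Name
    )""",
    re.VERBOSE,
)


def split_track_prefix(name):
    """Return (info, title); info is None when no known prefix is found."""
    m = _PREFIX.match(name)
    if m is None:
        return None, name
    # Keep only the groups that took part in the match.
    info = {k: v for k, v in m.groupdict().items() if v is not None}
    return info, name[m.end():]
```

For example, `split_track_prefix("A1-Track Name")` yields the side and track number plus the bare title, while a name with no leading digits comes back unchanged with `None`.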
PYLAUNCH_DEBUG not printing info
Why am I not getting debug output on my Windows 10 machine?

C:\temp>\Windows\py.exe -0
 -V:3.11 *        Python 3.11 (64-bit)
 -V:3.10          Python 3.10 (64-bit)

C:\temp>set PYLAUNCH_DEBUG=1

C:\temp>\Windows\py.exe
Python 3.11.0b3 (main, Jun 1 2022, 13:29:14) [MSC v.1932 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
Re: How to replace characters in a string?
On 9/06/22 5:55 am, Dennis Lee Bieber wrote:
> There are no mutable strings in Python.

If you really want a mutable sequence of characters, you can use
array.array, but you won't be able to use it directly in place of a
string in most contexts.

-- 
Greg
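As a small illustration of the "mutable sequence of characters" idea, here is a sketch that swaps in a plain list (and a bytearray for byte data) rather than array.array, since array's 'u' type code is documented as deprecated. The variable names are only for illustration.

```python
# A list of 1-character strings is a mutable stand-in for a str.
chars = list("Hello")
chars[1] = "a"              # in-place edit of one character
result = "".join(chars)     # convert back to an ordinary (immutable) str

# For pure byte data, bytearray is mutable and offers many str-like methods.
raw = bytearray(b"Hello")
raw[1:2] = b"a"             # slice assignment replaces one byte in place
```

As noted above, neither object can be passed where a str is required; the join (or a decode) at the end is still needed.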
Re: How to replace characters in a string?
On 8/06/22 10:26 pm, Jon Ribbens wrote:
> Here's a head-start on some characters you might want to translate,

Another possibility that might make sense in this case is to simply
strip out all punctuation before comparing. That would take care of
things being spelled with or without hyphens, commas, etc.

-- 
Greg
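One way this could look, as a sketch only: the helper name `strip_punctuation` is invented here, and "punctuation" is taken to mean every character whose Unicode general category starts with "P", which covers ASCII and "smart" quotes, hyphens and commas alike.

```python
import unicodedata


def strip_punctuation(text):
    """Remove all Unicode punctuation, then collapse runs of whitespace."""
    kept = "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(kept.split())
```

Note that deleting a hyphen joins the words on either side ("Chug-A-Lug" becomes "ChugALug"), so a title written with spaces instead of hyphens would still need the spaces removed, or the hyphens mapped to spaces, before the two compare equal.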
Re: How to replace characters in a string?
On Wed, 8 Jun 2022 11:09:05 +0200, Dave declaimed the following:

>Hi,
>
>Thanks for this!
>
>So, is there a copy function/method that returns a MutableString like in
>objective-C? I’ve solved this problem before in a number of languages like
>Objective-C and AppleScript.

There are no mutable strings in Python. Any operation manipulating a
string RETURNS A MODIFIED NEW STRING.

>myString = 'Hello'
>myNewstring = myString.replace(myString,'e','a’)
>

Please study the library reference manual -- it should be clear what
the various string methods can perform. (Hint: they are "methods", which
means whatever is before the . becomes the automatic "self" argument inside
the method.)

https://docs.python.org/3/library/stdtypes.html#string-methods

"""
str.replace(old, new[, count])

    Return a copy of the string with all occurrences of substring old
    replaced by new. If the optional argument count is given, only the
    first count occurrences are replaced.
"""

    myNewstring = myString.replace("e", "a")

However... Please study

"""
static str.maketrans(x[, y[, z]])

    This static method returns a translation table usable for
    str.translate().

    If there is only one argument, it must be a dictionary mapping Unicode
    ordinals (integers) or characters (strings of length 1) to Unicode
    ordinals, strings (of arbitrary lengths) or None. Character keys will
    then be converted to ordinals.

    If there are two arguments, they must be strings of equal length, and
    in the resulting dictionary, each character in x will be mapped to the
    character at the same position in y. If there is a third argument, it
    must be a string, whose characters will be mapped to None in the
    result.
"""

"""
str.translate(table)

    Return a copy of the string in which each character has been mapped
    through the given translation table. The table must be an object that
    implements indexing via __getitem__(), typically a mapping or sequence.
    When indexed by a Unicode ordinal (an integer), the table object can do
    any of the following: return a Unicode ordinal or a string, to map the
    character to one or more other characters; return None, to delete the
    character from the return string; or raise a LookupError exception, to
    map the character to itself.

    You can use str.maketrans() to create a translation map from
    character-to-character mappings in different formats.

    See also the codecs module for a more flexible approach to custom
    character mappings.
"""

Hmmm, I'm out-of-date... I'm on v3.8 and .removeprefix() and
.removesuffix() (from v3.9) simplify my previous post...

Instead of

    if myString.lower().endswith(".mp3"):   #lower() is a precaution for case
        myString = myString[:-4]

just use

    myString = myString.lower().removesuffix(".mp3")

{note, you'll have to make the compare using .lower() on the other name
since this statement returns a lowercased version}

-- 
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
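Putting str.maketrans() and str.translate() to work on the quote problem from this thread might look like the sketch below. The table name `QUOTES` and the helper `normalize_title` are invented for illustration, and only four typographic quote characters are mapped; the extension is stripped with endswith() and a slice so the original case is kept and the code also runs on versions before 3.9.

```python
# One-argument form of str.maketrans(): a dict mapping characters to strings.
QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the usual "smart" apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_title(name):
    """Return a NEW string with smart quotes mapped to ASCII and .mp3 removed."""
    name = name.translate(QUOTES)
    if name.lower().endswith(".mp3"):   # lower() only for the test, not the result
        name = name[:-4]
    return name
```

The input string is never modified; each step returns a fresh string, which is why the result has to be assigned back to a name.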
Re: How to test characters of a string
On 2022-06-08, 2qdxy4rzwzuui...@potatochowder.com <2qdxy4rzwzuui...@potatochowder.com> wrote:
> On 2022-06-09 at 04:15:46 +1000,
> Chris Angelico wrote:
>
>> [...]
>> So.. we can go find some other way of calling fpwent, or we can
>> just parse the file ourselves. It's a very VERY simple format.
>
> If you insist:
>
>     >>> s = 'nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin'
>     >>> print(s.split(':'))
>     ['nm-iodine', 'x', '996', '57', '', '/var/empty', '/run/current-system/sw/bin/nologin']
>
> Hesitantly, because this is the Python mailing list, I claim (a) ':' is
> simpler than r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$', and
> (b) string.split covers pretty much the same edge cases as re.search.

Ah, but you don't check that fields 2 and 3 (0-based) are numeric! But
agreed, it's not the best of examples.

-- 
You're rewriting parts of Quake in *Python*?
MUAHAHAHA
Re: How to test characters of a string
On Wed, 8 Jun 2022 01:53:26 + (UTC), Avi Gross declaimed the following:

>
>So is it necessary to insist on an exact pattern of two digits followed by a
>space?
>
>
>That would fail on "44 Minutes", "40 Oz. Dream", "50 Mission Cap", "50 Ways to
>Say Goodbye", "99 Ways to Die"
>
>It looks to me like you need to compare TWICE just in case. If it matches in
>the original (perhaps with some normalization of case and whitespace), fine. If
>not will they match if one or both have something to remove as a prefix such
>as "02 ". And if you are comparing items where the same song is in two
>different numeric sequences on different disks, ...

I suspect the OP really needs to extract the /track number/ from the
ID3 information, and (converting to a 2digit formatted string) see if the
file name begins with that track number... The format of those filenames
appears to be the one generated by some software when ripping CDs to
MP3s -- for example:

-=-=-
c:\Music\Roger Miller\All Time Greatest Hits>dir
 Volume in drive C is OS
 Volume Serial Number is 4ACC-3CB4

 Directory of c:\Music\Roger Miller\All Time Greatest Hits

04/11/2022  05:06 PM    <DIR>          .
04/11/2022  05:06 PM    <DIR>          ..
07/26/2018  11:20 AM         4,493,279 01 Dang Me.mp3
07/26/2018  11:20 AM         5,072,414 02 Chug-A-Lug.mp3
07/26/2018  11:20 AM         4,275,844 03 Do-Wacka-Do.mp3
07/26/2018  11:20 AM         4,284,208 04 In the Summertime.mp3
07/26/2018  11:20 AM         6,028,730 05 King of the Road.mp3
07/26/2018  11:20 AM         4,662,182 06 You Can't Roller Skate in a Buffalo Herd.mp3
07/26/2018  11:20 AM         5,624,704 07 Engine, Engine #9.mp3
07/26/2018  11:20 AM         5,002,492 08 One Dyin' and a Buryin'.mp3
07/26/2018  11:21 AM         6,799,224 09 Last Word in Lonesome Is Me.mp3
07/26/2018  11:21 AM         5,637,230 10 Kansas City Star.mp3
07/26/2018  11:21 AM         4,656,910 11 England Swings.mp3
07/26/2018  11:21 AM         5,836,638 12 Husbands and Wives.mp3
07/26/2018  11:21 AM         5,470,216 13 I've Been a Long Time Leavin'.mp3
07/26/2018  11:21 AM         6,230,236 14 Walkin' in the Sunshine.mp3
07/26/2018  11:21 AM         6,416,060 15 Little Green Apples.mp3
07/26/2018  11:21 AM         9,794,442 16 Me and Bobby McGee.mp3
07/26/2018  11:22 AM         7,330,642 17 Where Have All the Average People Gone.mp3
07/26/2018  11:22 AM         7,334,752 18 South.mp3
07/26/2018  11:22 AM         6,981,924 19 Tomorrow Night in Baltimore.mp3
07/26/2018  11:22 AM         9,353,872 20 River in the Rain.mp3
              20 File(s)    121,285,999 bytes
               2 Dir(s)  295,427,198,976 bytes free

c:\Music\Roger Miller\All Time Greatest Hits>
-=-=-

Untested (especially the ID3 "variable" -- substitute variables as
needed to match the original code):

>>> id3Track = 2
>>> track_number = "%2.2d " % id3Track
>>> track_number
'02 '
>>> filename = "02 This is the life.mp3"
>>> if filename.startswith(track_number):
...     nametitle = filename[3:]
... else:
...     nametitle = filename
...
>>> if nametitle.endswith(".mp3"):
...     nametitle = nametitle[:-4]
...
>>> nametitle
'This is the life'

Handling ASCII ' and " vs Unicode "smart" quotes is a different matter.

One may still run the risk of having a filename without a track number
BUT having a number that just manages to match the track number. To account
for that I'd suggest using the sequence:

* Strip extension (if filename.lower().endswith(".mp3"): ...)
* Handle any Unicode/ASCII quotes in both filename AND ID3 track title
* Compare filename and title.
    * IF MATCHED -- done
    * IF NOT MATCHED
        * Format ID3 track number as shown above
        * Compare filename to (formatted track number + track title)
            * IF MATCHED -- done
            * IF NOT MATCHED
                * Log full filename and ID3 track title/track number to a
                  log for later examination.

-- 
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
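The sequence above could be sketched roughly as follows. Like the session earlier in this message it is illustrative only: the function name `filename_matches_tag`, the logger name and the small quote table are all invented here, and `id3_track` is assumed to arrive as an int (or None when the tag has no track number).

```python
import logging

log = logging.getLogger("trackcheck")   # hypothetical logger name

# Minimal smart-quote table; extend as further characters turn up.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'",
                         "\u201c": '"', "\u201d": '"'})


def filename_matches_tag(filename, id3_title, id3_track):
    """Return True if the file name agrees with the ID3 title.

    The name is compared to the bare title first, and only then to
    "NN title", so a title that itself starts with digits
    ("44 Minutes") is not mistaken for a track prefix.
    """
    # Strip the extension; lower() guards against ".MP3".
    stem = filename[:-4] if filename.lower().endswith(".mp3") else filename
    # Handle Unicode/ASCII quotes on both sides.
    stem = stem.translate(_QUOTES)
    title = id3_title.translate(_QUOTES)

    if stem == title:
        return True
    if id3_track is not None and stem == f"{id3_track:02d} {title}":
        return True

    # Neither form matched: record it for later examination.
    log.warning("no match: file=%r title=%r track=%r",
                filename, id3_title, id3_track)
    return False
```

With the values from the session above, `filename_matches_tag("02 This is the life.mp3", "This is the life", 2)` takes the second branch and returns True.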
Re: How to test characters of a string
On 2022-06-08, 2qdxy4rzwzuui...@potatochowder.com <2qdxy4rzwzuui...@potatochowder.com> wrote:
> On 2022-06-08 at 08:07:40 -,
> De ongekruisigde wrote:
>
>> [...]
>> to the code one would require to process it manually, with all the edge
>> cases. The regexp surely reads much simpler (?).
>
> Uh...
>
>     >>> import pwd # https://docs.python.org/3/library/pwd.html
>     >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
>     [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', pw_shell='/sbin/nologin')]

Yeah... Well, it was just an example and it must be clear by now I'm not
a Python programmer.

-- 
You're rewriting parts of Quake in *Python*?
MUAHAHAHA
Re: How to test characters of a string
On 2022-06-09 at 04:15:46 +1000,
Chris Angelico wrote:

> [...]
> So.. we can go find some other way of calling fpwent, or we can
> just parse the file ourselves. It's a very VERY simple format.

If you insist:

    >>> s = 'nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin'
    >>> print(s.split(':'))
    ['nm-iodine', 'x', '996', '57', '', '/var/empty', '/run/current-system/sw/bin/nologin']

Hesitantly, because this is the Python mailing list, I claim (a) ':' is
simpler than r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$', and
(b) string.split covers pretty much the same edge cases as re.search.
Re: How to test characters of a string
On Thu, 9 Jun 2022 at 04:14, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> [...]
> In addition to pwent to get specific entries from the local password
> database, POSIX has fpwent to get a specific entry from a stream that
> looks like /etc/passwd. So even POSIX agrees that if you think you have
> to process this data manually, you're doing it wrong. Python exposes
> neither function directly (at least not in the pwd module or the os
> module; I didn't dig around or check PyPI).

So.. we can go find some other way of calling fpwent, or we can
just parse the file ourselves. It's a very VERY simple format.

ChrisA
Re: How to test characters of a string
On 2022-06-09 at 03:18:56 +1000,
Chris Angelico wrote:

> [...]
> That's great if the lines are specifically coming from your system's
> own /etc/passwd, but not so much if you're trying to compare passwd
> files from different systems, where you simply have the files
> themselves.

In addition to pwent to get specific entries from the local password
database, POSIX has fpwent to get a specific entry from a stream that
looks like /etc/passwd. So even POSIX agrees that if you think you have
to process this data manually, you're doing it wrong. Python exposes
neither function directly (at least not in the pwd module or the os
module; I didn't dig around or check PyPI).

IMO, higher level functions to process such data are way better than a
[insert your own adjective/expletive here] regular expression that
collects the pieces into numbered groups rather than labeled fields.
Readability counts.

Yes, absolutely, use a regular expression when all else fails. Don't
forget to handle all the edge cases! (I assume that sane OSes preclude
colons in paths that are likely to come up in the local password
database, but I don't know what happens, e.g., when there's a reason for
GECOS to contain a colon.)
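For passwd-style text that does not come from the local system, labeled fields can be had without a regular expression. The following is a sketch under stated assumptions: `PasswdEntry` and `parse_passwd_line` are names invented here (not part of the pwd module), exactly seven colon-separated fields are expected, and int() is used for the uid/gid check, which is slightly more lenient than \d+ (it accepts a leading sign or surrounding whitespace, for instance).

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One passwd-style record with labeled rather than numbered fields."""
    name: str
    passwd: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line):
    """Split one line on ':' and convert uid/gid to int.

    Raises ValueError for a wrong field count or a non-numeric uid/gid.
    """
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, passwd, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)
```

A GECOS field containing a colon would produce an eighth field and be rejected here rather than silently mis-parsed, which seems the safer failure for the edge case mentioned above.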
Re: How to replace characters in a string?
> On 8 Jun 2022, at 18:01, Dave wrote:
>
> Hi,
>
> This is a tool I’m using on my own files to save me time. Basically or most
> of the tracks were imported with different version iTunes over the years.
> There are two problems:
>
> 1. File System characters are replaced (you can’t have ‘/‘ or ‘:’ in a file
> name).

ok

> 2. Smart Quotes were added at some point, these need to replaced.

ok

> 3. Other character based of name being of a non-english origin.

Why is this a problem? It's only if the chars are confusing or will not
compare that there is something to fix. All modern OSes allow Unicode
filenames.

Barry

>
> If find others I’ll add them.
>
> I’m using MusicBrainz to do a fuzzy match and get the correct name.
>
> it’s not perfect, but works for 99% of files which is good enough for me!
>
> Cheers
> Dave
>
>
>> On 8 Jun 2022, at 18:23, Avi Gross via Python-list wrote:
>>
>> Dave,
>>
>> Your goal is to compare titles and there can be endless replacements needed
>> if you allow the text to contain anything but ASCII.
>>
>> Have you considered stripping out things instead? I mean remove lots of
>> stuff that is not ASCII in the first place and perhaps also remove lots of
>> extra punctuation like single quotes or question marks or redundant white
>> space and compare the sort of skeletons of the two?
>>
>> And even if that fails, could you have a measure of how different they are
>> and tolerate if they were say off by one letter albeit "My desert" matching
>> "My Dessert" might not be a valid match with one being a song about an arid
>> environment and the other about food you don't need!
>>
>> Your seemingly simple need can expand into a fairly complex project. There
>> may be many ideas on how to deal with it but not anything perfect enough to
>> catch all cases as even a trained human may have to make decisions at times
>> and not match what other humans do. We have examples like the TV show
>> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is
>> often written when I look it up as NUMBERS. You have obvious cases where
>> titles of songs may contain composite symbols like "œ" which will not
>> compare to one where it is written out as "oe" so the idea of comparing is
>> quite complex and the best you might do is heuristic.
>>
>> UNICODE has many symbols that are almost the same or even look the same or
>> maybe in one font versus another. There are libraries of functions that
>> allow some kinds of comparisons or conversions that you could look into but
>> the gain for you may not be worth it. Nothing stops a person from naming a
>> song any way they want and I speak many languages and often see a song
>> re-titled in the local language and using the local alphabet mixed often
>> with another.
>>
>> Your original question is perhaps now many questions, depending on what you
>> choose. You started by wanting to know how to compare and it is moving on to
>> how to delete parts or make substitutions or use regular expressions and it
>> can get worse. You can, for example, take a string and identify the words
>> within it and create a regular expression that inserts sequences between the
>> words that match any zero or one or more non-word characters such as spaces,
>> tabs, punctuation or non-ASCII, so that song titles with the same words in a
>> sequence match no matter what is between them. The possibilities are endless
>> but consider some of the techniques that are used by some programs that
>> parse text and suggest alternate spellings or even programs like Google
>> Translate that can take a sentence and then suggest you may mean a slightly
>> altered sentence with one word changed to fit better.
>>
>> You need to decide what you want to deal with and what will be
>> mis-classified by your program. Some of us have suggested folding the case
>> of the words but that means a song about a dark skinned person in Poland
>> called "Black Polish" would match a song about keeping your shoes dark with
>> "black polish" so I keep repeating it is very hard or frankly impossible, to
>> catch every case I can imagine and the many I can't!
>>
>> But the emphasis here is not your overall problem. It is about whether and
>> how the computer language called python, and perhaps some add-on modules,
>> can be used to solve each smaller need such as recognizing a pattern or
>> replacing text. It can do quite a bit but only when the specification of the
>> problem is exact.
>>
>>
>>
>>
>> -Original Message-
>> From: Dave
>> To: python-list@python.org
>> Sent: Wed, Jun 8, 2022 5:09 am
>> Subject: Re: How to replace characters in a string?
>>
>> Hi,
>>
>> Thanks for this!
>>
>> So, is there a copy function/method that returns a MutableString like in
>> objective-C? I’ve solved this problems before in a number of languages like
>> Objective-C and AppleScript.
>>
>> Basically there is a set of common cha
Re: How to test characters of a string
On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-06-08 at 08:07:40 -,
> De ongekruisigde wrote:
>
> > [...]
> > to the code one would require to process it manually, with all the edge
> > cases. The regexp surely reads much simpler (?).
>
> Uh...
>
>     >>> import pwd # https://docs.python.org/3/library/pwd.html
>     >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
>     [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', pw_shell='/sbin/nologin')]

That's great if the lines are specifically coming from your system's
own /etc/passwd, but not so much if you're trying to compare passwd
files from different systems, where you simply have the files
themselves.

ChrisA
Re: How to test characters of a string
On 2022-06-08 at 08:07:40 -, De ongekruisigde wrote: > Depending on the problem a regular expression may be the much simpler > solution. I love them for e.g. text parsing and use them all the time. > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines > like these: > > root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash > dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin > nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin > avahi:x:997:996:avahi-daemon privilege separation > user:/var/empty:/run/current-system/sw/bin/nologin > sshd:x:998:993:SSH privilege separation > user:/var/empty:/run/current-system/sw/bin/nologin > geoclue:x:999:998:Geoinformation > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin > > Compare a regexp solution like this: > > >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s) > >>> print(g.groups()) > ('geoclue', 'x', '999', '998', 'Geoinformation service', > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin') > > to the code one would require to process it manually, with all the edge > cases. The regexp surely reads much simpler (?). Uh... >>> import pwd # https://docs.python.org/3/library/pwd.html >>> [x for x in pwd.getpwall() if x[0] == 'geoclue'] [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', pw_shell='/sbin/nologin')] -- https://mail.python.org/mailman/listinfo/python-list
Re: How to test characters of a string
> On 7 Jun 2022, at 23:24, Dave wrote: > > Yes, it was probably just a typeo on my part. > > I’ve now fixed the majority of cases but still got two strings that look > identical but fail to match, this time (again by 10cc), “I’m Mandy Fly Me”. > > I’m putting money on it being a utf8 problem but I’m stuck on how to handle > it. It’s probably the single quote in I’m, although it has worked with other > songs. > > Any ideas? You can use difflib to give you a diff of the two strings: :>>> print('\n'.join(difflib.unified_diff('abc', 'adc'))) --- +++ @@ -1,3 +1,3 @@ a -b +d c :>>> The docs talk about lines, but difflib works on sequence. I use it a lot to find differences within lines. Barry > > All the Best > Cheers > Dave > > Here is the whole function/method or whatever it’s called in Python: > > > # > # checkMusicFiles > # > > def checkMusicFiles(theBaseMusicLibraryFolder): >myArtistDict = [] > > # > # Loop thru Artists Folder > # >myArtistsFoldlerList = getFolderList(theBaseMusicLibraryFolder) >myArtistCount = 0 >for myArtistFolder in myArtistsFoldlerList: >print('Artist: ' + myArtistFolder) > # > # Loop thru Albums Folder > # >myAlbumList = getFolderList(theBaseMusicLibraryFolder + myArtistFolder) >for myAlbum in myAlbumList: >print('Album: ' + myAlbum) > > # > # Loop thru Tracks (Files) Folder > # >myAlbumPath = theBaseMusicLibraryFolder + myArtistFolder + '/' + > myAlbum + '/' >myFilesList = getFileList(myAlbumPath) >for myFile in myFilesList: >myFilePath = myAlbumPath + myFile >myID3 = eyed3.load(myFilePath) >if myID3 is None: >continue > >myArtistName = myID3.tag.artist >if myArtistName is None: >continue > >myAlbumName = myID3.tag.album >if myAlbumName is None: >continue > >myTitleName = myID3.tag.title >if myTitleName is None: >continue > >myCompareFileName = myFile[0:-4] >if myCompareFileName[0].isdigit() and > myCompareFileName[1].isdigit(): >myCompareFileName = myFile[3:-4] > >if myCompareFileName != myTitleName: >myLength1 = len(myCompareFileName) 
>myLength2 = len(myTitleName) >print('File Name Mismatch - Artist: [' + myArtistName + '] > Album: ['+ myAlbumName + '] Track: [' + myTitleName + '] File: [' + > myCompareFileName + ']') >if (myLength1 == myLength2): >print('lengths match: ',myLength1) >else: >print('lengths mismatch: ',myLength1,' ',myLength2) > >print(' ') > > > > >return myArtistsFoldlerList > > > > > > >> On 8 Jun 2022, at 00:07, MRAB wrote: >> >> On 2022-06-07 21:23, Dave wrote: >>> Thanks a lot for this! isDigit was the method I was looking for and >>> couldn’t find. >>> I have another problem related to this, the following code uses the code >>> you just sent. I am getting a files ID3 tags using eyed3, this part seems >>> to work and I get expected values in this case myTitleName (Track name) is >>> set to “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock >>> Holiday” (File Name with the Track number prepended). The is digit test >>> works and myCompareFileName is set to “Deadlock Holiday”, so they should >>> match, right? >> OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by 10cc)? >> >> [snip] >> -- >> https://mail.python.org/mailman/listinfo/python-list > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
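Barry's difflib idea can be pushed one step further for the "look identical but don't match" case by naming each differing character. A minimal sketch, assuming the two strings are the tag title and the file name from the thread; describe_differences is a made-up helper name.

```python
import difflib
import unicodedata


def describe_differences(a, b):
    """Return one entry per character found in only one of the two strings."""
    report = []
    # ndiff treats each string as a sequence of one-character items and
    # prefixes every item with '- ' (only in a), '+ ' (only in b) or '  ' (both).
    for line in difflib.ndiff(a, b):
        tag, ch = line[0], line[2:]
        if tag in "+-":
            name = unicodedata.name(ch, "<unnamed>")
            report.append(f"{tag} U+{ord(ch):04X} {name}")
    return report


for entry in describe_differences("I\u2019m Mandy Fly Me", "I'm Mandy Fly Me"):
    print(entry)
```

For these two titles the report names RIGHT SINGLE QUOTATION MARK on one side and APOSTROPHE on the other, which is the kind of difference a plain print of both strings tends to hide.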
Re: How to replace characters in a string?
Hi, This is a tool I’m using on my own files to save me time. Basically or most of the tracks were imported with different version iTunes over the years. There are two problems: 1. File System characters are replaced (you can’t have ‘/‘ or ‘:’ in a file name). 2. Smart Quotes were added at some point, these need to replaced. 3. Other character based of name being of a non-english origin. If find others I’ll add them. I’m using MusicBrainz to do a fuzzy match and get the correct name. it’s not perfect, but works for 99% of files which is good enough for me! Cheers Dave > On 8 Jun 2022, at 18:23, Avi Gross via Python-list > wrote: > > Dave, > > Your goal is to compare titles and there can be endless replacements needed > if you allow the text to contain anything but ASCII. > > Have you considered stripping out things instead? I mean remove lots of stuff > that is not ASCII in the first place and perhaps also remove lots of extra > punctuation likesingle quotes or question marks or redundant white space and > compare the sort of skeletons of the two? > > And even if that fails, could you have a measure of how different they are > and tolerate if they were say off by one letter albeit "My desert" matching > "My Dessert" might not be a valid match with one being a song about an arid > environment and the other about food you don't need! > > Your seemingly simple need can expand into a fairly complex project. There > may be many ideas on how to deal with it but not anything perfect enough to > catch all cases as even a trained human may have to make decisions at times > and not match what other humans do. We have examples like the TV show > "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is > often written when I look it up as NUMBERS. 
You have obvious cases where > titles of songs may contain composite symbols like "œ" which will not compare > to one where it is written out as "oe" so the idea of comparing is quite > complex and the best you might do is heuristic. > > UNICODE has many symbols that are almost the same or even look the same or > maybe in one font versus another. There are libraries of functions that allow > some kinds of comparisons or conversions that you could look into but the > gain for you may not be worth it. Nothing stops a person from naming a song > any way they want and I speak many languages and often see a song re-titled > in the local language and using the local alphabet mixed often with another. > > Your original question is perhaps now many questions, depending on what you > choose. You started by wanting to know how to compare and it is moving on to > how to delete parts or make substitutions or use regular expressions and it > can get worse. You can, for example, take a string and identify the words > within it and create a regular expression that inserts sequences between the > words that match any zero or one or more non-word characters such as spaces, > tabs, punctuation or non-ASCII, so that song titles with the same words in a > sequence match no matter what is between them. The possibilities are endless > but consider some of the techniques that are used by some programs that parse > text and suggest alternate spellings or even programs like Google Translate > that can take a sentence and then suggest you may mean a slightly altered > sentence with one word changed to fit better. > > You need to decide what you want to deal with and what will be mis-classified > by your program. 
Some of us have suggested folding the case of the words but > that means asong about a dark skinned person in Poland called "Black Polish" > would match a song about keeping your shoes dark with "black polish" so I > keep repeating it is very hard or frankly impossible, to catch every case I > can imagine and the many I can't! > > But the emphasis here is not your overall problem. It is about whether and > how the computer language called python, and perhaps some add-on modules, can > be used to solve each smaller need such as recognizing a pattern or replacing > text. It can do quite a bit but only when the specification of the problem is > exact. > > > > > -Original Message- > From: Dave > To: python-list@python.org > Sent: Wed, Jun 8, 2022 5:09 am > Subject: Re: How to replace characters in a string? > > Hi, > > Thanks for this! > > So, is there a copy function/method that returns a MutableString like in > objective-C? I’ve solved this problems before in a number of languages like > Objective-C and AppleScript. > > Basically there is a set of common characters that need “normalizing” and I > have a method that replaces them in a string, so: > > myString = [myString normalizeCharacters]; > > Would return a new string with all the “common” replacements applied. > > Since the following gives an error : > > myString = 'Hello' > myNewstring = myString.replace(myString,'e','a’) >
Re: How to replace characters in a string?
Dave, Your goal is to compare titles and there can be endless replacements needed if you allow the text to contain anything but ASCII. Have you considered stripping out things instead? I mean remove lots of stuff that is not ASCII in the first place and perhaps also remove lots of extra punctuation likesingle quotes or question marks or redundant white space and compare the sort of skeletons of the two? And even if that fails, could you have a measure of how different they are and tolerate if they were say off by one letter albeit "My desert" matching "My Dessert" might not be a valid match with one being a song about an arid environment and the other about food you don't need! Your seemingly simple need can expand into a fairly complex project. There may be many ideas on how to deal with it but not anything perfect enough to catch all cases as even a trained human may have to make decisions at times and not match what other humans do. We have examples like the TV show "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is often written when I look it up as NUMBERS. You have obvious cases where titles of songs may contain composite symbols like "œ" which will not compare to one where it is written out as "oe" so the idea of comparing is quite complex and the best you might do is heuristic. UNICODE has many symbols that are almost the same or even look the same or maybe in one font versus another. There are libraries of functions that allow some kinds of comparisons or conversions that you could look into but the gain for you may not be worth it. Nothing stops a person from naming a song any way they want and I speak many languages and often see a song re-titled in the local language and using the local alphabet mixed often with another. Your original question is perhaps now many questions, depending on what you choose. 
You started by wanting to know how to compare and it is moving on to how to delete parts or make substitutions or use regular expressions and it can get worse. You can, for example, take a string and identify the words within it and create a regular expression that inserts sequences between the words that match any zero or one or more non-word characters such as spaces, tabs, punctuation or non-ASCII, so that song titles with the same words in a sequence match no matter what is between them. The possibilities are endless but consider some of the techniques that are used by some programs that parse text and suggest alternate spellings or even programs like Google Translate that can take a sentence and then suggest you may mean a slightly altered sentence with one word changed to fit better. You need to decide what you want to deal with and what will be mis-classified by your program. Some of us have suggested folding the case of the words but that means asong about a dark skinned person in Poland called "Black Polish" would match a song about keeping your shoes dark with "black polish" so I keep repeating it is very hard or frankly impossible, to catch every case I can imagine and the many I can't! But the emphasis here is not your overall problem. It is about whether and how the computer language called python, and perhaps some add-on modules, can be used to solve each smaller need such as recognizing a pattern or replacing text. It can do quite a bit but only when the specification of the problem is exact. -Original Message- From: Dave To: python-list@python.org Sent: Wed, Jun 8, 2022 5:09 am Subject: Re: How to replace characters in a string? Hi, Thanks for this! So, is there a copy function/method that returns a MutableString like in objective-C? I’ve solved this problems before in a number of languages like Objective-C and AppleScript. 
Basically there is a set of common characters that need “normalizing” and I have a method that replaces them in a string, so: myString = [myString normalizeCharacters]; Would return a new string with all the “common” replacements applied. Since the following gives an error : myString = 'Hello' myNewstring = myString.replace(myString,'e','a’) TypeError: 'str' object cannot be interpreted as an integer I can’t see of a way to do this in Python? All the Best Dave > On 8 Jun 2022, at 10:14, Chris Angelico wrote: > > On Wed, 8 Jun 2022 at 18:12, Dave wrote: > >> I tried the but it doesn’t seem to work? >> myCompareFile1 = ascii(myTitleName) >> myCompareFile1.replace("\u2019", "'") > > Strings in Python are immutable. When you call ascii(), you get back a > new string, but it's one that has actual backslashes and such in it. > (You probably don't need this step, other than for debugging; check > the string by printing out the ASCII version of it, but stick to the > original for actual processing.) The same is true of the replace() > method; it doesn't change the string, it returns a new string. > word = "spam" print(word.replace("sp", "h")) >
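Avi's two suggestions (compare stripped-down "skeletons" of the titles, and measure how different two titles are) might be sketched as below. The function names are made up for illustration and the rules are deliberately crude: anything that is not an ASCII letter, digit or space is thrown away, so a letter with no decomposition such as "œ" simply disappears.

```python
import difflib
import re
import unicodedata


def skeleton(title):
    """Lowercase ASCII letters and digits only, words separated by single spaces."""
    # NFKD splits accented letters into base letter + combining mark, so the
    # ASCII step below keeps the base letter and drops the mark.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation outright (so "I'm" becomes "im"), then collapse whitespace.
    letters = re.sub(r"[^a-z0-9\s]", "", ascii_only.lower())
    return " ".join(letters.split())


def similarity(a, b):
    """Ratio between 0.0 and 1.0; 1.0 means the skeletons are identical."""
    return difflib.SequenceMatcher(None, skeleton(a), skeleton(b)).ratio()


print(skeleton("I\u2019m Mandy Fly Me") == skeleton("I'm Mandy Fly Me"))
print(similarity("My desert", "My Dessert"))
print(similarity("Black Polish", "black polish"))
```

The last two lines show the limits Avi points out: "My desert" and "My Dessert" score very close to a match, and case folding makes "Black Polish" and "black polish" indistinguishable. Any threshold you pick is a heuristic.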
Re: How to replace characters in a string?
On 2022-06-08, Dave wrote:
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString
>
> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when
> they come up by adding a line as so:
>
>     myNewString = theString.replace("\u2014", “]” #just an example
>
> Which is what I was trying to achieve.

Here's a head-start on some characters you might want to translate, mostly spaces, hyphens, quotation marks, and ligatures:

def unicode_translate(s):
    return s.translate({
        8192: ' ', 8193: ' ', 8194: ' ', 8195: ' ', 8196: ' ',
        8197: ' ', 198: 'AE', 8199: ' ', 8200: ' ', 8201: ' ',
        8202: ' ', 8203: '', 64258: 'fl', 8208: '-', 8209: '-',
        8210: '-', 8211: '-', 8212: '-', 8722: '-', 8216: "'",
        8217: "'", 8220: '"', 8221: '"', 64256: 'ff', 160: ' ',
        64260: 'ffl', 8198: ' ', 230: 'ae', 12288: ' ', 173: '',
        497: 'DZ', 498: 'Dz', 499: 'dz', 64259: 'ffi', 8230: '...',
        64257: 'fi', 64262: 'st'})

If you want to go further then the Unidecode package might be helpful:

https://pypi.org/project/Unidecode/
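Part of that table (the ligatures, the ellipsis, several of the odd spaces) overlaps with what the standard library's Unicode compatibility normalization already does. A minimal sketch, assuming NFKC is acceptable for your titles; the PUNCTUATION table and the fold_title name are illustrative and far from complete.

```python
import unicodedata

# Characters that NFKC leaves untouched but that we want folded to ASCII.
PUNCTUATION = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
    0x2013: "-",   # en dash
    0x2014: "-",   # em dash
}


def fold_title(s):
    # NFKC replaces compatibility characters (ligatures such as U+FB01,
    # U+2026 HORIZONTAL ELLIPSIS, U+00A0 NO-BREAK SPACE) with plain equivalents.
    return unicodedata.normalize("NFKC", s).translate(PUNCTUATION)


print(fold_title("I\u2019m \ufb01ne\u2026"))
```

Be aware that NFKC changes more than punctuation (full-width forms and superscript digits, for example), so it is worth checking the result against a sample of your own library before relying on it.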
Re: How to test characters of a string
On 2022-06-08, Christian Gollwitzer wrote:
> On 07.06.22 at 23:01, Christian Gollwitzer wrote:
>
>>> In [3]: re.sub(r'^\d+\s*', '', s)
>>> Out[3]: 'Trinket'
>>>
>
> that RE does match what you intended to do, but not exactly what you
> wrote in the OP. that would be '^\d\d.' start with exactly two digits
> followed by any character.

Indeed but then I'd like '\d{2}' even better.

> Christian

--
You're rewriting parts of Quake in *Python*? MUAHAHAHA
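To make the difference between the patterns concrete, here is a small sketch that runs a "loose" and a "strict" version over a few names. The variable names are made up; "01 Trinket" and "44 Minutes" come from the thread, and "1979" is a hypothetical all-digit title added for illustration.

```python
import re

loose = re.compile(r"^\d+\s*")    # any number of leading digits, optional whitespace
strict = re.compile(r"^\d{2} ")   # exactly two digits followed by one space

for name in ["01 Trinket", "44 Minutes", "1979"]:
    print(repr(name), "->", repr(loose.sub("", name)), repr(strict.sub("", name)))
```

The strict pattern leaves "1979" alone while the loose one empties it, but neither can tell a track number from a title that really starts with a number: both turn "44 Minutes" into "Minutes". That is why comparing against the track number from the ID3 tag, as suggested elsewhere in the thread, is the safer test.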
Re: How to replace characters in a string?
> On 8 Jun 2022, at 11:25, Dave wrote:
>
>    myNewString = theString.replace("\u2014", “]” #just an example

Oops! Make that:

    myNewString = myNewString.replace("\u2014", "]")  # just an example
Re: How to test characters of a string
On 2022-06-08, Dave wrote: > I hate regEx and avoid it whenever possible, I’ve never found something that > was impossible to do without it. I love regular expressions and use them where appropriate. Saves tons of code and is often much more readable than the pages of code required to do the same. -- You're rewriting parts of Quake in *Python*? MUAHAHAHA -- https://mail.python.org/mailman/listinfo/python-list
Re: How to test characters of a string
On 2022-06-08, dn wrote: > On 08/06/2022 10.18, De ongekruisigde wrote: >> On 2022-06-08, Christian Gollwitzer wrote: >>> Am 07.06.22 um 21:56 schrieb Dave: It depends on the language I’m using, in Objective C, I’d use isNumeric, just wanted to know what the equivalent is in Python. >>> >>> Your problem is also a typical case for regular expressions. You can >>> create an expression for "starts with any number of digits plus optional >>> whitespace" and then replace this with nothing: >> >> Regular expressions are overkill for this and much slower than the >> simple isdigit based solution. > > ... > >> Regular expressions are indeeed extremely powerful and useful but I tend >> to avoid them when there's a (faster) normal solution. > > Yes, simple solutions are (likely) easier to read. Depending on the problem a regular expression may be the much simpler solution. I love them for e.g. text parsing and use them all the time. Unrivaled when e.g. parts of text have to be extracted, e.g. from lines like these: root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin avahi:x:997:996:avahi-daemon privilege separation user:/var/empty:/run/current-system/sw/bin/nologin sshd:x:998:993:SSH privilege separation user:/var/empty:/run/current-system/sw/bin/nologin geoclue:x:999:998:Geoinformation service:/var/lib/geoclue:/run/current-system/sw/bin/nologin Compare a regexp solution like this: >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s) >>> print(g.groups()) ('geoclue', 'x', '999', '998', 'Geoinformation service', '/var/lib/geoclue', '/run/current-system/sw/bin/nologin') to the code one would require to process it manually, with all the edge cases. The regexp surely reads much simpler (?). 
> RegEx-s are more powerful (and well worth learning for this reason), but > are only 'readable' to those who use them frequently. > > Has either of you performed a timeit comparison? No need: the isdigit solution doesn't require the overhead of a regex processor. -- You're rewriting parts of Quake in *Python*? MUAHAHAHA -- https://mail.python.org/mailman/listinfo/python-list
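Since dn asked about timeit, here is one way the comparison could be run. It is a sketch, not a benchmark result: the function names are made up, the regex is the '^\d\d.' form Christian identified as the exact equivalent of the isdigit test, and the numbers will vary by machine and Python version.

```python
import re
import timeit

LEADING_TRACK = re.compile(r"^\d\d.")   # two digits, then any one character


def strip_isdigit(name):
    # Mirrors the check in the thread: two leading digits, then drop three characters.
    if len(name) > 2 and name[0].isdigit() and name[1].isdigit():
        return name[3:]
    return name


def strip_regex(name):
    return LEADING_TRACK.sub("", name, count=1)


if __name__ == "__main__":
    sample = "01 Dreadlock Holiday"
    for fn in (strip_isdigit, strip_regex):
        seconds = timeit.timeit(lambda: fn(sample), number=200_000)
        print(f"{fn.__name__}: {seconds:.3f} s for 200,000 calls")
```

Whichever one comes out ahead, both are likely to be negligible next to reading the ID3 tags from disk, so readability is probably the better deciding factor here.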
Re: How to replace characters in a string?
On 2022-06-08, Dave wrote: > Hi All, > > I decided to start a new thread as this really is a new subject. > > I've got two that appear to be identical, but fail to compare. After getting > the ascii encoding I see that they are indeed different, my question is how > can I replace the \u2019m with a regular single quote mark (or apostrophe)? You're not facing this alone: https://changelog.complete.org/archives/9938-the-python-unicode-mess Perhaps useful insights can be found at: https://realpython.com/python-encodings-guide/ > +++ -- You're rewriting parts of Quake in *Python*? MUAHAHAHA -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
On Wed, Jun 8, 2022 at 1:11 AM Dave wrote: > I've got two that appear to be identical, but fail to compare. After > getting the ascii encoding I see that they are indeed different, my > question is how can I replace the \u2019m with a regular single quote mark > (or apostrophe)? > Perhaps try https://pypi.org/project/Unidecode/ ? -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
On 8/06/2022 at 11:25, Dave wrote:
> Hi,
>
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString
>
> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when they
> come up by adding a line as so:
>
>     myNewString = theString.replace("\u2014", “]” #just an example
>
> Which is what I was trying to achieve.

When you have multiple replacements to do, there's an alternative to multiple replace calls: you can use theString.translate() with a translation map (which you can build yourself or with str.maketrans()) to do all the replacements at once.

Example:

    # Make a map that translates every character from the first string to the
    # corresponding character in the second string
    translation_map = str.maketrans("\u2019\u2014", "']")

    # All the replacements in one go
    myNewString = theString.translate(translation_map)

See:
- https://docs.python.org/3.10/library/stdtypes.html#str.maketrans
- https://docs.python.org/3.10/library/stdtypes.html#str.translate

--
"There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened."
        -- Douglas Adams, The Restaurant at the End of the Universe
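The two-string form of str.maketrans() can only map one character to one character. If some replacements need to be longer, or a character should simply be deleted, the dictionary form covers that. A short sketch; the particular replacements are examples only.

```python
# Keys are single characters; values are a replacement string of any length,
# or None to delete the character.
table = str.maketrans({
    "\u2019": "'",     # right single quotation mark
    "\u2014": "-",     # em dash
    "\u2026": "...",   # one character replaced by three
    "\u200b": None,    # zero-width space, deleted
})

print("Wait\u2026 I\u2019m here\u200b".translate(table))
```

Building the table once at module level and reusing it keeps the per-title cost to a single translate() call.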
Re: How to replace characters in a string?
Hi, I misunderstood how it worked, basically I’ve added this function: def filterCommonCharacters(theString): myNewString = theString.replace("\u2019", "'") return myNewString Which returns a new string replacing the common characters. This can easily be extended to include other characters as and when they come up by adding a line as so: myNewString = theString.replace("\u2014", “]” #just an example Which is what I was trying to achieve. All the Best Dave > On 8 Jun 2022, at 11:17, Chris Angelico wrote: > > On Wed, 8 Jun 2022 at 19:13, Dave wrote: >> >> Hi, >> >> Thanks for this! >> >> So, is there a copy function/method that returns a MutableString like in >> objective-C? I’ve solved this problems before in a number of languages like >> Objective-C and AppleScript. >> >> Basically there is a set of common characters that need “normalizing” and I >> have a method that replaces them in a string, so: >> >> myString = [myString normalizeCharacters]; >> >> Would return a new string with all the “common” replacements applied. >> >> Since the following gives an error : >> >> myString = 'Hello' >> myNewstring = myString.replace(myString,'e','a’) >> >> TypeError: 'str' object cannot be interpreted as an integer >> >> I can’t see of a way to do this in Python? >> > > Not sure why you're passing the string as an argument as well as using > it as the object you're calling a method on. All you should need to do > is: > > myString.replace('e', 'a') > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
On Wed, Jun 8, 2022 at 5:22 AM Karsten Hilbert wrote:
>
> On Wed, Jun 08, 2022 at 11:09:05AM +0200, Dave wrote:
>
> > myString = 'Hello'
> > myNewstring = myString.replace(myString,'e','a’)
>
> That won't work (last quote) but apart from that:
>
> myNewstring = myString.replace('e', 'a')
>
> Karsten
> --
> GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

Sorry if I'm not reading the nuances correctly, but it looks to me that you failed to realize that string methods return results. They don't change the string in place:

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> str1 = "\u2019string with starting smart quote"
>>> str1
'’string with starting smart quote'
>>> new_str = str1.replace("\u2019","'")
>>> str1
'’string with starting smart quote'
>>> new_str
"'string with starting smart quote"
>>> repr(str1)
"'’string with starting smart quote'"
>>> repr(new_str)
'"\'string with starting smart quote"'
>>>

As you can see, str1 doesn't change, but when you call replace on it, the result you want is returned and bound to new_str.

--
Joel Goldstick
Re: How to replace characters in a string?
Am Wed, Jun 08, 2022 at 11:09:05AM +0200 schrieb Dave: > myString = 'Hello' > myNewstring = myString.replace(myString,'e','a’) That won't work (last quote) but apart from that: myNewstring = myString.replace('e', 'a') Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
On Wed, 8 Jun 2022 at 19:13, Dave wrote: > > Hi, > > Thanks for this! > > So, is there a copy function/method that returns a MutableString like in > objective-C? I’ve solved this problems before in a number of languages like > Objective-C and AppleScript. > > Basically there is a set of common characters that need “normalizing” and I > have a method that replaces them in a string, so: > > myString = [myString normalizeCharacters]; > > Would return a new string with all the “common” replacements applied. > > Since the following gives an error : > > myString = 'Hello' > myNewstring = myString.replace(myString,'e','a’) > > TypeError: 'str' object cannot be interpreted as an integer > > I can’t see of a way to do this in Python? > Not sure why you're passing the string as an argument as well as using it as the object you're calling a method on. All you should need to do is: myString.replace('e', 'a') ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
Hi, Thanks for this! So, is there a copy function/method that returns a MutableString like in objective-C? I’ve solved this problems before in a number of languages like Objective-C and AppleScript. Basically there is a set of common characters that need “normalizing” and I have a method that replaces them in a string, so: myString = [myString normalizeCharacters]; Would return a new string with all the “common” replacements applied. Since the following gives an error : myString = 'Hello' myNewstring = myString.replace(myString,'e','a’) TypeError: 'str' object cannot be interpreted as an integer I can’t see of a way to do this in Python? All the Best Dave > On 8 Jun 2022, at 10:14, Chris Angelico wrote: > > On Wed, 8 Jun 2022 at 18:12, Dave wrote: > >> I tried the but it doesn’t seem to work? >> myCompareFile1 = ascii(myTitleName) >> myCompareFile1.replace("\u2019", "'") > > Strings in Python are immutable. When you call ascii(), you get back a > new string, but it's one that has actual backslashes and such in it. > (You probably don't need this step, other than for debugging; check > the string by printing out the ASCII version of it, but stick to the > original for actual processing.) The same is true of the replace() > method; it doesn't change the string, it returns a new string. > word = "spam" print(word.replace("sp", "h")) > ham print(word) > spam > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
On Wed, 8 Jun 2022 at 18:20, Dave wrote:
>
> PS
>
> I’ve also tried:
> myCompareFile1 = myTitleName
> myCompareFile1.replace("\u2019", "'")
> myCompareFile2 = myCompareFileName
> myCompareFile2.replace("\u2019", "'")
> Which also doesn’t work, the replace itself work but it still fails the
> compare?
>

This is a great time to start exploring what actually happens when you do "myCompareFile2 = myCompareFileName". I recommend doing some poking around with strings (which are immutable), lists (which are mutable), and tuples (which are immutable, but can contain mutable children).

ChrisA
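The poking around ChrisA recommends might look something like this. The variable names and sample values are invented for the demonstration (the track names are borrowed from the directory listing earlier in the thread).

```python
# Assignment binds a second name to the same object; it never copies.
title = "I\u2019m Mandy Fly Me"
alias = title
alias = alias.replace("\u2019", "'")   # builds a new str and rebinds alias only
print(title == alias)                  # False: title still has the curly quote

tracks = ["Dang Me"]
same_list = tracks
same_list.append("Chug-A-Lug")         # lists mutate in place
print(tracks)                          # both names see the change

album = ("Roger Miller", tracks)       # a tuple's slots cannot be reassigned...
album[1].append("Do-Wacka-Do")         # ...but a list stored in one can still change
print(len(tracks))                     # 3
```

The practical consequence for the code above: calling replace() without assigning the result throws the new string away, so the comparison still sees the original.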
Re: How to replace characters in a string?
PS I’ve also tried: myCompareFile1 = myTitleName myCompareFile1.replace("\u2019", "'") myCompareFile2 = myCompareFileName myCompareFile2.replace("\u2019", "'") Which also doesn’t work, the replace itself work but it still fails the compare? > On 8 Jun 2022, at 10:08, Dave wrote: > > Hi All, > > I decided to start a new thread as this really is a new subject. > > I've got two that appear to be identical, but fail to compare. After getting > the ascii encoding I see that they are indeed different, my question is how > can I replace the \u2019m with a regular single quote mark (or apostrophe)? > > myCompareFile1 = ascii(myTitleName) > myCompareFile2 = ascii(myCompareFileName) > myCompareFile1: 'I\u2019m Mandy Fly Me' > myCompareFile2: "I'm Mandy Fly Me" > > I tried the but it doesn’t seem to work? > myCompareFile1 = ascii(myTitleName) > myCompareFile1.replace("\u2019", "'") > myCompareFile2 = ascii(myCompareFileName) > myCompareFile2.replace("\u2019", "'") > if myCompareFile1 != myCompareFile2: >print('myCompareFile1:',myCompareFile1) >print('myCompareFile2:',myCompareFile2) >myLength1 = len(myCompareFileName) >myLength2 = len(myTitleName) >print('File Name Mismatch - Artist: [' + myArtistName + '] Album: ['+ > myAlbumName + '] Track: [' + myTitleName + '] File: [' + myCompareFileName > + ']') >if (myLength1 == myLength2): >print('lengths match: ',myLength1) >else: >print('lengths mismatch: ',myLength1,' ',myLength2) >print(' ') > Console: > > myCompareFile1: 'I\u2019m Mandy Fly Me' > myCompareFile2: "I'm Mandy Fly Me" > > So it looks like the replace isn’t doing anything? > > I’m an experienced developer but learning Python. > > All the Best > Dave > > > > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: How to replace characters in a string?
On Wed, 8 Jun 2022 at 18:12, Dave wrote:
> I tried the but it doesn’t seem to work?
> myCompareFile1 = ascii(myTitleName)
> myCompareFile1.replace("\u2019", "'")

Strings in Python are immutable. When you call ascii(), you get back a new string, but it's one that has actual backslashes and such in it. (You probably don't need this step, other than for debugging; check the string by printing out the ASCII version of it, but stick to the original for actual processing.) The same is true of the replace() method; it doesn't change the string, it returns a new string.

>>> word = "spam"
>>> print(word.replace("sp", "h"))
ham
>>> print(word)
spam

ChrisA
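The remark about "actual backslashes" can be checked directly. In this sketch the title is the one from the thread and the other names are made up: the string returned by ascii() no longer contains the curly quote at all, so a replace() looking for "\u2019" in it has nothing to find, even when the result is kept.

```python
title = "I\u2019m Mandy Fly Me"

escaped = ascii(title)
print(escaped)                      # repr-style text, quotes and backslash included
print("\u2019" in escaped)          # False: it holds the six characters \ u 2 0 1 9
print(len(title), len(escaped))     # the escaped form is longer

fixed = title.replace("\u2019", "'")    # work on the original and keep the result
print(fixed == "I'm Mandy Fly Me")      # True
```

So ascii() is useful for seeing what is in a string, but the replacement and the comparison should both be done on the original strings.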
How to replace characters in a string?
Hi All,

I decided to start a new thread as this really is a new subject.

I've got two strings that appear to be identical, but fail to compare. After getting the ascii encoding I see that they are indeed different, my question is how can I replace the \u2019m with a regular single quote mark (or apostrophe)?

myCompareFile1 = ascii(myTitleName)
myCompareFile2 = ascii(myCompareFileName)

myCompareFile1: 'I\u2019m Mandy Fly Me'
myCompareFile2: "I'm Mandy Fly Me"

I tried the following, but it doesn’t seem to work?

myCompareFile1 = ascii(myTitleName)
myCompareFile1.replace("\u2019", "'")
myCompareFile2 = ascii(myCompareFileName)
myCompareFile2.replace("\u2019", "'")
if myCompareFile1 != myCompareFile2:
    print('myCompareFile1:',myCompareFile1)
    print('myCompareFile2:',myCompareFile2)
    myLength1 = len(myCompareFileName)
    myLength2 = len(myTitleName)
    print('File Name Mismatch - Artist: [' + myArtistName + '] Album: ['+ myAlbumName + '] Track: [' + myTitleName + '] File: [' + myCompareFileName + ']')
    if (myLength1 == myLength2):
        print('lengths match: ',myLength1)
    else:
        print('lengths mismatch: ',myLength1,' ',myLength2)
    print(' ')

Console:

myCompareFile1: 'I\u2019m Mandy Fly Me'
myCompareFile2: "I'm Mandy Fly Me"

So it looks like the replace isn’t doing anything?

I’m an experienced developer but learning Python.

All the Best
Dave