On 2022-06-09 at 03:18:56 +1000, Chris Angelico <ros...@gmail.com> wrote:
> On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> >
> > On 2022-06-08 at 08:07:40 -0000,
> > De ongekruisigde <ongekruisi...@news.eternal-september.org> wrote:
> >
> > > Depending on the problem a regular expression may be the much simpler
> > > solution. I love them for e.g. text parsing and use them all the time.
> > > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > > like these:
> > >
> > > root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> > > dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> > > nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> > > avahi:x:997:996:avahi-daemon privilege separation user:/var/empty:/run/current-system/sw/bin/nologin
> > > sshd:x:998:993:SSH privilege separation user:/var/empty:/run/current-system/sw/bin/nologin
> > > geoclue:x:999:998:Geoinformation service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> > >
> > > Compare a regexp solution like this:
> > >
> > > >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$', s)
> > > >>> print(g.groups())
> > > ('geoclue', 'x', '999', '998', 'Geoinformation service',
> > > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> > >
> > > to the code one would require to process it manually, with all the edge
> > > cases. The regexp surely reads much simpler (?).
> >
> > Uh...
> >
> > >>> import pwd # https://docs.python.org/3/library/pwd.html
> > >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> > [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992,
> > pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue',
> > pw_shell='/sbin/nologin')]
>
> That's great if the lines are specifically coming from your system's
> own /etc/passwd, but not so much if you're trying to compare passwd
> files from different systems, where you simply have the files
> themselves.

In addition to getpwent and friends for reading entries from the local
password database, the C library has fgetpwent for reading entries from
a stream that looks like /etc/passwd. (Strictly speaking, fgetpwent
comes from SVr4 rather than POSIX, but it is widely available.) So even
the C library agrees that if you think you have to process this data
manually, you're doing it wrong. Python exposes neither function
directly (at least not in the pwd module or the os module; I didn't dig
around or check PyPI).

IMO, higher-level functions to process such data are way better than a
[insert your own adjective/expletive here] regular expression that
collects the pieces into numbered groups rather than labeled fields.
Readability counts.

Yes, absolutely, use a regular expression when all else fails. Don't
forget to handle all the edge cases!

(I assume that sane OSes preclude colons in paths that are likely to
come up in the local password database, but I don't know what happens,
e.g., when there's a reason for GECOS to contain a colon.)
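
For the "you simply have the files themselves" case, here is a rough
sketch of what a labeled-fields helper could look like in pure Python.
PasswdEntry, parse_passwd_line, and parse_passwd are names I made up
for illustration; none of them exist in the pwd module. The sketch
assumes exactly seven colon-separated fields per line and raises on
anything else (a colon inside GECOS, say) rather than guessing.

```python
from typing import Iterable, Iterator, NamedTuple


class PasswdEntry(NamedTuple):
    """One passwd-style record with labeled fields (hypothetical type)."""
    name: str
    passwd: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd-format line into labeled fields.

    Assumes exactly seven colon-separated fields; anything else is
    reported as an error instead of being silently misparsed.
    """
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, passwd, uid, gid, gecos, home, shell = fields
    # int() raises ValueError if uid or gid is empty or non-numeric.
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


def parse_passwd(lines: Iterable[str]) -> Iterator[PasswdEntry]:
    """Yield an entry for every non-blank line of a passwd-style stream."""
    for line in lines:
        if line.strip():
            yield parse_passwd_line(line)
```

Any iterable of lines works, including an open file, so comparing
passwd files from two systems comes down to calling parse_passwd on
each and comparing fields by name (entry.uid, entry.shell) rather than
by group number.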