Hello Eryk,
It is true that the most correct way to represent strings in Python 2 is to
use Unicode. It is also true that the most common scenario in both the
stdlib and most third-party libs is to return and deal with str (bytes)
instead, which is why I decided to do the same in psutil. Other than
_winreg, I can't recall other APIs returning Unicode by default unless
explicitly asked (e.g. os.getcwdu() or os.listdir(u'.')), and I didn't want
to duplicate psutil APIs in the same fashion.
It must be noted that many stdlib APIs in Python 2 are "broken" when it
comes to Unicode, see:
http://bugs.python.org/issue18695
...so the most convenient and definitive "fix" to correctly handle strings
in Python is switching to Python 3.

With that said, in psutil on Python 2 you are still supposed to be able to
retrieve the "correct" string by using the "replace" error handler:

>>> unicode(proc.exe(), sys.getdefaultencoding(), errors="replace")
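
To make clear what the "replace" handler does, here is a minimal sketch. The
bytes are a made-up stand-in for what a psutil API might return on Python 2,
and "utf-8" / "ascii" are just illustrative encodings, not necessarily what
your system uses. The handler never raises, but every byte which can't be
decoded becomes U+FFFD:

```python
# -*- coding: utf-8 -*-
# Hypothetical bytes, standing in for a process name returned as str (bytes).
raw = u"ƒőő.exe".encode("utf-8")

# Decoding with the encoding the bytes were produced with round-trips the name.
exact = raw.decode("utf-8")

# Decoding with a codec that cannot interpret the bytes, using "replace",
# does not raise: each undecodable byte is turned into U+FFFD instead.
lossy = raw.decode("ascii", "replace")

print(exact == u"ƒőő.exe")
print(lossy == u"ƒőő.exe")
```

So the result is always a usable unicode string, but it matches the original
name only when the encoding can actually represent it.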

This is an example which filters processes with a funky name and works
with both Python 2 and 3:

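    # -*- coding: utf-8 -*-
    # (the coding line is needed on Python 2 because of the non-ASCII literal)
    import psutil, sys

    PY3 = sys.version_info[0] == 3
    LOOKFOR = u"ƒőő.exe"
    for proc in psutil.process_iter(attrs=['name']):
        name = proc.info['name']
        if not PY3:
            # Python 2 returns str (bytes): decode it before comparing
            name = unicode(name, sys.getdefaultencoding(), errors="replace")
        if LOOKFOR == name:
            print("process %s found" % proc)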
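
If you need this in more than one place, the version check above can be
folded into a small helper. This is only a sketch: to_text() is a
hypothetical name, not part of psutil, and it assumes the default encoding
is the one you want unless you pass another:

```python
import sys


def to_text(value, encoding=None, errors="replace"):
    """Return value as text: unicode on Python 2, str on Python 3."""
    # On Python 2 `bytes` is an alias of `str`, so this check works on both.
    if isinstance(value, bytes):
        return value.decode(encoding or sys.getdefaultencoding(), errors)
    # Already text (Python 3 str or Python 2 unicode): return it unchanged.
    return value
```

With that, the loop body becomes name = to_text(proc.info['name']) with no
"if not PY3" branch.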

This is IMO the best compromise for a lib which aims to work on both Python
2 and 3. It's either that or returning Unicode all over the place in Python
2, but that's something I considered and rejected because most of the time
the string is not supposed to have funky characters, so "practicality beats
purity" in this case.


On Sun, Sep 3, 2017 at 11:38 PM, eryk sun <eryk...@gmail.com> wrote:

> On Sun, Sep 3, 2017 at 9:58 AM, Giampaolo Rodola' <g.rod...@gmail.com>
> wrote:
> >
> > - #1040: all strings are encoded by using OS fs encoding.
> > - #1040: the following Windows APIs on Python 2 now return a string
> >   instead of unicode:
> >   - Process.memory_maps().path
> >   - WindowsService.bin_path()
> >   - WindowsService.description()
> >   - WindowsService.display_name()
> >   - WindowsService.username()
>
> This seems wrong. User names, file paths, registry strings, etc are
> all Unicode in Windows. One cannot in general encode them as the
> legacy (as in it really should be avoided) 'mbcs' encoding, i.e. ANSI.
> Using the 'replace' handler will make a mess with best-fit
> replacements and question marks. For example, _winreg in Python 2 has
> to return unicode strings and always has, which should be the
> precedent for psutil. Python 2 code that supports Windows has to be
> able to handle this.
>



-- 
Giampaolo - http://grodola.blogspot.com
-- 
https://mail.python.org/mailman/listinfo/python-list
