Gustavo Goretkin added the comment:
>Instead of trying to enumerate all possible wordchars, I think a more robust
>solution is to use whitespace_split to include *all* characters not otherwise
>considered special.
I agree with that approach.
Also note that the dash/hyphen gets tokenized incorrectly:
>>> import shlex
>>> list(shlex.shlex("mkdir -p somepath"))
['mkdir', '-', 'p', 'somepath']
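For comparison, a sketch of the whitespace_split approach (not part of the original report): with whitespace_split enabled, tokens are split only on whitespace, so the flag survives intact.

```python
import shlex

# posix=True plus whitespace_split=True is what shlex.split() uses;
# every non-whitespace character then counts as part of a word.
lexer = shlex.shlex("mkdir -p somepath", posix=True)
lexer.whitespace_split = True
print(list(lexer))  # ['mkdir', '-p', 'somepath']
```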
Whitelisting all valid word characters is not workable, because the
surrogateescape mechanism can produce all sorts of "characters".
In bash:
$ echo mkdir $(echo -ne "Bad\xffButLegalPath")
mkdir Bad?ButLegalPath
The path is a single token.
In shlex, however, it currently gets broken into multiple tokens:
>>> list(shlex.shlex(b"mkdir Bad\xffButLegalPath".decode("utf-8",
...                  "surrogateescape")))
['mkdir', 'Bad', '\udcff', 'ButLegalPath']
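Under the whitespace_split approach the surrogate-escaped byte stays inside the token, matching the shell's behavior (a sketch assuming the fix under discussion):

```python
import shlex

# b"\xff" is not valid UTF-8; surrogateescape decodes it to '\udcff'
# instead of raising, so arbitrary byte paths survive the round trip.
cmd = b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")
lexer = shlex.shlex(cmd, posix=True)
lexer.whitespace_split = True
print(list(lexer))  # ['mkdir', 'Bad\udcffButLegalPath']
```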
----------
nosy: +Gustavo Goretkin
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue28595>
_______________________________________