Gustavo Goretkin added the comment:

>Instead of trying to enumerate all possible wordchars, I think a more robust 
>solution is to use whitespace_split to include *all* characters not otherwise 
>considered special.

I agree with that approach.

Also note that dash/hyphen gets incorrectly tokenized.

>>> import shlex
>>> list(shlex.shlex("mkdir -p somepath"))
['mkdir', '-', 'p', 'somepath']
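
For comparison, here is a minimal sketch of the whitespace_split approach
quoted above, applied to the same input. It only sets the existing
whitespace_split attribute on a default (non-POSIX) shlex instance; the
variable names are just for illustration.

```python
import shlex

# Default (non-POSIX) lexer, same input as above.
lexer = shlex.shlex("mkdir -p somepath")

# Split tokens only on whitespace instead of relying on wordchars,
# so '-' no longer ends up as a token of its own.
lexer.whitespace_split = True

tokens = list(lexer)
print(tokens)  # ['mkdir', '-p', 'somepath']
```

With this setting the option stays attached to its dash, which is what a
shell would do.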

Whitelisting all valid word characters is not workable, because the 
surrogateescape mechanism can introduce all sorts of "characters" (lone 
surrogates standing in for undecodable bytes).

In bash:

$ echo mkdir $(echo -ne "Bad\xffButLegalPath")
mkdir Bad?ButLegalPath

the path is one token.

However, shlex currently breaks it into multiple tokens:

>>> list(shlex.shlex(b"mkdir Bad\xffButLegalPath".decode("utf-8", 
...     "surrogateescape")))
['mkdir', 'Bad', '\udcff', 'ButLegalPath']
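
As a rough sketch of how whitespace_split would handle this case, assuming
the input is meant to contain the undecodable byte 0xff as in the bash
example, the path stays in one token and can be encoded back to the
original bytes. The names command, tokens and path_bytes are made up for
this illustration.

```python
import shlex

# The byte 0xff is not valid UTF-8; surrogateescape maps it to the
# lone surrogate '\udcff' instead of raising UnicodeDecodeError.
command = b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")

lexer = shlex.shlex(command)
# Split on whitespace only, so the surrogate does not need to be
# listed in wordchars to stay inside the token.
lexer.whitespace_split = True

tokens = list(lexer)
print(tokens)  # ['mkdir', 'Bad\udcffButLegalPath']

# Encoding with the same error handler restores the original bytes.
path_bytes = tokens[1].encode("utf-8", "surrogateescape")
print(path_bytes)  # b'Bad\xffButLegalPath'
```

This matches the bash behaviour above, where the path is one token.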

----------
nosy: +Gustavo Goretkin

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28595>
_______________________________________