Gustavo Goretkin added the comment:

> Instead of trying to enumerate all possible wordchars, I think a more
> robust solution is to use whitespace_split to include *all* characters
> not otherwise considered special.
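As a quick sketch of the quoted suggestion (not a patch; just shlex.shlex with posix=True and whitespace_split=True, which is what shlex.split already does), enabling whitespace_split makes every non-whitespace character part of a word, so the problem cases below tokenize as single tokens:

```python
import shlex

# whitespace_split=True: accumulate any non-whitespace character into the
# current word instead of consulting the wordchars whitelist.
lex = shlex.shlex("mkdir -p somepath", posix=True)
lex.whitespace_split = True
print(list(lex))  # ['mkdir', '-p', 'somepath']

# A surrogateescape'd byte survives as part of one token, too.
s = b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")
lex = shlex.shlex(s, posix=True)
lex.whitespace_split = True
print(list(lex))  # ['mkdir', 'Bad\udcffButLegalPath']
```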
I agree with that approach. Also note that dash/hyphen gets incorrectly
tokenized:

    >>> import shlex
    >>> list(shlex.shlex("mkdir -p somepath"))
    ['mkdir', '-', 'p', 'somepath']

Whitelisting all valid word characters is not good, because the
surrogateescape mechanism can produce all sorts of "characters". In bash:

    $ echo mkdir $(echo -ne "Bad\xffButLegalPath")
    mkdir Bad?ButLegalPath

the path is one token. Currently in shlex, however, it gets broken into
multiple tokens:

    >>> list(shlex.shlex(b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")))
    ['mkdir', 'Bad', '\udcff', 'ButLegalPath']

----------
nosy: +Gustavo Goretkin

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28595>
_______________________________________