[issue1521950] shlex.split() does not tokenize like the shell

Dan Christian Fri, 25 Nov 2011 13:03:45 -0800

Dan Christian <robo...@users.sourceforge.net> added the comment:

I just realized that I left out a major case.  The shell will also
split on ( and ).  I think this is now complete.  If you do "man bash"
and skip down to DEFINITIONS, it lists all the metacharacters and
control operators.
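
For illustration only (this is not part of the attached files), the mismatch can be seen by feeding a few of the test inputs to plain shlex.split(). This is a minimal sketch in Python 3 syntax; the helper name show_splits is hypothetical.

```python
import shlex

# Inputs taken from the attached test cases.  Each pair would tokenize
# identically if shlex.split() followed the shell's rules.
cases = [
    'echo hi ; echo bye',
    'echo hi;echo bye',
    'echo hi > out',
    'echo hi>out',
    '( echo hi )',
    '(echo hi)',
]

def show_splits(commands):
    """Return (command, tokens) pairs produced by plain shlex.split()."""
    return [(cmd, shlex.split(cmd)) for cmd in commands]

for cmd, tokens in show_splits(cases):
    print('%-22r -> %r' % (cmd, tokens))
```

Only the whitespace-separated forms come back with the operator as its own token; the compact forms keep `;`, `>` and the parentheses glued to the neighbouring word.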


I've attached updated versions of ref_shlex.py and test_shlex.diff.
They replace the previous ones.

-Dan

On Fri, Nov 25, 2011 at 12:25 PM, Dan Christian <rep...@bugs.python.org> wrote:
>
> Dan Christian <robo...@users.sourceforge.net> added the comment:
>
> I've attached a diff to test_shlex.py and a script that I used to
> verify what the shells actually do.
> Both are relative to Python-3.2.2/Lib/test
>
> I'm completely ignoring the quotes issue for now.  That should
> probably be an enhancement.  I don't think it really matters until the
> parsing issues are resolved.
>
> ref_shlex is python 2 syntax.  python -3 shows that it should convert cleanly.
> ./ref_shlex.py
> It will run by default against /bin/*sh
> If you don't want that, do something like: export SHELLS='/bin/sh,/bin/csh'
> It runs as a unittest.  So you will only see dots if all shells do
> what it expects.  Some shells are flaky (e.g. zsh, tcsh), so you may
> need to run it multiple times.
>
> Getting this into the mainline will be interesting.  I would think it
> would take some community discussion.  I may be able to convince
> people that the current behaviour is wrong, but I can't tell you what
> will break if it is "fixed".  And should the fix be the default?  As
> you mentioned, it depends on what people expect it to do and how it is
> currently being used.  I see the first step as presenting a clear case
> of how it should work.
>
> -Dan

----------
Added file: http://bugs.python.org/file23780/ref_shlex.py
Added file: http://bugs.python.org/file23781/test_shlex.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1521950>
_______________________________________

#!/usr/bin/env python

"""Test how various shells parse syntax.
This is only expected to work on Unix based systems.
We use the unittest infrastructure, but this isn't a normal test.

Usage:
  ref_shlex.py [options] shells...
"""
# Written by: Dan Christian for issue1521950
# References: man bash   # look at DEFINITIONS and SHELL GRAMMAR

import glob
import re
import os, sys
import subprocess
import unittest


TempDir = '/tmp'                 # where we will write temp files
Shells = ['/bin/sh', '/bin/bash'] # list of shells to test against

class ShellTest(unittest.TestCase):
    bgRe = re.compile(r'\[\d+\]\s+(\d+|\+ Done)$') # backgrounded command output

    def Run(self,
            shell,           # shell to use
            command,         # command to run
            filepath=None):  # any files that are expected
        """Carefully run a shell command.
        Capture stdout, stderr, and exit status.
        Returns: (ret, out, err, fileout)
           ret is the return status
           out is the list of lines to stdout
           err is the list of lines to stderr
           fileout is the list of lines in filepath (None if no filepath)
        """
        start_cwd = os.getcwd()
        call = [shell, '-c', command]
        #print "Running: %s -c '%s'" % (shell, command)
        outpath = 'stdout.txt'
        errpath = 'stderr.txt'
        ret = -1
        out = None
        err = None
        fileout = None
        try:
            os.chdir(TempDir)
            outfp = open(outpath, 'w')
            errfp = open(errpath, 'w')
            if filepath and os.path.isfile(filepath):
                os.remove(filepath)
            ret = subprocess.call(call, stdout=outfp, stderr = errfp)
            #print "Returned: %d" % ret
            outfp = open(outpath, 'r')
            out = outfp.readlines()
            os.remove(outpath)
            errfp = open(errpath, 'r')
            err = errfp.readlines()
            os.remove(errpath)
            if filepath:
                ffp = open(filepath)
                fileout = ffp.readlines()
                os.remove(filepath)
        except OSError as msg:
            print "Exception!", msg
            # leave files behind for debugging
            self.fail("Hit an exception running: %s" % ' '.join(call))
        finally:
            os.chdir(start_cwd)      # always restore the working directory
        return (ret, out, err, fileout)

    def testTrue(self):
        """ Trivial case to test execution. """
        for shell in Shells:
            cmd = '/bin/true'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                [], out,
                "Expected %s -c '%s' send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testEcho(self):
        """ Simple case to test stdout. """
        for shell in Shells:
            cmd = 'echo "hello world"'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                1, len(out),
                "Expected %s -c '%s' to output 1 line of stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testRedirectS(self):
        """ output redirect with space """
        for shell in Shells:
            fpath = "out.txt"
            cmd = 'echo "hi" > %s' % fpath
            (ret, out, err, fout) = self.Run(shell, cmd, fpath)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                [], out,
                "Expected %s -c '%s' send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))
            self.assertEquals(1, len(fout))

    def testRedirectNS(self):
        """ output redirect without space """
        for shell in Shells:
            fpath = "out.txt"
            cmd = 'echo "hi"> %s' % fpath
            (ret, out, err, fout) = self.Run(shell, cmd, fpath)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                [], out,
                "Expected %s -c '%s' send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))
            self.assertEquals(1, len(fout))

    def testTwoEchoS(self):
        """ Two seperate output lines (with space) """
        for shell in Shells:
            cmd = 'echo hi ; echo bye'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(['hi\n', 'bye\n'], out)
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testTwoEchoNS(self):
        """ Two seperate output lines (with space) """
        for shell in Shells:
            cmd = 'echo hi;echo bye'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(['hi\n', 'bye\n'], out)
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testParenS(self):
        """ Sub shell (with spaces) """
        for shell in Shells:
            cmd = '( echo hi )'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(
                ['hi\n'], out,
                "Expected %s -c '%s' to return 'hi', not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testParenNS(self):
        """ Sub shell (no spaces) """
        for shell in Shells:
            cmd = '(echo hi)'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEquals(['hi\n'], out,
                "Expected %s -c '%s' to return 'hi', not: %s" % (
                    shell, cmd, out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testBgEcho(self):
        """ Two seperate output lines but unordered """
        # This is flaky.  The output can vary on zsh and tcsh.  Just re-run.
        for shell in Shells:
            cmd = 'echo hi&echo bye; wait'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEquals(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            # You may get extra lines on csh (hi, bye, bg notice, done notice)
            self.assertTrue(
                len(out) in (2, 3, 4),
                "Expected %s -c '%s' to output 2-4 lines, not %d\n%s" % (
                    shell, cmd, len(out), out))
            self.assertEquals(
                [], err,
                "Expected %s -c '%s' send nothing to stderr, not: %s" % (
                    shell, cmd, err))


def main(args):
    global TempDir, Shells

    val = os.getenv('TEMPDIR')
    if val:
        TempDir = val
    val = os.getenv('SHELLS')
    if val in ('AUTO', 'auto'):
        Shells = glob.glob('/bin/*sh')
        if not Shells:
            print "No shells found as /bin/*sh"
            sys.exit(2)
    elif val is not None:
        Shells = val.split(',')

    print "Testing shells: %s" % ', '.join(Shells)
    unittest.main()  
    

if __name__ == "__main__":
    main(sys.argv[1:])
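
The Run() helper above captures output through temp files. As a rough sketch of the same idea in Python 3, the output can be captured through pipes instead. This assumes Python 3.5 or later (for subprocess.run) and that /bin/sh exists; the name run_shell is hypothetical, and the handling of redirect target files is left out.

```python
import subprocess

def run_shell(shell, command, timeout=10):
    """Run `command` under `shell -c`.

    Returns (status, stdout lines, stderr lines), with line endings kept
    so the result compares like readlines() output.
    """
    proc = subprocess.run(
        [shell, '-c', command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,   # decode output to str
        timeout=timeout,
    )
    return (proc.returncode,
            proc.stdout.splitlines(True),
            proc.stderr.splitlines(True))

if __name__ == '__main__':
    # Both spellings should behave the same in a POSIX shell.
    for cmd in ('echo hi ; echo bye', 'echo hi;echo bye'):
        print(cmd, '->', run_shell('/bin/sh', cmd))
```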

--- test_shlex-orig.py  2011-09-03 10:16:44.000000000 -0600
+++ test_shlex.py       2011-11-25 13:56:23.000000000 -0700
@@ -173,6 +173,54 @@
                              "%s: %s != %s" %
                              (self.data[i][0], l, self.data[i][1:]))
 
+    def testSyntaxSplitAmpersand(self):
+        """Test handling of syntax splitting of &"""
+        # Could take these forms: &&, &, |&, ;&, ;;&
+        # of course, the same applies to | and ||
+        # these should all parse to the same output
+        src = ['echo hi && echo bye',
+               'echo hi&&echo bye',
+               'echo "hi"&&echo "bye"']
+        ref = ['echo', 'hi', '&&', 'echo', 'bye']
+        # Maybe this should be: ['echo', 'hi', '&', '&', 'echo', 'bye']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
+    def testSyntaxSplitSemicolon(self):
+        """Test handling of syntax splitting of ;"""
+        # Could take these forms: ;, ;;, ;&, ;;&
+        # these should all parse to the same output
+        src = ['echo hi ; echo bye',
+               'echo hi; echo bye',
+               'echo hi;echo bye']
+        ref = ['echo', 'hi', ';', 'echo', 'bye']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
+    def testSyntaxSplitRedirect(self):
+        """Test handling of syntax splitting of >"""
+        # of course, the same applies to <, |
+        # these should all parse to the same output
+        src = ['echo hi > out',
+               'echo hi> out',
+               'echo hi>out']
+        ref = ['echo', 'hi', '>', 'out']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
+    def testSyntaxSplitParen(self):
+        """Test handling of syntax splitting of ()"""
+        # these should all parse to the same output
+        src = ['( echo hi )',
+               '(echo hi)']
+        ref = ['(', 'echo', 'hi', ')']
+        for ss in src:
+            result = shlex.split(ss)
+            self.assertEqual(ref, result, "While splitting '%s'" % ss)
+
 # Allow this test to be used with old shlex.py
 if not getattr(shlex, "split", None):
     for methname in dir(ShlexTest):
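
The reference token lists in the diff above can be checked against a lexer that returns shell punctuation as separate tokens. The following is a minimal sketch, not part of the attached patch. It assumes Python 3.6 or later, where shlex.shlex accepts a punctuation_chars argument; the helper name shell_like_split is hypothetical.

```python
import shlex

def shell_like_split(command):
    """Tokenize `command`, returning runs of ();<>|& as separate tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)

# Source strings and reference tokens copied from the test cases above.
checks = [
    (['echo hi && echo bye', 'echo hi&&echo bye', 'echo "hi"&&echo "bye"'],
     ['echo', 'hi', '&&', 'echo', 'bye']),
    (['echo hi ; echo bye', 'echo hi; echo bye', 'echo hi;echo bye'],
     ['echo', 'hi', ';', 'echo', 'bye']),
    (['echo hi > out', 'echo hi> out', 'echo hi>out'],
     ['echo', 'hi', '>', 'out']),
    (['( echo hi )', '(echo hi)'],
     ['(', 'echo', 'hi', ')']),
]

for sources, ref in checks:
    for src in sources:
        assert shell_like_split(src) == ref, "While splitting %r" % src
```

Here `&&` comes back as a single token rather than as two `&` tokens, which matches the first of the two alternatives considered in testSyntaxSplitAmpersand.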
