[issue1521950] shlex.split() does not tokenize like the shell

2016-07-29 Thread Roundup Robot

Roundup Robot added the comment:

New changeset ea99e2f0b829 by Vinay Sajip in branch 'default':
Closes #1521950: Made shlex parsing more shell-like.
https://hg.python.org/cpython/rev/ea99e2f0b829

--
nosy: +python-dev
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1521950] shlex.split() does not tokenize like the shell

2016-07-27 Thread Vinay Sajip

Changes by Vinay Sajip :


--
assignee:  -> vinay.sajip




[issue1521950] shlex.split() does not tokenize like the shell

2016-07-27 Thread Vinay Sajip

Vinay Sajip added the comment:

Okay, I've updated with a new patch addressing SilentGhost's comments, and 
addressed the comments on that patch. If I don't hear any objections by Friday, 
I plan to commit this change.

--




[issue1521950] shlex.split() does not tokenize like the shell

2016-07-22 Thread Vinay Sajip

Changes by Vinay Sajip :


Added file: http://bugs.python.org/file43831/refresh-2016.diff




[issue1521950] shlex.split() does not tokenize like the shell

2016-07-21 Thread R. David Murray

R. David Murray added the comment:

No objection from me.  I'm not likely to have the time to give it the kind of 
thorough review I'd *like* to, but I don't think it is really needed.

--




[issue1521950] shlex.split() does not tokenize like the shell

2016-07-21 Thread Vinay Sajip

Vinay Sajip added the comment:

This has been knocking around since 3.3, but never got enough attention to make 
it in. Barring objections from anyone, I'd like to commit this patch once I 
check that it applies cleanly against 3.6, before we get into 3.6 beta.

--
versions: +Python 3.6 -Python 3.5




[issue1521950] shlex.split() does not tokenize like the shell

2016-06-01 Thread Andrey Kislyuk

Andrey Kislyuk added the comment:

Is there any chance of getting this into 3.6? We are still in a situation where 
the shlex module misleads developers into believing that it has functionality 
to parse things the way the shell does. I've had to vendor the copy of shlex 
with patches from this bug applied (thanks Vinay!)

--
nosy: +Andrey.Kislyuk




[issue1521950] shlex.split() does not tokenize like the shell

2014-10-01 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


Added file: http://bugs.python.org/file36772/80eea6bd898c.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1521950] shlex.split() does not tokenize like the shell

2014-05-15 Thread Chris Rebert

Changes by Chris Rebert pyb...@rebertia.com:


--
nosy: +cvrebert




[issue1521950] shlex.split() does not tokenize like the shell

2013-12-28 Thread Vinay Sajip

Vinay Sajip added the comment:

Let's hope we can get this into 3.5. I updated my patch a while ago to address 
RDM's comments.

--
versions: +Python 3.5 -Python 3.4




[issue1521950] shlex.split() does not tokenize like the shell

2012-08-01 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


--
versions: +Python 3.4 -Python 3.3




[issue1521950] shlex.split() does not tokenize like the shell

2012-06-03 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


Added file: http://bugs.python.org/file25809/388411be9b61.diff




[issue1521950] shlex.split() does not tokenize like the shell

2012-06-02 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

I've updated the patch following comments by RDM - it probably could do with a 
code review (now that I've addressed RDM's comments on the docs).

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-06-02 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Review, including a code-but-not-algorithm review :), posted.

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-27 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-25 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


Added file: http://bugs.python.org/file25365/9252961a03e7.diff




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-23 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

> I'd like to take a look at this (I wasn't aware of it before).
Are you interested in shlex in general or only this bug?  If the former, then 
I’ll try to remember to make you nosy on future issues.

BTW, what is the shlex unicode bug you mentioned a few times on Rietveld?  The 
one I know is fixed now.

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-23 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I am interested in shell stuff in general.

The unicode bug is issue 1170.

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-22 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

I believe Dan meant that the behaviour of shlex.split() now is different from 
what it was when he first raised the issue (in July 2006). Looking at the 
default branch of CPython, this is what I see:

Python 3.3.0a2+ (default:ff6593aa8376, Apr 22 2012, 12:39:08)
[GCC 4.3.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> list(shlex.shlex('e;'))
['e', ';']
>>> list(shlex.shlex(">'abc';"))
['>', "'abc'", ';']

Likewise, on the 2.6 branch:

Python 2.6.8+ (unknown, Apr 22 2012, 12:44:43)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> list(shlex.shlex('e;'))
['e', ';']
>>> list(shlex.shlex(">'abc';"))
['>', "'abc'", ';']

So it would seem Dan is saying that shlex behaviour (before my patch is
applied) is different now from how he remembers it - not that the patch
introduces any incompatibility.

Still, another set of eyeballs on the patch would be good.

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-21 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

I've received no comments on the latest revision of my patch (incorporating 
comments on the previous version); is it OK to commit this?

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-04-21 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I'd like to take a look at this (I wasn't aware of it before).  I'll try to do 
that some time in the next 24 hours, and if I don't you shouldn't wait for me :)

Did you address Dan's concern about 'old' possibly not matching the old 
behavior completely?

--
nosy: +r.david.murray




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-23 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

> Éric Araujo mer...@netwok.org added the comment:
>
> I did not fully get what you meant here, but the example you added to the doc
> made it clear.  Is this covered by tests?

Yes, I believe that testSyntaxSplitCustom covers this.

> Overall great patch!  Dan, do you have time to test it (or read the new
> examples in the patch) to tell us if it meets what you wanted?

Thanks! It was a bit fiddly, shlex is somewhat difficult to extend cleanly. I 
developed this functionality for a subprocess ease-of-use-wrapper module called 
sarge, and I had to basically copy and modify the whole read_token method :-(

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-23 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

I haven't been following this much.  Sorry.  My day job isn't in this area any 
more (and I'm stuck using 2.4 :-().

Looking at the docs, I notice the "old" behaviour is different from what it
used to be.  Notably: 'e;' gets split into two tokens; and >'abc'; gets split
into 3.  I'm pretty sure that baseline code doesn't split those at all.  So
there is a question of whether "old" is fully backward compatible.

The new functionality looks great.  That's what I was looking for when I 
filed the bug.

Thank you!
-Dan

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-22 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

> Previously when punctuation chars were set, wordchars was being augmented by
> '-'. This was incomplete, so the augmentation is now with '~-./*?=' which
> allows for wildcards, filename chars and argument flags.
I did not fully get what you meant here, but the example you added to the doc 
made it clear.  Is this covered by tests?

Overall great patch!  Dan, do you have time to test it (or read the new 
examples in the patch) to tell us if it meets what you wanted?

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-22 Thread Gustavo Niemeyer

Changes by Gustavo Niemeyer gust...@niemeyer.net:


--
nosy:  -niemeyer




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-21 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


Added file: http://bugs.python.org/file24590/079ab75d29a4.diff




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-21 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

I updated the patch to reflect Éric's comments on Rietveld, but there are also 
some other changes:

Previously when punctuation chars were set, wordchars was being augmented by 
'-'. This was incomplete, so the augmentation is now with '~-./*?=' which 
allows for wildcards, filename chars and argument flags.

I added a token_type attribute whose value is 'a' for alphanumeric tokens and 
'c' for punctuation tokens. This token type is internally tracked anyway - we 
just expose it now. It is needed for when multiple punctuation tokens need to 
be disambiguated, because we might return two logically separate punctuation 
tokens as one if they are not separated by whitespace in the source being 
tokenised.

New attributes and the changes to wordchars have been documented, and a test 
added for token_type return values.
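
The difference between the two tokenising modes can be illustrated with a minimal sketch. It assumes the `punctuation_chars` keyword, which is how this feature is spelled in the `shlex` module of Python 3.6 and later; the patch revision discussed here may have used different names. The `token_type` attribute mentioned above is not relied on, and `is_punctuation` is a hypothetical helper written only for this example.

```python
import shlex

text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"

# Default behaviour: every punctuation character is returned as its own token.
old_tokens = list(shlex.shlex(text))

# With punctuation_chars=True, runs of the characters '();<>|&' are
# returned as single tokens, e.g. '&&' and '||'.
new_tokens = list(shlex.shlex(text, punctuation_chars=True))


def is_punctuation(token, chars="();<>|&"):
    """Hypothetical helper: True if the token is made only of punctuation chars."""
    return bool(token) and all(ch in chars for ch in token)


print(old_tokens)
print(new_tokens)
print([t for t in new_tokens if is_punctuation(t)])
```

A caller that needs to tell operators from words can apply a check like `is_punctuation` to each token, since quoted tokens keep their quotes in non-POSIX mode and so never look like pure punctuation.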

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-21 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

Plus I also changed a few instances of the anachronism

a = a + b

to

a += b

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-02-20 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

This time you should have received an email from Rietveld, I made sure that 
your ID was expanded to an email address.

I like all the suggestions you made in reply to my comments.

--




[issue1521950] shlex.split() does not tokenize like the shell

2012-01-06 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

I've made a patch which implements this functionality, together with docs and 
tests. Please review.

--
hgrepos: +99
nosy: +vinay.sajip
stage: test needed -> patch review




[issue1521950] shlex.split() does not tokenize like the shell

2012-01-06 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


Added file: http://bugs.python.org/file24158/9e12275eec25.diff




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-26 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


Removed file: http://bugs.python.org/file23778/ref_shlex.py




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-26 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


Removed file: http://bugs.python.org/file23779/test_shlex.diff




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Thanks for the diff and test.  (I removed the older versions; there are “edit” 
links in the list of files leading to pages where it’s possible to remove them, 
if one has the required permissions.)

Your script passes with dash, which is probably the most POSIX-compliant shell 
we can find.  (bash has extensions, zsh/csh don’t use the POSIX shell language, 
so I think the behavior of dash should be our reference, not the bash man page.)

> I may be able to convince people that the current behaviour is wrong, but I
> can't tell you what will break if it is fixed.  And should the fix be the
> default?  As you mentioned, it depends on what people expect it to do and how
> it is currently being used.

python-dev takes compatibility seriously.  Some things are clearly bugs and we 
fix them, even if it will break buggy code out there.  For example, we recently 
fixed bugs in HTML parsing: We had a specification to decide that they were 
really bugs, and we judged that no sane program could be relying on the exact 
behavior of the parser.  shlex is another case; in my opinion, it’s been used 
for years to implement parsing similar, but not identical in all cases, to the 
shell’s, and as there is code out there that depends on the current behavior of 
shlex and does not need to support && || ; ( ), if we add support for these
tokens we should not break the existing code.  Given that we can’t test all 
programs that use shlex, I think we’ll have to add a new parameter, with a 
default value which gets us the previous behavior, as I said in my previous 
message.
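
To make the compatibility concern concrete, the sketch below contrasts the default splitting with an opt-in mode. It assumes the `punctuation_chars` keyword, which is how the opt-in parameter is spelled in Python 3.6 and later; at this point in the discussion the parameter's name and form were still open, so treat the spelling as an assumption.

```python
import shlex

command = "make&&make install;echo done"

# Default behaviour, which existing code may rely on: operators that are not
# surrounded by whitespace stay glued to the neighbouring words.
default_tokens = shlex.split(command)

# Opt-in behaviour: shell operators become tokens of their own.
shell_like_tokens = list(shlex.shlex(command, posix=True, punctuation_chars=True))

print(default_tokens)
print(shell_like_tokens)
```

Because the new behaviour is only reached through an extra argument, callers of `shlex.split(command)` keep getting the first result.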

(BTW, would you mind editing the quoted section when you reply by email?  
Otherwise we get unhelpful, distracting walls of quoted texts.  Thanks in 
advance.)

--




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-26 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

On Sat, Nov 26, 2011 at 7:12 AM, Éric Araujo rep...@bugs.python.org wrote:
> Your script passes with dash, which is probably the most POSIX-compliant
> shell we can find.  (bash has extensions, zsh/csh don’t use the POSIX shell
> language, so I think the behavior of dash should be our reference, not the
> bash man page.)

I was just looking for a reference where I didn't have to sift through
tons of documentation.  Most systems have bash.  Before that I was
just working from experience (I've done a lot of shell scripting).

> there is code out there that depends on the current behavior of shlex and
> does not need to support && || ; ( ), if we add support for these tokens we
> should not break the existing code.

Here's a thought on how that might work (just brainstorming).  shlex
uses a series of character strings to drive its parsing:  whitespace,
escape, quotes.  Add another one: control = '();|'.  If it is unset
(by default?), then the behavior is as before.  If it is set, then
shlex will output any character in control as a separate token.

There might be a shell specific script (or maybe it's left to the
user) that decides that certain tokens can be recombined:  '', '||',
'|', '', etc.  This code is pretty simple:  walk the token
sequence, if you see a two token pair, pop the second and combine it
into the first.
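
The recombination pass described above could look roughly like the following sketch. The function name `recombine` and the default `pairs` set are hypothetical, chosen only for illustration; they are not part of shlex.

```python
import shlex


def recombine(tokens, pairs=("&&", "||", ">>", "<<")):
    """Merge adjacent single-character tokens that form a known operator.

    `pairs` is a hypothetical, shell-specific set of two-character operators.
    """
    merged = []
    for token in tokens:
        # If the previous token plus this one spells an operator, join them.
        if merged and merged[-1] + token in pairs:
            merged[-1] += token
        else:
            merged.append(token)
    return merged


# The default lexer emits each punctuation character separately.
raw_tokens = list(shlex.shlex("a && b || c >> log"))
joined_tokens = recombine(raw_tokens)
print(raw_tokens)
print(joined_tokens)
```

One limitation of doing this as a second step: the token stream no longer records whitespace, so `& &` and `&&` are indistinguishable to `recombine`.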

-Dan

--




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

> I was just looking for a reference where I didn't have to sift through tons
> of documentation.
Sure :)  That’s why I suggest using dash for quick tests and rely on the work 
of other people who did read the POSIX spec.  I’ll have to check it too before 
committing a patch.

> shlex uses a series of character strings to drive its parsing:  whitespace,
> escape, quotes.  Add another one: control = '();|'.  If it is unset (by
> default?), then the behavior is as before.
So we would need to add a Shlex subclass to the module to provide the new 
behavior.  I think I prefer a new argument, because we can just extend the 
existing class and functions instead of adding subtly differing duplicates.

> If it is set, then shlex will output any character in control as a separate
> token.
Unless it is part of a quoted segment, right?  (See #7611 for 'foo#bar' vs. 
'foo #bar').

> There might be a shell specific script (or maybe it's left to the user)
> that decides that certain tokens can be recombined:
Seems too much complexity.  I really prefer if we agree on one command parsing
behavior (POSIX, i.e. dash) and improve shlex to support that.  People wanting 
zsh rules can write their own subclass.

> '', '||', '|', '', etc.
Wouldn’t it be more correct to consider them different tokens?  I don’t have a
formal training in CS or programming, so I’m not sure that my definition is
correct at all, but in my mind a token is a unit, and thus  and  are two 
different things.

--




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-26 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

> Sure :)  That’s why I suggest using dash for quick tests and rely on the work
> of other people who did read the POSIX spec.  I’ll have to check it too
> before committing a patch.

The point of ref_shlex.py is that all shells act the same for common
cases and shlex doesn't match any of them.  The only real split is
that csh based shells do some things differently than sh based shells
('2' vs '').

>> shlex uses a series of character strings to drive its parsing:  whitespace,
>> escape, quotes.  Add another one: control = '();|'.  If it is unset (by
>> default?), then the behavior is as before.
> So we would need to add a Shlex subclass to the module to provide the new
> behavior.  I think I prefer a new argument, because we can just extend the
> existing class and functions instead of adding subtly differing duplicates.

You don't have to do a subclass (although that might have some
advantages).  You could do something like:
def shlex(s, comments=False, posix=True, control=False):
  ...
  if control:
    if control is True:
      self.control = '();|'
    else:
      self.control = control  # let user specify their own control set

>> If it is set, then shlex will output any character in control as a separate
>> token.
> Unless it is part of a quoted segment, right?  (See #7611 for 'foo#bar' vs.
> 'foo #bar').

Correct, quotes wouldn't change.
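
A short sketch of what "quotes wouldn't change" means in practice. It assumes the `punctuation_chars` keyword from Python 3.6 and later, which plays the role of the `control` set proposed here; the proposal itself was never implemented under that name.

```python
import shlex

# A ';' inside quotes stays part of the quoted word; a bare ';' is split off.
quoted = list(shlex.shlex('echo "a;b" ; ls', posix=True, punctuation_chars=True))
unquoted = list(shlex.shlex('echo a;b', posix=True, punctuation_chars=True))

print(quoted)
print(unquoted)
```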

>> There might be a shell specific script (or maybe it's left to the user)
>> that decides that certain tokens can be recombined:
> Seems too much complexity.  I really prefer if we agree on one command parsing
> behavior (POSIX, i.e. dash) and improve shlex to support that.  People
> wanting zsh rules can write their own subclass.

shlex is a pretty simple lexer (as lexers go), and I wouldn't want it
to get complicated.  It's easier in the current code structure to
split everything and then re-join as needed.  This also allows you to
select sh vs csh joining rules (e.g. '|' means different things in sh
vs csh).  Every shell that I've seen follows one of those two flavors
for syntax.

>> '', '||', '|', '', etc.
> Wouldn’t it be more correct to consider them different tokens?  I don’t have
> a formal training in CS or programming, so I’m not sure that my definition is
> correct at all, but in my mind a token is a unit, and thus  and  are two
> different things.

Ideally, the final tokens have exact meanings.  It's easier to write
handler code for '' than ('', '').  This is just a case of whether
the parser joins them together or it's done in a second step.  The
current code doesn't do much look ahead, so it's hard for the lexer to
produce things like '' directly.

-Dan

--




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-25 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

> Of course, that's how it's used.  That's all it can do right now.
:) What I meant is that it is *meant* to be used in this way.

> I was splitting and combining commands (using ;, &&, and ||) and then
> running the resulting (mega) one liners over ssh.  It still gets run by a
> shell, but I was specifying the control flow.
Thank you for the reply.  It is indeed a valuable use case to pass a command 
line as one string to ssh, and the split/quote combo should round-trip and be 
useful for this usage.
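
The split/quote round trip mentioned above can be sketched as follows, using only `shlex.quote` (available since Python 3.3) and `shlex.split`. The argument list is made up for illustration.

```python
import shlex

# Arguments containing a quote, a space and a shell operator character.
args = ["echo", "it's", "two words", "a;b"]

# Quote each argument and join them into one command-line string,
# e.g. to hand to a remote shell over ssh.
command = " ".join(shlex.quote(arg) for arg in args)

# Splitting the quoted string gives back the original arguments.
round_tripped = shlex.split(command)
print(command)
print(round_tripped)
```

Note that this round trip protects the operator characters by quoting them; it does not help with recognising unquoted operators such as `;` or `&&`, which is the subject of this issue.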

> I'll see if I can come up with a reference case and maybe a unittest this
> weekend
Great!  A new argument (with a default value which gets us the previous 
behavior) will probably be needed, to preserve backward compatibility.

--
nosy: +niemeyer
versions: +Python 3.3 -Python 3.2




[issue1521950] shlex.split() does not tokenize like the shell

2011-11-25 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

I've attached a diff to test_shlex.py and a script that I used to
verify what the shells actually do.
Both are relative to Python-3.2.2/Lib/test

I'm completely ignoring the quotes issue for now.  That should
probably be an enhancement.  I don't think it really matters until the
parsing issues are resolved.

ref_shlex is python 2 syntax.  python -3 shows that it should convert cleanly.
./ref_shlex.py
It will run by default against /bin/*sh
If you don't want that, do something like: export SHELLS='/bin/sh,/bin/csh'
It runs as a unittest.  So you will only see dots if all shells do
what it expects.  Some shells are flaky (e.g. zsh, tcsh), so you may
need to run it multiple times.
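
The idea of checking against a real shell can be sketched in a few lines. This is not the attached ref_shlex.py, only an illustration of the approach; the helper name `run_in_shell` is hypothetical and the sketch assumes `/bin/sh` exists on the system.

```python
import subprocess


def run_in_shell(command, shell="/bin/sh"):
    """Run `command` with `shell -c` and return (exit status, stdout lines)."""
    proc = subprocess.run(
        [shell, "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.splitlines()


# The shell treats ';' and '&&' as operators even without surrounding spaces.
status, lines = run_in_shell("echo a;echo b&&echo c")
print(status, lines)
```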

Getting this into the mainline will be interesting.  I would think it
would take some community discussion.  I may be able to convince
people that the current behaviour is wrong, but I can't tell you what
will break if it is fixed.  And should the fix be the default?  As
you mentioned, it depends on what people expect it to do and how it is
currently being used.  I see the first step as presenting a clear case
of how it should work.

-Dan

On Fri, Nov 25, 2011 at 10:01 AM, Éric Araujo rep...@bugs.python.org wrote:
>
> Éric Araujo mer...@netwok.org added the comment:
>
>> Of course, that's how it's used.  That's all it can do right now.
> :) What I meant is that it is *meant* to be used in this way.
>
>> I was splitting and combining commands (using ;, &&, and ||) and then
>> running the resulting (mega) one liners over ssh.  It still gets run by a
>> shell, but I was specifying the control flow.
> Thank you for the reply.  It is indeed a valuable use case to pass a command
> line as one string to ssh, and the split/quote combo should round-trip and be
> useful for this usage.
>
>> I'll see if I can come up with a reference case and maybe a unittest this
>> weekend
> Great!  A new argument (with a default value which gets us the previous
> behavior) will probably be needed, to preserve backward compatibility.
>
> --
> nosy: +niemeyer
> versions: +Python 3.3 -Python 3.2

--
keywords: +patch
Added file: http://bugs.python.org/file23778/ref_shlex.py
Added file: http://bugs.python.org/file23779/test_shlex.diff

#!/usr/bin/env python
"""
Test how various shells parse syntax.
This is only expected to work on Unix based systems.
We use the unittest infrastructure, but this isn't a normal test.

Usage:
  ref_shlex.py [options] shells...
"""
# Written by: Dan Christian for issue1521950

import glob
import re
import os, sys
import optparse
import subprocess
import unittest


TempDir = '/tmp' # where we will write temp files
Shells = ['/bin/sh', '/bin/bash'] # list of shells to test against

class ShellTest(unittest.TestCase):
    bgRe = re.compile(r'\[\d+\]\s+(\d+|\+ Done)$') # backgrounded command output

    def Run(self,
            shell,   # shell to use
            command, # command to run
            filepath=None):  # any files that are expected
        """Carefully run a shell command.
        Capture stdout, stderr, and exit status.
        Returns: (ret, out, err, fileout)
           ret is the return status
           out is the list of lines to stdout
           err is the list of lines to stderr
           fileout is the list of lines read from filepath (None if not given)
        """
        start_cwd = os.getcwd()
        call = [shell, '-c', command]
        #print "Running: %s -c '%s'" % (shell, command)
        outpath = 'stdout.txt'
        errpath = 'stderr.txt'
        ret = -1
        out = None
        err = None
        fileout = None
        try:
            os.chdir(TempDir)
            outfp = open(outpath, 'w')
            errfp = open(errpath, 'w')
            if filepath and os.path.isfile(filepath):
                os.remove(filepath)
            ret = subprocess.call(call, stdout=outfp, stderr = errfp)
            #print "Returned: %d" % ret
            outfp = open(outpath, 'r')
            out = outfp.readlines()
            os.remove(outpath)
            errfp = open(errpath, 'r')
            err = errfp.readlines()
            os.remove(errpath)
            if filepath:
                ffp = open(filepath)
                fileout = ffp.readlines()
                os.remove(filepath)
        except OSError as msg:
            print "Exception!", msg
            os.chdir(start_cwd)
            # leave files behind for debugging
            self.assertTrue(0, "Hit an exception running: %s" % (
                    ' '.join(call)))
        return (ret, out, err, fileout)

    def testTrue(self):
        """ Trivial case to test execution. """
        for shell in Shells:
            cmd = '/bin/true'
            (ret, 

[issue1521950] shlex.split() does not tokenize like the shell

2011-11-25 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

I just realized that I left out a major case.  The shell will also
split ().  I think this is now complete.  If you do man bash and
skip down to DEFINITIONS it lists all the control characters.

I've attached updated versions of ref_shlex.py and test_shlex.diff.
They replace the previous ones.

-Dan

On Fri, Nov 25, 2011 at 12:25 PM, Dan Christian rep...@bugs.python.org wrote:

 Dan Christian robo...@users.sourceforge.net added the comment:

 I've attached a diff to test_shlex.py and a script that I used to
 verify what the shells actually do.
 Both are relative to Python-3.2.2/Lib/test

 I'm completely ignoring the quotes issue for now.  That should
 probably be an enhancement.  I don't think it really matters until the
 parsing issues are resolved.

 ref_shlex is python 2 syntax.  python -3 shows that it should convert cleanly.
 ./ref_shlex.py
 It will run by default against /bin/*sh
 If you don't want that, do something like: export SHELLS='/bin/sh,/bin/csh'
 It runs as a unittest.  So you will only see dots if all shells do
 what it expects.  Some shells are flaky (e.g. zsh, tcsh), so you may
 need to run it multiple times.

 Getting this into the mainline will be interesting.  I would think it
 would take some community discussion.  I may be able to convince
 people that the current behaviour is wrong, but I can't tell you what
 will break if it is fixed.  And should the fix be the default?  As
 you mentioned, it depends on what people expect it to do and how it is
 currently being used.  I see the first step as presenting a clear case
 of how it should work.

 -Dan

--
Added file: http://bugs.python.org/file23780/ref_shlex.py
Added file: http://bugs.python.org/file23781/test_shlex.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___

#!/usr/bin/env python
"""
Test how various shells parse syntax.
This is only expected to work on Unix based systems.
We use the unittest infrastructure, but this isn't a normal test.

Usage:
  ref_shlex.py [options] shells...
"""
# Written by: Dan Christian for issue1521950
# References: man bash   # look at DEFINITIONS and SHELL GRAMMAR

import glob
import re
import os, sys
import subprocess
import unittest


TempDir = '/tmp' # where we will write temp files
Shells = ['/bin/sh', '/bin/bash'] # list of shells to test against

class ShellTest(unittest.TestCase):
    bgRe = re.compile(r'\[\d+\]\s+(\d+|\+ Done)$') # backgrounded command output

    def Run(self,
            shell,   # shell to use
            command, # command to run
            filepath=None):  # any files that are expected
        """Carefully run a shell command.
        Capture stdout, stderr, and exit status.
        Returns: (ret, out, err, fileout)
           ret is the return status
           out is the list of lines to stdout
           err is the list of lines to stderr
           fileout is the list of lines in filepath (None without a filepath)
        """
        start_cwd = os.getcwd()
        call = [shell, '-c', command]
        #print "Running: %s -c '%s'" % (shell, command)
        outpath = 'stdout.txt'
        errpath = 'stderr.txt'
        ret = -1
        out = None
        err = None
        fileout = None
        try:
            os.chdir(TempDir)
            outfp = open(outpath, 'w')
            errfp = open(errpath, 'w')
            if filepath and os.path.isfile(filepath):
                os.remove(filepath)
            ret = subprocess.call(call, stdout=outfp, stderr=errfp)
            #print "Returned: %d" % ret
            outfp = open(outpath, 'r')
            out = outfp.readlines()
            os.remove(outpath)
            errfp = open(errpath, 'r')
            err = errfp.readlines()
            os.remove(errpath)
            if filepath:
                ffp = open(filepath)
                fileout = ffp.readlines()
                os.remove(filepath)
        except OSError as msg:
            print("Exception! %s" % msg)
            os.chdir(start_cwd)
            # leave files behind for debugging
            self.assertTrue(0, "Hit an exception running: %s" % (
                    ' '.join(call)))
        return (ret, out, err, fileout)

    def testTrue(self):
        """ Trivial case to test execution. """
        for shell in Shells:
            cmd = '/bin/true'
            (ret, out, err, fout) = self.Run(shell, cmd)
            self.assertEqual(
                0, ret,
                "Expected %s -c '%s' to return 0, not %d" % (shell, cmd, ret))
            self.assertEqual(
                [], out,
                "Expected %s -c '%s' to send nothing to stdout, not: %s" % (
                    shell, cmd, out))
            self.assertEqual(
                [], err,
                "Expected %s -c '%s' to send nothing to stderr, not: %s" % (
                    shell, cmd, err))

    def testEcho(self):
        """ Simple case to test stdout. """
        for shell in Shells:


[issue1521950] shlex.split() does not tokenize like the shell

2011-11-24 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Thanks for the comments.

 There are really two cases in one bug.
 The first part is that the shell will split tokens at characters that shlex
 doesn't.  The handling of &, |, ;, >, and < could be done by adjusting the
 definition of shlex.wordchars.  The shell may also understand things like:
 &&, ||, |&, and >&.  The exact definition of these depends on the shell, so
 maybe it's best to just split them out as separate tokens and let the user
 figure out the compound meanings.
Yes.  I think that the main use of shlex is really to parse a line into chunks 
with a way to embed spaces; it’s intended to parse a program command line 
(“prog --blah value stillthesamevalue arg samearg”), but not necessarily a 
full shell line (with  and | and whatnot).  When people have a line containing 
 and |, then they need a shell to execute it, so they would not call 
shlex.split but just pass the full line to os.system or subprocess.Popen.  Do 
you remember what use cases you had when you opened this report?

 The proper handling of quotes/escapes requires some kind of new interface.
 You need to distinguish between tokens that were modified by the
 quote/escape rules and those that were not.
I don’t see why I would care about quotes in the result of shlex.split.

See also #7611.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___



[issue1521950] shlex.split() does not tokenize like the shell

2011-11-24 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

Of course, that's how it's used.  That's all it can do right now.

I was splitting and combining commands (using ;, &&, and ||) and
then running the resulting (mega) one-liners over ssh.  It still gets
run by a shell, but I was specifying the control flow.

It's kind of like a makefile command block.  You want to be able to
specify if a failure aborts the sequence, or is ignored (&& vs ;).
Sometimes there are fallback commands (via ||).  Of course, you can
also group using ().
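
A minimal sketch of that kind of command composition, for illustration only. It assumes Python 3.6 or later (shlex.quote exists since 3.3; the punctuation_chars argument is what the changeset closing this issue added). The helper name join_commands and the sample commands are made up for the example and are not taken from the code described above.

```python
import shlex

def join_commands(commands, operator="&&"):
    """Hypothetical helper: quote every argument, then chain the commands
    with a shell control operator such as ';', '&&' or '||'."""
    quoted = [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]
    return (" %s " % operator).join(quoted)

# Made-up commands; the directory name contains a space on purpose.
one_liner = join_commands([["cd", "/tmp/build dir"], ["make", "all"]])
print(one_liner)

# Splitting the result again keeps the operator as its own token and
# restores the argument containing the space.
round_trip = list(shlex.shlex(one_liner, posix=True, punctuation_chars=True))
print(round_trip)
```

The string in one_liner is what would be handed to ssh as a single argument; the remote shell then interprets the control operators.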

Once things are split properly, then understanding the shell control
characters is straightforward.  In my mind, shlex.split() should
either be as close to shell syntax as possible, or have a clear
explanation of what is different (and why).

I ended up doing my own parsing.  I'm not actually at that company
anymore, so I can't pull up the code.

I'll see if I can come up with a reference case and maybe a unittest
this weekend (that's really the only time I'll have to dig into it).

-Dan

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___



[issue1521950] shlex.split() does not tokenize like the shell

2010-09-03 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Thanks for the report. Would you like to work on a patch, or translate your 
examples into unit tests?

The docs do not mention “” at all, and platform discrepancies have to be taken 
into account too, so I really don’t know if this is a bug fix for the normal 
mode, the POSIX mode, or a feature request requiring a new argument to the 
shlex function to preserve compatibility.

--
nosy: +eric.araujo, eric.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___



[issue1521950] shlex.split() does not tokenize like the shell

2010-09-03 Thread Dan Christian

Dan Christian robo...@users.sourceforge.net added the comment:

It's been a while since I looked at this.  I'm not really in a
position to contribute code/tests right now; but I can comment.

I don't think POSIX mode existed when I first reported this, but
that's where it makes sense.  I think all POSIX shells (Bourne, C,
Korn) will behave the same way for the issues mentioned.

There are really two cases in one bug.

The first part is that the shell will split tokens at characters that
shlex doesn't.  The handling of &, |, ;, >, and < could be done by
adjusting the definition of shlex.wordchars.  The shell may also
understand things like: &&, ||, |&, and >&.  The exact definition of
these depends on the shell, so maybe it's best to just split them out
as separate tokens and let the user figure out the compound meanings.
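
To illustrate the difference, here is a minimal sketch. It assumes Python 3.6 or later, where the change that eventually closed this issue added a punctuation_chars argument to shlex.shlex; the sample command line is made up.

```python
import shlex

line = "echo hi&&ls -l|wc;date"  # made-up sample command line

# Plain shlex.split() only breaks on whitespace, so the control
# operators stay glued to the neighbouring words.
plain = shlex.split(line)

# With punctuation_chars=True, runs of the characters ();<>|& come back
# as separate tokens; their compound meaning is left to the caller.
lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
shell_like = list(lexer)

print(plain)
print(shell_like)
```

Note that an operator such as && is returned as a single token because consecutive punctuation characters are grouped; deciding what it means is still up to the code consuming the tokens.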

The proper handling of quotes/escapes requires some kind of new
interface.  You need to distinguish between tokens that were modified
by the quote/escape rules and those that were not.  One suggestion is
to add a new method as such:

shlex.get_token2()
   Return a tuple of the token and the original text of the token
(including quotes and escapes).  Otherwise, this is the same as
shlex.get_token().

Comparing the two values for equality (or maybe identity) would tell
you if something special was going on.  You can always pass the second
value to a reconstructed command line without losing any of the
original parsing information.
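
A rough sketch of why the distinction matters, assuming Python 3.6 or later for punctuation_chars. No get_token2() exists in shlex; as a stand-in, the sketch pairs POSIX-mode tokens (quotes removed) with non-POSIX-mode tokens (quotes kept). The two modes differ in other ways too, so this pairing is only meant for simple input like the made-up line below.

```python
import shlex

# Made-up input: a quoted, literal && followed by a real && operator.
line = "echo '&&' && ls"

def tokens(text, posix):
    # Hypothetical helper for this sketch; not part of shlex.
    return list(shlex.shlex(text, posix=posix, punctuation_chars=True))

stripped = tokens(line, posix=True)   # quotes removed from the tokens
raw = tokens(line, posix=False)       # quotes kept in the token text

# In POSIX mode the literal and the operator are indistinguishable;
# comparing against the raw text shows which token was quoted.
for cooked, original in zip(stripped, raw):
    marker = "quoted" if cooked != original else "plain"
    print("%-6s %-6s %s" % (cooked, original, marker))
```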

-Dan

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___



[issue1521950] shlex.split() does not tokenize like the shell

2010-07-10 Thread Mark Lawrence

Changes by Mark Lawrence breamore...@yahoo.co.uk:


--
versions: +Python 3.2 -Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1521950] shlex.split() does not tokenize like the shell

2009-03-29 Thread Daniel Diniz

Changes by Daniel Diniz aja...@gmail.com:


--
stage:  -> test needed
type:  -> feature request
versions: +Python 2.7 -Python 2.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1521950
___