In [401]: import shlex

In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""")
Out[402]: ['Joe', 'went', 'to', 'the store', 'where', 'he', 'bought', 'a', 'box of chocolates', 'and', 'stuff.']

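One caveat: shlex follows shell quoting rules, so a lone apostrophe (as in "doesn't") is treated as an unclosed quote and shlex.split() raises ValueError. Below is a minimal sketch of one way around that, assuming it is acceptable to fall back to plain whitespace splitting whenever the quotes are unbalanced. The helper name split_quoted is made up for illustration and is not part of shlex.

```python
import shlex


def split_quoted(text):
    """Split like a shell would, keeping quoted phrases together.

    Hypothetical helper: falls back to plain whitespace splitting
    when the quotes in `text` are unbalanced.
    """
    try:
        return shlex.split(text)
    except ValueError:
        # shlex.split raises ValueError on a stray quote or apostrophe
        return text.split()


# Balanced quotes: the quoted phrase stays together as one item.
print(split_quoted('he bought a "box of chocolates" today'))

# Stray apostrophe: shlex gives up, so we get ordinary str.split() output.
print(split_quoted("Joe doesn't like chocolates"))
```

The fallback drops the quote handling for the whole string, so it is only a stopgap. Text that mixes apostrophes with quoted phrases would need a real tokenizer.
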
How's that work for ya?

http://docs.python.org/library/shlex.html

On Tue, 10 Feb 2009 16:46:30 -0600
Tim Chase <python.l...@tim.thechases.com> wrote:

> >> Or for a slightly less simple minded splitting you could try
> >> re.split:
> >>
> >>   >>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
> >>   ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
> >
> > Perhaps I'm missing something, but the above regex does the exact
> > same thing as line.split() except it is significantly slower and
> > harder to read.
> >
> > Neither deals with quoted text, apostrophes, hyphens, punctuation or
> > any other details of real-world text. That's what I mean by
> > "simple-minded".
>
>   >>> s = "The quick brown fox jumps, and falls over."
>   >>> import re
>   >>> re.split(r"(\w+)", s)[1::2]
>   ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
>   >>> s.split()
>   ['The', 'quick', 'brown', 'fox', 'jumps,', 'and', 'falls', 'over.']
>
> Note the difference in "jumps" vs. "jumps," (extra comma in the
> string.split() version) and likewise the period after "over".
> Thus not quite "the exact same thing as line.split()".
>
> I think an easier-to-read variant would be
>
>   >>> re.findall(r"\w+", s)
>   ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
>
> which just finds words. One could also just limit it to letters with
>
>   re.findall(r"[a-zA-Z]+", s)
>
> as "\w" is a little more encompassing (letters, digits and
> underscores) if that's a problem.
>
> -tkc
>
> --
> http://mail.python.org/mailman/listinfo/python-list

--
Josh Dukes
MicroVu IT Department