Paul Melis wrote:
> Hi,
>
> mosscliffe wrote:
>
>> I am looking for a simple split function to create a list of entries
>> from a string that contains quoted elements, as in a Google search.
>>
>> e.g. string = 'bob john "johnny cash" 234 june'
>>
>> and I want to have the list ['bob', 'john', 'johnny cash', '234',
>> 'june']
>>
>> I wondered about using the csv routines, but I thought I would ask the
>> experts first.
>>
>> There may be a simple function, but I have not found it yet.
>
>
> Here's a not-so-simple function using regular expressions. It repeatedly
> matches two regexps: one that matches any sequence of characters other
> than spaces, and one that matches a double-quoted string. If both match,
> the match occurring first in the string is taken and the matched part of
> the string is cut off. This is repeated until nothing more matches. If
> both matches start at the same point in the string, the longer of the
> two is taken. (This can't be done with a single regexp using the A|B
> operator, because alternation is ordered rather than longest-match: if A
> matches, it is returned even if B would match a longer string.)
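Regarding the csv idea in the question: the standard library has two tools that handle this particular input. shlex.split applies shell-like quoting rules, and csv.reader with a space as the delimiter treats a double-quoted field as a single entry. A minimal sketch; the helper names split_with_shlex and split_with_csv are hypothetical, not from the thread:

```python
import csv
import shlex


def split_with_shlex(s):
    # Shell-like splitting: whitespace separates tokens, and quoted
    # sections are kept together with the quotes removed.
    return shlex.split(s)


def split_with_csv(s):
    # csv.reader accepts any iterable of lines; here it gets one line.
    # With delimiter=' ', a field that starts with the default
    # quotechar '"' may contain spaces.
    return next(csv.reader([s], delimiter=' '))


s = 'bob john "johnny cash" 234 june'
print(split_with_shlex(s))  # ['bob', 'john', 'johnny cash', '234', 'june']
print(split_with_csv(s))    # ['bob', 'john', 'johnny cash', '234', 'june']
```

The two differ on messier input: shlex also interprets single quotes and backslashes (so an unmatched apostrophe, as in "don't", raises ValueError), while csv.reader produces an empty field for each extra space between words.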
Here's a slightly improved version, which is a bit more compact and which
strips the quotes from a matched quoted string:

    import re

    def split_string(s):
        # Unquoted word: a run of characters that are neither quotes nor spaces.
        pat1 = re.compile('[^" ]+')
        # Quoted string: group 1 captures the text between the quotes.
        pat2 = re.compile('"([^"]*)"')
        parts = []
        m1 = pat1.search(s)
        m2 = pat2.search(s)
        while m1 or m2:
            if m1 and m2:
                # Both patterns matched: take the one that starts first.
                if m1.start(0) < m2.start(0):
                    match = 1
                elif m2.start(0) < m1.start(0):
                    match = 2
                else:
                    # Same start position: take the longer match.
                    if len(m1.group(0)) > len(m2.group(0)):
                        match = 1
                    else:
                        match = 2
            elif m1:
                match = 1
            else:
                match = 2
            if match == 1:
                part = m1.group(0)
                s = s[m1.end(0):]
            else:
                # Keep only the text inside the quotes.
                part = m2.group(1)
                s = s[m2.end(0):]
            parts.append(part)
            # Search again in the unconsumed remainder of the string.
            m1 = pat1.search(s)
            m2 = pat2.search(s)
        return parts

    print(split_string('bob john "johnny cash" 234 june'))
    print(split_string('"abc""abc"'))

--
http://mail.python.org/mailman/listinfo/python-list
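For the two patterns used in split_string specifically, the tie-breaking branch never fires: a pat2 match always begins with a double quote, which pat1 can never match, so the two can never start at the same position. Given that, a single alternation scanned with re.finditer should pick the same leftmost matches. A minimal sketch under that assumption; TOKEN and split_string_single are hypothetical names. It also demonstrates the ordered alternation described in the quoted message (if A matches, it is returned even when B would match a longer string):

```python
import re

# Alternative 1: a double-quoted string; group 1 captures its contents.
# Alternative 2: a run of characters that are neither quotes nor spaces.
TOKEN = re.compile(r'"([^"]*)"|([^" ]+)')


def split_string_single(s):
    parts = []
    # finditer scans left to right and returns non-overlapping matches,
    # so the earliest-starting token is always taken next.
    for m in TOKEN.finditer(s):
        # Exactly one of the two groups takes part in each match.
        if m.group(1) is not None:
            parts.append(m.group(1))
        else:
            parts.append(m.group(2))
    return parts


# Alternation is ordered, not longest-match: 'a' wins over 'ab'.
print(re.match(r'a|ab', 'ab').group(0))  # a

print(split_string_single('bob john "johnny cash" 234 june'))
print(split_string_single('"abc""abc"'))
```

This shortcut relies on the two patterns being unable to start at the same position; with patterns that can tie, the order of the alternatives decides the winner, which is exactly the case the explicit two-pattern loop is written to handle.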