Paul Melis wrote:
> Hi,
> 
> mosscliffe wrote:
> 
>> I am looking for a simple split function to create a list of entries
>> from a string which contains quoted elements.  Like in 'google'
>> search.
>>
>> eg  string = 'bob john "johnny cash" 234 june'
>>
>> and I want to have a list of ['bob', 'john', 'johnny cash', '234',
>> 'june']
>>
>> I wondered about using the csv routines, but I thought I would ask the
>> experts first.
>>
>> There may be a simple function, but as yet I have not found it.
> 
> 
> Here is a not-so-simple function using regular expressions. It repeatedly
> matches two regexps, one that matches any sequence of characters except
> a space and one that matches a double-quoted string. If both match, the
> one occurring first in the string is taken and the matching part of the
> string is cut off. This is repeated until the whole string is matched.
> If both match at the same point in the string, the longer of the two
> matches is taken. (This can't be done with a single regexp using the
> A|B operator, as alternatives are tried in order from left to right: if
> A matches then it is returned, even if B would match a longer string.)
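
The A|B behaviour mentioned in the quoted text can be checked with a
small sketch. The patterns and the names short_first and long_first are
made up for illustration only; they are not part of the splitting code.

```python
import re

# With A|B the engine tries A first at each position and only falls back
# to B if A fails there; it does not pick the longer alternative.
short_first = re.compile(r'ab|abcd')
long_first = re.compile(r'abcd|ab')

result_short = short_first.match('abcd').group(0)
result_long = long_first.match('abcd').group(0)

print(result_short)   # ab
print(result_long)    # abcd
```

So the order in which the alternatives are written decides the result,
not their length.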

Here is a slightly improved version, which is a bit more compact and
which removes the quotes from the matched quoted string.

import re

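def split_string(s):
    # pat1: a run of characters that are neither a double quote nor a space
    # pat2: a double-quoted string; group 1 is the text between the quotes
    pat1 = re.compile('[^" ]+')
    pat2 = re.compile('"([^"]*)"')

    parts = []

    m1 = pat1.search(s)
    m2 = pat2.search(s)
    while m1 or m2:

        if m1 and m2:
            # Both patterns match: take the one that starts first
            if m1.start(0) < m2.start(0):
                match = 1
            elif m2.start(0) < m1.start(0):
                match = 2
            else:
                # Same starting point: take the longer match
                if len(m1.group(0)) > len(m2.group(0)):
                    match = 1
                else:
                    match = 2
        elif m1:
            match = 1
        else:
            match = 2

        # Store the matched part and cut it off the front of the string
        if match == 1:
            part = m1.group(0)
            s = s[m1.end(0):]
        else:
            part = m2.group(1)
            s = s[m2.end(0):]

        parts.append(part)

        # Search again in what is left of the string
        m1 = pat1.search(s)
        m2 = pat2.search(s)

    return parts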

print(split_string('bob john "johnny cash" 234 june'))
print(split_string('"abc""abc"'))
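
For comparison, here is a minimal sketch using shlex.split from the
standard library. It is not the regexp approach above, and it follows
shell-style quoting rules, so it may or may not fit the original use
case. The variable names words and adjacent are only for illustration.

```python
import shlex

# shlex.split keeps a double-quoted phrase together and drops the quotes
words = shlex.split('bob john "johnny cash" 234 june')
print(words)      # ['bob', 'john', 'johnny cash', '234', 'june']

# Adjacent quoted strings are joined into one token, as in a shell
adjacent = shlex.split('"abc""abc"')
print(adjacent)   # ['abcabc']
```

Note the differences: single quotes and backslashes are also treated
specially, adjacent quoted strings are joined rather than split, and an
unbalanced quote raises ValueError instead of being passed through.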
-- 
http://mail.python.org/mailman/listinfo/python-list
