Re: String splitting with exceptions

2013-08-29 Thread wxjmfauth
On Wednesday, 28 August 2013 18:44:53 UTC+2, John Levine wrote:
 I have a crufty old DNS provisioning system that I'm rewriting and I
 hope improving in python.  (It's based on tinydns if you know what
 that is.)
 
 The record formats are, in the worst case, like this:
 
 foo.[DOM]::[IP6::4361:6368:6574]:600::
 
 What I would like to do is to split this string into a list like this:
 
 [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
 
 Colons are separators except when they're inside square brackets.  I
 have been messing around with re.split() and re.findall() and haven't
 been able to come up with either a working separator pattern for
 split() or a working field pattern for findall().  I came pretty
 close with findall() but can't get it to reliably match the
 nothing between two adjacent colons not inside brackets.
 
 Any suggestions? I realize I could do it in a loop where I pick stuff
 off the front of the string, but yuck.
 
 This is in python 2.7.5.
 
 -- 
 Regards,
 John Levine, jo...@iecc.com, Primary Perpetrator of The Internet for Dummies,
 Please consider the environment before reading this e-mail. http://jl.ly

--

Basic idea: protect - split - unprotect

>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> r = s.replace('[IP6::', '***')
>>> a = r.split('::')
>>> a
['foo.[DOM]', '***4361:6368:6574]:600', '']
>>> a[1] = a[1].replace('***', '[IP6::')
>>> a
['foo.[DOM]', '[IP6::4361:6368:6574]:600', '']

jmf
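
The transcript above splits on '::' and so still leaves '600' glued to the
bracketed field. A minimal sketch of the same protect - split - unprotect
idea applied to every bracketed run is given here. It assumes Python 3, no
nested brackets, and that the NUL character never occurs in a record; the
names PLACEHOLDER and protect_split are made up for illustration.

```python
import re

PLACEHOLDER = "\0"  # assumption: never appears in a real record


def protect_split(record, sep=":"):
    """Split on sep, ignoring any sep inside [...] (no nesting)."""
    # Protect: swap separators inside each bracketed run for the placeholder.
    protected = re.sub(
        r"\[[^\]]*\]",
        lambda m: m.group().replace(sep, PLACEHOLDER),
        record,
    )
    # Split on the remaining separators, then unprotect each field.
    return [field.replace(PLACEHOLDER, sep) for field in protected.split(sep)]


print(protect_split("foo.[DOM]::[IP6::4361:6368:6574]:600::"))
```

Because it ends with a plain str.split, a record ending in '::' yields two
trailing empty strings, not one.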
-- 
http://mail.python.org/mailman/listinfo/python-list


String splitting with exceptions

2013-08-28 Thread John Levine
I have a crufty old DNS provisioning system that I'm rewriting and I
hope improving in python.  (It's based on tinydns if you know what
that is.)

The record formats are, in the worst case, like this:

foo.[DOM]::[IP6::4361:6368:6574]:600::

What I would like to do is to split this string into a list like this:

[ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]

Colons are separators except when they're inside square brackets.  I
have been messing around with re.split() and re.findall() and haven't
been able to come up with either a working separator pattern for
split() or a working field pattern for findall().  I came pretty
close with findall() but can't get it to reliably match the
nothing between two adjacent colons not inside brackets.

Any suggestions? I realize I could do it in a loop where I pick stuff
off the front of the string, but yuck.

This is in python 2.7.5.

-- 
Regards,
John Levine, jo...@iecc.com, Primary Perpetrator of The Internet for Dummies,
Please consider the environment before reading this e-mail. http://jl.ly


Re: String splitting with exceptions

2013-08-28 Thread Skip Montanaro
 The record formats are, in the worst case, like this:

 foo.[DOM]::[IP6::4361:6368:6574]:600::

 Any suggestions?

Write a little parser that can handle the record format?

Skip


Re: String splitting with exceptions

2013-08-28 Thread random832
On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
 I have a crufty old DNS provisioning system that I'm rewriting and I
 hope improving in python.  (It's based on tinydns if you know what
 that is.)
 
 The record formats are, in the worst case, like this:
 
 foo.[DOM]::[IP6::4361:6368:6574]:600::
 
 What I would like to do is to split this string into a list like this:
 
 [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
 
 Colons are separators except when they're inside square brackets.  I
 have been messing around with re.split() and re.findall() and haven't
 been able to come up with either a working separator pattern for
 split() or a working field pattern for findall().  I came pretty
 close with findall() but can't get it to reliably match the
 nothing between two adjacent colons not inside brackets.
 
 Any suggestions? I realize I could do it in a loop where I pick stuff
 off the front of the string, but yuck.
 
 This is in python 2.7.5.

Can you have brackets within brackets? If so, this is impossible to deal
with within a regex.

Otherwise:
>>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

I'm not sure why _your_ list only has one empty string at the end. Is
the record always terminated by a colon that is not meant to imply an
empty field after it? If so, remove the question mark:

>>> re.findall('((?:[^[:]|\[[^]]*\])*):',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

I've done this kind of thing (for validation, not capturing) for email
addresses (there are some obscure bits of email address syntax that need
it) before, so it came to mind immediately.
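
For readers who find the one-line pattern hard to scan, here is a sketch of
the second (colon-required) pattern spelled out with re.VERBOSE. It assumes
Python 3 and no nested brackets; FIELD and split_fields are illustrative
names, and appending a colon before matching is an addition of this sketch,
not part of the pattern posted above.

```python
import re

# Verbose spelling of the pattern; each field must be followed by a colon.
FIELD = re.compile(r"""
    (                   # capture one field:
      (?: [^\[:]        #   any character other than '[' or ':'
        | \[ [^\]]* \]  #   or a whole [...] group, colons included
      )*
    )
    :                   # the separator that ends the field
""", re.VERBOSE)


def split_fields(record):
    # Appending one colon gives every field, including the last one, a
    # terminator, so the result follows str.split(':') conventions.
    return FIELD.findall(record + ":")


print(split_fields("foo.[DOM]::[IP6::4361:6368:6574]:600::"))
```

One reason for the appended colon: with the optional-colon form, findall
also reports an empty match at the end of a record that does not end in a
colon, so 'a:b' comes back with a spurious trailing ''.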


Re: String splitting with exceptions

2013-08-28 Thread Tim Chase
On 2013-08-28 13:14, random...@fastmail.us wrote:
 On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
  I have a crufty old DNS provisioning system that I'm rewriting
  and I hope improving in python.  (It's based on tinydns if you
  know what that is.)
  
  The record formats are, in the worst case, like this:
  
  foo.[DOM]::[IP6::4361:6368:6574]:600::
 
 Otherwise:
  re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
 ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
 
 I'm not sure why _your_ list only has one empty string at the end.

I wondered that.  I also wondered about bracketed quoting that
doesn't start at the beginning of a field:

  foo.[one:two]::[IP6::1234:5678:9101]:600::
      ^
This might be bogus, or one might want to catch this case.

-tkc



Re: String splitting with exceptions

2013-08-28 Thread Neil Cerutti
On 2013-08-28, John Levine jo...@iecc.com wrote:
 I have a crufty old DNS provisioning system that I'm rewriting and I
 hope improving in python.  (It's based on tinydns if you know what
 that is.)

 The record formats are, in the worst case, like this:

 foo.[DOM]::[IP6::4361:6368:6574]:600::

 What I would like to do is to split this string into a list like this:

 [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]

 Colons are separators except when they're inside square
 brackets.  I have been messing around with re.split() and
 re.findall() and haven't been able to come up with either a
 working separator pattern for split() or a working field
 pattern for findall().  I came pretty close with findall() but
 can't get it to reliably match the nothing between two adjacent
 colons not inside brackets.

 Any suggestions? I realize I could do it in a loop where I pick
 stuff off the front of the string, but yuck.

A little parser, as Skip suggested, is a good way to go.

The brackets make your string context-sensitive, a difficult
concept to cleanly parse with a regex.

I initially hoped a csv module dialect could work, but the quote
character is (currently) hard-coded to be a single, simple
character, i.e., I can't tell it to treat [xxx] as xxx.

What about Skip's suggestion? A little parser. It might seem
crass or something, but it really is easier than muscling a
regex into a context-sensitive grammar.

def dns_split(s):
    in_brackets = False
    b = 0 # index of beginning of current string
    for i, c in enumerate(s):
        if not in_brackets:
            if c == "[":
                in_brackets = True
            elif c == ':':
                yield s[b:i]
                b = i+1
        elif c == "]":
            in_brackets = False

>>> print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

It'll gag on nested brackets (fixable with a counter) and has no
error handling (requires thought), but it's a start.

-- 
Neil Cerutti


Re: String splitting with exceptions

2013-08-28 Thread Neil Cerutti
On 2013-08-28, Tim Chase python.l...@tim.thechases.com wrote:
 On 2013-08-28 13:14, random...@fastmail.us wrote:
 On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
  I have a crufty old DNS provisioning system that I'm rewriting
  and I hope improving in python.  (It's based on tinydns if you
  know what that is.)
  
  The record formats are, in the worst case, like this:
  
  foo.[DOM]::[IP6::4361:6368:6574]:600::
 
 Otherwise:
  re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
 ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
 
 I'm not sure why _your_ list only has one empty string at the end.

 I wondered that.

Good point. My little parser fails on that, too. It'll miss *all*
final fields. My parser needs if s: yield s[b:] at the end, to
operate like str.split, where the empty string is special.
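
A sketch of the parser with the missing final yield added, plus the
nesting counter and basic error handling mentioned earlier in the thread.
It assumes Python 3; the sep parameter and the ValueError messages are
choices made for this illustration. It also drops the "if s:" guard, so an
empty string gives [''], which is what ''.split(':') returns.

```python
def dns_split(s, sep=":"):
    """Yield sep-separated fields, treating [...] as opaque (nesting counted)."""
    depth = 0   # how many brackets are currently open
    begin = 0   # index where the current field starts
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == sep and depth == 0:
            yield s[begin:i]
            begin = i + 1
    if depth:
        raise ValueError("unclosed '['")
    yield s[begin:]  # the final field, which the loop alone never emits


print(list(dns_split("foo.[DOM]::[IP6::4361:6368:6574]:600::")))
```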

-- 
Neil Cerutti


Re: String splitting with exceptions

2013-08-28 Thread Peter Otten
Neil Cerutti wrote:

 On 2013-08-28, John Levine jo...@iecc.com wrote:
 I have a crufty old DNS provisioning system that I'm rewriting and I
 hope improving in python.  (It's based on tinydns if you know what
 that is.)

 The record formats are, in the worst case, like this:

 foo.[DOM]::[IP6::4361:6368:6574]:600::

 What I would like to do is to split this string into a list like this:

 [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]

 Colons are separators except when they're inside square
 brackets.  I have been messing around with re.split() and
 re.findall() and haven't been able to come up with either a
 working separator pattern for split() or a working field
 pattern for findall().  I came pretty close with findall() but
 can't get it to reliably match the nothing between two adjacent
 colons not inside brackets.

 Any suggestions? I realize I could do it in a loop where I pick
 stuff off the front of the string, but yuck.
 
 A little parser, as Skip suggested, is a good way to go.
 
 The brackets make your string context-sensitive, a difficult
 concept to cleanly parse with a regex.
 
 I initially hoped a csv module dialect could work, but the quote
 character is (currently) hard-coded to be a single, simple
 character, i.e., I can't tell it to treat [xxx] as xxx.
 
 What about Skip's suggestion? A little parser. It might seem
 crass or something, but it really is easier than muscling a
 regex into a context-sensitive grammar.
 
 def dns_split(s):
     in_brackets = False
     b = 0 # index of beginning of current string
     for i, c in enumerate(s):
         if not in_brackets:
             if c == "[":
                 in_brackets = True
             elif c == ':':
                 yield s[b:i]
                 b = i+1
         elif c == "]":
             in_brackets = False

I think you need one more yield outside the loop.

 print(list(dns_split(s)))
 ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
 
 It'll gag on nested brackets (fixable with a counter) and has no
 error handling (requires thought), but it's a start.
 
Something similar on top of regex:

>>> def split(s):
...     start = level = 0
...     for m in re.compile(r"[[:\]]").finditer(s):
...         if m.group() == "[": level += 1
...         elif m.group() == "]":
...             assert level
...             level -= 1
...         elif level == 0:
...             yield s[start:m.start()]
...             start = m.end()
...     yield s[start:]
... 
>>> list(split("a[b:c:]:d"))
['a[b:c:]', 'd']
>>> list(split("a[b:c[:]]:d"))
['a[b:c[:]]', 'd']
>>> list(split(""))
['']
>>> list(split(":"))
['', '']
>>> list(split(":x"))
['', 'x']
>>> list(split("[:x]"))
['[:x]']
>>> list(split(":[:x]"))
['', '[:x]']
>>> list(split(":[:[:]:x]"))
['', '[:[:]:x]']
>>> list(split("[:::]"))
['[:::]']
>>> s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
>>> list(split(s))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

Note that there is one more empty string which I believe the OP forgot.



Re: String splitting with exceptions

2013-08-28 Thread John Levine
> Can you have brackets within brackets? If so, this is impossible to deal
> with within a regex.

Nope.  It's a regular language, not a CFL.

> Otherwise:
> >>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

That seems to do it, thanks.

-- 
Regards,
John Levine, jo...@iecc.com, Primary Perpetrator of The Internet for Dummies,
Please consider the environment before reading this e-mail. http://jl.ly


String splitting by spaces question

2011-11-23 Thread Massi
Hi everyone,

I have to parse a string and split it by spaces. The problem is
that the string can include substrings enclosed in quotation marks,
which must keep their spaces. What I need is to go from a string like:

"This is an 'example string'"

to the following vector:

["This", "is", "an", "example string"]

Which is the best way to achieve this?
Thanks in advance!


RE: String splitting by spaces question

2011-11-23 Thread Alemu Tadesse

Hi Everyone,

Can we use the rsplit function on an array or vector of strings? It works
for one string but not for a vector.

Alemu


-Original Message-
From: python-list-bounces+atadesse=sunedison@python.org
[mailto:python-list-bounces+atadesse=sunedison@python.org] On Behalf
Of Massi
Sent: Wednesday, November 23, 2011 10:10 AM
To: python-list@python.org
Subject: String splitting by spaces question

Hi everyone,

I have to parse a string and splitting it by spaces. The problem is
that the string can include substrings comprises by quotations which
must mantain the spaces. What I need is to pass from a string like:

This is an 'example string'

to the following vector:

[This, is, an, example string]

Which is the best way to achieve this?
Thanks in advance!


Re: String splitting by spaces question

2011-11-23 Thread Arnaud Delobelle
On 23 November 2011 17:10, Massi massi_...@msn.com wrote:
 Hi everyone,

 I have to parse a string and splitting it by spaces. The problem is
 that the string can include substrings comprises by quotations which
 must mantain the spaces. What I need is to pass from a string like:

 This is an 'example string'

 to the following vector:

You mean list

 [This, is, an, example string]


Here's a way:

>>> s = "This is an 'example string' with 'quotes again'"
>>> [x for i, p in enumerate(s.split("'")) for x in ([p] if i%2 else p.split())]
['This', 'is', 'an', 'example string', 'with', 'quotes again']
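
The comprehension relies on the fact that, after splitting on the quote
character, pieces at odd indexes were inside quotes and pieces at even
indexes were outside. An unrolled sketch of the same idea, assuming Python 3
and balanced quotes (split_quoted is an illustrative name):

```python
def split_quoted(s, quote="'"):
    """Split on whitespace, keeping quote-delimited runs as single items."""
    words = []
    for i, piece in enumerate(s.split(quote)):
        if i % 2:
            # Odd index: this text sat between two quote characters.
            words.append(piece)
        else:
            # Even index: ordinary text, split on whitespace as usual.
            words.extend(piece.split())
    return words


print(split_quoted("This is an 'example string' with 'quotes again'"))
```

With an unbalanced quote the odd/even bookkeeping goes wrong silently, so
this only suits input that is known to be well formed.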

-- 
Arnaud


Re: String splitting by spaces question

2011-11-23 Thread Nick Dokos
Alemu Tadesse atade...@sunedison.com wrote:

 Can we use rsplit function on an array or vector of strings ? it works
 for one not for vector

 ...
 
 I have to parse a string and splitting it by spaces. The problem is
 that the string can include substrings comprises by quotations which
 must mantain the spaces. What I need is to pass from a string like:
 
 This is an 'example string'
 
 to the following vector:
 
 [This, is, an, example string]
 
 Which is the best way to achieve this?
 Thanks in advance!

You can use a list comprehension:

l2 = [x.rsplit(...) for x in l]

But for the original question, maybe the csv module would be
more useful: you can change delimiters and quotechars to match
your input:

import csv
reader = csv.reader(open("foo.txt", "rb"), delimiter=' ', quotechar="'")
for row in reader:
    print row
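
A Python 3 sketch of the same csv approach, reading from an in-memory
string instead of foo.txt so that it runs on its own (split_lines is an
illustrative name; with a real file on Python 3 one would open it in text
mode with newline=''):

```python
import csv
import io


def split_lines(text, quotechar="'"):
    """Parse each line of text as space-delimited fields with quoting."""
    reader = csv.reader(io.StringIO(text), delimiter=" ", quotechar=quotechar)
    return list(reader)


print(split_lines("This is an 'example string'\n"))
```

Note that csv treats every single space as a delimiter, so two spaces in a
row produce an empty field between them.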

Nick


Re: String splitting by spaces question

2011-11-23 Thread Jerry Hill
On Wed, Nov 23, 2011 at 12:10 PM, Massi massi_...@msn.com wrote:

 Hi everyone,

 I have to parse a string and splitting it by spaces. The problem is
 that the string can include substrings comprises by quotations which
 must mantain the spaces. What I need is to pass from a string like:

 This is an 'example string'

 to the following vector:

 [This, is, an, example string]

 Which is the best way to achieve this?



This sounds a lot like the way a shell parses arguments on the command
line.  If that's your desire, python has a module in the standard library
that will help, called shlex (http://docs.python.org/library/shlex.html).
Particularly, shlex.split may do exactly what you want out of the box:

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit
(Intel)] on win32

>>> import shlex
>>> s = "This is an 'example string'"
>>> shlex.split(s)
['This', 'is', 'an', 'example string']


-- 
Jerry


Re: String splitting by spaces question

2011-11-23 Thread Miki Tebeka
http://docs.python.org/library/shlex.html


Re: String splitting by spaces question

2011-11-23 Thread Phil Rist
In article 3f19e4c0-e010-4cb2-9f71-dd09e0d3c...@r9g2000vbw.googlegroups.com,
Massi says...

Hi everyone,

I have to parse a string and splitting it by spaces. The problem is
that the string can include substrings comprises by quotations which
must mantain the spaces. What I need is to pass from a string like:

This is an 'example string'

to the following vector:

[This, is, an, example string]

Which is the best way to achieve this?
Thanks in advance!


Is this what you want?

import shlex


lText = "This is a 'short string' for you to read."
lWords = shlex.split(lText)
print lWords

produces,

['This', 'is', 'a', 'short string', 'for', 'you', 'to', 'read.']

Shlex can be found under 'Program Frameworks' under 'The Python Standard
Library' of ActivePython 2.7 documentation.




Re: String splitting by spaces question

2011-11-23 Thread DevPlayer
This is an 'example string'

Don't forget to watch for things like:

Don't, Can't, Won't, I'll, He'll, Hor'davors, Mc'Kinly
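
To make the caveat concrete for the shlex suggestion made elsewhere in this
thread: shlex.split treats an apostrophe as an opening quote, so a lone one
raises ValueError and a pair of them swallows the text in between. A small
sketch, assuming Python 3 (safe_split is an illustrative wrapper, not part
of shlex):

```python
import shlex


def safe_split(text):
    """Return shlex.split(text), or None if the quoting is unbalanced."""
    try:
        return shlex.split(text)
    except ValueError:
        # shlex raises ValueError when a quote is never closed.
        return None


print(safe_split("This is an 'example string'"))
print(safe_split("Don't stop"))        # one apostrophe: unbalanced
print(safe_split("Don't and Can't"))   # two apostrophes: treated as a quoted run
```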


Re: string splitting

2006-10-18 Thread George Sakkis
[EMAIL PROTECTED] wrote:

 Hello,
 I have thousands of files that look something like this:

 wisconsin_state.txt
 french_guiana_district.txt
 central_african_republic_province.txt

 I need to extract the string between the *last* underscore and the
 extention.
 So based on the files above, I want returned:
 state
 district
 province

def extract(s):
    return s[s.rfind('_')+1:s.rfind('.')]


George



Re: string splitting

2006-10-17 Thread stefaan
 Anyone have any ideas?

l = "wisconsin_state.txt"
l.split(".")[0].split("_")[-1]

Explanation:
---
the split(".")[0] part takes everything before the first "."

the split("_")[-1] part selects the last element in the list of
substrings which are separated by "_"
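
Since split(".")[0] keeps only the text before the first dot, a name with
an extra dot in it would be cut short. A sketch that removes just the final
extension, assuming Python 3 (last_part and the dotted sample filename are
made up for illustration):

```python
import os.path


def last_part(filename):
    """Return the text between the last underscore and the extension."""
    stem, _ext = os.path.splitext(filename)  # drops only the final extension
    return stem.rpartition("_")[2]           # text after the last underscore


for name in ("wisconsin_state.txt",
             "central_african_republic_province.txt",
             "v1.2_final_state.txt"):        # hypothetical name with an extra dot
    print(name, "->", last_part(name))
```

If the name contains no underscore at all, rpartition returns the whole
stem, which may or may not be what is wanted.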



string splitting

2006-10-16 Thread rdharles
Hello,
I have thousands of files that look something like this:

wisconsin_state.txt
french_guiana_district.txt
central_african_republic_province.txt

I need to extract the string between the *last* underscore and the
extension.
So based on the files above, I want returned:
state
district
province

My plan was to use .split or .find but I can't figure out how to locate
only the last underscore in the filename.

Anyone have any ideas?

Thanks.
R.D.



Re: string splitting

2006-10-16 Thread Simon Brunning
On 16 Oct 2006 12:12:38 -0700, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hello,
 I have thousands of files that look something like this:

 wisconsin_state.txt
 french_guiana_district.txt
 central_african_republic_province.txt

 I need to extract the string between the *last* underscore and the
 extention.
 So based on the files above, I want returned:
 state
 district
 province

 My plan was to use .split or .find but I can't figure out how locate
 only the last underscore in the filename.

>>> spam = 'central_african_republic_province.txt'
>>> spam.split('.')[0].rsplit('_', 1)[-1]
'province'

-- 
Cheers,
Simon B
[EMAIL PROTECTED]
http://www.brunningonline.net/simon/blog/


Re: string splitting

2006-10-16 Thread hiaips

[EMAIL PROTECTED] wrote:
 Hello,
 I have thousands of files that look something like this:

 wisconsin_state.txt
 french_guiana_district.txt
 central_african_republic_province.txt

 I need to extract the string between the *last* underscore and the
 extention.
 So based on the files above, I want returned:
 state
 district
 province

 My plan was to use .split or .find but I can't figure out how locate
 only the last underscore in the filename.

 Anyone have any ideas?

 Thanks.
 R.D.

Hi,

Try splitting the string on "." and using rfind to find the last
instance of "_".

i.e.,
myStr = "wisconsin_state.txt"
pieces = myStr.split(".")
substr = pieces[0][pieces[0].rfind("_") + 1:]

--hiaips



Re: string splitting

2006-10-16 Thread rdharles
Many thanks for your replies, hiaips & Simon!
R.D.



Re: string splitting

2006-10-16 Thread bearophileHUGS
A pair of solutions:

>>> s = "central_african_republic_province.txt"
>>> s.rsplit("_", 1)[-1].split(".")[0]
'province'
>>> import re
>>> p = re.compile(r"_ ([^_]+) \.", re.VERBOSE)
>>> s = """\
... wisconsin_state.txt
... french_guiana_district.txt
... central_african_republic_province.txt"""
>>> p.findall(s)
['state', 'district', 'province']

Bye,
bearophile



Re: Quote-aware string splitting

2005-04-26 Thread Paul McGuire
Quoted strings are surprisingly stateful, so that using a parser isn't
totally out of line.  Here is a pyparsing example with some added test
cases.  Pyparsing's quotedString built-in handles single or double
quotes (if you don't want to be this permissive, there are also
sglQuotedString and dblQuotedString to choose from), plus escaped quote
characters.

The snippet below includes two samples.  The first 3 lines give the
equivalent to other suggestions on this thread.  It is followed by a
slightly enhanced version that strips quotation marks from any quoted
entries.

-- Paul
(get pyparsing at http://pyparsing.sourceforge.net)
==
from pyparsing import *
test = r'''spam 'it don\'t mean a thing' "the life of brian"
   42 'the meaning of life' grail'''
print OneOrMore( quotedString | Word(printables) ).parseString( test )

# strip quotes during parsing
def stripQuotes(s,l,toks):
    return toks[0][1:-1]
quotedString.setParseAction( stripQuotes )
print OneOrMore( quotedString | Word(printables) ).parseString( test )
==

returns:
['spam', "'it don\\'t mean a thing'", '"the life of brian"', '42',
'\'the meaning of life\'', 'grail']
['spam', "it don\\'t mean a thing", 'the life of brian', '42', 'the
meaning of life', 'grail']



Re: Quote-aware string splitting

2005-04-26 Thread Jeffrey Froman
Bengt Richter wrote:

 Oops, note some spaces inside quotes near ss and missing double quotes in
 result.

And here I thought the main problem with my answer was that it didn't split
unquoted segments into separate words at all! Clearly I missed the
generalization being sought, and a more robust solution is in order.
Fortunately, others have been forthcoming with them.

Thank you,
Jeffrey


Quote-aware string splitting

2005-04-25 Thread J. W. McCall
Hello,
I need to split a string as per string.strip(), but with a modification: 
I want it to recognize quoted strings and return them as one list item, 
regardless of any whitespace within the quoted string.

For example, given the string:
'spam "the life of brian" 42'
I'd want it to return:
['spam', 'the life of brian', '42']
I see no standard library function to do this, so what would be the most 
simple way to achieve this?  This should be simple, but I must be tired 
as I'm not currently able to think of an elegant way to do this.

Any ideas?
Thanks,
J. W. McCall


Re: Quote-aware string splitting

2005-04-25 Thread Tim Heaney
J. W. McCall [EMAIL PROTECTED] writes:

 I need to split a string as per string.strip(), but with a
 modification: I want it to recognize quoted strings and return them as
 one list item, regardless of any whitespace within the quoted string.

 For example, given the string:

 'spam "the life of brian" 42'

 I'd want it to return:

 ['spam', 'the life of brian', '42']

 I see no standard library function to do this, so what would be the
 most simple way to achieve this?  This should be simple, but I must be
 tired as I'm not currently able to think of an elegant way to do this.

 Any ideas?

How about the csv module? It seems like it might be overkill, but it
does already handle that sort of quoting

  >>> import csv
  >>> csv.reader(['spam "the life of brian" 42'], delimiter=' ').next()
  ['spam', 'the life of brian', '42']



RE: Quote-aware string splitting

2005-04-25 Thread Tony Meyer
 I need to split a string as per string.strip(), but with a 
 modification: 
 I want it to recognize quoted strings and return them as one 
 list item, 
 regardless of any whitespace within the quoted string.

See the recent python-tutor thread starting here:

http://mail.python.org/pipermail/tutor/2005-April/037288.html

For various solutions.  Or just use a regular expression, which is what the
thread concludes.

=Tony.Meyer



Re: Quote-aware string splitting

2005-04-25 Thread George Sakkis
 J. W. McCall [EMAIL PROTECTED] writes:
 
  I need to split a string as per string.strip(), but with a
  modification: I want it to recognize quoted strings and return them as
  one list item, regardless of any whitespace within the quoted string.
 
  For example, given the string:
 
  'spam "the life of brian" 42'
 
  I'd want it to return:
 
  ['spam', 'the life of brian', '42']
 
  I see no standard library function to do this, so what would be the
  most simple way to achieve this?  This should be simple, but I must be
  tired as I'm not currently able to think of an elegant way to do this.
 
  Any ideas?

 How about the csv module? It seems like it might be overkill, but it
 does already handle that sort of quoting

   import csv
   csv.reader(['spam "the life of brian" 42'], delimiter=' ').next()
   ['spam', 'the life of brian', '42']



I don't know if this is as good as CSV's splitter, but it works
reasonably well for me:

import re
regex = re.compile(r'''
   '.*?' |  # single quoted substring
   ".*?" |  # double quoted substring
   \S+  # all the rest
   ''', re.VERBOSE)

print regex.findall('''
This is 'single quoted string'
followed by a "double 'quoted' string"
''')

George



Re: Quote-aware string splitting

2005-04-25 Thread Jeffrey Froman
J. W. McCall wrote:

 For example, given the string:
 
 'spam "the life of brian" 42'
 
 I'd want it to return:
 
 ['spam', 'the life of brian', '42']

The .split() method of strings can take a substring, such as a quotation
mark, as a delimiter. So a simple solution is:

>>> x = 'spam "the life of brian" 42'
>>> [z.strip() for z in x.split('"')]
['spam', 'the life of brian', '42']


Jeffrey


Re: Quote-aware string splitting

2005-04-25 Thread George Sakkis
 import re
 regex = re.compile(r'''
'.*?' |  # single quoted substring
".*?" |  # double quoted substring
\S+  # all the rest
''', re.VERBOSE)

Oh, and if your strings may span more than one line, replace re.VERBOSE
with re.VERBOSE | re.DOTALL.
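
A Python 3 sketch of the same tokenizer that also strips the surrounding
quotes. It swaps the lazy '.*?' for a negated character class, which matches
across newlines without re.DOTALL; that substitution, and the names TOKEN,
unquote and tokens, are choices made for this illustration.

```python
import re

TOKEN = re.compile(r"""
    '[^']*'     # single-quoted substring
  | "[^"]*"     # double-quoted substring
  | \S+         # any other run of non-whitespace
""", re.VERBOSE)


def unquote(tok):
    """Drop one pair of matching quotes around tok, if present."""
    if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
        return tok[1:-1]
    return tok


def tokens(text):
    return [unquote(t) for t in TOKEN.findall(text)]


print(tokens('spam "the life of brian" 42'))
```

An apostrophe in the middle of a word (Don't) is picked up by the \S+
branch and kept as is, but a word that begins with one would still be read
as the start of a quoted run.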

George
