Re: String splitting with exceptions
On Wednesday, 28 August 2013 18:44:53 UTC+2, John Levine wrote:

> I have a crufty old DNS provisioning system that I'm rewriting and I hope
> improving in python. (It's based on tinydns if you know what that is.)
>
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square brackets.
> [...]

Basic idea: protect - split - unprotect

>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> r = s.replace('[IP6::', '***')
>>> a = r.split('::')
>>> a
['foo.[DOM]', '***4361:6368:6574]:600', '']
>>> a[1] = a[1].replace('***', '[IP6::')
>>> a
['foo.[DOM]', '[IP6::4361:6368:6574]:600', '']

jmf

--
http://mail.python.org/mailman/listinfo/python-list
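jmf's version only hides the "[IP6::" prefix and splits on "::", so "600" stays glued to the bracketed field. Below is a minimal sketch of the same protect - split - unprotect idea applied to every colon inside brackets. It assumes brackets are never nested and that the NUL character never occurs in a record. The placeholder and the function name are hypothetical choices for illustration.

```python
import re

PLACEHOLDER = "\x00"  # hypothetical placeholder; assumed never to occur in a record


def protected_split(record):
    """Split on colons that are not inside [...] (no nesting assumed)."""
    # Protect: swap every colon inside a bracketed chunk for the placeholder.
    protected = re.sub(
        r"\[[^\]]*\]",
        lambda m: m.group(0).replace(":", PLACEHOLDER),
        record,
    )
    # Split on the colons that are left, then unprotect each field.
    return [field.replace(PLACEHOLDER, ":") for field in protected.split(":")]


s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
print(protected_split(s))
```

Like str.split, this returns a trailing empty string for the final colon.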
String splitting with exceptions
I have a crufty old DNS provisioning system that I'm rewriting and I hope
improving in python. (It's based on tinydns if you know what that is.)

The record formats are, in the worst case, like this:

foo.[DOM]::[IP6::4361:6368:6574]:600::

What I would like to do is to split this string into a list like this:

[ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]

Colons are separators except when they're inside square brackets. I have
been messing around with re.split() and re.findall() and haven't been able
to come up with either a working separator pattern for split() or a working
field pattern for findall(). I came pretty close with findall() but can't
get it to reliably match the nothing between two adjacent colons not inside
brackets.

Any suggestions? I realize I could do it in a loop where I pick stuff off
the front of the string, but yuck.

This is in python 2.7.5.

--
Regards,
John Levine, jo...@iecc.com, Primary Perpetrator of The Internet for Dummies,
Please consider the environment before reading this e-mail. http://jl.ly
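For the re.split() route asked about here, one separator pattern that appears to work, assuming brackets are balanced and never nested, is "a colon that is not followed by a ']' before the next '['". This is a sketch, not tested against real tinydns data, and the names are made up for illustration:

```python
import re

# A colon is a separator unless a ']' appears after it before any '[',
# i.e. unless the colon sits inside a [...] chunk.
# Assumes brackets are balanced and not nested.
FIELD_SEP = re.compile(r":(?![^\[]*\])")


def split_record(record):
    return FIELD_SEP.split(record)


print(split_record("foo.[DOM]::[IP6::4361:6368:6574]:600::"))
print(split_record("foo.[one:two]::x"))
```

The lookahead rescans up to the next '[' at every colon, which is quadratic in the worst case but harmless for records this short. Because the separator never matches an empty string, the same pattern also works with re.split() on Python 2.7.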
Re: String splitting with exceptions
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> [...]
>
> Any suggestions?

Write a little parser that can handle the record format?

Skip
Re: String splitting with exceptions
On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
> [...]
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square brackets.
> [...]

Can you have brackets within brackets? If so, this is impossible to deal
with within a regex. Otherwise:

>>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

I'm not sure why _your_ list only has one empty string at the end. Is the
record always terminated by a colon that is not meant to imply an empty
field after it? If so, remove the question mark:

>>> re.findall('((?:[^[:]|\[[^]]*\])*):',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

I've done this kind of thing before (for validation, not capturing) for
email addresses (there are some obscure bits of email address syntax that
need it), so it came to mind immediately.
Re: String splitting with exceptions
On 2013-08-28 13:14, random...@fastmail.us wrote:
> On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
>> [...]
>> The record formats are, in the worst case, like this:
>>
>> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> Otherwise:
>
> >>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
>
> I'm not sure why _your_ list only has one empty string at the end.

I wondered that. I also wondered about bracketed quoting that doesn't
start at the beginning of a field:

foo.[one:two]::[IP6::1234:5678:9101]:600::
    ^

This might be bogus, or one might want to catch this case.

-tkc
Re: String splitting with exceptions
On 2013-08-28, John Levine <jo...@iecc.com> wrote:
> [...]
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square brackets.
> [...] I realize I could do it in a loop where I pick stuff off the
> front of the string, but yuck.

A little parser, as Skip suggested, is a good way to go.

The brackets make your string context-sensitive, a difficult concept to
cleanly parse with a regex.

I initially hoped a csv module dialect could work, but the quote character
is (currently) hard-coded to be a single, simple character, i.e., I can't
tell it to treat [xxx] as xxx.

What about Skip's suggestion? A little parser. It might seem crass or
something, but it really is easier than muscling a regex into a
context-sensitive grammar.

def dns_split(s):
    in_brackets = False
    b = 0 # index of beginning of current string
    for i, c in enumerate(s):
        if not in_brackets:
            if c == "[":
                in_brackets = True
            elif c == ':':
                yield s[b:i]
                b = i+1
        elif c == "]":
            in_brackets = False

print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

It'll gag on nested brackets (fixable with a counter) and has no error
handling (requires thought), but it's a start.

--
Neil Cerutti
Re: String splitting with exceptions
On 2013-08-28, Tim Chase <python.l...@tim.thechases.com> wrote:
> On 2013-08-28 13:14, random...@fastmail.us wrote:
>> On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
>>> [...]
>>> The record formats are, in the worst case, like this:
>>>
>>> foo.[DOM]::[IP6::4361:6368:6574]:600::
>>
>> Otherwise:
>>
>> >>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
>> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
>>
>> I'm not sure why _your_ list only has one empty string at the end.
>
> I wondered that.

Good point. My little parser fails on that, too. It'll miss *all* final
fields. My parser needs

    if s:
        yield s[b:]

at the end, to operate like str.split, where the empty string is special.

--
Neil Cerutti
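Folding the two fixes from this thread into the generator (the final yield, and a depth counter for nested brackets) gives something like the sketch below. Raising ValueError on unbalanced brackets is an assumed policy for the error handling Neil leaves open. This version also drops the `if s:` guard, so an empty record yields [''], the same as ''.split(':').

```python
def dns_split(s):
    """Yield the colon-separated fields of s, ignoring colons inside [...].

    Nested brackets are tracked with a depth counter. Unbalanced brackets
    raise ValueError (an assumed policy, not part of the record format).
    """
    depth = 0
    b = 0  # index of the beginning of the current field
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            yield s[b:i]
            b = i + 1
    if depth:
        raise ValueError("unclosed '[' in %r" % s)
    yield s[b:]  # the final field, which the first version missed


s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
print(list(dns_split(s)))
```

Because it is a generator, fields before a bracket error are yielded before the ValueError is raised. Wrap the call in list() if you want all-or-nothing behaviour.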
Re: String splitting with exceptions
Neil Cerutti wrote:

> On 2013-08-28, John Levine <jo...@iecc.com> wrote:
>> [...]
>> Colons are separators except when they're inside square brackets.
>> [...]
>
> A little parser, as Skip suggested, is a good way to go.
>
> [...]
>
> def dns_split(s):
>     in_brackets = False
>     b = 0 # index of beginning of current string
>     for i, c in enumerate(s):
>         if not in_brackets:
>             if c == "[":
>                 in_brackets = True
>             elif c == ':':
>                 yield s[b:i]
>                 b = i+1
>         elif c == "]":
>             in_brackets = False

I think you need one more yield outside the loop.

> print(list(dns_split(s)))
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
>
> It'll gag on nested brackets (fixable with a counter) and has no error
> handling (requires thought), but it's a start.

Something similar on top of regex:

>>> import re
>>> def split(s):
...     start = level = 0
...     for m in re.compile(r"[\[:\]]").finditer(s):
...         if m.group() == "[": level += 1
...         elif m.group() == "]":
...             assert level
...             level -= 1
...         elif level == 0:
...             yield s[start:m.start()]
...             start = m.end()
...     yield s[start:]
...
>>> list(split("a[b:c:]:d"))
['a[b:c:]', 'd']
>>> list(split("a[b:c[:]]:d"))
['a[b:c[:]]', 'd']
>>> list(split(""))
['']
>>> list(split(":"))
['', '']
>>> list(split(":x"))
['', 'x']
>>> list(split("[:x]"))
['[:x]']
>>> list(split(":[:x]"))
['', '[:x]']
>>> list(split(":[:[:]:x]"))
['', '[:[:]:x]']
>>> list(split("[:::]"))
['[:::]']
>>> s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
>>> list(split(s))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

Note that there is one more empty string which I believe the OP forgot.
Re: String splitting with exceptions
> Can you have brackets within brackets? If so, this is impossible to deal
> with within a regex.

Nope. It's a regular language, not a CFL.

> Otherwise:
>
> >>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

That seems to do it, thanks.

--
Regards,
John Levine, jo...@iecc.com, Primary Perpetrator of The Internet for Dummies,
Please consider the environment before reading this e-mail. http://jl.ly
String splitting by spaces question
Hi everyone,

I have to parse a string and split it on spaces. The problem is that the
string can include substrings enclosed in quotation marks, which must keep
their spaces. What I need is to go from a string like:

"This is an 'example string'"

to the following vector:

["This", "is", "an", "example string"]

What is the best way to achieve this?

Thanks in advance!
RE: String splitting by spaces question
Hi Everyone,

Can we use the rsplit function on an array or vector of strings? It works
for a single string but not for a vector.

Alemu

-----Original Message-----
From: python-list-bounces+atadesse=sunedison@python.org [mailto:python-list-bounces+atadesse=sunedison@python.org] On Behalf Of Massi
Sent: Wednesday, November 23, 2011 10:10 AM
To: python-list@python.org
Subject: String splitting by spaces question

Hi everyone,

I have to parse a string and split it on spaces. The problem is that the
string can include substrings enclosed in quotation marks, which must keep
their spaces. What I need is to go from a string like:

"This is an 'example string'"

to the following vector:

["This", "is", "an", "example string"]

What is the best way to achieve this?

Thanks in advance!
Re: String splitting by spaces question
On 23 November 2011 17:10, Massi <massi_...@msn.com> wrote:
> Hi everyone,
>
> I have to parse a string and split it on spaces. The problem is that the
> string can include substrings enclosed in quotation marks, which must
> keep their spaces. What I need is to go from a string like:
>
> "This is an 'example string'"
>
> to the following vector:

You mean list

> ["This", "is", "an", "example string"]

Here's a way:

>>> s = "This is an 'example string' with 'quotes again'"
>>> [x for i, p in enumerate(s.split("'")) for x in ([p] if i%2 else p.split())]
['This', 'is', 'an', 'example string', 'with', 'quotes again']

--
Arnaud
Re: String splitting by spaces question
Alemu Tadesse <atade...@sunedison.com> wrote:
> Can we use the rsplit function on an array or vector of strings? It works
> for a single string but not for a vector.
...
> I have to parse a string and split it on spaces. The problem is that the
> string can include substrings enclosed in quotation marks, which must
> keep their spaces. What I need is to go from a string like:
>
> "This is an 'example string'"
>
> to the following vector:
>
> ["This", "is", "an", "example string"]
>
> What is the best way to achieve this?
>
> Thanks in advance!

You can use a list comprehension:

l2 = [x.rsplit(...) for x in l]

But for the original question, maybe the csv module would be more useful:
you can change delimiters and quotechars to match your input:

import csv
reader = csv.reader(open("foo.txt", "rb"), delimiter=' ', quotechar="'")
for row in reader:
    print row

Nick
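csv.reader accepts any iterable of strings, not just a file, so the same idea can be tried on a single line. The sketch below uses a made-up function name. Note that in Python 3 a real file would be opened in text mode with newline='' rather than "rb".

```python
import csv


def quote_split(line, quotechar="'"):
    """Split line on single spaces, keeping quotechar-quoted chunks whole."""
    # A one-element list stands in for a file; next() takes the only row.
    return next(csv.reader([line], delimiter=" ", quotechar=quotechar))


print(quote_split("This is an 'example string'"))
print(quote_split("Don't split 'this one'"))
```

The quote character only opens a quoted field at the start of a field, so an apostrophe inside a word is kept as-is. The catch is that every single space is a delimiter, so two spaces in a row produce an empty field.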
Re: String splitting by spaces question
On Wed, Nov 23, 2011 at 12:10 PM, Massi <massi_...@msn.com> wrote:
> Hi everyone,
>
> I have to parse a string and split it on spaces. The problem is that the
> string can include substrings enclosed in quotation marks, which must
> keep their spaces. What I need is to go from a string like:
>
> "This is an 'example string'"
>
> to the following vector:
>
> ["This", "is", "an", "example string"]
>
> What is the best way to achieve this?

This sounds a lot like the way a shell parses arguments on the command
line. If that's your desire, python has a module in the standard library
that will help, called shlex (http://docs.python.org/library/shlex.html).
Particularly, shlex.split may do exactly what you want out of the box:

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
>>> import shlex
>>> s = "This is an 'example string'"
>>> shlex.split(s)
['This', 'is', 'an', 'example string']

--
Jerry
Re: String splitting by spaces question
http://docs.python.org/library/shlex.html
Re: String splitting by spaces question
In article <3f19e4c0-e010-4cb2-9f71-dd09e0d3c...@r9g2000vbw.googlegroups.com>, Massi says...
> Hi everyone,
>
> I have to parse a string and split it on spaces. The problem is that the
> string can include substrings enclosed in quotation marks, which must
> keep their spaces. What I need is to go from a string like:
>
> "This is an 'example string'"
>
> to the following vector:
>
> ["This", "is", "an", "example string"]
>
> What is the best way to achieve this?
>
> Thanks in advance!

Is this what you want?

import shlex
lText = "This is a 'short string' for you to read."
lWords = shlex.split(lText)
print lWords

produces,

['This', 'is', 'a', 'short string', 'for', 'you', 'to', 'read.']

Shlex can be found under 'Program Frameworks' under 'The Python Standard
Library' of ActivePython 2.7 documentation.

C:\Source\Python\New
Re: String splitting by spaces question
> "This is an 'example string'"

Don't forget to watch for things like:

Don't, Can't, Won't, I'll, He'll, Hor'davors, Mc'Kinly
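This matters for the shlex answers in this thread. shlex follows shell quoting rules, so shlex.split("Don't panic") raises ValueError ("No closing quotation"). One heuristic workaround is a small regex tokenizer, sketched below. It assumes that a single quote only opens a quoted chunk at the start of a word, and only closes one before whitespace or the end of the string. The rule and the names are illustrative, not a standard.

```python
import re
import shlex

# Either a '...' chunk that starts a word and ends before whitespace or the
# end of the string (group 1, quotes dropped), or any other run of
# non-space characters (group 2). Assumed heuristic: no escapes, no nesting.
TOKEN = re.compile(r"'([^']*)'(?!\S)|(\S+)")


def split_words(text):
    words = []
    for m in TOKEN.finditer(text):
        quoted, bare = m.groups()
        words.append(bare if bare is not None else quoted)
    return words


try:
    shlex.split("Don't panic")
except ValueError as exc:
    print("shlex.split failed:", exc)

print(split_words("Don't forget: I'll say 'hello there' now"))
```

A word that merely starts with an apostrophe, such as "'tis", never finds a closing quote and so falls through to the plain-word branch.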
Re: string splitting
[EMAIL PROTECTED] wrote:
> Hello,
>
> I have thousands of files that look something like this:
>
> wisconsin_state.txt
> french_guiana_district.txt
> central_african_republic_province.txt
>
> I need to extract the string between the *last* underscore and the
> extension. So based on the files above, I want returned:
>
> state
> district
> province

def extract(s):
    return s[s.rfind('_')+1:s.rfind('.')]

George
Re: string splitting
> Anyone have any ideas?

>>> l = "wisconsin_state.txt"
>>> l.split(".")[0].split("_")[-1]

Explanation:
---
the split(".")[0] part takes everything before the "."
the split("_")[-1] part selects the last element in the list of substrings
which are separated by "_"
string splitting
Hello,

I have thousands of files that look something like this:

wisconsin_state.txt
french_guiana_district.txt
central_african_republic_province.txt

I need to extract the string between the *last* underscore and the
extension. So based on the files above, I want returned:

state
district
province

My plan was to use .split or .find but I can't figure out how to locate
only the last underscore in the filename.

Anyone have any ideas?

Thanks.

R.D.
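Here is a sketch that uses os.path.splitext to drop the extension and str.rpartition to take whatever follows the last underscore. The function name is made up for illustration. One reason to prefer this over slicing with rfind('.') is that rfind returns -1 when a name has no dot, so a slice ending at s.rfind('.') silently drops the last character.

```python
import os.path


def part_after_last_underscore(filename):
    stem = os.path.splitext(filename)[0]  # "wisconsin_state.txt" -> "wisconsin_state"
    return stem.rpartition("_")[2]        # text after the last "_" (whole stem if none)


for name in ["wisconsin_state.txt",
             "french_guiana_district.txt",
             "central_african_republic_province.txt"]:
    print(part_after_last_underscore(name))
```

splitext only strips the last extension, so a name like "archive_part.tar.gz" gives "part.tar".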
Re: string splitting
On 16 Oct 2006 12:12:38 -0700, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have thousands of files that look something like this:
>
> wisconsin_state.txt
> french_guiana_district.txt
> central_african_republic_province.txt
>
> I need to extract the string between the *last* underscore and the
> extension. So based on the files above, I want returned:
>
> state
> district
> province
>
> My plan was to use .split or .find but I can't figure out how to locate
> only the last underscore in the filename.

>>> spam = 'central_african_republic_province.txt'
>>> spam.split('.')[0].rsplit('_', 1)[-1]
'province'

--
Cheers,
Simon B
[EMAIL PROTECTED]
http://www.brunningonline.net/simon/blog/
Re: string splitting
[EMAIL PROTECTED] wrote:
> Hello,
>
> I have thousands of files that look something like this:
>
> wisconsin_state.txt
> french_guiana_district.txt
> central_african_republic_province.txt
>
> I need to extract the string between the *last* underscore and the
> extension. So based on the files above, I want returned:
>
> state
> district
> province
>
> My plan was to use .split or .find but I can't figure out how to locate
> only the last underscore in the filename.
>
> Anyone have any ideas?
>
> Thanks.
>
> R.D.

Hi,

Try splitting the string on "." and using rfind to find the last instance
of "_", i.e.,

myStr = "wisconsin_state.txt"
pieces = myStr.split(".")
substr = pieces[0][pieces[0].rfind("_") + 1:]

--hiaips
Re: string splitting
Many thanks for your replies, hiaips and Simon!

R.D.
Re: string splitting
A pair of solutions:

>>> s = "central_african_republic_province.txt"
>>> s.rsplit("_", 1)[-1].split(".")[0]
'province'

>>> import re
>>> p = re.compile(r"_ ([^_]+) \.", re.VERBOSE)
>>> s = """\
... wisconsin_state.txt
... french_guiana_district.txt
... central_african_republic_province.txt"""
>>> p.findall(s)
['state', 'district', 'province']

Bye,
bearophile
Re: Quote-aware string splitting
Quoted strings are surprisingly stateful, so that using a parser isn't
totally out of line. Here is a pyparsing example with some added test
cases. Pyparsing's quotedString built-in handles single or double quotes
(if you don't want to be this permissive, there are also sglQuotedString
and dblQuotedString to choose from), plus escaped quote characters.

The snippet below includes two samples. The first 3 lines give the
equivalent to other suggestions on this thread. It is followed by a
slightly enhanced version that strips quotation marks from any quoted
entries.

-- Paul
(get pyparsing at http://pyparsing.sourceforge.net)

==
from pyparsing import *
test = r'''spam 'it don\'t mean a thing' "the life of brian" 42 "'the meaning of life'" grail'''
print OneOrMore( quotedString | Word(printables) ).parseString( test )

# strip quotes during parsing
def stripQuotes(s,l,toks):
    return toks[0][1:-1]
quotedString.setParseAction( stripQuotes )
print OneOrMore( quotedString | Word(printables) ).parseString( test )
==

returns:

['spam', "'it don\\'t mean a thing'", '"the life of brian"', '42', '"\'the meaning of life\'"', 'grail']
['spam', "it don\\'t mean a thing", 'the life of brian', '42', "'the meaning of life'", 'grail']
Re: Quote-aware string splitting
Bengt Richter wrote:
> Oops, note some spaces inside quotes near ss and missing double quotes in
> result.

And here I thought the main problem with my answer was that it didn't
split unquoted segments into separate words at all! Clearly I missed the
generalization being sought, and a more robust solution is in order.
Fortunately, others have been forthcoming with them.

Thank you,
Jeffrey
Quote-aware string splitting
Hello,

I need to split a string as per string.split(), but with a modification:
I want it to recognize quoted strings and return them as one list item,
regardless of any whitespace within the quoted string.

For example, given the string:

'spam "the life of brian" 42'

I'd want it to return:

['spam', 'the life of brian', '42']

I see no standard library function to do this, so what would be the
simplest way to achieve this? This should be simple, but I must be tired
as I'm not currently able to think of an elegant way to do this.

Any ideas?

Thanks,

J. W. McCall
Re: Quote-aware string splitting
J. W. McCall <[EMAIL PROTECTED]> writes:
> I need to split a string as per string.split(), but with a modification:
> I want it to recognize quoted strings and return them as one list item,
> regardless of any whitespace within the quoted string.
>
> For example, given the string:
>
> 'spam "the life of brian" 42'
>
> I'd want it to return:
>
> ['spam', 'the life of brian', '42']
>
> [...] Any ideas?

How about the csv module? It seems like it might be overkill, but it does
already handle that sort of quoting:

>>> import csv
>>> csv.reader(['spam "the life of brian" 42'], delimiter=' ').next()
['spam', 'the life of brian', '42']
RE: Quote-aware string splitting
> I need to split a string as per string.split(), but with a
> modification: I want it to recognize quoted strings and return them as
> one list item, regardless of any whitespace within the quoted string.

See the recent python-tutor thread starting here:

http://mail.python.org/pipermail/tutor/2005-April/037288.html

for various solutions. Or just use a regular expression, which is what the
thread concludes.

=Tony.Meyer
Re: Quote-aware string splitting
> J. W. McCall <[EMAIL PROTECTED]> writes:
>> I need to split a string as per string.split(), but with a modification:
>> I want it to recognize quoted strings and return them as one list item,
>> regardless of any whitespace within the quoted string.
>>
>> For example, given the string:
>>
>> 'spam "the life of brian" 42'
>>
>> I'd want it to return:
>>
>> ['spam', 'the life of brian', '42']
>>
>> [...] Any ideas?
>
> How about the csv module? It seems like it might be overkill, but it
> does already handle that sort of quoting:
>
> >>> import csv
> >>> csv.reader(['spam "the life of brian" 42'], delimiter=' ').next()
> ['spam', 'the life of brian', '42']

I don't know if this is as good as CSV's splitter, but it works reasonably
well for me:

import re
regex = re.compile(r'''
    '.*?' |    # single quoted substring
    ".*?" |    # double quoted substring
    \S+        # all the rest
    ''', re.VERBOSE)

print regex.findall('''
This is 'single quoted string' followed by a "double 'quoted' string"
''')

George
Re: Quote-aware string splitting
J. W. McCall wrote:
> For example, given the string:
>
> 'spam "the life of brian" 42'
>
> I'd want it to return:
>
> ['spam', 'the life of brian', '42']

The .split() method of strings can take a substring, such as a quotation
mark, as a delimiter. So a simple solution is:

>>> x = 'spam "the life of brian" 42'
>>> [z.strip() for z in x.split('"')]
['spam', 'the life of brian', '42']

Jeffrey
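As Jeffrey concedes elsewhere in the thread, splitting on the quote character leaves unquoted words glued together once there is more than one of them. Below is a quick comparison with shlex.split from the standard library, which works in POSIX mode by default and so also strips quotes and honours backslash escapes. The sample string is made up for illustration.

```python
import shlex

text = 'spam "the life of brian" 42 and more'

# Splitting on the quote character keeps the quoted part intact...
naive = [z.strip() for z in text.split('"')]
# ...but "42 and more" comes back as a single item.
print(naive)   # ['spam', 'the life of brian', '42 and more']

# shlex.split treats unquoted runs as separate words and quoted runs as one.
words = shlex.split(text)
print(words)   # ['spam', 'the life of brian', '42', 'and', 'more']
```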
Re: Quote-aware string splitting
> import re
> regex = re.compile(r'''
>     '.*?' |    # single quoted substring
>     ".*?" |    # double quoted substring
>     \S+        # all the rest
>     ''', re.VERBOSE)

Oh, and if your strings may span more than one line, replace re.VERBOSE
with re.VERBOSE | re.DOTALL.

George
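George's pattern returns quoted items with their quote marks still attached. The variant sketched below puts each alternative in a named group so the quotes can be dropped, and it includes the DOTALL flag from his follow-up. The group and function names are illustrative.

```python
import re

TOKEN = re.compile(r"""
      '(?P<single>.*?)'   # single-quoted substring, quotes dropped
    | "(?P<double>.*?)"   # double-quoted substring, quotes dropped
    | (?P<bare>\S+)       # all the rest
    """, re.VERBOSE | re.DOTALL)


def tokens(text):
    # Only one named group takes part in each match; lastgroup names it.
    return [m.group(m.lastgroup) for m in TOKEN.finditer(text)]


sample = "This is 'single quoted string'\nfollowed by a \"double 'quoted' string\""
print(tokens(sample))
```

With DOTALL a quoted chunk may contain newlines. Without it, an opening quote whose partner is on a later line falls back to the bare-word branch.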