question on regular expressions
I'm stuck. I'm trying to make this:

    file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

    ./mydoc1.pdf,./mydoc2.pdf

My regular expression abilities are dismal. I won't list all the unsuccessful things I've tried; in a nutshell, the greedy operators are messing me up, truncating the output to ./mydoc2.pdf. Could someone offer a suggestion?

Thanks,
Darren
--
http://mail.python.org/mailman/listinfo/python-list
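The pattern that produced the truncated output is not given, so the following is only a guess at the kind of greedy expression that behaves this way, next to one bounded variant. The sample string assumes the stray space after the second "C" is a wrapping artifact, and the names `s`, `greedy` and `bounded` are invented for illustration.

```python
import re

s = ("file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,"
     "file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf")

# Hypothetical greedy attempt: ".*" runs to the LAST "%5C" in the whole
# string, so the first file name is swallowed along with everything else.
greedy = re.sub(r"file://.*%5C", "./", s)

# Excluding the comma keeps each match inside one list item.
bounded = re.sub(r"file://[^,]*%5C", "./", s)

print(greedy)
print(bounded)
```

The only difference between the two patterns is the character class: `.` may cross the comma that separates the two URLs, while `[^,]` may not.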
Re: question on regular expressions
Darren Dale wrote:
> I'm stuck. I'm trying to make this:
> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf
> (no linebreaks) look like this:
> ./mydoc1.pdf,./mydoc2.pdf
> my regular expression abilities are dismal. I won't list all the unsuccessful things I've tried, in a nutshell, the greedy operators are messing me up, truncating the output to ./mydoc2.pdf. Could someone offer a suggestion?
> Thanks,
> Darren

    # Python 2 syntax, as posted.
    from os.path import basename
    import urllib
    url = 'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf'
    # url2pathname() decodes %5C to a backslash; basename() splits on
    # backslashes only where os.path is ntpath, i.e. on Windows.
    print './%s'%basename(urllib.url2pathname(url))

HTH,
Sean
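For readers on Python 3, the same idea might be sketched as follows. This is not Sean's code: it swaps in `urllib.parse.unquote` and `ntpath.basename` so that it behaves the same on any platform, and the helper name `to_relative` is invented for illustration. It also assumes the URLs are separated by plain commas.

```python
import ntpath
from urllib.parse import unquote


def to_relative(urls: str) -> str:
    """Turn a comma-separated list of file:// URLs into ./name entries."""
    names = []
    for url in urls.split(","):
        # unquote() decodes %5C to a backslash; ntpath.basename() splits on
        # Windows separators regardless of the platform running the code.
        names.append("./" + ntpath.basename(unquote(url)))
    return ",".join(names)


result = to_relative(
    "file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,"
    "file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf"
)
print(result)
```

Using `ntpath` directly instead of `os.path` is a deliberate choice here: the paths are Windows paths no matter where the script runs.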
RE: question on regular expressions
Darren Dale wrote:
> I'm stuck. I'm trying to make this:
> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf
> (no linebreaks) look like this:
> ./mydoc1.pdf,./mydoc2.pdf

Regular expressions are much easier to write when you only have to worry about single characters. So the first step might be to replace all of the %5C's with \:

    >>> a
    'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf'
    >>> a = a.replace("%5C", "\\")
    >>> a
    'file://C:\\folder1\\folder2\\mydoc1.pdf,file://C\\folderx\\foldery\\mydoc2.pdf'

Then you can use something like:

    >>> import re
    >>> re.findall(r"([^\\]*\.[^,]*)(?:,|$)", a)
    ['mydoc1.pdf', 'mydoc2.pdf']

...or Sean Ross' suggestion about urllib.

Robert Brewer
MIS
Amor Ministries
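Put together as a script, the two steps above plus a final join might look like this. The join step and the variable names `names` and `result` are additions for illustration and are not part of Robert's session.

```python
import re

a = ("file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,"
     "file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf")

# Step 1: collapse each three-character "%5C" escape into one backslash.
a = a.replace("%5C", "\\")

# Step 2: capture a run of non-backslashes that contains a dot and ends at
# a comma or at the end of the string.
names = re.findall(r"([^\\]*\.[^,]*)(?:,|$)", a)

# Rebuild the comma-separated list with the "./" prefix asked for.
result = ",".join("./" + name for name in names)
print(result)
```

Because a backslash is a single character, the negated class `[^\\]` can stop the match at the last path separator, which is not possible with the three-character `%5C`.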
Re: question on regular expressions
Michael Fuhr wrote:
> Darren Dale writes:
>> I'm stuck. I'm trying to make this:
>> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf
>> (no linebreaks) look like this:
>> ./mydoc1.pdf,./mydoc2.pdf
>> my regular expression abilities are dismal.
>
> This works for the example string you gave:
>
>     newstring = re.sub(r'[^,]*%5[Cc]', './', examplestring)
>
> This replaces all instances of zero or more non-commas that are followed by '%5C' or '%5c' with './'. Greediness causes the pattern to replace everything up to the last '%5C' before a comma or the end of the string.
>
> Regular expressions aren't the only way to do what you want. Python has standard modules for parsing URLs and file paths -- take a look at urlparse, urllib/urllib2, and os.path.

Thanks to both of you. I thought re's were appropriate because the string I gave is buried in an xml file. A more representative example is:

    [...snip...]<url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>[...snip... data]<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url>[...snip...]
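Once the URLs sit inside XML, commas no longer delimit them, so the comma-based pattern can match across tags. One possible adaptation, offered only as a sketch, is to anchor on the opening tag and forbid "<" inside the match. The snipped parts of the real file are unknown, so the `<record>` and `<data>` wrapper below is invented; only the two `<url>` elements come from the thread.

```python
import re

# Hypothetical stand-in for the snipped XML; the <record>/<data> wrapper is
# made up for illustration and is not from the original file.
xml = (
    "<record><url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>"
    "<data>1,2,3</data>"
    "<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url></record>"
)

# [^<]* cannot cross a tag boundary, so each match stays inside one <url>
# element; greediness then carries it to the last %5C of that element.
pattern = re.compile(r"(<url>)file://[^<]*%5[Cc]")
result = pattern.sub(r"\1./", xml)
print(result)
```

This assumes the element is literally named `url`, has no attributes, and that the path never contains "<". If the file is well-formed XML, parsing it with an XML library and rewriting the text of each `url` element would be a more robust alternative.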