question on regular expressions

2004-12-03 Thread Darren Dale

I'm stuck. I'm trying to make this:

file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
%5Cfolderx%5Cfoldery%5Cmydoc2.pdf

(no linebreaks) look like this:

./mydoc1.pdf,./mydoc2.pdf

My regular expression abilities are dismal. I won't list all the
unsuccessful things I've tried; in a nutshell, the greedy operators are
messing me up, truncating the output to ./mydoc2.pdf. Could someone offer a
suggestion?

Thanks,
Darren
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: question on regular expressions

2004-12-03 Thread Sean Ross

Darren Dale [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
> I'm stuck. I'm trying to make this:
>
> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
> %5Cfolderx%5Cfoldery%5Cmydoc2.pdf
>
> (no linebreaks) look like this:
>
> ./mydoc1.pdf,./mydoc2.pdf
>
> my regular expression abilities are dismal. I won't list all the
> unsuccessful things I've tried, in a nutshell, the greedy operators are
> messing me up, truncating the output to ./mydoc2.pdf. Could someone offer
> a suggestion?
>
> Thanks,
> Darren

from os.path import basename
import urllib

url = 'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf'
print './%s'%basename(urllib.url2pathname(url))
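
The snippet above is Python 2 and handles a single URL, and url2pathname
behaves differently on Windows and elsewhere. A minimal Python 3 sketch of
the same idea, applied to the whole comma-separated string, might look like
the following. It assumes the file names never contain commas, and it swaps
in urllib.parse.unquote plus ntpath.basename (the Windows flavour of
os.path) so the result does not depend on the platform. The function name
to_relative is made up for illustration.

```python
import ntpath
from urllib.parse import unquote


def to_relative(urls):
    """Turn 'file://C:%5Ca%5Cx.pdf,...' into './x.pdf,...'."""
    names = []
    for url in urls.split(','):
        path = unquote(url)  # '%5C' becomes a backslash
        # ntpath.basename splits on backslashes (and slashes) on any OS
        names.append('./' + ntpath.basename(path))
    return ','.join(names)


s = ('file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,'
     'file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf')
print(to_relative(s))
```

For the example string this should print ./mydoc1.pdf,./mydoc2.pdf.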

HTH,
Sean




RE: question on regular expressions

2004-12-03 Thread Robert Brewer

Darren Dale wrote:
> I'm stuck. I'm trying to make this:
>
> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
> %5Cfolderx%5Cfoldery%5Cmydoc2.pdf
>
> (no linebreaks) look like this:
>
> ./mydoc1.pdf,./mydoc2.pdf

Regular expressions are much easier to write when you only have to worry
about single characters. So the first step might be to replace all of
the %5C's with \:

>>> a
'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf'
>>> a = a.replace("%5C", "\\")
>>> a
'file://C:\\folder1\\folder2\\mydoc1.pdf,file://C\\folderx\\foldery\\mydoc2.pdf'


Then you can use something like:

>>> import re
>>> re.findall(r"([^\\]*\.[^,]*)(?:,|$)", a)
['mydoc1.pdf', 'mydoc2.pdf']

...or Sean Ross' suggestion about urllib.
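
Put together as a script, the two steps above could be sketched as below.
This is only an illustration in Python 3: the final join that adds the './'
prefix is an addition to get the output format asked for, and the variable
names are arbitrary.

```python
import re

a = ('file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,'
     'file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf')

# Step 1: turn every '%5C' into a single backslash character.
a = a.replace('%5C', '\\')

# Step 2: grab each run of non-backslash characters that contains a dot
# and ends at a comma or at the end of the string.
names = re.findall(r'([^\\]*\.[^,]*)(?:,|$)', a)

# Added step: rebuild the comma-separated list with a './' prefix.
result = ','.join('./' + name for name in names)
print(result)
```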


Robert Brewer
MIS
Amor Ministries
[EMAIL PROTECTED]

Re: question on regular expressions

2004-12-03 Thread Darren Dale

Michael Fuhr wrote:

> Darren Dale [EMAIL PROTECTED] writes:
>
>> I'm stuck. I'm trying to make this:
>>
>> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
>> %5Cfolderx%5Cfoldery%5Cmydoc2.pdf
>>
>> (no linebreaks) look like this:
>>
>> ./mydoc1.pdf,./mydoc2.pdf
>>
>> my regular expression abilities are dismal.
>
> This works for the example string you gave:
>
> newstring = re.sub(r'[^,]*%5[Cc]', './', examplestring)
>
> This replaces all instances of zero or more non-commas that are
> followed by '%5C' or '%5c' with './'.  Greediness causes the pattern
> to replace everything up to the last '%5C' before a comma or the
> end of the string.
>
> Regular expressions aren't the only way to do what you want.  Python
> has standard modules for parsing URLs and file paths -- take a look
> at urlparse, urllib/urllib2, and os.path.
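
To make the greediness point concrete, here is a small Python 3 sketch that
runs the quoted pattern next to a too-greedy one. The '.*%5C' pattern is
only a guess at the kind of attempt that truncates the output to
./mydoc2.pdf; it is not taken from the thread.

```python
import re

examplestring = ('file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,'
                 'file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf')

# Hypothetical failing attempt: '.' also matches commas, so the greedy
# '.*' runs to the last '%5C' in the whole string.
too_greedy = re.sub(r'.*%5C', './', examplestring)

# The pattern from the message: '[^,]*' cannot cross a comma, so each
# match stops at the last '%5C' within its own comma-separated item.
newstring = re.sub(r'[^,]*%5[Cc]', './', examplestring)

print(too_greedy)
print(newstring)
```

The first line printed should be ./mydoc2.pdf and the second
./mydoc1.pdf,./mydoc2.pdf.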
 

Thanks to both of you. I thought regular expressions were appropriate because
the string I gave is buried in an XML file. A more representative example is:

[...snip...]<url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>[...snip...
data]<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url>[...snip...]
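
For URLs buried in markup like this, one possible approach is a non-greedy
match per element. The sketch below is Python 3 and rests on assumptions:
that each URL sits inside a <url>...</url> element, and that the surrounding
XML looks roughly like the made-up xml_text string. The helper name shorten
is also invented for illustration.

```python
import ntpath
import re
from urllib.parse import unquote

# Made-up stand-in for the XML file contents; only the <url> elements matter.
xml_text = ('<item><url>file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf</url>'
            '<note>other data</note>'
            '<url>file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf</url></item>')


def shorten(match):
    """Replace one <url> element's content with './<file name>'."""
    name = ntpath.basename(unquote(match.group(1)))
    return '<url>./' + name + '</url>'


# '.*?' is non-greedy, so each match ends at the nearest '</url>'
# instead of running on to the last one in the file.
new_text = re.sub(r'<url>(.*?)</url>', shorten, xml_text)
print(new_text)
```

If the file is well-formed XML, a real parser such as
xml.etree.ElementTree may be a more robust choice than a regular expression.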