Re: Newbie regular expression ?

2005-10-04 Thread len
Thanks everyone for your help.

I took the option of f1.lower().startswith("unq").

Len Sumnler

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie regular expression ?

2005-10-04 Thread Steve Holden
len wrote:
> I have the following statement and it works fine;
> 
> list1 = glob.glob('*.dat')
> 
> however I now have an additional requirement that the string must begin
> with any form of "UNQ,Unq,unq,..."
> 
> as an example if I had the following four files in the directory:
> 
> unq123abc.dat
> xy4223.dat
> myfile.dat
> UNQxyc123489-24.dat
> 
> only unq123abc.dat and UNQxyc123489-24.dat would be selected
> 
> I have read through the documentation and I am now so
> confussedd!!
> 
You don't need regular expressions. You want

 list1 = glob.glob("[Uu][Nn][Qq]*.dat")

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006  www.python.org/pycon/



Re: Newbie regular expression ?

2005-10-04 Thread Micah Elliott
On Oct 04, Micah Elliott wrote:
>$ man 3 fnmatch

Actually "man 7 glob" would be better (assuming you've got *nix). Also
note that globs are not regular expressions.  "pydoc glob" is another
reference.
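
To make the glob-versus-regex distinction concrete: the standard library's fnmatch.translate() turns a glob pattern into the equivalent regular expression, which can then be compiled case-insensitively. This is only a sketch in Python 3 syntax (the thread predates it); the helper name select_unq and the sample list are made up for illustration, and note that re.IGNORECASE also makes the ".dat" part case-insensitive.

```python
import fnmatch
import re

# Translate the glob into a regular expression, then compile it
# case-insensitively so "unq", "Unq", "UNQ", ... all match.
UNQ_DAT = re.compile(fnmatch.translate("unq*.dat"), re.IGNORECASE)


def select_unq(names):
    """Return the names that match unq*.dat, ignoring case (hypothetical helper)."""
    return [name for name in names if UNQ_DAT.match(name)]


# The four file names from the original question.
names = ["unq123abc.dat", "xy4223.dat", "myfile.dat", "UNQxyc123489-24.dat"]
selected = select_unq(names)
print(selected)
```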

-- 
Micah Elliott



Re: Newbie regular expression ?

2005-10-04 Thread Fredrik Lundh
"len" <[EMAIL PROTECTED]> wrote:

>I have the following statement and it works fine;
>
>list1 = glob.glob('*.dat')

that's a glob pattern, not a regular expression.

> however I now have an additional requirement that the string must begin
> with any form of "UNQ,Unq,unq,..."

list1 = glob.glob('*.dat')
list1 = [file for file in list1 if file.lower().startswith("unq")]
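
One caveat with the filter above: if the glob pattern ever includes a directory part, startswith() would test the directory name rather than the file name. A minimal sketch of the same filter that checks only the base name, in Python 3 syntax; the function names has_prefix and unq_files are hypothetical, not from the thread.

```python
import glob
import os


def has_prefix(path, prefix="unq"):
    """True if the file name (not the directory part) starts with prefix, ignoring case."""
    return os.path.basename(path).lower().startswith(prefix.lower())


def unq_files(pattern="*.dat"):
    """Glob for pattern, then keep only the files whose name starts with 'unq'."""
    return [path for path in glob.glob(pattern) if has_prefix(path)]


print(unq_files())
```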

 





Re: Newbie regular expression ?

2005-10-04 Thread Micah Elliott
On Oct 04, len wrote:
> I have the following statement and it works fine;
> 
> list1 = glob.glob('*.dat')
> 
> however I now have an additional requirement that the string must begin
> with any form of "UNQ,Unq,unq,..."
> 
> as an example if I had the following four files in the directory:
> 
> unq123abc.dat
> xy4223.dat
> myfile.dat
> UNQxyc123489-24.dat
> 
> only unq123abc.dat and UNQxyc123489-24.dat would be selected

If glob is your preferred means, one option is:

   $ touch unq1.dat UnQ1.dat unQ1.dat UNQ1.dat foo.dat
   $ python -c '
   > import glob
   > print glob.glob("[uU][nN][qQ]*.dat")
   > '
   ['unq1.dat', 'UnQ1.dat', 'unQ1.dat', 'UNQ1.dat']
   $ man 3 fnmatch

-- 
Micah Elliott



Re: Newbie regular expression ?

2005-10-04 Thread jepler
Here are two ideas that come to mind:
files = glob.glob("UNQ*.dat") + glob.glob("Unq*.dat") + glob.glob("unq*.dat")

files = [f for f in glob.glob("*.dat") if f[:3] in ("UNQ", "Unq", "unq")]

Jeff



Newbie regular expression ?

2005-10-04 Thread len
I have the following statement and it works fine;

list1 = glob.glob('*.dat')

however I now have an additional requirement that the string must begin
with any form of "UNQ,Unq,unq,..."

as an example if I had the following four files in the directory:

unq123abc.dat
xy4223.dat
myfile.dat
UNQxyc123489-24.dat

only unq123abc.dat and UNQxyc123489-24.dat would be selected

I have read through the documentation and I am now so
confussedd!!

Len Sumnler



Re: Newbie regular expression and whitespace question

2005-09-22 Thread Paul McGuire
"Fredrik Lundh" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]

>
> > timeit -s "import test" "test.test3()"
> 100 loops, best of 3: 6.73 msec per loop
>
> > timeit -s "import test" "test.test4()"
> 1 loops, best of 3: 27.8 usec per loop
>
> that's a 240x slowdown.  hmm.
>
> 
>
>
Well, what of it?  How fast does it have to be?  Is it a one-shot
conversion?  People tend to be willing to wait a bit longer for one-time
conversion programs.  What else is going on in this program?  Is this the
bottleneck?  Are we reading the input over the Internet through HTTP?

If I'm running this program and waiting for the results, 7 msec isn't
perceptibly slower than 28 usec - both are going to seem pretty much
instantaneous.  On the other hand, if I'm processing 100 files, then this
goes up to, um, .7 sec vs 3 msec.

There is no question, regexp's beat the pants off of pyparsing in raw
performance.  But this newsgroup has visited the raw performance issue many
times in the past, usually when responding to the "Python can't be very
fast, it's interpreted" argument.  Raw performance is just one aspect in
determining suitability of a given technical approach.

-- Paul




Re: Newbie regular expression and whitespace question

2005-09-22 Thread googleboy
Thanks for the great positive responses.  I was close with what I was
trying,  I guess,  but close only counts in horseshoes and um..
something else that close counts in.

:-)

googleboy



Re: Newbie regular expression and whitespace question

2005-09-22 Thread George Sakkis
"googleboy" <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I am trying to collapse an html table into a single line.  Basically,
> anytime I see ">" & "<" with nothing but whitespace between them,  I'd
> like to remove all the whitespace, including newlines. I've read the
> how-to and I have tried a bunch of things,  but nothing seems to work
> for me:
>
> [snip]

As others have shown you already, you need to use the sub method of the re 
module:

import re
regex = re.compile(r'>\s*<')
print regex.sub('><',data)
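
For anyone reading this with a current interpreter, here is the same substitution as a self-contained Python 3 sketch. The sample table is hypothetical (the one in the original post lost its tags in the archive), so treat the data as an assumption.

```python
import re

# Hypothetical sample data standing in for the poster's table.
data = """<table>
  <tr>
    <td>Introduction</td>
    <td>3</td>
  </tr>
</table>"""

# ">" followed by any amount of whitespace (including newlines) followed by "<".
collapse = re.compile(r">\s*<")

result = collapse.sub("><", data)
print(result)
```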

> For extra kudos (and I confess I have been so stuck on the above
> problem I haven't put much thought into how to do this one) I'd like to
> be able to measure the number of characters between the  & 
> tags, and then insert a newline character at the end of the next word
> after an arbitrary number of characters.   I am reading in to a
> script a bunch of paragraphs formatted for a webpage, but they're all
> on one big long line and I would like to split them for readability.

What I guess you want to do is wrap some text. Do not reinvent the wheel;
there's already a module for that:

import textwrap
print textwrap.fill(oneBigLongLine, 60)
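
A runnable Python 3 sketch of the same call, assuming a made-up input line (one_big_long_line here is just filler text). textwrap.fill() breaks only at whitespace, so no word is split and no line exceeds the requested width unless a single word is itself longer than that.

```python
import textwrap

# Hypothetical stand-in for a paragraph that arrived as one long line.
one_big_long_line = ("word " * 30).strip()

# Re-flow the text so that no line is longer than 60 characters.
wrapped = textwrap.fill(one_big_long_line, width=60)
print(wrapped)
```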

HTH,
George




Re: Newbie regular expression and whitespace question

2005-09-22 Thread Fredrik Lundh
Paul McGuire wrote:

> If you're absolutely stuck on using RE's, then others will have to step
> forward.  Meanwhile, here's a pyparsing solution (get pyparsing at
> http://pyparsing.sourceforge.net):

so, let's see.  using ...

from pyparsing import *
import re

data = """ ... table example from op ... """

def test1():
    LT = Literal("<")
    GT = Literal(">")
    collapsableSpace = GT + LT
    collapsableSpace.setParseAction( replaceWith("><") )
    return collapsableSpace.transformString(data)

def test2():
    return re.sub(">\s+<", "><", data)

I get

> timeit -s "import test" "test.test1()"
100 loops, best of 3: 6.8 msec per loop

> timeit -s "import test" "test.test2()"
1 loops, best of 3: 33.3 usec per loop

or in other words, five lines instead of one, and a 200x slowdown.

but alright, maybe we should precompile the expressions to get a
fair comparison.  adding

LT = Literal("<")
GT = Literal(">")
collapsableSpace = GT + LT
collapsableSpace.setParseAction( replaceWith("><") )

def test3():
    return collapsableSpace.transformString(data)

p = re.compile(">\s+<")

def test4():
    return p.sub("><", data)

to the first program, I get

> timeit -s "import test" "test.test3()"
100 loops, best of 3: 6.73 msec per loop

> timeit -s "import test" "test.test4()"
1 loops, best of 3: 27.8 usec per loop

that's a 240x slowdown.  hmm.
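
The regular-expression half of this comparison can be reproduced with nothing but the standard library. The following is only a sketch in Python 3 syntax: the test data is made up, the function names uncompiled and precompiled are hypothetical, and the pyparsing side is left out because it needs a third-party install. No timings are quoted here since they depend on the machine.

```python
import re
import timeit

# Hypothetical test data: a few hundred table rows with whitespace between tags.
data = "<tr>\n  <td>cell</td>\n</tr>\n" * 200


def uncompiled():
    # re.sub looks the pattern up in the module's internal cache on every call.
    return re.sub(r">\s+<", "><", data)


pattern = re.compile(r">\s+<")


def precompiled():
    # The pattern object is built once, outside the timed function.
    return pattern.sub("><", data)


for func in (uncompiled, precompiled):
    # Best of 3 runs, 100 calls per run, reported per call.
    seconds = min(timeit.repeat(func, number=100, repeat=3))
    print(f"{func.__name__}: {seconds / 100 * 1e6:.1f} usec per loop")
```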

 





Re: Newbie regular expression and whitespace question

2005-09-22 Thread Paul McGuire
"googleboy" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Hi.
>
> I am trying to collapse an html table into a single line.  Basically,
> anytime I see ">" & "<" with nothing but whitespace between them,  I'd
> like to remove all the whitespace, including newlines. I've read the
> how-to and I have tried a bunch of things,  but nothing seems to work
> for me:
>
> --
>
> table = open(r'D:\path\to\tabletest.txt', 'rb')
> strTable = table.read()
>
> #Below find the different sort of things I have tried, one at a time:
>
> strTable = strTable.replace(">\s<", "><") #I got this from the module
> docs
>
> strTable = strTable.replace(">.<", "><")
>
> strTable = ">\s+<".join(strTable)
>
> strTable = ">\s<".join(strTable)
>
> print strTable
>
> --
>
> The table in question looks like this:
>
> 
>   
>  
> Introduction
> 3
>   
>   
>  
>   
>   
> ONE
> Childraising for Parrots
> 11
>   
> 
>
>
>
> For extra kudos (and I confess I have been so stuck on the above
> problem I haven't put much thought into how to do this one) I'd like to
> be able to measure the number of characters between the  & 
> tags, and then insert a newline character at the end of the next word
> after an arbitrary number of characters.   I am reading in to a
> script a bunch of paragraphs formatted for a webpage, but they're all
> on one big long line and I would like to split them for readability.
>
> TIA
>
> Googleboy
>
If you're absolutely stuck on using RE's, then others will have to step
forward.  Meanwhile, here's a pyparsing solution (get pyparsing at
http://pyparsing.sourceforge.net):

---
from pyparsing import *

LT = Literal("<")
GT = Literal(">")

collapsableSpace = GT + LT  # matches with or without intervening whitespace
collapsableSpace.setParseAction( replaceWith("><") )

print collapsableSpace.transformString(data)
---

The reason this works is that pyparsing implicitly skips over whitespace
while looking for matches of collapsable space (a '>' followed by a '<').
When found, the parse action is triggered, which in this case, replaces
whatever was matched with the string "><".  Finally, the input data (in this
case your HTML table, stored in the string variable, data) is passed to
transformString, which scans for matches of the collapsableSpace expression,
runs the parse action when they are found, and returns the final transformed
string.

As for word wrapping within ... tags, there are at least two recipes
in the Python Cookbook for word wrapping.  Be careful, though, as many HTML
pages are very bad about omitting the trailing  tags.

-- Paul





Re: Newbie regular expression and whitespace question

2005-09-22 Thread Bruno Desthuilliers
googleboy a écrit :
> Hi.
> 
> I am trying to collapse an html table into a single line.  Basically,
> anytime I see ">" & "<" with nothing but whitespace between them,  I'd
> like to remove all the whitespace, including newlines. I've read the
> how-to and I have tried a bunch of things,  but nothing seems to work
> for me:
> 
> --
> 
> table = open(r'D:\path\to\tabletest.txt', 'rb')
> strTable = table.read()
> 
> #Below find the different sort of things I have tried, one at a time:
> 
> strTable = strTable.replace(">\s<", "><") #I got this from the module
> docs

 From which module's doc ?

">\s<" is the literal string ">\s<", not a regular expression. Please 
re-read the re module doc, and the re howto (you'll find a link to it in 
the re module's doc...)
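
The difference is easy to see side by side. A small sketch in Python 3 syntax, using a made-up two-cell snippet: str.replace() searches for the literal characters, while re.sub() interprets the pattern.

```python
import re

# Hypothetical snippet with a newline and indentation between two tags.
text = "<td>a</td>\n  <td>b</td>"

# str.replace looks for the four literal characters  > \ s <  which never
# occur in the text, so nothing changes.
unchanged = text.replace(r">\s<", "><")

# re.sub treats \s+ as "one or more whitespace characters".
collapsed = re.sub(r">\s+<", "><", text)

print(repr(unchanged))
print(repr(collapsed))
```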


Newbie regular expression and whitespace question

2005-09-22 Thread googleboy
Hi.

I am trying to collapse an html table into a single line.  Basically,
anytime I see ">" & "<" with nothing but whitespace between them,  I'd
like to remove all the whitespace, including newlines. I've read the
how-to and I have tried a bunch of things,  but nothing seems to work
for me:

--

table = open(r'D:\path\to\tabletest.txt', 'rb')
strTable = table.read()

#Below find the different sort of things I have tried, one at a time:

strTable = strTable.replace(">\s<", "><") #I got this from the module
docs

strTable = strTable.replace(">.<", "><")

strTable = ">\s+<".join(strTable)

strTable = ">\s<".join(strTable)

print strTable

--

The table in question looks like this:


  
 
Introduction
3
  
  
 
  
  
ONE
Childraising for Parrots
11
  




For extra kudos (and I confess I have been so stuck on the above
problem I haven't put much thought into how to do this one) I'd like to
be able to measure the number of characters between the  & 
tags, and then insert a newline character at the end of the next word
after an arbitrary number of characters.   I am reading in to a
script a bunch of paragraphs formatted for a webpage, but they're all
on one big long line and I would like to split them for readability.

TIA

Googleboy



Re: Simple (newbie) regular expression question

2005-01-21 Thread André Roberge
John Machin wrote:
> André Roberge wrote:
>> Sorry for the simple question, but I find regular
>> expressions rather intimidating.  And I've never
>> needed them before ...
>>
>> How would I go about to 'define' a regular expression that
>> would identify strings like
>> __alphanumerical__  as in __init__
>> (Just to spell things out, as I have seen underscores disappear
>> from messages before, that's  2 underscores immediately
>> followed by an alphanumerical string immediately followed
>> by 2 underscore; in other words, a python 'private' method).
>>
>> Simple one-liner would be good.
>> One-liner with explanation would be better.
>>
>> One-liner with explanation, and pointer to 'great tutorial'
>> (for future reference) would probably be ideal.
>> (I know, google is my friend for that last part. :-)
>>
>> Andre
>
> Firstly, some corrections: (1) google is your friend for _all_ parts of
> your question (2) Python has an initial P and doesn't have private
> methods.
>
> Read this:
>
> >>> pat1 = r'__[A-Za-z0-9_]*__'
> >>> pat2 = r'__\w*__'
> >>> import re
> >>> tests = ['x', '__', '____', '_____', '__!__', '__a__', '__Z__',
> ...          '__8__', '__xyzzy__', '__plugh']
> >>> [x for x in tests if re.search(pat1, x)]
> ['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__']
> >>> [x for x in tests if re.search(pat2, x)]
> ['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__']
>
> I've interpreted your question as meaning "valid Python identifier that
> starts and ends with two [implicitly, or more] underscores".
>
> In the two alternative patterns, the part in the middle says "zero or
> more instances of a character that can appear in the middle of a Python
> identifier". The first pattern spells this out as "capital letters,
> small letters, digits, and underscore". The second pattern uses the \w
> shorthand to give the same effect.
> You should be able to follow that from the Python documentation.
> Now, read this: http://www.amk.ca/python/howto/regex/
>
> HTH,
> John

Thanks for it all. It does help!
André


Re: Simple (newbie) regular expression question

2005-01-21 Thread John Machin

André Roberge wrote:
> Sorry for the simple question, but I find regular
> expressions rather intimidating.  And I've never
> needed them before ...
>
> How would I go about to 'define' a regular expression that
> would identify strings like
> __alphanumerical__  as in __init__
> (Just to spell things out, as I have seen underscores disappear
> from messages before, that's  2 underscores immediately
> followed by an alphanumerical string immediately followed
> by 2 underscore; in other words, a python 'private' method).
>
> Simple one-liner would be good.
> One-liner with explanation would be better.
>
> One-liner with explanation, and pointer to 'great tutorial'
> (for future reference) would probably be ideal.
> (I know, google is my friend for that last part. :-)
>
> Andre

Firstly, some corrections: (1) google is your friend for _all_ parts of
your question (2) Python has an initial P and doesn't have private
methods.

Read this:

>>> pat1 = r'__[A-Za-z0-9_]*__'
>>> pat2 = r'__\w*__'
>>> import re
>>> tests = ['x', '__', '____', '_____', '__!__', '__a__', '__Z__',
...          '__8__', '__xyzzy__', '__plugh']
>>> [x for x in tests if re.search(pat1, x)]
['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__']
>>> [x for x in tests if re.search(pat2, x)]
['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__']
>>>

I've interpreted your question as meaning "valid Python identifier that
starts and ends with two [implicitly, or more] underscores".

In the two alternative patterns, the part in the middle says "zero or
more instances of a character that can appear in the middle of a Python
identifier". The first pattern spells this out as "capital letters,
small letters, digits, and underscore". The second pattern uses the \w
shorthand to give the same effect.
You should be able to follow that from the Python documentation.
Now, read this: http://www.amk.ca/python/howto/regex/
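
One thing worth knowing when reusing these patterns: re.search() finds a match anywhere inside the string, so a name with extra characters around the double underscores would also be accepted. If the whole string has to fit the pattern, anchoring it is one option. A sketch in Python 3 syntax (re.fullmatch needs Python 3.4 or later; the helper name is_dunder is hypothetical):

```python
import re

# Same pattern as pat2 above.
DUNDER = re.compile(r"__\w*__")


def is_dunder(name):
    """True only if the entire string is two underscores, word characters, two underscores."""
    return DUNDER.fullmatch(name) is not None


for candidate in ["__init__", "__plugh", "x__init__y", "__!__"]:
    print(candidate, is_dunder(candidate))
```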

HTH,

John



Simple (newbie) regular expression question

2005-01-21 Thread André Roberge
Sorry for the simple question, but I find regular
expressions rather intimidating.  And I've never
needed them before ...
How would I go about to 'define' a regular expression that
would identify strings like
__alphanumerical__  as in __init__
(Just to spell things out, as I have seen underscores disappear
from messages before, that's  2 underscores immediately
followed by an alphanumerical string immediately followed
by 2 underscore; in other words, a python 'private' method).
Simple one-liner would be good.
One-liner with explanation would be better.
One-liner with explanation, and pointer to 'great tutorial'
(for future reference) would probably be ideal.
(I know, google is my friend for that last part. :-)
Andre