subject:"Help with regex"

Re: help with regex

2014-10-08 Thread Ben Finney

Peter Otten <__pete...@web.de> writes:

> >>> pattern = re.compile("(\d+)$")
> >>> match = pattern.search( "LINE: 235 : Primary Shelf Number (attempt 1): 1")
> >>> match.group()
> '1'

An alternative way to accomplish the above using the ‘match’ method::

>>> import re
>>> pattern = re.compile("^.*:(?: *)(\d+)$")
>>> match = pattern.match("LINE: 235 : Primary Shelf Number (attempt 1): 1")
>>> match.groups()
('1',)

> See 

Right. Always refer to the API documentation for the API you're
attempting to use.
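
For readers following along, the difference between the two methods can be sketched as below. This is a minimal Python 3 illustration using the sample line from the thread; the names `anchored`, `found` and `whole` are only for this example, and the third pattern is one possible way to make match() succeed, not the only one.

```python
import re

line = "LINE: 235 : Primary Shelf Number (attempt 1): 1"
pattern = re.compile(r"(\d+)$")

# match() only tries at position 0, where the text is "LINE", so it fails.
anchored = pattern.match(line)

# search() scans forward until the pattern fits somewhere in the string.
found = pattern.search(line)

# match() works once the pattern itself consumes everything before the digits.
whole = re.match(r".*:\s*(\d+)$", line)

print(anchored)          # None
print(found.group(1))    # 1
print(whole.group(1))    # 1
```

Checking the result for None before calling group() avoids the AttributeError that the original attempt would raise.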

-- 
 \“Without cultural sanction, most or all of our religious |
  `\  beliefs and rituals would fall into the domain of mental |
_o__) disturbance.” —John F. Schumaker |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: help with regex

2014-10-07 Thread Peter Otten

James Smith wrote:

> I want the last "1"
> I can't get this to work:
> 
 pattern=re.compile( "(\d+)$" )
 match=pattern.match( "LINE: 235 : Primary Shelf Number (attempt 1): 1")
 print match.group()

>>> pattern = re.compile("(\d+)$")
>>> match = pattern.search( "LINE: 235 : Primary Shelf Number (attempt 1): 1")
>>> match.group()
'1'


See 


help with regex

2014-10-07 Thread James Smith

I want the last "1"
I can't get this to work:

>>> pattern=re.compile( "(\d+)$" )
>>> match=pattern.match( "LINE: 235 : Primary Shelf Number (attempt 1): 1")
>>> print match.group()

Re: Help with regex needed

2011-04-12 Thread D'Arcy J.M. Cain

On Tue, 12 Apr 2011 22:11:55 +0300
Yuri Slobodyanyuk  wrote:
> Thanks for the insight, while this code will run once a week and
> optimization isn't really a must here, it is
> still  a good idea not to leave half-baked code behind me, especially given
> that it will be running on this server  for the next  13 years ;)

You don't want to embarrass yourself in 13 years, eh?  :-)

> I have one doubt though . Doesn't using the list comprehension here increase
> number of loops per the same string ?

Nope.  It just optimizes the code a bit.

> for nnn in [x.split() for x in hhh]:
> My vision here is that after doing the comprehension it would look:
> for nnn in [1st_field,2nd_field,3rd_field,...,nth_field]:

  for nnn in hhh:
becomes
  for nnn in ('a b c', 'd e f', 'g h i'):  # loops three times

  for nnn in [x.split() for x in hhh]:
becomes
  for nnn in [['a','b','c'],['d','e','f'],['g','h','i']]: # still three

> ... and therefore would do number of loops equal to number of fields while
> we really need just one ?

The number of loops is still the number of objects (strings) in hhh.
The only difference is that in the former case you need to split the
variable inside the loop causing the production and destruction of an
extra object.

Be careful though.  You can over-optimize your loops by doing too much in
the for statement.  You would be hard put to find a case where
optimization is worth more than readability.
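
The point about the loop count can be checked directly. The sketch below is Python 3 and uses the three example strings from above as a stand-in for the real file; the counter names are made up for the illustration.

```python
hhh = ["a b c", "d e f", "g h i"]

# Version 1: split inside the loop body.
plain_iterations = 0
for nnn in hhh:
    fields = nnn.split()
    plain_iterations += 1

# Version 2: split in a list comprehension up front.
split_rows = [x.split() for x in hhh]
comp_iterations = 0
for nnn in split_rows:
    comp_iterations += 1

print(plain_iterations, comp_iterations)  # 3 3
print(split_rows[0])                      # ['a', 'b', 'c']
```

Either way the loop body runs once per line; the comprehension only changes what each `nnn` is (a list of fields instead of a string).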

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

Re: Help with regex needed

2011-04-12 Thread Yuri Slobodyanyuk

Thanks for the insight, while this code will run once a week and
optimization isn't really a must here, it is
still  a good idea not to leave half-baked code behind me, especially given
that it will be running on this server  for the next  13 years ;)
I have one doubt though . Doesn't using the list comprehension here increase
number of loops per the same string ?
for nnn in [x.split() for x in hhh]:
My vision here is that after doing the comprehension it would look:
for nnn in [1st_field,2nd_field,3rd_field,...,nth_field]:
... and therefore would do number of loops equal to number of fields while
we really need just one ?

Thanks
Yuri


On Tue, Apr 12, 2011 at 3:50 PM, D'Arcy J.M. Cain  wrote:

> On Tue, 12 Apr 2011 15:06:25 +0300
> Yuri Slobodyanyuk  wrote:
> > Thanks everybody , and especially Chris - i used split and it took me 15
> > mins to make it work :)
>
> That's great.  One thing though...
>
> > for nnn in hhh:
> > if nnn.split()[2] == str(time_tuple[1]).strip(' \t\n\r')   and
> > nnn.split()[4]  == str(time_tuple[0]).strip(' \t\n\r') and
>  nnn.split()[3]
> > == str(time_tuple[2]).strip(' \t\n\r')   :
>
> You are running split() on the same string three times and running
> strip on the same time tuple each time through the loop.  I know that
> you shouldn't optimize before testing for bottlenecks but this is just
> egregious as well as making it more difficult to read.  Consider this.
>
> year = str(time_tuple[0]) # strip() not really needed here
> mon = str(time_tuple[1])
> day = str(time_tuple[2])
>
> for nnn in [x.split() for x in hhh]:
>     if nnn[2] == mon and nnn[3] == day and nnn[4] == year:
>
> If strip() were needed you could leave off the argument.  The default
> is to strip all whitespace from both ends.  In fact, read up on locales
> to see why it is a good idea to omit the argument.
>
> --
> D'Arcy J.M. Cain  |  Democracy is three wolves
> http://www.druid.net/darcy/|  and a sheep voting on
> +1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.
>



-- 
Taking challenges one by one.
http://yurisk.info
http://ccie-security-blog.com

Re: Help with regex needed

2011-04-12 Thread D'Arcy J.M. Cain

On Tue, 12 Apr 2011 15:06:25 +0300
Yuri Slobodyanyuk  wrote:
> Thanks everybody , and especially Chris - i used split and it took me 15
> mins to make it work :)

That's great.  One thing though...

> for nnn in hhh:
> if nnn.split()[2] == str(time_tuple[1]).strip(' \t\n\r')   and
> nnn.split()[4]  == str(time_tuple[0]).strip(' \t\n\r') and  nnn.split()[3]
> == str(time_tuple[2]).strip(' \t\n\r')   :

You are running split() on the same string three times and running
strip on the same time tuple each time through the loop.  I know that
you shouldn't optimize before testing for bottlenecks but this is just
egregious as well as making it more difficult to read.  Consider this.

year = str(time_tuple[0]) # strip() not really needed here
mon = str(time_tuple[1])
day = str(time_tuple[2])

for nnn in [x.split() for x in hhh]:
    if nnn[2] == mon and nnn[3] == day and nnn[4] == year:

If strip() were needed you could leave off the argument.  The default
is to strip all whitespace from both ends.  In fact, read up on locales
to see why it is a good idea to omit the argument.
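
A small Python 3 sketch of the strip() remarks follows. The tuple is a hard-coded stand-in for the first three fields of `datetime.now().timetuple()`, and the `raw` string is an invented example chosen to include a form feed.

```python
time_tuple = (2011, 4, 12)  # stand-in for datetime.now().timetuple()[:3]

# str() of an int never carries whitespace, so strip() changes nothing here.
year = str(time_tuple[0])
unchanged = (year == year.strip(' \t\n\r'))

# With no argument, strip() removes every kind of surrounding whitespace,
# including characters the explicit list ' \t\n\r' leaves out (e.g. form feed).
raw = "\x0c 2011 \t\r\n"
default_strip = raw.strip()
explicit_strip = raw.strip(' \t\n\r')

print(unchanged)             # True
print(repr(default_strip))   # '2011'
print(repr(explicit_strip))  # '\x0c 2011'
```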

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

Re: Help with regex needed

2011-04-12 Thread Yuri Slobodyanyuk

Thanks everybody , and especially Chris - i used split and it took me 15
mins to make it work :)

The final version looks like:

from datetime import datetime, date, time
today_day = datetime.now()
time_tuple= today_day.timetuple()
hhh = open("file_with_data.data",'r')
for nnn in hhh:
    if (nnn.split()[2] == str(time_tuple[1]).strip(' \t\n\r') and
            nnn.split()[4] == str(time_tuple[0]).strip(' \t\n\r') and
            nnn.split()[3] == str(time_tuple[2]).strip(' \t\n\r')):
        print nnn

Cheers and good day everyone
Yuri

On Tue, Apr 12, 2011 at 8:58 AM, Chris Rebert  wrote:

> On Mon, Apr 11, 2011 at 10:20 PM, Yuri Slobodyanyuk
>  wrote:
> > Good day everyone,
> > I am trying to make this pretty simple regex to work but got stuck,
> > I'd appreciate your help .
>
> "Some people, when confronted with a problem, think 'I know, I'll use
> regular expressions.' Now they have two problems."
>
> 
> > 111 Fri 4  8 2011
> > 2323232 Fri 4 15 2011
> > 4343434 Fri 4 22 2011
> > 8522298 Fri 4 29 2011
> > .
> > 5456678 Fri 10 28 2011
> > 563 Fri 11  4 2011
> > 4141411 Fri 11 11 2011
> > 332 Fri 11 18 2011
>
> There's no need to use regexes to parse such a simple file format.
> Just use str.split() [without any arguments] on each line of the file,
> and do the field equality checks yourself; your code will be simpler.
> Relevant docs: http://docs.python.org/library/stdtypes.html#str.split
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com
>



-- 
 Taking challenges one by one.
http://yurisk.info
http://ccie-security-blog.com

Re: Help with regex needed

2011-04-11 Thread Chris Rebert

On Mon, Apr 11, 2011 at 10:20 PM, Yuri Slobodyanyuk
 wrote:
> Good day everyone,
> I am trying to make this pretty simple regex to work but got stuck,
> I'd appreciate your help .

"Some people, when confronted with a problem, think 'I know, I'll use
regular expressions.' Now they have two problems."

> 111 Fri 4  8 2011
> 2323232 Fri 4 15 2011
> 4343434 Fri 4 22 2011
> 8522298 Fri 4 29 2011
> .
> 5456678 Fri 10 28 2011
> 563 Fri 11  4 2011
> 4141411 Fri 11 11 2011
> 332 Fri 11 18 2011

There's no need to use regexes to parse such a simple file format.
Just use str.split() [without any arguments] on each line of the file,
and do the field equality checks yourself; your code will be simpler.
Relevant docs: http://docs.python.org/library/stdtypes.html#str.split
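
As a rough sketch of the split-based approach (Python 3; the function name `number_for` and the in-memory `data` list are assumptions standing in for the real file), the lookup might look like this:

```python
from datetime import date

def number_for(lines, day):
    """Return the first column of the line whose date fields equal `day`, or None."""
    wanted = [str(day.month), str(day.day), str(day.year)]
    for line in lines:
        fields = line.split()   # no argument: any run of whitespace separates fields
        if fields[2:5] == wanted:
            return fields[0]
    return None

data = [
    "111 Fri 4  8 2011",
    "2323232 Fri 4 15 2011",
    "563 Fri 11  4 2011",
    "4141411 Fri 11 11 2011",
]
print(number_for(data, date(2011, 4, 8)))    # 111
print(number_for(data, date(2011, 11, 11)))  # 4141411
print(number_for(data, date(2011, 4, 11)))   # None
```

Because split() with no argument collapses runs of whitespace, the double space in "4  8" needs no special handling.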

Cheers,
Chris
--
http://blog.rebertia.com

Re: Help with regex needed

2011-04-11 Thread Chris Rebert

On Mon, Apr 11, 2011 at 10:20 PM, Yuri Slobodyanyuk
 wrote:
> Good day everyone,
> I am trying to make this pretty simple regex to work but got stuck,
> I'd appreciate your help .
> Task: Get current date , then read file of format below, find the line that
> matches
> the current date of month,month and year and extract the number from such
> line.
> Here is what I did , but if i run it at 11 April 2011 ...
>    - regex pattern_match  matches nothing
>    - regex pattern_test  matches the line "4141411 Fri 11 11 2011" , logical
> as it is the last matching line in year 2011 with the date of 11th.
> My question - why regex pattern_match  does not match anything and how to
> make it match the exact line i want.
> Thanks
> Yuri
>
> from datetime import datetime, date, time
> import re
> today_day = datetime.now()
> time_tuple= today_day.timetuple()
> pattern_match = re.compile("([0-9])+ +" +  "Fri +" + str(time_tuple[1]) + "
> +"  +   str(time_tuple[2]) + " +"  +  str(time_tuple[0]) + " +")
   ^^

This trailing " +" *requires* that the lines have trailing spaces. Do
they? Such files typically don't, and your example input doesn't
either (although that may be due to email formatting lossage).
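
The effect can be reproduced with a fixed date. In this Python 3 sketch the date is hard-coded to 11 11 2011 instead of being built from `time_tuple`, and the names `strict` and `relaxed` are only for illustration. It also shows a second quirk of the original pattern: `([0-9])+` repeats the group, so only the last digit is captured.

```python
import re

line = "4141411 Fri 11 11 2011"   # no trailing space, as in the sample data

# Built the way the original pattern_match was: note the final " +".
strict = re.compile("([0-9])+ +Fri +11 +11 +2011 +")
# Trailing " +" dropped and the quantifier moved inside the group.
relaxed = re.compile("([0-9]+) +Fri +11 +11 +2011")

print(strict.search(line))                    # None
print(strict.search(line + " ").group(1))     # 1
print(relaxed.search(line).group(1))          # 4141411
```

Note that a newline at the end of a line read from a file does not satisfy " +", which matches space characters only.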

Cheers,
Chris
--
http://blog.rebertia.com

> hhh = open("file_with_data.data",'r')
> pattern_test =  re.compile("([0-9]+)" + ".*"  + " +" +
> str(time_tuple[2]).strip(' \t\n\r') + " +"  + str(time_tuple[0]).strip('
> \t\n\r') )
> for nnn in hhh:
>     if  (re.search(pattern_test,nnn)):
>    print nnn.split()[0]
>
> 111 Fri 4  8 2011
> 2323232 Fri 4 15 2011
> 4343434 Fri 4 22 2011
> 8522298 Fri 4 29 2011
> .
> 5456678 Fri 10 28 2011
> 563 Fri 11  4 2011
> 4141411 Fri 11 11 2011
> 332 Fri 11 18 2011

Re: Help with regex needed

2011-04-11 Thread Kushal Kumaran

On Tue, Apr 12, 2011 at 10:50 AM, Yuri Slobodyanyuk
 wrote:
> Good day everyone,
> I am trying to make this pretty simple regex to work but got stuck,
> I'd appreciate your help .
> Task: Get current date , then read file of format below, find the line that
> matches
> the current date of month,month and year and extract the number from such
> line.
> Here is what I did , but if i run it at 11 April 2011 ...
>    - regex pattern_match  matches nothing
>    - regex pattern_test  matches the line "4141411 Fri 11 11 2011" , logical
> as it is the last matching line in year 2011 with the date of 11th.
> My question - why regex pattern_match  does not match anything and how to
> make it match the exact line i want.
> Thanks
> Yuri

Consider using datetime.strptime to parse dates and times.  You will
have to strip off the first column since it doesn't look like part of
the date itself.
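
One way this suggestion could be sketched in Python 3 is shown below. The helper name `parse_line` is hypothetical, and the weekday column is skipped rather than parsed so that the result does not depend on the locale's day names.

```python
from datetime import datetime, date

def parse_line(line):
    """Split a data line into (number, date); the weekday name is skipped."""
    fields = line.split()
    # Rejoin month, day and year with single spaces, then let strptime parse them.
    when = datetime.strptime(" ".join(fields[2:5]), "%m %d %Y").date()
    return fields[0], when

number, when = parse_line("111 Fri 4  8 2011")
print(number, when.isoformat())        # 111 2011-04-08
print(when == date(2011, 4, 8))        # True
```

The parsed value can then be compared with `date.today()` directly, with no string formatting involved.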

>
> from datetime import datetime, date, time
> import re
> today_day = datetime.now()
> time_tuple= today_day.timetuple()
> pattern_match = re.compile("([0-9])+ +" +  "Fri +" + str(time_tuple[1]) + "
> +"  +   str(time_tuple[2]) + " +"  +  str(time_tuple[0]) + " +")
> hhh = open("file_with_data.data",'r')
> pattern_test =  re.compile("([0-9]+)" + ".*"  + " +" +
> str(time_tuple[2]).strip(' \t\n\r') + " +"  + str(time_tuple[0]).strip('
> \t\n\r') )
> for nnn in hhh:
>     if  (re.search(pattern_test,nnn)):
>    print nnn.split()[0]
>
> 111 Fri 4  8 2011
> 2323232 Fri 4 15 2011
> 4343434 Fri 4 22 2011
> 8522298 Fri 4 29 2011
> .
> 5456678 Fri 10 28 2011
> 563 Fri 11  4 2011
> 4141411 Fri 11 11 2011
> 332 Fri 11 18 2011
>



-- 
regards,
kushal

Help with regex needed

2011-04-11 Thread Yuri Slobodyanyuk

Good day everyone,
I am trying to make this pretty simple regex to work but got stuck,
I'd appreciate your help .
Task: Get current date , then read file of format below, find the line that
matches
the current date of month,month and year and extract the number from such
line.
Here is what I did , but if i run it at 11 April 2011 ...
   - regex pattern_match  matches nothing
   - regex pattern_test  matches the line "4141411 Fri 11 11 2011" , logical
as it is the last matching line in year 2011 with the date of 11th.
My question - why regex pattern_match  does not match anything and how to
make it match the exact line i want.
Thanks
Yuri

from datetime import datetime, date, time
import re
today_day = datetime.now()
time_tuple= today_day.timetuple()
pattern_match = re.compile("([0-9])+ +" +  "Fri +" + str(time_tuple[1]) +
    " +"  +   str(time_tuple[2]) + " +"  +  str(time_tuple[0]) + " +")
hhh = open("file_with_data.data",'r')
pattern_test =  re.compile("([0-9]+)" + ".*"  + " +" +
    str(time_tuple[2]).strip(' \t\n\r') + " +"  +
    str(time_tuple[0]).strip(' \t\n\r') )
for nnn in hhh:
    if  (re.search(pattern_test,nnn)):
        print nnn.split()[0]

111 Fri 4  8 2011
2323232 Fri 4 15 2011
4343434 Fri 4 22 2011
8522298 Fri 4 29 2011
.
5456678 Fri 10 28 2011
563 Fri 11  4 2011
4141411 Fri 11 11 2011
332 Fri 11 18 2011


Re: help with regex matching multiple %e

2011-03-03 Thread Matt Funk

Thanks,
works great.

matt

On 3/3/2011 10:53 AM, MRAB wrote:
> On 03/03/2011 17:33, maf...@nmsu.edu wrote:
>> Hi,
>>
>> i have a line that looks something like:
>> 2.234e+04 3.456e+02 7.234e+07 1.543e+04: some description
>>
>> I would like to extract all the numbers. From the python website i
>> got the
>> following expression for matching what in c is %e (i.e. scientific
>> format):
>> (see http://docs.python.org/library/re.html)
>> [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
>> And when i apply the pattern (using extra parenthesis around the whole
>> expression) it does match the first number in the line.
>>
>> Is there any way to repeat this pattern to get me all the numbers in the
>> line?
>> I thought the following might work, but it doesn't:
>> ([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?){numToRepeat)
>>
> You're forgetting that the numbers are separated by a space.
>
>> Or will i have to split the line first, then iterate and the apply
>> the match?
>>
>> Any help is greatly appreciated.
>>
> Use re.findall to find all the matches.


Re: help with regex matching multiple %e

2011-03-03 Thread MRAB


On 03/03/2011 17:33, maf...@nmsu.edu wrote:
> Hi,
>
> i have a line that looks something like:
> 2.234e+04 3.456e+02 7.234e+07 1.543e+04: some description
>
> I would like to extract all the numbers. From the python website i got the
> following expression for matching what in c is %e (i.e. scientific
> format):
> (see http://docs.python.org/library/re.html)
> [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
> And when i apply the pattern (using extra parenthesis around the whole
> expression) it does match the first number in the line.
>
> Is there any way to repeat this pattern to get me all the numbers in the
> line?
> I thought the following might work, but it doesn't:
> ([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?){numToRepeat)
>
You're forgetting that the numbers are separated by a space.

> Or will i have to split the line first, then iterate and then apply the match?
>
> Any help is greatly appreciated.


Use re.findall to find all the matches.
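
A minimal Python 3 sketch of the findall approach follows. One detail worth knowing: when a pattern contains capturing groups, findall() returns the groups instead of the whole match, so the inner groups are made non-capturing here. That change is an adjustment for this example, not part of the expression as posted.

```python
import re

line = "2.234e+04 3.456e+02 7.234e+07 1.543e+04: some description"

# Same expression as in the question, with the inner groups made non-capturing
# so that findall() returns whole matches rather than tuples of groups.
number = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?")

texts = number.findall(line)
values = [float(t) for t in texts]

print(texts)   # ['2.234e+04', '3.456e+02', '7.234e+07', '1.543e+04']
print(values)  # [22340.0, 345.6, 72340000.0, 15430.0]
```

No repetition count is needed; findall() simply keeps scanning the line for further matches.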

help with regex matching multiple %e

2011-03-03 Thread mafunk

Hi,

i have a line that looks something like:
2.234e+04 3.456e+02 7.234e+07 1.543e+04: some description

I would like to extract all the numbers. From the python website i got the
following expression for matching what in c is %e (i.e. scientific
format):
(see http://docs.python.org/library/re.html)
[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
And when i apply the pattern (using extra parenthesis around the whole
expression) it does match the first number in the line.

Is there any way to repeat this pattern to get me all the numbers in the
line?
I thought the following might work, but it doesn't:
([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?){numToRepeat)

Or will i have to split the line first, then iterate and then apply the match?

Any help is greatly appreciated.

thanks
matt


Re: Help with regex search-and-replace (Perl to Python)

2010-02-08 Thread Anthra Norell


Schif Schaf wrote:
> On Feb 7, 8:57 am, Tim Chase  wrote:
>> Steve Holden wrote:
>>> Really? Under what circumstances does a simple one-for-one character
>>> replacement operation fail?
>>
>> Failure is only defined in the clarified context of what the OP
>> wants :)  Replacement operations only fail if the OP's desired
>> output from the above mess doesn't change *all* of the ]/[
>> characters, but only those with some form of parity (nested or
>> otherwise).  But if the OP *does* want all of the ]/[ characters
>> replaced regardless of contextual nature, then yes, replace is a
>> much better solution than regexps.
>
> I need to do the usual "pipe text through and do various search/
> replace" thing fairly often. The above case of having to replace
> brackets with braces is only one example. Simple string methods run
> out of steam pretty quickly and much of my work relies on using
> regular expressions. Yes, I try to keep focused on simplicity, and
> often regexes are the simplest solution for my day-to-day needs.

Could you post a complex case? It's a kindness to your helpers to 
simplify your case, but if the simplification doesn't cover the full 
scope of your problem you can't expect the suggestions to cover it.


Frederic


Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Schif Schaf

On Feb 7, 8:57 am, Tim Chase  wrote:
> Steve Holden wrote:
>
> > Really? Under what circumstances does a simple one-for-one character
> > replacement operation fail?
>
> Failure is only defined in the clarified context of what the OP
> wants :)  Replacement operations only fail if the OP's desired
> output from the above mess doesn't change *all* of the ]/[
> characters, but only those with some form of parity (nested or
> otherwise).  But if the OP *does* want all of the ]/[ characters
> replaced regardless of contextual nature, then yes, replace is a
> much better solution than regexps.
>

I need to do the usual "pipe text through and do various search/
replace" thing fairly often. The above case of having to replace
brackets with braces is only one example. Simple string methods run
out of steam pretty quickly and much of my work relies on using
regular expressions. Yes, I try to keep focused on simplicity, and
often regexes are the simplest solution for my day-to-day needs.

Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Tim Chase


Steve Holden wrote:
> Tim Chase wrote:
>> And to answer those who are reaching for other non-regex (whether string
>> translations or .replace(), or pyparsing) solutions, it depends on what
>> you want to happen in pathological cases like
>>
>>   s = """Dangling closing]
>>  with properly [[nested]] and
>>  complex [properly [nested] text]
>>  and [improperly [nested] text
>>  and with some text [straddling
>>  lines] and with
>>  dangling opening [brackets
>>  """
>> where you'll begin to see the differences.
>
> Really? Under what circumstances does a simple one-for-one character
> replacement operation fail?

Failure is only defined in the clarified context of what the OP 
wants :)  Replacement operations only fail if the OP's desired 
output from the above mess doesn't change *all* of the ]/[ 
characters, but only those with some form of parity (nested or 
otherwise).  But if the OP *does* want all of the ]/[ characters 
replaced regardless of contextual nature, then yes, replace is a 
much better solution than regexps.
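
To make the distinction concrete, here is a small Python 3 comparison. The three test strings are shortened stand-ins for the pathological text above, and the helper names `by_replace` and `by_regex` are invented for the illustration.

```python
import re

withbracks = re.compile(r"\[(.+?)\]")

def by_replace(s):
    # Converts every bracket, paired or not.
    return s.replace("[", "{").replace("]", "}")

def by_regex(s):
    # Converts only a '[' that is followed later on the same line by a ']'.
    return withbracks.sub(r"{\1}", s)

balanced = "ut [labore] et [dolore]"
dangling = "closing] and opening [bracket"
nested = "a [b [c] d"

print(by_replace(balanced) == by_regex(balanced))  # True
print(by_replace(dangling))  # closing} and opening {bracket
print(by_regex(dangling))    # closing] and opening [bracket
print(by_replace(nested))    # a {b {c} d
print(by_regex(nested))      # a {b [c} d
```

On well-formed input the two agree; they diverge only on unmatched or nested brackets, so the choice depends on which output is wanted there.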


-tkc



Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Steve Holden

Tim Chase wrote:
> Schif Schaf wrote:
>> On Feb 7, 12:19 am, "Alf P. Steinbach"  wrote:
>>> I haven't used regexps in Python before, but what I did was (1) look
>>> in the
>>> documentation,
> [snip]
>>> 
>>> import re
>>>
>>> text = (
>>>  "Lorem [ipsum] dolor sit amet, consectetur",
>>>  "adipisicing elit, sed do eiusmod tempor",
>>>  "incididunt ut [labore] et [dolore] magna aliqua."
>>>  )
>>>
>>> withbracks = re.compile( r'\[(.+?)\]' )
>>> for line in text:
>>>  print( re.sub( withbracks, r'{\1}', line) )
>>> 
>>
>> Seems like there's magic happening here. There's the `withbracks`
>> regex that applies itself to `line`. But then when `re.sub()` does the
>> replacement operation, it appears to consult the `withbracks` regex on
>> the most recent match it just had.
> 
> I suspect Alf's rustiness with regexps caused him to miss the simpler
> rendition of
> 
>   print withbracks.sub(r'{\1}', line)
> 
> And to answer those who are reaching for other non-regex (whether string
> translations or .replace(), or pyparsing) solutions, it depends on what
> you want to happen in pathological cases like
> 
>   s = """Dangling closing]
>  with properly [[nested]] and
>  complex [properly [nested] text]
>  and [improperly [nested] text
>  and with some text [straddling
>  lines] and with
>  dangling opening [brackets
>  """
> where you'll begin to see the differences.
> 
Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/

Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Anssi Saari

Schif Schaf  writes:

> (brackets replaced by braces). I can do that with Perl pretty easily:
>
> 
> for (<>) {
> s/\[(.+?)\]/\{$1\}/g;
> print;
> }
> 

Just curious, but since this is just transpose, then why not simply
tr/[]/{}/? I.e. why use a regular expression at all for this?

In python you would do this with 

for line in text: 
 print line.replace('[', '{').replace(']', '}')
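
A closer analogue of Perl's tr/[]/{}/ is a translation table. The sketch below assumes Python 3, where `str.maketrans` builds the table (Python 2 used `string.maketrans` instead); the `text` tuple is the sample from earlier in the thread.

```python
# Translation table mapping '[' -> '{' and ']' -> '}' (Python 3 str.maketrans).
table = str.maketrans("[]", "{}")

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua.",
)
converted = [line.translate(table) for line in text]
for line in converted:
    print(line)
```

translate() makes a single pass over each string, which may matter when more than two characters need mapping.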


Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Steve Holden

@ Rocteur CC wrote:
> 
> On 07 Feb 2010, at 10:03, Shashwat Anand wrote:
> 
>> Here is one simple solution :
>> >>> intext = """Lorem [ipsum] dolor sit amet, consectetur adipisicing
>> elit, sed do eiusmod tempor  incididunt ut [labore] et [dolore] magna
>> aliqua."""
>>
>> >>> intext.replace('[', '{').replace(']',
>> '}')  
>> 'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do
>> eiusmod tempor  incididunt ut {labore} et {dolore} magna aliqua.'
>>
>> /Some people, when confronted with a problem, think "I know, I’ll use
>> regular expressions." Now they have two problems./ — Jamie Zawinski
>>  in comp.lang.emacs.
> 
> That is because regular expressions are what we learned in programming
> the shell from sed to awk and ksh and zsh and of course Perl and we've
> read the two books by Jeffrey and much much more!!!
> 
> How do we rethink and relearn how we do things and should we ?
> 
> What is the solution ?
> 
A rigorous focus on programming simplicity.

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/


Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Tim Chase


Schif Schaf wrote:
> On Feb 7, 12:19 am, "Alf P. Steinbach"  wrote:
>> I haven't used regexps in Python before, but what I did was (1) look in the
>> documentation,
> [snip]
>> import re
>>
>> text = (
>>  "Lorem [ipsum] dolor sit amet, consectetur",
>>  "adipisicing elit, sed do eiusmod tempor",
>>  "incididunt ut [labore] et [dolore] magna aliqua."
>>  )
>>
>> withbracks = re.compile( r'\[(.+?)\]' )
>> for line in text:
>>  print( re.sub( withbracks, r'{\1}', line) )
>
> Seems like there's magic happening here. There's the `withbracks`
> regex that applies itself to `line`. But then when `re.sub()` does the
> replacement operation, it appears to consult the `withbracks` regex on
> the most recent match it just had.


I suspect Alf's rustiness with regexps caused him to miss the 
simpler rendition of


  print withbracks.sub(r'{\1}', line)

And to answer those who are reaching for other non-regex (whether 
string translations or .replace(), or pyparsing) solutions, it 
depends on what you want to happen in pathological cases like


  s = """Dangling closing]
 with properly [[nested]] and
 complex [properly [nested] text]
 and [improperly [nested] text
 and with some text [straddling
 lines] and with
 dangling opening [brackets
 """
where you'll begin to see the differences.

-tkc





Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread @ Rocteur CC

On 07 Feb 2010, at 10:03, Shashwat Anand wrote:

Here is one simple solution :
>>> intext = """Lorem [ipsum] dolor sit amet, consectetur  
adipisicing elit, sed do eiusmod tempor  incididunt ut [labore] et  
[dolore] magna aliqua."""

>>> intext.replace('[', '{').replace(']', '}')
'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do  
eiusmod tempor  incididunt ut {labore} et {dolore} magna aliqua.'

Some people, when confronted with a problem, think "I know, I’ll use  
regular expressions." Now they have two problems. — Jamie Zawinski  
in comp.lang.emacs.

That is because regular expressions are what we learned in programming  
the shell from sed to awk and ksh and zsh and of course Perl and we've  
read the two books by Jeffrey and much much more!!!

How do we rethink and relearn how we do things and should we ?

What is the solution ?

Jerry

Re: Help with regex search-and-replace (Perl to Python)

2010-02-07 Thread Shashwat Anand

Here is one simple solution :
>>> intext = """Lorem [ipsum] dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor  incididunt ut [labore] et [dolore] magna aliqua."""

>>> intext.replace('[', '{').replace(']', '}')
'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor  incididunt ut {labore} et {dolore} magna aliqua.'

*Some people, when confronted with a problem, think "I know, I’ll use
regular expressions." Now they have two problems.* — Jamie
Zawinski in comp.lang.emacs.


On Sun, Feb 7, 2010 at 11:15 AM, Schif Schaf  wrote:

> On Feb 7, 12:19 am, "Alf P. Steinbach"  wrote:
>
> >
> > I haven't used regexps in Python before, but what I did was (1) look in
> the
> > documentation,
>
> Hm. I checked in the repl, running `import re; help(re)` and the docs
> on the `sub()` method didn't say anything about using back-refs in the
> replacement string. Neat feature though.
>
> > (2) check that it worked.
> >
> > 
> > import re
> >
> > text = (
> >  "Lorem [ipsum] dolor sit amet, consectetur",
> >  "adipisicing elit, sed do eiusmod tempor",
> >  "incididunt ut [labore] et [dolore] magna aliqua."
> >  )
> >
> > withbracks = re.compile( r'\[(.+?)\]' )
> > for line in text:
> >  print( re.sub( withbracks, r'{\1}', line) )
> > 
> >
>
> Seems like there's magic happening here. There's the `withbracks`
> regex that applies itself to `line`. But then when `re.sub()` does the
> replacement operation, it appears to consult the `withbracks` regex on
> the most recent match it just had.
>
> Thanks.
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Re: Help with regex search-and-replace (Perl to Python)

2010-02-06 Thread Schif Schaf

On Feb 7, 12:19 am, "Alf P. Steinbach"  wrote:

>
> I haven't used regexps in Python before, but what I did was (1) look in the
> documentation,

Hm. I checked in the repl, running `import re; help(re)` and the docs
on the `sub()` method didn't say anything about using back-refs in the
replacement string. Neat feature though.

> (2) check that it worked.
>
> 
> import re
>
> text = (
>      "Lorem [ipsum] dolor sit amet, consectetur",
>      "adipisicing elit, sed do eiusmod tempor",
>      "incididunt ut [labore] et [dolore] magna aliqua."
>      )
>
> withbracks = re.compile( r'\[(.+?)\]' )
> for line in text:
>      print( re.sub( withbracks, r'{\1}', line) )
> 
>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.
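
The "magic" is less mysterious when the replacement is written as a function. In this Python 3 sketch (the function name `to_braces` is made up), both forms do the same thing: for each match that sub() finds, group 1 of that particular match is placed between braces.

```python
import re

withbracks = re.compile(r"\[(.+?)\]")
line = "incididunt ut [labore] et [dolore] magna aliqua."

# Template form: \1 is filled in from group 1 of whichever match is being replaced.
templated = withbracks.sub(r"{\1}", line)

# Callable form: sub() hands each match object to the function in turn.
def to_braces(match):
    return "{" + match.group(1) + "}"

called = withbracks.sub(to_braces, line)

print(templated)            # incididunt ut {labore} et {dolore} magna aliqua.
print(called == templated)  # True
```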

Thanks.

Re: Help with regex search-and-replace (Perl to Python)

2010-02-06 Thread Alf P. Steinbach


* Schif Schaf:
> Hi,
>
> I've got some text that looks like this:
>
> Lorem [ipsum] dolor sit amet, consectetur
> adipisicing elit, sed do eiusmod tempor
> incididunt ut [labore] et [dolore] magna aliqua.
>
> and I want to make it look like this:
>
> Lorem {ipsum} dolor sit amet, consectetur
> adipisicing elit, sed do eiusmod tempor
> incididunt ut {labore} et {dolore} magna aliqua.
>
> (brackets replaced by braces). I can do that with Perl pretty easily:
>
> for (<>) {
>     s/\[(.+?)\]/\{$1\}/g;
>     print;
> }
>
> but am not able to figure out how to do it with Python. I start out
> trying something like:
>
> import re, sys
> withbracks = re.compile(r'\[(.+?)\]')
> for line in sys.stdin:
>     mat = withbracks.search(line)
>     if mat:
>         # Well, this line has at least one.
>         # Should be able to use withbracks.sub()
>         # and mat.group() maybe ... ?
>         line = withbracks.sub('{' + mat.group(0) + '}', line)
>         # No, that's not working right.
>
>     sys.stdout.write(line)
>
> but then am not sure where to go with that.
>
> How would you do it?


I haven't used regexps in Python before, but what I did was (1) look in the 
documentation, (2) check that it worked.




import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )



Python's equivalent of the Perl snippet seems to be the same number of lines, 
and more clear. :-)
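
For comparison, here is a short Python 3 sketch of why the attempt quoted above goes wrong. The variable names `wrong` and `right` are only for this illustration. The replacement string is built once from the first match, and group(0) includes the brackets themselves.

```python
import re

withbracks = re.compile(r"\[(.+?)\]")
line = "incididunt ut [labore] et [dolore] magna aliqua."

mat = withbracks.search(line)           # first match only: '[labore]'
# group(0) is the whole match, brackets included, and the same fixed string
# is substituted for every bracketed word on the line.
wrong = withbracks.sub("{" + mat.group(0) + "}", line)
right = withbracks.sub(r"{\1}", line)

print(wrong)  # incididunt ut {[labore]} et {[labore]} magna aliqua.
print(right)  # incididunt ut {labore} et {dolore} magna aliqua.
```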



Cheers & hth.,

- Alf

Help with regex search-and-replace (Perl to Python)

2010-02-06 Thread Schif Schaf

Hi,

I've got some text that looks like this:


Lorem [ipsum] dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut [labore] et [dolore] magna aliqua.

and I want to make it look like this:


Lorem {ipsum} dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut {labore} et {dolore} magna aliqua.

(brackets replaced by braces). I can do that with Perl pretty easily:


for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}


but am not able to figure out how to do it with Python. I start out
trying something like:


import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)


but then am not sure where to go with that.

How would you do it?

Thanks.

Re: Help with regex and optional substring in search string

2009-10-14 Thread Timur Tabi

On Wed, Oct 14, 2009 at 10:30 AM, Zero Piraeus  wrote:

> '(?:etc)' instead of '(etc)' are non-capturing parentheses (since you
> apparently don't care about that bit).

Ah yes, thanks.

> '[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".

I originally had just '[^\]', and I couldn't figure out why it
wouldn't work.  Maybe I need new glasses.

> The '\s*' before the second set of parentheses takes out the leading
> whitespace that would otherwise be returned as part of the match.

And I want that.  The next line of my code is:

description = m.group(2).strip() + "\n\n"

-- 
Timur Tabi
Linux kernel developer at Freescale

Re: Help with regex and optional substring in search string

2009-10-14 Thread Zero Piraeus

:

2009/10/14 Timur Tabi :
> Never mind ... I figured it out.  The middle block should have been
> [\w\s/]*

This is fragile - you'll have to keep adding extra characters to match
if the input turns out to contain them.
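
A small sketch of what "fragile" means here, assuming Python 3. The subject line with a hyphen in the tag is a hypothetical input, not one from the original report. The whitelist class `[\w\s/]` stops at the first character it does not list, while the negated class `[^\]]` accepts anything up to the closing bracket.

```python
import re

# Whitelist version: only word characters, whitespace and "/" inside the tag.
fragile = re.compile(r"Subject:\s*(\[[\w\s/]*\])*(.*)")
# Negated-class version: anything except a closing bracket inside the tag.
negated = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")

line = "Subject: [PATCH-RFC 1/2] Fix the frobnicator"  # hypothetical subject

fragile_result = fragile.search(line).group(2)
negated_result = negated.search(line).group(1)

print(repr(fragile_result))
print(repr(negated_result))
```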

 -[]z.

Re: Help with regex and optional substring in search string

2009-10-14 Thread Zero Piraeus

:

2009/10/14 Timur Tabi :
> I'm having trouble creating a regex pattern that matches a string that
> has an optional substring in it.  What I'm looking for is a pattern
> that matches both of these strings:
>
> Subject: [PATCH 08/18] This is the patch name
> Subject: This is the patch name
>
> What I want is to extract the "This is the patch name".  I tried this:
>
> m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)
>
> Unfortunately, the second group appears to be too greedy, and returns
> this:
>
> >>> print m.group(1)
> None
> >>> print m.group(2)
> [PATCH 08/18] Subject line

It's not that the second group is too greedy. The first group isn't
matching what you want it to, because neither \w nor \s match the "/"
inside your brackets. This works for your example input:

>>> import re
>>> pattern = re.compile("Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
>>> for s in (
... "Subject: [PATCH 08/18] This is the patch name",
... "Subject: This is the patch name",
... ):
... re.search(pattern, s).group(1)
...
'This is the patch name'
'This is the patch name'

Going through the changes from your original regex in order:

'(?:etc)' instead of '(etc)' are non-capturing parentheses (since you
apparently don't care about that bit).

'[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".

The '\s*' before the second set of parentheses takes out the leading
whitespace that would otherwise be returned as part of the match.
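
As a rough illustration of the three changes together, here is the same pattern wrapped in a helper, assuming Python 3 and a raw string for the pattern. The names `SUBJECT` and `patch_name` are invented for this sketch. It returns None for lines that are not Subject headers rather than raising on a failed match.

```python
import re

# Optional "[...]" tag in a non-capturing group, then the text we want.
SUBJECT = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")

def patch_name(line):
    """Return the subject text without the optional [tag], or None."""
    m = SUBJECT.search(line)
    return m.group(1) if m else None

print(patch_name("Subject: [PATCH 08/18] This is the patch name"))
print(patch_name("Subject: This is the patch name"))
```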

 -[]z.

Re: Help with regex and optional substring in search string

2009-10-14 Thread Timur Tabi

On Oct 14, 9:51 am, Timur Tabi  wrote:
> I'm having trouble creating a regex pattern that matches a string that
> has an optional substring in it.  What I'm looking for is a pattern
> that matches both of these strings:
>
> Subject: [PATCH 08/18] This is the patch name
> Subject: This is the patch name
>
> What I want is to extract the "This is the patch name".  I tried this:
>
> m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)

Never mind ... I figured it out.  The middle block should have been
[\w\s/]*


Help with regex and optional substring in search string

2009-10-14 Thread Timur Tabi

I'm having trouble creating a regex pattern that matches a string that
has an optional substring in it.  What I'm looking for is a pattern
that matches both of these strings:

Subject: [PATCH 08/18] This is the patch name
Subject: This is the patch name

What I want is to extract the "This is the patch name".  I tried this:

m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)

Unfortunately, the second group appears to be too greedy, and returns
this:

>>> print m.group(1)
None
>>> print m.group(2)
[PATCH 08/18] Subject line
>>>

Can anyone help me?  I'd hate to have to use two regex patterns, one
with the [...] and one without.

Re: Help with regex

2009-08-08 Thread Nobody

On Thu, 06 Aug 2009 17:02:44 +0100, MRAB wrote:

> The character class \d is equivalent to [0-9]

Not for Unicode, which is the default in 3.x.
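
A quick way to see the difference, assuming Python 3.4 or later (for `re.fullmatch`). The sample string of Arabic-Indic digits is just an illustration. With str patterns `\d` matches any Unicode decimal digit unless `re.ASCII` is given, whereas `[0-9]` only ever matches the ten ASCII digits.

```python
import re

arabic_indic = "\u0664\u0665"  # ARABIC-INDIC DIGIT FOUR and FIVE

unicode_match = re.fullmatch(r"\d+", arabic_indic)               # default str behaviour
ascii_flag_match = re.fullmatch(r"\d+", arabic_indic, re.ASCII)  # \d limited to [0-9]
class_match = re.fullmatch(r"[0-9]+", arabic_indic)              # explicit ASCII class

print(unicode_match is not None, ascii_flag_match is not None, class_match is not None)
```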


Re: Help with regex

2009-08-06 Thread Nobody

On Thu, 06 Aug 2009 14:23:47 -0700, Ethan Furman wrote:

>> [0-9]+ allows any number of leading zeros, which is sometimes undesirable.
>> Using:
>> 
>>  (0|[1-9][0-9]*)
>> 
>> is more robust.
> 
> You make a good point about possibly being undesirable, but I question 
> the assertion that your solution is /more robust/.  If the OP 
> wants/needs to match numbers even with leading zeroes your /more robust/ 
> version fails.

Well, the OP did say:

> The regex should only match the exact above.

I suppose that it depends upon the definition of "exact" ;)

More seriously: failing to produce an error when one is called for is also
a bug.

Personally, unless I knew for certain that the rest of the program would
handle leading zeros correctly (e.g. *not* interpreting the number as
octal), I would try to reject it in the parser. It's usually much easier
to determine the cause of an error raised by the parser than if you allow
bogus data to propagate deep into the program.
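
A minimal sketch of rejecting leading zeros in the parser, assuming Python 3. The names `NUMBER`, `STRICT` and `parse_fileversion` are invented here, and the trailing `(?![0-9])` is an addition of this sketch so that a final field such as "04" cannot be accepted as just "0".

```python
import re

NUMBER = r"(0|[1-9][0-9]*)"  # a lone zero, or digits with no leading zero
STRICT = re.compile(r"FILEVERSION " + ",".join([NUMBER] * 4) + r"(?![0-9])")

def parse_fileversion(line):
    """Return the four version fields as ints, or raise ValueError."""
    m = STRICT.search(line)
    if m is None:
        raise ValueError("no well-formed FILEVERSION in %r" % line)
    return tuple(int(field) for field in m.groups())

print(parse_fileversion("FILEVERSION 1,45,10082,3"))
```

Raising at this point keeps the error close to the offending line, which is the argument made above for rejecting the data early.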


Re: Help with regex

2009-08-06 Thread John Machin

On Aug 7, 7:23 am, Ethan Furman  wrote:
> Nobody wrote:
> > On Thu, 06 Aug 2009 08:35:57 -0700, Robert Dailey wrote:
>
> >>I'm creating a python script that is going to try to search a text
> >>file for any text that matches my regular expression. The thing it is
> >>looking for is:
>
> >>FILEVERSION #,#,#,#
>
> >>The # symbol represents any number that can be any length 1 or
> >>greater. Example:
>
> >>FILEVERSION 1,45,10082,3
>
> >>The regex should only match the exact above. So far here's what I have
> >>come up with:
>
> >>re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
>
> > [0-9]+ allows any number of leading zeros, which is sometimes undesirable.
> > Using:
>
> >    (0|[1-9][0-9]*)
>
> > is more robust.
>
> You make a good point about possibly being undesirable, but I question
> the assertion that your solution is /more robust/.  If the OP
> wants/needs to match numbers even with leading zeroes your /more robust/
> version fails.

I'd go further: the OP would probably be better off matching anything
that looked vaguely like an attempt to produce what he wanted e.g.
r"FILEVERSION\s*[0-9,]{3,}" and then taking appropriate action based
on whether that matched a "strictly correct" regex.
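
One way that two-step idea might look, assuming Python 3. The `LOOSE` pattern is the one suggested above. The `STRICT` pattern, the `(?![0-9,])` guard and the `classify` helper are assumptions of this sketch.

```python
import re

LOOSE = re.compile(r"FILEVERSION\s*[0-9,]{3,}")
# Strict form: exactly four comma-separated numbers, not followed by more.
STRICT = re.compile(r"FILEVERSION (?:[0-9]+,){3}[0-9]+(?![0-9,])")

def classify(line):
    """Return 'ok', 'malformed' or 'absent' for one line of input."""
    if STRICT.search(line):
        return "ok"
    if LOOSE.search(line):
        return "malformed"  # looks like an attempt, worth reporting
    return "absent"

for sample in ("FILEVERSION 1,45,10082,3", "FILEVERSION 1,2,3", "VERSION 1"):
    print(sample, "->", classify(sample))
```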


Re: Help with regex

2009-08-06 Thread Ethan Furman


Nobody wrote:
> On Thu, 06 Aug 2009 08:35:57 -0700, Robert Dailey wrote:
>
>> I'm creating a python script that is going to try to search a text
>> file for any text that matches my regular expression. The thing it is
>> looking for is:
>>
>> FILEVERSION #,#,#,#
>>
>> The # symbol represents any number that can be any length 1 or
>> greater. Example:
>>
>> FILEVERSION 1,45,10082,3
>>
>> The regex should only match the exact above. So far here's what I have
>> come up with:
>>
>> re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
>
> [0-9]+ allows any number of leading zeros, which is sometimes undesirable.
> Using:
>
>     (0|[1-9][0-9]*)
>
> is more robust.


You make a good point about possibly being undesirable, but I question 
the assertion that your solution is /more robust/.  If the OP 
wants/needs to match numbers even with leading zeroes your /more robust/ 
version fails.


~Ethan~


Re: Help with regex

2009-08-06 Thread Nobody

On Thu, 06 Aug 2009 08:35:57 -0700, Robert Dailey wrote:

> I'm creating a python script that is going to try to search a text
> file for any text that matches my regular expression. The thing it is
> looking for is:
> 
> FILEVERSION #,#,#,#
> 
> The # symbol represents any number that can be any length 1 or
> greater. Example:
> 
> FILEVERSION 1,45,10082,3
> 
> The regex should only match the exact above. So far here's what I have
> come up with:
> 
> re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )

[0-9]+ allows any number of leading zeros, which is sometimes undesirable.
Using:

(0|[1-9][0-9]*)

is more robust.


Re: Help with regex

2009-08-06 Thread Robert Dailey

On Aug 6, 11:12 am, Roman  wrote:
> On 06/08/09 08:35, Robert Dailey wrote:
>
>
>
>
>
> > Hey guys,
>
> > I'm creating a python script that is going to try to search a text
> > file for any text that matches my regular expression. The thing it is
> > looking for is:
>
> > FILEVERSION #,#,#,#
>
> > The # symbol represents any number that can be any length 1 or
> > greater. Example:
>
> > FILEVERSION 1,45,10082,3
>
> > The regex should only match the exact above. So far here's what I have
> > come up with:
>
> > re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
>
> > This works, but I was hoping for something a bit cleaner. I'm having
> > to create a special case portion of the regex for the last of the 4
> > numbers simply because it doesn't end with a comma like the first 3.
> > Is there a better, more compact, way to write this regex?
>
> Since there cannot be more than one "end of string" you can try this
> expression:
> re.compile( r'FILEVERSION (?:[0-9]+(,|$)){4}' )

I had thought of this but I can't use that either. I have to assume
that someone was silly and put text at the end somewhere, perhaps a
comment. Like so:

FILEVERSION 1,2,3,4 // This is the file version

It would be nice if there was a type of counter for regex. So you
could say 'match only 1 [^,]' or something like that...
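
The regex itself has no such counter, but the counting can be moved into ordinary code. A sketch, assuming Python 3. `VERSION_LINE`, `read_version` and its `parts` parameter are invented names. The pattern grabs any comma-separated run of numbers and ignores what follows, so a trailing comment is harmless, and the field count is checked afterwards.

```python
import re

# One number, then any number of ",number" repeats; trailing text is ignored.
VERSION_LINE = re.compile(r"FILEVERSION ([0-9]+(?:,[0-9]+)*)")

def read_version(line, parts=4):
    """Return the version as a list of ints, or None if it is not `parts` long."""
    m = VERSION_LINE.search(line)
    if m is None:
        return None
    fields = m.group(1).split(",")
    if len(fields) != parts:
        return None
    return [int(field) for field in fields]

print(read_version("FILEVERSION 1,2,3,4 // This is the file version"))
```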

Re: Help with regex

2009-08-06 Thread Roman

On 06/08/09 08:35, Robert Dailey wrote:
> Hey guys,
> 
> I'm creating a python script that is going to try to search a text
> file for any text that matches my regular expression. The thing it is
> looking for is:
> 
> FILEVERSION #,#,#,#
> 
> The # symbol represents any number that can be any length 1 or
> greater. Example:
> 
> FILEVERSION 1,45,10082,3
> 
> The regex should only match the exact above. So far here's what I have
> come up with:
> 
> re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
> 
> This works, but I was hoping for something a bit cleaner. I'm having
> to create a special case portion of the regex for the last of the 4
> numbers simply because it doesn't end with a comma like the first 3.
> Is there a better, more compact, way to write this regex?
> 

Since there cannot be more than one "end of string" you can try this
expression:
re.compile( r'FILEVERSION (?:[0-9]+(,|$)){4}' )
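
Before relying on this form it may be worth checking a few inputs. The sketch below assumes Python 3 and uses made-up sample lines. It suggests the pattern needs the fourth number to be followed by a comma or the end of the string, so text after the version stops it matching, while a stray trailing comma is accepted.

```python
import re

pattern = re.compile(r"FILEVERSION (?:[0-9]+(,|$)){4}")

plain = pattern.search("FILEVERSION 1,2,3,4")
with_comment = pattern.search("FILEVERSION 1,2,3,4 // comment")
trailing_comma = pattern.search("FILEVERSION 1,2,3,4,")

print(plain is not None, with_comment is not None, trailing_comma is not None)
```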

Re: Help with regex

2009-08-06 Thread MRAB


Robert Dailey wrote:
> On Aug 6, 11:02 am, MRAB  wrote:
>> Robert Dailey wrote:
>>> Hey guys,
>>> I'm creating a python script that is going to try to search a text
>>> file for any text that matches my regular expression. The thing it is
>>> looking for is:
>>> FILEVERSION #,#,#,#
>>> The # symbol represents any number that can be any length 1 or
>>> greater. Example:
>>> FILEVERSION 1,45,10082,3
>>> The regex should only match the exact above. So far here's what I have
>>> come up with:
>>> re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
>>> This works, but I was hoping for something a bit cleaner. I'm having
>>> to create a special case portion of the regex for the last of the 4
>>> numbers simply because it doesn't end with a comma like the first 3.
>>> Is there a better, more compact, way to write this regex?
>>
>> The character class \d is equivalent to [0-9], and ',' isn't a special
>> character so it doesn't need to be escaped:
>>
>>      re.compile(r'FILEVERSION (?:\d+,){3}\d+')
>
> But ',' is a special symbol. It's used in this way:
> {0,3}
>
> This will match the previous regex 0-3 times. Are you sure commas need
> not be escaped?
>
> In any case, your suggestions help to clean it up a bit!

By 'special' I mean ones like '?', '*', '(', etc. ',' isn't special in
that sense.

In fact, the {...} quantifier is special only if it's syntactically
correct, otherwise it's just a literal, eg "a{," and "a{}" are just
literals.
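
This is easy to confirm at the prompt. A small sketch, assuming Python 3.4 or later (for `re.fullmatch`), with variable names chosen only for this example:

```python
import re

# Not valid quantifiers, so the braces are taken literally.
literal_open = re.fullmatch(r"a{,", "a{,")
literal_pair = re.fullmatch(r"a{}", "a{}")

# A valid quantifier: zero to three repetitions of "a".
counted = re.fullmatch(r"a{0,3}", "aaa")
counted_as_text = re.fullmatch(r"a{0,3}", "a{0,3}")

# Outside braces a comma is an ordinary character, escaped or not.
plain_comma = re.fullmatch(r"\d+,\d+", "1,45")
escaped_comma = re.fullmatch(r"\d+\,\d+", "1,45")

print(bool(literal_open), bool(literal_pair), bool(counted), bool(counted_as_text))
```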

Re: Help with regex

2009-08-06 Thread Robert Dailey

On Aug 6, 11:02 am, MRAB  wrote:
> Robert Dailey wrote:
> > Hey guys,
>
> > I'm creating a python script that is going to try to search a text
> > file for any text that matches my regular expression. The thing it is
> > looking for is:
>
> > FILEVERSION #,#,#,#
>
> > The # symbol represents any number that can be any length 1 or
> > greater. Example:
>
> > FILEVERSION 1,45,10082,3
>
> > The regex should only match the exact above. So far here's what I have
> > come up with:
>
> > re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
>
> > This works, but I was hoping for something a bit cleaner. I'm having
> > to create a special case portion of the regex for the last of the 4
> > numbers simply because it doesn't end with a comma like the first 3.
> > Is there a better, more compact, way to write this regex?
>
> The character class \d is equivalent to [0-9], and ',' isn't a special
> character so it doesn't need to be escaped:
>
>      re.compile(r'FILEVERSION (?:\d+,){3}\d+')

But ',' is a special symbol. It's used in this way:
{0,3}

This will match the previous regex 0-3 times. Are you sure commas need
not be escaped?

In any case, your suggestions help to clean it up a bit!

Re: Help with regex

2009-08-06 Thread alex23

On Aug 7, 1:35 am, Robert Dailey  wrote:
> I'm creating a python script that is going to try to search a text
> file for any text that matches my regular expression. The thing it is
> looking for is:
>
> FILEVERSION 1,45,10082,3

Would it be easier to do it without regex? The following is untested
but I would probably do it more like this:

TOKEN = 'FILEVERSION '
for line in file:
    if line.startswith(TOKEN):
        version = line[len(TOKEN):]
        maj, min, rev, other = version.split(',')
        break # if there's only one occurrence, otherwise do stuff here
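
A runnable take on the same no-regex idea, assuming Python 3. The function name `find_version` and the handling of trailing text are choices made for this sketch, not part of the suggestion above. It takes the first whitespace-delimited chunk after the token, so a newline or a trailing comment does not end up in the last field.

```python
TOKEN = "FILEVERSION "

def find_version(lines):
    """Return the first well-formed version as a tuple of ints, else None."""
    for line in lines:
        if not line.startswith(TOKEN):
            continue
        rest = line[len(TOKEN):].split(None, 1)  # drop newline / trailing text
        if not rest:
            continue
        parts = rest[0].split(",")
        if len(parts) == 4 and all(p.isdecimal() for p in parts):
            return tuple(int(p) for p in parts)
    return None

print(find_version(["junk\n", "FILEVERSION 1,45,10082,3\n"]))
```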

Re: Help with regex

2009-08-06 Thread MRAB


Robert Dailey wrote:
> Hey guys,
>
> I'm creating a python script that is going to try to search a text
> file for any text that matches my regular expression. The thing it is
> looking for is:
>
> FILEVERSION #,#,#,#
>
> The # symbol represents any number that can be any length 1 or
> greater. Example:
>
> FILEVERSION 1,45,10082,3
>
> The regex should only match the exact above. So far here's what I have
> come up with:
>
> re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )
>
> This works, but I was hoping for something a bit cleaner. I'm having
> to create a special case portion of the regex for the last of the 4
> numbers simply because it doesn't end with a comma like the first 3.
> Is there a better, more compact, way to write this regex?


The character class \d is equivalent to [0-9], and ',' isn't a special
character so it doesn't need to be escaped:

re.compile(r'FILEVERSION (?:\d+,){3}\d+')

Help with regex

2009-08-06 Thread Robert Dailey

Hey guys,

I'm creating a python script that is going to try to search a text
file for any text that matches my regular expression. The thing it is
looking for is:

FILEVERSION #,#,#,#

The # symbol represents any number that can be any length 1 or
greater. Example:

FILEVERSION 1,45,10082,3

The regex should only match the exact above. So far here's what I have
come up with:

re.compile( r'FILEVERSION (?:[0-9]+\,){3}[0-9]+' )

This works, but I was hoping for something a bit cleaner. I'm having
to create a special case portion of the regex for the last of the 4
numbers simply because it doesn't end with a comma like the first 3.
Is there a better, more compact, way to write this regex?

Re: Help with Regex for domain names

2009-08-02 Thread Aahz

In article ,
MRAB   wrote:
>Nobody wrote:
>> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>> 
>>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>>> You might also want to consider that some country
>>> codes such as "co" for Colombia might match more than
>>> you want, for example:
>>>
>>>   re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>>
>>> will match.
>> 
>> ... so put \b at the end, i.e.:
>> 
>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')
>> 
>It would still match "www.bbc.co.uk", so you might need:
>
>regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')

If it's a string containing just the candidate domain, you can do

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)$')
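
To compare the variants discussed in this thread side by side, here is a sketch assuming Python 3. The code list includes "co" only to reproduce the problem case mentioned earlier, and the variable names are invented for the example.

```python
import re

CODES = ("us", "au", "de", "co")  # "co" added only to show the pitfall
BASE = r"[\w\-\.]+\.(?:%s)" % "|".join(CODES)

unanchored = re.compile(BASE)
word_bounded = re.compile(BASE + r"\b")
lookahead = re.compile(BASE + r"\b(?!\.\b)")
whole_string = re.compile(BASE + r"$")

for name in ("foo.boo.com", "www.bbc.co.uk", "example.de"):
    print(name,
          bool(unanchored.match(name)),
          bool(word_bounded.match(name)),
          bool(lookahead.match(name)),
          bool(whole_string.match(name)))
```

When the input really is a single candidate name, the `$` form is the simplest of the three to reason about.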
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer

Re: Help with Regex for domain names

2009-07-30 Thread MRAB


Nobody wrote:
> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>
>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>>
>> You might also want to consider that some country
>> codes such as "co" for Colombia might match more than
>> you want, for example:
>>
>>   re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>
>> will match.
>
> ... so put \b at the end, i.e.:
>
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')

It would still match "www.bbc.co.uk", so you might need:

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')

Re: Help with Regex for domain names

2009-07-30 Thread Nobody

On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:

>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
> 
> You might also want to consider that some country
> codes such as "co" for Colombia might match more than
> you want, for example:
> 
>   re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
> 
> will match.

... so put \b at the end, i.e.:

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')


Re: Help with Regex for domain names

2009-07-30 Thread rurpy

On Jul 30, 9:56 am, MRAB  wrote:
> Feyo wrote:
> > I'm trying to figure out how to efficiently write a regex for
> > domain names with a particular top level domain. Let's say, I want to
> > grab all domain names with country codes .us, .au, and .de.
>
> > I could create three different regexes that would work:
> > regex = re.compile(r'[\w\-\.]+\.us')
> > regex = re.compile(r'[\w\-\.]+\.au')
> > regex = re.compile(r'[\w\-\.]+\.de')
>
> > How would I write one to accommodate all three, or, better yet, to
> > accommodate a list of them that I can pass into a method call? Thanks!
>
>  >
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')

You might also want to consider that some country
codes such as "co" for Colombia might match more than
you want, for example:

  re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')

will match.

Re: Help with Regex for domain names

2009-07-30 Thread Feyo

On Jul 30, 11:56 am, MRAB  wrote:
> Feyo wrote:
> > I'm trying to figure out how to efficiently write a regex for
> > domain names with a particular top level domain. Let's say, I want to
> > grab all domain names with country codes .us, .au, and .de.
>
> > I could create three different regexes that would work:
> > regex = re.compile(r'[\w\-\.]+\.us')
> > regex = re.compile(r'[\w\-\.]+\.au')
> > regex = re.compile(r'[\w\-\.]+\.de')
>
> > How would I write one to accommodate all three, or, better yet, to
> > accommodate a list of them that I can pass into a method call? Thanks!
>
>  >
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>
> If you have a list of country codes ["us", "au", "de"] then you can
> build the regular expression from it:
>
> regex = re.compile(r'[\w\-\.]+\.(?:%s)' % '|'.join(domains))

Perfect! Thanks.

Re: Help with Regex for domain names

2009-07-30 Thread MRAB


Feyo wrote:
> I'm trying to figure out how to efficiently write a regex for
> domain names with a particular top level domain. Let's say, I want to
> grab all domain names with country codes .us, .au, and .de.
>
> I could create three different regexes that would work:
> regex = re.compile(r'[\w\-\.]+\.us')
> regex = re.compile(r'[\w\-\.]+\.au')
> regex = re.compile(r'[\w\-\.]+\.de')
>
> How would I write one to accommodate all three, or, better yet, to
> accommodate a list of them that I can pass into a method call? Thanks!

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')

If you have a list of country codes ["us", "au", "de"] then you can
build the regular expression from it:

regex = re.compile(r'[\w\-\.]+\.(?:%s)' % '|'.join(domains))
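
If the list can contain entries with a dot in them (a hypothetical "co.uk", say), it is probably safer to escape each entry before joining. A sketch assuming Python 3. The helper name `tld_regex` and the `$` anchor are choices of this example rather than part of the suggestion above.

```python
import re

def tld_regex(codes):
    """Build one pattern matching names that end in any of the given codes."""
    alternatives = "|".join(re.escape(code) for code in codes)
    return re.compile(r"[\w\-\.]+\.(?:%s)$" % alternatives)

regex = tld_regex(["us", "au", "de"])
print(bool(regex.match("example.de")), bool(regex.match("example.com")))
```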

Re: Help with Regex for domain names

2009-07-30 Thread Tim Daneliuk

Feyo wrote:
> I'm trying to figure out how to efficiently write a regex for
> domain names with a particular top level domain. Let's say, I want to
> grab all domain names with country codes .us, .au, and .de.
> 
> I could create three different regexes that would work:
> regex = re.compile(r'[\w\-\.]+\.us')
> regex = re.compile(r'[\w\-\.]+\.au')
> regex = re.compile(r'[\w\-\.]+\.de')
> 
> How would I write one to accommodate all three, or, better yet, to
> accommodate a list of them that I can pass into a method call? Thanks!

Just a point of interest:  A correctly formed domain name may have a
trailing period at the end of the TLD [1].  Example:

   foo.bar.com.

Though you do not often see this, it's worth accommodating "just in
case"...
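
One way to allow for that trailing period, sketched under the assumption of Python 3 and of input that is a single candidate name. `FQDN_TOLERANT` and `has_wanted_tld` are invented names. An optional `\.` is added before the end anchor.

```python
import re

# Same character class as in the question, plus an optional final dot.
FQDN_TOLERANT = re.compile(r"[\w\-\.]+\.(?:us|au|de)\.?$")

def has_wanted_tld(name):
    return FQDN_TOLERANT.match(name) is not None

for name in ("example.de", "example.de.", "example.de.com"):
    print(name, has_wanted_tld(name))
```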


[1] http://homepages.tesco.net/J.deBoynePollard/FGA/web-fully-qualified-domain-name.html



-- 

Tim Daneliuk tun...@tundraware.com
PGP Key: http://www.tundraware.com/PGP/

Help with Regex for domain names

2009-07-30 Thread Feyo

I'm trying to figure out how to efficiently write a regex for
domain names with a particular top level domain. Let's say, I want to
grab all domain names with country codes .us, .au, and .de.

I could create three different regexes that would work:
regex = re.compile(r'[\w\-\.]+\.us')
regex = re.compile(r'[\w\-\.]+\.au')
regex = re.compile(r'[\w\-\.]+\.de')

How would I write one to accommodate all three, or, better yet, to
accommodate a list of them that I can pass into a method call? Thanks!

Re: Newbie needs help with regex strings

2005-12-14 Thread Michael Spencer

Catalina Scott A Contr AFCA/EVEO wrote:
> I have a file with lines in the following format.
> 
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
> 
> I would like to pull out some of the values and write them to a csv
> file.
> 
> For line in filea
>   pie = regex
>   quantity = regex
>   cooked = regex
>   ingredients = regex
>   fileb.write (quantity,pie,cooked,ingredients)
> 
> How can I retrieve the values and assign them to a name?
> 
> Thank you
> Scott

Here's a trick to parse this source, exploiting the fact that its syntax mimics 
python's keyword arguments.  All that's needed is a way to quote the bare names:

  >>> class lazynames(dict):
  ...     def __getitem__(self, key):
  ...         if key in self:
  ...             return dict.__getitem__(self, key)
  ...         return "%s" % key # if name not found, return it as a str constant
  ...
  >>> d = lazynames(dict=dict, __builtins__ = None)


  >>> source = """\
  ... pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
  ... Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
  ... Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
  ... """
  >>>
  >>> [eval("dict(%s)" % line, d) for line in source.splitlines()]
  [{'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple', 
'quantity': 1}, {'ingredients': 'peaches,powdered sugar', 'Pie': 'peach', 
'quantity': 2}, {'cooked': 'no', 'price': 5, 'ingredients': 'cherries and 
sugar', 'Pie': 'cherry', 'quantity': 3}]
  >>>

Michael


Re: Newbie needs help with regex strings

2005-12-14 Thread Michael Spencer

Dennis Benzinger wrote:
> Christopher Subich schrieb:
>> Paul McGuire wrote:
>>
>> [...]
>> For the example listed, pyparsing is even overkill; the OP should 
>> probably use the csv module.
> 
> But the OP wants to parse lines with key=value pairs, not simply lines
> with comma separated values. Using the csv module will just separate the 
> key=value pairs and you would still have to take them apart.
> 
> Bye,
> Dennis
that, and csv.reader has another problem with this task:

  >>> csv.reader(["Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"], quotechar = "'").next()
  ['Pie=peach', 'quantity=2', "ingredients='peaches", "powdered sugar'"]

i.e., it doesn't allow separators within fields unless either the *whole* field 
is quoted:

  >>> csv.reader(["Pie=peach,quantity=2,'ingredients=peaches,powdered sugar'"], quotechar = "'").next()
  ['Pie=peach', 'quantity=2', 'ingredients=peaches,powdered sugar']
  >>>

or the separator is escaped:

  >>> csv.reader(["Pie=peach,quantity=2,ingredients='peaches\,powdered sugar'"], quotechar = "'", escapechar = "\\").next()
  ['Pie=peach', 'quantity=2', "ingredients='peaches,powdered sugar'"]
  >>>


Michael


Re: Newbie needs help with regex strings

2005-12-14 Thread Gerard Flanagan

Fredrik Lundh wrote:

> Scott wrote:
>
> > I have a file with lines in the following format.
> >
> > pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> > Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> > Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
> >
> > I would like to pull out some of the values and write them to a csv
> > file.
>
> here's a relatively straightforward re solution that gives you a dictionary
> with the values for each line.
>
> import re
>
> for line in open("infile.txt"):
>     d = {}
>     for k, v1, v2 in re.findall("(\w+)=(?:(\w+)|'([^']*)')", line):
>         d[k.lower()] = v1 or v2
>     print d
>

How about replacing

d={}

with

d = {'pie': ',', 'quantity': ',', 'cooked': ',', 'price':
',','ingredients': '', 'eol': '\n'}

to get the appropriate commas for missing fields?

Gerard


Re: Newbie needs help with regex strings

2005-12-14 Thread Dennis Benzinger

Christopher Subich schrieb:
> Paul McGuire wrote:
> 
> [...]
> For the example listed, pyparsing is even overkill; the OP should 
> probably use the csv module.

But the OP wants to parse lines with key=value pairs, not simply lines
with comma separated values. Using the csv module will just separate the 
key=value pairs and you would still have to take them apart.

Bye,
Dennis

Re: Newbie needs help with regex strings

2005-12-14 Thread Dennis Benzinger

Catalina Scott A Contr AFCA/EVEO schrieb:
> I have a file with lines in the following format.
> 
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
> 
> I would like to pull out some of the values and write them to a csv
> file.
> 
> For line in filea
>   pie = regex
>   quantity = regex
>   cooked = regex
>   ingredients = regex
>   fileb.write (quantity,pie,cooked,ingredients)
> 
> How can I retrieve the values and assign them to a name?
> 
> Thank you
> Scott

Try this:

import re
import StringIO

filea_string = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
pie=peach,quantity=2,ingredients='peaches,powdered sugar'
pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
"""

FIELDS = ("pie", "quantity", "cooked", "ingredients", "price")

field_regexes = {}

for field in FIELDS:
    field_regexes[field] = re.compile("%s=([^,\n]*)" % field)

for line in StringIO.StringIO(filea_string):

    field_values = {}

    for field in FIELDS:
        match_object = field_regexes[field].search(line)

        if match_object is not None:
            field_values[field] = match_object.group(1)

    print field_values
    #fileb.write (quantity,pie,cooked,ingredients)



Bye,
Dennis

Re: Newbie needs help with regex strings

2005-12-14 Thread Christopher Subich

Paul McGuire wrote:
> This isn't a regex solution, but uses pyparsing instead.  Pyparsing
> helps you construct recursive-descent parsers, and maintains a code
> structure that is easy to compose, read, understand, maintain, and
> remember what you did 6-months after you wrote it in the first place.
> 
> Download pyparsing at http://pyparsing.sourceforge.net.


For the example listed, pyparsing is even overkill; the OP should 
probably use the csv module.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Newbie needs help with regex strings

2005-12-14 Thread Fredrik Lundh

Scott wrote:

> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv
> file.
>
> For line in filea
> pie = regex
> quantity = regex
> cooked = regex
> ingredients = regex
> fileb.write (quantity,pie,cooked,ingredients)
>
> How can I retrieve the values and assign them to a name?

here's a relatively straightforward re solution that gives you a dictionary
with the values for each line.

import re

for line in open("infile.txt"):
    d = {}
    for k, v1, v2 in re.findall("(\w+)=(?:(\w+)|'([^']*)')", line):
        d[k.lower()] = v1 or v2
    print d

(the pattern looks for alphanumeric characters (k) followed by an equal
sign followed by either a number of alphanumeric characters (v1), or text
inside single quotes (v2).  either v1 or v2 will be set)

getting from dictionary to file is left as an exercise to the reader.
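
For readers who would rather not do the exercise, one possible way to finish the job with the csv module is sketched below, assuming Python 3. The column order comes from the original question. The names `PAIR`, `FIELDS`, `parse_line` and `to_csv` are invented here, and an in-memory buffer stands in for the output file.

```python
import csv
import io
import re

PAIR = re.compile(r"(\w+)=(?:(\w+)|'([^']*)')")
FIELDS = ["quantity", "pie", "cooked", "ingredients"]  # order from the question

def parse_line(line):
    """Turn one key=value line into a dict with lower-cased keys."""
    return {k.lower(): v1 or v2 for k, v1, v2 in PAIR.findall(line)}

def to_csv(lines, out):
    # restval fills in missing fields; extrasaction drops unlisted ones (price).
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    for line in lines:
        writer.writerow(parse_line(line))

source = [
    "pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'",
    "Pie=peach,quantity=2,ingredients='peaches,powdered sugar'",
    "Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'",
]
buffer = io.StringIO()  # stands in for the real output file
to_csv(source, buffer)
print(buffer.getvalue())
```

The csv writer quotes the ingredients field that contains a comma, so the output can be read back without the splitting problem.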


Re: Newbie needs help with regex strings

2005-12-14 Thread Paul McGuire

This isn't a regex solution, but uses pyparsing instead.  Pyparsing
helps you construct recursive-descent parsers, and maintains a code
structure that is easy to compose, read, understand, maintain, and
remember what you did 6-months after you wrote it in the first place.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


data = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'"""

from pyparsing import CaselessLiteral, Literal, Word, alphas, nums, \
    oneOf, quotedString, Group, Dict, delimitedList, removeQuotes

# define basic elements for parsing
pieName = Word(alphas)
qty = Word(nums)
yesNo = oneOf("yes no",caseless=True)
EQUALS = Literal("=").suppress()

# define separate pie attributes
pieEntry = CaselessLiteral("pie") + EQUALS + pieName
qtyEntry = CaselessLiteral("quantity") + EQUALS + qty
cookedEntry  = CaselessLiteral("cooked") + EQUALS + yesNo
ingredientsEntry = CaselessLiteral("ingredients") + EQUALS + \
    quotedString.setParseAction(removeQuotes)
priceEntry   = CaselessLiteral("price") + EQUALS + qty

# define overall list of alternative attributes
pieAttribute = pieEntry | qtyEntry | cookedEntry | ingredientsEntry | \
    priceEntry

# define each line as a list of attributes (comma delimiter is the
# default), grouping results by attribute
pieDataFormat = delimitedList( Group(pieAttribute) )

# parse each line in the input string, and create a dict of the results
for line in data.split("\n"):
    pieData = pieDataFormat.parseString(line)
    pieDict = dict( pieData.asList() )
    print(pieDict)

''' prints out:
{'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple', 'quantity': '1'}
{'ingredients': 'peaches,powdered sugar', 'pie': 'peach', 'quantity': '2'}
{'cooked': 'no', 'price': '5', 'ingredients': 'cherries and sugar', 'pie': 'cherry', 'quantity': '3'}
'''


Newbie needs help with regex strings

2005-12-14 Thread Catalina Scott A Contr AFCA/EVEO

I have a file with lines in the following format.

pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'

I would like to pull out some of the values and write them to a csv
file.

For line in filea
pie = regex
quantity = regex
cooked = regex
ingredients = regex
fileb.write (quantity,pie,cooked,ingredients)

How can I retrieve the values and assign them to a name?

Thank you
Scott
