subject:"Regular expressions"

regular expressions

2010-04-15 Thread chaaana

Hi,
I'm parsing a text file containing the details of the test cases that
failed. From the file I want to obtain only the test case names and
enter them in an Excel sheet.

The text file looks like this:

FILE : NW_PTH_TFG6_SCEN_4_2_FIFO.c, LINE : 240 TEST FAIL to get entire:
32768 bytes
NW_PTH_TFG6_SCEN_4_2_FIFO.c 340 DATA NOT MATCHING, TEST FAIL
TEST FAIL SYSCALL_TEST_SCEN_4_1_ALL.c at line 355
FILE:US_TFG7_SCEN_4_1.c,LINE:189, Server side TEST FAIL
FAIL BREW_SCEN_4_1.c 219: can't mount
FAIL BREW_SCEN_4_1.c 121: can't umount
<< BREW_SCEN_4_1.c TEST FAIL errno:No such file or directory >>
<< BREW_SCEN_4_1.c TEST FAIL >> error:Invalid argument

I used the re module and the partition function, but I'm having trouble
extracting only the test case names.
If anyone knows the answer, please help me.
Thanks for your help.

-- 
http://mail.python.org/mailman/listinfo/python-list

Regular Expressions

2007-02-10 Thread Geoff Hill

What's the way to go about learning Python's regular expressions? I feel 
like such an idiot - being so strong in a programming language but knowing 
nothing about RE. 



Regular Expressions

2005-03-22 Thread Ron

This is probably a repeated question, but try as I might I was unable
to find something similar in the archives.

I'm trying to develop a regular expression for recognizing a simplified
C-Style string syntax.  I need it to be able to handle escape sequences
of the form \x where x is any character including ".

Here's what I'm trying:

  \"([^"\\]|(\\.))*\"

When I try to get it to recognize something like:

   "I said, \"Hello!\""

It stops at the first quote after the \.

I've used this very same regular expression in a parser generator I
have for C++ and it works just fine.

Any thoughts on what I'm doing wrong in the Python Reg Ex world?

Thanks for the comments & help.

Ron
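The pattern itself appears to work in Python; a more likely culprit (an assumption, since the Python code isn't shown) is how the strings were typed. Written as a raw string, the pattern reaches `re` intact. But if the *test text* is typed as an ordinary Python literal, `\"` becomes a plain `"` before the regex ever sees it, so there is no backslash left in the text. A sketch of both cases (a non-capturing group replaces the original groups; matching is unchanged):

```python
import re

# Raw string, so the regex engine sees \\ (a literal backslash) and \. intact.
C_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')

# Raw literal: the backslashes really are in the text.
with_backslashes = r'"I said, \"Hello!\""'
# Ordinary literal: Python turns \" into " first, so no backslashes remain.
without_backslashes = "\"I said, \"Hello!\"\""

print(C_STRING.match(with_backslashes).group(0))     # the whole string
print(C_STRING.match(without_backslashes).group(0))  # stops early: "I said, "
```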


Regular Expressions

2006-04-12 Thread david brochu jr

Hi,

I am trying to grab the following string out of a text file using a regular expression (re module):

"DcaVer"=dword:0640

What I need to do with that string is strip off the "DcaVer"=dword: prefix and convert the remaining number from hex to decimal.

I have been trying to figure this out for a while. I am fairly new, so any help would be greatly appreciated.
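A minimal sketch: capture the hex digits after the fixed prefix in a group, then let `int(..., 16)` do the hex-to-decimal conversion. The helper name `dcaver_value` is illustrative only.

```python
import re

# Capture the hex digits that follow the literal prefix "DcaVer"=dword:
DCAVER_RE = re.compile(r'"DcaVer"=dword:([0-9A-Fa-f]+)')

def dcaver_value(text):
    """Return the DcaVer value as a decimal int, or None if it is absent."""
    m = DCAVER_RE.search(text)
    if m is None:
        return None
    return int(m.group(1), 16)  # base 16: hex string -> int

print(dcaver_value('"DcaVer"=dword:0640'))  # 1600
```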

Regular Expressions...

2009-01-07 Thread Ken D'Ambrosio

Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
the re module.  For example, I'd like to do a few things (I'm going to use
phone numbers, 'cause that's what I'm currently dealing with):
12345678900 -- How would I:
- Get just the area code?
- Get just the seven-digit number?

In Perl, I'd do something like
m/^1(...)(...)/;
and then I'd have that stuff in $1 and $2, respectively.  But the Python
stuff simply isn't clicking for me.  If anyone could supply concrete examples of
how to solve the problem above, that would be terrific.

Thanks!

-Ken
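Python's equivalent of `$1` and `$2` is the match object's `group()` method. A sketch, assuming an 11-digit number with a leading `1`; it uses `\d{7}` rather than a second `(...)` so that the second group really holds seven digits:

```python
import re

# re.match only tries at the start of the string, so no ^ is needed.
m = re.match(r'1(\d{3})(\d{7})', '12345678900')
if m:
    area_code = m.group(1)     # like Perl's $1
    local_number = m.group(2)  # like Perl's $2
    print(area_code, local_number)  # 234 5678900
    print(m.groups())               # ('234', '5678900')
```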



Regular Expressions

2007-09-18 Thread Lamonte Harris

I'm trying to reproduce this JavaScript in Python: when I match the given
value in a var, I want to output the value that I matched:

Example JS:
x = '';//some innerHTML in a document
if(x.match(/(.*)/i))
{
valuefound = RegExp.$1;
}

I've been reading my Python book and tutorials, and searching with Google
and Google Code Search, trying to understand, and I'm still not quite
understanding how to do it.

-Lamonte.
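In Python the captured value comes from the match object rather than from a global like `RegExp.$1`. A sketch with made-up input (the `<b>` markup and the variable names are placeholders, not from the original):

```python
import re

inner_html = "<b>Hello</b>"  # placeholder for some document's innerHTML

# re.IGNORECASE plays the role of JavaScript's /i flag.
match = re.search(r'<b>(.*?)</b>', inner_html, re.IGNORECASE)
value_found = match.group(1) if match else None
print(value_found)  # Hello
```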

regular expressions.

2008-08-08 Thread Atul.

Hey All,

I have been playing around with REs and could not get the following
code to run.

import re
vowel = r'[aeiou]'
re.findall(vowel, r"vowel")

anything wrong I have done?

Regards,
Atul.
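The code does run: at the interactive prompt the result is echoed automatically, but in a script the list returned by `re.findall` is simply discarded. Printing (or storing) it shows the matches:

```python
import re

vowel = r'[aeiou]'
matches = re.findall(vowel, "vowel")  # returns a list; nothing is printed by itself
print(matches)  # ['o', 'e']
```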


Regular expressions

2015-11-02 Thread Seymore4Head

How do I make a regular expression that returns true if the end of the
line is an asterisk?
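A minimal sketch: escape the asterisk with `\*` and anchor it with `$`. The helper name is illustrative; for a fixed character like this, `line.rstrip("\n").endswith("*")` does the same job without a regex.

```python
import re

def ends_with_asterisk(line):
    # \* is a literal asterisk; $ matches at the end of the string
    # (or just before a trailing newline, as lines read from a file have).
    return re.search(r'\*$', line) is not None

print(ends_with_asterisk("note*"))    # True
print(ends_with_asterisk("note*\n"))  # True
print(ends_with_asterisk("a*b"))      # False
```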


Regular expressions

2011-12-26 Thread mauricel...@acm.org

Hi

I am trying to change "@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:"
to "@HWI-ST115:568:B08LLABXX:1:1105:6465:151103/1".

Can anyone help me with the regular expressions needed?

Thanks in advance.

Maurice
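A sketch with `re.sub`, assuming the part after the space always looks like a single digit followed by `:N:0:` at the end of the line, as in the example; the captured digit is reused in the replacement through `\1`:

```python
import re

header = "@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:"

# Replace " <digit>:N:0:" at the end of the line with "/<digit>".
new_header = re.sub(r' (\d):N:0:$', r'/\1', header)
print(new_header)  # @HWI-ST115:568:B08LLABXX:1:1105:6465:151103/1
```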

Large regular expressions

2010-03-15 Thread Nathan Harmston

Hi,

So I'm trying to use a very large regular expression. Basically I have
a list of items I want to find in text; it's kind of a conjunction of
two regular expressions and a big list... not pretty. However,
every time I try to run my code I get this exception:

OverflowError: regular expression code size limit exceeded

I understand that there is a Python-imposed limit on the size of a
regular expression. And although it's not nice, I have a machine with
12 GB of RAM just waiting to be used; is there any way I can alter
Python to allow big regular expressions?

Could anyone suggest other methods for this kind of string matching in
Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
what's possible in Python!

Many thanks,


Nathan
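One workaround, sketched here as an assumption rather than a tested recipe for that data, is to split the big alternation into several smaller compiled patterns and run them one after another. The names `compile_in_chunks` and `find_all` and the default chunk size are hypothetical. Matches from different chunks are found independently, so overlapping hits may differ from what a single giant pattern would report.

```python
import re

def compile_in_chunks(items, chunk_size=1000):
    """Compile the items into several alternation patterns of bounded size."""
    # Longest first (ties broken alphabetically) so that, within a chunk,
    # a longer item is tried before any shorter item that is its prefix.
    ordered = sorted(set(items), key=lambda s: (-len(s), s))
    return [
        re.compile("|".join(re.escape(item) for item in ordered[i:i + chunk_size]))
        for i in range(0, len(ordered), chunk_size)
    ]

def find_all(patterns, text):
    """Collect every match from every chunk, chunk by chunk."""
    hits = []
    for pattern in patterns:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

patterns = compile_in_chunks(["cat", "dog", "bird"], chunk_size=2)
print(find_all(patterns, "a cat and a dog"))  # ['cat', 'dog']
```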

Re: regular expressions

2010-04-15 Thread Terry Reedy


On 4/15/2010 2:57 AM, chaaana wrote:

> hi..
> im parsing the text file containing the details of the testcases
> failed.From the file i wanted to obtain only the testcase names and
> enter them in the excel sheet.
>
> the pattern of the text file is:
>
> FILE : NW_PTH_TFG6_SCEN_4_2_FIFO.c, LINE : 240 TEST FAIL to get entire:
> 32768 bytes
> NW_PTH_TFG6_SCEN_4_2_FIFO.c 340 DATA NOT MATCHING, TEST FAIL
> TEST FAIL SYSCALL_TEST_SCEN_4_1_ALL.c at line 355
> FILE:US_TFG7_SCEN_4_1.c,LINE:189, Server side TEST FAIL
> FAIL BREW_SCEN_4_1.c 219: can't mount
> FAIL BREW_SCEN_4_1.c 121: can't umount
> <<  BREW_SCEN_4_1.c TEST FAIL errno:No such file or directory>>
> <<  BREW_SCEN_4_1.c TEST FAIL>>  error:Invalid argument

I do not see any consistent pattern in the above lines.
What output do you see from filtering them?

> i used the ' re module' and its function partition,but im facing
> problem in obtaining only the testcase names..


If I could not write a satisfactory re, I would be tempted to write 
explicit code to process the lines, possibly defining appropriate search 
states.


Terry Jan Reedy


Re: regular expressions

2010-04-15 Thread Tim Chase


On 04/15/2010 11:05 AM, Terry Reedy wrote:

> On 4/15/2010 2:57 AM, chaaana wrote:
>
>> hi..
>> im parsing the text file containing the details of the testcases
>> failed.From the file i wanted to obtain only the testcase names and
>> enter them in the excel sheet.
>>
>> the pattern of the text file is:
>>
>> FILE : NW_PTH_TFG6_SCEN_4_2_FIFO.c, LINE : 240 TEST FAIL to get entire:
>> 32768 bytes
>> NW_PTH_TFG6_SCEN_4_2_FIFO.c 340 DATA NOT MATCHING, TEST FAIL
>> TEST FAIL SYSCALL_TEST_SCEN_4_1_ALL.c at line 355
>> FILE:US_TFG7_SCEN_4_1.c,LINE:189, Server side TEST FAIL
>> FAIL BREW_SCEN_4_1.c 219: can't mount
>> FAIL BREW_SCEN_4_1.c 121: can't umount
>> <<   BREW_SCEN_4_1.c TEST FAIL errno:No such file or directory>>
>> <<   BREW_SCEN_4_1.c TEST FAIL>>   error:Invalid argument
>
> I do not see any consistent pattern in the above lines.
> What output do you see from filtering them?


My guess is that it's the "FILE:...LINE:..." line(s) that the OP 
is interested in, in which case one could do something like


  import re

  r = re.compile(r'^FILE\s*:\s*(.*?),\s*LINE\s*:\s*(\d+)')
  with open('input.txt') as f:
      for line in f:
          m = r.match(line)
          if m:
              print(m.group(1), m.group(2))

Alternatively, if you're in the regexp-avoiding camp, you might 
be able to get away with


  LINE_BIT = ', LINE : '
  with open('input.txt') as f:
      for line in f:
          if line.startswith('FILE :') and LINE_BIT in line:
              leader, _ = line.split(LINE_BIT, 1)
              _, fname = leader.split(':', 1)
              fname = fname.strip()
              print(repr(fname))

but that relies on the LINE_BIT remaining constant (the 
"LINE:189" doesn't have spaces where the first "LINE : 240" does 
have extra spaces; and the same with the extra spaces around the 
"FILE" portion).


-tkc





Compare regular expressions

2007-04-16 Thread Thomas Dybdahl Ahle

Hi, I'm writing a program with a large data stream to which modules can 
connect using regular expressions.

Now I'd like to avoid testing every expression each time I get a line,
since most of the time, one of them matching means none of the others
can match.

But of course there are also cases where one regular expression can
"contain" another, as in
"^strange line (\w+) and (\w+)$" and "^strange line (\w+) (?:.*?)$", in
which case I'd like to test the second one first, and only if it matches,
test the first.

Does anybody know if such a test is possible?
if exp0.contains(exp1): ...

builtin regular expressions?

2006-09-30 Thread Antoine De Groote

Hello,

Can anybody tell me the reason(s) why regular expressions are not built 
into Python, as is the case with Ruby and, I believe, Perl? For 
example, in the following Ruby code

line = 'some string'

case line
  when /title=(.*)/
    puts "Title is #$1"
  when /track=(.*)/
    puts "Track is #$1"
  when /artist=(.*)/
    puts "Artist is #$1"
end

I'm sure there are good reasons, but I just don't see them.

Python Culture says: 'Explicit is better than implicit'. May it be 
related to this?

Regards,
antoine
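Python has no regex literal or `$1` global, but the same dispatch can be written as a table of compiled patterns tried in order. A sketch; the `RULES` table and `describe` function are illustrative names, not an existing API:

```python
import re

# Each rule pairs a compiled pattern with a label, mirroring the Ruby "when" arms.
RULES = [
    (re.compile(r'title=(.*)'), 'Title'),
    (re.compile(r'track=(.*)'), 'Track'),
    (re.compile(r'artist=(.*)'), 'Artist'),
]

def describe(line):
    for pattern, label in RULES:
        m = pattern.search(line)  # like Ruby's regex "when", matches anywhere in the line
        if m:
            return f"{label} is {m.group(1)}"
    return None  # no "when" arm matched

print(describe("track=Blue in Green"))  # Track is Blue in Green
```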

Enumerating Regular Expressions

2006-05-08 Thread blair . bethwaite

Hi all,

Does anybody know of a module that allows you to enumerate all the
strings a particular regular expression describes?

Cheers,
-Blair


Regular expressions question

2007-01-16 Thread Victor Polukcht

I have 2 strings:

"Global   etsi3   *200 ok30   100% 100% Outgoing"
and
"Global   etsi3   *   4 ok 30   100% 100% Outgoing"

The difference is "*200" instead of "*   4". Is it possible to write a
regular expression that will match both of those strings?
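A sketch, assuming the words `Global`, `etsi3`, `ok` and `Outgoing` are fixed and only the spacing and numbers vary: `\s*` allows zero or more spaces where the two rows differ, and `\s+` absorbs the column padding. The name `ROW_RE` is illustrative.

```python
import re

# \s* allows "*200" as well as "*   4"; \s+ absorbs variable column padding.
ROW_RE = re.compile(
    r'^Global\s+etsi3\s+\*\s*(\d+)\s+ok\s*(\d+)\s+(\d+)%\s+(\d+)%\s+Outgoing$'
)

for row in ("Global   etsi3   *200 ok30   100% 100% Outgoing",
            "Global   etsi3   *   4 ok 30   100% 100% Outgoing"):
    m = ROW_RE.match(row)
    print(m.groups() if m else None)
```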


Re: Regular Expressions

2007-02-10 Thread Paul Rubin

"Geoff Hill" <[EMAIL PROTECTED]> writes:
> What's the way to go about learning Python's regular expressions? I feel 
> like such an idiot - being so strong in a programming language but knowing 
> nothing about RE. 

Read the documentation?

Re: Regular Expressions

2007-02-10 Thread dimitri pater


Hi,
a good start:
http://diveintopython.org/regular_expressions/index.html

On 10 Feb 2007 15:30:04 -0800, Paul Rubin <"http://phr.cx"@nospam.invalid>
wrote:


"Geoff Hill" <[EMAIL PROTECTED]> writes:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

Read the documentation?
--
---
You can't have everything. Where would you put it? -- Steven Wright
---
please visit www.serpia.org

Re: Regular Expressions

2007-02-10 Thread John Machin

On Feb 11, 10:26 am, "Geoff Hill" <[EMAIL PROTECTED]> wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

I suggest that you work through the re HOWTO
http://www.amk.ca/python/howto/regex/
and by work through, I don't mean "read". I mean as each new concept
is introduced:
1. try the given example(s) yourself at the interactive prompt
2. try variations on the examples
3. read the relevant part of the Library Reference Manual

Also I'd suggest reading threads in this newsgroup where people are
asking for help with re.

HTH,
John


Re: Regular Expressions

2007-02-10 Thread Paul Rubin

"John Machin" <[EMAIL PROTECTED]> writes:
> > What's the way to go about learning Python's regular expressions? I feel
> > like such an idiot - being so strong in a programming language but knowing
> > nothing about RE.
> 
> I suggest that you work through the re HOWTO
> http://www.amk.ca/python/howto/regex/

Also remember Zawinski's law:
http://fishbowl.pastiche.org/2003/08/18/beware_regular_expressions

Re: Regular Expressions

2007-02-10 Thread gregarican

On Feb 10, 6:26 pm, "Geoff Hill" <[EMAIL PROTECTED]> wrote:
> What's the way to go about learning Python's regular expressions? I feel
> like such an idiot - being so strong in a programming language but knowing
> nothing about RE.

I highly recommend reading the book "Mastering Regular Expressions,"
which I believe is published by O'Reilly. It's a great reference and
helps peel the onion in terms of working through RE. They are a
language unto themselves. A fun brain exercise.


Re: Regular Expressions

2007-02-10 Thread Shawn Milo

On 10 Feb 2007 18:58:51 -0800, gregarican <[EMAIL PROTECTED]> wrote:
> On Feb 10, 6:26 pm, "Geoff Hill" <[EMAIL PROTECTED]> wrote:
> > What's the way to go about learning Python's regular expressions? I feel
> > like such an idiot - being so strong in a programming language but knowing
> > nothing about RE.
>
> I highly recommend reading the book "Mastering Regular Expressions,"
> which I believe is published by O'Reilly. It's a great reference and
> helps peel the onion in terms of working through RE. They are a
> language unto themselves. A fun brain exercise.
>

Absolutely: Get "Mastering Regular Expressions" by Jeffrey Friedl. Not
only is it easy to read, but you'll get a lot of mileage out of
regexes in general. Grep, Perl one-liners, Python, and other tools use
regexes, and you'll find that they are really clever little creatures
once you befriend a few of them.

Shawn

Re: Regular Expressions

2007-02-10 Thread Geoff Hill

Thanks. O'Reilly is the way I learned Python, and I'm surprised that I didn't 
think of a book by them earlier. 



Re: Regular Expressions

2007-02-10 Thread Steve Holden

Geoff Hill wrote:
> What's the way to go about learning Python's regular expressions? I feel 
> like such an idiot - being so strong in a programming language but knowing 
> nothing about RE. 
> 
> 
In fact that's a pretty smart stance. A quote attributed variously to 
Tim Peters and Jamie Zawinski says "Some people, when confronted with a 
problem, think 'I know, I'll use regular expressions.' Now they have two 
problems."

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note:  http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007


Re: Regular Expressions

2007-02-11 Thread Steven D'Aprano

On Sun, 11 Feb 2007 07:05:30 +, Steve Holden wrote:

> Geoff Hill wrote:
>> What's the way to go about learning Python's regular expressions? I feel 
>> like such an idiot - being so strong in a programming language but knowing 
>> nothing about RE. 
>> 
>> 
> In fact that's a pretty smart stance.

That's a little harsh -- regexes have their place, together with pointer
arithmetic, bit manipulations, reverse polish notation and goto. The
problem is when people use them inappropriately e.g. using a regex when a
simple string.find will do.

> A quote attributed variously to 
> Tim Peters and Jamie Zawinski says "Some people, when confronted with a 
> problem, think 'I know, I'll use regular expressions.' Now they have two 
> problems."

I believe that is correctly attributed to Jamie Zawinski.

-- 
Steven


Re: Regular Expressions

2007-02-11 Thread John Machin

On Feb 11, 9:25 pm, Steven D'Aprano
<[EMAIL PROTECTED]> wrote:
> On Sun, 11 Feb 2007 07:05:30 +, Steve Holden wrote:
> > Geoff Hill wrote:
> >> What's the way to go about learning Python's regular expressions? I feel
> >> like such an idiot - being so strong in a programming language but knowing
> >> nothing about RE.
>
> > In fact that's a pretty smart stance.
>
> That's a little harsh -- regexes have their place, together with pointer
> arithmetic, bit manipulations, reverse polish notation and goto. The
> problem is when people use them inappropriately e.g. using a regex when a
> simple string.find will do.

Thanks for the tip-off, Steve and Steven. Looks like I'll have to
start hiding my 12C (datecode 2214) with its "GTO" button under the
loose floor-board whenever I hear a knock at the door ;-) Looks like
Agner Fog's gone a million, and there'll be a special place in hell
for people who combine regexes with bit manipulation, like Navarro &
Raffinot. And we won't even mention Heikki Hy,*7g^54d3j+__=


Re: Regular Expressions

2007-02-11 Thread James Stroud

gregarican wrote:
> On Feb 10, 6:26 pm, "Geoff Hill" <[EMAIL PROTECTED]> wrote:
>> What's the way to go about learning Python's regular expressions? I feel
>> like such an idiot - being so strong in a programming language but knowing
>> nothing about RE.
> 
> I highly recommend reading the book "Mastering Regular Expressions,"
> which I believe is published by O'Reilly. It's a great reference and
> helps peel the onion in terms of working through RE. They are a
> language unto themselves. A fun brain exercise.
> 

There is no real mention of Python in this book, but the first edition 
is probably the best programming book I've ever read (excepting, perhaps, 
Text Processing in Python by Mertz). Well, come to think of it, check 
the latter book out. It has a great chapter on Python regexes. And it's 
free to download.

James

Re: Regular Expressions

2007-02-11 Thread [EMAIL PROTECTED]


> That's a little harsh -- regexes have their place, together with pointer
> arithmetic, bit manipulations, reverse polish notation and goto. The
> problem is when people use them inappropriately e.g. using a regex when a
> simple string.find will do.
>
> > A quote attributed variously to
> > Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> > problem, think 'I know, I'll use regular expressions.' Now they have two
> > problems."
>
> I believe that is correctly attributed to Jamie Zawinski.
>
> --
> Steven

So as a newbie, I have to ask. I've played with the re module for a
while now, and I think regular expressions are super fun and useful. As
for them being a problem, I've found they can be tricky, and sometimes the
regexes I've devised do unexpected things (in two instances I can think
of, that unexpected thing was something I had hoped to get to further
down the line, yay for me!). So I guess I don't really understand why
they are a "bad idea" to use. I don't know of any other way yet to parse
specific data out of a text, HTML, or XML file without resorting to
regular expressions.
What other ways are there?


Re: Regular Expressions

2007-02-11 Thread skip


jwz> Some people, when confronted with a problem, think 'I know, I'll
jwz> use regular expressions.' Now they have two problems.

dbl> So as a newbie, I have to ask ... So I guess I don't really
dbl> understand why they are a "bad idea" to use.

Regular expressions are fine in their place, however, you can get carried
away.  For example:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Skip

Re: Regular Expressions

2007-02-11 Thread Gabriel Genellina

On Sun, 11 Feb 2007 13:35:26 -0300, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:

>> (Steven?)
>> That's a little harsh -- regexes have their place, together with pointer
>> arithmetic, bit manipulations, reverse polish notation and goto. The
>> problem is when people use them inappropriately e.g. using a regex when a
>> simple string.find will do.
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use. I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?

For very simple things, it's easier/faster to use string methods like find
or split. For example, splitting "2007-02-11" into y,m,d parts:
y,m,d = date.split("-")
is a lot faster than matching "(\d+)-(\d+)-(\d+)".
On the other hand, complex tasks like parsing an HTML/XML document
*can't* be done with a regexp alone; but people insist anyway, and then
complain when it doesn't work as expected, and ask how to "fix" the
regexp...
Good usage of regexps probably lies somewhere in the middle.
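Both approaches in the date example produce the same parts; the regex version just carries more machinery. This sketch only shows that they agree, it does not measure speed:

```python
import re

date = "2007-02-11"

# String method: split on the separator.
y, m, d = date.split("-")

# Regex equivalent: three captured digit groups.
match = re.match(r"(\d+)-(\d+)-(\d+)", date)
ry, rm, rd = match.groups()

print((y, m, d) == (ry, rm, rd))  # True
```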

-- 
Gabriel Genellina


Re: Regular Expressions

2007-02-11 Thread John Machin

On Feb 12, 3:35 am, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> > That's a little harsh -- regexes have their place, together with pointer
> > arithmetic, bit manipulations, reverse polish notation and goto. The
> > problem is when people use them inappropriately e.g. using a regex when a
> > simple string.find will do.
>
> > > A quote attributed variously to
> > > Tim Peters and Jamie Zawinski says "Some people, when confronted with a
> > > problem, think 'I know, I'll use regular expressions.' Now they have two
> > > problems."
>
> > I believe that is correctly attributed to Jamie Zawinski.
>
> > --
> > Steven
>
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use.

Regexes are not "bad". However people tend to overuse them, whether
they are overkill (like Gabriel's date-splitting example) or underkill
-- see your next sentence :-)

> I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?

Text: Paul McGuire's pyparsing module (Google is your friend); read
David Mertz's book on text processing with Python (free download, I
believe); modules for specific data formats e.g. csv

HTML: htmllib and HTMLParser (both in the Python library),
BeautifulSoup (again GIYF)

XML: xml.* in the Python library. ElementTree (recommended) is
included in Python 2.5; use xml.etree.cElementTree.
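As an illustration of the XML route, a sketch that pulls attributes out with `xml.etree.ElementTree` instead of a regex; the XML document is invented for the example. (In current Python 3 releases `xml.etree.cElementTree` no longer exists; plain `xml.etree.ElementTree` uses the C accelerator automatically.)

```python
import xml.etree.ElementTree as ET

# Invented sample document, for illustration only.
XML = """<tests>
  <test name="NW_PTH_TFG6_SCEN_4_2_FIFO" result="FAIL"/>
  <test name="BREW_SCEN_4_1" result="PASS"/>
</tests>"""

root = ET.fromstring(XML)
# iter("test") walks every <test> element; get() reads an attribute.
failed = [t.get("name") for t in root.iter("test") if t.get("result") == "FAIL"]
print(failed)  # ['NW_PTH_TFG6_SCEN_4_2_FIFO']
```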

HTH,
John


Re: Regular Expressions

2007-02-11 Thread Steve Holden

[EMAIL PROTECTED] wrote:
>> That's a little harsh -- regexes have their place, together with pointer
>> arithmetic, bit manipulations, reverse polish notation and goto. The
>> problem is when people use them inappropriately e.g. using a regex when a
>> simple string.find will do.
>>
>>> A quote attributed variously to
>>> Tim Peters and Jamie Zawinski says "Some people, when confronted with a
>>> problem, think 'I know, I'll use regular expressions.' Now they have two
>>> problems."
>> I believe that is correctly attributed to Jamie Zawinski.
>>
>> --
>> Steven
> 
> So as a newbie, I have to ask. I've played with the re module now for
> a while, I think regular expressions are super fun and useful. As far
> as them being a problem I found they can be tricky and sometimes the
> regex's I've devised do unexpected things...(which I can think of two
> instances where that unexpected thing was something that I had hoped
> to get into further down the line, yay for me!). So I guess I don't
> really understand why they are a "bad idea" to use. I don't know of
> any other way yet to parse specific data out of a text, html, or xml
> file without resorting to regular expressions.
> What other ways are there?
> 
Re's aren't inherently bad. Just avoid using them as a hammer to the 
extent that all your problems look like nails.

They wouldn't exist if there weren't problems it was appropriate to use 
them on. Just try to use simpler techniques first.

For example, don't use re's to find out if a string starts with a 
specific substring when you could instead use the .startswith() string 
method.
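A small sketch of that advice, using a made-up log line: both checks give the same answer, but the string method has no pattern to get wrong and needs no escaping.

```python
import re

line = "FILE : NW_PTH_TFG6_SCEN_4_2_FIFO.c, LINE : 240"  # sample line

# Regex version: works, but needs a pattern (and escaping if the prefix
# contains special characters such as "." or "*").
uses_re = re.match(r"FILE", line) is not None

# String-method version: same answer, nothing to escape.
uses_str = line.startswith("FILE")

# startswith also accepts a tuple of alternative prefixes.
either = line.startswith(("FILE", "TEST FAIL"))

print(uses_re, uses_str, either)  # True True True
```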

regards
  Steve

Re: Regular Expressions

2007-02-12 Thread [EMAIL PROTECTED]

> HTML: htmllib and HTMLParser (both in the Python library),
> BeautifulSoup (again GIYF)
>
> XML: xml.* in the Python library. ElementTree (recommended) is
> included in Python 2.5; use xml.etree.cElementTree.


The sources of HTMLParser and xmllib use regular expressions for
parsing out the data. htmllib calls sgmllib at the beginning of its
code--sgmllib starts off with a bunch of regular expressions used to
parse data. So the only real difference I see there is that someone
saved me the work of writing them ;0). I haven't looked at the source
for Beautiful Soup, though I have a sneaking suspicion that most
processing of HTML/XML is based on regexes.


Re: Regular Expressions

2007-02-12 Thread John Machin

On Feb 12, 9:20 pm, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> HTML: htmllib and HTMLParser (both in the Python library),
> BeautifulSoup (again GIYF)
>
> XML: xml.* in the Python library. ElementTree (recommended) is
> included in Python 2.5; use xml.etree.cElementTree.
>
> The source of HTMLParser and xmllib use regular expressions for
> parsing out the data. htmllib calls sgmllib at the begining of it's
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference there I see is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of html/xml is all based on regex's.

That's right. Those modules use regexes. You don't. You call functions
& classes in the modules.

Someone has written those modules and tested them and documented them
and they've had a fair old thrashing by quite a few people over the
years -- it may be the only difference in your way of thinking but
it's quite a large difference from you opening up the re docs and
getting stuck in single-handedly :-)


Re: Regular Expressions

2007-02-12 Thread Neil Cerutti

On 2007-02-10, Geoff Hill <[EMAIL PROTECTED]> wrote:
> What's the way to go about learning Python's regular
> expressions? I feel like such an idiot - being so strong in a
> programming language but knowing nothing about RE. 

A great way to learn regular expressions is to implement them.
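As a starting point for that exercise, here is a Python transcription of the small matcher Rob Pike wrote for Kernighan and Pike's *The Practice of Programming*. It supports only literal characters, `.`, `^`, `$` and `*`, and is a learning toy, not a replacement for `re`:

```python
def match(regexp, text):
    """Search for regexp anywhere in text (supports c . ^ $ *)."""
    if regexp.startswith('^'):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty suffix.
    for i in range(len(text) + 1):
        if match_here(regexp, text[i:]):
            return True
    return False

def match_here(regexp, text):
    """Does regexp match at the beginning of text?"""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == '*':
        return match_star(regexp[0], regexp[2:], text)
    if regexp == '$':
        return not text
    if text and (regexp[0] == '.' or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False

def match_star(c, regexp, text):
    """Match c* followed by regexp, trying the fewest repetitions first."""
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (text[i] == c or c == '.'):
            i += 1
        else:
            return False

print(match("ab*c", "xacx"), match("^abc$", "abcd"))  # True False
```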

-- 
Neil Cerutti

Re: Regular Expressions

2007-02-12 Thread skip


dbl> The source of HTMLParser and xmllib use regular expressions for
dbl> parsing out the data. htmllib calls sgmllib at the begining of it's
dbl> code--sgmllib starts off with a bunch of regular expressions used
dbl> to parse data.

I am almost certain those modules use regular expressions for lexical
analysis (splitting the input byte stream into "words"), not for parsing
(extracting the structure of the "sentences").

If I have a simple expression:

(7 + 3.14) * CONST

that's just a stream of bytes, "(", "7", " ", "+", ...  Lexical analysis
chunks that stream of bytes into the "words" of the language:

LPAREN (NUMBER, 7) PLUS (NUMBER, 3.14) RPAREN TIMES (IDENT, "CONST")

Parsing then constructs a higher level representation of that stream of
"words" (more commonly called tokens or lexemes).  That representation is
application-dependent.

Regular expressions are ideal for lexical analysis.  They are not-so-hot for
parsing unless the grammar of the language being parsed is *extremely*
simple.
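A sketch of the lexical-analysis half for the expression above, in the style of the tokenizer recipe in the `re` module documentation: one combined pattern of named groups, with `Match.lastgroup` reporting which kind of token matched. The token names follow the example above; everything else is illustrative.

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("IDENT", r"[A-Za-z_]\w*"),     # identifiers such as CONST
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),               # whitespace, discarded
    ("MISMATCH", r"."),             # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split source into (kind, value) tokens; value is None for punctuation."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "NUMBER":
            tokens.append((kind, float(text) if "." in text else int(text)))
        elif kind == "IDENT":
            tokens.append((kind, text))
        else:
            tokens.append((kind, None))
    return tokens

print(tokenize("(7 + 3.14) * CONST"))
```

Turning that token list into a tree is the parser's job, which is where plain regexes stop being a good fit.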

Here are a couple much better expositions on the topics:

http://en.wikipedia.org/wiki/Lexical_analysis
http://en.wikipedia.org/wiki/Parsing

Skip


Re: Regular Expressions

2007-02-12 Thread Gabriel Genellina

On Mon, 12 Feb 2007 07:20:11 -0300, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:

> The source of HTMLParser and xmllib use regular expressions for
> parsing out the data. htmllib calls sgmllib at the begining of it's
> code--sgmllib starts off with a bunch of regular expressions used to
> parse data. So the only real difference there I see is that someone
> saved me the work of writing them ;0). I haven't looked at the source
> for Beautiful Soup, though I have the sneaking suspicion that most
> processing of html/xml is all based on regex's.

You can build a parser for SGML/HTML/XML documents using regexps AND
Python code. You can't do that with regexps only.
For example, suppose you work hard to build a correct regexp for matching
an opening  tag. You extract this from the document: "".
Is it actually an  tag? Maybe. But the text could be inside a comment.
Or in a CDATA section. Or inside JavaScript code. Or...
A regexp is good for recognizing tokens, and this can be used to build a
parser. But regular expressions alone can't parse these kinds of documents,
simply because their grammar is not regular.
(Python's re engine is stronger than "mathematical" regular expressions, in
the sense that it can handle things like backreferences (?P=...) and
lookahead (?=...), but even so it can't handle HTML.)
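A small demonstration of the comment problem, with an invented document: a tag-matching regex happily finds a link that is commented out, while `html.parser.HTMLParser` (Python 3's name for the HTMLParser module) reports it as a comment instead of a start tag. The `LinkCollector` class is illustrative.

```python
import re
from html.parser import HTMLParser

DOC = '<!-- <a href="old.html"> --><p>See <a href="new.html">this</a></p>'

# Naive regex: finds every "<a ...>" string, including the commented-out one.
naive = re.findall(r'<a\s[^>]*>', DOC)

class LinkCollector(HTMLParser):
    """Collect href attributes of real <a> start tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(DOC)
collector.close()

print(naive)            # both tags, including the one inside the comment
print(collector.hrefs)  # ['new.html']
```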

-- 
Gabriel Genellina


regular expressions use

2005-08-22 Thread max(01)*

Hi everyone.

I would like to do some URI decoding, which means translating patterns
like "%2b/dhg-%3b %7E" into "+/dhg-; ~": in practice, if a sequence like
"%2b" is found, it should be translated into the single character whose
hex ASCII code is 2b.

I did this:

...
import re
import sys

modello = re.compile("%([0-9a-f][0-9a-f])", re.IGNORECASE)

def funzione(corrispondenza):
   return chr(eval('0x' + corrispondenza.group(1)))

for riga in sys.stdin:
   riga = modello.sub(funzione, riga)
   sys.stdout.write(riga)
...

please comment it. can it be made easily or more compactly? i am a 
python regexp novice.

bye

max

ps: i was trying to pythonate this kind of perl code:

$riga =~ s/%([A-Fa-f0-9][A-Fa-f0-9])/chr(hex($1))/ge;
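A more compact variant, sketched for Python 3 (`decode_percent` and `decode_stream` are hypothetical names): `int(..., 16)` parses the two hex digits directly, so no `eval` is needed, and a lambda replaces the named callback, much like Perl's `chr(hex($1))` under `/e`. For real URI decoding, the standard library's `urllib.parse.unquote` already does this job.

```python
import re
import sys

# %XX where XX is exactly two hex digits (either case)
PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_percent(text):
    # int(hex_digits, 16) replaces eval('0x' + ...); chr() turns the code into a character
    return PERCENT_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

def decode_stream(infile=sys.stdin, outfile=sys.stdout):
    # Same line-by-line filter as the original script
    for line in infile:
        outfile.write(decode_percent(line))

print(decode_percent("%2b/dhg-%3b %7E"))   # +/dhg-; ~
```

Note that `chr()` maps each escape to one code point, so a multi-byte UTF-8 sequence such as `%C3%A9` comes out as two characters rather than `é`; `unquote` handles that case.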
-- 
http://mail.python.org/mailman/listinfo/python-list

Concerning Regular Expressions

2006-01-29 Thread Tempo

I've been reading a bunch of articles and tutorials on the net, but I
cannot quite get ahold of the whole regular expression process. I have
a list that contains about thirty strings, each in its own spot in the
list. What I want to do is search the list, say it's called 'lines',
for 'R0 -'.
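For a fixed piece of text like 'R0 -', a plain substring test may be all that is needed; a regex only earns its keep once the pattern varies. A minimal Python 3 sketch, assuming `lines` is the list of strings described above (the sample data here is made up):

```python
import re

lines = ["R1 - idle", "R0 - ready", "R2 - busy", "status R0 - ok"]  # made-up sample data

# Plain substring search: no regex needed for fixed text
hits = [s for s in lines if "R0 -" in s]

# Same search with re; re.escape() protects any special characters in the text
r0 = re.compile(re.escape("R0 -"))
hits_re = [s for s in lines if r0.search(s)]

# Positions of the matching entries, if the index is what you need
positions = [i for i, s in enumerate(lines) if "R0 -" in s]

print(hits)       # ['R0 - ready', 'status R0 - ok']
print(positions)  # [1, 3]
```

If the text must appear at the start of each entry, `r0.match(s)` (or `s.startswith("R0 -")`) checks only position 0.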

Thanks in advance for any and all info that I receive.


-Tempo-

-- 
http://mail.python.org/mailman/listinfo/python-list

tricky regular expressions

2006-02-07 Thread Ernesto

I'm trying to get the right syntax for my regular expression.  The
string I'm trying to parse is:


# myString
[USELESS DATA]
Request: Play
[USELESS DATA]
Name:  David Dude
[USELESS DATA]
Request: Next
[USELESS DATA]
Name: Ernesto Python User


# Right now, I'm using the following code:

pattern_Name = r'''(?x)
Name:\s+(.+)
'''
names = re.findall(pattern_Name, myString)
print names

This captures all of the names, but I want an added requirement:
only capture names that come after (not necessarily immediately after)
"Request: Play" or "Request: Next".  I guess the regular expression
would look something like:

'''(?x)
["Request: Play" OR "Request: Next"][intermediate
data]Name:\s+(.+)
'''
I didn't see any RE constructs like this in the docs, but I have a
feeling it's possible.
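It is possible with one pattern. A Python 3 sketch, assuming the layout shown above (each `Request:` line precedes its `Name:` line, and no `[USELESS DATA]` line starts with `Request:`); `NAME_AFTER_REQUEST` is a hypothetical name, and the sample adds a `Request: Stop` entry to show the filtering. Alternation `(?:Play|Next)` covers the OR, and a lazily repeated group that refuses to cross another `Request:` line covers the "intermediate data":

```python
import re

my_string = """[USELESS DATA]
Request: Play
[USELESS DATA]
Name:  David Dude
[USELESS DATA]
Request: Stop
[USELESS DATA]
Name: Someone Else
[USELESS DATA]
Request: Next
[USELESS DATA]
Name: Ernesto Python User
"""

NAME_AFTER_REQUEST = re.compile(r"""
    ^Request:[ \t]*(?:Play|Next)[ \t]*$   # a wanted request line
    (?:\n(?!Request:).*)*?                # intermediate lines, but not another Request
    \nName:[ \t]+(.+)                     # the first Name line after it
""", re.VERBOSE | re.MULTILINE)

print(NAME_AFTER_REQUEST.findall(my_string))
# ['David Dude', 'Ernesto Python User']
```

If the real data is messier than this, a line-by-line loop is often easier to maintain than a bigger regex.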

-- 
http://mail.python.org/mailman/listinfo/python-list

tricky regular expressions

2006-02-07 Thread Ernesto

So regular expressions have been good to me so far, but now my problem
is a bit trickier.  The string I'm getting data from looks like this:

myString =
[USELESS DATA]
Request : Play
[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : next
[USELESS DATA]
Title:  song #2
.

I'm using this code to search myString:

pattern = '''(?x)
Title:\s+(.+)
'''
Titles = re.findall(pattern, myString)


The problem is that I only want the "Titles" which are either:

a) Followed by "Request : Play"
b) Followed by "Request : next"

I'm not sure if I should use RE's or some other mechanism.  Thanks
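Since whether to use REs at all is an open question here, one non-regex alternative is a small line-by-line state machine. The sketch below (Python 3; `titles_after_requests` is a hypothetical name) assumes each `Request :` line comes before the `Title:` it applies to, compares the request case-insensitively because the sample mixes `Play` and `next`, and splits each line on its first colon:

```python
def titles_after_requests(text, wanted=("play", "next")):
    """Collect Title values that follow a Request whose value is in `wanted`."""
    titles = []
    armed = False  # True after a wanted Request line, until a Title is taken
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon: treat as useless data
        key = key.strip().lower()
        value = value.strip()
        if key == "request":
            armed = value.lower() in wanted
        elif key == "title" and armed:
            titles.append(value)
            armed = False
    return titles

my_string = """[USELESS DATA]
Request : Play
[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : stop
Title: skipped
Request : next
[USELESS DATA]
Title:  song #2
"""
print(titles_after_requests(my_string))   # ["Beethoven's 5th", 'song #2']
```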

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions

2005-03-23 Thread MyHaz

escape chars are always a pain when making regexes. There is a section
on them in the Regex HOWTO:

http://www.amk.ca/python/howto/regex/regex.html#SECTION00042
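The pain comes from two layers interpreting backslashes: Python's string-literal parser first, then the re module. A short Python 3 illustration: to match one literal backslash, re must receive the two-character pattern `\\`, which takes four backslashes in an ordinary literal but only two in a raw string (`r"..."`):

```python
import re

plain = "\\\\"   # ordinary literal: Python collapses this to two backslashes
raw = r"\\"      # raw literal: the same two backslashes, written as-is
print(plain == raw, len(raw))                  # True 2

# Each match is a single backslash character
print(len(re.findall(raw, r"C:\temp\new")))    # 2
```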

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions

2005-03-23 Thread Roel Schroeven

Ron wrote:
> This is probably a repeated question, but try as I might I was unable
> to find something similar in the archives.
> 
> I'm trying to develop a regular expression for recognizing a simplified
> C-Style string syntax.  I need it to be able to handle escape sequences
> of the form \x where x is any character including ".
> 
> Here's what I'm trying:
> 
>   \"([^"\\]|(\\.))*\"
> 
> When I try to get it to recognize something like:
> 
>"I said, \"Hello!\""
> 
> It stops at the first quote after the \.

Works for me:

 >>> print re.search(r'\"([^"\\]|(\\.))*\"',
... r'"I said \"Hello!\""').group(0)
"I said \"Hello!\""


You can leave out the backslashes in front of the first and last quotes
in the regex, by the way, at least if you use ' instead of " to delimit it:

>>> print re.search(r'"([^"\\]|(\\.))*"',
... r'"I said \"Hello!\""').group(0)
"I said \"Hello!\""


-- 
If I have been able to see further, it was only because I stood
on the shoulders of giants.  -- Isaac Newton

Roel Schroeven
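One plausible explanation for the original symptom (not confirmed in the thread) is that the *test string* was written without the `r` prefix. In an ordinary literal, Python turns `\"` into a bare `"` before re ever sees it, so the regex correctly stops at the quote after `said, `. A Python 3 sketch of both cases, using the pattern above without the unneeded inner group:

```python
import re

STRING_RE = re.compile(r'"([^"\\]|\\.)*"')

raw_subject = r'"I said, \"Hello!\""'   # backslashes survive
plain_subject = '"I said, \"Hello!\""'  # \" collapses to a bare "

print(STRING_RE.search(raw_subject).group(0))    # "I said, \"Hello!\""
print(STRING_RE.search(plain_subject).group(0))  # "I said, "
```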
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions

2006-04-12 Thread Peter Hansen

david brochu jr wrote:
> I am trying to grab the following string out of a text file using 
> regular expression (re module):

Why do you have to use a regular expression?

> "DcaVer"=dword:0640

Is all your other input pretty much identical in form?  Specifically, 
the number of interest is the last thing on the line, and always 
preceded by a colon?

> What I need to do with that string is trim down " "DcaVer"=dword:" and 
> convert the remaining number from hex to dec.

What does "trim down" mean?  Do you need something out of the string, or 
are you just discarding/ignoring it?

> I have been trying to figure this out for a while..I am fairly new so 
> please any help would be greatly appreciated.

s = '"DcaVer"=dword:0640'
value = int(s.split(':')[-1], 16)

(In other words, split on colons, take the last field and, treating it 
as a hex value, convert to an integer.)

-Peter

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions

2006-04-12 Thread Steve Juranich

david brochu jr wrote:

> Hi,
> 
> I am trying to grab the following string out of a text file using regular
> expression (re module):
> 
> "DcaVer"=dword:0640
> 
> What I need to do with that string is trim down " "DcaVer"=dword:" and
> convert the remaining number from hex to dec.
> 
> I have been trying to figure this out for a while..I am fairly new so
> please any help would be greatly appreciated.

line = '"DcaVer"=dword:0640'
value = int(line.split(':')[1], 16)

Note that no regexes are needed.

Cheers.
-- 
Steve Juranich
Tucson, AZ
USA

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions

2006-04-12 Thread david brochu jr

Pete,
> Why do you have to use a regular expression?

I don't, I just thought this was the easiest way.

> > "DcaVer"=dword:0640
>
> Is all your other input pretty much identical in form?  Specifically,
> the number of interest is the last thing on the line, and always
> preceded by a colon?

The other information is pretty much identical in form, yes. Exactly: all I am interested in is the number following the colon.

> > What I need to do with that string is trim down " "DcaVer"=dword:" and
> > convert the remaining number from hex to dec.
>
> What does "trim down" mean?  Do you need something out of the string, or
> are you just discarding/ignoring it?

What I meant was that I want to discard the "DcaVer"=dword: part. All I am interested in is searching for "DcaVer", finding it, and then taking the numerical value found after the colon.

> s = '"DcaVer"=dword:0640'
> value = int(s.split(':')[-1], 16)
>
> (In other words, split on colons, take the last field and, treating it
> as a hex value, convert to an integer.)

The only problem with this is that DcaVer's value is not always going to be the same, so I need to search specifically for DcaVer and then, after finding it, get the numerical value associated with it.
 
Thanks
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions

2006-04-12 Thread Tim Chase

> I am trying to grab the following string out of a text file using regular
> expression (re module):

While I'm all for using regexps when they're needed, are you 
sure you need the overhead of regexps?

  f = open("foo.txt", "r")
  for line in f.readlines():
      #if '"DcaVer"=dword:' in line:
      if line.startswith('"DcaVer"=dword:'):
          value = int(line.split(":", 1)[-1], 16)
          print value
  f.close()

If you absolutely must use a regexp,

  import re
  f = open("foo", "r")
  r = re.compile(r'"DcaVer"=dword:([0-9a-fA-F]{7})')
  for line in f.readlines():
      m = r.match(line)
      if m:
          value = int(m.group(1), 16)
          print value
  f.close()

should do the trick for you.  This assumes that each string 
of hex digits has seven characters.  Adjust accordingly.
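A Python 3 variant of the regex approach, sketched under two assumptions: the input may contain other `"Name"=dword:` lines, and the hex field may have any number of digits (the `+` replaces `{7}`, so `0640` from the original question also matches). `find_dcaver` is a hypothetical helper that accepts any iterable of lines, such as an open file:

```python
import re

DCAVER_RE = re.compile(r'^"DcaVer"=dword:([0-9a-fA-F]+)\s*$')

def find_dcaver(lines):
    """Return the DcaVer value as an int, or None if no DcaVer line is found."""
    for line in lines:
        m = DCAVER_RE.match(line)
        if m:
            return int(m.group(1), 16)   # hex string -> integer
    return None

sample = ['"Other"=dword:00000001\n', '"DcaVer"=dword:0640\n']
print(find_dcaver(sample))   # 1600

# With a real file:
# with open("foo.txt") as f:
#     value = find_dcaver(f)
```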

-tim





-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread James Mills

On Thu, Jan 8, 2009 at 8:54 AM, Ken D'Ambrosio  wrote:
> Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
> the re module.  For example, I'd like to do a few things (I'm going to use
> phone numbers, 'cause that's what I'm currently dealing with):
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
>
> In Perl, I'd do something like
> m/^1(...)(...)/;
> and then I'd have that stuff in $1 and $2, respectively.  But the Python
> stuff
> simply isn't clicking for me.  If anyone could supply concrete examples of
> how to do the problem, above, that would be terrific.

There is nothing particularly special or different about
Python's re module compared with any other language's
regular expression library or capabilities. You should
be able to use pretty much the same things, however:

1. Why can't you just use ordinary string manipulation here?

One of Python's strengths is in string manipulation.

Consider:

>>> s = "12345678900"
>>> area, number = s[1:4], s[4:]
>>> area
'234'
>>> number
'5678900'

cheers
James
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread MRAB

Ken D'Ambrosio wrote:

> Hi, all.  As a recovering Perl guy, I have to admit I don't quite
> "get" the re module.  For example, I'd like to do a few things (I'm
> going to use phone numbers, 'cause that's what I'm currently dealing
> with):
>
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
>
> In Perl, I'd do something like m/^1(...)(...)/; and then I'd have
> that stuff in $1 and $2, respectively.  But the Python stuff simply
> isn't clicking for me.  If anyone could supply concrete examples of
> how to do the problem, above, that would be terrific.

Perl puts the captured text into variables as a side-effect, which, from
the Python point of view, is undesirable 'magic'. The Python way is for
the result to be returned like any normal function or method call:

match = re.search(r"^1(...)(...)", phone_number)
# match is now a match object if successful or None if unsuccessful
if match:
    area_code = match.group(1)
    local_code = match.group(2)
    # or:
    # area_code, local_code = match.groups()

No magic involved!

(In this case simple string slicing would be simpler and faster.)
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread James Stroud


Ken D'Ambrosio wrote:

> Hi, all.  As a recovering Perl guy,
>
> [snip]
>
> In Perl, I'd do something like
> m/^1(...)(...)/;


Indeed it seems you are recovering from an especially bad case. I 
recommend two doses of the python cookbook per day for one to two 
months. Report back here after your first cycle and we'll tell you how 
you are doing. I'm very optimistic about the prognosis.


James
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread James Mills

On Thu, Jan 8, 2009 at 10:03 AM, James Stroud  wrote:
(...)
> Indeed it seems you are recovering from an especially bad case. I recommend
> two doses of the python cookbook per day for one to two months. Report back
> here after your first cycle and we'll tell you how you are doing. I'm very
> optimistic about the prognosis.

2nd opinion :)

I highly recommend a strong dose of the Python Tutorial (1)
followed by a recovery program of the Python Docs (2)

cheers
James

1. http://docs.python.org/tutorial/
2. http://docs.python.org/
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread bearophileHUGS

Ken D'Ambrosio:
> But the Python stuff simply isn't clicking for me.

For people coming from Perl who want to do some string
processing with Python, I suggest first learning array/string slices
and string methods, and trying to use regular expressions as
little as possible.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread Ben Finney

"Ken D'Ambrosio"  writes:

> Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
> the re module.  For example, I'd like to do a few things (I'm going to use
> phone numbers, 'cause that's what I'm currently dealing with):
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
> 
> In Perl, I'd do something like
> m/^1(...)(...)/;

Wouldn't that be better as:

m/^1(\d{3})(\d{7})$/;

I'll assume that more-precise expression in what follows.

> and then I'd have the numbers in $1 and $2, respectively.  But the Python
> stuff simply isn't clicking for me.

In general, where a set of data is likely to be iterated, the Pythonic
way to present it is via a single iterable (instead of, in your Perl
example, separate variables).

Then, for those (generally less frequent) cases where you do want the
separate items, you can bind them in a single statement:

(foo, bar, baz) = some_sequence

or

(foo, bar, baz) = (item for item in some_sequence)

e.g.:

>>> (foo, bar, baz) = [1, 2, 3]
>>> foo
1
>>> bar
2
>>> baz
3

So, the match returned by the various ‘re’ module match functions is
an object which allows access to the grouped matches as a sequence.

> If anyone could supply concrete examples of how to do the problem,
> above, that would be terrific.

Assuming the following:

>>> import re
>>> phone_number_regex = r'^1(\d{3})(\d{7})$'

Trivial one-shot example:

>>> phone_number = '12345678900'
>>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
>>> area_code
'234'
>>> local_number
'5678900'

More explicit example, showing the various steps and assuming you want
to re-use the various values in multiple statements:

>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_pattern
<_sre.SRE_Pattern object at 0xf7f8c598>

>>> phone_number = '12345678900'
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_match
<_sre.SRE_Match object at 0xf7f52338>

>>> (area_code, local_number) = phone_number_match.groups()
>>> area_code
'234'
>>> local_number
'5678900'

Python regular expressions also allow naming each group, for later
access to the matches via a dict:

>>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})'
>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_groups = phone_number_match.groupdict()
>>> phone_number_groups['area_code']
'234'
>>> phone_number_groups['local_number']
'5678900'
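On Python 3.6 and later the same idea can be written a little more tightly: `re.fullmatch` (3.4+) requires the whole string to match, so the `^`/`$` anchors become unnecessary, and a match object can be indexed by group name. A sketch, with `split_phone` as a hypothetical helper:

```python
import re

PHONE_RE = re.compile(r"1(?P<area_code>\d{3})(?P<local_number>\d{7})")

def split_phone(number):
    """Return (area_code, local_number), or None if `number` doesn't fit."""
    m = PHONE_RE.fullmatch(number)   # the whole string must match
    if m is None:
        return None
    return m["area_code"], m["local_number"]   # same as m.group("area_code"), ...

print(split_phone("12345678900"))    # ('234', '5678900')
print(split_phone("1234567890"))     # None (one digit short)
```

In Python 3, `\d` also matches non-ASCII digits; use `[0-9]` or the `re.ASCII` flag if only ASCII digits are acceptable.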

-- 
 \   “… one of the main causes of the fall of the Roman Empire was |
  `\that, lacking zero, they had no way to indicate successful |
_o__)  termination of their C programs.” —Robert Firth |
Ben Finney
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-07 Thread Cameron Simpson

On 07Jan2009 19:51, Ken D'Ambrosio  wrote:
| Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
| the re module.  For example, I'd like to do a few things (I'm going to use
| phone numbers, 'cause that's what I'm currently dealing with):
| 12345678900 -- How would I:
| - Get just the area code?
| - Get just the seven-digit number?
| 
| In Perl, I'd do something like
| m/^1(...)(...)/;
| and then I'd have the numbers in $1 and $2, respectively.  But the Python
| stuff simply isn't clicking for me.  If anyone could supply concrete
| examples of how to do the problem, above, that would be terrific.

I presume you're consulting this:
  http://docs.python.org/library/re.html#module-re

Something like this (untested):

  import re
  phone = '12345678900'
  num_re = re.compile('^1(...)(...)')

num_re is now a regular expression object:
  http://docs.python.org/library/re.html#regular-expression-objects
much as you get from a "precompiled" perl regular expression.

  m = num_re.match(phone)

m is now the result of a match against the phone number:
  http://docs.python.org/library/re.html#id1

m.group(0) is what was matched by the whole expression. m.group(1) is perl's
$1, m.group(2) is $2 etc.
For example:

  area_code = m.group(1)

There is also an expand() method that accepts \1, \2 etc in its
template. For direct substitutions (as in perl's s/this/that/) there is
the regular expression object sub() method.
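A short Python 3 sketch of the two methods just mentioned, reusing the same pattern (the output format is only an example):

```python
import re

num_re = re.compile(r"^1(...)(...)")
m = num_re.match("12345678900")

# expand() fills a template with the groups of an existing match
print(m.expand(r"(\1) \2"))                        # (234) 567

# sub() is the counterpart of perl's s/this/that/; unmatched text is kept
print(num_re.sub(r"(\1) \2-", "12345678900"))     # (234) 567-8900
```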

It's a bit more broken out than you normally get in perl, but the pieces
are all there.

Cheers,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

Teamwork is essential. It lets you blame someone else.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expressions...

2009-01-20 Thread Aahz

In article ,
Ken D'Ambrosio  wrote:
>
>Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
>the re module.  

Refer to the following every time you want to use regexes in Python:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions."  Now they have two problems.'
--Jamie Zawinski, comp.emacs.xemacs, 8/1997

Although there are times when regexes are your best option, Python has
many other good options for processing strings, and your code readability
will usually increase if you try one of them first.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote 
programs, then the first woodpecker that came along would destroy civilization.
--
http://mail.python.org/mailman/listinfo/python-list

regular expressions ... slow

2008-11-17 Thread Uwe Schmitt

Hi,

Is anybody aware of this post: http://swtch.com/~rsc/regexp/regexp1.html ?

Are there any plans to speed up Python's regular expression module?
Or is the example in this article too far from reality?

Greetings, Uwe
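The article's example is the pattern a?^n a^n (n optional `a`s followed by n mandatory ones) matched against a string of n `a`s. A backtracking engine such as Python's `re` may try on the order of 2^n ways to split the input before it succeeds. A small Python 3 sketch to observe this; timings depend on the machine, so none are shown, and n is kept small so it finishes quickly:

```python
import re
import time

def pathological(n):
    # n copies of "a?" followed by n copies of "a", as in the article
    pattern = re.compile("a?" * n + "a" * n)
    return pattern.match("a" * n)

for n in (5, 10, 15, 18):
    start = time.perf_counter()
    matched = pathological(n) is not None
    elapsed = time.perf_counter() - start
    print(f"n={n:2d} matched={matched} {elapsed:.4f}s")
```

Expect the time to roughly multiply with each step in n; it is the growth, not the absolute numbers, that the article is about.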
--
http://mail.python.org/mailman/listinfo/python-list

Regular expressions question

2008-10-02 Thread aditya shukla

Hello folks ,

I'm trying to match a pattern in a string; I am new to using re. This is what
is happening:

When i do this

>>> p = re.compile(r'(\[&&NHX:)')
>>> m = p.match("[&&NHX:C=0.195.0]")
>>> print m
<_sre.SRE_Match object at 0x013FE1E0>
 --- thus i am able to find the match
but when i use the string

>>> m = p.match("-bin-ulockmgr_server:0.99[&&NHX:")
>>> print m
None
-i am not able to find the match .

Can someone help me here.
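The likely cause: `match()` only tries the pattern at position 0 of the string, while `search()` scans forward for the first position where it matches. A Python 3 sketch using the strings from the question (the grouping parentheses are dropped since nothing is captured):

```python
import re

p = re.compile(r"\[&&NHX:")

print(p.match("[&&NHX:C=0.195.0]") is not None)   # True: pattern starts at index 0

s = "-bin-ulockmgr_server:0.99[&&NHX:"
print(p.match(s))           # None: match() is anchored at the start
m = p.search(s)             # search() looks anywhere in the string
print(m.start())            # 25, the index of "["
```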

Thanks

Aditya
--
http://mail.python.org/mailman/listinfo/python-list

Unicode Regular Expressions

2007-12-23 Thread bryan rasmussen

Hi,

I'm writing a program that requires specifically Unicode regular
expressions http://unicode.org/reports/tr18/ to be loaded in from an
external file and then interpreted against the data.  If I use Python
regular expressions, is there a flag I can set to specify that the
regular expressions that are loaded from the file conform to Unicode
regular expressions? What problems can be expected using Unicode regexes
with Python? Is there a library I should be using?

Cheers,
Bryan Rasmussen
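For what it's worth on current Python 3: `str` patterns are Unicode-aware by default (`\w`, `\d` and `\s` match beyond ASCII, and `re.ASCII` turns that off), but the stdlib `re` module does not implement the UTS #18 syntax; property classes such as `\p{L}` are rejected (an error since Python 3.6). The third-party `regex` package on PyPI supports `\p{...}` classes. A small sketch of the stdlib behaviour:

```python
import re

text = "na\u00efve caf\u00e9 \u03a9mega"   # "naïve café Ωmega"

# Python 3 str patterns: \w includes letters from any script
unicode_words = re.findall(r"\w+", text)
# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(unicode_words)   # ['naïve', 'café', 'Ωmega']
print(ascii_words)     # ['na', 've', 'caf', 'mega']

# UTS #18 property classes such as \p{L} are not stdlib re syntax
try:
    re.compile(r"\p{L}")
    rejected = False
except re.error:
    rejected = True
print(rejected)        # True on Python 3.6+
```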
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions

2007-11-06 Thread Paul Hankin

On Nov 6, 4:49 pm, [EMAIL PROTECTED] wrote:
> hi i am looking for pattern in regular expreesion that replaces
> anything starting with and betweeen http:// until /
> likehttp://www.start.com/startservice/yellow/fdhttp://helo/abcdwill
> be replaced as
> p/startservice/yellow/ fdp/abcd

What have you come up with so far? Have you looked at the 're' module?

--
Paul Hankin

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions

2007-11-06 Thread J. Clifford Dyer

On Tue, Nov 06, 2007 at 08:49:33AM -0800, [EMAIL PROTECTED] wrote regarding 
regular expressions:
> 
> hi i am looking for pattern in regular expreesion that replaces
> anything starting with and betweeen http:// until /
> like http://www.start.com/startservice/yellow/ fdhttp://helo/abcd will
> be replaced as
> p/startservice/yellow/ fdp/abcd
> 

You don't need regular expressions to do that.  Look into the methods that 
strings have.  Look at slicing. Look at len.  Keep your code readable for 
future generations.

Py>>> help(str)
Py>>> dir(str)
Py>>> help(str.startswith)

Cheers,
Cliff
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions

2007-11-06 Thread Paul McGuire

On Nov 6, 11:07 am, "J. Clifford Dyer" <[EMAIL PROTECTED]> wrote:
> On Tue, Nov 06, 2007 at 08:49:33AM -0800, [EMAIL PROTECTED] wrote regarding 
> regular expressions:
>
>
>
> > hi i am looking for pattern in regular expreesion that replaces
> > anything starting with and betweeen http:// until /
> > likehttp://www.start.com/startservice/yellow/fdhttp://helo/abcdwill
> > be replaced as
> > p/startservice/yellow/ fdp/abcd
>
> You don't need regular expressions to do that.  Look into the methods that 
> strings have.  Look at slicing. Look at len.  Keep your code readable for 
> future generations.
>
> Py>>> help(str)
> Py>>> dir(str)
> Py>>> help(str.startswith)
>
> Cheers,
> Cliff

Look again at the sample input.  Some of the OP's replacement targets
are not at the beginning of a word, so str.startswith won't be much
help.

Here are 2 solutions, one using re, one using pyparsing.

-- Paul


instr = """
anything starting with and betweeen "http://" until "/"
like http://www.start.com/startservice/yellow/ fdhttp://helo/abcd
will
be replaced as
"""

REPLACE_STRING = "p"

# an re solution
import re
print re.sub("http://[^/]*", REPLACE_STRING, instr)


# a pyparsing solution - with handling of target strings inside quotes
from pyparsing import SkipTo, replaceWith, quotedString

replPattern = "http://" + SkipTo("/")
replPattern.setParseAction( replaceWith(REPLACE_STRING) )
replPattern.ignore(quotedString)

print replPattern.transformString(instr)


Prints:

anything starting with and betweeen "p/"
like p/startservice/yellow/ fdp/abcd will
be replaced as


anything starting with and betweeen "http://" until "/"
like p/startservice/yellow/ fdp/abcd will
be replaced as

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions

2007-11-07 Thread J. Cliff Dyer

Paul McGuire wrote:
> On Nov 6, 11:07 am, "J. Clifford Dyer" <[EMAIL PROTECTED]> wrote:
>   
>> On Tue, Nov 06, 2007 at 08:49:33AM -0800, [EMAIL PROTECTED] wrote regarding 
>> regular expressions:
>>
>>
>>
>> 
>>> hi i am looking for pattern in regular expreesion that replaces
>>> anything starting with and betweeen http:// until /
>>> likehttp://www.start.com/startservice/yellow/fdhttp://helo/abcdwill
>>> be replaced as
>>> p/startservice/yellow/ fdp/abcd
>>>   
>> You don't need regular expressions to do that.  Look into the methods that 
>> strings have.  Look at slicing. Look at len.  Keep your code readable for 
>> future generations.
>>
>> Py>>> help(str)
>> Py>>> dir(str)
>> Py>>> help(str.startswith)
>>
>> Cheers,
>> Cliff
>> 
>
> Look again at the sample input.  Some of the OP's replacement targets
> are not at the beginning of a word, so str.startswith wont be much
> help.
>
> Here are 2 solutions, one using re, one using pyparsing.
>
> -- Paul
>
>
> instr = """
> anything starting with and betweeen "http://" until "/"
> like http://www.start.com/startservice/yellow/ fdhttp://helo/abcd
> will
> be replaced as
> """
>
> REPLACE_STRING = "p"
>
> # an re solution
> import re
> print re.sub("http://[^/]*", REPLACE_STRING, instr)
>
>
> # a pyparsing solution - with handling of target strings inside quotes
> from pyparsing import SkipTo, replaceWith, quotedString
>
> replPattern = "http://" + SkipTo("/")
> replPattern.setParseAction( replaceWith(REPLACE_STRING) )
> replPattern.ignore(quotedString)
>
> print replPattern.transformString(instr)
>
>
> Prints:
>
> anything starting with and betweeen "p/"
> like p/startservice/yellow/ fdp/abcd will
> be replaced as
>
>
> anything starting with and betweeen "http://" until "/"
> like p/startservice/yellow/ fdp/abcd will
> be replaced as
>
>   
Interesting.  In my email clients, they do show up at the beginning of
words (thunderbird and mutt), but in your reply they aren't.  I wonder
if there's some funky unicode space that your computer isn't
rendering, or something on my end.  There were definitely spaces in
his email as it appears on my computer.

But if there aren't, s.startswith() is clearly not the way to go.


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Peter Otten

Atul. wrote:

> I have been playing around with REs and could not get the following
> code to run.
> 
> import re
> vowel = r'[aeiou]'
> re.findall(vowel, r"vowel")
> 
> anything wrong I have done?

Yes. You didn't paste the traceback into your message.
 
>>> import re
>>> vowel = r'[aeiou]'
>>> re.findall(vowel, r"vowel")
['o', 'e']

It works as expected here.

Peter
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Atul.


> Yes. You didn't paste the traceback into your message.
>
> >>> import re
> >>> vowel = r'[aeiou]'
> >>> re.findall(vowel, r"vowel")
>
> ['o', 'e']
>
> It works as expected here.
>
> Peter

When I key this input in IDLE it works, but when I try to run the
module it won't work.
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Alexei Zankevich

Use the print statement:

import re
vowel = r'[aeiou]'
print re.findall(vowel, r"vowel")

Alexey

--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Peter Otten

Atul. wrote:

> 
>> Yes. You didn't paste the traceback into your message.
>>
>> >>> import re
>> >>> vowel = r'[aeiou]'
>> >>> re.findall(vowel, r"vowel")
>>
>> ['o', 'e']
>>
>> It works as expected here.
>>
>> Peter
> 
> When I key this input in IDLE it works but when I try to run the
> module it wont work.

What's the name of your script? What happens when you run it? Does it print
a traceback? If so, what does it say? Please cut and paste, don't
paraphrase.

Peter
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Atul.

On Aug 8, 4:22 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
> Atul. wrote:
>
> >> Yes. You didn't paste the traceback into your message.
>
> >> >>> import re
> >> >>> vowel = r'[aeiou]'
> >> >>> re.findall(vowel, r"vowel")
>
> >> ['o', 'e']
>
> >> It works as expected here.
>
> >> Peter
>
> > When I key this input in IDLE it works but when I try to run the
> > module it wont work.
>
> What's the name of your script? What happens when you run it? Does it print
> a traceback? If so, what does it say? Please cut and paste, don't
> paraphrase.
>
> Peter

This is what I get when I run it as shown below; it does not print any
output.

[EMAIL PROTECTED]:~/Work/work/programs$ python fourth.py
[EMAIL PROTECTED]:~/Work/work/programs$
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Atul.

On Aug 8, 4:33 pm, "Atul." <[EMAIL PROTECTED]> wrote:
> On Aug 8, 4:22 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
>
>
>
> > Atul. wrote:
>
> > >> Yes. You didn't paste the traceback into your message.
>
> > >> >>> import re
> > >> >>> vowel = r'[aeiou]'
> > >> >>> re.findall(vowel, r"vowel")
>
> > >> ['o', 'e']
>
> > >> It works as expected here.
>
> > >> Peter
>
> > > When I key this input in IDLE it works but when I try to run the
> > > module it wont work.
>
> > What's the name of your script? What happens when you run it? Does it print
> > a traceback? If so, what does it say? Please cut and paste, don't
> > paraphrase.
>
> > Peter
>
> This is something get when I run it like below. it does not print any
> output.
>
> [EMAIL PROTECTED]:~/Work/work/programs$ python fourth.py
> [EMAIL PROTECTED]:~/Work/work/programs$

ok I get it, that's because I don't print it, right? When I print it, I do
see it.
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Diez B. Roggisch


Atul. wrote:
> ok I get it, that's because I don't print it, right? When I print it, I do
> see it.


Yes.

Diez
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Peter Otten

Atul. wrote:

> On Aug 8, 4:33 pm, "Atul." <[EMAIL PROTECTED]> wrote:
>> On Aug 8, 4:22 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> > Atul. wrote:
>>
>> > >> Yes. You didn't paste the traceback into your message.
>>
>> > >> >>> import re
>> > >> >>> vowel = r'[aeiou]'
>> > >> >>> re.findall(vowel, r"vowel")
>>
>> > >> ['o', 'e']
>>
>> > >> It works as expected here.
>>
>> > >> Peter
>>
>> > > When I key this input in IDLE it works but when I try to run the
>> > > module it wont work.
>>
>> > What's the name of your script? What happens when you run it? Does it
>> > print a traceback? If so, what does it say? Please cut and paste, don't
>> > paraphrase.
>>
>> > Peter
>>
>> This is something get when I run it like below. it does not print any
>> output.
>>
>> [EMAIL PROTECTED]:~/Work/work/programs$ python fourth.py
>> [EMAIL PROTECTED]:~/Work/work/programs$
> 
> ok I get it thats coz, I dont print it. right? when I print it does
> she it.

Heureka!
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Atul.

The same file does not work when I use the following:

import re
vowel = r'[u"\u093e"u"\u093f"u"\u0940"u"\u0941"u"\u0942"u"\u0943"u"\u0944"u"\u0945"u"\u0946"u"\u0947"u"\u0948"u"\u0949"u"\u094a"u"\u094b"u"\u094c"]'
print re.findall(vowel, u"\u092f\u093e\u0902\u091a\u094d\u092f\u093e",
re.UNICODE)



[EMAIL PROTECTED]:~/Work/work/programs$ python fourth.py
[]
[EMAIL PROTECTED]:~/Work/work/programs$


is this the way to use Unicode in REs?

Regards,
Atul.
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Peter Otten

Atul. wrote:

> The same file when I use with the following does not work.
> 
> import re
> vowel = r'[u"\u093e"u"\u093f"u"\u0940"u"\u0941"u"\u0942"u"\u0943"u"\u0944"u"\u0945"u"\u0946"u"\u0947"u"\u0948"u"\u0949"u"\u094a"u"\u094b"u"\u094c"]'
> print re.findall(vowel, u"\u092f\u093e\u0902\u091a\u094d\u092f\u093e",
> re.UNICODE)
> 
> 
> 
> [EMAIL PROTECTED]:~/Work/work/programs$ python fourth.py
> []
> [EMAIL PROTECTED]:~/Work/work/programs$
> 
> 
> is this the way to use Unicode in REs?

No, the u"..." prefix is part of the string-literal syntax, not of each character. The regex becomes

# untested
vowel = u'[\u093e\u093f\u0940\u0941\u0942\u0943\u0944\u0945\u0946\u0947\u0948\u0949\u094a\u094b\u094c]'

Peter
--
http://mail.python.org/mailman/listinfo/python-list

Re: regular expressions.

2008-08-08 Thread Hrvoje Niksic

"Atul." <[EMAIL PROTECTED]> writes:

> the following does not work.
>
> import re
> vowel = r'[u"\u093e"u"\u093f"u"\u0940"u"\u0941"u"\u0942"u"\u0943"u"\u0944"u"\u0945"u"\u0946"u"\u0947"u"\u0948"u"\u0949"u"\u094a"u"\u094b"u"\u094c"]'

Unfortunately you cannot embed arbitrary Python string constants
(u"...") in regular expressions.  What does work is something like:

>>> vowel = u'[\u093e\u093f\u0940\u0941\u0942\u0943\u0944\u0945\u0946\u0947\u0948\u0949\u094a\u094b\u094c]'
>>> re.findall(vowel, u"\u092f\u093e\u0902\u091a\u094d\u092f\u093e")
[u'\u093e', u'\u093e']
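For readers on Python 3, a minimal sketch of the same fix, assuming Python 3 `str` (already Unicode, so neither the `u` prefix nor `re.UNICODE` is needed). The constant names are made up for illustration. Because the vowel signs listed above run consecutively from U+093E to U+094C, the class can also be written as a range:

```python
import re

# Devanagari dependent vowel signs U+093E..U+094C, listed one by one.
# These are ordinary (non-raw) literals, so Python turns \uXXXX into
# the actual characters before re ever sees the pattern.
VOWELS_LISTED = ("[\u093e\u093f\u0940\u0941\u0942\u0943\u0944\u0945"
                 "\u0946\u0947\u0948\u0949\u094a\u094b\u094c]")

# The same set written as a range. In a raw str pattern the \uXXXX
# escapes are passed through and decoded by re itself (Python 3.3+).
VOWELS_RANGE = r"[\u093e-\u094c]"

word = "\u092f\u093e\u0902\u091a\u094d\u092f\u093e"

print(ascii(re.findall(VOWELS_LISTED, word)))  # ['\u093e', '\u093e']
print(ascii(re.findall(VOWELS_RANGE, word)))   # ['\u093e', '\u093e']
```

The range form is shorter, but the explicit list makes it easier to drop or add a single sign later.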
--
http://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread MRAB


On 2015-11-03 01:09, Seymore4Head wrote:
> How do I make a regular expression that returns true if the end of the
> line is an asterisk

To match an asterisk: \*

To match the end of a line: $

To match an asterisk at the end of a line: \*$
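A small Python 3 illustration of the pattern above, assuming the test is done with `re.search()` (`re.match()` only tries a match at the start of the string). The `r` prefix passes the backslash to `re` unchanged. One subtlety: `$` also matches just before a final newline, while `\Z` matches only at the very end. The helper name `check` is hypothetical:

```python
import re

ENDS_WITH_STAR = re.compile(r"\*$")          # '$' also matches before a final "\n"
ENDS_WITH_STAR_STRICT = re.compile(r"\*\Z")  # '\Z' matches only at the very end

def check(line):
    """Return (loose, strict) results for one line."""
    return (ENDS_WITH_STAR.search(line) is not None,
            ENDS_WITH_STAR_STRICT.search(line) is not None)

for line in ["foo*", "foo*\n", "foo", "*foo"]:
    print(repr(line), check(line))
# 'foo*' (True, True)
# 'foo*\n' (True, False)
# 'foo' (False, False)
# '*foo' (False, False)
```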

--
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Tim Chase

On 2015-11-02 20:09, Seymore4Head wrote:
> How do I make a regular expression that returns true if the end of
> the line is an asterisk

Why use a regular expression?

  if line[-1] == '*':
    yep(line)
  else:
    nope(line)

-tkc



-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Seymore4Head

On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
 wrote:

>On 2015-11-02 20:09, Seymore4Head wrote:
>> How do I make a regular expression that returns true if the end of
>> the line is an asterisk
>
>Why use a regular expression?
>
>  if line[-1] == '*':
>    yep(line)
>  else:
>    nope(line)
>
>-tkc
>
>
Because that is the part of Python I am trying to learn at the moment.
Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Seymore4Head

On Tue, 3 Nov 2015 01:19:34 +, MRAB 
wrote:

>On 2015-11-03 01:09, Seymore4Head wrote:
>> How do I make a regular expression that returns true if the end of the
>> line is an asterisk
>>
>To match an asterisk: \*
>
>To match the end of a line: $
>
>To match an asterisk at the end of a line: \*$

Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Michael Torrie

On 11/02/2015 07:42 PM, Tim Chase wrote:
> On 2015-11-02 20:09, Seymore4Head wrote:
>> How do I make a regular expression that returns true if the end of
>> the line is an asterisk
> 
> Why use a regular expression?
> 
>   if line[-1] == '*':
>     yep(line)
>   else:
>     nope(line)

Indeed, Jamie Zawinski's quip is often quite appropriate:

Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Joel Goldstick

On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
wrote:

> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
>
> >On 2015-11-02 20:09, Seymore4Head wrote:
> >> How do I make a regular expression that returns true if the end of
> >> the line is an asterisk
> >
> >Why use a regular expression?
> >
> >  if line[-1] == '*':
> >    yep(line)
> >  else:
> >    nope(line)
> >
> >-tkc
> >
> >
> Because that is the part of Python I am trying to learn at the moment.
> Thanks

My completely unsolicited advice is that regular expressions shouldn't be
very high on the list of things to learn.  They are very useful, but also
very tricky and prone to many problems, and many tasks can and should be
solved with much simpler methods.  If you really want to learn regular
expressions, that's great, but the problem you posed is not one for which
they are the best solution.  Remember, simple is better than complex.

-- 
Joel Goldstick
http://joelgoldstick.com/stats/birthdays
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread rurpy--- via Python-list

On 11/02/2015 08:51 PM, Michael Torrie wrote:
>[...]
> Indeed, Jamie Zawinski's quip is often quite appropriate:
> 
> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.

Or its sometimes-heard paraphrase:
  Some people, when confronted with a problem, think "I know, I'll use
  Python." Now they have two problems.
The point being it's a cute and memorable aphorism but not very meaningful
because it can be applied to anything one wishes to denigrate.

Of course there are people who misuse regexes. But I am quite sure,
especially in the Python community, there are just as many who fail to
use them when they are appropriate which is just as bad.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread rurpy--- via Python-list

On Monday, November 2, 2015 at 8:58:45 PM UTC-7, Joel Goldstick wrote:
> On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
> wrote:
> 
> > On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
> >  wrote:
> >
> > >On 2015-11-02 20:09, Seymore4Head wrote:
> > >> How do I make a regular expression that returns true if the end of
> > >> the line is an asterisk
> > >
> > >Why use a regular expression?
> > >
> > >  if line[-1] == '*':
> > >    yep(line)
> > >  else:
> > >    nope(line)
> > >
> > >-tkc
> > >
> > >
> > Because that is the part of Python I am trying to learn at the moment.
> > Thanks
> 
> My completely unsolicited advice is that regular expressions shouldn't be
> very high on the list of things to learn.  They are very useful, but also
> very tricky and prone to many problems, and many tasks can and should be
> solved with much simpler methods.  If you really want to learn regular
> expressions, that's great, but the problem you posed is not one for which
> they are the best solution.  Remember, simple is better than complex.

Regular expressions should be learned by every programmer or by anyone
who wants to use computers as a tool.  They are a fundamental part of
computer science and are used in all sorts of matching and searching 
from compilers down to your work-a-day text editor.

Not knowing how to use them is like an auto mechanic not knowing how to 
use a socket wrench.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Michael Torrie

On 11/02/2015 09:23 PM, rurpy--- via Python-list wrote:
> On 11/02/2015 08:51 PM, Michael Torrie wrote:
>> [...]
>> Indeed, Jamie Zawinski's quip is often quite appropriate:
>>
>> Some people, when confronted with a problem, think "I know, I'll use
>> regular expressions." Now they have two problems.
> 
> Or its sometimes-heard paraphrase:
>   Some people, when confronted with a problem, think "I know, I'll use
>   Python." Now they have two problems.
> The point being it's a cute and memorable aphorism but not very meaningful
> because it can be applied to anything one wishes to denigrate.
> 
> Of course there are people who misuse regexes. But I am quite sure,
> especially in the Python community, there are just as many who fail to
> use them when they are appropriate which is just as bad.

Judging by a few posts on the list lately, I'd say it is highly relevant
to Python itself.  Too many people have only a vague notion of a problem
they'd like to solve and although they don't really understand the
problem, they've heard Python is a good language to learn, so they ask
how they can solve that problem with Python.

Now, this certainly can work for a person who's already experienced in
several languages and who already understands the problem.  For others,
it's very much now two intractable problems.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Michael Torrie

On 11/02/2015 09:23 PM, rurpy--- via Python-list wrote:
>> My completely unsolicited advice is that regular expressions shouldn't be
>> very high on the list of things to learn.  They are very useful, but also
>> very tricky and prone to many problems, and many tasks can and should be
>> solved with much simpler methods.  If you really want to learn regular
>> expressions, that's great, but the problem you posed is not one for which
>> they are the best solution.  Remember, simple is better than complex.
> 
> Regular expressions should be learned by every programmer or by anyone
> who wants to use computers as a tool.  They are a fundamental part of
> computer science and are used in all sorts of matching and searching 
> from compilers down to your work-a-day text editor.
> 
> Not knowing how to use them is like an auto mechanic not knowing how to 
> use a socket wrench.

Not quite.  Core language concepts like ifs, loops, functions,
variables, slicing, etc are the socket wrenches of the programmer's
toolbox.  Regexes are like an electric impact socket wrench.  You can do
the same work without it, but in many cases it's slower. But you have to
learn the other hand tools first in order to really use the electric
driver properly (understanding torques, direction of threads, etc), lest
you wonder why you're breaking off so many bolts with the torque of the
impact drive.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-02 Thread Steven D'Aprano

On Tue, 3 Nov 2015 03:23 pm, ru...@yahoo.com wrote:

> Regular expressions should be learned by every programmer or by anyone
> who wants to use computers as a tool.  They are a fundamental part of
> computer science and are used in all sorts of matching and searching
> from compilers down to your work-a-day text editor.

You are absolutely right.

If only regular expressions weren't such an overly-terse, cryptic
mini-language, with all but no debugging capabilities, they would be great.

If only there wasn't an extensive culture of regular expression abuse within
programming communities, they would be fine.

All technologies are open to abuse. But we don't say:

  Some people, when confronted with a problem, think "I know, I'll use
  arithmetic." Now they have two problems.

because abuse of arithmetic is rare. It's hard to misuse it, and while
arithmetic can be complicated, it's rare for programmers to abuse it. But
the same cannot be said for regexes -- they are regularly misused, abused,
and downright hard to use right even when you have a good reason for using
them:

http://www.thedailywtf.com/articles/Irregular_Expression

http://blog.codinghorror.com/regex-use-vs-regex-abuse/

http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

If there is one person who has done more to create a regex culture, it is
Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
overused and their syntax is harmful, and he has recreated them for Perl 6:

http://www.perl.com/pub/2002/06/04/apo5.html

Oh, and the icing on the cake, regexes can be a security vulnerability too:

https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Nick Sarbicki

On Tue, Nov 3, 2015 at 7:15 AM, Steven D'Aprano  wrote:

> On Tue, 3 Nov 2015 03:23 pm, ru...@yahoo.com wrote:
>
> > Regular expressions should be learned by every programmer or by anyone
> > who wants to use computers as a tool.  They are a fundamental part of
> > computer science and are used in all sorts of matching and searching
> > from compilers down to your work-a-day text editor.
>
> You are absolutely right.
>
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
>
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
>
> All technologies are open to abuse. But we don't say:
>
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
>
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and downright hard to use right even when you have a good reason for using
> them:
>
> http://www.thedailywtf.com/articles/Irregular_Expression
>
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
>
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html
>
> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
>
> http://www.perl.com/pub/2002/06/04/apo5.html
>
> Oh, and the icing on the cake, regexes can be a security vulnerability too:
>
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
>
> --
> Steven


+1

I agree that regex is an entirely necessary part of a programmer's toolkit,
but dear god some people need to be taught restraint. The majority of
people I talk to about regex have no idea when and where it shouldn't be
used.

As an example part of my job is bringing our legacy Python code into the
modern day, and one of the largest roadblocks is the amount of regex used.

Some is necessary.

Some can be replaced by an `if word in str` or something similarly basic.

Some spans hundreds of lines and causes acute alopecia.

Just yesterday I found a colleague trying to parse HTML with regex.

So yes, teach regex, but teach it after the basics, and please emphasise
when it is appropriate to use it.

Yes I am bitter.

- Nick.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Peter Otten

Michael Torrie wrote:

> On 11/02/2015 07:42 PM, Tim Chase wrote:
>> On 2015-11-02 20:09, Seymore4Head wrote:
>>> How do I make a regular expression that returns true if the end of
>>> the line is an asterisk
>> 
>> Why use a regular expression?
>> 
>>   if line[-1] == '*':
>>     yep(line)
>>   else:
>>     nope(line)
> 
> Indeed, Jamie Zawinski's quip is often quite appropriate:
> 
> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.

Incidentally the code example has two "problems", too.

- What about the empty string?
- What about lines with a trailing "\n", i.e. as they are usually delivered
  when iterating over a file?

Below is a comparison of some of your options. The "one obvious way" 
line.rstrip("\n").endswith("*") is not included ;)

$ cat starry_table.py 
import re


def show_table(data, header):
    rows = [header]
    rows.extend([str(c) for c in row] for row in data)
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    template = "  ".join("{:%d}" % w for w in widths)
    for row in rows:
        print(template.format(*row))


def compare(sample_lines):
    for line in sample_lines:
        got_re = bool(re.compile(r"\*$").search(line))
        got_re_M = bool(re.compile(r"\*$", re.M).search(line))
        got_endswith = line.endswith("*")
        got_endswith2 = line.endswith(("*", "*\n"))
        got_substring = line[-1:] == "*"
        try:
            got_char = line[-1] == "*"
        except IndexError:
            got_char = "#exception"
        results = (
            got_re, got_re_M,
            got_endswith, got_endswith2,
            got_substring, got_char)
        # Mark the row with "X" when the approaches disagree.
        yield (
            ["", "X"][len(set(results)) > 1],
            repr(line)) + results


SAMPLE = ["", "\n", "foo\n", "*\n", "*", "foo*", "foo*\n", "foo*\nbar"]
HEADER = [
    "", "line", "regex", "re.M",
    "endswith", 'endswith(("*", "*\\n"))',
    "substring", "char"]

if __name__ == "__main__":
    show_table(compare(SAMPLE), HEADER)


$ python3 starry_table.py 
   line         regex  re.M   endswith  endswith(("*", "*\n"))  substring  char
X  ''           False  False  False     False                   False      #exception
   '\n'         False  False  False     False                   False      False
   'foo\n'      False  False  False     False                   False      False
X  '*\n'        True   True   False     True                    False      False
   '*'          True   True   True      True                    True       True
   'foo*'       True   True   True      True                    True       True
X  'foo*\n'     True   True   False     True                    False      False
X  'foo*\nbar'  False  True   False     False                   False      False
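For comparison, a sketch of the "one obvious way" mentioned above, `line.rstrip("\n").endswith("*")`, next to a regex meant to behave the same way, `\*\n?\Z` (an asterisk, an optional newline, then the absolute end). Both encode an assumption about what "ends with an asterisk" should mean for newline-terminated lines; the helper names are hypothetical and the samples are the `SAMPLE` list from the script:

```python
import re

SAMPLE = ["", "\n", "foo\n", "*\n", "*", "foo*", "foo*\n", "foo*\nbar"]

STAR_AT_END = re.compile(r"\*\n?\Z")

def by_method(line):
    # Strip any trailing newlines, then ask the string directly.
    return line.rstrip("\n").endswith("*")

def by_regex(line):
    # '*' followed by at most one newline at the very end of the string.
    return STAR_AT_END.search(line) is not None

for line in SAMPLE:
    print(repr(line), by_method(line), by_regex(line))
```

The two agree on every sample; they would only disagree on a string with several trailing newlines, such as "foo*\n\n", which `rstrip()` accepts and the regex rejects.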


-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Tim Chase

On 2015-11-03 10:25, Peter Otten wrote:
> >>> How do I make a regular expression that returns true if the end
> >>> of the line is an asterisk
> >> 
> >> Why use a regular expression?
> >> 
> >>   if line[-1] == '*':
> >>     yep(line)
> >>   else:
> >>     nope(line)
> 
> Incidentally the code example has two "problems", too.
> 
> - What about the empty string?

Good catch: .endswith() works better.

> - What about lines with a trailing "\n", i.e. as they are usually
> delivered when iterating over a file?

Then your string *doesn't* end with a "*", but rather with a
newline. ;-)

Though according to the OP's specs, the following function would work
too:

  def ends_in_asterisk(s):
    return True

It *does* return True if the line ends in an asterisk (no requirement
to make the function return False under any other conditions).

-tkc

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Denis McMahon

On Mon, 02 Nov 2015 22:17:49 -0500, Seymore4Head wrote:

> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
> 
>>On 2015-11-02 20:09, Seymore4Head wrote:

>>> How do I make a regular expression that returns true if the end of the
>>> line is an asterisk

>>Why use a regular expression?

> Because that is the part of Python I am trying to learn at the moment.

The most important thing to learn about regular expressions is when to 
use them and when not to use them.

Returning true if the last character in a string is an asterisk is almost 
certainly a brilliant example of when not to use a regular expression. 
Here are some timings I tested:

#!/usr/bin/python

import re

import timeit

patt = re.compile(r"\*$")

# Note: re.match() and patt.match() only try to match at the start of the
# string, so r"\*$" matches neither "test 1" nor "test *" in the loops
# below; re.search() would be needed for a genuine "true" case.

start_time = timeit.default_timer()
for i in range(100):
    x = re.match(r"\*$", "test 1")
elapsed = timeit.default_timer() - start_time
print "re, false", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = re.match(r"\*$", "test *")
elapsed = timeit.default_timer() - start_time
print "re, true", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = patt.match("test 1")
elapsed = timeit.default_timer() - start_time
print "compiled re, false", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = patt.match("test *")
elapsed = timeit.default_timer() - start_time
print "compiled re, true", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = "test 1"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print "char compare, false", elapsed

start_time = timeit.default_timer()
for i in range(100):
    x = "test *"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print "char compare, true", elapsed

RESULTS:

re, false 2.4701731205
re, true 2.42048001289
compiled re, false 0.875837087631
compiled re, true 0.876382112503
char compare, false 0.26283121109
char compare, true 0.263465881348

The compiled re is about 3 times as fast as the uncompiled re. The 
character comparison is about 3 times as fast as the compiled re.
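A sketch of the same kind of measurement in Python 3 using `timeit.timeit`, which runs the loop and reads the timer itself. It differs from the script above in a few assumed ways: it uses `re.search()`, so the "true" input really matches (`re.match()` only tries position 0), it adds `str.endswith()`, and it wraps each check in a lambda, whose call overhead is included in every figure and makes the ratios look smaller. Timings vary by machine and Python version, so none are shown:

```python
import re
import timeit

PATT = re.compile(r"\*$")
N = 100_000  # iterations per measurement; raise it for steadier numbers

candidates = {
    "re.search":       lambda s: re.search(r"\*$", s) is not None,
    "compiled search": lambda s: PATT.search(s) is not None,
    "char compare":    lambda s: s[-1:] == "*",
    "endswith":        lambda s: s.endswith("*"),
}

def run(number=N):
    """Print the seconds each check takes on a false and a true input."""
    for text in ("test 1", "test *"):
        for name, check in candidates.items():
            secs = timeit.timeit(lambda: check(text), number=number)
            print(f"{name:16} {text!r:9} {check(text)!s:6} {secs:.3f}s")

if __name__ == "__main__":
    run()
```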

-- 
Denis McMahon, denismfmcma...@gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Peter Otten

Tim Chase wrote:

> On 2015-11-03 10:25, Peter Otten wrote:
>> >>> How do I make a regular expression that returns true if the end
>> >>> of the line is an asterisk
>> >> 
>> >> Why use a regular expression?
>> >> 
>> >>   if line[-1] == '*':
>> >>     yep(line)
>> >>   else:
>> >>     nope(line)
>> 
>> Incidentally the code example has two "problems", too.
>> 
>> - What about the empty string?
> 
> Good catch: .endswith() works better.
> 
>> - What about lines with a trailing "\n", i.e. as they are usually
>> delivered when iterating over a file?
> 
> Then your string *doesn't* end with a "*", but rather with a
> newline. ;-)
> 
> Though according to the OP's specs, the following function would work
> too:
> 
>   def ends_in_asterisk(s):
>     return True
> 
> It *does* return True if the line ends in an asterisk (no requirement
> to make the function return False under any other conditions).

If a "line" is defined as a string that ends with a newline

def ends_in_asterisk(line):
    return False

would also satisfy the requirement. Lies, damned lies, and specs ;)

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Tim Chase

On 2015-11-02 22:17, Seymore4Head wrote:
> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
> 
> >On 2015-11-02 20:09, Seymore4Head wrote:
> >> How do I make a regular expression that returns true if the end
> >> of the line is an asterisk
> >
> >Why use a regular expression?
> >
> Because that is the part of Python I am trying to learn at the
> moment. Thanks

Ah, well, that's an entirely different problem space, so then you
would want to use MRAB's answer:

  r = re.compile(r"\*$")

-tkc



-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Grant Edwards

On 2015-11-03, Tim Chase  wrote:
> On 2015-11-02 20:09, Seymore4Head wrote:
>> How do I make a regular expression that returns true if the end of
>> the line is an asterisk
>
> Why use a regular expression?
>
>   if line[-1] == '*':

Why use a negative index and then a compare?

if line.endswith('*'):

If you want to know if a string ends with something, just ask it!

;)
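A quick sketch of the difference being pointed at here, with three hypothetical helper names: indexing with `[-1]` raises `IndexError` on an empty string, while a slice or `str.endswith()` simply returns `False`:

```python
def ends_with_star_index(line):
    return line[-1] == "*"      # raises IndexError when line == ""

def ends_with_star_slice(line):
    return line[-1:] == "*"     # ""[-1:] is "", so no error

def ends_with_star_method(line):
    return line.endswith("*")   # just ask the string

for line in ["foo*", "foo", ""]:
    try:
        by_index = ends_with_star_index(line)
    except IndexError:
        by_index = "IndexError"
    print(repr(line), by_index,
          ends_with_star_slice(line), ends_with_star_method(line))
```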

-- 
Grant Edwards               grant.b.edwards        Yow! RELATIVES!!
                                  at
                              gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Jussi Piitulainen

Peter Otten writes:

> If a "line" is defined as a string that ends with a newline
>
> def ends_in_asterisk(line):
>     return False
>
> would also satisfy the requirement. Lies, damned lies, and specs ;)

Even if a "line" is defined as a string that comes from reading
something like a file with default options, a line may end in
an asterisk.

>>> from io import StringIO
>>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
[False, False, True]
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Joel Goldstick

On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
wrote:

> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>  wrote:
>
> >On 2015-11-02 20:09, Seymore4Head wrote:
> >> How do I make a regular expression that returns true if the end of
> >> the line is an asterisk
> >
> >Why use a regular expression?
> >
> >  if line[-1] == '*':
> >    yep(line)
> >  else:
> >    nope(line)
> >
> >-tkc
> >
> >
> Because that is the part of Python I am trying to learn at the moment.
>

Are we to infer that you already knew about  if line[-1] == '*': ...  but
just wanted to learn how to do the same thing with a regex? Or that you
heard about regexes and thought they would be the way to solve your puzzle?

> Thanks

-- 
Joel Goldstick
http://joelgoldstick.com/stats/birthdays
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Seymore4Head

On Tue, 3 Nov 2015 10:34:12 -0500, Joel Goldstick
 wrote:

>On Mon, Nov 2, 2015 at 10:17 PM, Seymore4Head 
>wrote:
>
>> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
>>  wrote:
>>
>> >On 2015-11-02 20:09, Seymore4Head wrote:
>> >> How do I make a regular expression that returns true if the end of
>> >> the line is an asterisk
>> >
>> >Why use a regular expression?
>> >
>> >  if line[-1] == '*':
>> >    yep(line)
>> >  else:
>> >    nope(line)
>> >
>> >-tkc
>> >
>> >
>> Because that is the part of Python I am trying to learn at the moment.
>>
>
>Are we to infer that you already knew about  if line[-1] == '*': ...  but
>just wanted to learn how to do the same thing with a regex? Or that you
>heard about regexes and thought they would be the way to solve your puzzle?
>
>> Thanks
Yes, I knew that -1 indexes the last character.  It is not a question
of trying to accomplish anything.  I was just practicing with regex
and wasn't sure how to express a * since it was one of the
instructions.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Chris Angelico

On Wed, Nov 4, 2015 at 3:10 AM, Seymore4Head
 wrote:
> Yes, I knew that -1 indexes the last character.  It is not a question
> of trying to accomplish anything.  I was just practicing with regex
> and wasn't sure how to express a * since it was one of the
> instructions.

In that case, it's nothing to do with ending a string. What you really
want to know is: How do you match a '*' using a regular expression?
Which is what MRAB answered, courtesy of a working crystal ball: You
use '\*'. Everything about the end of the string is irrelevant. (So,
too, are all the comments about using [-1] or string methods. But we
weren't to know that.)
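A short sketch of matching a literal '*' in Python 3: unescaped, `*` is a quantifier with nothing to repeat, so compiling it fails; escaped by hand, via `re.escape()`, or placed inside a character class, it matches the asterisk itself:

```python
import re

# An unescaped '*' is a quantifier with nothing to repeat.
try:
    re.compile("*")
except re.error as exc:
    print("re.error:", exc)

# Escaping makes it a literal asterisk, by hand or with re.escape().
print(re.escape("*"))                  # \*
print(re.findall(r"\*", "a*b**c"))     # ['*', '*', '*']

# Inside a character class '*' has no special meaning.
print(re.findall(r"[*]", "a*b**c"))    # ['*', '*', '*']
```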

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread Robin Koch


On 03.11.2015 at 05:23, ru...@yahoo.com wrote:
> Of course there are people who misuse regexes.

/^1?$|^(11+?)\1+$/

There are? 0:-)
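For anyone who hasn't met it: the pattern above is a well-known trick that, applied to a run of n "1" characters (n written in unary), matches exactly when n is 0, 1 or composite. `(11+?)` captures a candidate divisor of at least 2, and `\1+` requires the rest of the string to be whole repeats of it. A sketch in Python, for curiosity rather than real use, since it backtracks heavily as n grows; `is_prime` is a hypothetical name:

```python
import re

NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n):
    """Toy primality test: regex on the unary string '1' * n."""
    return NOT_PRIME.match("1" * n) is None

print([n for n in range(30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```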

--
Robin Koch
--
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread rurpy--- via Python-list

On 11/03/2015 12:15 AM, Steven D'Aprano wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy wrote:
> 
>> Regular expressions should be learned by every programmer or by anyone
>> who wants to use computers as a tool.  They are a fundamental part of
>> computer science and are used in all sorts of matching and searching
>> from compilers down to your work-a-day text editor.
> 
> You are absolutely right.
> 
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
> 
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
> 
> All technologies are open to abuse. But we don't say:
> 
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
> 
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and downright hard to use right even when you have a good reason for using
> them:
> 
> http://www.thedailywtf.com/articles/Irregular_Expression
> 
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
> 
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

Thanks for pointing out three cases of misuse of regexes out of the
approximately 37500 [*] uses of regexes in the wild. I hope you're
not dumb enough to think that constitutes significant evidence.

Even worse, of the three only one was a real example. One of the others
was machine-generated code, the other was a "look what you can do with
regexes" example, not serious code.

Here is an example of "abusing" python

  https://benkurtovic.com/2014/06/01/obfuscating-hello-world.html

I wouldn't use this as evidence that Python is to be avoided.

> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
> 
> http://www.perl.com/pub/2002/06/04/apo5.html

You really should have read beyond the first paragraph. He proposes
fixing regexes by adding even more special character combinations and
making regexes even *more* powerful. (He turned them into full-blown
parsers.)

Nowhere does he advocate not using, or avoiding if possible, regexes
as is the mantra on this list.

Here is Larry's "recreation" that you are touting:

  http://design.perl6.org/S05.html

Please explain to us how you think this "fix" addresses the complaints
you and other Python anti-regexers have about regexes.

I hope you also noted Larry's tongue-in-cheek writing style. Right after
pointing out that some claim Perl is hard to read due largely to regex
syntax, he writes:

  "Funny that other languages have been borrowing Perl's regular
  expressions as fast as they can..."

So I don't think you can claim Larry Wall as a supporter of this list's
anti-regex attitude beyond some superficial verbiage taken out of context.

> Oh, and the icing on the cake, regexes can be a security vulnerability too:
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

And here is a list of CVEs involving Python. There are (at time of
writing) 190 of them.

  http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python

So if a security vulnerability is reason not to use regexes, we should
all be *running* from Python. I'm sure you'll point out that most have
been fixed.

But you failed to point out that same is true of regex engines. From
your source:

  "Notice, that not all algorithms are naïve, and actually Regex
  algorithms can be written in an efficient way."

And in fact, again, had you looked beyond a headline that suited your
purpose, you could have tried the "Evil Regexes" noted in that source
and discovered none of them are a DoS in Python.

Even were that not true, normal practice applies: if the input is
untrusted then sanitize it, or mitigate the threat by imposing a timeout,
etc. Not exactly a problem or solution unique to regexes. And common
sense should tell you that since there are a lot of "try a regex" web
sites, this is not a problem without a solution.

And *certainly* not a reason not to use them in the *far* more common
case when they *are* trusted because you are in control of them.

Finally, preemptively, I'll repeat that I acknowledge regexes are not
the optimum solution in every case where they could be used. But they
are very useful once one passes the border of the trivial, and they are
nowhere near as bad as routinely portrayed here.

[*] Yes, I made that number up.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Regular expressions

2015-11-03 Thread rurpy--- via Python-list

On Monday, November 2, 2015 at 9:38:24 PM UTC-7, Michael Torrie wrote:
> On 11/02/2015 09:23 PM, rurpy--- via Python-list wrote:
> >> My completely unsolicited advice is that regular expressions shouldn't be
> >> very high on the list of things to learn.  They are very useful, but also
> >> very tricky and prone to many problems, and many tasks can and should be
> >> solved with much simpler methods.  If you really want to learn regular
> >> expressions, that's great, but the problem you posed is not one for which
> >> they are the best solution.  Remember, simple is better than complex.
> > 
> > Regular expressions should be learned by every programmer or by anyone
> > who wants to use computers as a tool.  They are a fundamental part of
> > computer science and are used in all sorts of matching and searching 
> > from compilers down to your work-a-day text editor.
> > 
> > Not knowing how to use them is like an auto mechanic not knowing how to 
> > use a socket wrench.
> 
> Not quite.  Core language concepts like ifs, loops, functions,
> variables, slicing, etc are the socket wrenches of the programmer's
> toolbox.  Regexes are like an electric impact socket wrench.  You can do
> the same work without it, but in many cases it's slower. But you have to
> learn the other hand tools first in order to really use the electric
> driver properly (understanding torques, direction of threads, etc), lest
> you wonder why you're breaking off so many bolts with the torque of the
> impact drive.

I consider regexes more fundamental.  One need not even be a programmer
to use them: consider grep, sed, a zillion editors, database query 
languages, etc.

When there is a mini-language explicitly developed for describing
string patterns, why, except in very simple cases, would one not
take advantage of it?  Beyond trivial operations a regex, although
terse (perhaps overly so), is still likely to be more understandable
and more maintainable than a bunch of ad-hoc code.  And the relative ease
of expressing complex patterns means one is more likely to create
more specific patterns, resulting in detecting unexpected input 
earlier than with ad-hoc code. 
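
A small illustration of the "more specific patterns" point (my own example, not from the thread): an ad-hoc date parser built on `str.split` and `int()` quietly accepts input such as '2015-+1-03' or '2015- 11-03', because `int()` tolerates a sign and surrounding whitespace, while an anchored regex rejects it. The YYYY-MM-DD format and both function names are assumptions for the sketch, and neither version checks month or day ranges.

```python
import re

# Strict pattern: exactly four ASCII digits, '-', two digits, '-', two digits.
# re.ASCII keeps \d from also matching non-ASCII Unicode digits.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)

def parse_date_regex(s):
    """Return (year, month, day) if s is exactly YYYY-MM-DD, else None."""
    m = DATE_RE.fullmatch(s)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

def parse_date_adhoc(s):
    """Ad-hoc version: split on '-' and convert each piece with int()."""
    parts = s.split("-")
    if len(parts) != 3:
        return None
    try:
        return tuple(int(p) for p in parts)
    except ValueError:
        return None

for s in ["2015-11-03", "2015-+1-03", "2015- 11-03"]:
    print(repr(s), parse_date_regex(s), parse_date_adhoc(s))
```

Both versions agree on the well-formed date; on the other two inputs the regex version returns None while the ad-hoc one returns (2015, 1, 3) and (2015, 11, 3), so the bad input slips through until something downstream trips over it.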

Re: Regular expressions

2015-11-03 Thread Michael Torrie

On 11/03/2015 05:33 PM, rurpy--- via Python-list wrote:
> I consider regexes more fundamental.  One need not even be a programmer
> to use them: consider grep, sed, a zillion editors, database query 
> languages, etc.

Grep can use regular expressions (and I do so with it regularly), but
its default mode is certainly not regular expressions, and it is still
very powerful.  I've never used regular expressions in a database query
language; until this moment I didn't know any of them supported such things in
their queries.  Good to know.  How you would index for regular
expressions in queries I don't know.
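
For what it's worth, SQLite is one database where this is visible from Python: "X REGEXP Y" is syntax for a call to a user function regexp(Y, X), and the core library defines no such function by default, so the application has to register one. A minimal sketch with the standard `sqlite3` module follows; the `files` table and its rows are made up. On the indexing question, as far as I know SQLite cannot use an ordinary index for a REGEXP condition and simply calls the function for each row it scans.

```python
import re
import sqlite3

def regexp(pattern, value):
    """User function behind SQLite's REGEXP operator.

    SQLite turns "X REGEXP Y" into regexp(Y, X), so the pattern
    arrives first and the column value second.
    """
    if value is None:            # NULL column values never match
        return False
    return re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany("INSERT INTO files (name) VALUES (?)",
                 [("main.c",), ("util.h",), ("README",)])

# Names ending in .c or .h; \Z anchors at the very end of the string.
rows = conn.execute(
    "SELECT name FROM files WHERE name REGEXP ? ORDER BY name",
    (r"\.[ch]\Z",),
).fetchall()
print(rows)   # [('main.c',), ('util.h',)]
conn.close()
```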

> When there is a mini-language explicitly developed for describing
> string patterns, why, except in very simple cases, would one not
> take advantage of it?  

Mainly because the programming language itself often can do it just as
cleanly and just as fast (slicing, string methods, etc.).  I certainly
programmed for many years without needing regular expressions in my
small projects.  In fact, REs are a bit of a pain to use in, say, C or
C++, requiring a library.  With Python they are much more readily
accessible so I use them much more.
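
As an illustration of the "string methods are often enough" point, here is one small task done both ways: splitting a single-line "key = value" setting. The split_setting_* names and the input format are hypothetical; both versions split on the first '=' and strip surrounding whitespace, and they agree on the sample lines at the end.

```python
import re

def split_setting_str(line):
    """Split 'key = value' on the first '=' using only str methods."""
    key, sep, value = line.partition("=")
    if not sep:                  # no '=' at all
        return None
    return key.strip(), value.strip()

# Same rule as a regex: lazy groups let the surrounding \s* absorb whitespace,
# and [^=] keeps the key from running past the first '='.
SETTING_RE = re.compile(r"\s*([^=]*?)\s*=\s*(.*?)\s*")

def split_setting_re(line):
    """Regex version of split_setting_str."""
    m = SETTING_RE.fullmatch(line)
    return m.groups() if m else None

for line in ["  timeout = 30  ", "a=b=c", "no equals here"]:
    print(repr(line), split_setting_str(line), split_setting_re(line))
```

The results are the same, but the partition version reads as a plain description of the rule, while the regex needs a comment to explain why its groups are lazy.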

But honestly, it wasn't until I learned about finite state automata in
college that I really grasped what regular expressions were and how to
use them.
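
To make the finite-automaton connection concrete: the language "binary strings with an even number of 1s" can be written as the regex 0*(10*10*)* or recognized by a two-state automaton. The sketch below codes the automaton by hand and checks that it agrees with `re.fullmatch` on every binary string up to length 8. Note that CPython's `re` is a backtracking matcher rather than an automaton, and features such as backreferences go beyond regular languages, so the equivalence here is about this pattern, not about the engine.

```python
import itertools
import re

# Binary strings containing an even number of 1s.
EVEN_ONES_RE = re.compile(r"0*(10*10*)*")

def even_ones_dfa(s):
    """Two-state automaton: reading '1' toggles the state, '0' keeps it."""
    state = "even"                       # start state, also the accepting state
    for ch in s:
        if ch == "1":
            state = "odd" if state == "even" else "even"
        elif ch != "0":
            return False                 # symbol outside the alphabet {0, 1}
    return state == "even"

# Exhaustively compare the automaton with the regex on all strings up to length 8.
for n in range(9):
    for bits in itertools.product("01", repeat=n):
        s = "".join(bits)
        assert even_ones_dfa(s) == (EVEN_ONES_RE.fullmatch(s) is not None), s
print("automaton and regex agree on all binary strings up to length 8")
```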

> Beyond trivial operations a regex, although
> terse (perhaps overly so), is still likely to be more understandable
> and more maintainable than a bunch of ad-hoc code.  And the relative ease
> of expressing complex patterns means one is more likely to create
> more specific patterns, resulting in detecting unexpected input 
> earlier than with ad-hoc code. 

Maybe, maybe not.  Using Python's string methods is probably clearer
when such methods are sufficient.
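
One concrete case (my own, not from the thread) where a string method says exactly what it means and the regex carries a surprise: in Python, $ matches at the end of the string and also just before a trailing newline, so a pattern anchored with $ accepts a line that still has its newline, while `str.endswith` does not. \Z is the anchor for the absolute end of the string. The sample line is made up.

```python
import re

line = "test_case.c\n"   # e.g. a line read from a file, newline still attached

# In Python, "$" matches at the end of the string or just before a final newline.
print(bool(re.search(r"\.c$", line)))     # True
# \Z matches only at the absolute end of the string.
print(bool(re.search(r"\.c\Z", line)))    # False
# str.endswith compares the literal suffix, newline included.
print(line.endswith(".c"))                # False
# Stating the intent directly: drop the newline, then check the suffix.
print(line.rstrip("\n").endswith(".c"))   # True
```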
