subject:"Python regex"

Re: I need help with this python regex

2018-04-28 Thread Peter J. Holzer

On 2018-04-27 21:04:49 -0700, Ed Manning wrote:
> Here is the source code.
> 
> 
> import re
> 
> 
> log = open("csg.txt", "r") # Opens the file csg.txt
> regex = re.compile(r'policy id \d+') # search for the policy ID
> regex1 = re.compile(r'log count \d+') # search for the log count
> 
> for match in log:
>     x = regex.findall(match)
>     y = regex1.findall(match)
> 
>     q = x + y
>     print(q)
>  
> 
> The problem I am having is that when it prints, the output looks like this:
> 
> 
> ['policy id 243']
> []
> []
> []
> []
> []
> []
> []
> []
> ['log count 777']
> 
> 
> 
> How do I fix the code so that it does not print empty []?

Print the result only if findall actually found something.
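
A minimal sketch of that advice, assuming the same two patterns as in the question. The helper name `matching_lines` and the in-memory `sample` lines are made up for illustration and stand in for reading csg.txt:

```python
import re

policy_re = re.compile(r'policy id \d+')   # policy ID, as in the question
count_re = re.compile(r'log count \d+')    # log count, as in the question


def matching_lines(lines):
    """Yield the combined matches for every line that has at least one."""
    for line in lines:
        found = policy_re.findall(line) + count_re.findall(line)
        if found:  # an empty list is falsy, so lines without a match are skipped
            yield found


# Hypothetical sample: three lines copied from the test file in the question.
sample = [
    " get policy id 243\n",
    'name:"csg to wes" (id 243), zone csg -> fwc,action Permit, status "enabled"\n',
    "log close, log count 777, alert no, counter yes(79) byte rate(sec/min) 0/0\n",
]

for found in matching_lines(sample):
    print(found)
```

With a real file, something like `with open("csg.txt") as log:` followed by the same loop over `matching_lines(log)` should behave the same way.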

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list

I need help with this python regex

2018-04-27 Thread Ed Manning

Here is the source code.


import re


log = open("csg.txt", "r") # Opens the file csg.txt
regex = re.compile(r'policy id \d+') # search for the policy ID
regex1 = re.compile(r'log count \d+') # search for the log count

for match in log:
    x = regex.findall(match)
    y = regex1.findall(match)

    q = x + y
    print(q)
 

The problem I am having is that when it prints, the output looks like this:


['policy id 243']
[]
[]
[]
[]
[]
[]
[]
[]
['log count 777']



How do I fix the code so that it does not print empty []?






Here is the test file:


 get policy id 243
name:"csg to wes" (id 243), zone csg -> fwc,action Permit, status "enabled"
10 sources: "206.221.59.229", "206.221.59.246/32", "csg_205.144.151.107/32", 
"csg_205.144.151.177/32", "csg_205.144.151.24/32", "csg_205.144.152.50/32", 
"csg_205.144.153.55/32", "csg_206.221.59.244/32", "csg_206.221.59.250/32", 
"csg_206.221.61.29/32"
19 destinations: "MIP(204.235.119.135)", "MIP(204.235.119.136)", 
"MIP(204.235.119.243)", "MIP(204.235.119.34)", "MIP(204.235.119.39)", 
"MIP(204.235.119.40)", "MIP(204.235.119.41)", "MIP(204.235.119.42)", 
"MIP(204.235.119.43)", "MIP(204.235.119.44)", "MIP(204.235.119.45)", 
"MIP(204.235.119.46)", "MIP(204.235.119.47)", "MIP(204.235.119.50)", 
"MIP(204.235.119.51)", "MIP(204.235.119.52)", "MIP(204.235.119.79)", 
"MIP(204.235.119.82)", "MIP(204.235.119.83)"
1 service: "ANY"
Rules on this VPN policy: 0
nat off, Web filtering : disabled
vpn unknown vpn, policy flag 0001, session backup: on
traffic shaping off, scheduler n/a, serv flag 00
log close, log count 777, alert no, counter yes(79) byte rate(sec/min) 0/0
total octets 0, counter(session/packet/octet) 0/0/79
priority 7, diffserv marking Off
tadapter: state off, gbw/mbw 0/0 policing (no)
No Authentication
No User, User Group or Group expression set
 get policy id 602
name:"ID 36129" (id 602), zone csg -> fwc,action Permit, status "enabled"
src "csg_205.144.151.107/32", dst "MIP(204.235.119.191)", serv "ANY"
Rules on this VPN policy: 0
nat off, Web filtering : disabled
vpn unknown vpn, policy flag 0001, session backup: on
traffic shaping off, scheduler n/a, serv flag 00
log close, log count 0, alert no, counter yes(80) byte rate(sec/min) 0/0
total octets 0, counter(session/packet/octet) 0/0/80
priority 7, diffserv marking Off
tadapter: state off, gbw/mbw 0/0 policing (no)
No Authentication
No User, User Group or Group expression set
csg-vx-fw-n-12:csg-vx-fw-n-01(M)-> get policy id 420
name:"ID 12637" (id 420), zone csg -> fwc,action Permit, status "enabled"
1 source: "csg_204.235.119.78/32"
1 destination: "eg_csg"
6 services: "PING", "tcp_30001-30100", "tcp_6051-6055", "tcp_7041-7091", 
"TELNET", "TRACEROUTE"
Rules on this VPN policy: 0
nat off, Web filtering : disabled
vpn unknown vpn, policy flag 0001, session backup: on
traffic shaping off, scheduler n/a, serv flag 00
log close, log count 0, alert no, counter yes(81) byte rate(sec/min) 0/0
total octets 0, counter(session/packet/octet) 0/0/81
priority 7, diffserv marking Off
tadapter: state off, gbw/mbw 0/0 policing (no)
No Authentication
No User, User Group or Group expression set

RE: Python regex pattern from array of hex chars

2018-04-13 Thread Joseph L. Casale

-Original Message-
From: Python-list  On Behalf Of MRAB
Sent: Friday, April 13, 2018 12:05 PM
To: python-list@python.org
Subject: Re: Python regex pattern from array of hex chars

> Use re.escape:
> 
> regex = re.compile('[^{}]+'.format(re.escape(''.join(c for c in
> character_class))))

Brilliant, thanks!

Re: Python regex pattern from array of hex chars

2018-04-13 Thread MRAB


On 2018-04-13 18:28, Joseph L. Casale wrote:

> I have an array of hex chars which designate required characters.
> and one happens to be \x5C or "\". What foo is required to build the
> pattern to exclude all but:
> 
> regex = re.compile('[^{}]+'.format(''.join(c for c in character_class)))
> 
> I would use that in a re.sub to collapse and replace all but those
> in the character_class. Obviously the escape issues break the \x5C
> character.


Use re.escape:

regex = re.compile('[^{}]+'.format(re.escape(''.join(c for c in 
character_class))))
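
As a rough illustration of why the escaping matters: the contents of `character_class`, the replacement character and the helper `collapse` below are hypothetical, not taken from the original post. Without re.escape, the backslash, `]` and `-` in this list would change the meaning of the bracket expression or break it.

```python
import re

# Hypothetical list of characters to keep, including a backslash (\x5C)
# and two characters that are special inside a character class.
character_class = ['a', 'b', 'c', '\x5c', ']', '-']

# re.escape() backslash-escapes the special characters, so each one is
# taken literally inside the negated class [^...].
pattern = re.compile('[^{}]+'.format(re.escape(''.join(character_class))))


def collapse(text, replacement='_'):
    """Replace every run of characters outside character_class with one replacement."""
    return pattern.sub(replacement, text)


print(collapse('a--b\\c]xyz!!a'))
```

Here the run `xyz!!` is the only stretch outside the allowed set, so it is collapsed into a single `_`.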


Python regex pattern from array of hex chars

2018-04-13 Thread Joseph L. Casale

I have an array of hex chars which designate required characters.
and one happens to be \x5C or "\". What foo is required to build the
pattern to exclude all but:

regex = re.compile('[^{}]+'.format(''.join(c for c in character_class)))

I would use that in a re.sub to collapse and replace all but those
in the character_class. Obviously the escape issues break the \x5C
character.

Thanks,
jlc

Re: python regex: variable length of positive lookbehind assertion

2016-06-16 Thread Marko Rauhamaa

Jussi Piitulainen :

> Michael Torrie writes:
>
>> On 06/15/2016 08:57 AM, Jussi Piitulainen wrote:
>>> Marko Rauhamaa writes:
>>>> And nothing in alister's answer suggests that.
>>> 
>>> Now *I'm* surprised.
>>
>> He simply said, here's a regex that can parse the example string the OP
>> gave us (which maybe looked a bit like HTML, but like you say, may not
>> be), but don't try to use this method to parse actual HTML because it
>> won't work reliably.
>
> Interesting how differently we can read alister's answer. It was only
> two sentences, one of which Marko replaced with "[...]" before adding
> his own one-liner that is still quoted above.
> [...]
>
> That followed the fully quoted original message, and then there was an
> attributed citation from a Bengamin Disraeli, separated as a .sig.
>
> [...]
>
> A surprise calls for an explanation. Or should I say that I felt that
> this particular expression of surprise seemed to me to call for an
> explanation, or in the very least that an explanation would not do much
> harm and might even be considered mildly interesting. And I saw a fully
> adequate explanation: that the question was not about parsing HTML. So I
> said so.

This is so meta.


Marko

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Jussi Piitulainen

Michael Torrie writes:

> On 06/15/2016 08:57 AM, Jussi Piitulainen wrote:
>> Marko Rauhamaa writes:
>>> And nothing in alister's answer suggests that.
>> 
>> Now *I'm* surprised.
>
> He simply said, here's a regex that can parse the example string the OP
> gave us (which maybe looked a bit like HTML, but like you say, may not
> be), but don't try to use this method to parse actual HTML because it
> won't work reliably.

Interesting how differently we can read alister's answer. It was only
two sentences, one of which Marko replaced with "[...]" before adding
his own one-liner that is still quoted above. Let me quote alister's
response in full here, the way I see it in Gnus:

# don't try to use regex to parse html it wont work reliably
# i am surprised no one has mentioned beautifulsoup yet, which is probably 
# what you require.

That followed the fully quoted original message, and then there was an
attributed citation from a Bengamin Disraeli, separated as a .sig.

Where in alister's original response do you see a regex that can parse
OP's example? I don't see any regex there. (The text where you seem to
me to say that there is one is still quoted above in the normal way.)

Instead of giving any direct answer to the question, alister expresses
surprise at nobody having suggested an HTML parser. (Marko snipped that,
but I've quoted alister's response in full above, so you can check it
without looking up the original messages.)

A surprise calls for an explanation. Or should I say that I felt that
this particular expression of surprise seemed to me to call for an
explanation, or in the very least that an explanation would not do much
harm and might even be considered mildly interesting. And I saw a fully
adequate explanation: that the question was not about parsing HTML. So I
said so.

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Michael Torrie

On 06/15/2016 08:57 AM, Jussi Piitulainen wrote:
> Marko Rauhamaa writes:
>> And nothing in alister's answer suggests that.
> 
> Now *I'm* surprised.

He simply said, here's a regex that can parse the example string the OP
gave us (which maybe looked a bit like HTML, but like you say, may not
be), but don't try to use this method to parse actual HTML because it
won't work reliably.


Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Jussi Piitulainen

alister writes:

> On Wed, 15 Jun 2016 15:55:42 +0300, Jussi Piitulainen wrote:
>
>> alister writes:
>> 
>>> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>>>
>>>> Hi everyone,
>>>> I am struggling writing a right regex that match what I want:
>>>> 
>>>> Problem Description:
>>>> 
>>>> Given a string like this:
>>>> 
>>>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>>>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>>>> 
>>>> I want to match the all the text surrounded by those "<a> </a>",
>>>> but only if those "<a> </a>" locate **in some distance** behind
>>>> "true_head". That is, I expect to result to be like this:
>>>> 
>>>> >>>import re
>>>> >>>result = re.findall("the_regex",string)
>>>> >>>print result
>>>> ["ccc","ddd","eee"]
>>>> 
>>>> How can I write a regex to match that?
>>>> I have try to use the **positive lookbehind assertion** in python
>>>> regex,
>>>> but it does not allowed variable length of lookbehind.
>>>> 
>>>> Thanks in advance,
>>>> Ruan
>>>
>>> don't try to use regex to parse html it wont work reliably i am
>>> surprised no one has mentioned beautifulsoup yet, which is probably
>>> what you require.
>> 
>> Nothing in the question indicates that the data is HTML.
>
> the <a> tags are a pretty good indicator though

I can see how they point that way, but to me that alone seemed pretty
weak.

> even if it is not HTML the same advice stands for XML (the quote
> example would be invalid if it was XML)

It's not valid HTML either, for similar reasons. Or is it? I don't even
want to know.

> if it is neither of these formats but still using a similar tag
> structure then I would say that regex is still unsuitable & the OP
> would need to write a full parser for the format if one does not
> already exist

That depends on details that weren't provided.

I work with a data format that mixes element tags with line-oriented
data records, and having a dedicated parser would be more of a hassle. A
couple of very simple regexen are useful in making sure that start tags
have a valid form and extracting attribute-value pairs from them. I'm
not at all experiencing "two problems" here. Some uses of regex are
good. (And now I may be about to experience the third problem. That
makes me sad.)

Anyway, I think you and another person guessed correctly that the OP is
indeed really considering HTML, and then your suggestion is certainly
helpful.

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread alister

On Wed, 15 Jun 2016 15:55:42 +0300, Jussi Piitulainen wrote:

> alister writes:
> 
>> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>>
>>> Hi everyone,
>>> I am struggling writing a right regex that match what I want:
>>> 
>>> Problem Description:
>>> 
>>> Given a string like this:
>>> 
>>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>>> 
>>> I want to match the all the text surrounded by those "<a> </a>",
>>> but only if those "<a> </a>" locate **in some distance** behind
>>> "true_head". That is, I expect to result to be like this:
>>> 
>>> >>>import re
>>> >>>result = re.findall("the_regex",string)
>>> >>>print result
>>> ["ccc","ddd","eee"]
>>> 
>>> How can I write a regex to match that?
>>> I have try to use the **positive lookbehind assertion** in python
>>> regex,
>>> but it does not allowed variable length of lookbehind.
>>> 
>>> Thanks in advance,
>>> Ruan
>>
>> don't try to use regex to parse html it wont work reliably i am
>> surprised no one has mentioned beautifulsoup yet, which is probably
>> what you require.
> 
> Nothing in the question indicates that the data is HTML.

the <a> tags are a pretty good indicator though
even if it is not HTML the same advice stands for XML (the quote example 
would be invalid if it was XML)

if it is neither of these formats but still using a similar tag 
structure then I would say that regex is still unsuitable & the OP would 
need to write a full parser for the format if one does not already exist



-- 
Farewell we call to hearth and hall!
Though wind may blow and rain may fall,
We must away ere break of day
Far over wood and mountain tall.

To Rivendell, where Elves yet dwell
In glades beneath the misty fell,
Through moor and waste we ride in haste,
And whither then we cannot tell.

With foes ahead, behind us dread,
Beneath the sky shall be our bed,
Until at last our toil be passed,
Our journey done, our errand sped.

We must away!  We must away!
We ride before the break of day!
-- J. R. R. Tolkien

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Jussi Piitulainen

Marko Rauhamaa writes:

> Jussi Piitulainen writes:
>
>> alister writes:
>>
>>> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>>>> Given a string like this:
>>>> 
>>>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>>>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>>>>
>>>> I want to match the all the text surrounded by those "<a> </a>",
>>>> [...]
>>>
>>> don't try to use regex to parse html it wont work reliably
>>> [...]
>>
>> Nothing in the question indicates that the data is HTML.
>
> And nothing in alister's answer suggests that.

Now *I'm* surprised.

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Marko Rauhamaa

Jussi Piitulainen :

> alister writes:
>
>> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>>> Given a string like this:
>>> 
>>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>>>
>>> I want to match the all the text surrounded by those "<a> </a>",
>>> [...]
>>
>> don't try to use regex to parse html it wont work reliably
>> [...]
>
> Nothing in the question indicates that the data is HTML.

And nothing in alister's answer suggests that.


Marko

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Jussi Piitulainen

alister writes:

> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>
>> Hi everyone,
>> I am struggling writing a right regex that match what I want:
>> 
>> Problem Description:
>> 
>> Given a string like this:
>> 
>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>> 
>> I want to match the all the text surrounded by those "<a> </a>",
>> but only if those "<a> </a>" locate **in some distance** behind
>> "true_head". That is, I expect to result to be like this:
>> 
>> >>>import re
>> >>>result = re.findall("the_regex",string)
>> >>>print result
>> ["ccc","ddd","eee"]
>> 
>> How can I write a regex to match that?
>> I have try to use the **positive lookbehind assertion** in python regex,
>> but it does not allowed variable length of lookbehind.
>> 
>> Thanks in advance,
>> Ruan
>
> don't try to use regex to parse html it wont work reliably
> i am surprised no one has mentioned beautifulsoup yet, which is probably 
> what you require.

Nothing in the question indicates that the data is HTML.

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread alister

On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:

> Hi everyone,
> I am struggling writing a right regex that match what I want:
> 
> Problem Description:
> 
> Given a string like this:
> 
> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
> 
> I want to match the all the text surrounded by those "<a> </a>",
> but only if those "<a> </a>" locate **in some distance** behind
> "true_head". That is, I expect to result to be like this:
> 
> >>>import re
> >>>result = re.findall("the_regex",string)
> >>>print result
> ["ccc","ddd","eee"]
> 
> How can I write a regex to match that?
> I have try to use the **positive lookbehind assertion** in python regex,
> but it does not allowed variable length of lookbehind.
> 
> Thanks in advance,
> Ruan

don't try to use regex to parse html it wont work reliably
i am surprised no one has mentioned beautifulsoup yet, which is probably 
what you require.





-- 
What we anticipate seldom occurs; what we least expect generally happens.
-- Bengamin Disraeli

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Lawrence D’Oliveiro

On Wednesday, June 15, 2016 at 3:28:37 PM UTC+12, Yubin Ruan wrote:

> I want to match the all the text surrounded by those "<a> </a>",

You are trying to use regex (type 3 grammar) to parse HTML (type 2 grammar)?

No can do.

Re: python regex: variable length of positive lookbehind assertion

2016-06-15 Thread Vlastimil Brom

2016-06-15 5:28 GMT+02:00 Yubin Ruan <ablacktsh...@gmail.com>:
> Hi everyone,
> I am struggling writing a right regex that match what I want:
>
> Problem Description:
>
> Given a string like this:
>
> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>
> I want to match the all the text surrounded by those "<a> </a>",
> but only if those "<a> </a>" locate **in some distance** behind "true_head". 
> That is, I expect to result to be like this:
>
> >>>import re
> >>>result = re.findall("the_regex",string)
> >>>print result
> ["ccc","ddd","eee"]
>
> How can I write a regex to match that?
> I have try to use the **positive lookbehind assertion** in python regex,
> but it does not allowed variable length of lookbehind.
>
> Thanks in advance,
> Ruan

Hi,
html-like data is generally not very suitable for parsing with regex,
as was explained in the previous answers (especially if comments and
nesting are massively involved).
However, if this suits your data and the usecase, you can use regex
with variable-length lookarounds in a much enhanced "regex" library
for python
https://pypi.python.org/pypi/regex

your pattern might then simply have the form you most likely have
intended, e.g.:
>>> import regex
>>> regex.findall(r"(?<=true_head.*)<a>([^<]+)</a>(?=.*true_tail)",
...     "false_head <a>aaa</a> <a>bbb</a> false_tail true_head some_text_here "
...     "<a>ccc</a> <a>ddd</a> <a>eee</a> true_tail <a>fff</a> another_false_tail")
['ccc', 'ddd', 'eee']
>>>

If you are accustomed to use regular expressions, I'd certainly
recommend this excellent library (besides unlimited lookarounds, there
are repeated and recursive patterns, many unicode-related
enhancements, powerful character set operations, even fuzzy matching
and much more).

hth,
   vbr

Re: python regex: variable length of positive lookbehind assertion

2016-06-14 Thread Jussi Piitulainen

Yubin Ruan writes:

> Hi everyone, 
> I am struggling writing a right regex that match what I want:
>
> Problem Description:
>
> Given a string like this:
>
> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>  true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
>
> I want to match the all the text surrounded by those "<a> </a>", but
> only if those "<a> </a>" locate **in some distance** behind
> "true_head". That is, I expect to result to be like this:
>
> >>>import re
> >>>result = re.findall("the_regex",string)
> >>>print result
> ["ccc","ddd","eee"]
>
> How can I write a regex to match that?
> I have try to use the **positive lookbehind assertion** in python regex,
> but it does not allowed variable length of lookbehind.

Don't.

Don't even try to do it all in one regex. Keep your regexen simple and
match in two steps.

For example, capture all such elements together with your marker:

re.findall(r'true_head|<a>[^<]+</a>', string)
==>
['<a>aaa</a>', '<a>bbb</a>',
 'true_head', '<a>ccc</a>', '<a>ddd</a>', '<a>eee</a>']

Then filter the result in the obvious way (not involving any regex any
more, unless needed to recognize the true 'true_head' again). I've kept
the tags at this stage, so a possible 'true_head' won't look like
'true_head' yet.
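
The filtering step might be sketched like this, assuming the elements are wrapped in `<a>`...`</a>` tags and the marker appears as the bare text true_head. The helper name `elements_after_marker` is invented for the example:

```python
import re

string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail")

# Step 1: one simple regex that captures the marker and every tagged element.
tokens = re.findall(r'true_head|<a>[^<]+</a>', string)


def elements_after_marker(tokens, marker='true_head'):
    """Step 2, no regex: keep the elements that follow the first marker."""
    if marker not in tokens:
        return []
    tail = tokens[tokens.index(marker) + 1:]
    # Elements still carry their tags here, so they cannot be confused
    # with the bare marker; strip the tags only at the very end.
    return [t[len('<a>'):-len('</a>')] for t in tail if t != marker]


print(elements_after_marker(tokens))
```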

Another way is to find 'true_head' first (if you can recognize it safely
before also recognizing the elements), and then capture the elements in
the latter half only.
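
That second way could look roughly as follows, again assuming `<a>`...`</a>` elements. The function name and the optional cut at 'true_tail' are assumptions made for the sketch, not part of the original question:

```python
import re


def elements_in_true_section(text, head='true_head', tail='true_tail'):
    """Find the head marker first, then capture elements only in what follows it."""
    _, found, rest = text.partition(head)
    if not found:               # no marker at all: nothing qualifies
        return []
    section = rest.partition(tail)[0]   # stop at the tail marker, if present
    return re.findall(r'<a>([^<]+)</a>', section)


string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail "
          "<a>fff</a> another_false_tail")

print(elements_in_true_section(string))
```

Both regexes stay trivial this way, and no lookbehind is needed at all.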

Re: python regex: variable length of positive lookbehind assertion

2016-06-14 Thread Yubin Ruan

On Wednesday, June 15, 2016 at 12:18:31 PM UTC+8, Lawrence D’Oliveiro wrote:
> On Wednesday, June 15, 2016 at 3:28:37 PM UTC+12, Yubin Ruan wrote:
> 
> > I want to match the all the text surrounded by those "<a> </a>",
> 
> You are trying to use regex (type 3 grammar) to parse HTML (type 2 grammar)?
> 
> No can do.


Yes. I think you are correct. Thanks.

python regex: variable length of positive lookbehind assertion

2016-06-14 Thread Yubin Ruan

Hi everyone, 
I am struggling to write a regex that matches what I want:

Problem Description:

Given a string like this:

>>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
 true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"

I want to match all the text surrounded by those "<a> </a>",
but only if those "<a> </a>" are located **at some distance** behind "true_head". 
That is, I expect the result to be like this:

>>>import re
>>>result = re.findall("the_regex",string)
>>>print result
["ccc","ddd","eee"]

How can I write a regex to match that?
I have tried to use the **positive lookbehind assertion** in python regex,
but it does not allow a variable-length lookbehind.

Thanks in advance,
Ruan

python regex dna processing

2016-04-21 Thread Joel Goldstick

From time to time there are DNA-related questions posted here.  I came
upon this and hope it may be useful to those who write that kind of
software:

http://benchling.engineering/dna-regex-search/

-- 
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays

Re: Python regex exercise

2015-04-04 Thread Vincent Davis

On Sat, Apr 4, 2015 at 5:51 PM, Thomas 'PointedEars' Lahn 
<pointede...@web.de> wrote:

> > Do anyone have good links to python regex or other python problems for
> > beginners but with solution.
> >
> > Please mail me.


I recently found this
https://regex101.com/#python


Vincent Davis
720-301-3003

Re: Python regex exercise

2015-04-04 Thread Thomas 'PointedEars' Lahn

Robert Clove wrote:

> Do anyone have good links to python regex or other python problems for
> beginners but with solution.
> 
> Please mail me.

http://www.catb.org/~esr/faqs/smart-questions.html#writewell
http://www.catb.org/~esr/faqs/smart-questions.html#prune
http://www.catb.org/~esr/faqs/smart-questions.html#noprivate

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Python regex exercise

2015-03-31 Thread Robert Clove

Hi All,

Does anyone have good links to Python regex or other Python problems for
beginners, with solutions?

Please mail me.

Regards

Re: python regex negative lookahead assertions problems

2009-11-23 Thread Helmut Jarausch


On 11/22/09 16:05, Helmut Jarausch wrote:
> On 11/22/09 14:58, Jelle Smet wrote:
>> Hi List,
>>
>> I'm trying to match lines in python using the re module.
>> The end goal is to have a regex which enables me to skip lines which
>> have ok and warning in it.
>> But for some reason I can't get negative lookaheads working, the way
>> it's explained in "http://docs.python.org/library/re.html".
>>
>> Consider this example:
>>
>> Python 2.6.4 (r264:75706, Nov 2 2009, 14:38:03)
>> [GCC 4.4.1] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import re
>> >>> line='2009-11-22 12:15:441 lmqkjsfmlqshvquhsudfhqf qlsfh
>> qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf
>> lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>> >>> re.match('.*(?!warning)',line)
>> <_sre.SRE_Match object at 0xb75b1598>
>>
>> I would expect that this would NOT match as it's a negative lookahead
>> and warning is in the string.
>
> '.*' eats all of line. Now, when at end of line, there is no 'warning'
> anymore, so it matches.
> What are you trying to achieve?
>
> If you just want to single out lines with 'ok' or warning in it, why not
> just
> if re.search('(ok|warning)', line) : call_skip

Probably you don't want words like 'joke' to match 'ok'.
So, a better regex is

if re.search(r'\b(ok|warning)\b',line) : SKIP_ME
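
A small sketch of the word-boundary version; the function name `should_skip` and the sample lines are made up for illustration. The raw-string prefix matters: in an ordinary string literal '\b' is a backspace character, so the pattern would never see a word boundary.

```python
import re

# Raw string, so \b reaches the regex engine as a word-boundary assertion.
skip_re = re.compile(r'\b(ok|warning)\b')


def should_skip(line):
    """True if the line contains 'ok' or 'warning' as a whole word."""
    return skip_re.search(line) is not None


for line in ('status ok', 'disk warning raised', 'just a joke'):
    print(line, '->', should_skip(line))

# Without the r prefix, '\b' is the single character backspace (0x08).
print('\b' == '\x08')
```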

Helmut.



--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany

python regex negative lookahead assertions problems

2009-11-22 Thread Jelle Smet

Hi List,

I'm trying to match lines in python using the re module.
The end goal is to have a regex which enables me to skip lines which have ok 
and warning in it.
But for some reason I can't get negative lookaheads working, the way it's 
explained in "http://docs.python.org/library/re.html".

Consider this example:

Python 2.6.4 (r264:75706, Nov  2 2009, 14:38:03) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> line='2009-11-22 12:15:441  lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>>> re.match('.*(?!warning)',line)
<_sre.SRE_Match object at 0xb75b1598>

I would expect that this would NOT match as it's a negative lookahead and 
warning is in the string.


Thanks,


-- 
Jelle Smet
http://www.smetj.net

Re: python regex negative lookahead assertions problems

2009-11-22 Thread Tim Chase


> >>> import re
> >>> line='2009-11-22 12:15:441  lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh 
> qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq 
> iusfh'
> >>> re.match('.*(?!warning)',line)
> <_sre.SRE_Match object at 0xb75b1598>
>
> I would expect that this would NOT match as it's a negative lookahead and 
> warning is in the string.


This first finds everything (.*) and then asserts that 
warning doesn't follow it, which is correct in your example. 
You may have to assert that warning doesn't exist at every 
point along the way:


  re.match(r'(?:(?!warning).)*',line)

which will match up-to-but-not-including the warning text.  If 
you don't want it at all, you'd have to also anchor the far end


  re.match(r'^(?:(?!warning).)*$',line)

but in the 2nd case I'd just as soon invert the test:

  if 'warning' not in line:
      do_stuff()
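
To see the difference between the two patterns, here is a short sketch with made-up sample lines (the names `prefix_re` and `whole_re` are only for the example):

```python
import re

line_with = '2009-11-22 12:15:44 dfh warning qlsfj'
line_without = '2009-11-22 12:15:44 all good here'

# The lookahead is checked before every single character.
prefix_re = re.compile(r'(?:(?!warning).)*')    # stops right before 'warning'
whole_re = re.compile(r'^(?:(?!warning).)*$')   # must also reach the end

print(repr(prefix_re.match(line_with).group()))
print(whole_re.match(line_with))
print(whole_re.match(line_without) is not None)

# The inverted plain-string test gives the same yes/no answer.
print('warning' not in line_with, 'warning' not in line_without)
```

The unanchored pattern always matches something, possibly only a prefix, so only the anchored form (or the plain `in` test) answers the question of whether the line is free of the word.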

-tkc





Re: python regex negative lookahead assertions problems

2009-11-22 Thread Helmut Jarausch


On 11/22/09 14:58, Jelle Smet wrote:
> Hi List,
>
> I'm trying to match lines in python using the re module.
> The end goal is to have a regex which enables me to skip lines which have ok 
> and warning in it.
> But for some reason I can't get negative lookaheads working, the way it's explained in 
> "http://docs.python.org/library/re.html".
>
> Consider this example:
>
> Python 2.6.4 (r264:75706, Nov  2 2009, 14:38:03)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> line='2009-11-22 12:15:441  lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh 
> qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq 
> iusfh'
> >>> re.match('.*(?!warning)',line)
> <_sre.SRE_Match object at 0xb75b1598>
>
> I would expect that this would NOT match as it's a negative lookahead and 
> warning is in the string.

'.*' eats all of line. Now, when at end of line, there is no 'warning' anymore, 
so it matches.
What are you trying to achieve?

If you just want to single out lines with 'ok' or warning in it, why not just
if re.search('(ok|warning)', line) : call_skip

Helmut.

--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany

Re: python regex negative lookahead assertions problems

2009-11-22 Thread MRAB


Tim Chase wrote:
>> >>> import re
>> >>> line='2009-11-22 12:15:441  lmqkjsfmlqshvquhsudfhqf qlsfh 
>> qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf 
>> lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>> >>> re.match('.*(?!warning)',line)
>> <_sre.SRE_Match object at 0xb75b1598>
>>
>> I would expect that this would NOT match as it's a negative lookahead 
>> and warning is in the string.
>
> This first finds everything (.*) and then asserts that warning 
> doesn't follow it, which is correct in your example. You may have to 
> assert that warning doesn't exist at every point along the way:
>
>   re.match(r'(?:(?!warning).)*',line)
>
> which will match up-to-but-not-including the warning text.  If you 
> don't want it at all, you'd have to also anchor the far end
>
>   re.match(r'^(?:(?!warning).)*$',line)
>
> but in the 2nd case I'd just as soon invert the test:
>
>   if 'warning' not in line:
>       do_stuff()


The trick is to think what positive lookahead you'd need if you wanted
to check whether 'warning' is present:

'(?=.*warning)'

and then negate it:

'(?!.*warning)'

giving you:

re.match(r'(?!.*warning)', line)
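
A quick sketch of how that pattern behaves; the helper name `is_clean` and the sample lines are invented for the example. The match itself is empty, so only whether a match object comes back is of interest:

```python
import re

# At the very start of the line, assert that no 'warning' follows anywhere.
no_warning = re.compile(r'(?!.*warning)')


def is_clean(line):
    """True if 'warning' does not occur anywhere in the (single) line."""
    return no_warning.match(line) is not None


print(is_clean('2009-11-22 12:15:44 dfh warning qlsfj'))
print(is_clean('2009-11-22 12:15:44 all good here'))
```

Since `.` does not cross a newline by default, this is meant for one line at a time, which fits the line-by-line use case in the question.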

Re: ignore special characters in python regex

2009-07-21 Thread John Machin

On Jul 21, 3:02 pm, Astan Chee <astan.c...@al.com.au> wrote:
> Hi,
> I'm reading text from a file (per line) and I want to do a regex using
> these lines but I want the regex to ignore any special characters and
> treat them like normal strings.
> Is there a regex function that can do this?

It would help if you were to say

(1) what "ignore ... characters" means -- pretend they don't exist?
(2) what are "special characters" -- non-alphanumeric?
(3) what "treat them like normal strings" means
(4) how you expect these special characters to be (a) ignored and (b)
treated like normal strings /at the same time/.

Cheers,
John

Re: ignore special characters in python regex

2009-07-21 Thread Astan Chee


I think re.escape did the trick.
To answer your questions:
By "ignore" I meant that non-alphanumeric characters that have special
significance in regular expressions (e.g. [|\]) should be treated as
normal characters (i.e. preceded by \). But since I don't know all the
characters that have special significance in regular expressions, I
don't know which ones to put a '\' in front of.

Thanks anyway
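
For reference, a small sketch of what re.escape does with such a line, written for Python 3. The two sample strings are made up for illustration:

```python
import re

# Hypothetical line read from the file, and the text to search in.
note = "cost [USD|EUR] 3.50 (approx.)"
strin = "total cost [USD|EUR] 3.50 (approx.) per unit"

# re.escape backslash-escapes every character that is special in a
# pattern, so the line is matched literally.
literal_hit = re.search(re.escape(note), strin)

# Unescaped, '[USD|EUR]' is a character class and '(approx.)' a group,
# so the same text no longer matches itself.
raw_hit = re.search(note, strin)

print(literal_hit is not None, raw_hit is None)
```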

John Machin wrote:

On Jul 21, 3:02 pm, Astan Chee astan.c...@al.com.au wrote:
  

Hi,
I'm reading text from a file (per line) and I want to do a regex using
these lines but I want the regex to ignore any special characters and
treat them like normal strings.
Is there a regex function that can do this?



It would help if you were to say

(1) what "ignore ... characters" means -- pretend they don't exist?
(2) what are "special characters" -- non-alphanumeric?
(3) what "treat them like normal strings" means
(4) how you expect these special characters to be (a) ignored and (b)
treated like normal strings /at the same time/.

Cheers,
John
  
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: ignore special characters in python regex

2009-07-21 Thread Gabriel Genellina

En Tue, 21 Jul 2009 02:02:57 -0300, Astan Chee astan.c...@al.com.au  
escribió:


I'm reading text from a file (per line) and I want to do a regex using  
these lines but I want the regex to ignore any special characters and  
treat them like normal strings.

Is there a regex function that can do this?
Here is what I have so far:
fp = open('file.txt','r')
notes = fp.readlines()
fp.close()
strin = "this is what I want"
for note in notes:
    if re.search(r"" + str(note) + "",strin):
        print "Found " + str(note) + " in " + strin


You don't even need a regex for that.

py> "fragil" in "supercalifragilisticexpialidocious"
True

Note that: r"" + str(note) + ""
is the same as: str(note)
which in turn is the same as: note

Remember that each line keeps its final '\n'!

for note in notes:
    if note.rstrip('\n') in strin:
        print "Found %s in %s" % (note.rstrip('\n'), strin)
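
The same idea as a self-contained Python 3 sketch. The list below is a made-up stand-in for what fp.readlines() would return:

```python
# Hypothetical file contents; every entry keeps its trailing newline,
# just as readlines() would deliver it.
notes = ["what I want\n", "a+b (literal)\n", "missing\n"]
strin = "this is what I want: a+b (literal)"

# Plain substring test: no regex involved, so '+', '(' and ')' need
# no special treatment.
found = [note.rstrip("\n") for note in notes if note.rstrip("\n") in strin]

for note in found:
    print("Found %r in %r" % (note, strin))
```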

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list

ignore special characters in python regex

2009-07-20 Thread Astan Chee


Hi,
I'm reading text from a file (per line) and I want to do a regex using 
these lines but I want the regex to ignore any special characters and 
treat them like normal strings.

Is there a regex function that can do this?
Here is what I have so far:
fp = open('file.txt','r')
notes = fp.readlines()
fp.close()
strin = "this is what I want"
for note in notes:
    if re.search(r"" + str(note) + "",strin):
        print "Found " + str(note) + " in " + strin

Thanks for any help
--
http://mail.python.org/mailman/listinfo/python-list

Re: ignore special characters in python regex

2009-07-20 Thread Frank Buss

Astan Chee wrote:

 I'm reading text from a file (per line) and I want to do a regex using 
 these lines but I want the regex to ignore any special characters and 
 treat them like normal strings.
 Is there a regex function that can do this?

Maybe re.escape helps?

-- 
Frank Buss, f...@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
-- 
http://mail.python.org/mailman/listinfo/python-list

Perl / python regex / performance comparison

2009-03-03 Thread Ivan

Hello everyone,

I know this is not a direct python question, forgive me for that, but
maybe some of you will still be able to help me. I've been told that
for my application it would be best to learn a scripting language, so
I looked around and found perl and python to be the nice ones. Their syntax
and approach are not similar, though.
So, I was wondering, could any of you please elaborate on the
following, so as to ease my dilemma:

1. Although it is all relatively similar, there are differences
between regexes of these two. Which do you believe is the more
powerful variant (maybe an example) ?

2. They are both interpreted languages, and I can't really be sure how
they measure in speed. In your opinion, for handling large files,
which is better ?
(I'm processing files of numerical data of several hundred mb - let's
say 200mb - how would python handle file of such size ? As compared to
perl ?)

3. This last one is somewhat subjective, but what do you think, in the
future, which will be more useful. Which, in your (humble) opinion
has a future ?

Thank you for all the info you can spare; I am especially grateful for
the time spent in doing so.
-- Ivan
--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl-python regex-performance comparison

2009-03-03 Thread Tino Wildenhain


Ivan wrote:

Hello everyone,


...


1. Although it is all relatively similar, there are differences
between regexes of these two. Which do you believe is the more
powerful variant (maybe an example) ?

2. They are both interpreted languages, and I can't really be sure how
they measure in speed. In your opinion, for handling large files,
which is better ?
(I'm processing files of numerical data of several hundred mb - let's
say 200mb - how would python handle file of such size ? As compared to
perl ?)

3. This last one is somewhat subjective, but what do you think, in the
future, which will be more useful. Which, in your (humble) opinion
has a future ?


I guess both languages have their uses and their future. You should come to
your own conclusion after you have worked with both languages for a while.
Speaking only for myself, I know both and prefer python for its
nice, straightforward way. And you don't need the hammer (aka regex)
for everything. Several hundred megabytes is not much; you would
work through them sequentially, which in python means you would almost
exclusively work with generators.
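
A minimal sketch of that generator style, written for Python 3. The file layout (whitespace-separated numbers) and the function name are assumptions made for illustration:

```python
import io

def numbers(lines):
    """Yield one float per whitespace-separated field, line by line."""
    for line in lines:
        for field in line.split():
            yield float(field)

# Stand-in for open("data.txt"); a real file object is iterated lazily
# in the same way, so only one line is held in memory at a time.
sample = io.StringIO("1.5 2.5\n3.0\n4.0 5.0 6.0\n")

total = sum(numbers(sample))
print(total)
```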

HTH
Tino


--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl-python regex-performance comparison

2009-03-03 Thread Chris Rebert

On Tue, Mar 3, 2009 at 9:05 AM, Ivan i...@invalid.net wrote:
 Hello everyone,

 I know this is not a direct python question, forgive me for that, but
 maybe some of you will still be able to help me. I've been told that
 for my application it would be best to learn a scripting language, so
 I looked around and found perl and python to be the nice. Their syntax
 and way is not similar, though.
 So, I was wondering, could any of you please elaborate on the
 following, as to ease my dilemma:

 1. Although it is all relatively similar, there are differences
 between regexes of these two. Which do you believe is the more
 powerful variant (maybe an example) ?

I would think that either they're currently equal in power or Perl's
are just slightly more powerful, as pretty much all languages have
basically copied Perl's regular expressions almost exactly. If you're
talking about Perl6, then combined with the new rules feature it's
much more powerful; however, like Perl's regular expressions, someone
will probably write a library like pcre that implements it for all
other languages, so it'll probably just be a matter of syntax.

 2. They are both interpreted languages, and I can't really be sure how
 they measure in speed. In your opinion, for handling large files,
 which is better ?
 (I'm processing files of numerical data of several hundred mb - let's
 say 200mb - how would python handle file of such size ? As compared to
 perl ?)

They're probably both about the same. Your problem is probably
IO-bound, so the slowness of either language will likely be made
irrelevant by the comparative slowness of your hard disk. If you
really care about maximum speed here, you should probably write a C
extension for the most intensive part of your algorithm (fairly easy
to do with Python, probably about the same with Perl), though Python
also has Psyco and Cython available as other ways to speed up your
program without resorting to C; don't know if Perl has any similar
tools.

 3. This last one is somewhat subjective, but what do you think, in the
 future, which will be more useful. Which, in your (humble) opinion
 has a future ?

Python is already well on its way to a smooth transition to Python 3.0
(Python's Perl6 equivalent) with few drastic changes to the language;
most of the changes are pretty incremental but just happen to break
backward compatibility; and the resulting language is still quite
clearly identifiable as Python.
Perl, on the other hand, is currently quagmired in a long,
indeterminate wait for the epic and drastically-revised Perl 6 to be
released.
They both have a future, but Python's definitely appears more secure
at this point. It's more wait-and-see with Perl. But take this with a
grain of salt due to the obvious pro-Python bias you'll get on this
list.

Cheers,
Chris

-- 
I have a blog:
http://blog.rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl python - regex performance comparison

2009-03-03 Thread pruebauno

On Mar 3, 12:38 pm, Ivan ivan@@gmail.com wrote:
 Hello everyone,

 I know this is not a direct python question, forgive me for that, but
 maybe some of you will still be able to help me. I've been told that
 for my application it would be best to learn a scripting language, so
 I looked around and found perl and python to be the nice. Their syntax
 and way is not similar, though.
 So, I was wondering, could any of you please elaborate on the
 following, as to ease my dilemma:

 1. Although it is all relatively similar, there are differences
 between regexes of these two. Which do you believe is the more
 powerful variant (maybe an example) ?

 2. They are both interpreted languages, and I can't really be sure how
 they measure in speed. In your opinion, for handling large files,
 which is better ?
 (I'm processing files of numerical data of several hundred mb - let's
 say 200mb - how would python handle file of such size ? As compared to
 perl ?)

 3. This last one is somewhat subjective, but what do you think, in the
 future, which will be more useful. Which, in your (humble) opinion
 has a future ?

 Thank you for all the info you can spare, and expecially grateful for
 the time in doing so.
 -- Ivan

 p.s. Having some trouble posting this. If you see it come out several
 times, please ignore the copies.

1. They are so similar that, unless they added a lot in Perl lately,
both are about equally powerful. I think the difference had mainly to
do with how certain multiline constructs were handled. But that
doesn't make one more powerful than the other, it just means that you
have to make minor changes to port regexes from one to the other.
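
On the Python side, line-oriented behaviour is controlled by flags: re.MULTILINE and re.DOTALL play the role of Perl's /m and /s modifiers. Whether this is the exact difference meant above is an assumption; the sketch only shows what the two flags change (Python 3, made-up text):

```python
import re

text = "first line\nsecond line\n"

# Without flags, '^' matches only at the very start of the string.
starts_default = re.findall(r'^\w+', text)
# With re.MULTILINE, '^' also matches right after each newline.
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)

# Without flags, '.' stops at a newline.
across_default = re.search(r'first.*second', text)
# With re.DOTALL, '.' matches newlines as well.
across_dotall = re.search(r'first.*second', text, re.DOTALL)

print(starts_default, starts_multiline)
print(across_default, across_dotall.group())
```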

2. Speed is pretty similar too. Perl probably has a more optimized
regex engine because regexes are usually used more in Perl programs.
Python probably has more optimized function calls, string methods,
etc. So it probably depends on the mix of functionality you are using.

3. Both languages have been around for a while, and so much
infrastructure depends on them that they are not going to go away.
Perl is extensively used for sysadmin tasks, nagios, etc. and Python
is used in the Gentoo and Redhat distributions. Of course COBOL is not
dying quickly either, for the same reason, which isn't the same as
wanting to program in it.

On this list you will find quite a few who have switched from Perl to
Python because we like the latter better, but there are many
programmers who are quite happy with Perl. I myself want to play
around with Perl 6 at some point just for fun. Since a lot of the
reasons to choose one or the other seem to be mostly syntax, I would
recommend that you write a couple of short programs in both of them,
see which you like more, and use that.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl / python regex / performance comparison

2009-03-03 Thread Ciprian Dorin, Craciun

On Tue, Mar 3, 2009 at 7:03 PM, Ivan i...@invalid.com wrote:
 Hello everyone,

 I know this is not a direct python question, forgive me for that, but
 maybe some of you will still be able to help me. I've been told that
 for my application it would be best to learn a scripting language, so
 I looked around and found perl and python to be the nice. Their syntax
 and way is not similar, though.
 So, I was wondering, could any of you please elaborate on the
 following, as to ease my dilemma:

 1. Although it is all relatively similar, there are differences
 between regexes of these two. Which do you believe is the more
 powerful variant (maybe an example) ?

 2. They are both interpreted languages, and I can't really be sure how
 they measure in speed. In your opinion, for handling large files,
 which is better ?
 (I'm processing files of numerical data of several hundred mb - let's
 say 200mb - how would python handle file of such size ? As compared to
 perl ?)

 3. This last one is somewhat subjective, but what do you think, in the
 future, which will be more useful. Which, in your (humble) opinion
 has a future ?

 Thank you for all the info you can spare, and expecially grateful for
 the time in doing so.
 -- Ivan
 --
 http://mail.python.org/mailman/listinfo/python-list

I can answer your second question (will Python handle large
files). In my case I use Python to create statistics from some trace
files from a genetic algorithm, and my current size is up to 20MB for
about 40 files. I do the following:
* use regular expressions to identify each line type and extract the
information (as numbers);
* either create statistics on the fly, or load the dumped data
into an Sqlite3 database (which got up to a couple of hundred MB);
* everything has worked fine so far.
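
The steps above can be sketched as follows (Python 3). The trace line format, the table name and the column names are all invented for illustration; the real trace files are not shown in this thread:

```python
import re
import sqlite3

# Hypothetical trace line: "gen <number> best <fitness>".
LINE_RE = re.compile(r'^gen (\d+) best ([-+]?\d+(?:\.\d+)?)$')

def parse(lines):
    """Yield (generation, best) tuples for lines of the known type."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield int(m.group(1)), float(m.group(2))

# Stand-in for iterating over an open trace file.
trace = ["gen 1 best 0.50", "# unrelated line", "gen 2 best 0.75"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fitness (generation INTEGER, best REAL)")
# executemany consumes the generator row by row.
conn.executemany("INSERT INTO fitness VALUES (?, ?)", parse(trace))

summary = conn.execute("SELECT COUNT(*), MAX(best) FROM fitness").fetchone()
print(summary)
```

Because parse() is a generator, the rows go into the database one at a time and the in-memory data stays small.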

I've also used Python (or rather, an application built in Python
with cElementTree?) that took the Wikipedia XML dumps (7GB? I'm not
sure, but a couple of GB) and created a custom format file, from
which I then tried to create SQL inserts... And everything worked well.
(Of course it took some time to do all the processing.)
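
One way such a dump can be read without loading it whole is incremental parsing with ElementTree's iterparse. This is a sketch under assumptions (Python 3, a tiny made-up document); nothing here says the application mentioned above worked this way:

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up stand-in for a multi-gigabyte dump file.
dump = io.BytesIO(
    b"<pages>"
    b"<page><title>A</title></page>"
    b"<page><title>B</title></page>"
    b"</pages>"
)

titles = []
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag == "page":
        titles.append(elem.findtext("title"))
        # Drop the finished element's children to keep memory use low.
        elem.clear()

print(titles)
```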

So my conclusion is that if you keep your in-memory data
small and use the right solution for the problem, you can
use Python without (big) overhead.

As another side note, I've also used Python (with NumPy) to implement
neural networks (in fact clustering with ART), where I had about 20
thousand training elements (arrays of thousands of elements), and it
worked remarkably well (I would say better than in Java, and comparable
with C/C++).

I hope I've helped you,
Ciprian Craciun.

P.S. If you just need a regular-expression substitution or a
regular-expression search, then just use sed or grep, as you would
not get anything better than them.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl-python regex-performance comparison

2009-03-03 Thread MRAB


Chris Rebert wrote:

On Tue, Mar 3, 2009 at 9:05 AM, Ivan i...@invalid.net wrote:

Hello everyone,

I know this is not a direct python question, forgive me for that, but
maybe some of you will still be able to help me. I've been told that
for my application it would be best to learn a scripting language, so
I looked around and found perl and python to be the nice. Their syntax
and way is not similar, though.
So, I was wondering, could any of you please elaborate on the
following, as to ease my dilemma:

1. Although it is all relatively similar, there are differences
between regexes of these two. Which do you believe is the more
powerful variant (maybe an example) ?


I would think that either they're currently equal in power or Perl's
are just slightly more powerful, as pretty much all languages have
basically copied Perl's regular expressions almost exactly. If you're
talking about Perl6, then combined with the new rules feature it's
much more powerful; however, like Perl's regular expressions, someone
will probably write a library like pcre that implements it for all
other languages, so it'll probably just be a matter of syntax.


Python 2.7's regex will include possessive quantifiers, atomic groups,
variable-length lookbehinds, and Unicode properties (at least the common
ones), amongst other things.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl-python regex-performance comparison

2009-03-03 Thread Steve Holden

Ivan wrote:
 Hello everyone,
 
 I know this is not a direct python question, forgive me for that, but
 maybe some of you will still be able to help me. I've been told that
 for my application it would be best to learn a scripting language, so
 I looked around and found perl and python to be the nice. Their syntax
 and way is not similar, though.
 So, I was wondering, could any of you please elaborate on the
 following, as to ease my dilemma:
 
http://isperldeadyet.com/

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/

--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl-python regex-performance comparison

2009-03-03 Thread Vlastimil Brom

2009/3/3 MRAB wrote:


 Python 2.7's regex will include possessive quantifiers, atomic groups,
 variable-length lookbehinds, and Unicode properties (at least the common
 ones), amongst other things.
 --
 http://mail.python.org/mailman/listinfo/python-list


Wow, that's excellent news!
Many thanks for all your efforts to enhance the re capabilities in Python!

vbr
--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl / python regex / performance comparison

2009-03-03 Thread Terry Reedy


Ivan wrote:

Hello everyone,

I know this is not a direct python question, forgive me for that, but
maybe some of you will still be able to help me. I've been told that
for my application it would be best to learn a scripting language, so
I looked around and found perl and python to be the nice. Their syntax
and way is not similar, though.
So, I was wondering, could any of you please elaborate on the
following, as to ease my dilemma:


Which way are *you* more comfortable with?  There are people who 
regularly use both, and many who do not.




1. Although it is all relatively similar, there are differences
between regexes of these two. Which do you believe is the more
powerful variant (maybe an example) ?


This is not relevant to your application below.  In any case, the 
differences are in rather esoteric details.


2. They are both interpreted languages, and I can't really be sure how
they measure in speed. In your opinion, for handling large files,
which is better ?
(I'm processing files of numerical data of several hundred mb - let's
say 200mb - how would python handle file of such size ? As compared to
perl ?)


For one file and simple processing, the time difference should be less 
than the time you spent asking the question.  For complex processing or 
multiple files, a Python user might use numpy, scipy, or other 
pre-written analysis extensions.



3. This last one is somewhat subjective, but what do you think, in the
future, which will be more useful. Which, in your (humble) opinion
has a future ?


Python ;-) at least for me.

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list

Re: Perl-python regex-performance comparison

2009-03-03 Thread python


 Python 2.7's regex will include possessive quantifiers, atomic groups,
 variable-length lookbehinds, and Unicode properties (at least the common
 ones), amongst other things.

 Wow, that's excellent news!
 Many thanks for all your efforts to enhance the re capabilities in
 Python!

+1 !!

Regards,
Malcolm
--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

2008-11-19 Thread jhermann

  P=P.replace('\\','\\\\').replace(']','\\]')   # escape both of them.

re.escape() does this w/o any assumptions by your code about the regex
implementation.
--
http://mail.python.org/mailman/listinfo/python-list

Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Shiao

Hello,
I'm trying to build a regex in python to identify punctuation
characters in all the languages. Some regex implementations support an
extended syntax \p{P} that does just that. As far as I know, python re
doesn't. Any idea of a possible alternative?

Apart from manually including the punctuation character range for each
and every language, I don't see how this can be done.

Thanks in advance for any suggestions.

John
--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Martin v. Löwis

 I'm trying to build a regex in python to identify punctuation
 characters in all the languages. Some regex implementations support an
 extended syntax \p{P} that does just that. As far as I know, python re
 doesn't. Any idea of a possible alternative?

You should use character classes. You can generate them automatically
from the unicodedata module: check whether unicodedata.category(c)
starts with P.
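
The category check on its own can be sketched like this (Python 3; the helper name is made up for illustration):

```python
import unicodedata

def is_punctuation(c):
    """True if the character's Unicode general category starts with 'P'."""
    return unicodedata.category(c).startswith('P')

# '!' and the ideographic full stop are punctuation (category Po);
# 'a' is a letter (Ll) and '$' is a currency symbol (Sc).
checks = [is_punctuation(c) for c in ('!', '。', 'a', '$')]
print(checks)
```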

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Shiao

On Nov 14, 11:27 am, Martin v. Löwis [EMAIL PROTECTED] wrote:
  I'm trying to build a regex in python to identify punctuation
  characters in all the languages. Some regex implementations support an
  extended syntax \p{P} that does just that. As far as I know, python re
  doesn't. Any idea of a possible alternative?

 You should use character classes. You can generate them automatically
 from the unicodedata module: check whether unicodedata.category(c)
 starts with P.

 Regards,
 Martin

Thanks Martin. I'll do this.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Mark Tolonen



Shiao [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]

Hello,
I'm trying to build a regex in python to identify punctuation
characters in all the languages. Some regex implementations support an
extended syntax \p{P} that does just that. As far as I know, python re
doesn't. Any idea of a possible alternative?

Apart from manually including the punctuation character range for each
and every language, I don't see how this can be done.

Thank in advance for any suggestions.

John


You can always build your own pattern.  Something like (Python 3.0rc2):


>>> import unicodedata
>>> Po=''.join(chr(x) for x in range(65536) if unicodedata.category(chr(x)) == 'Po')
>>> import re
>>> r=re.compile('['+Po+']')
>>> x='我是美國人。'
>>> x
'我是美國人。'
>>> r.findall(x)
['。']

-Mark

--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Mark Tolonen



Mark Tolonen [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]


Shiao [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]

Hello,
I'm trying to build a regex in python to identify punctuation
characters in all the languages. Some regex implementations support an
extended syntax \p{P} that does just that. As far as I know, python re
doesn't. Any idea of a possible alternative?

Apart from manually including the punctuation character range for each
and every language, I don't see how this can be done.

Thank in advance for any suggestions.

John


You can always build your own pattern.  Something like (Python 3.0rc2):


>>> import unicodedata
>>> Po=''.join(chr(x) for x in range(65536) if unicodedata.category(chr(x)) == 'Po')
>>> import re
>>> r=re.compile('['+Po+']')
>>> x='我是美國人。'
>>> x
'我是美國人。'
>>> r.findall(x)
['。']

-Mark



This was an interesting problem.  You need to escape \ and ] to find all the
punctuation correctly, and it turns out those characters are adjacent in
the Unicode character set, so ] was coincidentally escaped in my first
attempt.


IDLE 3.0rc2
>>> import re
>>> import unicodedata as u
>>> A=''.join(chr(i) for i in range(65536))
>>> P=''.join(chr(i) for i in range(65536) if u.category(chr(i))[0]=='P')
>>> len(A)
65536
>>> len(P)
491
>>> len(re.findall('['+P+']',A)) # ] was naturally escaped
490
>>> set(P)-set(re.findall('['+P+']',A)) # so only missing \
{'\\'}
>>> P=P.replace('\\','\\\\').replace(']','\\]')   # escape both of them.
>>> len(re.findall('['+P+']',A))
491

-Mark

--
http://mail.python.org/mailman/listinfo/python-list

Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Shiao

On Nov 14, 12:30 pm, Mark Tolonen [EMAIL PROTECTED] wrote:
 Mark Tolonen [EMAIL PROTECTED] wrote in message

 news:[EMAIL PROTECTED]





  Shiao [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
  Hello,
  I'm trying to build a regex in python to identify punctuation
  characters in all the languages. Some regex implementations support an
  extended syntax \p{P} that does just that. As far as I know, python re
  doesn't. Any idea of a possible alternative?

  Apart from manually including the punctuation character range for each
  and every language, I don't see how this can be done.

  Thank in advance for any suggestions.

  John

  You can always build your own pattern.  Something like (Python 3.0rc2):

  import unicodedata
  Po=''.join(chr(x) for x in range(65536) if unicodedata.category(chr(x)) ==
  'Po')
  import re
  r=re.compile('['+Po+']')
  x='我是美國人。'
  x
  '我是美國人。'
  r.findall(x)
  ['。']

  -Mark

 This was an interesting problem.  Need to escape \ and ] to find all the
 punctuation correctly, and it turns out those characters are sequential in
 the Unicode character set, so ] was coincidentally escaped in my first
 attempt.

 IDLE 3.0rc2 import unicodedata as u
  A=''.join(chr(i) for i in range(65536))
  P=''.join(chr(i) for i in range(65536) if u.category(chr(i))[0]=='P')
  len(A)
 65536
  len(P)
 491
  len(re.findall('['+P+']',A)) # ] was naturally
  escaped
 490
  set(P)-set(re.findall('['+P+']',A)) # so only missing \
 {'\\'}
  P=P.replace('\\','\\\\').replace(']','\\]')   # escape both of them.
  len(re.findall('['+P+']',A))

 491

 -Mark

Mark,
Many thanks. I feel almost ashamed I got away with it so easily :-)
--
http://mail.python.org/mailman/listinfo/python-list

Python Regex Question

2008-10-29 Thread MalteseUnderdog


Hi there I just started python (but this question isn't that trivial
since I couldn't find it in google :) )

I have the following text file entries (simplified)

start  #frag 1 start
x=Dog # frag 1 end
stop
start# frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop


I need a regex expression which returns the start to the x=ANIMAL for
only the x=Dog fragments so all my entries should be start ...
(something here) ... x=Dog .  So I am really interested in fragments 1
and 3 only.

My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
would return results

start
x=Dog  # (good)

and

start
x=Cat
stop
start
x=Dog # bad since I only want start ... x=Dog portion

Can you help me ?

Thanks
JP, Malta.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Python Regex Question

2008-10-29 Thread Tim Chase


I need a regex expression which returns the start to the x=ANIMAL for
only the x=Dog fragments so all my entries should be start ...
(something here) ... x=Dog .  So I am really interested in fragments 1
and 3 only.

My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
would return results

start
x=Dog  # (good)

and

start
x=Cat
stop
start
x=Dog # bad since I only want start ... x=Dog portion


Looks like the following does the trick:

>>> s = """start  #frag 1 start
... x=Dog # frag 1 end
... stop
... start# frag 2 start
... x=Cat # frag 2 end
... stop
... start #frag 3 start
... x=Dog #frag 3 end
... stop"""
>>> import re
>>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
>>> for i, result in enumerate(r.findall(s)):
...     print i, repr(result)
...
0 'start  #frag 1 start\nx=Dog # frag 1 end\nstop'
1 'start #frag 3 start\nx=Dog #frag 3 end\nstop'

-tkc







--
http://mail.python.org/mailman/listinfo/python-list

Re: Python Regex Question

2008-10-29 Thread Arnaud Delobelle

On Oct 29, 7:01 pm, Tim Chase [EMAIL PROTECTED] wrote:
  I need a regex expression which returns the start to the x=ANIMAL for
  only the x=Dog fragments so all my entries should be start ...
  (something here) ... x=Dog .  So I am really interested in fragments 1
  and 3 only.

  My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
  would return results

  start
  x=Dog  # (good)

  and

  start
  x=Cat
  stop
  start
  x=Dog # bad since I only want start ... x=Dog portion

 Looks like the following does the trick:

   s = start      #frag 1 start
 ... x=Dog # frag 1 end
 ... stop
 ... start    # frag 2 start
 ... x=Cat # frag 2 end
 ... stop
 ... start     #frag 3 start
 ... x=Dog #frag 3 end
 ... stop
   import re
   r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
   for i, result in enumerate(r.findall(s)):
 ...     print i, repr(result)
 ...
 0 'start      #frag 1 start\nx=Dog # frag 1 end\nstop'
 1 'start     #frag 3 start\nx=Dog #frag 3 end\nstop'

 -tkc

This will only work if 'x=Dog' directly follows 'start' (which happens
in the given example).  If that's not necessarily the case, I would do
it in two steps (in fact I wouldn't use regexps probably but...):

>>> for chunk in re.split(r'\nstop', data):
...     m = re.search('^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
...     if m: print repr(m.group())
...
'start  #frag 1 start \nx=Dog'
'start #frag 3 start \nx=Dog'
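
For anyone on Python 3, the two-step idea (split on the stop lines, then search each chunk) can be sketched roughly like this. It is only a sketch: `data` is the sample from the original post, and the helper name `dog_fragments` is made up for illustration.

```python
import re

# Sample data from the original post.
data = """start  #frag 1 start
x=Dog # frag 1 end
stop
start# frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop"""


def dog_fragments(text):
    """Return the 'start ... x=Dog' part of every fragment that contains x=Dog."""
    found = []
    # Step 1: cut the text into chunks at each line that begins with 'stop'.
    for chunk in re.split(r'\nstop', text):
        # Step 2: inside a single chunk, a greedy .* cannot run into another fragment.
        m = re.search(r'^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
        if m:
            found.append(m.group())
    return found


for fragment in dog_fragments(data):
    print(repr(fragment))
```

Because each chunk holds exactly one fragment, the Cat fragment simply produces no match and is skipped.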

--
Arnaud


Re: Python Regex Question

2008-10-29 Thread Terry Reedy


MalteseUnderdog wrote:

Hi there I just started python (but this question isn't that trivial
since I couldn't find it in google :) )

I have the following text file entries (simplified)

start  #frag 1 start
x=Dog # frag 1 end
stop
start# frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop


I need a regex expression which returns the start to the x=ANIMAL for
only the x=Dog fragments so all my entries should be start ...
(something here) ... x=Dog .  So I am really interested in fragments 1
and 3 only.


As I understand the above
I would first write a generator that separates the file into fragments 
and yields them one at a time.  Perhaps something like


def fragments(ifile):
    frag = []
    for line in ifile:
        frag.append(line)
        if line.startswith('stop'):  # this line ends the fragment
            yield ''.join(frag)
            frag = []

Then I would iterate through fragments, testing for the ones I want:

for frag in fragments(somefile):
    if 'x=Dog' in frag:
        pass  # do whatever

Terry Jan Reedy


python regex character group matches

2008-09-17 Thread christopher taylor

hello python-list!

the other day, i was trying to match unicode character sequences that
looked like this:

\\uAD0X...

my issue, is that the pattern i used was returning:

[ '\\uAD0X', '\\u1BF3', ... ]

when i expected:

[ '\\uAD0X\\u1BF3', ]

the code looks something like this:

pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
#print pat.findall(txt_line)
results = pat.finditer(txt_line)

i ran the pattern through a couple of my colleagues and they were all
in agreement that my pattern should have matched correctly.

is this a simple case of a messed up regex or am i not using the regex
api correctly?

cheers,

ct

Re: python regex character group matches

2008-09-17 Thread Marc 'BlackJack' Rintsch

On Wed, 17 Sep 2008 09:27:47 -0400, christopher taylor wrote:

 the other day, i was trying to match unicode character sequences that
 looked like this:
 
 \\uAD0X...

 my issue, is that the pattern i used was returning:
 
 [ '\\uAD0X', '\\u1BF3', ... ]
 
 when i expected:
 
 [ '\\uAD0X\\u1BF3', ]
 
 the code looks something like this:
 
 pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
 #print pat.findall(txt_line)
 results = pat.finditer(txt_line)
  
 i ran the pattern through a couple of my colleagues and they were all in
 agreement that my pattern should have matched correctly.

Correctly for what input?  And the examples above are not matching (no 
pun intended) the regular expression.  `pat` doesn't match '\\uAD0X' 
because there's no 'X' in the character class.  BTW: Are you sure you 
need or want the `re.UNICODE` flag?

Ciao,
Marc 'BlackJack' Rintsch

Re: python regex character group matches

2008-09-17 Thread Fredrik Lundh


christopher taylor wrote:


my issue, is that the pattern i used was returning:

[ '\\uAD0X', '\\u1BF3', ... ]

when i expected:

[ '\\uAD0X\\u1BF3', ]

the code looks something like this:

pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
#print pat.findall(txt_line)
results = pat.finditer(txt_line)

i ran the pattern through a couple of my colleagues and they were all
in agreement that my pattern should have matched correctly.


First, [0-9A-F] cannot match an X.  Assuming that's a typo, your next 
problem is a precedence issue: (X)+ means one or more (X), not one or 
more X inside parens.  In other words, that pattern matches one or more 
X's and captures the last one.


Assuming that you want to find runs of \u escapes, simply use 
non-capturing parentheses:


   pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

and use group(0) instead of group(1) to get the match.

/F


Re: python regex character group matches

2008-09-17 Thread Steven D'Aprano

On Wed, 17 Sep 2008 15:56:31 +0200, Fredrik Lundh wrote:

 Assuming that you want to find runs of \u escapes, simply use
 non-capturing parentheses:
 
 pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

Doesn't work for me:

>>> pat = re.compile(u"(?:\\\u[0-9A-F]{4})")
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 
5-7: truncated \u escape


Assuming that the OP is searching byte strings, I came up with this:

>>> pat = re.compile('(\\\u[0-9A-F]{4})+')
>>> pat.search('abcd\\u1234\\uAA99\\u0BC4efg').group(0)
'\\u1234\\uAA99\\u0BC4'



-- 
Steven

Re: python regex character group matches

2008-09-17 Thread Steven D'Aprano

On Wed, 17 Sep 2008 09:27:47 -0400, christopher taylor wrote:

 hello python-list!
 
 the other day, i was trying to match unicode character sequences that
 looked like this:
 
 \\uAD0X...

It is not clear what this is supposed to be. Is that matching a literal 
pair of backslashes, or a single escaped backslash, or a single unicode 
character with codepoint AD0X, or what?

If I read it *literally*, then you're trying to match:

backslash backslash lowercase-u uppercase-A uppercase-D zero uppercase-X

Is that what you intended to match?


 my issue, is that the pattern i used was returning:
 
 [ '\\uAD0X', '\\u1BF3', ... ]

Unless you are using Python 3, I see that you aren't actually dealing 
with Unicode strings, you're using byte strings. Is that deliberate?


 when i expected:
 
 [ '\\uAD0X\\u1BF3', ]

I make that to be a string of length 12. Is that what you are expecting?

>>> len('\\uAD0X\\u1BF3')
12


 the code looks something like this:
 
 pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
 #print pat.findall(txt_line)
 results = pat.finditer(txt_line)

First point: I don't think the UNICODE flag does what you think it does. 
It redefines the meaning of special escape sequences \b etc. Since you 
aren't using any special escape sequences, I'm going to guess that you 
think it turns your search string into Unicode. It doesn't. (Apologies in 
advance if I guessed wrong.) I don't think you need either the UNICODE or 
LOCALE flag for this search: they don't seem to have any effect.

Secondly: you will generally save yourself a lot of trouble when writing 
regexes to use raw strings, because backslashes in the regex engine clash 
with backslashes in Python strings. But there's a gotcha: backslash 
escapes behave differently in ordinary strings and the re engine.

In an ordinary string, the sequence backslash-char is treated as a 
literal backslash-char if it isn't a special escape. So:

>>> len('\t')  # special escape
1
>>> len('\u')  # not a special escape
2

But that's not the case in the re engine! As the Fine Manual says:

The special sequences consist of \ and a character from 
the list below. If the ordinary character is not on the 
list, then the resulting RE will match the second character. 
For example, \$ matches the character $.

http://docs.python.org/lib/re-syntax.html

So all of these match the same thing:
re.compile('\\u')
re.compile(r'\u')
re.compile('u')

To match a literal backslash-u, you need to escape the backslash before 
the engine joins it to the u: r'\\u'.

Putting it all together again:

pat = re.compile(r"(\\u[0-9A-F]{4})+")

will probably do what you want, assuming I have guessed what you want 
correctly!
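
A small Python 3 sketch of the escaping levels, for readers trying this today. The sample `text` is an assumption (a string holding one literal backslash followed by `u1234`). Note that the three-way equivalence shown above is Python 2 behaviour and should not be assumed to carry over to Python 3, where an unfinished `\u` is treated more strictly.

```python
import re

# One literal backslash, then 'u1234', surrounded by filler text.
text = 'abcd' + '\\' + 'u1234' + 'efg'

# The same regex written two ways: raw string vs. doubled backslashes.
raw_pattern = r'\\u[0-9A-F]{4}'
escaped_pattern = '\\\\u[0-9A-F]{4}'
same_pattern = raw_pattern == escaped_pattern

# The regex engine sees \\u, i.e. "a literal backslash, then the letter u".
found = re.search(raw_pattern, text).group(0)

print(same_pattern)
print(found, len(found))
```

The raw-string spelling is usually the easier one to read, since what you type is what the regex engine receives.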



-- 
Steven

Re: python regex character group matches

2008-09-17 Thread Fredrik Lundh


Steven D'Aprano wrote:


Assuming that you want to find runs of \u escapes, simply use
non-capturing parentheses:

pat = re.compile(u"(?:\\\u[0-9A-F]{4})")


Doesn't work for me:


pat = re.compile(u"(?:\\\u[0-9A-F]{4})")


it helps if you cut and paste the right line...  here's a better version:

pat = re.compile(r"(?:\\u[0-9A-F]{4})+")
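
To see why the capturing and non-capturing forms behave differently with findall, here is a minimal Python 3 sketch. The sample `text` is borrowed from the test string used earlier in this thread; the variable names are made up for illustration.

```python
import re

# Raw string: contains three literal backslash-u escapes in a row.
text = r'abcd\u1234\uAA99\u0BC4efg'

capturing = re.compile(r'(\\u[0-9A-F]{4})+')
non_capturing = re.compile(r'(?:\\u[0-9A-F]{4})+')

# With a capturing group, findall returns the group, and a repeated
# group only remembers its LAST repetition.
last_only = capturing.findall(text)

# Without a capturing group, findall returns the whole match.
whole_run = non_capturing.findall(text)

# search(...).group(0) is the whole match in both cases.
group0 = capturing.search(text).group(0)
group1 = capturing.search(text).group(1)

print(last_only)
print(whole_run)
print(group0, group1)
```

So either switch to `(?:...)` or keep the capturing group and read `group(0)` rather than `group(1)`.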

/F


python regex character group matches...group...gotcha

2008-09-17 Thread christopher taylor

My apologies to the respondents - I failed to screen my test cases
before kicking them out to the global python-list. but yes, the 'X'
character in my test case was a mistake on my part. I'll give group()
a shot.

ct

Re: Python regex question

2008-08-15 Thread Tim N. van der Leeuw


Hey Gerhard,


Gerhard Häring wrote:
 
 Tim van der Leeuw wrote:
 Hi,
 
 I'm trying to create a regular expression for matching some particular 
 XML strings. I want to extract the contents of a particular XML tag, 
 only if it follows one tag, but not follows another tag. Complicating 
 this, is that there can be any number of other tags in between. [...]
 
 Sounds like this would be easier to implement using Python's SAX API.
 
 Here's a short example that does something similar to what you want to 
 achieve:
 
 [...]
 

I so far forgot to say a thank you for the suggestion :-)

The sample code as you sent it doesn't do what I need to do, but I did look
at it for creating SAX handler code that does what I want.

It took me a while to implement, as it didn't fit in the parser-engine I had
and I was close to making a release.

But still: thanks!

--Tim


Python regex question

2008-06-11 Thread Tim van der Leeuw

Hi,

I'm trying to create a regular expression for matching some particular XML
strings. I want to extract the contents of a particular XML tag, only if it
follows one tag, but not follows another tag. Complicating this, is that
there can be any number of other tags in between.

So basically, my regular expression should have 3 parts:
- first match
- any random text, that should not contain string 'Xds'
- second match

I have a problem figuring out how to do the second part: a random bit of
text, that should _not_ contain the substring 'Xds' ('Xds' being the start
of any tags which should not be in between my first and second match).
Because of the variable length of the overall match, I cannot do this with a
negative look-behind assertion, and a negative look-ahead assertion doesn't
seem to work either.

The regular expression that I have now is:

r'(?s)Xds\w*Policy.*?<ref>(?P<pol_ref>\d+)</ref>'

(hopefully without typos)

Here 'Xds\w*Policy' is my first match, and '<ref>(?P<pol_ref>\d+)</ref>'
is my second match.

In this expression, I want to change the generic '.*?', which matches
everything, with something that matches every string that does not include
the substring 'Xds'.

I know that I could capture the text matched by '.*?' and manually check if
it contains that string 'Xds', but that would be very hard to fit into the
rest of the code, for a number of reasons.

Does anyone have an idea how to do this within one regular expression?
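
One commonly used way to express "any text that does not contain 'Xds'" inside a single pattern is to guard every character with a negative look-ahead, i.e. `(?:(?!Xds).)*?` in place of `.*?`. The Python 3 sketch below is only an illustration under assumptions: the tag names `XdsFooPolicy` and `XdsBar`, the `<ref>` markup and both sample strings are invented, not taken from the real data.

```python
import re

# '(?:(?!Xds).)*?' consumes one character at a time, but only while the
# text at that position does not start with 'Xds'.
pattern = re.compile(
    r'(?s)<Xds\w*Policy(?:(?!Xds).)*?<ref>(?P<pol_ref>\d+)</ref>'
)

# Hypothetical inputs: no Xds tag in between vs. an Xds tag in between.
good = '<XdsFooPolicy><other/><ref>42</ref>'
bad = '<XdsFooPolicy><XdsBar/><ref>42</ref>'

good_match = pattern.search(good)
bad_match = pattern.search(bad)

print(good_match.group('pol_ref') if good_match else None)
print(bad_match.group('pol_ref') if bad_match else None)
```

The look-ahead is checked at every position, so this is slower than a plain `.*?`, but it keeps everything in one expression.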

Regards,

--Tim

Re: Python regex question

2008-06-11 Thread Gerhard Häring


Tim van der Leeuw wrote:

Hi,

I'm trying to create a regular expression for matching some particular 
XML strings. I want to extract the contents of a particular XML tag, 
only if it follows one tag, but not follows another tag. Complicating 
this, is that there can be any number of other tags in between. [...]


Sounds like this would be easier to implement using Python's SAX API.

Here's a short example that does something similar to what you want to 
achieve:


import xml.sax

test_str = """
<xml>
<ignore/>
<foo x="1" y="2"/>
<noignore/>
<foo x="3" y="4"/>
</xml>
"""

class MyHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.ignore_next = False

    def startElement(self, name, attrs):
        if name == "ignore":
            self.ignore_next = True
            return
        elif name == "foo":
            if not self.ignore_next:
                # handle the element you're interested in here
                print "MY ELEMENT", name, "with", dict(attrs)

        self.ignore_next = False

xml.sax.parseString(test_str, MyHandler())

In this case, this looks much clearer and easier to understand to me 
than regular expressions.


-- Gerhard


Python regex

2008-03-13 Thread Andrew Rekdal

I hope posting is ok here for this question...

I am attempting to extract the text from a CSS comment using 're' such as...

string = "/* CSS comment /*"
exp = "[^(/*)].*[^(*/)]"

p = re.compile(exp)
q = p.search(string)
r = q.group()

print r

CSS comment

Although this works to a degree, I know that within the brackets everything 
is taken literally, so the pattern
I am negating is (/*), i.e. it includes the parentheses.

So my question is...

Is there a way to negate a pattern that is more than one character long? E.g. 
where rather than saying if forward slash OR asterisk appear..negate.

I would be saying if parenthesis AND asterisk appear in this order... negate


-- Andrew



Re: Python regex

2008-03-13 Thread Andrew Rekdal

made error on last line... read as...

 I would be saying if forward-slash AND asterisk appear in this order... 
 negate


-- 
-- Andrew

Andrew Rekdal @comcast.net nospam wrote in message 
news:[EMAIL PROTECTED]
I hope posting is ok here for this question...

 I am attempting to extract the text from a CSS comment using 're' such 
 as...

 string = /* CSS comment /*
 exp = [^(/*)].*[^(*/)] 

 p = re.compile(exp)
 q = p.search(string)
 r = q.group()

 print r

CSS comment

 although this works to a degree... I know the within the brackets 
 everything is taken literally so the pattern
 I am to negating is (/*). ie. includes the parenthesis.

 So my question is...

 Is there a way to negate a pattern that is more than on character long? 
 eg. where rather than saying if forward slash OR astrisk appear..negate.

 I would be saying if parenthesis AND asterisk appear in this order... 
 negate


 -- Andrew

 



Re: Python regex

2008-03-13 Thread Arnaud Delobelle

On Mar 13, 8:03 pm, Andrew Rekdal nospam@comcast.net wrote:
 I hope posting is ok here for this question...

 I am attempting to extract the text from a CSS comment using 're' such as...

 string = /* CSS comment /*
 exp = [^(/*)].*[^(*/)] 

 p = re.compile(exp)
 q = p.search(string)
 r = q.group()

 print r

 CSS comment

 although this works to a degree... I know the within the brackets everything
 is taken literally so the pattern
 I am to negating is (/*). ie. includes the parenthesis.

 So my question is...

 Is there a way to negate a pattern that is more than on character long? eg.
 where rather than saying if forward slash OR astrisk appear..negate.

 I would be saying if parenthesis AND asterisk appear in this order... negate

 -- Andrew

There would be many ways to do this. One:

>>> import re
>>> r = re.compile(r'/\*(.*?)\*/')
>>> tst = '.a { color: 0xAACC66; /* Fav color */ }'
>>> m = r.search(tst)
>>> m.group(1)
' Fav color '
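
A Python 3 sketch along the same lines, extended to several comments and to comments that span lines. The second CSS rule and the variable names are made up for illustration. It also shows one way to "negate" the two-character sequence `*/` directly, using a negative look-ahead, which is what the original question was asking about.

```python
import re

css = (".a { color: 0xAACC66; /* Fav color */ }\n"
       ".b { margin: 0; /* spans\ntwo lines */ }")

# Non-greedy version: stop at the first '*/'. DOTALL lets '.' cross newlines.
lazy = re.compile(r'/\*(.*?)\*/', re.DOTALL)

# Look-ahead version: take characters as long as '*/' does not start here.
guarded = re.compile(r'/\*((?:(?!\*/).)*)\*/', re.DOTALL)

lazy_bodies = [body.strip() for body in lazy.findall(css)]
guarded_bodies = [body.strip() for body in guarded.findall(css)]

print(lazy_bodies)
print(guarded_bodies)
```

Both patterns give the same result here; the non-greedy one is shorter and is probably what most people would reach for.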


HTH

--
Arnaud


Re: Python regex

2008-03-13 Thread Andrew Rekdal



-- 
-- Andrew

Arnaud Delobelle [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
On Mar 13, 8:03 pm, Andrew Rekdal nospam@comcast.net wrote:
 I hope posting is ok here for this question...

 I am attempting to extract the text from a CSS comment using 're' such 
 as...

 string = /* CSS comment /*
 exp = [^(/*)].*[^(*/)] 

 p = re.compile(exp)
 q = p.search(string)
 r = q.group()

 print r

 CSS comment

 although this works to a degree... I know the within the brackets 
 everything
 is taken literally so the pattern
 I am to negating is (/*). ie. includes the parenthesis.

 So my question is...

 Is there a way to negate a pattern that is more than on character long? 
 eg.
 where rather than saying if forward slash OR astrisk appear..negate.

 I would be saying if parenthesis AND asterisk appear in this order... 
 negate

 -- Andrew

There would be many ways to do this. One:

 import re
 r = re.compile(r'/\*(.*?)\*/')
 tst = '.a { color: 0xAACC66; /* Fav color */ }'
 m = r.search(tst)
 m.group(1)
' Fav color '


HTH

--
Arnaud

Arnaud,

in your expression above..

 r = re.compile(r'/\*(.*?)\*/')

what does the 'r' do?

-- andrew



Re: Python regex

2008-03-13 Thread Arnaud Delobelle

On Mar 13, 8:22 pm, Andrew Rekdal nospam@comcast.net wrote:
[...]
 in your expression above..

  r = re.compile(r'/\*(.*?)\*/')

 what does the 'r' do?

It means the literal is a 'raw string' :

>>> print 'Hi\nthere!'
Hi
there!
>>> print r'Hi\nthere!'
Hi\nthere!
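
The same point as a small Python 3 sketch (the variable names are made up): a raw string keeps the backslash as an ordinary character, which is why regex patterns are usually written that way.

```python
plain = 'Hi\nthere!'   # \n is turned into a single newline character
raw = r'Hi\nthere!'    # backslash and 'n' stay as two separate characters

plain_len = len(plain)
raw_len = len(raw)

# For a regex, the raw form saves doubling every backslash.
same_pattern = r'/\*(.*?)\*/' == '/\\*(.*?)\\*/'

print(plain_len, raw_len, same_pattern)
```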


If you haven't done so already, I suggest reading the tutorial.  Here
is a link to the relevant section on strings:

http://docs.python.org/tut/node5.html#SECTION00512

--
Arnaud


Re: Python regex

2008-03-13 Thread Adonis Vargas

Andrew Rekdal  wrote:
 I hope posting is ok here for this question...
 
 I am attempting to extract the text from a CSS comment using 're' such as...
 
 string = /* CSS comment /*
 exp = [^(/*)].*[^(*/)] 
 
 p = re.compile(exp)
 q = p.search(string)
 r = q.group()
 
 print r
 
 CSS comment
 
 although this works to a degree... I know the within the brackets everything 
 is taken literally so the pattern
 I am to negating is (/*). ie. includes the parenthesis.
 
 So my question is...
 
 Is there a way to negate a pattern that is more than on character long? eg. 
 where rather than saying if forward slash OR astrisk appear..negate.
 
 I would be saying if parenthesis AND asterisk appear in this order... negate
 
 
 -- Andrew
 
 

Have you looked into this library:

http://cthedot.de/cssutils/

May help you, if you are trying to achieve something. If your doing it 
as an exercise then I can not help you, I avoid regex like the plague 
(but thats just me).

Hope this helps.

Adonis Vargas

python regex: misbehaviour with \r (0x0D) as Newline character in Unicode Mode

2008-01-27 Thread Arian Sanusi

Hi,

According to Unicode, \n, \r and \r\n (0x000A, 0x000D and 
0x000D+0x000A) should all be treated as newline characters;
at least this is how I understand it: 
(http://en.wikipedia.org/wiki/Newline#Unicode)

Apparently the re module does not do this and, on Unix, treats only \n 
as a newline character:

>>> a = re.compile(u"^a", re.U|re.M)
>>> a.search(u"bc\ra")
>>> a.search(u"bc\na")
<_sre.SRE_Match object at 0xb5908fa8>

same thing for $:
>>> b = re.compile(u"c$", re.U|re.M)
>>> b.search(u"bc\r\n")
>>> b.search(u"abc")
<_sre.SRE_Match object at 0xb5908f70>
>>> b.search(u"bc\nde")
<_sre.SRE_Match object at 0xb5908fa8>

is this a known bug in the re module? i couldn't find any issues in the 
bug tracker.
Or is this just a user fault and you guys can help me?

arian

p.s.: appears in both python2.4 and 2.5

Re: python regex: misbehaviour with \r (0x0D) as Newline character in Unicode Mode

2008-01-27 Thread Fredrik Lundh

Arian Sanusi wrote:

  According to Unicode, \n, \r and \r\n (0x000A, 0x000D and
0x000D+0x000A) should all be treated as newline characters

the link says that your application should treat them as line terminators, 
not that they should all be equal to a newline character.

to split on Unicode line endings, use the splitlines method.  for the 
specific characters you mention, you can also read the file in universal 
mode.
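
A rough Python 3 sketch of the splitlines route. The sample `text` is made up; it mixes \r, \n and \r\n line endings. The second pattern is just one possible way to get a "start of line" test that also accepts \r, and is an assumption rather than anything the re module offers as a flag.

```python
import re

text = 'bc\ra\nab\r\nxa'

# splitlines() knows about \r, \n and \r\n, so each line can be matched
# on its own without needing re.MULTILINE at all.
lines = text.splitlines()
starts_with_a = re.compile(r'^a')
hits = [line for line in lines if starts_with_a.search(line)]

# Alternative: spell out "start of string, or just after \r or \n"
# with a negative look-behind instead of relying on ^.
direct = re.findall(r'(?<![^\r\n])a', text)

print(lines)
print(hits)
print(direct)
```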

/F


Re: python/regex question... hope someone can help

2007-12-09 Thread John Machin

On Dec 9, 6:13 pm, charonzen [EMAIL PROTECTED] wrote:
 I have a list of strings.  These strings are previously selected
 bigrams with underscores between them ('and_the', 'nothing_given', and
 so on).  I need to write a regex that will read another text string
 that this list was derived from and replace selections in this text
 string with those from my list.  So in my text string, '... and the...
 ' becomes ' ... and_the...'.   I can't figure out how to manipulate

 re.sub(r'([a-z]*) ([a-z]*)', r'()', textstring)

 Any suggestions?

The usual suggestion is: Don't bother with regexes when simple string
methods will do the job.

>>> def ch_replace(alist, text):
...     for bigram in alist:
...         original = bigram.replace('_', ' ')
...         text = text.replace(original, bigram)
...     return text
...
>>> print ch_replace(
...     ['quick_brown', 'lazy_dogs', 'brown_fox'],
...     'The quick brown fox jumped over the lazy dogs.'
...     )
The quick_brown_fox jumped over the lazy_dogs.
>>> print ch_replace(['red_herring'], 'He prepared herring fillets.')
He prepared_herring fillets.


Another suggestion is to ensure that the job specification is not
overly simplified. How did you parse the text into words in the
prior exercise that produced the list of bigrams? Won't you need to
use the same parsing method in the current exercise of tagging the
bigrams with an underscore?

Cheers,
John

Re: python/regex question... hope someone can help

2007-12-09 Thread John Machin

On Dec 9, 6:13 pm, charonzen [EMAIL PROTECTED] wrote:

The following *may* come close to doing what your revised spec
requires:

import re
def ch_replace2(alist, text):
    for bigram in alist:
        pattern = r'\b' + bigram.replace('_', ' ') + r'\b'
        text = re.sub(pattern, bigram, text)
    return text
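
A Python 3 variant of the same idea, as a sketch only. The one addition is `re.escape`, on the assumption that a bigram might contain characters that are special in a regex; the function name `ch_replace_escaped` is made up. The examples reuse the sentences from the earlier message to show what the word boundaries change.

```python
import re


def ch_replace_escaped(alist, text):
    """Join each listed bigram with '_' where it occurs as whole words."""
    for bigram in alist:
        words = bigram.replace('_', ' ')
        # \b keeps 'red herring' from matching inside 'prepared herring'.
        pattern = r'\b' + re.escape(words) + r'\b'
        text = re.sub(pattern, bigram, text)
    return text


fox = ch_replace_escaped(
    ['quick_brown', 'lazy_dogs', 'brown_fox'],
    'The quick brown fox jumped over the lazy dogs.',
)
herring = ch_replace_escaped(['red_herring'], 'He prepared herring fillets.')

print(fox)
print(herring)
```

Note that 'brown fox' is no longer joined once 'quick_brown' has been formed, because the underscore counts as a word character and so there is no word boundary in front of 'brown' any more.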

Cheers,
John

Re: python/regex question... hope someone can help

2007-12-09 Thread charonzen


 Another suggestion is to ensure that the job specification is not
 overly simplified. How did you parse the text into words in the
 prior exercise that produced the list of bigrams? Won't you need to
 use the same parsing method in the current exercise of tagging the
 bigrams with an underscore?

 Cheers,
 John

Thank you John, that definitely puts things in perspective!  I'm very
new to both Python and text parsing, and I often feel that I can't see
the forest for the trees.  If you're asking, I'm working on a project
that utilizes Church's mutual information score.  I tokenize my text,
split it into a list, derive some unigram and bigram dictionaries, and
then calculate a pmi dictionary based on x,y from the bigrams and
unigrams.  The bigrams that pass my threshold then get put into my
list of x_y strings, and you know the rest.  By modifying the original
text file, I can view 'x_y', z pairs as x,y and iterate it until I
have some collocations that are worth playing with.  So I think that
covers the question the same parsing method.  I'm sure there are more
pythonic ways to do it, but I'm on deadline :)

Thanks again!

Brandon

Re: python/regex question... hope someone can help

2007-12-09 Thread Gabriel Genellina

En Sun, 09 Dec 2007 16:45:53 -0300, charonzen [EMAIL PROTECTED]  
escribió:

 [John Machin] Another suggestion is to ensure that the job  
 specification is not
 overly simplified. How did you parse the text into words in the
 prior exercise that produced the list of bigrams? Won't you need to
 use the same parsing method in the current exercise of tagging the
 bigrams with an underscore?

 Thank you John, that definitely puts things in perspective!  I'm very
 new to both Python and text parsing, and I often feel that I can't see
 the forest for the trees.  If you're asking, I'm working on a project
 that utilizes Church's mutual information score.  I tokenize my text,
 split it into a list, derive some unigram and bigram dictionaries, and
 then calculate a pmi dictionary based on x,y from the bigrams and
 unigrams.  The bigrams that pass my threshold then get put into my
 list of x_y strings, and you know the rest.  By modifying the original
 text file, I can view 'x_y', z pairs as x,y and iterate it until I
 have some collocations that are worth playing with.  So I think that
 covers the question the same parsing method.  I'm sure there are more
 pythonic ways to do it, but I'm on deadline :)

Looks like you should work with the list of tokens, collapsing consecutive  
elements, not with the original text. Should be easier, and faster because  
you don't regenerate the text and tokenize it again and again.
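
Working on the token list could look roughly like the Python 3 sketch below. The function name `collapse_bigrams` and the sample tokens are made up for illustration, and it assumes the tokens are plain strings in text order.

```python
def collapse_bigrams(tokens, bigrams):
    """Merge consecutive tokens x, y into 'x_y' when 'x_y' is a selected bigram."""
    wanted = set(bigrams)          # set lookup is cheap for many bigrams
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            candidate = tokens[i] + '_' + tokens[i + 1]
            if candidate in wanted:
                out.append(candidate)
                i += 2             # both tokens are consumed by the bigram
                continue
        out.append(tokens[i])
        i += 1
    return out


tokens = 'the quick brown fox'.split()
collapsed = collapse_bigrams(tokens, ['quick_brown', 'brown_fox'])
print(collapsed)
```

The output list can be fed straight back into the next round of counting, with no need to rebuild the text and tokenize it again.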

-- 
Gabriel Genellina


python/regex question... hope someone can help

2007-12-08 Thread charonzen

I have a list of strings.  These strings are previously selected
bigrams with underscores between them ('and_the', 'nothing_given', and
so on).  I need to write a regex that will read another text string
that this list was derived from and replace selections in this text
string with those from my list.  So in my text string, '... and the...
' becomes ' ... and_the...'.   I can't figure out how to manipulate

re.sub(r'([a-z]*) ([a-z]*)', r'()', textstring)

Any suggestions?

Thank you if you can help!

Re: Python Regex Question

2007-09-21 Thread Ivo

crybaby wrote:
 On Sep 20, 4:12 pm, Tobiah [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
 I need to extract the number on each td tags from a html file.
 i.e 49.950 from the following:
 td align=right width=80font size=2 face=New Times
 Roman,Times,Serifnbsp;49.950nbsp;/font/td
 The actual number between: nbsp;49.950nbsp; can be any number of
 digits before decimal and after decimal.
 td align=right width=80font size=2 face=New Times
 Roman,Times,Serifnbsp;##.nbsp;/font/td
 How can I just extract the real/integer number using regex?
 '[0-9]*\.[0-9]*'

 --
 Posted via a free Usenet account fromhttp://www.teranews.com
 
 I am trying to use BeautifulSoup:
 
 soup = BeautifulSoup(page)
 
 td_tags = soup.findAll('td')
 i=0
 for td in td_tags:
 i = i+1
 print td: , td
 # re.search('[0-9]*\.[0-9]*', td)
 price = re.compile('[0-9]*\.[0-9]*').search(td)
 
 I am getting an error:
 
price= re.compile('[0-9]*\.[0-9]*').search(td)
 TypeError: expected string or buffer
 
 Does beautiful soup returns array of objects? If so, how do I pass
 td instance as string to re.search?  What is the different between
 re.search vs re.compile().search?
 

I don't know anything about BeautifulSoup, but to the other questions:

var=re.compile(regexpr) compiles the expression and after that you can 
use var as the reference to that compiled expression (costs less)

re.search(expr, string) compiles and searches every time. This can 
potentially cost more computing power, especially if you 
have to use the expression a lot of times.

The way you use it, it doesn't matter.

do:
pattern = re.compile('[0-9]*\.[0-9]*')
result = pattern.findall("your text here")

Now you can reuse pattern.

Cheers,
Ivo.

Re: Python Regex Question

2007-09-21 Thread David

 re.search(expr, string) compiles and searches every time. This can
 potentially be more expensive in calculating power. especially if you
 have to use the expression a lot of times.

The re module-level helper functions cache expressions and their
compiled form in a dict. They are only compiled once. The main
overhead would be for repeated dict lookups.

See sre.py (included from re.py) for more details. /usr/lib/python2.4/sre.py
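
Either way the results are the same, as this small Python 3 sketch shows. The sample `cells` list and the helper `first_number` are made up for illustration.

```python
import re

pattern_text = r'[0-9]+\.[0-9]+'
compiled = re.compile(pattern_text)

# Hypothetical cell contents, similar to the table cells in this thread.
cells = ['&nbsp;49.950&nbsp;', '&nbsp;7.5&nbsp;', 'n/a']


def first_number(search, cell):
    """Run the given search callable and return the matched text or None."""
    m = search(cell)
    return m.group() if m else None


# Pre-compiled pattern object, reused for every cell.
via_compiled = [first_number(compiled.search, c) for c in cells]

# Module-level helper, which looks the pattern up on each call.
via_module = [first_number(lambda s: re.search(pattern_text, s), c) for c in cells]

print(via_compiled)
print(via_module)
```

Holding on to the compiled object mostly saves the repeated lookup and makes the pattern easy to reuse by name.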

Python Regex Question

2007-09-20 Thread joemystery123

I need to extract the number on each td tags from a html file.

i.e 49.950 from the following:

<td align="right" width="80"><font size="2" face="New Times
Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>

The actual number between: &nbsp;49.950&nbsp; can be any number of
digits before decimal and after decimal.

<td align="right" width="80"><font size="2" face="New Times
Roman,Times,Serif">&nbsp;##.&nbsp;</font></td>

How can I just extract the real/integer number using regex?


Re: Python Regex Question

2007-09-20 Thread Tobiah

[EMAIL PROTECTED] wrote:
 I need to extract the number on each td tags from a html file.
 
 i.e 49.950 from the following:
 
 td align=right width=80font size=2 face=New Times
 Roman,Times,Serifnbsp;49.950nbsp;/font/td
 
 The actual number between: nbsp;49.950nbsp; can be any number of
 digits before decimal and after decimal.
 
 td align=right width=80font size=2 face=New Times
 Roman,Times,Serifnbsp;##.nbsp;/font/td
 
 How can I just extract the real/integer number using regex?
 


'[0-9]*\.[0-9]*'


Re: Python Regex Question

2007-09-20 Thread Gerardo Herzig

[EMAIL PROTECTED] wrote:

I need to extract the number on each td tags from a html file.

i.e 49.950 from the following:

td align=right width=80font size=2 face=New Times
Roman,Times,Serifnbsp;49.950nbsp;/font/td

The actual number between: nbsp;49.950nbsp; can be any number of
digits before decimal and after decimal.

td align=right width=80font size=2 face=New Times
Roman,Times,Serifnbsp;##.nbsp;/font/td

How can I just extract the real/integer number using regex?

  

If all the td's content has the &nbsp;[value_to_extract]&nbsp; pattern, 
things get simpler

[untested]

/<td.*&nbsp;([^&]*)&nbsp;/

the parentheses will be used to group() the result (and extract what you 
really want)
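
In Python 3 that idea could be sketched like this. The HTML string is a reconstruction of the cell from the original post (the attribute quoting is an assumption), the second cell is invented, and the pattern is deliberately stricter than `[^&]*` so that only integers and decimals are accepted.

```python
import re

# Two table cells; the first mirrors the example in the question.
html = (
    '<td align="right" width="80"><font size="2" '
    'face="New Times Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>'
    '<td align="right" width="80"><font size="2" '
    'face="New Times Roman,Times,Serif">&nbsp;12&nbsp;</font></td>'
)

# Digits, optionally followed by a decimal part, between two &nbsp; entities.
number = re.compile(r'&nbsp;(\d+(?:\.\d+)?)&nbsp;')

found = number.findall(html)
values = [float(text) for text in found]

print(found)
print(values)
```

For anything beyond a fixed, machine-generated table like this one, an HTML parser is likely to be more robust than a regex.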

Cheers
Gerardo

Re: Python Regex Question

2007-09-20 Thread crybaby

On Sep 20, 4:12 pm, Tobiah [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
  I need to extract the number on each td tags from a html file.

  i.e 49.950 from the following:

  td align=right width=80font size=2 face=New Times
  Roman,Times,Serifnbsp;49.950nbsp;/font/td

  The actual number between: nbsp;49.950nbsp; can be any number of
  digits before decimal and after decimal.

  td align=right width=80font size=2 face=New Times
  Roman,Times,Serifnbsp;##.nbsp;/font/td

  How can I just extract the real/integer number using regex?

 '[0-9]*\.[0-9]*'

 --
 Posted via a free Usenet account fromhttp://www.teranews.com

I am trying to use BeautifulSoup:

soup = BeautifulSoup(page)

td_tags = soup.findAll('td')
i=0
for td in td_tags:
    i = i+1
    print "td: ", td
    # re.search('[0-9]*\.[0-9]*', td)
    price = re.compile('[0-9]*\.[0-9]*').search(td)

I am getting an error:

   price= re.compile('[0-9]*\.[0-9]*').search(td)
TypeError: expected string or buffer

Does Beautiful Soup return an array of objects? If so, how do I pass the
td instance as a string to re.search?  What is the difference between
re.search and re.compile().search?

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Simple Python REGEX Question

2007-05-12 Thread James T. Dennis

johnny [EMAIL PROTECTED] wrote:
 I need to get the content inside the bracket.

 eg. some characters before bracket (3.12345).
 I need to get whatever inside the (), in this case 3.12345.
 How do you do this with python regular expression?

 I'm going to presume that you mean something like:

I want to extract floating point numerics from parentheses
embedded in other, arbitrary, text.

 Something like:

    >>> given='adfasdfafd(3.14159265)asdfasdfadsfasf'
    >>> import re
    >>> mymatch = re.search(r'\(([0-9.]+)\)', given).groups()[0]
    >>> mymatch
    '3.14159265'
    >>>

 Of course, as with any time you're contemplating the use of regular
 expressions, there are lots of questions to consider about the exact
 requirements here.  What if there is more than one such pattern?  Do you
 only want the first match per line (or other string)?  (That's all my
 example will give you).  What if there are no matches?  My example
 will raise an AttributeError (since the re.search will return the
 None object rather than a match object; and naturally the None
 object has no .groups() method).

 The following might work better:

    mymatches = re.findall(r'\(([0-9.]+)\)', given)
    if len(mymatches):
        ...

 ... and, of course, you might be better off with a compiled regexp if
 you're going to repeat the search on many strings:

    num_extractor = re.compile(r'\(([0-9.]+)\)')
    for line in myfile:
        for num in num_extractor.findall(line):
            pass
            # do whatever with all these numbers
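
 A self-contained version of that loop, as a sketch only: the list named
 lines stands in for the file, and the sample strings are invented for the
 example. Note that [0-9.]+ would also accept something like 1.2.3, which
 float() would then reject.

```python
import re

# Compile once, reuse for every line.
num_extractor = re.compile(r'\(([0-9.]+)\)')

# Stand-in for the lines of a file (sample data, made up for illustration).
lines = [
    'adfasdfafd(3.14159265)asdfasdfadsfasf',
    'no parentheses here',
    'two values (1.5) and (2.25) on one line',
]

numbers = []
for line in lines:
    # findall returns the captured group for every match; an empty list if none.
    for num in num_extractor.findall(line):
        numbers.append(float(num))
```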


-- 
Jim Dennis,
Starshine: Signed, Sealed, Delivered

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Simple Python REGEX Question

2007-05-11 Thread Gary Herron

johnny wrote:
 I need to get the content inside the bracket.

 eg. some characters before bracket (3.12345).

 I need to get whatever inside the (), in this case 3.12345.

 How do you do this with python regular expression?
   

>>> import re
>>> x = re.search("[0-9.]+", "(3.12345)")
>>> print x.group(0)
3.12345

There's a lot more to the re module, of course.  I'd suggest reading the
manual, but this should get you started.


Gary Herron

-- 
http://mail.python.org/mailman/listinfo/python-list

Simple Python REGEX Question

2007-05-11 Thread johnny

I need to get the content inside the bracket.

eg. some characters before bracket (3.12345).

I need to get whatever inside the (), in this case 3.12345.

How do you do this with python regular expression?

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Simple Python REGEX Question

2007-05-11 Thread John Machin

On May 12, 2:21 am, Gary Herron [EMAIL PROTECTED] wrote:
 johnny wrote:
  I need to get the content inside the bracket.

  eg. some characters before bracket (3.12345).

  I need to get whatever inside the (), in this case 3.12345.

  How do you do this with python regular expression?

  >>> import re
  >>> x = re.search("[0-9.]+", "(3.12345)")
  >>> print x.group(0)

 3.12345

 There's a lot more to the re module, of course.  I'd suggest reading the
 manual, but this should get you started.


>>> s = "some chars like 987 before the bracket (3.12345) etc"
>>> x = re.search("[0-9.]+", s)
>>> x.group(0)
'987'

OP sez: I need to get the content inside the bracket
OP sez: I need to get whatever inside the ()

My interpretation:

>>> for s in ['foo(123)bar', 'foo(123))bar', 'foo()bar', 'foobar']:
...     x = re.search(r"\([^)]*\)", s)
...     print repr(x and x.group(0)[1:-1])
...
'123'
'123'
''
None
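
The same interpretation can be written with a capturing group instead of
slicing off the brackets afterwards. This is only a sketch; the names
BRACKET_RE and inside_brackets are made up for the example:

```python
import re

# Group 1 captures whatever sits between the first '(' and the next ')'.
BRACKET_RE = re.compile(r'\(([^)]*)\)')

def inside_brackets(s):
    """Return the text inside the first (...) pair, or None if there is none."""
    m = BRACKET_RE.search(s)
    return m.group(1) if m else None

results = [inside_brackets(s)
           for s in ['foo(123)bar', 'foo(123))bar', 'foo()bar', 'foobar']]
```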




-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Simple Python REGEX Question

2007-05-11 Thread Steven D'Aprano

On Fri, 11 May 2007 08:54:31 -0700, johnny wrote:

 I need to get the content inside the bracket.
 
 eg. some characters before bracket (3.12345).
 
 I need to get whatever inside the (), in this case 3.12345.
 
 How do you do this with python regular expression?

Why would you bother? If you know your string is a bracketed expression,
all you need is:

s = "(3.12345)"
contents = s[1:-1] # ignore the first and last characters

If your string is more complex:

s = "lots of things here (3.12345) and some more things here"

then the task is harder. In general, you can't use regular expressions for
that, you need a proper parser, because brackets can be nested.

But if you don't care about nested brackets, then something like this is
easy:

def get_bracket(s):
    p, q = s.find('('), s.find(')')
    if p == -1 or q == -1: raise ValueError("Missing bracket")
    if p > q: raise ValueError("Close bracket before open bracket")
    return s[p+1:q]  # the slice end is exclusive, so q already leaves out the ')'

Or as a one liner with no error checking:

s[s.find('(')+1:s.find(')')]
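
For the nested case mentioned above, a full parser is not always needed: a
simple depth counter is often enough to pull out the first balanced group.
This is a rough sketch under that assumption, and the function name
outermost_bracket is made up for the example:

```python
def outermost_bracket(s):
    """Return the text inside the first balanced (...) group, or None."""
    depth = 0
    start = None
    for i, ch in enumerate(s):
        if ch == '(':
            if depth == 0:
                start = i + 1      # remember where the contents begin
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError("Close bracket before open bracket")
            depth -= 1
            if depth == 0:
                return s[start:i]  # matching close of the outermost group
    if depth:
        raise ValueError("Missing close bracket")
    return None
```

It returns the contents of the outermost group with any inner brackets left
intact, and None when the string has no brackets at all.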


-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-26 Thread Ilias Lazaridis

Steve Holden wrote:
 Xah Lee wrote:
...
  This project was undertaken as a response to a challenge put forth to
  me with a $100 reward, on 2005-04-12 on comp.lang.python newsgroup. I
  never received the due reward.
 
 Your reading skills must be terrible. You never received the reward
 because it never became due. I offered you $100 if (I believe) five
 regular readers of c.l.py wrote me to say your version was an
 improvement on the original documentation.

 So far (it's now been over a year since your publication, IIRC) not one
 single person has written to me. So while your version of the docs may
 have some merit, it certainly doesn't fulfil the advertised requirements
 for the reward. Which therefore isn't due.

This justification sounds rational.

Possibly this should be published in a separate topic, asking people to
review the two docs and to give their vote publicly(!).

Finally, all this can contribute to better Python documentation (and,
sorry, the Python docs _really_ need a rework).

--
http://lazaridis.com

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-24 Thread Xah Lee

Xah Lee wrote:
« the Python regex documentation is available at:
 http://xahlee.org/perl-python/python_re-write/lib/module-re.html ...»

Jürgen Exner wrote:
«Yeah, sure, and the Perl regex documentation is available at 'perldoc
perlre'.  So what? Is that anything new or surprising?»

It is of interest and new, because it is a rewrite of Python's
documentation. And it is of interest to Perlers, because Perl and
Python use the same regex syntax.

The purpose of this rewrite is to fix Python's lousy documentation,
and to demonstrate a style of technical writing where precision and
clarity are the prime directive.

It demonstrates a style of documentation where the philosophy is
task-oriented and uses examples sans misgivings.  (In this aspect, it
is similar to the style of Perl's official documentation.)

Further, the exposition style focuses on the manifestation of the
language elements as a piece of mathematics, a style often found in
functional languages' documentation. It is opposed to treating the
language as a state machine or compiler engine, which is often
necessarily the approach of imperative languages' documentation.

This project was undertaken as a response to a challenge put forth to
me with a $100 reward, on 2005-04-12 on comp.lang.python newsgroup. I
never received the due reward.

Thanks.

  Xah
  [EMAIL PROTECTED]
∑ http://xahlee.org/

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-24 Thread Steve Holden

Xah Lee wrote:
 Xah Lee wrote:
 « the Python regex documentation is available at:
  http://xahlee.org/perl-python/python_re-write/lib/module-re.html ...»
 
 Jürgen Exner wrote:
 «Yeah, sure, and the Perl regex documentation is available at 'perldoc
 perlre'.  So what? Is that anything new or surprising?»
 
 It is of interest and new, because it is a rewrite of Python's
 documentation. And it is of interest to Perlers, because Perl and
 Python uses the same regex syntax.
 
 The purpose of this rewrite, is to fix Python's lousy documentation,
 and to demonstrate a style of technical writing, where precision and
 clarity is the prime directive.
 
 It demonstrates a style of documentation, where the philosophy is
 task-oriented and uses examples sans misgivings.  (in this aspect, it
 is similar to the style of Perl's official documentation.)
 
 Further, the exposition style focuses on the manifestation of the
 language elements, as a piece of mathematics, a style often found in
 functional language's documentations. It is opposed to, treating the
 language as a state machine or compiler engine, which are often
 necessarily the approach of imperative languages's documentations.
 
 This project was undertaken as a response to a challenge put forth to
 me with a $100 reward, on 2005-04-12 on comp.lang.python newsgroup. I
 never received the due reward.
 
Your reading skills must be terrible. You never received the reward 
because it never became due. I offered you $100 if (I believe) five 
regular readers of c.l.py wrote me to say your version was an 
improvement on the original documentation.

So far (it's now been over a year since your publication, IIRC) not one 
single person has written to me. So while your version of the docs may 
have some merit, it certainly doesn't fulfil the advertised requirements 
for the reward. Which therefore isn't due.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-22 Thread Ilias Lazaridis

[followup to c.l.py]

Xah Lee wrote:
 the Python regex documentation is available at:
 http://xahlee.org/perl-python/python_re-write/lib/module-re.html
 
 Note that, i've just made the terms of use clear.
 
 Also, can anyone answer what is the precise terms of license of the
 official python documentation? The official python.org doc site is not
 clear.

I would be interested in this information, too.

 Note also, that the regex syntax used by Perl is the same as Python.
 So, this section
  http://xahlee.org/perl-python/python_re-write/lib/re-syntax.html
 which contains clear explanation of regex syntax, will be of interest
 to Perl programers as well.
...

Your tutorial has helped me to write my first regular expression:

http://dev.lazaridis.com/base/changeset/60

your notes about documentation are interesting, too:

http://xahlee.org/perl-python/re-write_notes.html

I have some notes, too:

http://case.lazaridis.com/wiki/Docu

-

I would like to read more on your website, but the usability is 
terrible, mainly due to the missing navigation.

What about an exchange?

I assist you with the navigation. You will just need Apache
server-side includes and one file, navigation.html, which will contain
all of the navigation. Very simple.

And you make a real-life example for a Python regular expression use case:

I want to scan a text for this line:

[[CustomAttributes(this=4,that=34,name='peter')]]

picking this=4 ...

and add the attributes to an object.

object = addCustomAttributes(text)

(OK, the regex part would be enough.)
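
One possible shape for the regex part, as a sketch only. It assumes values
are either plain integers or single-quoted strings with no embedded quotes;
the names LINE_RE, PAIR_RE and parse_custom_attributes are made up for the
example, and the result is a plain dict rather than attributes on an object:

```python
import re

# Finds the [[CustomAttributes(...)]] marker and captures the argument list.
LINE_RE = re.compile(r"\[\[CustomAttributes\((.*?)\)\]\]")
# Matches one name=value pair: value is an integer or a 'single-quoted' string.
PAIR_RE = re.compile(r"(\w+)=('[^']*'|\d+)")

def parse_custom_attributes(text):
    """Return a dict of the attributes found in text, or {} if there are none."""
    match = LINE_RE.search(text)
    if match is None:
        return {}
    attrs = {}
    for name, raw in PAIR_RE.findall(match.group(1)):
        if raw.startswith("'"):
            attrs[name] = raw[1:-1]   # strip the surrounding quotes
        else:
            attrs[name] = int(raw)
    return attrs

attrs = parse_custom_attributes(
    "some text\n[[CustomAttributes(this=4,that=34,name='peter')]]\nmore text")
```

The resulting pairs could then be copied onto an object with setattr().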

.

-- 
http://lazaridis.com
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-22 Thread Steve Holden

Ilias Lazaridis wrote:
 [followup to c.l.py]
 
 Xah Lee wrote:
 
the Python regex documentation is available at:
http://xahlee.org/perl-python/python_re-write/lib/module-re.html

Note that, i've just made the terms of use clear.

Also, can anyone answer what is the precise terms of license of the
official python documentation? The official python.org doc site is not
clear.
 
 
 I would be interested in this information, too.
 
 
Note also, that the regex syntax used by Perl is the same as Python.
So, this section
 http://xahlee.org/perl-python/python_re-write/lib/re-syntax.html
which contains clear explanation of regex syntax, will be of interest
to Perl programers as well.
 
 
 
 Your tutorial has helped me to write my first regular expression:
 
 http://dev.lazaridis.com/base/changeset/60
 
 your notes about documentation are interesting, too:
 
 http://xahlee.org/perl-python/re-write_notes.html
 
 I have some notes, too:
 
 http://case.lazaridis.com/wiki/Docu
 
 -
 
 I would like to read more on your website, but the usability is 
 terrible, mainly due to the missing navigation.
 
 What about an exchange?
 
 I assist you with the navigation. you will just need apache 
 server-side-include and one file navigation.html, which will contail 
 all of the navigation, very simple.
 
 And you make an real life example for a python regular expression use-case:
 
 i want to scan a text for this line:
 
 [[CustomAttributes(this=4,that=34,name='peter')]]
 
 picking this=4 ...
 
 and add the attributes to an object.
 
 object = addCustomAttributes(text)
 
 (ok, the regex part would be enouth).
 
 ..
 
Ilias Lazardis meets Xah Lee. I just *know* we're in for trouble now ...

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-22 Thread Paul McGuire

Steve Holden [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
<snip>

 Ilias Lazardis meets Xah Lee. I just *know* we're in for trouble now ...

 regards
  Steve

A sign of the End Times, perhaps?

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-22 Thread John Machin


Paul McGuire wrote:
 Steve Holden [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
 <snip>
 
  Ilias Lazardis meets Xah Lee. I just *know* we're in for trouble now ...
 
  regards
   Steve

 A sign of the End Times, perhaps?
 

Indeed.  Armageddon outa here ;-)

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: String Pattern Matching: regex and Python regex documentation

2006-09-22 Thread Jürgen Exner

Ilias Lazaridis wrote:
 Xah Lee wrote:
 the Python regex documentation is available at:
 http://xahlee.org/perl-python/python_re-write/lib/module-re.html

Yeah, sure, and the Perl regex documentation is available at 'perldoc 
perlre'.
So what? Is that anything new or surprising?

jue 


-- 
http://mail.python.org/mailman/listinfo/python-list

String Pattern Matching: regex and Python regex documentation

2006-09-17 Thread Xah Lee

the Python regex documentation is available at:
http://xahlee.org/perl-python/python_re-write/lib/module-re.html

Note that, i've just made the terms of use clear.

Also, can anyone answer what the precise license terms of the
official Python documentation are? The official python.org doc site is not
clear.

Note also that the regex syntax used by Perl is the same as Python's.
So, this section
 http://xahlee.org/perl-python/python_re-write/lib/re-syntax.html
which contains a clear explanation of regex syntax, will be of interest
to Perl programmers as well.

If you are studying regex, you might also be interested in this lisp
doc:
http://xahlee.org/elisp/Regular-Expressions.html

Also note that the regex syntax is one of unix's $free$ fuckups that
has damaged an entire computer industry for decades. ($free$ as drugs
given to children)

For some examples of corrective steps, see:

• Scsh manual, Chapter 6: Pattern-matching strings with regular
expressions
http://www.scsh.net/docu/html/man-Z-H-7.html

• Mathematica Book, section 2.8.4 String Patterns
http://documents.wolfram.com/mathematica/book/section-2.8.4

  Xah
  [EMAIL PROTECTED]
∑ http://xahlee.org/

-- 
http://mail.python.org/mailman/listinfo/python-list
