Re: [Tutor] Re Module

2018-12-27 Thread Avi Gross
Asad,

After reading the replies to you by Alan and Steven, I want to ask whether you
can first tell us in plain words what the program as a whole is meant to do. If
you only want help with one small part, tell us about that.

At first I thought you wanted to show us how you solve most of the overall
problem, whatever it is, so I wanted to hear things like the examples below.

An example would be to search two files for error matches of various kinds
and report if they contain any matches. Just report True versus False or
something.

Another goal might be to show the first match in some way then quit.

Another might be to do the same search in two files and report ALL the
matches in some format.

After being clear on the goal, you might specify the overall algorithm you
want to use. For example, do you process one file to completion and save
some results then process the other the same way then compare and produce
output? Or do you process both nearly simultaneously in one pass, or perhaps
multiple passes. Do you search for one error type at a time or all at once?
Can there be multiple errors on the same line of the same kind or different
ones? What does error even mean? Is it something like "Fail: 666" versus
"Warn: 42" or something where multiple errors share a part or ...

Once we have some idea of the goal, we could help you see whether the approach
seems reasonable even before reading the code. And, when reading the code,
we might see whether your implementation matches the plan, and so spot where
you diverge from it, perhaps with a mistake.

If I just look at what you provided, you do some of what I asked. You are
not clear on what the two files contain other than they may have an error
that you can identify with a set of patterns. Can you tell us if you are
looking at one line at a time, assuming it is a text file? Your code shows
no evidence of a file at all. Your focus in what you share with us is mainly
on creating a list of compiled search patterns and applying it to one
uninitialized "st" and trying to figure out which one matched. 

You do not show any examples of the pattern but suggest something is
failing. For all we know one of your patterns just matched the presence of a
single common character or even was not formatted properly and failed to be
compiled.

My impression is you are not actually asking about the overall problem. Your
real question may be how to use a regular expression on a string and find
out what matched. If so, that would be the headline, not about two files.
And it may even be your entire approach could change. An example would be to
store your patterns as a text keyword in a dictionary with the value being
the compiled version so when you evaluate a line using the pattern, you know
which one you matched with. I am NOT saying this is a good solution or a
better one. I am asking you to think what you will need and what techniques
might make life easier in doing it.
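A minimal sketch of the dictionary idea above. The pattern names, the regular
expressions, and the sample line are all made up for illustration, since the
real patterns were not shown:

```python
import re

# Hypothetical patterns; the real ones were not shown in the thread.
PATTERN_TEXTS = {
    "fail_code": r"fail:\s*\d+",
    "warn_code": r"warn:\s*\d+",
}

# Map each name to its compiled pattern so a match can be reported by name.
COMPILED = {name: re.compile(text, re.IGNORECASE)
            for name, text in PATTERN_TEXTS.items()}


def matching_names(line):
    """Return the names of all patterns that match somewhere in line."""
    return [name for name, pat in COMPILED.items() if pat.search(line)]


print(matching_names("Step 3 FAIL: 666 while copying"))
```

Whether this beats a plain list depends on what you need to report; it only
shows that the name and the compiled pattern can travel together.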

So besides trying to alter some code based on the feedback from others,
could you resubmit the question with a focus on what you are doing and what
exactly is not working that you want looked at? Specifics would be useful,
including at least one pattern and a line of sample text that should be
matched by the pattern as an example and perhaps one that should not. And
any error messages are vital.

When you do, I am sure Steven and Alan and others might be able to zoom
right in and help you diagnose, if you don't figure it out by yourself first
by being able to see what your goal is and perhaps doing a little debugging.

-Original Message-
From: Tutor  On Behalf Of
Asad
Sent: Thursday, December 27, 2018 10:10 AM
To: tutor@python.org
Subject: [Tutor] Re Module

Hi All ,

  I am trying to find a solution for my script. I have two files:

file1 - I need to search for an error, say x. If the error matches,

look for the same error x in the other file, file2.

Here is the code:
I have 10 different patterns, so I used a list comprehension to compile
them and then loop over them to find the exact pattern that matches.

re_comp1 = [re.compile(pattern) for pattern in str1]

for pat in re_comp1:
    if pat.search(st,re.IGNORECASE):
        x = pat.pattern
        print x        ===> here it gives the expected output, the correct match
        print type(x)



if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match
  print line

Instead if I use :

if re.search(x, line, re.IGNORECASE) is not None:  ===> then no match occurs
  print line

Please advise where I am going wrong or what can be done to make it better.

Thanks,


--
Asad Hasan
+91 9582111698
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Re Module

2018-12-27 Thread Steven D'Aprano
On Thu, Dec 27, 2018 at 08:40:12PM +0530, Asad wrote:
> Hi All ,
> 
>   I trying find a solution for my script , I have two files :
> 
> file1 - I need a search a error say x if the error matches
> 
> Look for the same error x in other file 2
> 
> Here is the code :
> I have 10 different patterns therefore I used list comprehension and
> compiling the pattern so I loop over and find the exact pattern matching
> 
> re_comp1 = [re.compile(pattern) for pattern in str1]


You can move the IGNORECASE flag into the call to compile. Also, perhaps 
you can use better names instead of "str1" (one string?).

patterns = [re.compile(pattern, re.IGNORECASE) for pattern in string_patterns]
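One detail behind this advice, sketched with made-up sample text: the second
positional argument of a compiled pattern's search() is the start position,
not a flags value. re.IGNORECASE has the integer value 2, so
pat.search(st, re.IGNORECASE) starts searching at index 2 and stays
case-sensitive.

```python
import re

text = "ERROR: disk full"   # hypothetical sample line

case_sensitive = re.compile("error")
# Here re.IGNORECASE is taken as pos=2: the search starts at "ROR: ..."
# and is still case-sensitive, so nothing is found.
wrong = case_sensitive.search(text, re.IGNORECASE)

# Passing the flag to compile() gives the intended behaviour.
case_insensitive = re.compile("error", re.IGNORECASE)
right = case_insensitive.search(text)

print(wrong)
print(right.group())
```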
 
> for pat in re_comp1:
> if pat.search(st,re.IGNORECASE):
> x = pat.pattern
> print x===> here it gives the expected output it correct
> match
> print type(x)
> 

Be careful here: even though you have ten different patterns, only *one* 
will be stored in x. If three patterns match, x will only get the last 
of the three and the others will be ignored.
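If all matching patterns matter, one possible approach is to collect them in a
list rather than a single variable. The patterns and the input string here are
hypothetical:

```python
import re

string_patterns = [r"fail", r"disk", r"timeout"]   # hypothetical patterns
patterns = [re.compile(p, re.IGNORECASE) for p in string_patterns]

st = "FAIL: disk full"                              # hypothetical input

# Keep every pattern that matches instead of overwriting a single variable.
matched = [pat for pat in patterns if pat.search(st)]

print([pat.pattern for pat in matched])
```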

 
> if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match

That's because you are trying to match the literal string "x", so it 
will match anything with the letter "x":

box, text, ax, equinox, except, hexadecimal, fix, Kleenex, sixteen ...
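The difference between the string 'x' and the variable x can be seen in a
short sketch. The pattern and the two sample lines are invented for
illustration:

```python
import re

x = r"ORA-\d+"                       # hypothetical pattern held in a variable
line = "ORA-00942: table or view does not exist"
other = "text box"                   # contains the letter x, but no error code

print(bool(re.search('x', other)))   # the literal letter "x" is found
print(bool(re.search(x, other)))     # the pattern stored in x is not
print(bool(re.search(x, line)))      # the pattern stored in x is found
```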


> Instead if I use :
> 
> if re.search(x, line, re.IGNORECASE) is not None: then no match occurs
>   print line

Here you are trying to match the variable called x. That is a very bad 
name for a variable (what does "x" mean?) but it should work.

If no match occurs, it probably means that the value of x doesn't occur 
in the line you are looking at.

Try printing x and line and see if they are what you expect them to be:

print x
print line


-- 
Steve


Re: [Tutor] Re Module

2018-12-27 Thread Alan Gauld via Tutor
On 27/12/2018 15:10, Asad wrote:

> file1 - I need a search a error say x if the error matches
> 
> Look for the same error x in other file 2
> 
> Here is the code :
> I have 10 different patterns therefore I used list comprehension and
> compiling the pattern so I loop over and find the exact pattern matching
> 
> re_comp1 = [re.compile(pattern) for pattern in str1]

I assume str1 is actually a list of strings? You don't
show the definition but since you say it gives the
expected output I'll hope that it's correct.

> for pat in re_comp1:
> if pat.search(st,re.IGNORECASE):
> x = pat.pattern
> print x===> here it gives the expected output it correct

I assume st comes from your file1? You don't show us that
bit of code either...

But you do realize that the print only shows the last result.
If there is more than one matching pattern the previous results
get thrown away. And if you only care about one match you
could just use a single regex.
On the other hand, if you do only want the last matching
pattern then what you have works.

> if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong
> match
>   print line

Notice that you pass the string 'x' into the search.
I assume it is meant to be x? That means you are searching
for the single character 'x' in line. You also don't show
us where line comes from. I assume it's the other file?

But why do you switch from using the compiled pattern?
Why not just assign x to the pattern object pat? This can
then be used to search line directly and with greater
efficiency.


> if re.search(x, line, re.IGNORECASE) is not None: then no match occurs
>   print line

And are you sure a match should occur?
It would help debug this if you showed us some sample data.
Such as the value of x and the value of line.

Given you are obviously only showing us a selected segment
of your code it's hard to be sure. But as written here you
are searching line even if no pattern matches in file1.
That is, you could loop through all your patterns, never
assign anything to x and then go ahead and try to search
for 'x' in line. You should probably check x first.
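A rough sketch of both suggestions together: keep the compiled pattern object
rather than its text, and check that something matched before searching the
second file. The patterns and the stand-in file contents are hypothetical, and
the function names are mine, not from the original script:

```python
import re


def find_first_pattern(patterns, text):
    """Return the first compiled pattern that matches text, or None."""
    for pat in patterns:
        if pat.search(text):
            return pat
    return None


def lines_matching(pat, lines):
    """Return the lines that the compiled pattern matches."""
    return [line for line in lines if pat.search(line)]


# Hypothetical stand-ins for the patterns and the contents of the two files.
patterns = [re.compile(p, re.IGNORECASE) for p in (r"timeout", r"disk full")]
file1_text = "12:00 Disk Full on /dev/sda1"
file2_lines = ["boot ok", "warning: DISK FULL", "shutdown"]

found = find_first_pattern(patterns, file1_text)
if found is None:
    print("no pattern matched file1; nothing to look for in file2")
else:
    print(lines_matching(found, file2_lines))
```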

Also, since you don't show the file looping code we don't
know whether you break out whenever you find a match or
whether the rest of the code is all inside the first
loop over file1. Trying to debug someone else's code
is hard enough. When we only have half the code we are
reduced to guesswork.

Finally, do you get any error messages? If so, please
post them in their entirety. Based on your code I'm
assuming you are working on Python v2.? but it's always
worth posting the Python version and OS.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] Re Module

2018-12-27 Thread Asad
Hi All ,

  I am trying to find a solution for my script. I have two files:

file1 - I need to search for an error, say x. If the error matches,

look for the same error x in the other file, file2.

Here is the code:
I have 10 different patterns, so I used a list comprehension to compile
them and then loop over them to find the exact pattern that matches.

re_comp1 = [re.compile(pattern) for pattern in str1]

for pat in re_comp1:
    if pat.search(st,re.IGNORECASE):
        x = pat.pattern
        print x        ===> here it gives the expected output, the correct match
        print type(x)



if re.search('x', line, re.IGNORECASE) is not None:  ===> Gives a wrong match
  print line

Instead if I use :

if re.search(x, line, re.IGNORECASE) is not None:  ===> then no match occurs
  print line

Please advise where I am going wrong or what can be done to make it better.

Thanks,


-- 
Asad Hasan
+91 9582111698


Re: [Tutor] re module

2014-08-19 Thread Sunil Tech
Hey, thanks Danny Yoo, Chris “Kwpolska” Warrick, D.V.N Sarma.

I will take all your inputs.

Thanks a lot.


On Fri, Aug 15, 2014 at 3:32 AM, Danny Yoo  wrote:

> On Thu, Aug 14, 2014 at 8:39 AM, D.V.N.Sarma డి.వి.ఎన్.శర్మ
>  wrote:
> > I tested it on IDLE. It works.
>
>
> Hi Sarma,
>
>
> Following up on this one.  I'm pretty sure that:
>
> print re.search("
> is going to print something, but it almost certainly will not do what
> Sunil wants.  See:
>
> https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy
>
> for why.
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>


Re: [Tutor] re module

2014-08-14 Thread Danny Yoo
Hi Sunil,

Don't use regular expressions for this task.  Use something that knows
about HTML structure.  As others have noted, the Beautiful Soup or
lxml libraries are probably a much better choice here.

There are good reasons to avoid regexp for the task you're trying to
do.  For example, your regular expression:

>>> import re
>>> m = re.match("'(.*)'", "'quoted' text, but note how it's greedy!")
>>> m.group(1)
"quoted' text, but note how it"
##

and note how the match doesn't limit itself to "quoted", but goes as
far as it can.
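For comparison, a small sketch of the same match with the non-greedy
quantifier `*?`, reusing the sample string from the session above:

```python
import re

text = "'quoted' text, but note how it's greedy!"

greedy = re.match("'(.*)'", text).group(1)
lazy = re.match("'(.*?)'", text).group(1)    # *? stops at the first closing quote

print(greedy)
print(lazy)
```

This fixes only the overmatching; it does nothing for the other HTML issues.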

This shows at least one of the problems that you're going to run into.
Fixing this so it doesn't grab so much is doable, of course.  But
there are other issues, all of which are little headaches upon
headaches.  (e.g. Attribute values may be single or double quoted, may
use HTML entity references, etc.)

So don't try to parse HTML by hand.  Let a library do it for you.  For
example with Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

the code should be as straightforward as:

###
from bs4 import BeautifulSoup
soup = BeautifulSoup(stmt)
for span in soup.find_all('span'):
    print span.get('style')
###

where you deal with the _structure_ of your document, rather than at
the low-level individual characters of that document.
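If installing a third-party library is not an option, the same structural
idea can be sketched with the standard library's html.parser module instead
of Beautiful Soup. This is a swapped-in technique, not the code recommended
above, and the stmt fragment below is a made-up stand-in because the original
string did not survive intact:

```python
from html.parser import HTMLParser


class SpanStyleCollector(HTMLParser):
    """Collect the style attribute of every <span> start tag."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span":
            style = dict(attrs).get("style")
            if style is not None:
                self.styles.append(style)


# Hypothetical fragment mixing double- and single-quoted attributes.
stmt = ('<span style="font-size: 11pt;">Nurse</span>'
        "<span style='font-family: times new roman,times;'>Phone</span>")

collector = SpanStyleCollector()
collector.feed(stmt)
print(collector.styles)
```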


Re: [Tutor] re module

2014-08-14 Thread Danny Yoo
On Thu, Aug 14, 2014 at 8:39 AM, D.V.N.Sarma డి.వి.ఎన్.శర్మ
 wrote:
> I tested it on IDLE. It works.


Hi Sarma,


Following up on this one.  I'm pretty sure that:

print re.search("

is going to print something, but it almost certainly will not do what
Sunil wants.  See:

https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy

for why.


Re: [Tutor] re module

2014-08-14 Thread Albert-Jan Roskam

-
On Thu, Aug 14, 2014 4:07 PM CEST Chris “Kwpolska” Warrick wrote:

>On 14 Aug 2014 15:58 "Sunil Tech"  wrote:
>>
>> Hi,
>>
>> I have string like
>> stmt = 'Patient name: Upadhyay Shyamstyle="font-family: times new roman,times;">  Date of
>birth:   08/08/1988 Issue(s) to be
>analyzed:  tesstyle="font-size: 11pt;">Nurse Clinical summary:  test1style="font-family: times new roman,times;"> Date of
>injury:   12/14/2013Diagnoses:   723.4 - 300.02 - 298.3
>- 780.50 - 724.4 Brachial neuritis or radiculitis nos - Generalized
>anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance
>- Thoracic or lumbosacral neuritis or radiculitis, unspecified/>Requester
>name:   Demo Spltycdtesttstyle="font-family: times new roman,times;">Phone #:   (213)
>480-9000Medical records reviewed __ pages of medical and
>administrative records were reviewed including:Criteria
>used in analysis  Reviewer comments />DeterminationBased on the clinical information submitted for this
>review and using the evidence-based, peer-reviewed guidelines referenced
>above, this request is Peer Reviewer
>Name/Credentials  Solis, Test, PhDstyle="font-family: times new roman,times;">Internal Medicine/> />Attestation/>Contact Information roman,times\' size=\'3\'>Peer to Peer contact attempt 1: 08/13/2014 02:46
>PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did
>Not Change Determination'
>>
>>
>> i am trying to find the various font sizes and font face from this string.
>>
>> i tried
>>
>> print re.search(">
>>
>> Thank you.
>>
>>
>>
>>
>> ___
>> Tutor maillist  -  Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>Don't use regular expressions for HTML. Use lxml instead.
>
>Also, why would you need that exact thing? It's useless. Also, this code is
>very ugly, with too many s and — worse — s which should not be
>used at all.

Why lxml and not bs? I read that bs deals better with malformed html. You said 
the above html is messy, which is not necessarily the same as malformed, but.. 
Anyway, this reference also seems to favor lxml: 
http://stackoverflow.com/questions/4967103/beautifulsoup-and-lxml-html-what-to-prefer


Re: [Tutor] re module

2014-08-14 Thread D . V . N . Sarma డి . వి . ఎన్ . శర్మ
I tested it on IDLE. It works.

regards,
Sarma.


On Thu, Aug 14, 2014 at 7:37 PM, Chris “Kwpolska” Warrick <
kwpol...@gmail.com> wrote:

>
> On 14 Aug 2014 15:58 "Sunil Tech"  wrote:
> >
> > Hi,
> >
> > I have string like
> > stmt = 'Patient name: Upadhyay Shyam style="font-family: times new roman,times;">  Date of
> birth:   08/08/1988 Issue(s) to be
> analyzed:  tes style="font-size: 11pt;">Nurse Clinical summary:  test1 style="font-family: times new roman,times;"> Date of
> injury:   12/14/2013Diagnoses:   723.4 - 300.02 - 298.3
> - 780.50 - 724.4 Brachial neuritis or radiculitis nos - Generalized
> anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance
> - Thoracic or lumbosacral neuritis or radiculitis, unspecified />Requester
> name:   Demo Spltycdtestt style="font-family: times new roman,times;">Phone #:   (213)
> 480-9000Medical records reviewed __ pages of medical and
> administrative records were reviewed including:Criteria
> used in analysis  Reviewer comments  />DeterminationBased on the clinical information submitted for this
> review and using the evidence-based, peer-reviewed guidelines referenced
> above, this request is Peer Reviewer
> Name/Credentials  Solis, Test, PhD style="font-family: times new roman,times;">Internal Medicine />  />Attestation />Contact Information  roman,times\' size=\'3\'>Peer to Peer contact attempt 1: 08/13/2014 02:46
> PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did
> Not Change Determination'
> >
> >
> > i am trying to find the various font sizes and font face from this
> string.
> >
> > i tried
> >
> > print re.search(" >
> >
> > Thank you.
> >
> >
> >
> >
> > ___
> > Tutor maillist  -  Tutor@python.org
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
> >
> Don't use regular expressions for HTML. Use lxml instead.
>
> Also, why would you need that exact thing? It's useless. Also, this code
> is very ugly, with too many s and — worse — s which should not
> be used at all.
>
> --
> Chris “Kwpolska” Warrick 
> Sent from my SGS3.
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
>


Re: [Tutor] re module

2014-08-14 Thread Chris “Kwpolska” Warrick
On 14 Aug 2014 15:58 "Sunil Tech"  wrote:
>
> Hi,
>
> I have string like
> stmt = 'Patient name: Upadhyay Shyam  Date of
birth:   08/08/1988 Issue(s) to be
analyzed:  tesNurse Clinical summary:  test1 Date of
injury:   12/14/2013Diagnoses:   723.4 - 300.02 - 298.3
- 780.50 - 724.4 Brachial neuritis or radiculitis nos - Generalized
anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance
- Thoracic or lumbosacral neuritis or radiculitis, unspecifiedRequester
name:   Demo SpltycdtesttPhone #:   (213)
480-9000Medical records reviewed __ pages of medical and
administrative records were reviewed including:Criteria
used in analysis  Reviewer comments DeterminationBased on the clinical information submitted for this
review and using the evidence-based, peer-reviewed guidelines referenced
above, this request is Peer Reviewer
Name/Credentials  Solis, Test, PhDInternal Medicine AttestationContact Information Peer to Peer contact attempt 1: 08/13/2014 02:46
PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did
Not Change Determination'
>
>
> i am trying to find the various font sizes and font face from this string.
>
> i tried
>
> print re.search("
>
> Thank you.
>
>
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
Don't use regular expressions for HTML. Use lxml instead.

Also, why would you need that exact thing? It's useless. Also, this code is
very ugly, with too many s and — worse — s which should not be
used at all.

-- 
Chris “Kwpolska” Warrick 
Sent from my SGS3.


[Tutor] re module

2014-08-14 Thread Sunil Tech
Hi,

I have string like
stmt = 'Patient name: Upadhyay Shyam  Date of
birth:   08/08/1988 Issue(s) to be
analyzed:  tesNurse Clinical summary:  test1 Date of
injury:   12/14/2013Diagnoses:   723.4 - 300.02 - 298.3
- 780.50 - 724.4 Brachial neuritis or radiculitis nos - Generalized
anxiety disorder - Acute paranoid reaction - Unspecified sleep disturbance
- Thoracic or lumbosacral neuritis or radiculitis, unspecifiedRequester
name:   Demo SpltycdtesttPhone #:   (213)
480-9000Medical records reviewed __ pages of medical and
administrative records were reviewed including:Criteria
used in analysis  Reviewer comments DeterminationBased on the clinical information submitted for this
review and using the evidence-based, peer-reviewed guidelines referenced
above, this request is Peer Reviewer
Name/Credentials  Solis, Test, PhDInternal Medicine AttestationContact Information Peer to Peer contact attempt 1: 08/13/2014 02:46
PM, Central, Incoming Call, Successful, No Contact Made, Peer Contact Did
Not Change Determination'


i am trying to find the various font sizes and font face from this string.

i tried

print re.search("


Re: [Tutor] re module- puzzling results when matching money

2013-08-04 Thread Alan Gauld

On 04/08/13 08:45, Alex Kleider wrote:


sorry, my bad. I forgot to delete that backslash, I meant
re.findall(r"\be\b", "d e f"). Same with the other example.


..but the interesting thing is that the presence or absence of the
spurious back slashes seems not to change the results.



It wouldn't, because the backslash says treat the next character as a
literal, and if it's not a metacharacter it's already treated as a literal.

So the \ is effectively a non-operation in that context.
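A quick check of that, as a sketch with made-up strings. It uses punctuation
on purpose: a backslash followed by a letter is a different case, since
sequences such as \b and \d have their own meaning, and newer Python 3
releases reject unknown backslash-letter escapes in a pattern.

```python
import re

# A backslash before punctuation that is not special changes nothing.
plain = re.findall(r"a:b", "a:b a-b")
escaped = re.findall(r"a\:b", "a:b a-b")
print(plain == escaped)

# A backslash before a metacharacter does change the meaning.
print(re.findall(r"a.b", "a:b a-b"))     # "." matches any character
print(re.findall(r"a\.b", "a:b a-b"))    # "\." matches only a literal dot
```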

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] re module- puzzling results when matching money

2013-08-04 Thread Dominik George
Hi,

not quite. The moral is to learn about greedy and non-greedy matching ;)!

-nik



Alex Kleider wrote:
>On 2013-08-03 13:38, Dominik George wrote:
>> Hi,
>> 
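>>  \b matches at a word boundary, the empty position between a \w and a
>> \W character (or the start or end of the string). \w is [A-Za-z0-9_],
>> so $ is not a word character, and a \b directly before \$ only matches
>> when a word character immediately precedes the $.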
>> 
>>  -nik
>
>I get it now.  I was using it before the '$' to define the beginning of
>
>a word but I think things are failing because it detects an end of
>word.
>Anyway, the moral is not to use it with anything but \w!
>
>Thanks!

-- 
This message was sent from my Android phone with K-9 Mail.


Re: [Tutor] re module- puzzling results when matching money

2013-08-03 Thread Dominik George
Hi,

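\b matches at a word boundary, that is, the empty position between a \w
character and a \W character (or the start or end of the string). \w is
[A-Za-z0-9_], so $ is not a word character. A \b directly before \$
therefore only matches when a word character immediately precedes the $,
which is why your pattern with the leading \b finds nothing after a space.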
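A short sketch of how a leading \b behaves in front of a dollar sign. The
pattern is a simplified version of the one in the script, and the second
sentence of the target string is made up to show the one case where the
boundary does exist:

```python
import re

target = "Cost is $4.50. Item costs USD$4.15."   # second sentence is made up

money = r"\$\d*(?:\.\d{2})?"

# Without \b, both amounts are found.
print(re.findall(money, target))

# With a leading \b, only the amount preceded by a word character is found:
# there is no word boundary between a space and "$".
print(re.findall(r"\b" + money, target))
```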

-nik



Alex Kleider wrote:
>#!/usr/bin/env python
>
>"""
>I've been puzzling over the re module and have a couple of questions
>regarding the behaviour of this script.
>
>I've provided two possible patterns (re_US_money):
>the one surrounded by the 'word boundary' meta sequence seems not to
>work
>while the other one does. I can't understand why the addition of the
>word
>boundary defeats the match.
>
>I also don't understand why the split method includes the matched text.
>Splitting only works as I would have expected if no goupings are used.
>
>If I've set this up as intended, the full body of this e-mail should be
>executable as a script.
>
>Comments appreciated.
>alex kleider
>"""
>
># file :  tutor.py (Python 2.7, NOT Python 3)
>print 'Running "tutor.py" on an Ubuntu Linux machine. *'
>
>import re
>
>target = \
>"""Cost is $4.50. With a $.30 discount:
>Price is $4.15.
>The price could be less, say $4 or $4.
>Let's see how this plays out:  $4.50.60
>"""
>
># Choose one of the following two alternatives:
>re_US_money =\
>r"((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})"
># The above provides matches.
># The following does NOT.
># re_US_money =\
># r"\b((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})\b"
>
>pat_object = re.compile(re_US_money)
>match_object = pat_object.search(target)
>if match_object:
>    print "'match_object.group()' and 'match_object.span()' yield:"
>    print match_object.group(), match_object.span()
>    print
>else:
>    print "NO MATCH FOUND!!!"
>print
>print "Now will use 'finditer()':"
>
>print
>iterator = pat_object.finditer(target)
>i = 1
>for iter in iterator:
>    print
>    print "iter #%d: "%(i, ),
>    print iter.group()
>    print "'groups()' yields: '%s'."%(iter.groups(), )
>    print iter.span()
>    i += 1
>    sign = iter.group("sign")
>    dollars = iter.group("dollars")
>    cents = iter.group("cents")
>    print sign,
>    print "  ",
>    if dollars:
>        print dollars,
>    else:
>        print "00",
>    print "  ",
>    if cents:
>        print cents,
>    else:
>        print "00",
>
>print
>
>t = target
>sub_target = pat_object.sub("", t)
>print
>print "Printing substitution: "
>print sub_target
>split_target = pat_object.split(target)
>print "Result of splitting on the target: "
>print split_target
>
># End of script.
>___
>Tutor maillist  -  Tutor@python.org
>To unsubscribe or change subscription options:
>http://mail.python.org/mailman/listinfo/tutor

--
This message was sent from my Android phone with K-9 Mail.


Re: [Tutor] re module help

2012-01-09 Thread bodsda
You could use read directly on the popen call to negate having to write to a 
file

output = os.popen("sdptool -i hci0 search OPUSH").read()
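The same idea can be sketched with the subprocess module in place of
os.popen, keeping the regular expression from the original script. The names
parse_sdptool and the sample output text are my own assumptions; the sample is
shaped only to exercise the pattern and is not real sdptool output.

```python
import re
import subprocess

PATTERN = re.compile(
    r"^Searching for OPUSH on (\w\w(:\w\w)+).*?Channel: (\d+)",
    flags=re.MULTILINE | re.DOTALL,
)


def parse_sdptool(output):
    """Return (address, 'phone', channel) tuples found in sdptool output."""
    return [(m.group(1), "phone", m.group(3)) for m in PATTERN.finditer(output)]


def scan():
    """Run sdptool and parse its output directly, without a temporary file."""
    completed = subprocess.run(
        ["sdptool", "-i", "hci0", "search", "OPUSH"],
        capture_output=True, text=True, check=False,
    )
    return parse_sdptool(completed.stdout)


# Hypothetical output, shaped only to exercise the pattern.
sample = (
    "Searching for OPUSH on 00:15:83:3D:0A:57 ...\n"
    "Service Name: OBEX Object Push\n"
    "    Channel: 1\n"
)
print(parse_sdptool(sample))
```

Splitting the parsing from the command call also makes the pattern easy to
try out on saved text without a Bluetooth device attached.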

Bodsda
Sent from my BlackBerry® wireless device

-Original Message-
From: Ganesh Kumar 
Sender: tutor-bounces+bodsda=googlemail@python.org
Date: Mon, 9 Jan 2012 14:47:46 
To: 
Subject: [Tutor] re module help

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor



[Tutor] re module help

2012-01-09 Thread Ganesh Kumar
Hi Gurus,

I have written a script using the re and os modules. It writes the output
of sdptool to a file named sdptool, matches a regular expression against
that file, and prints the result. How can I get the required output without
creating the file? I tried reading the output as a stream, but I did not
get the correct output.

#! /usr/bin/python
import os,re

def scan():

    cmd = "sdptool -i hci0 search OPUSH > sdptool"
    fp = os.popen(cmd)

    results = []
    l = open("sdptool").read()


    pattern = r"^Searching for OPUSH on (\w\w(:\w\w)+).*?Channel: (\d+)"
    r = re.compile(pattern, flags=re.MULTILINE|re.DOTALL)
    while True:
        for match in r.finditer(l):
            g  = match.groups()

            results.append((g[0],'phone',g[2]))
        return results

## output [('00:15:83:3D:0A:57', 'phone', '1')]


http://dpaste.com/684335/
Please guide me on how to achieve the required output without creating a file.


Did I learn something today? If not, I wasted it.


Re: [Tutor] RE module is working ?

2011-02-04 Thread Karim


By the way, Steven, your helper function algorithm and Peter's comments
made me think of this change:


karim@Requiem4Dream:~$ echo 'prima " "' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \" \"
karim@Requiem4Dream:~$ echo 'prima ""' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \"\"
karim@Requiem4Dream:~$ echo 'prima "Ich Karim"' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \"Ich Karim\"
karim@Requiem4Dream:~$ echo 'prima "Ich Karim"' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \"Ich Karim\"

Regards
Karim


On 02/04/2011 08:07 PM, Karim wrote:

On 02/04/2011 02:36 AM, Steven D'Aprano wrote:

Karim wrote:


*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
Thunderbird issue with bold type (appears as stars) but I don't know 
how to fix it yet.


A man went to a doctor and said, "Doctor, every time I do this, it 
hurts. What should I do?"


The doctor replied, "Then stop doing that!"

:)


Yes, these words made me laugh. I will keep it in my funny box.




Don't add bold or any other formatting to things which should be 
program code. Even if it looks okay in *your* program, you don't know 
how it will look in other people's programs. If you need to draw 
attention to something in a line of code, add a comment, or talk 
about it in the surrounding text.



[...]
That is not the thing I want. I want to escape any " which are not 
already escaped.
The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I 
have made regex on unix since 15 years).


Mainly sed, awk and perl sometimes grep and egrep. I know this is the 
jungle.


Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU 
posix compliant regexes? grep or egrep regexes? They're all different.


In any case, I am sorry, I don't think your regex does what you say. 
When I try it, it doesn't work for me.


[steve@sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \\"text\"


I give you my word on this. Exact output I redid it:

#MY OS VERSION
karim@Requiem4Dream:~$ uname -a
Linux Requiem4Dream 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 
23:42:43 UTC 2011 x86_64 GNU/Linux

#MY SED VERSION
karim@Requiem4Dream:~$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE,

to the extent permitted by law.

GNU sed home page: .
General help using GNU software: .
E-mail bug reports to: .
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
#MY SED OUTPUT COMMAND:
karim@Requiem4Dream:~$  echo 'Some ""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"
# THIS IS WHAT I WANT 2 CONSECUTIVES IF THE FIRST ONE IS ALREADY 
ESCAPED I DON'T WANT TO ESCAPED IT TWICE.

karim@Requiem4Dream:~$ echo 'Some \""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"
# BY THE WAY THIS ONE WORKS:
karim@Requiem4Dream:~$ echo 'Some "text"' | sed -e 
's/\([^\\]\)\?"/\1\\"/g'

Some \"text\"
# BUT SURE NOT THIS ONE NOT COVERED BY MY REGEX (I KNOW IT AND WANT 
ORIGINALY TO COVER IT):
karim@Requiem4Dream:~$ echo 'Some \"text"' | sed -e 
's/\([^\\]\)\?"/\1\\"/g'

Some \\"text\"

By the way in all sed version I work with the '?'  (0 or one match) 
should be escaped that's the reason I have '\?' same thing with save 
'\(' and '\)' to store value. In perl, grep you don't need to escape.


# SAMPLE FROM http://www.gnu.org/software/sed/manual/sed.html

|\+|
same As |*|, but matches one or more. It is a GNU extension.
|\?|
same As |*|, but only matches zero or one. It is a GNU extension


I wouldn't expect it to work. See below.

By the way, you don't need to escape the brackets or the question mark:

[steve@sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
Some \\"text\"



For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'


No it is not.



Yes I know, see my latest post in detail I already found the solution. 
I put it again the solution below:


#Found the solution: '?' needs to be inside parenthesis (saved 
pattern) because outside we don't know if the saved match argument

#will exist or not namely '\1'.

>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)

(' \\"\\" ', 2)


The pattern you are matching does not do what you think it does. 
"Zero or one of not-backslash, followed by a quote" will match a 
single quote *regardless* of what is before it. This is true even in 
sed, as you can see above, your sed regex matches both quotes.


\" will match, because the regular expression will match zero 
characters, followed by a quote. So the regex is correct.


>>> match = r'[^\\]?"'  # zero or one not-backslash followed by quote
>>> re.search(match, r'aaa\"aaa').group()
'"'

Now watch what happens when you call re.sub:


>>> match = r'([^\\])?"'  # group 1 equals a single n

Re: [Tutor] RE module is working ?

2011-02-04 Thread Karim

On 02/04/2011 11:26 AM, Peter Otten wrote:

Karim wrote:


That is not the thing I want. I want to escape any " which are not
already escaped.
The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have
made regex on unix since 15 years).

Can the backslash be escaped, too? If so I don't think your regex does what
you think it does.

r'\\\"' # escaped \ followed by escaped "

should not be altered, but:

$ echo '\\\"' | sed 's/\([^\\]\)\?"/\1\\"/g'
" # two escaped \ followed by a " that is not escaped




By the way you are right:

I changed it and added a sed command for the ' "" ' case:

karim@Requiem4Dream:~$ echo 'prima " "' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \" \"
karim@Requiem4Dream:~$ echo 'prima ""' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \"\"
karim@Requiem4Dream:~$ echo 'prima "Ich Karim"' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \"Ich Karim\"
karim@Requiem4Dream:~$ echo 'prima "Ich Karim"' | sed -e 
's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'

prima \"Ich Karim\"

Sorry, for the incomplete command. You pointed it out, many thanks Peter!

Regards
Karim



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor




Re: [Tutor] RE module is working ?

2011-02-04 Thread Karim

On 02/04/2011 02:36 AM, Steven D'Aprano wrote:

Karim wrote:


*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
Thunderbird issue with bold type (appears as stars) but I don't know 
how to fix it yet.


A man went to a doctor and said, "Doctor, every time I do this, it 
hurts. What should I do?"


The doctor replied, "Then stop doing that!"

:)


Yes, these words made me laugh. I will keep them in my funny box.




Don't add bold or any other formatting to things which should be 
program code. Even if it looks okay in *your* program, you don't know 
how it will look in other people's programs. If you need to draw 
attention to something in a line of code, add a comment, or talk about 
it in the surrounding text.



[...]
That is not the thing I want. I want to escape any " which are not 
already escaped.
The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have 
made regex on unix since 15 years).


Mainly sed, awk and perl sometimes grep and egrep. I know this is the 
jungle.


Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU 
posix compliant regexes? grep or egrep regexes? They're all different.


In any case, I am sorry, I don't think your regex does what you say. 
When I try it, it doesn't work for me.


[steve@sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \\"text\"


I give you my word on this. Exact output I redid it:

#MY OS VERSION
karim@Requiem4Dream:~$ uname -a
Linux Requiem4Dream 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 
UTC 2011 x86_64 GNU/Linux

#MY SED VERSION
karim@Requiem4Dream:~$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

GNU sed home page: .
General help using GNU software: .
E-mail bug reports to: .
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
#MY SED OUTPUT COMMAND:
karim@Requiem4Dream:~$  echo 'Some ""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"
# THIS IS WHAT I WANT 2 CONSECUTIVES IF THE FIRST ONE IS ALREADY ESCAPED 
I DON'T WANT TO ESCAPED IT TWICE.

karim@Requiem4Dream:~$ echo 'Some \""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"\"
# BY THE WAY THIS ONE WORKS:
karim@Requiem4Dream:~$ echo 'Some "text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \"text\"
# BUT SURE NOT THIS ONE NOT COVERED BY MY REGEX (I KNOW IT AND WANT 
ORIGINALY TO COVER IT):
karim@Requiem4Dream:~$ echo 'Some \"text"' | sed -e 
's/\([^\\]\)\?"/\1\\"/g'

Some \\"text\"

By the way, in every sed version I have worked with, the '?' (zero or one
match) must be escaped, which is why I write '\?'. The same goes for the
saving parentheses '\(' and '\)'. In perl, grep you don't need to escape.


# SAMPLE FROM http://www.gnu.org/software/sed/manual/sed.html

|\+|
   same As |*|, but matches one or more. It is a GNU extension.
|\?|
   same As |*|, but only matches zero or one. It is a GNU extension


I wouldn't expect it to work. See below.

By the way, you don't need to escape the brackets or the question mark:

[steve@sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
Some \\"text\"



For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'


No it is not.



Yes I know, see my latest post in detail I already found the solution. I 
put it again the solution below:


#Found the solution: '?' needs to be inside parenthesis (saved pattern) 
because outside we don't know if the saved match argument

#will exist or not namely '\1'.

>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)

(' \\"\\" ', 2)


The pattern you are matching does not do what you think it does. "Zero 
or one of not-backslash, followed by a quote" will match a single 
quote *regardless* of what is before it. This is true even in sed, as 
you can see above, your sed regex matches both quotes.


\" will match, because the regular expression will match zero 
characters, followed by a quote. So the regex is correct.


>>> match = r'[^\\]?"'  # zero or one not-backslash followed by quote
>>> re.search(match, r'aaa\"aaa').group()
'"'

Now watch what happens when you call re.sub:


>>> match = r'([^\\])?"'  # group 1 equals a single non-backslash
>>> replace = r'\1\\"'  # group 1 followed by \ followed by "
>>> re.sub(match, replace, '')  # no matches
''
>>> re.sub(match, replace, 'aa"aa')  # one match
'aa\\"aa'
>>> re.sub(match, replace, '"')  # one match, but there's no group 1
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.1/re.py", line 166, in sub
return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.1/re.py", line 303, in filter
return sre_parse.expand_template(template, match)
  File "/usr/local/lib/python3.1/sre_parse.

Re: [Tutor] RE module is working ?

2011-02-04 Thread Peter Otten
Karim wrote:

> That is not the thing I want. I want to escape any " which are not
> already escaped.
> The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have
> made regex on unix since 15 years).

Can the backslash be escaped, too? If so I don't think your regex does what 
you think it does.

r'\\\"' # escaped \ followed by escaped "

should not be altered, but:

$ echo '\\\"' | sed 's/\([^\\]\)\?"/\1\\"/g'
" # two escaped \ followed by a " that is not escaped





Re: [Tutor] RE module is working ?

2011-02-04 Thread Peter Otten
Karim wrote:

> Recall:
> 
>  >>> re.subn(r'([^\\])?"', r'\1\\"', expression)
> 
> Traceback (most recent call last):
>  File "", line 1, in
>  File "/home/karim/build/python/install/lib/python2.7/re.py", line
> 162, in subn
>return _compile(pattern, flags).subn(repl, string, count)
>  File "/home/karim/build/python/install/lib/python2.7/re.py", line
> 278, in filter
>return sre_parse.expand_template(template, match)
>  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py",
> line 787, in expand_template
>raise error, "unmatched group"
> sre_constants.error: unmatched group
> 
> 
> Found the solution: '?' needs to be inside parenthesis (saved pattern)
> because outside we don't know if the saved match argument
> will exist or not namely '\1'.
> 
>  >>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
> 
> (' \\"\\" ', 2)
> 
> sed unix command is more permissive: sed 's/\([^\\]\)\?"/\1\\"/g'
> because '?' can be outside parenthesis (saved pattern but escaped for
> sed). \1 seems to not cause issue when matching is found. Perhaps it is
> created only when match occurs.

Thanks for reporting the explanation.
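
To make the quoted explanation concrete, here is a minimal sketch of why moving the '?' inside the parentheses avoids the exception: group 1 then always takes part in the match, even if it only matches the empty string. The names pattern and replace are just illustrative. Note that this version still adds a second backslash in front of an already escaped quote, the same result sed gives for that input.

```python
import re

pattern = r'([^\\]?)"'    # group 1 always participates; it may match ''
replace = r'\1\\"'        # group 1, then a backslash, then the quote

# Two consecutive plain quotes: both get escaped, no exception.
print(re.subn(pattern, replace, ' "" '))

# An already escaped quote is escaped a second time, because the empty
# group lets the pattern match the quote regardless of what precedes it.
print(re.subn(pattern, replace, r'Some \"text"'))
```

So the exception goes away, but the goal "escape only quotes that are not already escaped" is not reached with this pattern alone.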



Re: [Tutor] RE module is working ?

2011-02-03 Thread Steven D'Aprano

Karim wrote:


*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
Thunderbird issue with bold type (appears as stars) but I don't know how 
to fix it yet.


A man went to a doctor and said, "Doctor, every time I do this, it 
hurts. What should I do?"


The doctor replied, "Then stop doing that!"

:)

Don't add bold or any other formatting to things which should be program 
code. Even if it looks okay in *your* program, you don't know how it 
will look in other people's programs. If you need to draw attention to 
something in a line of code, add a comment, or talk about it in the 
surrounding text.



[...]
That is not the thing I want. I want to escape any " which are not 
already escaped.
The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have 
made regex on unix since 15 years).


Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU 
posix compliant regexes? grep or egrep regexes? They're all different.


In any case, I am sorry, I don't think your regex does what you say. 
When I try it, it doesn't work for me.


[steve@sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
Some \\"text\"

I wouldn't expect it to work. See below.

By the way, you don't need to escape the brackets or the question mark:

[steve@sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
Some \\"text\"



For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'


No it is not.

The pattern you are matching does not do what you think it does. "Zero 
or one of not-backslash, followed by a quote" will match a single quote 
*regardless* of what is before it. This is true even in sed, as you can 
see above, your sed regex matches both quotes.


\" will match, because the regular expression will match zero 
characters, followed by a quote. So the regex is correct.


>>> match = r'[^\\]?"'  # zero or one not-backslash followed by quote
>>> re.search(match, r'aaa\"aaa').group()
'"'

Now watch what happens when you call re.sub:


>>> match = r'([^\\])?"'  # group 1 equals a single non-backslash
>>> replace = r'\1\\"'  # group 1 followed by \ followed by "
>>> re.sub(match, replace, '')  # no matches
''
>>> re.sub(match, replace, 'aa"aa')  # one match
'aa\\"aa'
>>> re.sub(match, replace, '"')  # one match, but there's no group 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/re.py", line 166, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.1/re.py", line 303, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/local/lib/python3.1/sre_parse.py", line 807, in expand_template
    raise error("unmatched group")
sre_constants.error: unmatched group

Because group 1 was never matched, Python's re.sub raised an error. It 
is not a very informative error, but it is valid behaviour.
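
One way to sidestep the unmatched-group error without changing the pattern is to pass a function instead of a replacement template, and handle the missing group explicitly. This is only a sketch: the helper name add_backslash is made up for illustration. As far as I know, Python 3.5 and later also substitute an empty string for an unmatched group in a template, so the traceback above is specific to older versions.

```python
import re

pattern = r'([^\\])?"'   # optional group 1, then a double quote


def add_backslash(m):
    # m.group(1) is None when the optional group took no part in the match.
    prefix = m.group(1) or ''
    return prefix + '\\"'


print(re.sub(pattern, add_backslash, '"'))       # the case that raised above
print(re.sub(pattern, add_backslash, 'aa"aa'))   # group 1 matches 'a' here
```

This mimics what the sed version shown in this thread does with a missing group, but it does not make the pattern itself any more correct.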


If I try the same thing in sed, I get something different:

[steve@sylar ~]$ echo '"Some text' | sed -re 's/([^\\])?"/\1\\"/g'
\"Some text

It looks like this version of sed defines backreferences on the 
right-hand side to be the empty string, in the case that they don't 
match at all. But this is not standard behaviour. The sed FAQs say that 
this behaviour will depend on the version of sed you are using:


"Seds differ in how they treat invalid backreferences where no 
corresponding group occurs."


http://sed.sourceforge.net/sedfaq3.html

So you can't rely on this feature. If it works for you, great, but it 
may not work for other people.



When you delete the ? from the Python regex, group 1 is always valid, 
and you don't get an exception. Or if you ensure the input always 
matches group 1, no exception:


>>> match = r'([^\\])?"'
>>> replace = r'\1\\"'
>>> re.sub(match, replace, 'a"a"a"a') # group 1 always matches
'a\\"a\\"a\\"a'

(It still won't do what you want, but that's a *different* problem.)
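
A short sketch of that different problem, using the ' "" ' input from earlier in the thread: without the '?', every match consumes the character in front of the quote, and matches cannot overlap, so the second of two adjacent quotes is skipped. A negative lookbehind is one possible alternative, since it inspects the previous character without consuming it. The variable names are illustrative only.

```python
import re

expression = ' "" '

# Without '?', the match for the first quote consumes the leading space,
# and nothing unconsumed is left in front of the second quote.
consuming = re.subn(r'([^\\])"', r'\1\\"', expression)

# A negative lookbehind only looks at the previous character.
lookbehind = re.subn(r'(?<!\\)"', r'\\"', expression)

print(consuming)
print(lookbehind)
```

The lookbehind version still gets one case wrong: a quote preceded by an escaped backslash (two backslashes, then a quote) is left alone although the quote itself is not escaped.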



Jamie Zawinski wrote:

  Some people, when confronted with a problem, think "I know,
  I'll use regular expressions." Now they have two problems.

How many hours have you spent trying to solve this problem using 
regexes? This is a *tiny* problem that requires an easy solution, not 
wrestling with a programming language that looks like line-noise.


This should do what you ask for:

def escape(text):
    """Escape any double-quote characters if and only if they
    aren't already escaped."""
    output = []
    escaped = False
    for c in text:
        if c == '"' and not escaped:
            output.append('\\')
        elif c == '\\':
            output.append('\\')
            escaped = True
            continue
        output.append(c)
        escaped = False
    return ''.join(output)


Armed with this helper function, which took me two minutes to write, I 
can do this:


>>> text = 'Some text with backslash-quotes \\" and plain quotes " together.'
>>> print escape(text)
Some text with backslash-quotes \" an

Re: [Tutor] RE module is working ?

2011-02-03 Thread Alan Gauld


"Karim"  wrote


Because expression = *' "" '*  is in fact expression = ' "" '.
The bold appear as stars I don't know why. 


Because in the days when email was always sent in plain 
ASCII text the way to show "bold" was to put asterisks around 
it. Underlining used _underscores_ like so...


Obviously somebody decided that Thunderbird would stick 
with those conventions when translating HTML to text :-)


Quite smart really :-)

Alan G.



Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim

On 02/03/2011 07:47 PM, Karim wrote:

On 02/03/2011 02:15 PM, Peter Otten wrote:

Karim wrote:


I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
consecutive double quotes:

 * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>  expression = *' "" '*
>>>  re.subn(*r'([^\\])?"', r'\1\\"', expression*)
Traceback (most recent call last):
File "", line 1, in
File "/home/karim/build/python/install/lib/python2.7/re.py", line
162, in subn
  return _compile(pattern, flags).subn(repl, string, count)
File "/home/karim/build/python/install/lib/python2.7/re.py", line
278, in filter
  return sre_parse.expand_template(template, match)
File "/home/karim/build/python/install/lib/python2.7/sre_parse.py",
line 787, in expand_template
  raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

>>>  re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution..._But this is not the same REGEX._ And the
count=2 does nothing. By default all occurrences should be substituted.

 * *On linux using my good old sed command, it is working with 
my '?'

   (0-1 match):*

*$* echo *' "" '* | sed *'s/\([^\\]\)\?"/\1\\"/g*'*
   \"\"

*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
Thunderbird issue with bold type (appears as stars) but I don't know 
how to fix it yet.

  afterwards
it's probably a good idea to try and explain your goal clearly, in plain
English.


I already did it (cf. the mail queue). But to summarize: I pass the 
expression string to a TCL command, which delimits strings with double 
quotes only.

Indeed I get error with nested double quotes => That's the key problem.

Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive 
double

quotes that can be done with

s = s.replace('""', r'\"\"')

I have already done it as a workaround but I have to add another 
replacement before to consider all other cases.

I want to make the original command work to suppress the workaround.



but that's probably *not* what you want. Assuming you want to escape two
consecutive double quotes and make sure that the first one isn't already
escaped,


You hit it !:-)


this is my attempt:


def sub(m):

... s = m.group()
... return r'\"\"' if s == '""' else s
...
print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" 
\\" \"')


That is not the thing I want. I want to escape any " which are not 
already escaped.
The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have 
made regex on unix since 15 years).


For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'
'?' is not accepted. Why? It means a character that is not a backslash, 
with 0 or 1 occurrence. This is quite simple.


I am a poor tradesman but I don't deny evidence.


Recall:

>>> re.subn(r'([^\\])?"', r'\1\\"', expression)

Traceback (most recent call last):
File "", line 1, in
File "/home/karim/build/python/install/lib/python2.7/re.py", line
162, in subn
  return _compile(pattern, flags).subn(repl, string, count)
File "/home/karim/build/python/install/lib/python2.7/re.py", line
278, in filter
  return sre_parse.expand_template(template, match)
File "/home/karim/build/python/install/lib/python2.7/sre_parse.py",
line 787, in expand_template
  raise error, "unmatched group"
sre_constants.error: unmatched group


Found the solution: '?' needs to be inside parenthesis (saved pattern) 
because outside we don't know if the saved match argument

will exist or not namely '\1'.

>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)

(' \\"\\" ', 2)

sed unix command is more permissive: sed 's/\([^\\]\)\?"/\1\\"/g' 
because '?' can be outside parenthesis (saved pattern but escaped for sed).
\1 seems to not cause issue when matching is found. Perhaps it is 
created only when match occurs.


MORALITY:

1) Python's behaviour is logical and I must understand what I do with it.
2) sed is a fantastic tool because it manages match value when missing.
3) I am a real poor tradesman

Regards
Karim



Regards
Karim


\\\"" \\\"\" \"" \"\" \\\" \\" \"

Compare that with

$ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\"\" \\"\" \"\" \"\" " \\\" \\"

Concerning the exception and the discrepancy between sed and python's 
re, I

suggest that you ask it again on comp.lang.python aka the python-list
mailing list where at least one regex guru will read it.

Peter


Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim

On 02/03/2011 11:20 PM, Dave Angel wrote:

On 01/-10/-28163 02:59 PM, Karim wrote:

On 02/03/2011 02:15 PM, Peter Otten wrote:

Karim wrote:
  (snip>

*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;

Thunderbird issue with bold type (appears as stars) but I don't know how
to fix it yet.


The simple fix is not to try to add bold or colors on a text message. 
Python-tutor is a text list, not an html one.  Thunderbird tries to 
accommodate you by adding the asterisks, which is fine if it's regular 
English.  But in program code, it obviously confuses things.


While I've got you, can I urge you not to top-post?  In this message, 
you correctly added your remarks after the part you were quoting.  But 
many times you put your comments at the top, which is backwards.


DaveA



Sorry Dave,

I will try and do my best to avoid bold and top-post in the future.

Regards
Karim


Re: [Tutor] RE module is working ?

2011-02-03 Thread Dave Angel

On 01/-10/-28163 02:59 PM, Karim wrote:

On 02/03/2011 02:15 PM, Peter Otten wrote:

Karim wrote:
  (snip>

*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;

Thunderbird issue with bold type (appears as stars) but I don't know how
to fix it yet.


The simple fix is not to try to add bold or colors on a text message. 
Python-tutor is a text list, not an html one.  Thunderbird tries to 
accommodate you by adding the asterisks, which is fine if it's regular 
English.  But in program code, it obviously confuses things.


While I've got you, can I urge you not to top-post?  In this message, 
you correctly added your remarks after the part you were quoting.  But 
many times you put your comments at the top, which is backwards.


DaveA

--
--
da...@ieee.org


Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim

On 02/03/2011 02:15 PM, Peter Otten wrote:

Karim wrote:


I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
consecutive double quotes:

 * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
  >>>  expression = *' "" '*
  >>>  re.subn(*r'([^\\])?"', r'\1\\"', expression*)
Traceback (most recent call last):
File "", line 1, in
File "/home/karim/build/python/install/lib/python2.7/re.py", line
162, in subn
  return _compile(pattern, flags).subn(repl, string, count)
File "/home/karim/build/python/install/lib/python2.7/re.py", line
278, in filter
  return sre_parse.expand_template(template, match)
File "/home/karim/build/python/install/lib/python2.7/sre_parse.py",
line 787, in expand_template
  raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

  >>>  re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution..._But this is not the same REGEX._ And the
count=2 does nothing. By default all occurrences should be substituted.

 * *On linux using my good old sed command, it is working with my '?'
   (0-1 match):*

*$* echo *' "" '* | sed *'s/\([^\\]\)\?"/\1\\"/g*'*
   \"\"

*Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first;
Thunderbird issue with bold type (appears as stars) but I don't know how 
to fix it yet.

  afterwards
it's probably a good idea to try and explain your goal clearly, in plain
English.


I already did it (cf. the mail queue). But to summarize: I pass the 
expression string to a TCL command, which delimits strings with double 
quotes only.

Indeed I get error with nested double quotes => That's the key problem.

Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double
quotes that can be done with

s = s.replace('""', r'\"\"')

I have already done it as a workaround but I have to add another 
replacement before to consider all other cases.

I want to make the original command work to suppress the workaround.



but that's probably *not* what you want. Assuming you want to escape two
consecutive double quotes and make sure that the first one isn't already
escaped,


You hit it !:-)


this is my attempt:


def sub(m):

... s = m.group()
... return r'\"\"' if s == '""' else s
...

print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')


That is not the thing I want. I want to escape any " which are not 
already escaped.
The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have 
made regex on unix since 15 years).


For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'
'?' is not accepted. Why? It means a character that is not a backslash, with 
0 or 1 occurrence. This is quite simple.


I am a poor tradesman but I don't deny evidence.

Regards
Karim


\\\"" \\\"\" \"" \"\" \\\" \\" \"

Compare that with

$ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\"\" \\"\" \"\" \"\" " \\\" \\"

Concerning the exception and the discrepancy between sed and python's re, I
suggest that you ask it again on comp.lang.python aka the python-list
mailing list where at least one regex guru will read it.

Peter



Re: [Tutor] RE module is working ?

2011-02-03 Thread Peter Otten
Karim wrote:

> I am trying to subsitute a '""' pattern in '\"\"' namely escape 2
> consecutives double quotes:
> 
> * *In Python interpreter:*
> 
> $ python
> Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> expression = *' "" '*
>  >>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)
> Traceback (most recent call last):
>File "", line 1, in 
>File "/home/karim/build/python/install/lib/python2.7/re.py", line
> 162, in subn
>  return _compile(pattern, flags).subn(repl, string, count)
>File "/home/karim/build/python/install/lib/python2.7/re.py", line
> 278, in filter
>  return sre_parse.expand_template(template, match)
>File "/home/karim/build/python/install/lib/python2.7/sre_parse.py",
> line 787, in expand_template
>  raise error, "unmatched group"
> sre_constants.error: unmatched group
> 
> But if I remove '?' I get the following:
> 
>  >>> re.subn(r'([^\\])"', r'\1\\"', expression)
> (' \\"" ', 1)
> 
> Only one substitution..._But this is not the same REGEX._ And the
> count=2 does nothing. By default all occurrence shoul be substituted.
> 
> * *On linux using my good old sed command, it is working with my '?'
>   (0-1 match):*
> 
> *$* echo *' "" '* | sed *'s/\([^\\]\)\?"/\1\\"/g*'*
>   \"\"
> 
> *Indeed what's the matter with RE module!?*

You should really fix the problem with your email program first; afterwards 
it's probably a good idea to try and explain your goal clearly, in plain 
English.

Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double 
quotes that can be done with

s = s.replace('""', r'\"\"')

but that's probably *not* what you want. Assuming you want to escape two 
consecutive double quotes and make sure that the first one isn't already 
escaped, this is my attempt:

>>> def sub(m):
...     s = m.group()
...     return r'\"\"' if s == '""' else s
...
>>> print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')
\\\"" \\\"\" \"" \"\" \\\" \\" \"

Compare that with

$ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\"\" \\"\" \"\" \"\" " \\\" \\"

Concerning the exception and the discrepancy between sed and python's re, I 
suggest that you ask it again on comp.lang.python aka the python-list 
mailing list where at least one regex guru will read it.
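
Coming back to the escaping problem itself: the tokenizing idea from the attempt above can be stretched to cover the goal stated later in the thread, namely escaping every double quote that is not already escaped, including the case of an escaped backslash in front of a quote. The following is only a sketch under that assumption; TOKEN, escape_quotes and fix are names invented here, not something taken from the thread.

```python
import re

# Either a backslash plus the character it escapes, or a bare double quote.
TOKEN = re.compile(r'\\.|"', re.DOTALL)


def escape_quotes(text):
    """Escape double quotes that are not already escaped."""
    def fix(m):
        s = m.group()
        # Existing escape pairs are copied unchanged; only a bare quote
        # gets a backslash put in front of it.
        return '\\"' if s == '"' else s
    return TOKEN.sub(fix, text)


print(escape_quotes(' "" '))
print(escape_quotes(r'Some \"text"'))
```

Because the backslash and the character after it are consumed together, two backslashes followed by a quote are read as an escaped backslash plus an unescaped quote, so that quote does get escaped.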

Peter



Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim


I forgot something. There is no issue with python and double quotes.
But I need to give it to a TCL script, and as TCL is shit a string is only 
delimited by double quotes.
Thus I need to escape it so as not to get a syntax error with nested double 
quotes.


Regards
The poor tradesman


On 02/03/2011 12:45 PM, Karim wrote:


Hello Steven,

I am perhaps a poor tradesman but I have to blame my thunderbird tool 
:-P .

Because expression = *' "" '*  is in fact expression = ' "" '.
The bold appear as stars I don't know why. I need to have escapes for 
passing it to another language (TCL interpreter).

So I will rewrite it not _in bold_:

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = ' "" '

>>> re.subn(r'([^\\])?"', r'\1\\"', expression)

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

* On linux using my good old sed command, it is working with my
  '?' (0-1 match):

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'*
* \"\"

For me linux/unix sed utility is trusty and is the reference.

Regards
Karim


On 02/03/2011 11:43 AM, Steven D'Aprano wrote:

Karim wrote:


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


You don't have to escape quotes. Just use the other sort of quote:

>>> print '""'
""



   * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = *' "" '*


No, I'm sorry, that's incorrect -- that gives a syntax error in every 
version of Python I know of, including version 2.7:


>>> expression = *' "" '*
  File "<stdin>", line 1
    expression = *' "" '*
                 ^
SyntaxError: invalid syntax


So what are you really running?




>>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)


Likewise here. *r'...' is a syntax error, as is expression*)

I don't understand what you are running or why you are getting the 
results you are.



> *Indeed what's the matter with RE module!?*

There are asterisks all over your post! Where are they coming from?

What makes you think the problem is with the RE module?

We have a saying in English:

"The poor tradesman blames his tools."

Don't you think it's more likely that the problem is that you are 
using the module wrongly?


I don't understand what you are trying to do, so I can't tell you how 
to do it. Can you give an example of what you want to start with, and 
what you want to end up with? NOT Python code, just literal text, 
like you would type into a letter.


E.g. ABC means literally A followed by B followed by C.
\" means literally backslash followed by double-quote









Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim


Hello Steven,

I am perhaps a poor tradesman but I have to blame my thunderbird tool :-P .
Because expression = *' "" '*  is in fact expression = ' "" '.
The bold appear as stars I don't know why. I need to have escapes for 
passing it to another language (TCL interpreter).

So I will rewrite it not _in bold_:

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = ' "" '

>>> re.subn(r'([^\\])?"', r'\1\\"', expression)

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

   * On Linux, using my good old sed command, it works with my '?'
 (0-1 match):

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
 \"\"

For me the Linux/Unix sed utility is trustworthy and is the reference.
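
One possible way to get the sed result from Python is sketched below. It is not taken from the thread: it swaps the optional capturing group for a negative lookbehind, so the character before each quote is checked without being consumed and no group can be left unmatched. The names consuming, escaped and count are only illustrative. As far as I know, Python 3.5 and later also substitute an empty string for an unmatched group, so the original pattern no longer raises there.

```python
import re

expression = ' "" '

# The version without '?' consumes the character in front of each quote.
# After the first match (space + quote) the second quote has no unconsumed
# character left in front of it, so only one substitution is made.
consuming = re.subn(r'([^\\])"', r'\1\\"', expression)

# A negative lookbehind only inspects the preceding character, so every
# quote that is not already escaped gets a backslash.
escaped, count = re.subn(r'(?<!\\)"', r'\\"', expression)

print(consuming)
print(escaped, count)
```

The lookbehind form also leaves quotes that are already escaped untouched, which seems to be the intent of the [^\\] in the original pattern.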

Regards
Karim


On 02/03/2011 11:43 AM, Steven D'Aprano wrote:

Karim wrote:


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


You don't have to escape quotes. Just use the other sort of quote:

>>> print '""'
""



   * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = *' "" '*


No, I'm sorry, that's incorrect -- that gives a syntax error in every 
version of Python I know of, including version 2.7:


>>> expression = *' "" '*
  File "<stdin>", line 1
    expression = *' "" '*
                 ^
SyntaxError: invalid syntax


So what are you really running?




>>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)


Likewise here. *r'...' is a syntax error, as is expression*)

I don't understand what you are running or why you are getting the 
results you are.



> *Indeed what's the matter with RE module!?*

There are asterisks all over your post! Where are they coming from?

What makes you think the problem is with the RE module?

We have a saying in English:

"The poor tradesman blames his tools."

Don't you think it's more likely that the problem is that you are 
using the module wrongly?


I don't understand what you are trying to do, so I can't tell you how 
to do it. Can you give an example of what you want to start with, and 
what you want to end up with? NOT Python code, just literal text, like 
you would type into a letter.


E.g. ABC means literally A followed by B followed by C.
\" means literally backslash followed by double-quote








Re: [Tutor] RE module is working ?

2011-02-03 Thread Steven D'Aprano

Karim wrote:


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


You don't have to escape quotes. Just use the other sort of quote:

>>> print '""'
""



   * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> expression = *' "" '*


No, I'm sorry, that's incorrect -- that gives a syntax error in every 
version of Python I know of, including version 2.7:


>>> expression = *' "" '*
  File "<stdin>", line 1
    expression = *' "" '*
                 ^
SyntaxError: invalid syntax


So what are you really running?




 >>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)


Likewise here. *r'...' is a syntax error, as is expression*)

I don't understand what you are running or why you are getting the 
results you are.



> *Indeed what's the matter with RE module!?*

There are asterisks all over your post! Where are they coming from?

What makes you think the problem is with the RE module?

We have a saying in English:

"The poor tradesman blames his tools."

Don't you think it's more likely that the problem is that you are using 
the module wrongly?


I don't understand what you are trying to do, so I can't tell you how to 
do it. Can you give an example of what you want to start with, and what 
you want to end up with? NOT Python code, just literal text, like you 
would type into a letter.


E.g. ABC means literally A followed by B followed by C.
\" means literally backslash followed by double-quote




--
Steven



Re: [Tutor] RE module is working ?

2011-02-03 Thread Karim


Hello,

Any news on this topic? O:-)

Regards
Karim

On 02/02/2011 08:21 PM, Karim wrote:


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


* *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = *' "" '*
>>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution..._But this is not the same REGEX._ And the 
count=2 does nothing. By default all occurrences should be substituted.


* *On linux using my good old sed command, it is working with my
  '?' (0-1 match):*

*$* echo *' "" '* | sed *'s/\([^\\]\)\?"/\1\\"/g*'*
 \"\"

*Indeed what's the matter with RE module!?*

*Any idea will be welcome!

Regards
Karim*
*




[Tutor] RE module is working ?

2011-02-02 Thread Karim


Hello,

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2 
consecutive double quotes:


   * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> expression = *' "" '*
>>> re.subn(*r'([^\\])?"', r'\1\\"', expression*)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution..._But this is not the same REGEX._ And the 
count=2 does nothing. By default all occurrences should be substituted.


   * *On linux using my good old sed command, it is working with my '?'
 (0-1 match):*

*$* echo *' "" '* | sed *'s/\([^\\]\)\?"/\1\\"/g*'*
 \"\"

*Indeed what's the matter with RE module!?*

*Any idea will be welcome!

Regards
Karim*
*


Re: [Tutor] re module / separator

2009-06-25 Thread Tiago Saboga
Thanks Kent! Once more you go straight to the point!

Kent Johnson writes:
> On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga wrote:
>> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
>> Out[33]: 'a45453b. a325643b. '
>
> group(0) is the entire match so this returns what you expect. But what
> is group(1)?
>
> In [6]: re.search("(a[^.]*?b\.\s?){2}", text).group(1)
> Out[6]: 'a325643b. '
>
> Repeated groups are tricky; the returned value contains only the last
> match for the group, not every repeat.

The problem was exactly that. I had seen that findall got the first
group of the match, but not that this would not span repeats. But it
makes sense, as the repeat count is after the parens. 

> If you change the inner parentheses to be non-grouping then you get
> pretty much what you want:
>
> In [8]: re.findall("((?:a[^.]*?b\.\s?)+)", text)
> Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']

And the trick of the non-grouping parens is great too. Thanks again!

Tiago.


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Ok -- realized my "solution" incorrectly strips white space from
multiword strings:

> Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']
>

So here are some more gymnastics to get the correct result:

In [105]: newlist
Out[105]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [109]: lastlist2 = " ".join(newlist).rstrip("|").split("|")

In [110]: lastlist3 = [item.strip() for item in lastlist2]

In [111]: lastlist3
Out[111]: ['a2345b.', 'a45453b. a325643b. a435643b.']
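
The same grouping can probably be done without the "|" placeholder. The sketch below is an alternative, not what was posted above: it uses itertools.groupby to collect consecutive words that match the pattern. The helper name runs_of_matches is made up for illustration, and re.fullmatch assumes Python 3.4 or later.

```python
import re
from itertools import groupby

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
pat = re.compile(r"a\w+b\.")


def runs_of_matches(text):
    """Join each run of consecutive matching words into one string."""
    words = text.split()
    return [
        " ".join(group)
        # groupby yields (key, run) pairs; the key is True for matching words.
        for is_match, group in groupby(
            words, key=lambda w: pat.fullmatch(w) is not None
        )
        if is_match
    ]


result = runs_of_matches(text)
print(result)
```

Because the words are joined with a single space, the whitespace inside each multiword item is kept and no stripping pass is needed afterwards.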


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
As usual, Kent Johnson has swooped in and untangled the mess with a
clear explanation.

By the time a regex gets this complicated, I typically start thinking
of ways to simplify or avoid them altogether.

Below is the code I came up with. It goes through some gymnastics and
can surely stand improvement, but it seems to get the job done.
Suggestions are welcome.


In [83]: text
Out[83]: 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'

In [84]: textlist = text.split()

In [85]: textlist
Out[85]: ['a2345b.', 'f325.', 'a45453b.', 'a325643b.', 'a435643b.', 'g234324b.']

In [86]: newlist = []

In [87]: pat = re.compile(r'a\w+b\.')

In [88]: for item in textlist:
   ....:     if pat.match(item):
   ....:         newlist.append(item)
   ....:     else:
   ....:         newlist.append("|")
   ....:
   ....:

In [89]: newlist
Out[89]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [90]: lastlist = ''.join(newlist)

In [91]: lastlist
Out[91]: 'a2345b.|a45453b.a325643b.a435643b.|'

In [92]: lastlist.rstrip("|").split("|")
Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']


Re: [Tutor] re module / separator

2009-06-24 Thread Kent Johnson
On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga wrote:
> Hi!
>
> I am trying to split some lists out of a single text file, and I am
> having a hard time. I have reduced the problem to the following one:
>
> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with a, end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>
> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:
>
> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
> Out[33]: 'a45453b. a325643b. '

group(0) is the entire match so this returns what you expect. But what
is group(1)?

In [6]: re.search("(a[^.]*?b\.\s?){2}", text).group(1)
Out[6]: 'a325643b. '

Repeated groups are tricky; the returned value contains only the last
match for the group, not every repeat.

> In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
> Out[34]: ['a325643b. ']

When the re contains groups, re.findall() returns the groups. It
doesn't return the whole match. So this is giving group(1), not
group(0). You can get the whole match by explicitly grouping it:

In [4]: re.findall("((a[^.]*?b\.\s?){2})", text)
Out[4]: [('a45453b. a325643b. ', 'a325643b. ')]

> In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
> Out[35]: 'a2345b. '

You only get the first match, so this is correct.

> In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
> Out[36]: ['a2345b. ', 'a435643b. ']

This is finding both matches but the grouping has the same difficulty
as the previous findall(). This is closer:

In [7]: re.findall("((a[^.]*?b\.\s?)+)", text)
Out[7]: [('a2345b. ', 'a2345b. '), ('a45453b. a325643b. a435643b. ',
'a435643b. ')]

If you change the inner parentheses to be non-grouping then you get
pretty much what you want:

In [8]: re.findall("((?:a[^.]*?b\.\s?)+)", text)
Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']
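
For reference, here is a small script that puts the variants side by side. It is only a sketch using the sample text from the question; the variable names and the finditer line are additions, not something from the original post.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# Capturing group under a quantifier: findall reports the group, and the
# group keeps only its last repetition.
last_repeats = re.findall(r"(a[^.]*?b\.\s?)+", text)

# Non-capturing inner group wrapped in one outer group: findall reports
# each whole run of matching words.
runs = re.findall(r"((?:a[^.]*?b\.\s?)+)", text)

# finditer avoids the question entirely: group(0) is always the whole match.
whole = [m.group(0).strip()
         for m in re.finditer(r"(?:a[^.]*?b\.\s?)+", text)]

print(last_repeats)
print(runs)
print(whole)
```

The strip() call is only there to drop the trailing space that the optional \s? picks up.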

Kent


Re: [Tutor] re module / separator

2009-06-24 Thread Tiago Saboga
Serdar Tumgoren writes:

> Hey Tiago,
>
>> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>>
>> Of this line of text, I want to take out strings where all words start
>> with a, end with "b.". But I don't want a list of words. I want that:
>>
>> ["a2345b.", "a45453b. a325643b. a435643b."]
>>
>
> Are you saying you want a list of every item that starts with an "a"
> and ends with a "b"? If so, the above list is not what you're after.
> It only contains two items:
>   a2345b.
>   a45453b. a325643b. a435643b.

Yes, I want to find only two items. I want every sequence of words where
every word begins with an "a" and ends with "b.".

> Try reading this:
> http://www.amk.ca/python/howto/regex/

I have read it several times, and I thought I understood it quite well ;)

I don't have the time right now, but if it turns out to be
useful, I can show how I arrived at the patterns I sent to the list.

Thanks,

Tiago.


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Apologies -- I just reread your post and it appears you also want to
capture the dot after each "b" ( "b." )

In that case, you need to update the pattern to match for the dot. But
because the dot is itself a metacharacter, you have to escape it with
a backslash:

In [23]: re.findall(r'a\w+b\.',text)
Out[23]: ['a2345b.', 'a45453b.', 'a325643b.', 'a435643b.']

Again, all of these features are explained nicely at
http://www.amk.ca/python/howto/regex/


Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Hey Tiago,

> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with a, end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>

Are you saying you want a list of every item that starts with an "a"
and ends with a "b"? If so, the above list is not what you're after.
It only contains two items:
  a2345b.
  a45453b. a325643b. a435643b.

You can verify this by trying len(["a2345b.", "a45453b. a325643b.
a435643b."]).  You can also see that each item is wrapped in double
quotes and separated by a comma.

> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:

Try reading this:
http://www.amk.ca/python/howto/regex/

I've found it to be a very gentle and useful introduction to regexes.

It explains, among other things, what the search and findall methods
do. If I'm understanding your problem correctly, you probably want the
findall method:

You should definitely take the time to read up on regexes. Your
patterns grew too complex for this problem (again, if I'm
understanding you right) which is probably why you're not
understanding your results.

In [9]:   re.findall(r'a[a-z0-9]+b',text)
Out[9]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

There are other ways to perform the above, for instance using the "\w"
metacharacter to match any alphanumeric.

In [20]: re.findall(r'a\w+b',text)
Out[20]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

Or, to get even more (needlessly) complicated:

In [21]: re.findall(r'\ba\w+b\b',text)
Out[21]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']

As you learned, regexes can get really complicated, really quickly if
you don't understand the syntax.  Others with more experience might
offer more elegant solutions to your problem, but I'd still encourage
you to read up on the basics and get comfortable with the re module.
It's a great tool once you understand it.

Best of luck,
Serdar


[Tutor] re module / separator

2009-06-24 Thread Tiago Saboga
Hi!

I am trying to split some lists out of a single text file, and I am
having a hard time. I have reduced the problem to the following one:

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

Of this line of text, I want to take out strings where all words start
with a, end with "b.". But I don't want a list of words. I want that:

["a2345b.", "a45453b. a325643b. a435643b."]

And I feel I still don't fully understand regular expression's logic. I
do not understand the results below:

In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
Out[33]: 'a45453b. a325643b. '

In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
Out[34]: ['a325643b. ']

In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
Out[35]: 'a2345b. '

In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
Out[36]: ['a2345b. ', 'a435643b. ']


What's the difference between search and findall in [33-34]? And why
can't I generalize [33] to [35]? Out[35] would make sense to me if I had
used a non-greedy +, but why does re get only one word?

Thanks,

Tiago Saboga.


[Tutor] Re: Module Loop doesn't work (Joseph Q.)

2005-04-01 Thread Andrei
Joseph Quigley wrote on Fri, 01 Apr 2005 10:07:08 -0600:

>   I have some code on a geek dictionary that I'm making where the command 
> geeker() opens a module for the "real" geek dictionary (where you can type 
> a word to see what it is geekified). Supposedly, you type lobby() to go 
> back to what I call  the lobby (where you can get info on the web site and 
> email and version). But it just loops back to the Geeker>>> prompt where 
> you type the word that you want geekified. I even tried having it restart 
> the whole program by importing the index module that I wrote. But it still 
> won't restart the program!

Without seeing your code, I doubt anyone will be able to solve your problem
except by pure chance. In addition to that, I'm confused by the use of
function calls in what seems to be essentially a menu system. 

Speaking in general terms, the way you could handle this is as follows:
- have a main menu loop (what you call the lobby) which accepts user input
and based on that input calls other functions which perform certain tasks
(e.g. open a webpage or go to the dictionary part)
- the dictionary part would in turn be another loop accepting words as
input which get 'translated', until the user gives a blank string or
whatever as input in order to terminate the loop (and automatically fall
back into the loop of the lobby)
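
A minimal sketch of that structure follows. Everything in it is hypothetical, since the original code was not posted: geekify() is a stand-in for the real dictionary lookup, and the command names and prompts are only guesses based on the description. The read and write parameters exist only so the loops can be driven without a keyboard.

```python
def geekify(word):
    # Hypothetical stand-in for the real "geekified" lookup.
    return word.upper()


def dictionary_loop(read=input, write=print):
    """Inner loop: translate words until a blank line is entered."""
    while True:
        word = read("Geeker>>> ").strip()
        if not word:
            # Returning drops us back into the lobby loop automatically.
            return
        write(geekify(word))


def lobby(read=input, write=print):
    """Main menu loop: dispatch on the user's command."""
    while True:
        choice = read("Lobby>>> ").strip().lower()
        if choice == "geeker":
            dictionary_loop(read, write)
        elif choice == "info":
            write("Geek dictionary, version 0.1")  # placeholder text
        elif choice == "quit":
            return
        else:
            write("Commands: geeker, info, quit")
```

Calling lobby() with no arguments starts an interactive session. Note that nothing is re-imported and no function calls itself to "go back": leaving the inner loop with return is enough to land in the lobby again.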

-- 
Yours,

Andrei

=
Real contact info (decode with rot13):
[EMAIL PROTECTED] Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.
