Re: [Tutor] regular expression query

2019-06-09 Thread Cameron Simpson

On 08Jun2019 22:27, Sean Murphy  wrote:

Windows 10 OS, Python 3.6


Thanks for this.

I have a couple of  queries  in relation to extracting content using 
regular expressions. I understand [...the regexp syntax...]

The challenge I am finding is getting a pattern to
extract specific word(s). Trying to identify the best method to use and how
to use the \1 when using forward and backward search pattern (Hoping I am
using the right term). Basically I am trying to extract specific phrases or
digits to place in a dictionary within categories. Thus if "ROYaL_BANK
123123123" is found, it is placed in a category called transfer funds. Other
might be a store name which likewise is placed in the store category.


I'll tackle your specific examples lower down, and make some 
suggestions.


Note, I have found a logic error with "ROYAL_BANK 123123123", but that 
isn't a concern. The extraction of the text is.


Line examples:
Royal_bank M-BANKING PAYMENT TRANSFER 123456 to 9922992299
Royal_bank M-BANKING PAYMENT TRANSFER 123456 FROM 9922992299
PAYMENT TO SARWARS-123123123
ROYAL_BANK INTERNET BANKING BPAY Kangaroo Store {123123123}
EFTPOS Amazon
PAY/SALARY FROM foo bar 123123123
PAYMENT TO Tax Man  666


Thanks.

Assuming the below is a cut/paste accident from some code:

 result = re.sub(r'ROYAL_BANK INTERNET BANKING FUNDS TFER TRANSFER \d+ TO ', 
'ROYAL_BANK ', line)
 r'ROYAL_BANK INTERNET BANKING TRANSFER Mouth in foot


And other similar structures. Below is the function I am currently using.
Not sure if the sub, match or search is going to be the best method. The
reason why I am using a sub is to delete the unwanted text. The
searchmatch/findall  could do the same if I use a group. Also I have not
used any tests in the below and logically I think I should. As the code will
override the results if not found in the later tests. If there is a more
elegant  way to do it then having:

If line.startswith('text string to match'):
   Regular expression
el If line.startswith('text string to match'):
   regular expression
return result


There is. How far you take it depends on how variable your input is.
I would expect banking statement data to have relatively few formats
(unless the banking/finance industry is every bit as fragmented as I
sometimes believe, in which case the structure might be driven less by
_your_ bank and more by whatever ad hoc junk the various other entities
supply as the description).


I would like to know. The different regular expressions I have used 
are:


# this sometimes matches and sometimes does not. I want all the text up to
the from or to, to be replaced with "ROYAL_BANK". Ending up with ROYAL_BANK
123123123

   result= re.sub(r'ROYAL_BANK M-BANKING PAYMENT TRANSFER \d+ (TO|FROM) ',
'ROYAL_BANK ', line)


Looks superficially ok. Got an example input line where it fails? Note
that the above is case sensitive, so if "to" etc. can be in lower case
(as in your example text earlier) this will fail. See the re.I modifier.
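
A minimal sketch of the case-insensitive version, run against one of the sample lines from earlier in the thread. The names PREFIX and normalise are made up for illustration and are not part of the original function.

```python
import re

# Compile once with re.I so "to"/"TO" and "Royal_bank"/"ROYAL_BANK" both match.
PREFIX = re.compile(r'ROYAL_BANK M-BANKING PAYMENT TRANSFER \d+ (TO|FROM) ', re.I)

def normalise(line):
    """Replace the transfer preamble with a fixed 'ROYAL_BANK ' prefix."""
    return PREFIX.sub('ROYAL_BANK ', line)

print(normalise('Royal_bank M-BANKING PAYMENT TRANSFER 123456 to 9922992299'))
print(normalise('EFTPOS Amazon'))  # no match: the line comes back unchanged
```

A line that does not match is returned untouched, which is worth remembering when several re.sub calls are chained one after another.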



# the below  returns from STARWARS and it shouldn't. I should just get
STARWARS.

   result = re.match(r'PAYMENT TO (SARWARS)-\d+ ', line)


Well, STARWARS seems misseplt above. And you should get a "match" 
object, with "STARWARS" in .group(1).


So earlier you're getting a str in result, and here you're getting an 
re.match object (or None for a failed match).
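
The difference in return types can be seen in a short sketch. It assumes the spelling is settled on STARWARS and uses a line shaped like the sample data, which ends immediately after the digits.

```python
import re

line = 'PAYMENT TO STARWARS-123123123'

# re.sub always returns a str, whether or not anything was replaced.
as_text = re.sub(r'PAYMENT TO ', '', line)

# re.match returns a match object on success...
m = re.match(r'PAYMENT TO (STARWARS)-\d+', line)
payee = m.group(1) if m else None

# ...and None on failure. The trailing space in this pattern is never
# found, because the sample line ends straight after the digits.
failed = re.match(r'PAYMENT TO (STARWARS)-\d+ ', line)

print(type(as_text).__name__, payee, failed)
```

Testing the result for None before calling .group() avoids an AttributeError on lines that do not match.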


# the below should (doesn't work the last time I tested it) should 
return the words between the (.)


   result = re.match(r'ROYAL_BANK INTERNET BANKING BPAY (.*) [{].*$', '\1', 
line)


"should" what? It would help to see the input line you expect this to 
match. And re.match is not re.sub - it looks like you have the two 
confused here, judging by the trailing '\1', line parameters.
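
For comparison, here is a sketch of both ways to pull out the words before the braces, using the BPAY sample line from the original post. One detail that is easy to miss: the replacement has to be a raw string, because in a plain string '\1' is the single control character chr(1) rather than a backreference.

```python
import re

line = 'ROYAL_BANK INTERNET BANKING BPAY Kangaroo Store {123123123}'
pattern = r'ROYAL_BANK INTERNET BANKING BPAY (.*) [{].*$'

# Option 1: re.sub with a backreference to group 1 (note the raw string).
via_sub = re.sub(pattern, r'\1', line)

# Option 2: re.match, then read the group from the match object.
m = re.match(pattern, line)
via_match = m.group(1) if m else None

print(via_sub, '|', via_match)
```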



# the below patterns should remove the text at the beginning of the string
   result = re.sub(r'ROYAL_BANK INTERNET BANKING FUNDS TFER TRANSFER \d+ TO ', 
'ROYAL_BANK ', line)
   result = re.sub(r'ROYAL_BANK INTERNET BANKING TRANSFER ', '', line)
   result = re.sub(r'EFTPOS ', '', line)


Sure. Got an example line where this does not happen?

# The below does not work and I am trying to use the back or forward 
search feature. Is this syntax wrong or the pattern wrong? I cannot work it out

from the information I have read.

result = re.sub(r'PAY/SALARY FROM (*.) \d+$', '\1', line)
   result = re.sub(r'PAYMENT TO (*.) \d+', '\1', line)


You've got "*." You probably mean ".*"
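
A small sketch of what goes wrong, using the PAY/SALARY sample line: with the operator first, the pattern does not even compile, whereas the corrected order captures the payer's name.

```python
import re

line = 'PAY/SALARY FROM foo bar 123123123'

# '(*.)' puts the repeat operator first, so there is nothing for it to repeat.
try:
    re.compile(r'PAY/SALARY FROM (*.) \d+$')
    compiled_ok = True
except re.error:
    compiled_ok = False

# With '.*' the group captures everything before the final run of digits.
payer = re.sub(r'PAY/SALARY FROM (.*) \d+$', r'\1', line)

print(compiled_ok, payer)
```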

Main issues:

1: Your input data seems to be mixed case, but all your regexps are case 
sensitive. They will not match if the case is different, e.g. "Royal_Bank" 
vs "ROYAL_BANK", "to" vs "TO", etc. Use the re.I modifier to make your 
regexps case insensitive.


2: You're using re.sub a lot. I'd be inclined to always use re.match and 
to pull information from the match object you get back. Untested example 
sketch:


 m = re.match('(ROYAL_BANK|COMMONER_CREDIT_UNION) INTERNET BANKING FUNDS TFER 
TRANSFER (\d+) TO (.*)'
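
The sketch above is cut off in the archive. One way the idea could be carried through is a table of compiled patterns tried in order, which also replaces the startswith/elif chain. Everything below is an assumption made for illustration: the RULES table, the classify function, the category names and the exact patterns are not from the original message.

```python
import re

# (compiled pattern, category) pairs, tried in order; the first match wins.
RULES = [
    (re.compile(r'(?:ROYAL_BANK|COMMONER_CREDIT_UNION) M-BANKING PAYMENT '
                r'TRANSFER \d+ (?:TO|FROM) (?P<detail>\d+)', re.I),
     'transfer funds'),
    (re.compile(r'PAYMENT TO (?P<detail>.*?)\s*-?\d+$', re.I), 'payment'),
    (re.compile(r'EFTPOS (?P<detail>.*)', re.I), 'store'),
]

def classify(line):
    """Return (category, extracted text), or (None, line) if no rule matches."""
    for pattern, category in RULES:
        m = pattern.match(line)
        if m:
            return category, m.group('detail')
    return None, line

# Collect the extracted text per category in a dictionary.
categories = {}
for sample in ['Royal_bank M-BANKING PAYMENT TRANSFER 123456 to 9922992299',
               'PAYMENT TO Tax Man  666',
               'EFTPOS Amazon']:
    category, detail = classify(sample)
    categories.setdefault(category, []).append(detail)

print(categories)
```

Adding a new statement format then means adding one entry to the table rather than another branch of code.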

[Tutor] regular expression query

2019-06-08 Thread mhysnm1964
Hello all,

 

Windows 10 OS, Python 3.6

 

I have a couple of  queries  in relation to extracting content using regular
expressions. I understand the pattern chars (.?*+), Meta-chars \d, \D, \w,
\W and so on. The class structure [.]. The group I believe I understand (.).
The repeat feature {m,n}. the difference between the methods match, search,
findall, sub and ETC. The challenge I am finding is getting a pattern to
extract specific word(s). Trying to identify the best method to use and how
to use the \1 when using forward and backward search pattern (Hoping I am
using the right term). Basically I am trying to extract specific phrases or
digits to place in a dictionary within categories. Thus if "ROYaL_BANK
123123123" is found, it is placed in a category called transfer funds. Other
might be a store name which likewise is placed in the store category. 

 

Note, I have found a logic error with "ROYAL_BANK 123123123", but that isn't
a concern. The extraction of the text is.

 

Line examples:

 

Royal_bank M-BANKING PAYMENT TRANSFER 123456 to 9922992299

Royal_bank M-BANKING PAYMENT TRANSFER 123456 FROM 9922992299

PAYMENT TO SARWARS-123123123

ROYAL_BANK INTERNET BANKING BPAY Kangaroo Store {123123123}result =
re.sub(r'ROYAL_BANK INTERNET BANKING FUNDS TFER TRANSFER \d+ TO ',
'ROYAL_BANK ', line)

r'ROYAL_BANK INTERNET BANKING TRANSFER Mouth in foot

EFTPOS Amazon

PAY/SALARY FROM foo bar 123123123

PAYMENT TO Tax Man  666

 

And other similar structures. Below is the function I am currently using.
Not sure if the sub, match or search is going to be the best method. The
reason why I am using a sub is to delete the unwanted text. The
searchmatch/findall  could do the same if I use a group. Also I have not
used any tests in the below and logically I think I should. As the code will
override the results if not found in the later tests. If there is a more
elegant  way to do it then having:

 

If line.startswith('text string to match'):

Regular expression 

el If line.startswith('text string to match'):

regular expression

return result 

 

I would like to know. The different regular expressions I have used are:

 

# this sometimes matches and sometimes does not. I want all the text up to
the from or to, to be replaced with "ROYAL_BANK". Ending up with ROYAL_BANK
123123123

result= re.sub(r'ROYAL_BANK M-BANKING PAYMENT TRANSFER \d+ (TO|FROM) ',
'ROYAL_BANK ', line)

 

# the below  returns from STARWARS and it shouldn't. I should just get
STARWARS.

result = re.match(r'PAYMENT TO (SARWARS)-\d+ ', line)

 

# the below should (doesn't work the last time I tested it) should return
the words between the (.)

result = re.match(r'ROYAL_BANK INTERNET BANKING BPAY (.*) [{].*$', '\1',
line)

 

# the below patterns should remove the text at the beginning of the string

result = re.sub(r'ROYAL_BANK INTERNET BANKING FUNDS TFER TRANSFER \d+ TO
', 'ROYAL_BANK ', line)

result = re.sub(r'ROYAL_BANK INTERNET BANKING TRANSFER ', '', line)

result = re.sub(r'EFTPOS ', '', line)

 

# The below does not work and I am trying to use the back or forward search
feature. Is this syntax wrong or the pattern wrong? I cannot work it out
from the information I have read.

 result = re.sub(r'PAY/SALARY FROM (*.) \d+$', '\1', line)

result = re.sub(r'PAYMENT TO (*.) \d+', '\1', line)

 

Sean 

 

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] (regular expression)

2016-12-10 Thread Martin A. Brown

Hello Isaac,

This second posting you have made has provided more information 
about what you are trying to accomplish and how (and also was 
readable, where the first one looked like it got mangled by your 
mail user agent; it's best to try to post only plain text messages 
to this sort of mailing list).

I suspect that we can help you a bit more, now.

If we knew even more about what you were looking to do, we might be 
able to help you further (with all of the usual remarks about how we 
won't do your homework for you, but all of us volunteers will gladly 
help you understand the tools, the systems, the world of Python and 
anything else we can suggest in the realm of computers, computer 
science and problem solving).

I will credit the person who assigned this task for you, as this is 
not dissimilar from the sort of problem that one often has when 
facing a new practical computing problem.  Often (and in your case) 
there is opaque structure and hidden assumptions in the question 
which need to be understood.  See further below

These were your four lines of code:

>with urllib.request.urlopen("https://www.sdstate.edu/electrical-engineering-and-computer-science") as cs:
>    cs_page = cs.read()
>    soup = BeautifulSoup(cs_page, "html.parser")
>    print(len(soup.body.find_all(string = ["Engineering","engineering"])))

The fourth line is an impressive attempt at compressing all of the 
searching, finding, counting and reporting steps into a single line.  

Your task (I think), is more complicated than that single line can 
express.  So, that will need to be expanded to a few more lines of 
code.

You may have heard these aphorisms before:

  * brevity is the soul of wit
  * fewer lines of code are better
  * prefer a short elegant solution

But, when complexity intrudes into brevity, the human mind 
struggles.  As a practitioner, I will say that I spend more of my 
time reading and understanding code than writing it, so writing 
simple, self-contained and understandable units of code leads to 
intelligibility for humans and composability for systems.

Try this at a Python console [1].

  import this

>i used control + f on the link in the code and i get 11 for ctrl + 
>f and 3 for the code

Applause!  Look at the raw data!  Study the raw data!  That is an 
excellent way to start to try to understand the raw data.  You must 
always go back to the raw input data and then consider whether your 
tooling or the data model in your program matches what you are 
trying to extract/compute/transform.

The answer (for number of occurrences of the word 'engineering', 
case-insensitive) that I get is close to your answer when searching 
with control + f, but is a bit larger than 11.

Anyway, here are my thoughts.  I will start with some tips that are 
relevant to your 4-line pasted program:

  * BeautifulSoup is wonderfully convenient, but also remember it 
is another high-level tool; it is often forgiving where other 
tools are more rigorous, however it is excellent for learning 
and (I hope you see below) that it is a great tool for the 
problem you are trying to solve

  * in your code, soup.body is a handle that points to the <body> 
tag of the HTML document you have fetched; so why can't you 
simply find_all of the strings "Engineering" and "engineering" 
in the text and count them?

  - find_all is a method that returns all of the tags in the
structured document below (in this case) soup.body

  - your intent is not to count tags with the string
'engineering' but rather, you are looking for that string 
in the text (I think)

  * it is almost always a mistake to try to process HTML with 
regular expressions, however, it seems that you are trying to 
find all matches of the (case-insensitive) word 'engineering' in 
the text of this document; that is something tailor-made for 
regular expressions, so there's the Python regular expression 
library, too:  'import re'

  * and on a minor note, since you are using urllib.request.urlopen()
in a with statement (using contexts this way is wonderful), you
could collect the data from the network socket, then drop out of 
the 'with' block to allow the context to close, so if your block 
worked as you wanted, you could adjust it as follows:

  with urllib.request.urlopen(uri) as cs:
      cs_page = cs.read()
  soup = BeautifulSoup(cs_page, "html.parser")
  print(len(soup.body.find_all(string = ["Engineering","engineering"])))

  * On a much more minor point, I'll mention that urllib / urllib2 
are available with the main Python releases but there are other 
libraries for handling fetching; I often recommend the 
third-party requests [0] library, as it is both very Pythonic, 
reasonably high-level and frightfully flexible
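
To make the tag-versus-text point concrete, here is a minimal sketch that counts the word in the text of a page with a case-insensitive regular expression. It uses only the standard library so that it runs anywhere; the TextCollector class and the small HTML sample are invented for illustration, and with BeautifulSoup the text would more likely come from soup.get_text(). As far as I understand it, find_all(string=[...]) only reports text nodes that are exactly one of the listed strings, which would explain a count lower than the one from control + f.

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Accumulate the text nodes of a document, ignoring the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Hypothetical stand-in for the fetched page.
html = """<html><body>
<h1>Electrical Engineering and Computer Science</h1>
<p>Engineering</p>
<p>Our engineering programs prepare ENGINEERING students.</p>
</body></html>"""

collector = TextCollector()
collector.feed(html)
text = " ".join(collector.chunks)

# Count every occurrence of the word, whatever its case or surroundings.
count = len(re.findall(r"engineering", text, flags=re.IGNORECASE))
print(count)
```

Only the second paragraph in the sample is a text node consisting of the bare word, so an exact-string search would see far fewer hits than the regular expression does.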

So, connecting the Zen of Python [1] to your problem, I would 
suggest making shorter, simpler lines and separating the logic

Re: [Tutor] (regular expression)

2016-12-10 Thread isaac tetteh
this is the real code


with urllib.request.urlopen("https://www.sdstate.edu/electrical-engineering-and-computer-science") as cs:
    cs_page = cs.read()
    soup = BeautifulSoup(cs_page, "html.parser")
    print(len(soup.body.find_all(string = ["Engineering","engineering"])))

i used control + f on the link in the code and i get 11 for ctrl + f and 3 for 
the code

THanks





From: Tutor  on behalf of Bob 
Gailer 
Sent: Saturday, December 10, 2016 7:54 PM
To: Tetteh, Isaac - SDSU Student
Cc: Python Tutor
Subject: Re: [Tutor] (no subject)

On Dec 10, 2016 12:15 PM, "Tetteh, Isaac - SDSU Student" <
isaac.tet...@jacks.sdstate.edu> wrote:
>
> Hello,
>
> I am trying to find the number of times a word occurs on a webpage so I
used bs4 code below
>
> Let assume html contains the "html code"
> soup = BeautifulSoup(html, "html.parser")
> print(len(soup.find_all(string=["Engineering","engineering"])))
> But the result is different from when i use control + f on my keyboard to
find
>
> Please help me understand why it's different results. Thanks
> I am using Python 3.5
>
What is the URL of the web page?
To what are you applying control-f?
What are the two different counts you're getting?
Is it possible that the page is being dynamically altered after it's loaded?


Re: [Tutor] Regular expression on python

2015-04-15 Thread Alan Gauld

On 15/04/15 09:24, Peter Otten wrote:


function call. I've never seen (or noticed?) the embedded form,
and don't see it described in the docs anywhere


Quoting :

"""
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The
group matches the empty string; the letters set the corresponding flags:


Aha. The trick is knowing the correct search string... I tried 'flag' 
and 'verbose' but missed this entry.



Again, where is that described?


"""
(?#...)
A comment; the contents of the parentheses are simply ignored.
"""


OK, I missed that too.
Maybe I just wasn't awake enough this morning! :-)

Thanks Peter.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Regular expression on python

2015-04-15 Thread Peter Otten
Albert-Jan Roskam wrote:

> On Tue, 4/14/15, Peter Otten <__pete...@web.de> wrote:

>>> >>> pprint.pprint(
>>> ... [(k, int(v)) for k, v in
>>> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
>>> [('Input Read Pairs', 2127436),
>>>  ('Both Surviving', 1795091),
>>>  ('Forward Only Surviving', 17315),
>>>  ('Reverse Only Surviving', 6413),
>>>  ('Dropped', 308617)]

> Yes, nice, but why do you use
> re.compile(regex).findall(line)
> and not
> re.findall(regex, line)
> 
> I know what re.compile is for. I often use it outside a loop and then
> actually use the compiled regex inside a loop, I just haven't seen the way
> you use it before.

What you describe here is how I use regular expressions most of the time.
Also, re.compile() behaves the same over different Python versions while the 
shortcuts for the pattern methods changed signature over time. 
Finally, some have a gotcha. Compare:

>>> re.compile("a", re.IGNORECASE).sub("b", "aAAaa")
'bbbbb'
>>> re.sub("a", "b", "aAAaa", re.IGNORECASE)
'bAAba'

Did you expect that? Congrats for thorough reading of the docs ;)
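
For anyone who has not read the docs that thoroughly, here is a short sketch of what happens: the fourth positional parameter of re.sub() is count, and re.IGNORECASE has the integer value 2, so the shortcut call quietly performs at most two case-sensitive replacements. Passing the flag by keyword avoids the trap.

```python
import re

text = "aAAaa"

# The flag is an integer-valued enum member; its value is 2.
assert int(re.IGNORECASE) == 2

# Equivalent to the shortcut call above: at most two replacements,
# and matching is still case sensitive.
limited = re.sub("a", "b", text, count=2)

# Passing the flag by keyword gives the intended result.
ignoring_case = re.sub("a", "b", text, flags=re.IGNORECASE)

print(limited, ignoring_case)
```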

> personally, I prefer to be verbose about being verbose, ie use the
> re.VERBOSE flag. But perhaps that's just a matter of taste. Are there any
> use cases when the ?iLmsux operators are clearly a better choice than the
> equivalent flag? For me, the mental burden of a regex is big enough
> already without these operators. 

I pass flags separately myself, but

>>> re.sub("(?i)a", "b", "aAAaa")
'bbbbb'

might serve as an argument for inlined flags.



Re: [Tutor] Regular expression on python

2015-04-15 Thread Albert-Jan Roskam

On Tue, 4/14/15, Peter Otten <__pete...@web.de> wrote:

 Subject: Re: [Tutor] Regular expression on python
 To: tutor@python.org
 Date: Tuesday, April 14, 2015, 4:37 PM
 
 Steven D'Aprano wrote:
 
 > On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
 >> Steven D'Aprano wrote:
 > 
 >> > I swear that Perl has been a blight on an entire generation of
 >> > programmers. All they know is regular expressions, so they turn every
 >> > data processing problem into a regular expression. Or at least they
 >> > *try* to. As you have learned, regular expressions are hard to read,
 >> > hard to write, and hard to get correct.
 >> > 
 >> > Let's write some Python code instead.
 > [...]
 > 
 >> The tempter took possession of me and dictated:
 >> 
 >> >>> pprint.pprint(
 >> ... [(k, int(v)) for k, v in
 >> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
 >> [('Input Read Pairs', 2127436),
 >>  ('Both Surviving', 1795091),
 >>  ('Forward Only Surviving', 17315),
 >>  ('Reverse Only Surviving', 6413),
 >>  ('Dropped', 308617)]
 > 
 > Nicely done :-)
 > 


Yes, nice, but why do you use 
re.compile(regex).findall(line) 
and not
re.findall(regex, line)

I know what re.compile is for. I often use it outside a loop and then actually 
use the compiled regex inside a loop, I just haven't seen the way you use it 
before.



 > I didn't say that it *couldn't* be done with a regex. 
 
 I didn't claim that.
 
 > Only that it is
 > harder to read, write, etc. Regexes are good tools, but they aren't the
 > only tool and as a beginner, which would you rather debug? The extract()
 > function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?
 
 I know a rhetorical question when I see one ;)
 
 > Oh, and for the record, your solution is roughly 4-5 times faster than
 > the extract() function on my computer. 
 
 I wouldn't be bothered by that. See below if you are.
 
 > If I knew the requirements were
 > not likely to change (that is, the maintenance burden was likely to be
 > low), I'd be quite happy to use your regex solution in production code,
 > although I would probably want to write it out in verbose mode just in
 > case the requirements did change:
 > 
 > 
 > r"""(?x)    (?# verbose mode)

personally, I prefer to be verbose about being verbose, ie use the re.VERBOSE 
flag. But perhaps that's just a matter of taste. Are there any use cases when 
the ?iLmsux operators are clearly a better choice than the equivalent flag? For 
me, the mental burden of a regex is big enough already without these operators. 


 >     (.+?):  (?# capture one or more character, followed by a colon)
 >     \s+     (?# one or more whitespace)
 >     (\d+)   (?# capture one or more digits)
 >     (?:     (?# don't capture ... )
 >       \s+       (?# one or more whitespace)
 >       \(.*?\)   (?# anything inside round brackets)
 >       )?        (?# ... and optional)
 >     \s*     (?# ignore trailing spaces)
 >     """




Re: [Tutor] Regular expression on python

2015-04-15 Thread Peter Otten
Alan Gauld wrote:

> On 15/04/15 02:02, Steven D'Aprano wrote:
>>> New one on me. Where does one find out about verbose mode?
>>> I don't see it in the re docs?
>>>
> 
>> or embed the flag in the pattern. The flags that I know of are:
>>
>> (?x) re.X re.VERBOSE
>>
>> The flag can appear anywhere in the pattern and applies to the whole
>> pattern, but it is good practice to put them at the front, and in the
>> future it may be an error to put the flags elsewhere.
> 
> I've always applied flags as separate params at the end of the
> function call. I've never seen (or noticed?) the embedded form,
> and don't see it described in the docs anywhere (although it
> probably is). 

Quoting :

"""
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The 
group matches the empty string; the letters set the corresponding flags: 
re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), 
re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the 
entire regular expression. (The flags are described in Module Contents.) 
This is useful if you wish to include the flags as part of the regular 
expression, instead of passing a flag argument to the re.compile() function.

Note that the (?x) flag changes how the expression is parsed. It should be 
used first in the expression string, or after one or more whitespace 
characters. If there are non-whitespace characters before the flag, the 
results are undefined.
"""

> But the re module descriptions of the flags only give the
> re.X/re.VERBOSE options, no mention of the embedded form.
> Maybe you are just supposed to infer the (?x) form from the re.X...
> 
> However, that still doesn't explain the difference in your comment
> syntax.
> 
> The docs say the verbose syntax looks like:
> 
> a = re.compile(r"""\d +  # the integral part
>                    \.    # the decimal point
>                    \d *  # some fractional digits""", re.X)
> 
> Whereas your syntax is like:
> 
> a = re.compile(r"""(?x)  (?# turn on verbose mode)
>                    \d +  (?# the integral part)
>                    \.    (?# the decimal point)
>                    \d *  (?# some fractional digits)""")
> 
> Again, where is that described?

"""
(?#...)
A comment; the contents of the parentheses are simply ignored.
"""

Let's try it out:

>>> re.compile("\d+(?# sequence of digits)").findall("alpha 123 beta 456")
['123', '456']
>>> re.compile("\d+# sequence of digits").findall("alpha 123 beta 456")
[]
>>> re.compile("\d+# sequence of digits", re.VERBOSE).findall("alpha 123 beta 456")
['123', '456']

So (?#...)-style comments work in non-verbose mode, too, and Steven is 
wearing belt and braces (almost, the verbose flag is still necessary to 
ignore the extra whitespace).
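
One consequence of verbose mode that the session above does not show: because bare whitespace in the pattern is dropped, a space that really has to be matched must be written explicitly. A small sketch, with a made-up pattern for one of the labels in the sample data:

```python
import re

# (?#...) is ordinary regex syntax, so this comment needs no flag.
digits = re.compile(r"\d+(?# sequence of digits)")

# Under re.VERBOSE, bare whitespace in the pattern is dropped, so a space
# that must really be matched is written as [ ] (or escaped).
surviving = re.compile(r"""
    Both [ ] Surviving :   # the literal label
    \s+ (\d+)              # capture the count
""", re.VERBOSE)

print(digits.findall("alpha 123 beta 456"))
print(surviving.search("Both Surviving: 1795091 (84.38%)").group(1))
```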



Re: [Tutor] Regular expression on python

2015-04-15 Thread Alan Gauld

On 15/04/15 02:02, Steven D'Aprano wrote:

New one on me. Where does one find out about verbose mode?
I don't see it in the re docs?




or embed the flag in the pattern. The flags that I know of are:

(?x) re.X re.VERBOSE

The flag can appear anywhere in the pattern and applies to the whole
pattern, but it is good practice to put them at the front, and in the
future it may be an error to put the flags elsewhere.


I've always applied flags as separate params at the end of the
function call. I've never seen (or noticed?) the embedded form,
and don't see it described in the docs anywhere (although it
probably is). But the re module descriptions of the flags only give the 
re.X/re.VERBOSE options, no mention of the embedded form.

Maybe you are just supposed to infer the (?x) form from the re.X...

However, that still doesn't explain the difference in your comment
syntax.

The docs say the verbose syntax looks like:

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

Whereas your syntax is like:

a = re.compile(r"""(?x)  (?# turn on verbose mode)
                   \d +  (?# the integral part)
                   \.    (?# the decimal point)
                   \d *  (?# some fractional digits)""")

Again, where is that described?

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Regular expression on python

2015-04-14 Thread Alex Kleider

On 2015-04-14 16:49, Alan Gauld wrote:


New one on me. Where does one find out about verbose mode?
I don't see it in the re docs?


This is where I go whenever I find myself having to (re)learn the 
details of regex:

https://docs.python.org/3/howto/regex.html

I believe a '2' can be substituted for the '3' but I've not found any 
difference between the two.


(I submit this not so much for Alan (tutor) as for those like me who are 
learning.)




Re: [Tutor] Regular expression on python

2015-04-14 Thread Steven D'Aprano
On Wed, Apr 15, 2015 at 12:49:26AM +0100, Alan Gauld wrote:

> New one on me. Where does one find out about verbose mode?
> I don't see it in the re docs?
> 
> I see an re.X flag but while it seems to be similar in purpose
> yet it is different to your style above (no parens for example)?

I presume it is documented in the main docs, but I actually found this 
in the "Python Pocket Reference" by Mark Lutz :-)

All of the regex flags have three forms:

- a numeric flag with a long name;
- the same numeric flag with a short name;
- a regular expression pattern.

So you can either do:

re.compile(pattern, flags)

or embed the flag in the pattern. The flags that I know of are:

(?i) re.I re.IGNORECASE
(?L) re.L re.LOCALE
(?m) re.M re.MULTILINE
(?s) re.S re.DOTALL
(?x) re.X re.VERBOSE

The flag can appear anywhere in the pattern and applies to the whole 
pattern, but it is good practice to put them at the front, and in the 
future it may be an error to put the flags elsewhere.

When provided as a separate argument, you can combine flags like this:

re.I|re.X
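
As a quick illustration of the two spellings, here is a sketch that compiles the same pattern once with combined flags as a separate argument and once with the flags embedded at the very start of the pattern. The pattern itself is invented for the example.

```python
import re

# Two flags combined with | and passed as a separate argument...
separate = re.compile(r"""
    pay        # matched case-insensitively because of re.I
    \s+ to
""", re.I | re.X)

# ...and the same two flags embedded at the very start of the pattern.
embedded = re.compile(r"""(?ix)
    pay
    \s+ to
""")

print(bool(separate.search("Pay  To")), bool(embedded.search("PAY to")))
```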


-- 
Steve


Re: [Tutor] Regular expression on python

2015-04-14 Thread Mark Lawrence

On 15/04/2015 00:49, Alan Gauld wrote:

On 14/04/15 13:21, Steven D'Aprano wrote:


although I would probably want to write it out in verbose mode just in
case the requirements did change:


r"""(?x)    (?# verbose mode)
    (.+?):  (?# capture one or more character, followed by a colon)
    \s+     (?# one or more whitespace)
    (\d+)   (?# capture one or more digits)
    (?:     (?# don't capture ... )
      \s+       (?# one or more whitespace)
      \(.*?\)   (?# anything inside round brackets)
      )?        (?# ... and optional)
    \s*     (?# ignore trailing spaces)
    """

That's a hint to people learning regular expressions: start in verbose
mode, then "de-verbose" it if you must.


New one on me. Where does one find out about verbose mode?
I don't see it in the re docs?

I see an re.X flag but while it seems to be similar in purpose
yet it is different to your style above (no parens for example)?



https://docs.python.org/3/library/re.html#module-contents re.X and 
re.VERBOSE are together.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: [Tutor] Regular expression on python

2015-04-14 Thread Alan Gauld

On 14/04/15 13:21, Steven D'Aprano wrote:


although I would probably want to write it out in verbose mode just in
case the requirements did change:


r"""(?x)    (?# verbose mode)
    (.+?):  (?# capture one or more character, followed by a colon)
    \s+     (?# one or more whitespace)
    (\d+)   (?# capture one or more digits)
    (?:     (?# don't capture ... )
      \s+       (?# one or more whitespace)
      \(.*?\)   (?# anything inside round brackets)
      )?        (?# ... and optional)
    \s*     (?# ignore trailing spaces)
    """

That's a hint to people learning regular expressions: start in verbose
mode, then "de-verbose" it if you must.


New one on me. Where does one find out about verbose mode?
I don't see it in the re docs?

I see an re.X flag but while it seems to be similar in purpose
yet it is different to your style above (no parens for example)?

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Regular expression on python

2015-04-14 Thread Peter Otten
Steven D'Aprano wrote:

> On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
>> Steven D'Aprano wrote:
> 
>> > I swear that Perl has been a blight on an entire generation of
>> > programmers. All they know is regular expressions, so they turn every
>> > data processing problem into a regular expression. Or at least they
>> > *try* to. As you have learned, regular expressions are hard to read,
>> > hard to write, and hard to get correct.
>> > 
>> > Let's write some Python code instead.
> [...]
> 
>> The tempter took possession of me and dictated:
>> 
>> >>> pprint.pprint(
>> ... [(k, int(v)) for k, v in
>> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
>> [('Input Read Pairs', 2127436),
>>  ('Both Surviving', 1795091),
>>  ('Forward Only Surviving', 17315),
>>  ('Reverse Only Surviving', 6413),
>>  ('Dropped', 308617)]
> 
> Nicely done :-)
> 
> I didn't say that it *couldn't* be done with a regex. 

I didn't claim that.

> Only that it is
> harder to read, write, etc. Regexes are good tools, but they aren't the
> only tool and as a beginner, which would you rather debug? The extract()
> function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?

I know a rhetorical question when I see one ;)

> Oh, and for the record, your solution is roughly 4-5 times faster than
> the extract() function on my computer. 

I wouldn't be bothered by that. See below if you are.

> If I knew the requirements were
> not likely to change (that is, the maintenance burden was likely to be
> low), I'd be quite happy to use your regex solution in production code,
> although I would probably want to write it out in verbose mode just in
> case the requirements did change:
> 
> 
> r"""(?x)(?# verbose mode)
> (.+?):  (?# capture one or more character, followed by a colon)
> \s+ (?# one or more whitespace)
> (\d+)   (?# capture one or more digits)
> (?: (?# don't capture ... )
>   \s+   (?# one or more whitespace)
>   \(.*?\)   (?# anything inside round brackets)
>   )?(?# ... and optional)
> \s* (?# ignore trailing spaces)
> """
> 
> 
> That's a hint to people learning regular expressions: start in verbose
> mode, then "de-verbose" it if you must.

Regarding the speed of the Python approach: you can easily improve that by 
relatively minor modifications. The most important one is to avoid the 
exception:

$ python parse_jarod.py
$ python3 parse_jarod.py

The regex for reference:

$ python3 -m timeit -s "from parse_jarod import extract_re as extract" 
"extract()"
10 loops, best of 3: 18.6 usec per loop

Steven's original extract():

$ python3 -m timeit -s "from parse_jarod import extract_daprano as extract" 
"extract()"
1 loops, best of 3: 92.6 usec per loop

Avoid raising ValueError (This won't work with negative numbers):

$ python3 -m timeit -s "from parse_jarod import extract_daprano2 as extract" 
"extract()"
1 loops, best of 3: 44.3 usec per loop

Collapse the two loops into one, thus avoiding the accumulator list and the 
isinstance() checks:

$ python3 -m timeit -s "from parse_jarod import extract_daprano3 as extract" 
"extract()"
1 loops, best of 3: 29.6 usec per loop
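
For readers who want to see the idea, here is a minimal sketch of what 
"one loop, no exception" could look like. It is not the code that was 
timed above: the name extract_single_pass is made up for illustration, 
and str.isdecimal() stands in for the try/except, which is why negative 
numbers are not recognised.

```python
line = ("Input Read Pairs: 2127436 "
        "Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) "
        "Dropped: 308617 (14.51%)")


def extract_single_pass(line):
    """Hypothetical single-loop variant of extract(); illustration only."""
    results = {}
    keyparts = []
    for word in line.split():
        if word.startswith('(') and word.endswith('%)'):
            continue  # Skip percentages in brackets.
        if word.isdecimal():
            # A run of digits ends the current key; "-5" would not pass here.
            key = ' '.join(keyparts)
            if key.endswith(':'):
                key = key[:-1]
            results[key] = int(word)
            keyparts = []
        else:
            keyparts.append(word)
    return results


print(extract_single_pass(line))
```

There is no accumulator list and no isinstance() check, because each 
word is classified and consumed in the same pass.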

Ok, this is still slower than the regex, a result that I cannot accept. 
Let's try again:

$ python3 -m timeit -s "from parse_jarod import extract_py as extract" 
"extract()"
10 loops, best of 3: 15.1 usec per loop

Eureka? The "winning" code is brittle and probably as hard to understand as 
the regex. You can judge for yourself if you're interested:

$ cat parse_jarod.py   
import re

line = ("Input Read Pairs: 2127436 "
        "Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) "
        "Dropped: 308617 (14.51%)")
_findall = re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall


def extract_daprano(line=line):
    # Extract key:number values from the string.
    line = line.strip()  # Remove leading and trailing whitespace.
    words = line.split()
    accumulator = []  # Collect parts of the string we care about.
    for word in words:
        if word.startswith('(') and word.endswith('%)'):
            # We don't care about percentages in brackets.
            continue
        try:
            n = int(word)
        except ValueError:
            accumulator.append(word)
        else:
            accumulator.append(n)
    # Now accumulator will be a list of strings and ints:
    # e.g. ['Input', 'Read', 'Pairs:', 1234, 'Both', 'Surviving:', 1000]
    # Collect consecutive strings as the key, int to be the value.
    results = {}
    keyparts = []
    for item in accumulator:
        if isinstance(item, int):
            key = ' '.join(keyparts)
            keyparts = []
            if key.endswith(':'):
                key = key[:-1]
            results[key] = item
        else:
            keyparts.append(item)
    # When we have finished process

Re: [Tutor] Regular expression on python

2015-04-14 Thread Steven D'Aprano
On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
> Steven D'Aprano wrote:

> > I swear that Perl has been a blight on an entire generation of
> > programmers. All they know is regular expressions, so they turn every
> > data processing problem into a regular expression. Or at least they
> > *try* to. As you have learned, regular expressions are hard to read,
> > hard to write, and hard to get correct.
> > 
> > Let's write some Python code instead.
[...]

> The tempter took possession of me and dictated:
> 
> >>> pprint.pprint(
> ... [(k, int(v)) for k, v in
> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
> [('Input Read Pairs', 2127436),
>  ('Both Surviving', 1795091),
>  ('Forward Only Surviving', 17315),
>  ('Reverse Only Surviving', 6413),
>  ('Dropped', 308617)]

Nicely done :-)

I didn't say that it *couldn't* be done with a regex. Only that it is 
harder to read, write, etc. Regexes are good tools, but they aren't the 
only tool and as a beginner, which would you rather debug? The extract() 
function I wrote, or r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*" ?

Oh, and for the record, your solution is roughly 4-5 times faster than 
the extract() function on my computer. If I knew the requirements were 
not likely to change (that is, the maintenance burden was likely to be 
low), I'd be quite happy to use your regex solution in production code, 
although I would probably want to write it out in verbose mode just in 
case the requirements did change:


r"""(?x)(?# verbose mode)
(.+?):  (?# capture one or more character, followed by a colon)
\s+ (?# one or more whitespace)
(\d+)   (?# capture one or more digits)
(?: (?# don't capture ... )
  \s+   (?# one or more whitespace)
  \(.*?\)   (?# anything inside round brackets)
  )?(?# ... and optional)
\s* (?# ignore trailing spaces)
"""


That's a hint to people learning regular expressions: start in verbose 
mode, then "de-verbose" it if you must.
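
As a minimal sketch of how such a verbose pattern might be used (the 
name PAIR is made up for illustration, and this version passes the 
re.VERBOSE flag with "#" comments instead of the inline (?x) and (?# ...) 
forms shown above):

```python
import re

# Same pattern as above; with re.VERBOSE, whitespace and "#" comments
# inside the pattern are ignored by the regex engine.
PAIR = re.compile(r"""
    (.+?):           # capture the key: one or more characters, then a colon
    \s+              # one or more whitespace characters
    (\d+)            # capture the value: one or more digits
    (?:              # optional, non-capturing group ...
        \s+\(.*?\)   # ... whitespace, then anything inside round brackets
    )?
    \s*              # ignore trailing whitespace
""", re.VERBOSE)

line = ("Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
        "Forward Only Surviving: 17315 (0.81%) "
        "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)")

pairs = [(key, int(value)) for key, value in PAIR.findall(line)]
print(pairs)
```

If the requirements change, only the commented line for that part of the 
pattern needs editing.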


-- 
Steve


Re: [Tutor] Regular expression on python

2015-04-14 Thread Peter Otten
Steven D'Aprano wrote:

> On Mon, Apr 13, 2015 at 02:29:07PM +0200, jarod...@libero.it wrote:
>> Dear all.
>> I would like to extract some data from a file.
>> The line I'm interested in is this:
>> 
>> Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward
>> Only Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%)
>> Dropped: 308617 (14.51%)
> 
> 
> Some people, when confronted with a problem, think "I know, I'll
> use regular expressions." Now they have two problems.
> -- Jamie Zawinski
>
> I swear that Perl has been a blight on an entire generation of
> programmers. All they know is regular expressions, so they turn every
> data processing problem into a regular expression. Or at least they
> *try* to. As you have learned, regular expressions are hard to read,
> hard to write, and hard to get correct.
> 
> Let's write some Python code instead.
> 
> 
> def extract(line):
>     # Extract key:number values from the string.
>     line = line.strip()  # Remove leading and trailing whitespace.
>     words = line.split()
>     accumulator = []  # Collect parts of the string we care about.
>     for word in words:
>         if word.startswith('(') and word.endswith('%)'):
>             # We don't care about percentages in brackets.
>             continue
>         try:
>             n = int(word)
>         except ValueError:
>             accumulator.append(word)
>         else:
>             accumulator.append(n)
>     # Now accumulator will be a list of strings and ints:
>     # e.g. ['Input', 'Read', 'Pairs:', 1234, 'Both', 'Surviving:', 1000]
>     # Collect consecutive strings as the key, int to be the value.
>     results = {}
>     keyparts = []
>     for item in accumulator:
>         if isinstance(item, int):
>             key = ' '.join(keyparts)
>             keyparts = []
>             if key.endswith(':'):
>                 key = key[:-1]
>             results[key] = item
>         else:
>             keyparts.append(item)
>     # When we have finished processing, the keyparts list should be empty.
>     if keyparts:
>         extra = ' '.join(keyparts)
>         print('Warning: found extra text at end of line "%s".' % extra)
>     return results
> 
> 
> 
> Now let me test it:
> 
> py> line = ('Input Read Pairs: 2127436 Both Surviving: 1795091'
> ... ' (84.38%) Forward Only Surviving: 17315 (0.81%)'
> ... ' Reverse Only Surviving: 6413 (0.30%) Dropped:'
> ... ' 308617 (14.51%)\n')
> py>
> py> print(line)
> Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward
> Only Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%)
> Dropped: 308617 (14.51%)
> 
> py> extract(line)
> {'Dropped': 308617, 'Both Surviving': 1795091, 'Reverse Only Surviving':
> 6413, 'Forward Only Surviving': 17315, 'Input Read Pairs': 2127436}
> 
> 
> Remember that dicts are unordered. All the data is there, but in
> arbitrary order. Now that you have a nice function to extract the data,
> you can apply it to the lines of a data file in a simple loop:
> 
> with open("255.trim.log") as p:
>     for line in p:
>         if line.startswith("Input "):
>             d = extract(line)
>             print(d)  # or process it somehow

The tempter took possession of me and dictated:

>>> pprint.pprint(
... [(k, int(v)) for k, v in
... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
[('Input Read Pairs', 2127436),
 ('Both Surviving', 1795091),
 ('Forward Only Surviving', 17315),
 ('Reverse Only Surviving', 6413),
 ('Dropped', 308617)]




Re: [Tutor] Regular expression on python

2015-04-13 Thread Steven D'Aprano
On Mon, Apr 13, 2015 at 02:29:07PM +0200, jarod...@libero.it wrote:
> Dear all.
> I would like to extract some data from a file.
> The line I'm interested in is this:
> 
> Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward 
> Only Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%) 
> Dropped: 308617 (14.51%)


Some people, when confronted with a problem, think "I know, I'll 
use regular expressions." Now they have two problems.
-- Jamie Zawinski

I swear that Perl has been a blight on an entire generation of 
programmers. All they know is regular expressions, so they turn every 
data processing problem into a regular expression. Or at least they 
*try* to. As you have learned, regular expressions are hard to read, 
hard to write, and hard to get correct.

Let's write some Python code instead.


def extract(line):
    # Extract key:number values from the string.
    line = line.strip()  # Remove leading and trailing whitespace.
    words = line.split()
    accumulator = []  # Collect parts of the string we care about.
    for word in words:
        if word.startswith('(') and word.endswith('%)'):
            # We don't care about percentages in brackets.
            continue
        try:
            n = int(word)
        except ValueError:
            accumulator.append(word)
        else:
            accumulator.append(n)
    # Now accumulator will be a list of strings and ints:
    # e.g. ['Input', 'Read', 'Pairs:', 1234, 'Both', 'Surviving:', 1000]
    # Collect consecutive strings as the key, int to be the value.
    results = {}
    keyparts = []
    for item in accumulator:
        if isinstance(item, int):
            key = ' '.join(keyparts)
            keyparts = []
            if key.endswith(':'):
                key = key[:-1]
            results[key] = item
        else:
            keyparts.append(item)
    # When we have finished processing, the keyparts list should be empty.
    if keyparts:
        extra = ' '.join(keyparts)
        print('Warning: found extra text at end of line "%s".' % extra)
    return results



Now let me test it:

py> line = ('Input Read Pairs: 2127436 Both Surviving: 1795091'
... ' (84.38%) Forward Only Surviving: 17315 (0.81%)'
... ' Reverse Only Surviving: 6413 (0.30%) Dropped:'
... ' 308617 (14.51%)\n')
py>
py> print(line)
Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward 
Only Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%) 
Dropped: 308617 (14.51%)

py> extract(line)
{'Dropped': 308617, 'Both Surviving': 1795091, 'Reverse Only Surviving': 
6413, 'Forward Only Surviving': 17315, 'Input Read Pairs': 2127436}


Remember that dicts are unordered. All the data is there, but in 
arbitrary order. Now that you have a nice function to extract the data, 
you can apply it to the lines of a data file in a simple loop:

with open("255.trim.log") as p:
    for line in p:
        if line.startswith("Input "):
            d = extract(line)
            print(d)  # or process it somehow



-- 
Steven


Re: [Tutor] Regular expression on python

2015-04-13 Thread Alan Gauld

On 13/04/15 19:42, Alan Gauld wrote:


         if lines.startswith("Input"):
             tp = lines.split("\t")
             print re.findall("Input\d",str(tp))


Input is not followed by a number. You need a more powerful pattern.
Which is why I recommend trying to solve it as far as possible
without using regex.


I also just realised that you call split there then take the str() of 
the result. That means you are searching the string representation
of a list, which doesn't seem to make much sense?
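
To make that concrete, here is a small sketch. The sample line and its 
tab position are assumptions for illustration, not taken from the real 
log file.

```python
import re

# Hypothetical sample line; the real file may place its tabs differently.
line = "Input Read Pairs: 2127436\tBoth Surviving: 1795091 (84.38%)\n"

tp = line.split("\t")
as_text = str(tp)        # the repr of the list: brackets, quotes, commas
print(as_text)

# "Input" is followed by a space, not a digit, so this finds nothing.
print(re.findall(r"Input\d", as_text))

# Searching the original line for runs of digits finds every number,
# including the two halves of the percentage.
print(re.findall(r"\d+", line))
```

Searching the line itself avoids matching against the list's brackets, 
quotes and commas.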






Re: [Tutor] Regular expression on python

2015-04-13 Thread Alan Gauld

On 13/04/15 13:29, jarod...@libero.it wrote:


Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward Only 
Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 
(14.51%)


It's not clear where the tabs are in this line.
But if they are after the numbers, like so:

Input Read Pairs: 2127436 \t
Both Surviving: 1795091 (84.38%) \t
Forward Only Surviving: 17315 (0.81%) \t
Reverse Only Surviving: 6413 (0.30%) \t
Dropped: 308617 (14.51%)

Then you may not need to use regular expressions.
Simply split by tab then split by :
And if the 'number' contains parens split again by space
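
A minimal sketch of those three splits, assuming the tabs really do sit 
after each number as guessed above (the variable names are made up for 
illustration):

```python
# Sample line with a tab after each field; this layout is an assumption.
line = ("Input Read Pairs: 2127436 \t"
        "Both Surviving: 1795091 (84.38%) \t"
        "Forward Only Surviving: 17315 (0.81%) \t"
        "Reverse Only Surviving: 6413 (0.30%) \t"
        "Dropped: 308617 (14.51%)\n")

counts = {}
for field in line.strip().split("\t"):   # 1. split by tab
    name, value = field.split(":")       # 2. split by colon
    number = value.split()[0]            # 3. split by space, drop "(84.38%)"
    counts[name.strip()] = int(number)

print(counts)
```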


  with open("255.trim.log","r") as p:
      for i in p:
          lines= i.strip("\t")


lines is a bad name here since it's only a single line. In fact I'd lose 
the 'i' variable and just use


for line in p:


          if lines.startswith("Input"):
              tp = lines.split("\t")
              print re.findall("Input\d",str(tp))


Input is not followed by a number. You need a more powerful pattern.
Which is why I recommend trying to solve it as far as possible
without using regex.


So I started to find ":" from the row:
  with open("255.trim.log","r") as p:
      for i in p:
          lines= i.strip("\t")
          if lines.startswith("Input"):
              tp = lines.split("\t")
              print re.findall(":",str(tp[0]))


Does finding the colons really help much?
Or at least, does it help any more than splitting by colon would?


I am able to find the colons, but when I try to get the numbers using \d it 
does not work. Can someone explain why?


Because your pattern doesn't match the string.

HTH




[Tutor] Regular expression on python

2015-04-13 Thread jarod...@libero.it
Dear all.
I would like to extract some data from a file.
The line I'm interested in is this:

Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward Only 
Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 
(14.51%)



with open("255.trim.log","r") as p:
    for i in p:
        lines= i.strip("\t")
        if lines.startswith("Input"):
            tp = lines.split("\t")
            print re.findall("Input\d",str(tp))

So I started to find ":" from the row:
with open("255.trim.log","r") as p:
    for i in p:
        lines= i.strip("\t")
        if lines.startswith("Input"):
            tp = lines.split("\t")
            print re.findall(":",str(tp[0]))

I am able to find the colons, but when I try to get the numbers using \d it 
does not work. Can someone explain why?
How can I extract the numbers from this row?
Thanks so much.



Re: [Tutor] Regular expression

2014-09-23 Thread Steven D'Aprano
On Tue, Sep 23, 2014 at 11:40:25AM +0200, jarod...@libero.it wrote:
> Hi there!!
> 
> I need to read this file:
> 
> pippo.count :
>  10566 ZXDC
>2900 ZYG11A
>7909 ZYG11B
>3584 ZYX
>9614 ZZEF1
>   17704 ZZZ3

> How can I extract only the number and the word into an array? Thanks for any help

There is no need for the nuclear-powered bulldozer of regular 
expressions just to crack this peanut.

with open('pippo.count') as f:
    for line in f:
        num, word = line.split()
        num = int(num)
        print num, word


Or, if you prefer the old-fashioned way:

f = open('pippo.count')
for line in f:
    num, word = line.split()
    num = int(num)
    print num, word
f.close()


but the first way with the with-statement is better.
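
If the goal is to keep the values rather than print them, the same 
split can fill a list. This is only a sketch: the sample text below 
stands in for the contents of pippo.count, and the name pairs is made 
up for illustration.

```python
# A few lines copied from the question, standing in for the real file.
sample = """\
  10566 ZXDC
   2900 ZYG11A
   7909 ZYG11B
"""

pairs = []
for line in sample.splitlines():
    num, word = line.split()      # split() ignores the leading spaces
    pairs.append((int(num), word))

print(pairs)
```

With a real file, iterating over the open file object in place of 
sample.splitlines() should give the same result.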


-- 
Steven


[Tutor] Regular expression

2014-09-23 Thread jarod...@libero.it
Hi there!!

I need to read this file:

pippo.count :
 10566 ZXDC
   2900 ZYG11A
   7909 ZYG11B
   3584 ZYX
   9614 ZZEF1
  17704 ZZZ3


Each line of this file begins with spaces, followed by a number and then a 
word.
t = re.compile(r"\s+\d+")
with open("pippo.count") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        print t.findall(str(lines))
out:

['994']
['  10428']
['   1810']
['   4880']
['   8905']



How can I extract only the number and the word into an array? Thanks for any help
jarod


Re: [Tutor] Regular expression - I

2014-02-18 Thread Santosh Kumar
Thank you all. I got it. :)
I need to read more between the lines.


-- 
D. Santosh Kumar


Re: [Tutor] Regular expression - I

2014-02-18 Thread spir

On 02/18/2014 08:39 PM, Zachary Ware wrote:

Hi Santosh,

On Tue, Feb 18, 2014 at 9:52 AM, Santosh Kumar  wrote:


Hi All,

If you notice the below example, case I is working as expected.

Case I:
In [41]: string = "<H*>test<H*>"

In [42]: re.match('<H\*>',string).group()
Out[42]: '<H*>'

But why is the raw string 'r' not working as expected ?

Case II:

In [43]: re.match(r'<H*>',string).group()
---
AttributeError                            Traceback (most recent call last)
<ipython-input-43-...> in <module>()
----> 1 re.match(r'<H*>',string).group()

AttributeError: 'NoneType' object has no attribute 'group'

In [44]: re.match(r'<H*>',string)


It is working as expected, but you're not expecting the right thing
;).  Raw strings don't escape anything, they just prevent backslash
escapes from expanding.  Case I works because "\*" is not a special
character to Python (like "\n" or "\t"), so it leaves the backslash in
place:

>>> '<H\*>'
'<H\*>'

The equivalent raw string is exactly the same in this case:

>>> r'<H\*>'
'<H\*>'

The raw string you provided doesn't have the backslash, and Python
will not add backslashes for you:

>>> r'<H*>'
'<H*>'

The purpose of raw strings is to prevent Python from recognizing
backslash escapes.  For example:

>>> path = 'C:\temp\new\dir' # Windows paths are notorious...
>>> path   # it looks mostly ok... [1]
'C:\temp\new\\dir'
>>> print(path)  # until you try to use it
C:  emp
ew\dir
>>> path = r'C:\temp\new\dir'  # now try a raw string
>>> path   # Now it looks like it's stuffed full of backslashes [2]
'C:\\temp\\new\\dir'
>>> print(path)  # but it works properly!
C:\temp\new\dir

[1] Count the backslashes in the repr of 'path'.  Notice that there is
only one before the 't' and the 'n', but two before the 'd'.  "\d" is
not a special character, so Python didn't do anything to it.  There
are two backslashes in the repr of "\d", because that's the only way
to distinguish a real backslash; the "\t" and "\n" are actually the
TAB and LINE FEED characters, as seen when printing 'path'.

[2] Because they are all real backslashes now, so they have to be
shown escaped ("\\") in the repr.

In your regex, since you're looking for, literally, "<H*>", you'll
need to backslash escape the "*" since it is a special character *in
regular expressions*.  To avoid having to keep track of what's special
to Python as well as regular expressions, you'll need to make sure the
backslash itself is escaped, to make sure the regex sees "\*", and the
easiest way to do that is a raw string:

>>> re.match(r'<H\*>', string).group()
'<H*>'

I hope this makes some amount of sense; I've had to write it up
piecemeal and will never get it posted at all if I don't go ahead and
post :).  If you still have questions, I'm happy to try again.  You
may also want to have a look at the Regex HowTo in the Python docs:
http://docs.python.org/3/howto/regex.html


In addition to all this:
* Do not confuse raw strings with "regex escaping" (a utility function that 
escapes special regex characters for you).
* For simplicity, always use raw strings for regex patterns (as in your second 
example); you still have to escape special regex characters, but you only 
have to do it once!
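
A short sketch of the difference between the two. re.escape() is the 
standard-library function for regex escaping; the sample text is an 
assumption for illustration.

```python
import re

text = "<H*>test"                # assumed sample text
literal = "<H*>"                 # what we want to find, taken literally

# re.escape() backslash-escapes regex-special characters such as "*".
pattern = re.escape(literal)
m = re.match(pattern, text)
print(m.group() if m else None)

# A raw string only changes how Python reads the literal; it adds no
# escaping, so "*" still means "zero or more H" and nothing matches.
print(re.match(r"<H*>", text))

# Raw string plus one explicit regex escape: written once, read as-is.
print(re.match(r"<H\*>", text).group())
```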


d


Re: [Tutor] Regular expression - I

2014-02-18 Thread Emile van Sebille

On 2/18/2014 11:42 AM, Mark Lawrence wrote:

On 18/02/2014 18:03, Steve Willoughby wrote:

Because the regular expression <H*> means “match an angle-bracket





Please do not top post on this list.


Appropriate trimming is also appreciated.

Emile






Re: [Tutor] Regular expression - I

2014-02-18 Thread Albert-Jan Roskam


_
> From: Steve Willoughby 
>To: Santosh Kumar  
>Cc: python mail list  
>Sent: Tuesday, February 18, 2014 7:03 PM
>Subject: Re: [Tutor] Regular expression - I
> 
>
>Because the regular expression <H*> means “match an angle-bracket character, 
>zero or more H characters, followed by a close angle-bracket character” and 
>your string does not match that pattern.
>
>This is why it’s best to check that the match succeeded before going ahead to 
>call group() on the result (since in this case there is no result).
>
>
>On 18-Feb-2014, at 09:52, Santosh Kumar  wrote:


You also might want to consider making it a non-greedy match. The explanation 
at http://docs.python.org/2/howto/regex.html covers an example almost identical 
to yours:

Greedy versus Non-Greedy

When repeating a regular expression, as in a*, the resulting action is to
consume as much of the pattern as possible.  This fact often bites you when
you’re trying to match a pair of balanced delimiters, such as the angle brackets
surrounding an HTML tag.  The naive pattern for matching a single HTML tag
doesn’t work because of the greedy nature of .*.

>>> s = '<html><head><title>Title</title>'
>>> len(s)
32
>>> print re.match('<.*>', s).span()
(0, 32)
>>> print re.match('<.*>', s).group()
<html><head><title>Title</title>

The RE matches the '<' in <html>, and the .* consumes the rest of
the string.  There’s still more left in the RE, though, and the > can’t
match at the end of the string, so the regular expression engine has to
backtrack character by character until it finds a match for the >.   The
final match extends from the '<' in <html> to the '>' in </title>, which isn’t
what you want.

In this case, the solution is to use the non-greedy qualifiers *?, +?, ??, or
{m,n}?, which match as little text as possible.  In the above
example, the '>' is tried immediately after the first '<' matches, and
when it fails, the engine advances a character at a time, retrying the '>' at
every step.  This produces just the right result:

>>> print re.match('<.*?>', s).group()
<html>


Re: [Tutor] Regular expression - I

2014-02-18 Thread S Tareq
Does anyone know how to use the 2to3 program to convert 2.7 code to 3.x? 
Please, I need help. Sorry.



On Tuesday, 18 February 2014, 19:50, Zachary Ware 
 wrote:
 
On Tue, Feb 18, 2014 at 11:39 AM, Zachary Ware
 wrote:

>    >>> '<H\*>'
>    '<H\*>'
>
> The equivalent raw string is exactly the same in this case:
>
>    >>> r'<H\*>'
>    '<H\*>'

Oops, I mistyped both of these.  The repr should be '<H\\*>' in both cases.

Sorry for the confusion!

-- 
Zach


Re: [Tutor] Regular expression - I

2014-02-18 Thread Zachary Ware
On Tue, Feb 18, 2014 at 11:39 AM, Zachary Ware
 wrote:

>    >>> '<H\*>'
>    '<H\*>'
>
> The equivalent raw string is exactly the same in this case:
>
>    >>> r'<H\*>'
>    '<H\*>'

Oops, I mistyped both of these.  The repr should be '<H\\*>' in both cases.

Sorry for the confusion!

-- 
Zach


Re: [Tutor] Regular expression - I

2014-02-18 Thread Mark Lawrence

On 18/02/2014 18:03, Steve Willoughby wrote:

Because the regular expression <H*> means “match an angle-bracket character, 
zero or more H characters, followed by a close angle-bracket character” and your 
string does not match that pattern.

This is why it’s best to check that the match succeeded before going ahead to 
call group() on the result (since in this case there is no result).


On 18-Feb-2014, at 09:52, Santosh Kumar  wrote:



Hi All,

If you notice the below example, case I is working as expected.

Case I:
In [41]: string = "<H*>test<H*>"

In [42]: re.match('<H\*>',string).group()
Out[42]: '<H*>'

But why is the raw string 'r' not working as expected ?

Case II:

In [43]: re.match(r'<H*>',string).group()
---
AttributeError                            Traceback (most recent call last)
<ipython-input-43-...> in <module>()
----> 1 re.match(r'<H*>',string).group()

AttributeError: 'NoneType' object has no attribute 'group'

In [44]: re.match(r'<H*>',string)



Thanks,
santosh



Please do not top post on this list.

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence





Re: [Tutor] Regular expression - I

2014-02-18 Thread Zachary Ware
Hi Santosh,

On Tue, Feb 18, 2014 at 9:52 AM, Santosh Kumar  wrote:
>
> Hi All,
>
> If you notice the below example, case I is working as expected.
>
> Case I:
> In [41]: string = "<H*>test<H*>"
>
> In [42]: re.match('<H\*>',string).group()
> Out[42]: '<H*>'
>
> But why is the raw string 'r' not working as expected ?
>
> Case II:
>
> In [43]: re.match(r'<H*>',string).group()
> ---
> AttributeError                            Traceback (most recent call last)
> <ipython-input-43-...> in <module>()
> ----> 1 re.match(r'<H*>',string).group()
>
> AttributeError: 'NoneType' object has no attribute 'group'
>
> In [44]: re.match(r'<H*>',string)

It is working as expected, but you're not expecting the right thing
;).  Raw strings don't escape anything, they just prevent backslash
escapes from expanding.  Case I works because "\*" is not a special
character to Python (like "\n" or "\t"), so it leaves the backslash in
place:

   >>> '<H\*>'
   '<H\*>'

The equivalent raw string is exactly the same in this case:

   >>> r'<H\*>'
   '<H\*>'

The raw string you provided doesn't have the backslash, and Python
will not add backslashes for you:

   >>> r'<H*>'
   '<H*>'

The purpose of raw strings is to prevent Python from recognizing
backslash escapes.  For example:

   >>> path = 'C:\temp\new\dir' # Windows paths are notorious...
   >>> path   # it looks mostly ok... [1]
   'C:\temp\new\\dir'
   >>> print(path)  # until you try to use it
   C:  emp
   ew\dir
   >>> path = r'C:\temp\new\dir'  # now try a raw string
   >>> path   # Now it looks like it's stuffed full of backslashes [2]
   'C:\\temp\\new\\dir'
   >>> print(path)  # but it works properly!
   C:\temp\new\dir

[1] Count the backslashes in the repr of 'path'.  Notice that there is
only one before the 't' and the 'n', but two before the 'd'.  "\d" is
not a special character, so Python didn't do anything to it.  There
are two backslashes in the repr of "\d", because that's the only way
to distinguish a real backslash; the "\t" and "\n" are actually the
TAB and LINE FEED characters, as seen when printing 'path'.

[2] Because they are all real backslashes now, so they have to be
shown escaped ("\\") in the repr.

In your regex, since you're looking for, literally, "<H*>", you'll
need to backslash escape the "*" since it is a special character *in
regular expressions*.  To avoid having to keep track of what's special
to Python as well as regular expressions, you'll need to make sure the
backslash itself is escaped, to make sure the regex sees "\*", and the
easiest way to do that is a raw string:

   >>> re.match(r'', string).group()
   ''
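A short sketch of the same point, assuming the text being searched starts with the literal characters <H*> (the sample string and variable names below are illustrative, not taken from the original post). It also shows re.escape(), which adds the regex-level backslashes for you:

```python
import re

text = "<H*>test<H*>"   # assumed sample text

escaped = r"<H\*>"      # '<', 'H', a literal '*', '>'
unescaped = r"<H*>"     # '<', zero or more 'H', '>'

print(re.match(escaped, text))        # a match object covering '<H*>'
print(re.match(unescaped, text))      # None: the '*' in the text is neither 'H' nor '>'
print(re.match(unescaped, "<HHH>"))   # a match object covering '<HHH>'
print(re.escape("<H*>"))              # lets the re module escape the special characters
```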

I hope this makes some amount of sense; I've had to write it up
piecemeal and will never get it posted at all if I don't go ahead and
post :).  If you still have questions, I'm happy to try again.  You
may also want to have a look at the Regex HowTo in the Python docs:
http://docs.python.org/3/howto/regex.html

Hope this helps,

-- 
Zach
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular expression - I

2014-02-18 Thread Steve Willoughby
Because the regular expression <H*> means “match an angle-bracket character, 
zero or more H characters, followed by a close angle-bracket character” and 
your string does not match that pattern.

This is why it’s best to check that the match succeeded before going ahead to 
call group() on the result (since in this case there is no result).
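One possible way to write that check, as a sketch only (the function name first_tag and the sample inputs are made up for illustration):

```python
import re

def first_tag(text):
    """Return the text matched at the start of `text`, or None if nothing matches."""
    m = re.match(r"<H*>", text)
    if m is None:          # re.match returns None when the pattern does not match
        return None
    return m.group()

print(first_tag("<HH>rest"))   # the pattern matches here
print(first_tag("<H*>test"))   # no match, so no AttributeError either
```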


On 18-Feb-2014, at 09:52, Santosh Kumar  wrote:

> 
> Hi All,
> 
> If you notice the below example, case I is working as expected.
> 
> Case I:
> In [41]: string = "test"
> 
> In [42]: re.match('',string).group()
> Out[42]: ''
> 
> But why is the raw string 'r' not working as expected ?
> 
> Case II:
> 
> In [43]: re.match(r'',string).group()
> ---
> AttributeErrorTraceback (most recent call last)
>  in ()
> > 1 re.match(r'',string).group()
> 
> AttributeError: 'NoneType' object has no attribute 'group'
> 
> In [44]: re.match(r'',string)
> 
> 
> 
> Thanks,
> santosh
> 
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor



Re: [Tutor] Regular expression - I

2014-02-18 Thread Steve Willoughby
The problem is not the use of the raw string, but rather the regular expression 
inside it.

In regular expressions, the * means that whatever appears before it may be 
repeated zero or more times.  So if you say H* that means zero or more H’s in a 
row.  I think you mean an H followed by any number of other characters which 
would be H.*  (the . matches any single character, so .* means zero or more of 
any characters).

On the other hand, H\* means to match an H followed by a literal asterisk 
character.
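The three patterns can be compared side by side. This is a small sketch with made-up sample strings; re.fullmatch() is used so that each pattern has to account for the whole string:

```python
import re

samples = ["", "H", "HHH", "Hello", "H*"]
patterns = [r"H*", r"H.*", r"H\*"]

# For each pattern, collect the samples it matches completely.
results = {
    pattern: [s for s in samples if re.fullmatch(pattern, s)]
    for pattern in patterns
}

for pattern, matched in results.items():
    print(pattern, matched)
```

Only H\* accepts the string "H*", and only H* accepts the empty string.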

Does that help clarify why one matched and the other doesn’t?

steve

On 18-Feb-2014, at 10:09, Santosh Kumar  wrote:

> Steve,
> 
> i am trying to under r - raw string notation. Am i understanding it wrong.
> Rather than using "\", it says we can use the "r" option.
> 
> http://docs.python.org/2/library/re.html
> 
> Check the first paragraph for the above link.
> 
> Thanks,
> santosh
> 
> 
> 
> On Tue, Feb 18, 2014 at 11:33 PM, Steve Willoughby  wrote:
> Because the regular expression  means “match an angle-bracket character, 
> zero or more H characters, followed by a close angle-bracket character” and 
> your string does not match that pattern.
> 
> This is why it’s best to check that the match succeeded before going ahead to 
> call group() on the result (since in this case there is no result).
> 
> 
> On 18-Feb-2014, at 09:52, Santosh Kumar  wrote:
> 
> >
> > Hi All,
> >
> > If you notice the below example, case I is working as expected.
> >
> > Case I:
> > In [41]: string = "test"
> >
> > In [42]: re.match('',string).group()
> > Out[42]: ''
> >
> > But why is the raw string 'r' not working as expected ?
> >
> > Case II:
> >
> > In [43]: re.match(r'',string).group()
> > ---
> > AttributeErrorTraceback (most recent call last)
> >  in ()
> > > 1 re.match(r'',string).group()
> >
> > AttributeError: 'NoneType' object has no attribute 'group'
> >
> > In [44]: re.match(r'',string)
> >
> >
> >
> > Thanks,
> > santosh
> >
> > ___
> > Tutor maillist  -  Tutor@python.org
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
> 
> 
> 
> 
> -- 
> D. Santosh Kumar
> RHCE | SCSA 
> +91-9703206361
> 
> 
> Every task has a unpleasant side .. But you must focus on the end result you 
> are producing.
> 





Re: [Tutor] Regular expression - I

2014-02-18 Thread Santosh Kumar
Steve,

I am trying to understand r, the raw string notation. Am I understanding it wrong?
Rather than using "\", it says we can use the "r" option.

http://docs.python.org/2/library/re.html

Check the first paragraph for the above link.

Thanks,
santosh



On Tue, Feb 18, 2014 at 11:33 PM, Steve Willoughby wrote:

> Because the regular expression  means “match an angle-bracket
> character, zero or more H characters, followed by a close angle-bracket
> character” and your string does not match that pattern.
>
> This is why it’s best to check that the match succeeded before going ahead
> to call group() on the result (since in this case there is no result).
>
>
> On 18-Feb-2014, at 09:52, Santosh Kumar  wrote:
>
> >
> > Hi All,
> >
> > If you notice the below example, case I is working as expected.
> >
> > Case I:
> > In [41]: string = "test"
> >
> > In [42]: re.match('',string).group()
> > Out[42]: ''
> >
> > But why is the raw string 'r' not working as expected ?
> >
> > Case II:
> >
> > In [43]: re.match(r'',string).group()
> >
> ---
> > AttributeErrorTraceback (most recent call
> last)
> >  in ()
> > > 1 re.match(r'',string).group()
> >
> > AttributeError: 'NoneType' object has no attribute 'group'
> >
> > In [44]: re.match(r'',string)
> >
> >
> >
> > Thanks,
> > santosh
> >
> > ___
> > Tutor maillist  -  Tutor@python.org
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
>
>


-- 
D. Santosh Kumar
RHCE | SCSA
+91-9703206361


Every task has a unpleasant side .. But you must focus on the end result
you are producing.


[Tutor] Regular expression - I

2014-02-18 Thread Santosh Kumar
Hi All,

If you notice the below example, case I is working as expected.

Case I:
In [41]: string = "test"

In [42]: re.match('',string).group()
Out[42]: ''

But why is the raw string 'r' not working as expected ?

Case II:

In [43]: re.match(r'',string).group()
---
AttributeErrorTraceback (most recent call last)
 in ()
> 1 re.match(r'',string).group()

AttributeError: 'NoneType' object has no attribute 'group'

In [44]: re.match(r'',string)



Thanks,
santosh


Re: [Tutor] regular expression wildcard search

2012-12-11 Thread Alan Gauld

On 11/12/12 15:54, Hs Hs wrote:


myseq = 'MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP
NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED'

if re.search('V*VVP',myseq):
    print myseq


I hope this is just a typo but you are printing your original string not 
the things found...
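To print what was found rather than the whole sequence, keep the match object and ask it for the matched text. A minimal sketch, assuming a shortened fragment of the sequence and a single-character wildcard (both are assumptions, since the poster's intent for "*" is not stated):

```python
import re

fragment = 'SWEPCVEVVPNITYQ'   # shortened, hypothetical piece of the sequence

m = re.search(r'V.VVP', fragment)   # '.' stands for exactly one character
if m:
    print(m.group())   # the text that matched
    print(m.span())    # where it was found
```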



--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] regular expression wildcard search

2012-12-11 Thread Joel Goldstick
On Tue, Dec 11, 2012 at 10:54 AM, Hs Hs  wrote:

> Dear group:
>

Please send mail as plain text.  It is easier to read

>
> I have 50 thousand lists. My aim is to search a pattern in the
> alphabetical strings (these are protein sequence strings).
>
>
> MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP
> NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED
>
> my aim is to find the list of string that has V*VVP.
>

Asterisk

The "*" matches 0 or more instances of the previous element.

I am not sure what you want, but I don't think it is this.  Do you want V
then any characters followed by VVP?  In that case perhaps

V.+VVP
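A quick sketch of how these quantifiers behave, using short made-up strings (the helper name found is hypothetical):

```python
import re

def found(pattern, text):
    """Return the matched text, or None if the pattern is not found."""
    m = re.search(pattern, text)
    return m.group() if m else None

# 'V*' may match zero V's, so a bare 'VVP' is enough for 'V*VVP'.
print(found(r'V*VVP', 'AAVVPAA'))

# 'V.+VVP' needs a V, at least one more character, then 'VVP'.
print(found(r'V.+VVP', 'AAVVPAA'))
print(found(r'V.+VVP', 'PCVEVVPN'))

# 'V\wVVP' allows exactly one word character between the V and 'VVP'.
print(found(r'V\wVVP', 'PCVEVVPN'))
```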


There are many tutorials about how to create regular expressions


>
> myseq = 'MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP
> NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED'
>
> if re.search('V*VVP',myseq):
>     print myseq
>
> the problem with this is, I am also finding junk with just VVP or VP etc.
>
> How can I strictly search for V*VVP only.
>
> Thanks for help.
>
> Hs
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
>


-- 
Joel Goldstick


Re: [Tutor] regular expression wildcard search

2012-12-11 Thread Emma Birath
Hi there

Do you want your "*" to represent a single letter, or what is your intent?

If you want only a single letter between the "V" and "VVP", use "\w"
instead of "*".

re.search(r'V\wVVP', myseq)

Emma

On Tue, Dec 11, 2012 at 8:54 AM, Hs Hs  wrote:

> Dear group:
>
> I have 50 thousand lists. My aim is to search a pattern in the
> alphabetical strings (these are protein sequence strings).
>
>
> MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP
> NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED
>
> my aim is to find the list of string that has V*VVP.
>
> myseq = 'MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP
> NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED'
>
> if re.search('V*VVP',myseq):
>     print myseq
>
> the problem with this is, I am also finding junk with just VVP or VP etc.
>
> How can I strictly search for V*VVP only.
>
> Thanks for help.
>
> Hs
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
>


[Tutor] regular expression wildcard search

2012-12-11 Thread Hs Hs
Dear group:

I have 50 thousand lists. My aim is to search a pattern in the alphabetical 
strings (these are protein sequence strings).


MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP 
NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED

my aim is to find the list of string that has V*VVP. 

myseq = 'MMSASRLAGTLIPAMAFLSCVRPESWEPC VEVVP 
NITYQCMELNFYKIPDNLPFSTKNLDLSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIED'

if re.search('V*VVP',myseq):
    print myseq

the problem with this is, I am also finding junk with just VVP or VP etc. 

How can I strictly search for V*VVP only. 

Thanks for help. 

Hs


Re: [Tutor] Regular expression grouping insert thingy

2010-06-08 Thread Lang Hurst
Oh.  Crap, I knew it would be something simple, but honestly, I don't 
think that I would have gotten there.  Thank you so much.  Seriously 
saved me more grey hair.


Matthew Wood wrote:

re.sub(r'(\d+)x', r'\1*x', input_text)

--

I enjoy haiku
but sometimes they don't make sense;
refrigerator?


On Tue, Jun 8, 2010 at 10:11 PM, Lang Hurst > wrote:


This is so trivial (or should be), but I can't figure it out.

I'm trying to do what in vim is

:s/\([0-9]\)x/\1*x/

That is, "find a number followed by an x and put a "*" in between
the number and the x"

So, if the string is "6443x - 3", I'll get back "6443*x - 3"

I won't write down all the things I've tried, but suffice it to
say, nothing has done it.  I just found myself figuring out how to
call sed and realized that this should be a one-liner in python
too.  Any ideas?  I've read a lot of documentation, but I just
can't figure it out.  Thanks.

-- 
There are no stupid questions, just stupid people.







--
There are no stupid questions, just stupid people.



Re: [Tutor] Regular expression grouping insert thingy

2010-06-08 Thread Matthew Wood
re.sub(r'(\d+)x', r'\1*x', input_text)
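As a sketch of how that one-liner behaves, here it is wrapped in a function (the name insert_times is made up), together with a lookaround variant that inserts the "*" without capturing anything. The second form is an alternative, not something proposed in this thread:

```python
import re

def insert_times(expr):
    # \1 in the replacement puts back the digits captured by (\d+)
    return re.sub(r'(\d+)x', r'\1*x', expr)

def insert_times_lookaround(expr):
    # Match the empty position that has a digit before it and an x after it
    return re.sub(r'(?<=\d)(?=x)', '*', expr)

print(insert_times("6443x - 3"))
print(insert_times_lookaround("2x + 10x"))
```

Both replacement strings are raw so that \1 reaches the re module as a group reference rather than as the character with code 1.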

--

I enjoy haiku
but sometimes they don't make sense;
refrigerator?


On Tue, Jun 8, 2010 at 10:11 PM, Lang Hurst  wrote:

> This is so trivial (or should be), but I can't figure it out.
>
> I'm trying to do what in vim is
>
> :s/\([0-9]\)x/\1*x/
>
> That is, "find a number followed by an x and put a "*" in between the
> number and the x"
>
> So, if the string is "6443x - 3", I'll get back "6443*x - 3"
>
> I won't write down all the things I've tried, but suffice it to say,
> nothing has done it.  I just found myself figuring out how to call sed and
> realized that this should be a one-liner in python too.  Any ideas?  I've
> read a lot of documentation, but I just can't figure it out.  Thanks.
>
> --
> There are no stupid questions, just stupid people.
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>


[Tutor] Regular expression grouping insert thingy

2010-06-08 Thread Lang Hurst

This is so trivial (or should be), but I can't figure it out.

I'm trying to do what in vim is

:s/\([0-9]\)x/\1*x/

That is, "find a number followed by an x and put a "*" in between the 
number and the x"


So, if the string is "6443x - 3", I'll get back "6443*x - 3"

I won't write down all the things I've tried, but suffice it to say, 
nothing has done it.  I just found myself figuring out how to call sed 
and realized that this should be a one-liner in python too.  Any ideas?  
I've read a lot of documentation, but I just can't figure it out.  Thanks.


--
There are no stupid questions, just stupid people.



[Tutor] Regular expression generator

2010-02-24 Thread Kent Johnson
Another interesting tool - you give it a sample string and it helps
you build a regular expression to match the string. This is not a
regex tester, it actually creates the regex for you as you click on
elements of the string.
http://txt2re.com/index-python.php3

Kent


Re: [Tutor] regular expression question

2009-04-28 Thread Kent Johnson
On Tue, Apr 28, 2009 at 4:03 AM, Kelie  wrote:
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern 
> so
> that the return value will be 'abc789jk'? In other words, I want to find the
> pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and '789' 
> are
> just examples. They are actually quite different in the string that I'm 
> working
> with.
>
> import re
> s = 'abc123abc45abc789jk'
> p = r'abc.+jk'
> lst = re.findall(p, s)
> print lst[0]

re.findall() won't work because it finds non-overlapping matches.

If there is a character in the initial match which cannot occur in the
middle section, change .+ to exclude that character. For example,
r'abc[^a]+jk' works with your example.

Another possibility is to look for the match starting at different
locations, something like this:
p = re.compile(r'abc.+jk')
lastMatch = None
i = 0
while i < len(s):
    m = p.search(s, i)
    if m is None:
        break
    lastMatch = m.group()
    i = m.start() + 1

print lastMatch

Kent


Re: [Tutor] regular expression question

2009-04-28 Thread Kent Johnson
2009/4/28 Marek spociń...@go2.pl,Poland :

>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> the additional ? makes the preceeding '.+' non-greedy so instead of matching 
> as long string as it can it matches as short string as possible.

Did you try it? It doesn't do what you expect, it still matches at the
beginning of the string.

The re engine searches for a match at a location and returns the first
one it finds. A non-greedy match doesn't mean "Find the shortest
possible match anywhere in the string", it means, "find the shortest
possible match starting at this location."
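A small sketch of that distinction. The first string is the one from the question; the second is a made-up string where greedy and non-greedy actually differ:

```python
import re

s = 'abc123abc45abc789jk'
greedy = re.search(r'abc.+jk', s).group()
lazy = re.search(r'abc.+?jk', s).group()
# Both start at the first 'abc' and there is only one 'jk' to stop at.
print(greedy == lazy)

two = 'abc1jk abc2jk'
print(re.search(r'abc.+jk', two).group())    # runs on to the last 'jk'
print(re.search(r'abc.+?jk', two).group())   # stops at the first 'jk'
```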

Kent


Re: [Tutor] regular expression question

2009-04-28 Thread Kelie
spir  free.fr> writes:

> To avoid that, use non-grouping parens (?:...). This also avoids the need for
parens around the whole format:
> p = Pattern(r'abc(?:(?!abc).)+jk')
> print p.findall(s)
> ['abc789jk']
> 
> Denis


This one works! Thank you Denis. I'll try it out on the actual much longer
(multiline) string and see what happens.



Re: [Tutor] regular expression question

2009-04-28 Thread Kelie
Andre Engels  gmail.com> writes:

> 
> 2009/4/28 Marek Spociński  go2.pl,Poland  10g.pl>:

> > I suggest using r'abc.+?jk' instead.
> >

> 
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible. 

yeah, i tried the '?' as well and realized it would not work.




Re: [Tutor] regular expression question

2009-04-28 Thread spir
Le Tue, 28 Apr 2009 11:06:16 +0200,
Marek spociń...@go2.pl,  Poland  s'exprima ainsi:

> > Hello,
> > 
> > The following code returns 'abc123abc45abc789jk'. How do I revise the
> > pattern so that the return value will be 'abc789jk'? In other words, I
> > want to find the pattern 'abc' that is closest to 'jk'. Here the string
> > '123', '45' and '789' are just examples. They are actually quite
> > different in the string that I'm working with. 
> > 
> > import re
> > s = 'abc123abc45abc789jk'
> > p = r'abc.+jk'
> > lst = re.findall(p, s)
> > print lst[0]
> 
> I suggest using r'abc.+?jk' instead.
> 
> the additional ? makes the preceeding '.+' non-greedy so instead of
> matching as long string as it can it matches as short string as possible.

Non-greedy repetition will not work in this case, I guess:

from re import compile as Pattern
s = 'abc123abc45abc789jk'
p = Pattern(r'abc.+?jk')
print p.match(s).group()
==>
abc123abc45abc789jk

(Someone explain why?)

My solution would be to explicitly exclude 'abc' from the sequence of chars 
matched by '.+'. To do this, use negative lookahead (?!...) before '.':
p = Pattern(r'(abc((?!abc).)+jk)')
print p.findall(s)
==>
[('abc789jk', '9')]

But it's not exactly what you want. Because the internal () needed to express 
exclusion will be considered by findall as a group to be returned, so that you 
also get the last char matched in there.
To avoid that, use non-grouping parens (?:...). This also avoids the need for 
parens around the whole format:
p = Pattern(r'abc(?:(?!abc).)+jk')
print p.findall(s)
['abc789jk']
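The same pattern, sketched for a current Python with the standard re.compile spelling. The re.DOTALL flag and the multiline sample are assumptions, added because the real input is said to span several lines and '.' does not match a newline by default:

```python
import re

s = 'abc123abc45abc789jk'

# (?:(?!abc).)+ consumes one character at a time, but only where
# 'abc' does not start, so the match cannot run across a later 'abc'.
tempered = re.compile(r'abc(?:(?!abc).)+jk', re.DOTALL)

print(tempered.findall(s))

multiline = 'abc1\nabc2\njk'   # made-up multiline input
print(tempered.findall(multiline))
```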

Denis
--
la vita e estrany


Re: [Tutor] regular expression question

2009-04-28 Thread Marek Spociński , Poland
Dnia 28 kwietnia 2009 11:16 Andre Engels  napisał(a):
> 2009/4/28 Marek spociń...@go2.pl,Poland :
> >> Hello,
> >>
> >> The following code returns 'abc123abc45abc789jk'. How do I revise the 
> >> pattern so
> >> that the return value will be 'abc789jk'? In other words, I want to find 
> >> the
> >> pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and 
> >> '789' are
> >> just examples. They are actually quite different in the string that I'm 
> >> working
> >> with.
> >>
> >> import re
> >> s = 'abc123abc45abc789jk'
> >> p = r'abc.+jk'
> >> lst = re.findall(p, s)
> >> print lst[0]
> >
> > I suggest using r'abc.+?jk' instead.
> >
> > the additional ? makes the preceeding '.+' non-greedy so instead of 
> > matching as long string as it can it matches as short string as possible.
> 
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible. To use .+? one would have to revert the string, then use the
> reverse regular expression on the result, which looks like a rather
> roundabout way of doing things.

I don't have access to python right now so i cannot test my ideas...
And i don't really want to give you wrong idea too.


Re: [Tutor] regular expression question

2009-04-28 Thread Andre Engels
2009/4/28 Marek spociń...@go2.pl,Poland :
>> Hello,
>>
>> The following code returns 'abc123abc45abc789jk'. How do I revise the 
>> pattern so
>> that the return value will be 'abc789jk'? In other words, I want to find the
>> pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and '789' 
>> are
>> just examples. They are actually quite different in the string that I'm 
>> working
>> with.
>>
>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> the additional ? makes the preceeding '.+' non-greedy so instead of matching 
> as long string as it can it matches as short string as possible.

That was my first idea too, but it does not work for this case,
because Python will still try to _start_ the match as soon as
possible. To use .+? one would have to reverse the string, then use the
reversed regular expression on the result, which looks like a rather
roundabout way of doing things.
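For completeness, the roundabout version might look like the sketch below. The function name last_span and its parameters are made up, and it assumes the start and end markers are plain literal strings:

```python
import re

def last_span(s, start='abc', end='jk'):
    """Find `end` and the nearest `start` before it by searching the reversed string."""
    # Reverse the literals first, then escape them for use in a regex.
    pattern = re.escape(end[::-1]) + '.+?' + re.escape(start[::-1])
    m = re.search(pattern, s[::-1])
    # Reverse the matched text again to restore the original order.
    return m.group()[::-1] if m else None

print(last_span('abc123abc45abc789jk'))
```

Note that this pairs the last 'jk' in the string with the closest 'abc' before it, which may or may not be what is wanted when 'jk' occurs more than once.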



-- 
André Engels, andreeng...@gmail.com


Re: [Tutor] regular expression question

2009-04-28 Thread Marek Spociński
> Hello,
> 
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern 
> so
> that the return value will be 'abc789jk'? In other words, I want to find the
> pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and '789' 
> are
> just examples. They are actually quite different in the string that I'm 
> working
> with. 
> 
> import re
> s = 'abc123abc45abc789jk'
> p = r'abc.+jk'
> lst = re.findall(p, s)
> print lst[0]

I suggest using r'abc.+?jk' instead.

the additional ? makes the preceeding '.+' non-greedy so instead of matching as 
long string as it can it matches as short string as possible.




[Tutor] regular expression question

2009-04-28 Thread Kelie
Hello,

The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
that the return value will be 'abc789jk'? In other words, I want to find the
pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and '789' are
just examples. They are actually quite different in the string that I'm working
with. 

import re
s = 'abc123abc45abc789jk'
p = r'abc.+jk'
lst = re.findall(p, s)
print lst[0]

Thanks for your help!



Re: [Tutor] regular expression problem

2009-04-15 Thread Spencer Parker
After he said that...I realized where I was being dumb...

On Wed, Apr 15, 2009 at 10:29 AM, bob gailer  wrote:

> Spencer Parker wrote:
>
>> I have a python script that takes a text file as an argument.  It then
>> loops
>> through the text file pulling out specific lines of text that I want.  I
>> have a regular expression that evaluates the text to see if it matches a
>>
>> specific phrase.  Right now I have it writing to another text file that
>> output.  The problem I am having is that it finds the phrase prints it,
>> but
>> then it continuously prints the statement.  There is only 1 entries in the
>>
>> file for the result it finds, but it prints it multiple times...several
>> hundred before it moves onto the next one.  But it appends the first one
>> to
>> the next entry...and does this till it finds everything.
>>
>> http://dpaste.com/33982/
>>
>>
>> Any Help?
>>
>>
>
> dedent the 2nd for loop.
>
> --
> Bob Gailer
> Chapel Hill NC
> 919-636-4239
>


Re: [Tutor] regular expression problem

2009-04-15 Thread bob gailer

Spencer Parker wrote:

I have a python script that takes a text file as an argument.  It then loops
through the text file pulling out specific lines of text that I want.  I
have a regular expression that evaluates the text to see if it matches a

specific phrase.  Right now I have it writing to another text file that
output.  The problem I am having is that it finds the phrase prints it, but
then it continuously prints the statement.  There is only 1 entries in the

file for the result it finds, but it prints it multiple times...several
hundred before it moves onto the next one.  But it appends the first one to
the next entry...and does this till it finds everything.

http://dpaste.com/33982/


Any Help?
  


dedent the 2nd for loop.

--
Bob Gailer
Chapel Hill NC
919-636-4239


[Tutor] regular expression problem

2009-04-15 Thread Spencer Parker
I have a python script that takes a text file as an argument.  It then loops
through the text file pulling out specific lines of text that I want.  I
have a regular expression that evaluates the text to see if it matches a
specific phrase.  Right now I have it writing to another text file that
output.  The problem I am having is that it finds the phrase prints it, but
then it continuously prints the statement.  There is only 1 entries in the
file for the result it finds, but it prints it multiple times...several
hundred before it moves onto the next one.  But it appends the first one to
the next entry...and does this till it finds everything.

http://dpaste.com/33982/

Any Help?


Re: [Tutor] Regular expression oddity

2008-11-23 Thread spir

bob gailer a écrit :

Emmanuel Ruellan wrote:

Hi tutors!

While trying to write a regular expression that would split a string
the way I want, I noticed a behaviour I didn't expect.

 

re.findall('.?', 'some text')


['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '']

Where does the last string, the empty one, come from?
I find this behaviour rather annoying: I'm getting one group too many.
  
The ? means 0 or 1 occurrence. I think re is matching the null string at 
the end.


Drop the ? and you'll get what you want.

Of course you can get the same thing using list('some text') at lower cost.


I find this fully consistent, for your regex means matching
* either any char
* or no char at all
Logically, you first get n chars, then one 'nothing'. Only after that will 
parsing be stopped because of end of string. Maybe clearer:

print re.findall('.?', '')
==> ['']
print re.findall('.', '')
==> []
denis



Re: [Tutor] Regular expression oddity

2008-11-22 Thread bob gailer

Emmanuel Ruellan wrote:

Hi tutors!

While trying to write a regular expression that would split a string
the way I want, I noticed a behaviour I didn't expect.

  

re.findall('.?', 'some text')


['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '']

Where does the last string, the empty one, come from?
I find this behaviour rather annoying: I'm getting one group too many.
  
The ? means 0 or 1 occurrence. I think re is matching the null string at 
the end.


Drop the ? and you'll get what you want.

Of course you can get the same thing using list('some text') at lower cost.
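A short sketch comparing the three spellings (the variable names are arbitrary):

```python
import re

text = 'some text'

with_empty = re.findall('.?', text)   # '?' also allows the empty match at the end
chars = re.findall('.', text)         # needs exactly one character each time

print(with_empty == list(text) + [''])
print(chars == list(text))
print(re.findall('.?', ''))           # even an empty string yields one empty match
```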

--
Bob Gailer
Chapel Hill NC 
919-636-4239




[Tutor] Regular expression oddity

2008-11-22 Thread Emmanuel Ruellan
Hi tutors!

While trying to write a regular expression that would split a string
the way I want, I noticed a behaviour I didn't expect.

>>> re.findall('.?', 'some text')
['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '']

Where does the last string, the empty one, come from?
I find this behaviour rather annoying: I'm getting one group too many.

Regards,
Emmanuel


Re: [Tutor] Regular expression to match \f in groff input?

2008-08-21 Thread Alan Gauld


"Bill Campbell" <[EMAIL PROTECTED]> wrote 


to get regular expressions to match the font change sequences in
*roff input (e.g. \fB for bold, \fP to revert to previous font).
The re library maps r'\f' to the single form-feed character (as
it does other common single-character sequences like r'\n').


I think all you need is an extra \ to escape the \ character in \f


This does not work in Python:

s = re.sub(r'\f[1NR]', '', sinput)


Try

s = re.sub(r'\\f[1NR]', '', sinput)
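A sketch of why the doubled backslash is needed, using a made-up line of *roff input (the sample text and variable names are assumptions):

```python
import re

sinput = r'plain \f1text\fR here'   # hypothetical input with font changes

# In a regex, \f means the form-feed character, so it finds nothing here...
print(re.search(r'\f', sinput))
# ...although it does match a real form feed.
print(re.search(r'\f', '\x0c') is not None)

# \\f in the regex means a literal backslash followed by 'f'.
cleaned = re.sub(r'\\f[1NR]', '', sinput)
print(cleaned)
```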

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld



Re: [Tutor] Regular expression to match \f in groff input?

2008-08-21 Thread Danny Yoo
On Thu, Aug 21, 2008 at 1:40 PM, Bill Campbell <[EMAIL PROTECTED]> wrote:
> I've been beating my head against the wall try to figure out how
> to get regular expressions to match the font change sequences in
> *roff input (e.g. \fB for bold, \fP to revert to previous font).
> The re library maps r'\f' to the single form-feed character (as
> it does other common single-character sequences like
r'\n').


Does this example help?

###
>>> sampleText = r"\This\ has \backslashes\."
>>> print sampleText
\This\ has \backslashes\.
>>> import re
>>> re.findall(r"\\\w+\\", sampleText)
['\\This\\', '\\backslashes\\']
###


[Tutor] Regular expression to match \f in groff input?

2008-08-21 Thread Bill Campbell
I've been beating my head against the wall trying to figure out how
to get regular expressions to match the font change sequences in
*roff input (e.g. \fB for bold, \fP to revert to previous font).
The re library maps r'\f' to the single form-feed character (as
it does other common single-character sequences like r'\n').

In perl, I can write something like this:

s/\f[1NR]//g;

This does not work in Python:

s = re.sub(r'\f[1NR]', '', sinput)

The string.replace() operator will handle the above replacement,
although it requires a separate replace for each of the possible
characters in the [1NR].

I have tried various options such as r'\\x66' and r'\\146', but
none of these work.

One work-around, assuming that the text does not contain control
characters, is to replace the \f characters with a control
character before doing the replacements, then replace that
control character with \f if any remain after processing:

import re, fileinput
for line in fileinput.input():
    line = line.rstrip()
    # '\001' (not raw) is the actual control character
    line = line.replace(r'\f', '\001')
    # do something here to make substitutions
    line = line.replace('\001', r'\f')
    print line

Bill


Re: [Tutor] Regular Expression

2007-12-21 Thread Michael Langford
You need to pass a parameter to the string in the following line:

outfile.write("%s\n" % m.string[m.start():m.end()])

And you need to use patt.search, not patt.match, in the line where you
actually apply the expression to the string:

 m = patt.search(line)

   --Michael

On 12/21/07, Que Prime <[EMAIL PROTECTED]> wrote:
>
>
> I need to pull the highlighted data from a similar file and can't seem to get
> my script to work:
>
> Script:
> import re
> infile = open("filter.txt","r")
> outfile = open("out.txt","w")
> patt = re.compile(r"~02([\d{10}])")
> for line in infile:
>     m = patt.match(line)
>     if m:
>         outfile.write("%s\n")
> infile.close()
> outfile.close()
>
>
> File:
> 200~02001491~05070
> 200~02001777~05070
> 200~02001995~05090
> 200~02002609~05090
> 200~02002789~05070
> 200~012~02004169~0
>  200~02004247~05090
> 200~02008623~05090
> 200~02010957~05090
> 200~02 011479~05090
> 200~0199~02001237~
> 200~02011600~05090
> 200~012~02 022305~0
> 200~02023546~05090
> 200~02025427~05090
>
>


Re: [Tutor] Regular Expression

2007-12-21 Thread Tiger12506
> I need to pull the highlighted data from a similar file and can't seem to
> get my script to work:
>
> Script:
> import re
> infile = open("filter.txt","r")
> outfile = open("out.txt","w")
> patt = re.compile(r"~02([\d{10}])")

You have to allow for the characters at the beginning and end too.
Try this.
re.compile(r".*~02(\d{10})~.*")

Also outfile.write("%s\n") literally writes "%s\n"
You need this I believe

outfile.write("%s\n" % m.group(1))
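
A small sketch of why the bracketed form behaves differently, using the first row of the sample file. One caveat, stated as an observation about the data as posted rather than a certainty about the real file: the rows shown have only six digits after ~02, so a pattern that insists on exactly ten digits will not match them, while \d+ will.

```python
import re

line = "200~02001491~05070"   # first row of the sample file as posted

# [\d{10}] is a character class: ONE character that is a digit, '{' or '}'.
one_char = re.search(r"~02([\d{10}])", line)
print(one_char.group(1))

# \d{10} outside the brackets means exactly ten digits in a row.
ten = re.search(r"~02(\d{10})~", line)
print(ten)

# \d+ takes however many digits come before the next '~'.
digits = re.search(r"~02(\d+)~", line)
print(digits.group(1))
```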


> for line in infile:
>     m = patt.match(line)
>     if m:
>         outfile.write("%s\n")
> infile.close()
> outfile.close()
>
>
> File:
> 200~02001491~05070
> 200~02001777~05070
> 200~02001995~05090
> 200~02002609~05090
> 200~02002789~05070
> 200~012~02004169~0
> 200~02004247~05090
> 200~02008623~05090
> 200~02010957~05090
> 200~02 011479~05090
> 200~0199~02001237~
> 200~02011600~05090
> 200~012~02 022305~0
> 200~02023546~05090
> 200~02025427~05090
>







[Tutor] Regular Expression

2007-12-21 Thread Michael H. Goldwasser

Que,

I haven't tested the script, but please note that patt.match(line)
will only succeed when the pattern is at the start of the line.  Use
patt.search(line) if you want to find the pattern anywhere within the
line.

Based on your desired highlight, you might want to use the pattern,
patt = re.compile(r"~02(\d*)~")

Also, your outfile.write command below doesn't provide any substitute
for %s.   Presumably you mean outfile.write("%s\n" % m.group(1)) .
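
Putting those three points together, a minimal sketch might look like this. It works on a short in-memory list copied from the sample file instead of opening filter.txt, and collects results in a list named `found` (both are choices made for the example, not part of the original script).

```python
import re

# A few rows copied from the sample file; a real script would loop over the open file.
lines = [
    "200~02001491~05070",
    "200~012~02004169~0",
    "200~0199~02001237~",
]

patt = re.compile(r"~02(\d*)~")

found = []
for line in lines:
    m = patt.search(line)          # search() scans the line; match() only tries position 0
    if m:
        found.append(m.group(1))   # group(1) is the text captured by the parentheses

print("\n".join(found))
```

With patt.match() in place of patt.search() none of these rows would match, because every row starts with "200" and not with "~02".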

Good luck,
Michael

On Friday December 21, 2007, Que Prime wrote: 

>I need to pull the highlighted data from a similar file and can't seem to
>get my script to work:
>
>Script:
>import re
>infile = open("filter.txt","r")
>outfile = open("out.txt","w")
>patt = re.compile(r"~02([\d{10}])")
>for line in infile:
>    m = patt.match(line)
>    if m:
>        outfile.write("%s\n")
>infile.close()
>outfile.close()
>
>
>File:
>200~02001491~05070
>200~02001777~05070
>200~02001995~05090
>200~02002609~05090
>200~02002789~05070
>200~012~02004169~0
>200~02004247~05090
>200~02008623~05090
>200~02010957~05090
>200~02 011479~05090
>200~0199~02001237~
>200~02011600~05090
>200~012~02 022305~0
>200~02023546~05090
>200~02025427~05090




[Tutor] Regular Expression

2007-12-21 Thread Que Prime
I need to pull the highlighted data from a similar file and can't seem to get
my script to work:

Script:
import re
infile = open("filter.txt","r")
outfile = open("out.txt","w")
patt = re.compile(r"~02([\d{10}])")
for line in infile:
    m = patt.match(line)
    if m:
        outfile.write("%s\n")
infile.close()
outfile.close()


File:
200~02001491~05070
200~02001777~05070
200~02001995~05090
200~02002609~05090
200~02002789~05070
200~012~02004169~0
200~02004247~05090
200~02008623~05090
200~02010957~05090
200~02 011479~05090
200~0199~02001237~
200~02011600~05090
200~012~02 022305~0
200~02023546~05090
200~02025427~05090


Re: [Tutor] Regular Expression help - parsing AppleScript Lists as Strings

2007-11-01 Thread Andrew Wu
Ah - thanks for the correction!  I missed the extra grouping and the
extra spacing ... doh!  Sorry about the HTML-formatted e-mail ...

Thanks also for the pyparsing variant - I didn't know the
module existed before!



Andrew

On 11/1/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Andrew Wu wrote:
>
> >pattern3 = '''
> >   ^{
> >   (
> >   %s
> >   | {%s}   # Possible to have 1 level of nested lists
> >   ,?)* # Items are comma-delimited, except for the last item
> >   }$
> >''' % (pattern2, pattern2)
>
> The above doesn't allow comma after the first instance of pattern2 and
> it doesn't allow space after either instance. Here is a version that
> passes your tests:
>
> pattern3 = '''
>^{
>(
>(%s
>| {%s})   # Possible to have 1 level of nested lists
>,?\s*)* # Items are comma-delimited, except for the last item
>}$
> ''' % (pattern2, pattern2)
>
> You might want to look at doing this with pyparsing, I think it will
> make it easier to get the data out vs just recognizing the correct pattern.
>
> Kent
>
> PS Please post in plain text, not HTML.
>


Re: [Tutor] Regular Expression help - parsing AppleScript Lists as Strings

2007-11-01 Thread Kent Johnson
Kent Johnson wrote:
> You might want to look at doing this with pyparsing, I think it will 
> make it easier to get the data out vs just recognizing the correct pattern.

Here is a pyparsing version that correctly recognizes all of your 
patterns and returns a (possibly nested) Python list in case of a match.

Note that this version will parse lists that are nested arbitrarily 
deeply. If you don't want that you will have to define two kinds of 
lists, a singly-nested list and a non-nested list.

Kent

from pyparsing import *

List = Forward()
T = Literal('true').setParseAction( lambda s,l,t: [ True ] )
F = Literal('false').setParseAction( lambda s,l,t: [ False ] )
String = QuotedString('"')
Number = Word(nums).setParseAction( lambda s,l,t: [ int(t[0]) ] )
List << (Literal('{').suppress()
         + delimitedList(T|F|String|Number|Group(List))
         + Literal('}').suppress())

def IsASList(s):
    # AppleScript lists are bracketed by curly braces with items separated
    # by commas
    # Each item is an alphanumeric label(?) or a string enclosed by
    # double quotes or a list itself
    # e.g. {2, True, "hello"}
    try:
        parsed = List.parseString(s)
        return parsed.asList()
    except Exception, e:
        return None

sample_strs = [
    '{}',  # Empty list
    '{a}', # Should not match
    '{a, b, c}', # Should not match
    '{"hello"}',
    '{"hello", "kitty"}',
    '{true}',
    '{false}',
    '{true, false}',
    '{9}',
    '{9,10, 11}',
    '{93214, true, false, "hello", "kitty"}',
    '{{1, 2, 3}}',  # This matches
    '{{1, 2, "cat"}, 1}',  # This matches

    # These don't match:
    '{{1,2,3},1,{4,5,6},2}',
    '{1, {2, 3, 4}, 3}',
    '{{1, 2, 3}, {4, 5, 6}, 1}',
    '{1, {1, 2, 3}}',  # Should match but doesn't
    '{93214, true, false, "hello", "kitty", {1, 2, 3}}',  # Should match but doesn't
    '{label: "hello", value: false, num: 2}',  # AppleScript dictionary - should not match
]

for sample in sample_strs:
    result = IsASList(sample)
    print 'Is AppleScript List:  %s;   String:  %s' % (bool(result), sample)
    if result:
        print result


Re: [Tutor] Regular Expression help - parsing AppleScript Lists as Strings

2007-11-01 Thread Kent Johnson
Andrew Wu wrote:

>pattern3 = '''
>   ^{
>   (
>   %s
>   | {%s}   # Possible to have 1 level of nested lists
>   ,?)* # Items are comma-delimited, except for the last item
>   }$
>''' % (pattern2, pattern2)

The above doesn't allow comma after the first instance of pattern2 and 
it doesn't allow space after either instance. Here is a version that 
passes your tests:

pattern3 = '''
   ^{
   (
   (%s
   | {%s})   # Possible to have 1 level of nested lists
   ,?\s*)* # Items are comma-delimited, except for the last item
   }$
''' % (pattern2, pattern2)

You might want to look at doing this with pyparsing, I think it will 
make it easier to get the data out vs just recognizing the correct pattern.

Kent

PS Please post in plain text, not HTML.


[Tutor] Regular Expression help - parsing AppleScript Lists as Strings

2007-10-31 Thread Andrew Wu
Hi,

I'm writing utilities to handle return values from AppleScript in python and
one of the steps is recognizing when a returned value from an AppleScript
execution (via popen and osascript) represents a list (in AppleScript) or
not.  For the most part I think I have the correct matching pattern, but I
am hung up on one of the sample strings I was using to test it - AppleScript
allows for one level of nested lists (I believe) and I get tripped up with
attempting to match lists with nested lists.

My second question is, once I have a pattern matching correctly, I need to
convert the AppleScript list into a Python list - I've read a bit about the
findall() method of the re module and was wondering if that would work in
this instance (there's also split() but I've been having issues with that,
probably b/c my pattern matching isn't correct).


Thank you!

Andrew

(source code below)



#!/usr/bin/env python
# Sample script to test if a string represented an AppleScript List or not

import re
import os

def IsASList(thestr=''):
   # AppleScript lists are bracketed by curly braces with items separated by commas
   # Each item is an alphanumeric label(?) or a string enclosed by
   # double quotes or a list itself
   # e.g. {2, True, "hello"}
   #
   # They differ from AppleScript records in that AS records have a key and value:
   # {name: "Buffy", field: "Slaying", job: true, age: 21}
   #
   # Now the question is how to make the distinction?

   pattern = '''
  ^{# Must start with a curly brace
  (
  \s*? # Group to repeat; clear the whitespace after commas first
  (   # Start of group of alternating match possibilities
  ".+?"   # Match a string
  | \d+? # Match a number
  | true|false  # Match 'true' or 'false' label
  )   # End of group of alternating match possibilities
  ,?)* # Items are comma-delimited, except for the last item
  }$# Must end with a closing curly brace
   '''

   pattern2 = '''
  (
  \s*? # Group to repeat; clear the whitespace after commas first
  (   # Start of group of alternating match possibilities
  ".+?"   # Match a string
  | \d+? # Match a number
  | true|false  # Match 'true' or 'false' label
  )   # End of group of alternating match possibilities
  ,?)* # Items are comma-delimited, except for the last item
   '''

   pattern3 = '''
  ^{
  (
  %s
  | {%s}   # Possible to have 1 level of nested lists
  ,?)* # Items are comma-delimited, except for the last item
  }$
   ''' % (pattern2, pattern2)

   regex = re.compile(pattern3, re.VERBOSE)
   result = regex.match(thestr)

#   print 'Result: ',
#   try:
#  print result.groups()
#   except AttributeError:
#  pass

   if result:
  return True
   else:
  return False


# main()

sample_strs = [
   '{}',  # Empty list
   '{a}', # Should not match
   '{a, b, c}', # Should not match
   '{"hello"}',
   '{"hello", "kitty"}',
   '{true}',
   '{false}',
   '{true, false}',
   '{9}',
   '{9,10, 11}',
   '{93214, true, false, "hello", "kitty"}',
   '{{1, 2, 3}}',  # This matches
   '{{1, 2, "cat"}, 1}',  # This matches

# These don't match:
   '{{1,2,3},1,{4,5,6},2}',
   '{1, {2, 3, 4}, 3}',
   '{{1, 2, 3}, {4, 5, 6}, 1}',
   '{1, {1, 2, 3}}',  # Should match but doesn't
   '{93214, true, false, "hello", "kitty", {1, 2, 3}}',  # Should match but doesn't
   '{label: "hello", value: false, num: 2}',  # AppleScript dictionary - should not match
]

for sample in sample_strs:
   print 'Is AppleScript List:  %s;   String:  %s' % (str(IsASList(sample)), sample)


Re: [Tutor] Regular Expression help

2007-06-29 Thread Gardner, Dean
Thanks everyone for the replies; all worked well. I adopted the string
splitting approach in favour of the regex one, as it seemed to miss fewer
of the edge cases. Thanks again to everyone for your help.
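
For readers following along, the string splitting idea can be sketched roughly as below. This is not the script actually used; the function name `split_row` and its `delimiter` parameter are invented for the illustration. It relies only on the tag being the first whitespace-separated token and VR and VM being the last two.

```python
def split_row(line, delimiter=";"):
    """Join tag, description, VR and VM with delimiter; None if the row is too short."""
    parts = line.split(None, 1)          # tag, then the rest of the row
    if len(parts) != 2:
        return None
    tail = parts[1].rsplit(None, 2)      # description, VR, VM (split from the right)
    if len(tail) != 3:
        return None
    return delimiter.join([parts[0]] + tail)

print(split_row("(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1"))
```

Splitting from the right keeps the description intact however many words it contains.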




-----Original Message-----
From: Kent Johnson [mailto:[EMAIL PROTECTED] 
Sent: 27 June 2007 14:55
To: tutor@python.org; Gardner, Dean
Subject: Re: [Tutor] Regular Expression help

Gardner, Dean wrote:
> Hi
> 
> I have a text file that I would like to split up so that I can use it 
> in Excel to filter a certain field. However as it is a flat text file 
> I need to do some processing on it so that Excel can correctly import
it.
> 
> File Example:
> tag descVR  VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
> 
> What I essentially want is to use python to process this file to give 
> me
> 
> 
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
> (0012,0050); Clinical Trial Time Point ID; LO; 1
> (0012,0051); Clinical Trial Time Point Description; ST; 1
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
> (0018,0010); Contrast/Bolus Agent; LO; 1
> (0018,0012); Contrast/Bolus Agent Sequence; SQ ;1
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
> (0018,0015); Body Part Examined; CS; 1
> 
> so that I can import to excel using a delimiter.
> 
> This file is extremely long and all I essentially want to do is to 
> break it into it 'fields'
> 
> Now I suspect that regular expressions are the way to go but I have 
> only basic experience of using these and I have no idea what I should
be doing.

This seems to work:

data = '''\
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1'''.splitlines()

import re
fieldsRe = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')

for line in data:
    match = fieldsRe.match(line)
    if match:
        print ';'.join(match.group(1, 2, 3, 4))


I don't think you want the space after the ; that you put in your
example; Excel wants a single-character delimiter.

Kent




Re: [Tutor] Regular Expression help

2007-06-27 Thread Kent Johnson
Gardner, Dean wrote:
> Hi
> 
> I have a text file that I would like to split up so that I can use it in 
> Excel to filter a certain field. However as it is a flat text file I 
> need to do some processing on it so that Excel can correctly import it.
> 
> File Example:
> tag descVR  VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
> 
> What I essentially want is to use python to process this file to give me
> 
> 
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
> (0012,0050); Clinical Trial Time Point ID; LO; 1
> (0012,0051); Clinical Trial Time Point Description; ST; 1
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
> (0018,0010); Contrast/Bolus Agent; LO; 1
> (0018,0012); Contrast/Bolus Agent Sequence; SQ ;1
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
> (0018,0015); Body Part Examined; CS; 1
> 
> so that I can import to excel using a delimiter.
> 
> This file is extremely long and all I essentially want to do is to break 
> it into it 'fields'
> 
> Now I suspect that regular expressions are the way to go but I have only 
> basic experience of using these and I have no idea what I should be doing.

This seems to work:

data = '''\
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1'''.splitlines()

import re
fieldsRe = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')

for line in data:
    match = fieldsRe.match(line)
    if match:
        print ';'.join(match.group(1, 2, 3, 4))


I don't think you want the space after the ; that you put in your 
example; Excel wants a single-character delimiter.

Kent


Re: [Tutor] Regular Expression help

2007-06-27 Thread Kent Johnson
Yikes! Sorry about all the duplicate postings. Thunderbird was telling 
me the send failed so I kept retrying; I guess it was actually sending!

Kent



Re: [Tutor] Regular Expression help

2007-06-27 Thread Kent Johnson
Tom Tucker wrote:
> #matchstr regex flow
> # (\(\d+,\d+\)) # (0018,0014)
> # \s   # [space]
> # (..*)# Contrast/Bolus Administration Route Sequence
> # \s   # space
> # ([a-z]{2}) # SQ - two letters and no more
> # \s  # [space]
> # (\d)# 1 - single digit
> # re.I)   # case insensitive
> 
> matchstr = re.compile(r"(\(\d+,\d+\))\s(..*)\s([a-z]{2})\s(\d)",re.I)

You should learn about re.VERBOSE:
http://docs.python.org/lib/node46.html#l2h-414

With this flag, your commented version could be the actual regex, 
instead of repeating it in code with the whitespace and comments removed:

matchstr = re.compile(r'''
(\(\d+,\d+\))   # (0018,0014)
\s              # [space]
(..*)           # Contrast/Bolus Administration Route Sequence
\s              # [space]
([a-z]{2})      # SQ - two letters and no more
\s              # [space]
(\d)            # 1 - single digit
''', re.I|re.VERBOSE)   # re.I - case insensitive
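
As a quick check, here is a self-contained sketch of a verbose pattern applied to one of the sample rows. The variable names `row` and `m` are just for the example. Note that the flags belong in the second argument to re.compile(); inside the triple-quoted pattern only regex syntax and # comments are allowed.

```python
import re

# Flags go in the second argument, not inside the pattern text.
matchstr = re.compile(r'''
    (\(\d+,\d+\))   # tag, e.g. (0018,0014)
    \s
    (..*)           # description (greedy, may contain spaces)
    \s
    ([a-z]{2})      # VR: two letters
    \s
    (\d)            # VM: single digit
''', re.I | re.VERBOSE)

row = "(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1"
m = matchstr.match(row)
print(";".join(m.groups()))
```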

Kent


Re: [Tutor] Regular Expression help

2007-06-27 Thread Reed O'Brien

On Jun 27, 2007, at 10:24 AM, Mike Hansen wrote:

>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Gardner, Dean
>> Sent: Wednesday, June 27, 2007 3:59 AM
>> To: tutor@python.org
>> Subject: [Tutor] Regular Expression help
>>
>> Hi
>>
>> I have a text file that I would like to split up so that I
>> can use it in Excel to filter a certain field. However as it
>> is a flat text file I need to do some processing on it so
>> that Excel can correctly import it.
>>
>> File Example:
>> tag descVR  VM
>> (0012,0042) Clinical Trial Subject Reading ID LO 1
>> (0012,0050) Clinical Trial Time Point ID LO 1
>> (0012,0051) Clinical Trial Time Point Description ST 1
>> (0012,0060) Clinical Trial Coordinating Center Name LO 1
>> (0018,0010) Contrast/Bolus Agent LO 1
>> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
>> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
>> (0018,0015) Body Part Examined CS 1
>>
>> What I essentially want is to use python to process this file
>> to give me
>>
>>
>> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
>> (0012,0050); Clinical Trial Time Point ID; LO; 1
>> (0012,0051); Clinical Trial Time Point Description; ST; 1
>> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
>> (0018,0010); Contrast/Bolus Agent; LO; 1
>> (0018,0012); Contrast/Bolus Agent Sequence; SQ ;1
>> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
>> (0018,0015); Body Part Examined; CS; 1
>>
>> so that I can import to excel using a delimiter.
>>
>> This file is extremely long and all I essentially want to do
>> is to break it into it 'fields'
>>
>> Now I suspect that regular expressions are the way to go but
>> I have only basic experience of using these and I have no
>> idea what I should be doing.
>>
>> Can anyone help.
>>
>> Thanks
>>
>
> H... You might be able to do this without the need for regular
> expressions. You can split the row on spaces which will give you a  
> list.
> Then you can reconstruct the row inserting your delimiter as needed  
> and
> joining the rest with spaces again.
>
> In [63]: row = "(0012,0042) Clinical Trial Subject Reading ID LO 1"
>
> In [64]: row_items = row.split(' ')
>
> In [65]: row_items
> Out[65]: ['(0012,0042)', 'Clinical', 'Trial', 'Subject', 'Reading',
> 'ID', 'LO',
> '1']
>
> In [66]: tag = row_items.pop(0)
>
> In [67]: tag
> Out[67]: '(0012,0042)'
>
> In [68]: vm = row_items.pop()
>
> In [69]: vm
> Out[69]: '1'
>
> In [70]: vr = row_items.pop()
>
> In [71]: vr
> Out[71]: 'LO'
>
> In [72]: desc = ' '.join(row_items)
>
> In [73]: new_row = "%s; %s; %s; %s" %(tag, desc, vr, vm, )
>
> In [74]: new_row
> Out[74]: '(0012,0042); Clinical Trial Subject Reading ID; LO; 1'
>
> Someone might think of a better way with them thar fancy lambdas and
> list comprehensions thingys, but I think this will work.
>
>
I sent this to Dean this morning:

Dean,

I would do something like this (if your pattern is always the same.)

foo = ['(0012,0042) Clinical Trial Subject Reading ID LO 1 ',
       '(0012,0050) Clinical Trial Time Point ID LO 1 ',
       '(0012,0051) Clinical Trial Time Point Description ST 1 ',
       '(0012,0060) Clinical Trial Coordinating Center Name LO 1 ',
       '(0018,0010) Contrast/Bolus Agent LO 1 ',
       '(0018,0012) Contrast/Bolus Agent Sequence SQ 1 ',
       '(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1 ',
       '(0018,0015) Body Part Examined CS 1',]

import csv
writer = csv.writer(open('/Users/reed/tmp/foo.csv', 'w'), delimiter=';')

for lin in foo:
    lin = lin.split()
    row = (lin[0], ' '.join(lin[1:-2]), lin[-2], lin[-1])
    writer.writerow(row)


more foo.csv
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1


HTH,
~reed




Re: [Tutor] Regular Expression help

2007-06-27 Thread Mike Hansen
 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Gardner, Dean
> Sent: Wednesday, June 27, 2007 3:59 AM
> To: tutor@python.org
> Subject: [Tutor] Regular Expression help
> 
> Hi 
> 
> I have a text file that I would like to split up so that I 
> can use it in Excel to filter a certain field. However as it 
> is a flat text file I need to do some processing on it so 
> that Excel can correctly import it.
> 
> File Example: 
> tag descVR  VM 
> (0012,0042) Clinical Trial Subject Reading ID LO 1 
> (0012,0050) Clinical Trial Time Point ID LO 1 
> (0012,0051) Clinical Trial Time Point Description ST 1 
> (0012,0060) Clinical Trial Coordinating Center Name LO 1 
> (0018,0010) Contrast/Bolus Agent LO 1 
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1 
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1 
> (0018,0015) Body Part Examined CS 1 
> 
> What I essentially want is to use python to process this file 
> to give me 
> 
> 
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1 
> (0012,0050); Clinical Trial Time Point ID; LO; 1 
> (0012,0051); Clinical Trial Time Point Description; ST; 1 
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1 
> (0018,0010); Contrast/Bolus Agent; LO; 1 
> (0018,0012); Contrast/Bolus Agent Sequence; SQ ;1 
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1 
> (0018,0015); Body Part Examined; CS; 1 
> 
> so that I can import to excel using a delimiter. 
> 
> This file is extremely long and all I essentially want to do 
> is to break it into it 'fields' 
> 
> Now I suspect that regular expressions are the way to go but 
> I have only basic experience of using these and I have no 
> idea what I should be doing.
> 
> Can anyone help. 
> 
> Thanks 
> 

H... You might be able to do this without the need for regular
expressions. You can split the row on spaces which will give you a list.
Then you can reconstruct the row inserting your delimiter as needed and
joining the rest with spaces again.

In [63]: row = "(0012,0042) Clinical Trial Subject Reading ID LO 1"

In [64]: row_items = row.split(' ')

In [65]: row_items
Out[65]: ['(0012,0042)', 'Clinical', 'Trial', 'Subject', 'Reading',
'ID', 'LO',
'1']

In [66]: tag = row_items.pop(0)

In [67]: tag
Out[67]: '(0012,0042)'

In [68]: vm = row_items.pop()

In [69]: vm
Out[69]: '1'

In [70]: vr = row_items.pop()

In [71]: vr
Out[71]: 'LO'

In [72]: desc = ' '.join(row_items)

In [73]: new_row = "%s; %s; %s; %s" %(tag, desc, vr, vm, )

In [74]: new_row
Out[74]: '(0012,0042); Clinical Trial Subject Reading ID; LO; 1'

Someone might think of a better way with them thar fancy lambdas and
list comprehensions thingys, but I think this will work. 
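
The interactive steps above can be wrapped into one small function. This is only a sketch: it assumes every row starts with the tag and ends with the VR and VM fields, and the name split_row is made up for illustration.

```python
def split_row(row, delimiter=";"):
    """Turn 'tag description... VR VM' into four delimiter-separated fields."""
    items = row.split()            # no argument: also ignores trailing spaces
    tag = items[0]                 # first item is the tag
    vr, vm = items[-2], items[-1]  # last two items are VR and VM
    desc = " ".join(items[1:-2])   # everything in between is the description
    return delimiter.join([tag, desc, vr, vm])


rows = [
    "(0012,0042) Clinical Trial Subject Reading ID LO 1 ",
    "(0018,0015) Body Part Examined CS 1",
]
for row in rows:
    print(split_row(row))
```

Using split() without an argument avoids the empty strings that split(' ') produces when a row ends with a space.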

Mike
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression help

2007-06-27 Thread Mike Hansen
Argh... My e-mail program really messed up the threads. I didn't notice
that there was already multiple replies to this message.

Doh!

Mike
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression help

2007-06-27 Thread Kent Johnson
Gardner, Dean wrote:
> Hi
> 
> I have a text file that I would like to split up so that I can use it in 
> Excel to filter a certain field. However as it is a flat text file I 
> need to do some processing on it so that Excel can correctly import it.
> 
> File Example:
> tag descVR  VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
> 
> What I essentially want is to use python to process this file to give me
> 
> 
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1
> (0012,0050); Clinical Trial Time Point ID; LO; 1
> (0012,0051); Clinical Trial Time Point Description; ST; 1
> (0012,0060); Clinical Trial Coordinating Center Name; LO; 1
> (0018,0010); Contrast/Bolus Agent; LO; 1
> (0018,0012); Contrast/Bolus Agent Sequence; SQ ;1
> (0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
> (0018,0015); Body Part Examined; CS; 1
> 
> so that I can import to excel using a delimiter.
> 
> This file is extremely long and all I essentially want to do is to break 
> it into it 'fields'
> 
> Now I suspect that regular expressions are the way to go but I have only 
> basic experience of using these and I have no idea what I should be doing.

This seems to work:

data = '''\
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1'''.splitlines()

import re
fieldsRe = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')

for line in data:
    match = fieldsRe.match(line)
    if match:
        print(';'.join(match.group(1, 2, 3, 4)))


I don't think you want the space after the ; that you put in your 
example; Excel wants a single-character delimiter.
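
For anyone unsure how a pattern like this is put together, here is the same idea spelled out with re.VERBOSE so each group can carry a comment. It is a sketch rather than a drop-in replacement: the names fields_re and to_fields are invented here, and it additionally tolerates runs of spaces and trailing spaces, which is an assumption about the real file.

```python
import re

fields_re = re.compile(r"""
    ^(\(\d+,\d+\))   # group 1: the tag, two numbers in parentheses
    \s+(.*?)         # group 2: description, non-greedy so it stops early
    \s+(\w+)         # group 3: VR code such as LO or SQ
    \s+(\d+)         # group 4: VM, one or more digits
    \s*$             # optional trailing whitespace, then end of line
""", re.VERBOSE)


def to_fields(line):
    """Return the four fields as a tuple, or None if the line does not fit."""
    m = fields_re.match(line)
    return m.groups() if m else None


print(to_fields("(0018,0010) Contrast/Bolus Agent LO 1 "))
print(to_fields("tag descVR  VM"))   # header line does not match
```

The non-greedy (.*?) matters: the description takes as little as it can, which leaves the last two words for the VR and VM groups.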

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression help

2007-06-27 Thread Tom Tucker

I think I have a solution.

File

(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1


Script
#
#!/usr/bin/python

import re

#matchstr regex flow
# (\(\d+,\d+\)) # (0018,0014)
# \s   # [space]
# (..*)# Contrast/Bolus Administration Route Sequence
# \s   # space
# ([a-z]{2}) # SQ - two letters and no more
# \s  # [space]
# (\d)# 1 - single digit
# re.I)   # case insensitive

matchstr = re.compile(r"(\(\d+,\d+\))\s(..*)\s([a-z]{2})\s(\d)",re.I)
myfile = open('/tmp/file','r')

for line in myfile.readlines():
    regex_match = matchstr.match(line)
    if regex_match:
        print(regex_match.group(1) + ";" + regex_match.group(2) +
              ";" + regex_match.group(3) + ";" + regex_match.group(4))


Output
#
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1


On 6/27/07, Gardner, Dean <[EMAIL PROTECTED]> wrote:


 Hi

I have a text file that I would like to split up so that I can use it in
Excel to filter a certain field. However as it is a flat text file I need to
do some processing on it so that Excel can correctly import it.

File Example:
tag descVR  VM
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1

What I essentially want is to use python to process this file to give me

(0012,0042); Clinical Trial Subject Reading ID; LO; 1
(0012,0050); Clinical Trial Time Point ID; LO; 1
(0012,0051); Clinical Trial Time Point Description; ST; 1
(0012,0060); Clinical Trial Coordinating Center Name; LO; 1
(0018,0010); Contrast/Bolus Agent; LO; 1
(0018,0012); Contrast/Bolus Agent Sequence; SQ ;1
(0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
(0018,0015); Body Part Examined; CS; 1

so that I can import to excel using a delimiter.

This file is extremely long and all I essentially want to do is to break
it into it 'fields'

Now I suspect that regular expressions are the way to go but I have only
basic experience of using these and I have no idea what I should be doing.

Can anyone help.

Thanks

DISCLAIMER:
Unless indicated otherwise, the information contained in this message is
privileged and confidential, and is intended only for the use of the
addressee(s) named above and others who have been specifically authorized to
receive it. If you are not the intended recipient, you are hereby notified
that any dissemination, distribution or copying of this message and/or
attachments is strictly prohibited. The company accepts no liability for any
damage caused by any virus transmitted by this email. Furthermore, the
company does not warrant a proper and complete transmission of this
information, nor does it accept liability for any delays. If you have
received this message in error, please contact the sender and delete the
message. Thank you.

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor




[Tutor] Regular Expression help

2007-06-27 Thread Gardner, Dean
Hi 

I have a text file that I would like to split up so that I can use it in
Excel to filter a certain field. However as it is a flat text file I
need to do some processing on it so that Excel can correctly import it.

File Example:
tag descVR  VM
(0012,0042) Clinical Trial Subject Reading ID LO 1 
(0012,0050) Clinical Trial Time Point ID LO 1 
(0012,0051) Clinical Trial Time Point Description ST 1 
(0012,0060) Clinical Trial Coordinating Center Name LO 1 
(0018,0010) Contrast/Bolus Agent LO 1 
(0018,0012) Contrast/Bolus Agent Sequence SQ 1 
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1 
(0018,0015) Body Part Examined CS 1 

What I essentially want is to use python to process this file to give me



(0012,0042); Clinical Trial Subject Reading ID; LO; 1 
(0012,0050); Clinical Trial Time Point ID; LO; 1 
(0012,0051); Clinical Trial Time Point Description; ST; 1 
(0012,0060); Clinical Trial Coordinating Center Name; LO; 1 
(0018,0010); Contrast/Bolus Agent; LO; 1 
(0018,0012); Contrast/Bolus Agent Sequence; SQ ;1 
(0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1 
(0018,0015); Body Part Examined; CS; 1 

so that I can import to excel using a delimiter. 

This file is extremely long and all I essentially want to do is to break
it into it 'fields'

Now I suspect that regular expressions are the way to go but I have only
basic experience of using these and I have no idea what I should be
doing.
Can anyone help.

Thanks




___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular expression questions

2007-05-03 Thread Bernard Lebel
Thanks a lot Kent, that indeed solves the issues altogether.


Cheers
Bernard




On 5/3/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > Hello,
> >
> > Once again struggling with regular expressions.
> >
> > I have a string that look like "something_shp1".
> > I want to replace "_shp1" by "_shp". I'm never sure if it's going to
> > be 1, if there's going to be a number after "_shp".
> >
> > So I'm trying to use regular expression to perform this replacement.
> > But I just can't seem to get a match! I always get a None match.
> >
> > I would think that this would have done the job:
> >
> > r = re.compile( r"(_shp\d)$" )
> >
> > The only way I have found to get a match, is using
> >
> > r = re.compile( r"(\S+_shp\d)$" )
>
> My guess is you are calling r.match() rather than r.search(). r.match()
> only looks for matches at the start of the string; r.search() will find
> a match anywhere.
>
> > My second question is related more to the actual string replacement.
> > Using regular expressions, what would be the way to go? I have tried
> > the following:
> >
> > newstring = r.sub( '_shp', oldstring )
> >
> > But the new string is always "_shp" instead of "something_shp".
>
> Because your re matches something_shp.
>
> I think
> newstring = re.sub(r'_shp\d', '_shp', oldstring)
> will do what you want.
>
> Kent
>
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular expression questions

2007-05-03 Thread Kent Johnson
Bernard Lebel wrote:
> Hello,
> 
> Once again struggling with regular expressions.
> 
> I have a string that look like "something_shp1".
> I want to replace "_shp1" by "_shp". I'm never sure if it's going to
> be 1, if there's going to be a number after "_shp".
> 
> So I'm trying to use regular expression to perform this replacement.
> But I just can't seem to get a match! I always get a None match.
> 
> I would think that this would have done the job:
> 
> r = re.compile( r"(_shp\d)$" )
> 
> The only way I have found to get a match, is using
> 
> r = re.compile( r"(\S+_shp\d)$" )

My guess is you are calling r.match() rather than r.search(). r.match() 
only looks for matches at the start of the string; r.search() will find 
a match anywhere.
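
A quick way to see the difference, assuming the string from the question:

```python
import re

r = re.compile(r"_shp\d$")
name = "something_shp1"

# match() only tries at position 0, where the string starts with "something"
print(r.match(name))

# search() scans forward and finds the suffix
print(r.search(name).group())
```

The first print shows None; the second shows the matched suffix.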

> My second question is related more to the actual string replacement.
> Using regular expressions, what would be the way to go? I have tried
> the following:
> 
> newstring = r.sub( '_shp', oldstring )
> 
> But the new string is always "_shp" instead of "something_shp".

Because your re matches the whole of something_shp1, so the whole
string is replaced.

I think
newstring = re.sub(r'_shp\d', '_shp', oldstring)
will do what you want.
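
Since the question says there may or may not be a number after "_shp", here is a slightly more forgiving sketch. The function name strip_shp_number is made up, and anchoring with $ assumes the suffix is always at the end of the string.

```python
import re


def strip_shp_number(name):
    # \d* matches zero or more digits, so a bare "_shp" is left unchanged
    return re.sub(r"_shp\d*$", "_shp", name)


for name in ["something_shp1", "something_shp", "something_shp12"]:
    print(strip_shp_number(name))
```

Only the matched suffix is replaced, so the "something" part survives.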

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Regular expression questions

2007-05-03 Thread Bernard Lebel
Hello,

Once again struggling with regular expressions.

I have a string that look like "something_shp1".
I want to replace "_shp1" by "_shp". I'm never sure if it's going to
be 1, if there's going to be a number after "_shp".

So I'm trying to use regular expression to perform this replacement.
But I just can't seem to get a match! I always get a None match.

I would think that this would have done the job:

r = re.compile( r"(_shp\d)$" )

The only way I have found to get a match, is using

r = re.compile( r"(\S+_shp\d)$" )




My second question is related more to the actual string replacement.
Using regular expressions, what would be the way to go? I have tried
the following:

newstring = r.sub( '_shp', oldstring )

But the new string is always "_shp" instead of "something_shp".



Thanks
Bernard
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] regular expression

2006-08-03 Thread arbaro arbaro
Hello,

I'm just answering my own email, since I just found out what my error was.

From a regular expression howto:
http://www.amk.ca/python/howto/regex/regex.html

The match() function only checks if the RE matches at the beginning of
the string while search() will scan forward through the string for a
match. It's important to keep this distinction in mind.  Remember,
match() will only report a successful match which will start at 0; if
the match wouldn't start at zero,  match() will not report it.

That was exactly my problem. Replacing r.match(line) with
r.search(line) solved it.
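
For the archive, a minimal sketch of the working version, using the sample kernel log line from the original question. The name DEVICE_RE is made up, and the "/part1" suffix is just the one mentioned in the question; the real partition name may differ.

```python
import re

DEVICE_RE = re.compile(r"/dev/scsi/host\d+/bus\d+/target\d+/lun\d+")

line = "<6> /dev/scsi/host3/bus0/target0/lun0/:<7>usb-storage: device scan complete"

print(DEVICE_RE.match(line))      # None, because the line starts with "<6> "

m = DEVICE_RE.search(line)
if m:
    # start() and end() give the position of the match inside the line
    print(m.start(), m.end())
    device = m.group() + "/part1"
    print(device)
```

m.group() returns the matched path itself, so there is no need to slice the line by hand with the start and end positions.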

Sorry for having bothered you prematurely.




On 8/3/06, arbaro arbaro <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> I'm trying to mount an usb device from python under linux.
> To do so, I read the kernel log /proc/kmsg and watch for something like:
>   "<6> /dev/scsi/host3/bus0/target0/lun0/:<7>usb-storage: device scan 
> complete"
>
> When I compile a regular expression like:
>   "r = re.compile('<\d+>\s/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')"
> It is found. But I don't want the <\d+>\s or '<6> ' in front of the path, so 
> I tried:
>"r = re.compile('/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')"
> But this way the usb device path it is not found.
>
> So what i'm trying to do is:
> - find the usb device path from the kernel log with the regular expression.
> - Determine the start and end positions of the match (and add /disc or /part1 
> to the match).
> - And use that to mount the usb stick on /mnt/usb -> mount -t auto match 
> /mnt/usb
>
> If anyone can see what i'm doing wrong, please tell me, because I don't 
> understand it anymore.
> Thanks.
>
> Below is the code:
>
> # \d+ = 1 or more digits
>  # \s  = an empty space
>
> import re
>
> def findusbdevice():
> ''' Returns path of usb device '''
> # I did a 'cat /proc/kmsg /log/kmsg' to be able to read the kernel 
> message.
> # Somehow I can't read /proc/kmsg directly.
> kmsg = open('/log/kmsg', 'r')
> r = re.compile('/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')
> #r = re.compile('<\d+>\s/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')
> for line in kmsg:
> if 'usb-storage' in line and  r.match(line):
> print 'Success', line
>
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] regular expression

2006-08-03 Thread Kent Johnson
arbaro arbaro wrote:
> Hello,
>
> I'm trying to mount an usb device from python under linux.
> To do so, I read the kernel log /proc/kmsg and watch for something like:
>   "<6> /dev/scsi/host3/bus0/target0/lun0/:<7>usb-storage: device scan 
> complete"
>
> When I compile a regular expression like:
>   "r = re.compile('<\d+>\s/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')"
You should use raw strings for regular expressions that contain \ 
characters.
> It is found. But I don't want the <\d+>\s or '<6> ' in front of the 
> path, so I tried:
>"r = re.compile('/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')"
> But this way the usb device path it is not found.
You are using re.match(), which just looks for a match at the start of 
the string. Try using re.search() instead.
http://docs.python.org/lib/matching-searching.html

Kent
>
> So what i'm trying to do is:
> - find the usb device path from the kernel log with the regular 
> expression.
> - Determine the start and end positions of the match (and add /disc or 
> /part1 to the match).
> - And use that to mount the usb stick on /mnt/usb -> mount -t auto 
> match /mnt/usb
>
> If anyone can see what i'm doing wrong, please tell me, because I 
> don't understand it anymore.
> Thanks.
>
> Below is the code:
>
> # \d+ = 1 or more digits
> # \s  = an empty space
>
> import re
>
> def findusbdevice():
> ''' Returns path of usb device '''
> # I did a 'cat /proc/kmsg /log/kmsg' to be able to read the kernel 
> message.
> # Somehow I can't read /proc/kmsg directly.
> kmsg = open('/log/kmsg', 'r')
> r = re.compile('/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')
> #r = re.compile('<\d+>\s/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')
> for line in kmsg:
> if 'usb-storage' in line and r.match(line):
> print 'Success', line
> 
>
> ___
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>   


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] regular expression

2006-08-03 Thread arbaro arbaro
Hello,

I'm trying to mount an usb device from python under linux.
To do so, I read the kernel log /proc/kmsg and watch for something like:
  "<6> /dev/scsi/host3/bus0/target0/lun0/:<7>usb-storage: device scan complete"

When I compile a regular expression like:
  "r = re.compile('<\d+>\s/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')"
It is found. But I don't want the <\d+>\s or '<6> ' in front of the path, so I tried:
   "r = re.compile('/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')"
But this way the usb device path it is not found.

So what i'm trying to do is:
- find the usb device path from the kernel log with the regular expression.
- Determine the start and end positions of the match (and add /disc or /part1 to the match).
- And use that to mount the usb stick on /mnt/usb -> mount -t auto match /mnt/usb

If anyone can see what i'm doing wrong, please tell me, because I don't understand it anymore.
Thanks.

Below is the code:

# \d+ = 1 or more digits
# \s  = an empty space

import re

def findusbdevice():
    ''' Returns path of usb device '''
    # I did a 'cat /proc/kmsg /log/kmsg' to be able to read the kernel message.
    # Somehow I can't read /proc/kmsg directly.
    kmsg = open('/log/kmsg', 'r')
    r = re.compile('/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')
    #r = re.compile('<\d+>\s/dev/scsi/host\d+/bus\d+/target\d+/lun\d+')
    for line in kmsg:
        if 'usb-storage' in line and r.match(line):
            print 'Success', line
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Kent Johnson
Steve Nelson wrote:
> Incidentally continuing my reading of the HOWTO I have sat and puzzled
> for about 30 mins on the difference the MULTILINE flag makes.  I can't
> quite see the difference.  I *think* it is as follows:
>
> Under normal circumstances, ^ matches the start of a line, only.  On a
> line by line basis.
>
> With the re.M flag, we get a match after *any* newline?
>
> Similarly with $ - under normal circumstances, $ matches the end of
> the string, or that which precedes a newline.
>
> With the MULTILINE flag, $ matches before *any* newline?
>
> Is this correct?
I'm not sure, I think you are a little confused. MULTILINE only matters 
if the string you are matching contains newlines. Without MULTILINE, ^ 
will match only at the beginning of the string. With it, ^ will match 
after any newline. For example,
In [1]: import re

A string  containing two lines:
In [2]: s='one\ntwo'

The first line matches without MULTILINE:
In [3]: re.search('^one', s)
Out[3]: <_sre.SRE_Match object at 0x00C3E640>

The second one does not (result of the search is None so nothing prints):
In [4]: re.search('^two', s)

With MULTILINE ^two will match:
In [5]: re.search('^two', s, re.MULTILINE)
Out[5]: <_sre.SRE_Match object at 0x00E901E0>
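
The same holds for $. A small sketch with findall, using a made-up three-line string, shows both anchors with and without the flag:

```python
import re

s = "one\ntwo\nthree"

print(re.findall(r"^\w+", s))                # ['one']
print(re.findall(r"^\w+", s, re.MULTILINE))  # ['one', 'two', 'three']
print(re.findall(r"\w+$", s))                # ['three']
print(re.findall(r"\w+$", s, re.MULTILINE))  # ['one', 'two', 'three']
```

Without the flag, ^ and $ refer to the whole string; with it, they also apply at every newline inside the string.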

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Steve Nelson
On 7/14/06, Kent Johnson <[EMAIL PROTECTED]> wrote:

> But for this particular application you might as well use
> line.startswith('b') instead of a regex.

Ah yes, that makes sense.

Incidentally continuing my reading of the HOWTO I have sat and puzzled
for about 30 mins on the difference the MULTILINE flag makes.  I can't
quite see the difference.  I *think* it is as follows:

Under normal circumstances, ^ matches the start of a line, only.  On a
line by line basis.

With the re.M flag, we get a match after *any* newline?

Similarly with $ - under normal circumstances, $ matches the end of
the string, or that which precedes a newline.

With the MULTILINE flag, $ matches before *any* newline?

Is this correct?

> Kent

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Kent Johnson
Steve Nelson wrote:
> On 7/14/06, John Fouhy <[EMAIL PROTECTED]> wrote:
>
>
>> >>> m = re.match(...)
>> >>> dir(m)
>>
>> It will tell you what attributes the match object has.
>>
>
> Useful - thank you.
>
> I am now confuse on this:
>
> I have a file full of lines beginning with the letter "b".  I want a
> RE that will return the whole line if it begins with b.
>
> I find if I do eg:
>
>
> >>> m = re.search("^b", "b spam spam spam")
> >>> m.group()
> 'b'
>
> How do I get it to return the whole line if it begins with a b?
Use the match object in a test. If the search fails it will return None 
which tests false:

for line in lines:
    m = re.search(...)
    if m:
        # do something with line that matches

But for this particular application you might as well use 
line.startswith('b') instead of a regex.
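
A small sketch of both approaches side by side; the sample lines are made up:

```python
import re

lines = ["b spam spam spam", "a eggs", "bacon"]

starts_with_b = re.compile(r"^b")

# Keep the whole line whenever the regex finds a match in it
with_regex = [line for line in lines if starts_with_b.search(line)]

# The same filter without a regex
with_startswith = [line for line in lines if line.startswith("b")]

print(with_regex)
print(with_startswith)

# If the match object itself should hold the whole line, match the rest too
print(re.match(r"b.*", "b spam spam spam").group())
```

The match object only covers what the pattern consumed, which is why "^b" alone gives back just 'b'.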

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Luke Paireepinart

> I have a file full of lines beginning with the letter "b".  I want a
> RE that will return the whole line if it begins with b.
>
> I find if I do eg:
>
>
> >>> m = re.search("^b", "b spam spam spam")
> >>> m.group()
> 'b'
>
> How do I get it to return the whole line if it begins with a b?
>
> S.
for line in file:
    if line.strip()[:1] == 'b':
        print(line)

or
print([a for a in file if a.strip()[:1] == 'b'])
if you want to use list comprehension.
As for the RE way, I've no idea.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Steve Nelson
On 7/14/06, John Fouhy <[EMAIL PROTECTED]> wrote:

> >>> m = re.match(...)
> >>> dir(m)
>
> It will tell you what attributes the match object has.

Useful - thank you.

I am now confused on this:

I have a file full of lines beginning with the letter "b".  I want a
RE that will return the whole line if it begins with b.

I find if I do eg:

>>> m = re.search("^b", "b spam spam spam")
>>> m.group()
'b'

How do I get it to return the whole line if it begins with a b?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread John Fouhy
On 14/07/06, Steve Nelson <[EMAIL PROTECTED]> wrote:
> How does one query a match object in this way?  I am learning by
> fiddling interactively.

If you're fiddling interactively, try the dir() command --

ie:

>>> m = re.match(...)
>>> dir(m)

It will tell you what attributes the match object has.

Or you can read the documentation --- a combination of both approaches
usually works quite well :-)

-- 
John.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Kent Johnson
Steve Nelson wrote:
> On 7/14/06, John Fouhy <[EMAIL PROTECTED]> wrote:
>
>   
>> It doesn't have to match the _whole_ string.
>> 
>
> Ah right - yes, so it doesn't say that it has to end with a b - as per
> your comment about ending with $.
>   
The matched portion must end with b, but it doesn't have to coincide 
with the end of the string. The whole regex must be used for it to 
match; the whole string does not have to be used - the matched portion 
can be a substring.
>   
>> If you look at the match object returned, you should se that the match
>> starts at position 0 and is four characters long.
>> 
>
> How does one query a match object in this way?  I am learning by
> fiddling interactively.
The docs for match objects are here:
http://docs.python.org/lib/match-objects.html

match.start() and match.end() will tell you where it matched.
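
For example, with the pattern and string from earlier in the thread:

```python
import re

text = "abcbd"
m = re.match(r"a[bcd]*b", text)

print(m.start(), m.end())        # 0 4
print(m.span())                  # (0, 4)
print(text[m.start():m.end()])   # abcb
```

end() is the index just past the last matched character, so it works directly as a slice bound.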

You might like to try the regex demo that comes with Python; on Windows 
it is installed at C:\Python24\Tools\Scripts\redemo.py. It gives you an 
easy way to experiment with regexes.

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Steve Nelson
On 7/14/06, John Fouhy <[EMAIL PROTECTED]> wrote:

> It doesn't have to match the _whole_ string.

Ah right - yes, so it doesn't say that it has to end with a b - as per
your comment about ending with $.

> If you look at the match object returned, you should se that the match
> starts at position 0 and is four characters long.

How does one query a match object in this way?  I am learning by
fiddling interactively.

> John.

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regular Expression Misunderstanding

2006-07-14 Thread John Fouhy
On 14/07/06, Steve Nelson <[EMAIL PROTECTED]> wrote:
> What I don't understand is how in the end the RE *does* actually match
> - which may indicate a serious misunderstanding on my part.
>
> >>> re.match("a[bcd]*b", "abcbd")
> <_sre.SRE_Match object at 0x186b7b10>
>
> I don't see how abcbd matches! It ends with a d and the RE states it
> should end with a b!
>
> What am I missing?

It doesn't have to match the _whole_ string.

[bcd]* will match, amongst other things, the empty string (ie: 0
repetitions of either a b, a c, or a d).  So "a[bcd]*b" will match
"ab", which is in the string abcbd.

It will also match "abcb", which is the longest match, and thus
probably the one it found.

If you look at the match object returned, you should see that the match
starts at position 0 and is four characters long.

Now, if you asked for "a[bcd]*b$", that would be a different matter!
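
A short sketch of the three cases, using the same string (the non-greedy variant with *? is an extra added here for comparison):

```python
import re

# Greedy: [bcd]* grabs as much as it can, then backs off until a b follows
print(re.match(r"a[bcd]*b", "abcbd").group())    # abcb

# Non-greedy: [bcd]*? takes as little as it can
print(re.match(r"a[bcd]*?b", "abcbd").group())   # ab

# Anchored: the match must now end where the string ends, and it cannot
print(re.match(r"a[bcd]*b$", "abcbd"))           # None
```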

HTH :-)

-- 
John.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Regular Expression Misunderstanding

2006-07-14 Thread Steve Nelson
Hello,

I am reading the "Regular Expression HOWTO" at
http://www.amk.ca/python/howto/regex/

I am at the bit where "greediness" is discussed with respect to
metacharacters enabling repetition of sections of a RE.  I understand
the concept.

The author gives a step-by-step example of how the matching engine
works through the RE: when the repetition metacharacter appears it
tries the maximum first, and then effectively reels back until the
last step of the RE will pass.

This made sense after a bit of time with pen and paper.
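
The reeling back is easiest to see by comparing the greedy * with its
non-greedy form *?. A minimal sketch, assuming the example string used
in this message (the variable names are only for illustration):

```python
import re

text = "abcbd"

# Greedy: [bcd]* first grabs "bcbd", then gives characters back one at
# a time until the final "b" of the pattern can match.
greedy = re.match("a[bcd]*b", text).group()

# Non-greedy: [bcd]*? starts with zero repetitions and only grows if
# the rest of the pattern fails, so it stops at the first "b".
lazy = re.match("a[bcd]*?b", text).group()

print(greedy)  # abcb
print(lazy)    # ab
```

Both calls succeed on the same string; they differ only in how much of
it the repetition ends up consuming.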

What I don't understand is how in the end the RE *does* actually match
- which may indicate a serious misunderstanding on my part.

>>> re.match("a[bcd]*b", "abcbd")
<_sre.SRE_Match object at 0x186b7b10>

I don't see how abcbd matches! It ends with a d and the RE states it
should end with a b!

What am I missing?

S.


Re: [Tutor] regular expression matching a dot?

2005-10-21 Thread Christian Meesters
Hi Frank & Kent & Hugo,

Didn't have the time to read the list yesterday ...

Thanks for pointing me to the regex-debuggers. Though I don't consider
myself a regex beginner, I had to learn that, now that I'm using
regexes only occasionally, I might need some help here and there.

Cheers,
Christian


Frank Bloeink wrote:
> Hi [Christian|List]
>
> This post is not regarding your special problem (which anyway has been
> solved by now), but I'd like to share some general tip on working with
> regular expressions.
> There are some nice regex-debuggers out there that can help clarify
> what went wrong when a regex doesn't match when it should or vice
> versa.
>
> Kodos http://kodos.sourceforge.net/ is one of them, but there are many
> others that can make your life easier ; at least in terms of
> regex-debugging ;)
>
> Probably most of you (especially all regex-gurus) know about this
> already, but I thought it was worth the post as a hint for all
> beginners
>
> hth Frank



Re: [Tutor] regular expression matching a dot?

2005-10-20 Thread Hugo González Monteverde
I personally fancy Kiki, which comes with the wxPython installer... It
has very nice coloring for grouping in regexes.

Hugo

Kent Johnson wrote:
> Frank Bloeink wrote:
> 
>>There are some nice regex-debuggers out there that can help clarify
>>what went wrong when a regex doesn't match when it should or vice versa.
>>
>>Kodos http://kodos.sourceforge.net/ is one of them, but there are many
>>others that can make your life easier ; at least in terms of
>>regex-debugging ;) 
> 
> 
> Yes, these programs can be very helpful. There is even one that ships with 
> Python - see
> C:\Python24\Tools\Scripts\redemo.py
> 
> Kent

