[Tutor] Regex Question

2013-09-30 Thread Leena Gupta
Hello,

I have a TSV file that has the city, state, country information in this
format:
Name           Display name  Code
San Jose       SJC           SJC - SJ (POP), CA (US)
San Francisco  SFO           SFO - SF, CA (US)

I need to extract the state and country for each city from this file. I'm
trying to do this in Python using the following regexes:

import re

s = re.search(r',(.*?)\(', text)
if s:
    state = s.group(1).strip()
c = re.search(r'\((.*?)\)', text)
if c:
    country = c.group(1).strip()


This works well for the state, but for the San Jose country it returns
the following:
country = POP

I think it may be better to search from the end of the string, but I am
unable to get the right syntax. Could you please share any pointers?

Thanks!
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regex Question

2013-09-30 Thread Dave Angel
On 30/9/2013 16:29, Leena Gupta wrote:

 Hello,

 I have a TSV file that has the city, state, country information in this
 format:
 Name           Display name  Code
 San Jose       SJC           SJC - SJ (POP), CA (US)
 San Francisco  SFO           SFO - SF, CA (US)

That's not a format, it's an infinitesimally tiny sample. But if we
trust this sample, you don't need a regex at all. The state and
country are in the last 7 characters of the string:

country = text[-3:-1]
state = text[-7:-5]

I could be off by 1 or 2, but you get the idea.
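Checked against the two sample rows, the slices are in fact exact, not off by one. A quick sketch, assuming every Code value ends with a two-letter state, a space and a two-letter country in parentheses (state_country is an illustrative name, not from the thread):

```python
def state_country(code):
    """Return (state, country) by fixed position.

    Assumes code ends with ', SS (CC)': two-letter state, two-letter country.
    """
    country = code[-3:-1]  # the two characters just before the final ')'
    state = code[-7:-5]    # the two characters just before ' (CC)'
    return state, country


for code in ["SJC - SJ (POP), CA (US)", "SFO - SF, CA (US)"]:
    print(state_country(code))  # ('CA', 'US') for both rows
```

A row with a longer state code, or with trailing whitespace, would silently produce wrong slices, so this only holds if the format really is that rigid.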

If this isn't good enough, then either supply, or give a reference to, a
specification for how the Code field is encoded.

(If it does indeed need a regex, someone else will have to help)
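For the "search from the end of the string" idea in the original question, one regex option is to anchor the pattern with $, so that only the final ", XX (YY)" can match. A sketch under the assumption that state and country are runs of letters, as in the sample (parse_code is a hypothetical helper name):

```python
import re

# $ pins the match to the end of the string, so only the final ", XX (YY)"
# can match and the earlier "(POP)" is never considered.
# Assumption: state and country are runs of letters, as in the sample rows.
STATE_COUNTRY = re.compile(r',\s*([A-Za-z]+)\s*\(([A-Za-z]+)\)\s*$')


def parse_code(code):
    """Return (state, country) from a Code value, or None if it doesn't fit."""
    m = STATE_COUNTRY.search(code)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(parse_code("SJC - SJ (POP), CA (US)"))  # ('CA', 'US')
print(parse_code("SFO - SF, CA (US)"))        # ('CA', 'US')
```

Unlike fixed slicing, this also tolerates trailing whitespace and state or country codes of other lengths.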

-- 
DaveA




Re: [Tutor] Regex Question

2013-09-30 Thread Mark Lawrence

On 30/09/2013 21:29, Leena Gupta wrote:

I have a TSV file that has the city, state, country information in this
format:
Name           Display name  Code
San Jose       SJC           SJC - SJ (POP), CA (US)
San Francisco  SFO           SFO - SF, CA (US)



I'd be strongly inclined to use the csv module from the standard library
with the 'excel-tab' dialect; see
http://docs.python.org/3/library/csv.html#module-csv
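A minimal sketch of that suggestion, assuming the file has the three tab-separated columns shown in the question. An in-memory sample stands in for the real file (whose name isn't given), and split_tail is an illustrative helper that uses plain string methods instead of a regex:

```python
import csv
import io

# Hypothetical stand-in for the real TSV file (same three columns as the question).
SAMPLE = (
    "Name\tDisplay name\tCode\n"
    "San Jose\tSJC\tSJC - SJ (POP), CA (US)\n"
    "San Francisco\tSFO\tSFO - SF, CA (US)\n"
)


def split_tail(code):
    """'SFO - SF, CA (US)' -> ('CA', 'US').

    Assumes the text after the last comma is 'STATE (COUNTRY)'.
    """
    tail = code.rpartition(",")[2].strip()   # 'CA (US)'
    state, _, country = tail.partition("(")  # 'CA ', '(', 'US)'
    return state.strip(), country.rstrip(")").strip()


# The 'excel-tab' dialect splits on tabs, so spaces and commas inside a column survive.
reader = csv.DictReader(io.StringIO(SAMPLE), dialect="excel-tab")
results = [(row["Name"],) + split_tail(row["Code"]) for row in reader]
print(results)
# [('San Jose', 'CA', 'US'), ('San Francisco', 'CA', 'US')]
```

For the real file, open it with open(path, newline='') and pass that file object to csv.DictReader in place of the StringIO.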


Please try it and if you encounter any problems feel free to get back to 
us, we don't bite :)

--
Cheers.

Mark Lawrence



Re: [Tutor] regex question

2012-04-08 Thread Wayne Werner

On Fri, 6 Apr 2012, Khalid Al-Ghamdi wrote:


hi all,
I'm trying to extract the domain in the following string. Why doesn't my 
pattern (patt) work:

>>> redata
'Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 Sat Jul 31 
15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 Mon Sep 21 20:22:37
1987::ttw...@rpybrct.edu::559243357-6-7 Fri Aug  2 07:15:23 
1991::t...@mgfyitsks.net::681106523-4-9 Mon Mar 18 19:59:47 
2024::dgz...@fhyykji.org::1710781187-6-7 '
>>> patt=r'\w+\.\w{3}(?=@)'
>>> re.findall(patt,redata)
[]

This pattern works but the first should, too. shouldn't it?


The all-too-familiar quote seems to apply here: often programmers,
when faced with a problem, think 'Aha! I'll use a regex!' Now they have
two problems.


It looks like you could easily split this string with redata.split('::'),
then look at every second element in the list and split *that* element
at the '@'.
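Wayne's suggestion as a short sketch: once the string is split on '::', the addresses sit at every second position, and splitting each address at '@' leaves the domain. The two records below are copied from the sample above, with the addresses still masked by the archive:

```python
# Two records from the sample above (the archive masks the local parts).
redata = ("Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 "
          "Sat Jul 31 15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 ")

fields = redata.split("::")
# The fields alternate: date, address, number + date, address, ... so the
# addresses sit at the odd indices.
addresses = fields[1::2]
# Splitting each address once at '@' leaves the domain on the right.
domains = [addr.split("@", 1)[1] for addr in addresses]
print(domains)  # ['gstwyysnbd.gov', 'wgxvhx.com']
```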


With data as well-formed as this, regex is probably overkill.

HTH,
Wayne


Re: [Tutor] regex question

2012-04-08 Thread kaifeng jin
I think you can do this:

a = []
b = redata.split('::')
for e in b:
    if '@' in e:
        a.append(e.split('@')[1])

List a then contains all the domains.

On April 9, 2012 at 5:26 AM, Wayne Werner wa...@waynewerner.com wrote:

 With data as well-formed as this, regex is probably overkill.





-- 
twitter:@zybest https://twitter.com/#!/zybest
Sina Weibo: @爱子悦 http://www.weibo.com/zybest
Setting up WordPress on OpenShift: http://blog-mking.rhcloud.com/


Re: [Tutor] regex question

2012-04-06 Thread Peter Otten
Khalid Al-Ghamdi wrote:

 I'm trying to extract the domain in the following string. Why doesn't my
 pattern (patt) work:
 
 >>> redata
 'Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 Sat Jul
 31 15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 Mon Sep 21 20:22:37
 1987::ttw...@rpybrct.edu::559243357-6-7 Fri Aug  2 07:15:23
 1991::t...@mgfyitsks.net::681106523-4-9 Mon Mar 18 19:59:47
 2024::dgz...@fhyykji.org::1710781187-6-7 '
 >>> patt=r'\w+\.\w{3}(?=@)'
 >>> re.findall(patt,redata)
 []
 
 This pattern works but the first should, too. shouldn't it?

No. I think you want r'(?<=@)\w+\.\w{3}'.

How do you handle a domain like web.de, by the way?
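The difference is lookahead versus lookbehind: (?=@) asserts that '@' comes right after the current position, while (?<=@) asserts that it comes right before. A sketch on two records from the sample (local parts masked by the archive), plus two made-up addresses showing one possible answer to the web.de question:

```python
import re

# Two records from the sample above (the archive masks the local parts).
redata = ("Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 "
          "Sat Jul 31 15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 ")

# Lookahead (?=@): '@' must come right AFTER the match, so this looks for
# text sitting in front of an '@', never the domain behind it.
ahead = re.findall(r'\w+\.\w{3}(?=@)', redata)

# Lookbehind (?<=@): '@' must come right BEFORE the match.
behind = re.findall(r'(?<=@)\w+\.\w{3}', redata)

# \w{3} insists on a three-letter TLD; a looser character class also copes
# with domains such as web.de (these two addresses are made up).
loose = re.findall(r'(?<=@)[\w.-]+', "me@web.de::you@mail.example.org::")

print(ahead)   # []
print(behind)  # ['gstwyysnbd.gov', 'wgxvhx.com']
print(loose)   # ['web.de', 'mail.example.org']
```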




Re: [Tutor] Regex question

2011-04-03 Thread Andrés Chandía


I'm still working with regexes, but I have reached a point for which I can't
find documentation; maybe there is no way to do it, but I'll ask anyway:

This is my code:

    contents = re.sub(r'Á', 'A', contents)
    contents = re.sub(r'á', 'a', contents)
    contents = re.sub(r'É', 'E', contents)
    contents = re.sub(r'é', 'e', contents)
    contents = re.sub(r'Í', 'I', contents)
    contents = re.sub(r'í', 'i', contents)
    contents = re.sub(r'Ó', 'O', contents)
    contents = re.sub(r'ó', 'o', contents)
    contents = re.sub(r'Ú', 'U', contents)
    contents = re.sub(r'ú', 'u', contents)

It is clear that I need to convert any accented vowel into the same
non-accented vowel. The question is: is there a way to say that whenever
you find an accented character it has to be changed into a non-accented
character? Not every character, though: it must be only these vowels,
accented this way, because in the language I am working with there are
letters like ü and ñ that should remain the same.

Thank you all.

___
andrés chandía

Do not print unnecessarily. Take care of the environment!




Re: [Tutor] Regex question

2011-04-03 Thread Hugo Arts
2011/4/3 Andrés Chandía and...@chandia.net:


 It is clear that I need to convert any accented vowel into the same
 non-accented vowel. The question is: is there a way to say that whenever
 you find an accented character it has to be changed into a non-accented
 character? Not every character, though: it must be only these vowels,
 accented this way, because in the language I am working with there are
 letters like ü and ñ that should remain the same.


Okay, first thing: forget about regexes for this problem. They're too
complicated and not suited to it.

Encoding issues make this a somewhat complicated problem. In Unicode,
there are two ways to encode most accented characters. For example, the
character Ć can be encoded both as U+0106, LATIN CAPITAL LETTER C
WITH ACUTE, and as a combination of U+0043 and U+0301, being simply 'C'
and the 'COMBINING ACUTE ACCENT', respectively. You must remove both
forms to be sure every accented character is gone from your string.

Using unicode.translate (str.translate in Python 3), you can craft a
translation table that maps the accented characters to their
non-accented counterparts. The combining characters can simply be
removed by mapping them to None.
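A minimal Python 3 sketch of the translation-table approach. It assumes only the acute-accented vowels from the original question need converting, so ü, ñ and everything else pass through untouched; strip_acutes is a made-up name:

```python
import unicodedata

# Assumption: only the acute-accented vowels from the question are converted;
# ü, ñ and anything else pass through unchanged.
ACUTE_VOWELS = "ÁáÉéÍíÓóÚú"

# Precomposed form: map e.g. 'é' (U+00E9) to its first NFD component, 'e'.
TABLE = {ord(ch): unicodedata.normalize("NFD", ch)[0] for ch in ACUTE_VOWELS}
# Decomposed form: delete a lone COMBINING ACUTE ACCENT (U+0301) by mapping
# it to None, which turns 'e' + U+0301 into plain 'e'.
TABLE[0x0301] = None


def strip_acutes(text):
    return text.translate(TABLE)


print(strip_acutes("canción, pingüino, España, Ángel"))
# cancion, pingüino, España, Angel
```

Only U+0301 is removed rather than every combining character, so a decomposed ü (u followed by U+0308) or ñ (n followed by U+0303) keeps its mark as well.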

HTH,
Hugo


Re: [Tutor] Regex question

2011-04-03 Thread Peter Otten
Hugo Arts wrote:

 Using unicode.translate, you can craft a translation table to
 translate the accented characters to their non-accented counterparts.
 The combining characters can simply be removed by mapping them to
 None.

If you go down that road, you might be interested in Fredrik Lundh's article at

http://effbot.org/zone/unicode-convert.htm

The class presented there is a bit tricky, but for your purpose it might be 
sufficient to subclass it:

>>> KEEP_CHARS = set(ord(c) for c in u"üñ")
>>> class Map(unaccented_map):
...     def __missing__(self, key):
...         if key in KEEP_CHARS:
...             self[key] = key
...             return key
...         return unaccented_map.__missing__(self, key)
...
>>> print u"äöü".translate(Map())
aoü
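effbot's unaccented_map is not reproduced here, so this is a self-contained Python 3 stand-in for the same idea: a dict subclass whose __missing__ works out each translation lazily with unicodedata, with the KEEP_CHARS exception built in. The class name UnaccentMap is made up for this example:

```python
import unicodedata

# Characters to leave alone; the upper-case forms are added here as an
# assumption (the snippet above kept only "üñ").
KEEP_CHARS = {ord(c) for c in "üñÜÑ"}


class UnaccentMap(dict):
    """Lazily built str.translate table that drops diacritics.

    A stand-in for the idea above, not effbot's unaccented_map: each code
    point is worked out on first lookup and then cached.
    """

    def __missing__(self, key):
        if key in KEEP_CHARS:
            value = key
        else:
            ch = chr(key)
            # Decompose (NFD) and drop the combining marks, keeping base letters.
            base = "".join(c for c in unicodedata.normalize("NFD", ch)
                           if not unicodedata.combining(c))
            value = key if base == ch else base
        self[key] = value  # cache, so the next lookup skips __missing__
        return value


print("äöü ñ Éclair".translate(UnaccentMap()))  # aoü ñ Eclair
```

One UnaccentMap() instance can be reused across many strings; its cache only ever holds the characters actually seen.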




[Tutor] Regex question

2011-03-30 Thread Alan Gauld


Andrés Chandía and...@chandia.net wrote


I'm new to this list, so hello everybody!


Hi, welcome to the list.

Please do not use reply to start a new thread; it confuses threaded
readers and may mean your message will not be seen. Also, please
supply a meaningful subject (as above) so we can decide if it
looks like something we can answer!

These will help you maximise the replies. Also, although not
relevant here, please include the full text of any error messages
and the Python version and OS you are using (2 or 3 etc).
Basically anything that helps us understand the context.


in Perl there is a way to reference previous captures,
$text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;



I'm looking for the way to do it in python


It is possible but I'll let some of the more regex literate users
tell you how :-)

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/




Re: [Tutor] Regex question

2011-03-30 Thread Steve Willoughby

On 29-Mar-11 23:55, Alan Gauld wrote:

Andrés Chandía and...@chandia.net wrote

in Perl there is a way to reference previous captures,
$text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;



I'm looking for the way to do it in python



If you're using just a straight call to re.sub(), it works like this:

text = re.sub(r'<u>(l|L|n|N)</u>', '\1e', text)

You use \1, \2, etc. for backreferences just like all the other 
regex-based editors do (Perl's more of an exception than the rule there).
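A small sketch of the backreference in action. It assumes the underlined letters are marked up as <u>l</u> HTML tags, and it writes the replacement as a raw string too, so that \1 reaches re.sub untouched:

```python
import re

text = "<u>l</u>omo <u>N</u>omo <u>x</u>omo"

# Group 1 captures the underlined letter; \1 in the replacement re-inserts it.
# The replacement is a raw string, so the backslash is not eaten by Python.
result = re.sub(r'<u>(l|L|n|N)</u>', r'\1e', text)
print(result)  # leomo Neomo <u>x</u>omo
```

The letter x is not in the group's alternatives, so its tags are left alone.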


Alternatively, you can pre-compile the regular expression into an object:
pattern = re.compile(r'<u>(l|L|n|N)</u>')

and then substitute by calling its sub() method:

text = pattern.sub('\1e', text)
--
Steve Willoughby / st...@alchemy.com
A ship in harbor is safe, but that is not what ships are built for.
PGP Fingerprint 48A3 2621 E72C 31D9 2928 2E8F 6506 DB29 54F7 0F53


Re: [Tutor] Regex question

2011-03-30 Thread Andrés Chandía


Thanks Kushal and Steve.
I think it works. I say "I think" because in the
results I got a strange character instead of the letter that should appear.

This is my regexp:

contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)

This is my input file content:
<u>l</u>omo
<u>n</u>omo
<u>t</u>omo
<u>L</u>omo
<u>N</u>omo
<u>T</u>omo
<span style="text-decoration: underline;">n</span>omo
<u>t</u>omo

This is my output file content:
'omo
'omo
'omo
'omo
'omo
'omo
'omo
'omo

At the head of the file I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

I tried changing the coding to iso-8859-15, but nothing changed. Surely
you know the reason for this; can you share it with this poor newbie?

Thanks a lot!!




On Wed, March 30, 2011 09:46, Kushal Kumaran wrote:
2011/3/30 Andrés Chandía and...@chandia.net:

 I'm new to this list, so hello everybody!

Hello Andrés

The stuff:

 I'm working with regexps and this is my line:

 contents = re.sub("<u>l<\/u>", "le", contents)

 in Perl there is a way to reference previous captures,
 i.e.

 $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;

 So I'm looking for the way to do it in Python; obviously this does not
 work:

 contents = re.sub("<u>(l|L|n|N)<\/u>", "$1e", contents)

You will use \1 for the backreference. The documentation of the re
module (http://docs.python.org/library/re.html#re.sub) has an example.
Also note the use of raw strings (r'...') to avoid having to escape
the backslash with another backslash.




___
andrés chandía





Re: [Tutor] Regex question

2011-03-30 Thread Steve Willoughby

On 30-Mar-11 08:21, Andrés Chandía wrote:



Thanks Kushal and Steve.
I think it works. I say "I think" because in the
results I got a strange character instead of the letter that should appear.

This is my regexp:

contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)


Remember that \2 in a string means the ASCII character with the code 
002.  You need to escape this with an extra backslash:

'\\2\''
Although it would be more convenient to switch to double quotes to make
the inclusion of the literal single quote easier:

"\\2'"

How does that work?  As the string is being built, the \\ is 
interpreted as a literal backslash, so the actual characters in the 
string's value end up being:

\2'
THAT is what is then passed into the sub() function, where \2 means to
insert whatever the second group matched.


This can be yet simpler by using raw strings:
r"\2'"

In raw strings, backslashes do almost nothing special at all, so
you don't need to double them.
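A quick way to see the difference Steve describes, as a sketch; the HTML shape of the input line is an assumption based on the pattern in the thread:

```python
import re

non_raw = '\2'   # escape sequence: the single control character U+0002
raw = r"\2'"     # three characters: backslash, '2' and a single quote
print(repr(non_raw), len(non_raw))  # '\x02' 1
print(raw, len(raw))                # \2' 3

# Pattern as in the thread; the input line is an assumed example.
pattern = r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)'
line = '<u>l</u>omo <span style="text-decoration: underline;">n</span>omo'
result = re.sub(pattern, raw, line)
print(result)  # l'omo n'omo
```

With the non-raw '\2\'' replacement, each letter would instead come out as the invisible U+0002 character, which is the "strange character" in the output file.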


I should have thought of that when sending my original answer to your 
question.  Sorry I overlooked it.


--steve




Re: [Tutor] Regex question

2011-03-30 Thread Andrés Chandía


Thanks Steve, you are, from now on, my guru!

This is the final version, the good one!

contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', r"\2'", contents)


On Wed, March 30, 2011 17:27, Steve Willoughby wrote:
This can be yet simpler by using raw strings:
r"\2'"






___
andrés chandía





[Tutor] regex question

2011-01-04 Thread Richard D. Moores
I use

regex = ".*" + search + ".*"
p = re.compile(regex, re.I)

in finding lines in a text file that contain search, a string entered
at a prompt.

What regex do I use to find lines in a text file that contain search,
where search is a word entered at a prompt?

Thanks,

Dick Moores


Re: [Tutor] regex question

2011-01-04 Thread Wayne Werner
On Tue, Jan 4, 2011 at 9:37 AM, Richard D. Moores rdmoo...@gmail.com wrote:

 I use

 regex = ".*" + search + ".*"
 p = re.compile(regex, re.I)

 in finding lines in a text file that contain search, a string entered
 at a prompt.

 What regex do I use to find lines in a text file that contain search,
 where search is a word entered at a prompt?

 Thanks,

 Dick Moores


You could use (2.6+ I think):

word = raw_input('Enter word to search for: ')
with open('somefile.txt') as f:
    for line in f:
        if word in line:
            print line

You could always try a speed test, but I'm guessing that other than
extremely large files (10k+ lines) you probably won't see much speed
difference. Then again, you might!

HTH,
Wayne

p.s. I tend to only use a regex when I absolutely need to, because usually
when you try to solve one problem with a regex it becomes two problems.


Re: [Tutor] regex question

2011-01-04 Thread Richard D. Moores
On Tue, Jan 4, 2011 at 07:55, Wayne Werner waynejwer...@gmail.com wrote:
 On Tue, Jan 4, 2011 at 9:37 AM, Richard D. Moores rdmoo...@gmail.com

 You could use (2.6+ I think):
 word = raw_input('Enter word to search for: ')
 with open('somefile.txt') as f:
    for line in f:
        if word in line:
             print line

I think I do need a regex for cases such as this:

A file has these 2 lines:

alksdhjf ksjhdf kjshf dex akjdhf jkdshf jsdhf
alkdshf jkashd flkjdsf index alkdjshf alkdjshf

And I want only the line that contains the word dex.
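Dick's two sample lines show exactly where a plain substring test and a word-boundary regex part ways. A small sketch:

```python
import re

lines = [
    "alksdhjf ksjhdf kjshf dex akjdhf jkdshf jsdhf",
    "alkdshf jkashd flkjdsf index alkdjshf alkdjshf",
]

# Plain substring test: "dex" is also inside "index", so both lines match.
substring_hits = [line for line in lines if "dex" in line]

# \b is a word boundary, so r"\bdex\b" only matches "dex" as a whole word.
word = re.compile(r"\bdex\b", re.IGNORECASE)
word_hits = [line for line in lines if word.search(line)]

print(len(substring_hits), len(word_hits))  # 2 1
```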

Dick


Re: [Tutor] regex question

2011-01-04 Thread Brett Ritter
On Tue, Jan 4, 2011 at 10:37 AM, Richard D. Moores rdmoo...@gmail.com wrote:
 regex = .* + search + .*
 p = re.compile(regex, re.I)

 in finding lines in a text file that contain search, a string entered
 at a prompt.

That's an inefficient regex (though the compiler may be smart enough
to prune the unneeded .*).

Just having search as your regex is fine (it will search for the
pattern _in_ the string, no need to specify the other parts of the
string), but if you're not using any special regex characters you're
probably better off not using a regex and just using a string
operation.
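To make Brett's point concrete: re.search already looks for the pattern anywhere in the string, which is why the surrounding .* is unnecessary. A sketch using one of Dick's sample lines:

```python
import re

line = "alkdshf jkashd flkjdsf index alkdjshf alkdjshf"

# re.search scans the whole string, so ".*" padding around the term adds nothing.
found_search = bool(re.search("index", line))     # True
# re.match only tries at position 0; only there would a leading ".*" matter.
found_match = bool(re.match("index", line))       # False
found_padded = bool(re.match(".*index", line))    # True

# With no metacharacters involved, a plain string test does the same job;
# lower() on both sides stands in for re.I.
found_plain = "INDEX".lower() in line.lower()     # True

print(found_search, found_match, found_padded, found_plain)
# True False True True
```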

Regexes are great for trying to do powerful and complicated things -
and as such may be too complicated if you're trying to do a simple
thing.

-- 
Brett Ritter / SwiftOne
swift...@swiftone.org


Re: [Tutor] regex question

2011-01-04 Thread Richard D. Moores
On Tue, Jan 4, 2011 at 09:31, Brett Ritter swift...@swiftone.org wrote:
 On Tue, Jan 4, 2011 at 10:37 AM, Richard D. Moores rdmoo...@gmail.com wrote:
 regex = .* + search + .*
 p = re.compile(regex, re.I)


 Just having search as your regex is fine (it will search for the
 pattern _in_ the string, no need to specify the other parts of the
 string),

I see. Thanks.

 but if you're not using any special regex characters you're
 probably better off not using a regex and just using a string
 operation.

Please see my reply to Wayne Werner.

Dick


Re: [Tutor] regex question

2011-01-04 Thread Richard D. Moores
On Tue, Jan 4, 2011 at 10:41, Richard D. Moores rdmoo...@gmail.com wrote:
 Please see http://tutoree7.pastebin.com/z9YeSYRw . I'm actually
 searching RTF files, not TXT files.

 I want to modify this script to handle searching on a word. So what,
 for example, should line 71 be?

OK, I think I've got it.

in place of lines 66-75 I now have

search = input("first search string: ")
search = "\\b" + search + "\\b"
if not search:
    print("Bye")
    sys.exit()
elif search[0] != ' ':
    p = re.compile(search, re.I)
else:
    p = re.compile(search)

Dick


Re: [Tutor] regex question

2011-01-04 Thread Richard D. Moores
On Tue, Jan 4, 2011 at 11:57, Richard D. Moores rdmoo...@gmail.com wrote:

Oops. That should be

search = input("first search string: ")
if not search:
    print("Bye")
    sys.exit()
elif search[0] != ' ':
    search = "\\b" + search + "\\b"
    p = re.compile(search, re.I)
else:
    search = "\\b" + search + "\\b"
    p = re.compile(search)

Dick


Re: [Tutor] regex question

2011-01-04 Thread Dave Angel

Richard D. Moores wrote:

Oops. That should be

search = input("first search string: ")
if not search:
    print("Bye")
    sys.exit()
elif search[0] != ' ':
    search = "\\b" + search + "\\b"
    p = re.compile(search, re.I)
else:
    search = "\\b" + search + "\\b"
    p = re.compile(search)

Dick

One hazard is if the string the user inputs has any regex special 
characters in it.  If it's anything but letters and digits you probably 
want to escape it before combining it with your \\b strings.


DaveA



Re: [Tutor] regex question

2011-01-04 Thread Steven D'Aprano

Dave Angel wrote:

One hazard is if the string the user inputs has any regex special 
characters in it.  If it's anything but letters and digits you probably 
want to escape it before combining it with your \\b strings.


It is best to escape any user-input before passing it to regex 
regardless. The re.escape function will do the right thing whether the 
string is all letters and digits or not.


>>> re.escape("dev")
'dev'
>>> re.escape("dev+")
'dev\\+'
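One caveat when combining re.escape with \b: \b only sits between a word character and a non-word character, so a search term that ends in punctuation (such as dev+) usually fails to match, because the next character is typically a space and there is no boundary between two non-word characters. A sketch comparing \b with the lookaround pair (?<!\w)...(?!\w), meaning "not preceded or followed by a word character"; find_word and the sample text are invented for illustration:

```python
import re

TEXT = "I use dev, dev+ and c++ daily"   # made-up sample text

WORD_B = (r"\b", r"\b")                  # classic word boundaries
LOOKAROUND = (r"(?<!\w)", r"(?!\w)")     # not preceded / followed by a word char


def find_word(word, text, edges):
    """Search for re.escape(word) wrapped in the given pair of assertions."""
    m = re.search(edges[0] + re.escape(word) + edges[1], text)
    return m.group() if m else None


for w in ["dev", "dev+", "c++"]:
    print(w, find_word(w, TEXT, WORD_B), find_word(w, TEXT, LOOKAROUND))
# dev dev dev
# dev+ None dev+
# c++ None c++
```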


--
Steven




Re: [Tutor] regex question

2011-01-04 Thread Richard D. Moores
On Tue, Jan 4, 2011 at 14:58, Steven D'Aprano st...@pearwood.info wrote:
 Dave Angel wrote:

 One hazard is if the string the user inputs has any regex special
 characters in it.  If it's anything but letters and digits you probably want
 to escape it before combining it with your \\b strings.

 It is best to escape any user-input before passing it to regex regardless.
 The re.escape function will do the right thing whether the string is all
 letters and digits or not.

 >>> re.escape("dev")
 'dev'
 >>> re.escape("dev+")
 'dev\\+'

I didn't know about re.escape.

from the 3.1.3 docs:
re.escape(string)
Return string with all non-alphanumerics backslashed; this is
useful if you want to match an arbitrary literal string that may have
regular expression metacharacters in it.

I'm writing the script for my own use, and don't expect to be
searching on non-alphanumerics. Even so, I'd like to incorporate
re.escape. However, I'm using ' ' to set case sensitive searches, and
'=' to set word searches. Would you take a look at my revised script
at http://tutoree7.pastebin.com/wQHVV68U, lines 72-97? I tried using
line 80, but I can't because '=' is a regular expression
metacharacter. I could use some other character instead of '=', but I
would want it to be one that can be typed easily without using the
shift key. '=' is the best, I think. I did try to use 'qq' instead of
'=', but that got messy. Or is there another, completely different
way to do what I do in lines 72-97 with ' ' and '=' that wouldn't
involve increasing the number of prompts? Right now, the user has to
respond to 4 prompts, even though some responses are quickly made:
either by entering nothing, or by entering anything.
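One way around the '=' problem, sketched under assumptions: inspect and strip the ' ' and '=' prefixes first, and only then pass the rest through re.escape. ('=' is not actually a regex metacharacter; the snag is that re.escape, per the 3.1 docs quoted above, backslashes every non-alphanumeric, so after escaping the first character is a backslash rather than '='.) The helper build_pattern and its prefix rules are hypothetical, modelled on the description above rather than on the pastebin script:

```python
import re


def build_pattern(raw):
    """Compile a search pattern from user input (hypothetical helper).

    Assumed prefix rules: a leading ' ' makes the search case-sensitive,
    a leading '=' restricts it to whole words. Both are stripped BEFORE
    re.escape runs, so escaping cannot hide them. Returns None if nothing
    is left to search for.
    """
    flags = re.IGNORECASE
    whole_word = False
    # Accept the prefixes in either order, e.g. " =dex" or "= dex".
    while raw[:1] in (" ", "="):
        if raw[0] == " ":
            flags = 0
        else:
            whole_word = True
        raw = raw[1:]
    if not raw:
        return None
    body = re.escape(raw)
    if whole_word:
        body = r"\b" + body + r"\b"
    return re.compile(body, flags)


p = build_pattern("=dex")
print(bool(p.search("alkdshf index")), bool(p.search("kjshf DEX akjdhf")))
# False True
```

This keeps the number of prompts unchanged: both options still ride along in the one search string.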

Dick