Re: one more question on regex

2016-01-23 Thread Vlastimil Brom
2016-01-22 23:47 GMT+01:00 mg :
> On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:
>
>> [...]
>
> Your explanation of re.findall() results is correct. My point is that the
> documentation states:
>
> re.findall(pattern, string, flags=0)
> Return all non-overlapping matches of pattern in string, as a list of
> strings
>
> and this is not what re.findall does. IMHO it would be more reasonable
> to get back the whole matches, since that seems to me the most useful
> information for the user. In any case I'll go with finditer, which returns
> match objects containing all the information anyone could look for.
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
I don't know the reasoning for this special behaviour of findall, but
it is documented explicitly:
https://docs.python.org/3/library/re.html#re.findall
"... If one or more groups are present in the pattern, return a list
of groups; this will be a list of tuples if the pattern has more than
one group."
finditer is clearly much more robust for general usage.
I only use findall for quick one-line tests (and there one has to
account for these specifics, either by using non-capturing groups
or by enclosing the whole pattern in a "main" group and using the first
items in the resulting tuples).
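
A minimal sketch of those two workarounds, reusing the pattern and test
string from earlier in the thread (the variable names are only
illustrative):

```python
import re

text = 'abzzabab'

# With a capturing group, findall returns the group content,
# not the whole match.
print(re.findall(r'(ab){2}', text))      # ['ab']

# Workaround 1: make the group non-capturing, so findall
# falls back to returning whole matches.
print(re.findall(r'(?:ab){2}', text))    # ['abab']

# Workaround 2: wrap the whole pattern in an outer "main" group
# and keep the first item of each resulting tuple.
pairs = re.findall(r'((ab){2})', text)   # [('abab', 'ab')]
whole = [first for first, _ in pairs]
print(whole)                             # ['abab']
```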
vbr
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: one more question on regex

2016-01-22 Thread mg
On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:

> 2016-01-22 16:50 GMT+01:00 mg :
>> On Fri, 22 Jan 2016 15:32:57 +, mg wrote:
>>
>>> python 3.4.3
>>>
>>> import re
>>> re.search('(ab){2}','abzzabab')
>>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>>
>>> >>> re.findall('(ab){2}','abzzabab')
>>> ['ab']
>>>
>>> Why for search() the match is 'abab' and for findall the match is
>>> 'ab'?
>>
>> finditer seems to be consistent with search:
>> regex = re.compile('(ab){2}')
>>
>> for match in regex.finditer('abzzababab'):
>>   print ("%s: %s" % (match.start(), match.span() ))
>> ...
>> 4: (4, 8)
>>
>> -- https://mail.python.org/mailman/listinfo/python-list
> 
> Hi,
> as was already pointed out, findall "collects" the content of the
> capturing groups (if present), rather than the whole matching text;
>
> for a repeated capturing group, only the last captured content is kept
> and the previous ones are discarded; cf.:
>
> >>> re.findall('(?i)(a)x(b)+','axbB')
> [('a', 'B')]
>
> (for multiple capturing groups in the pattern, a tuple of the captured
> parts is collected)
>
> or with your example, with the parts of the string differentiated by
> upper/lower case:
> >>> re.findall('(?i)(ab){2}','aBzzAbAB')
> ['AB']
>
> hth,
>    vbr

Your explanation of re.findall() results is correct. My point is that the
documentation states:

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of 
strings

and this is not what re.findall does. IMHO it would be more reasonable
to get back the whole matches, since that seems to me the most useful
information for the user. In any case I'll go with finditer, which returns
match objects containing all the information anyone could look for.
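
For what it's worth, a small sketch of what I mean by going with
finditer, using the same pattern and test string as in my earlier
message (only the standard re module is assumed):

```python
import re

regex = re.compile(r'(ab){2}')

# Each item yielded by finditer is a match object: group(0) is the
# whole match, span() its position, group(1) the capturing group.
matches = [(m.group(0), m.span(), m.group(1))
           for m in regex.finditer('abzzababab')]
print(matches)   # [('abab', (4, 8), 'ab')]
```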


Re: one more question on regex

2016-01-22 Thread Vlastimil Brom
2016-01-22 16:50 GMT+01:00 mg :
> On Fri, 22 Jan 2016 15:32:57 +, mg wrote:
>
>> python 3.4.3
>>
>> import re
>> re.search('(ab){2}','abzzabab')
>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>
>> >>> re.findall('(ab){2}','abzzabab')
>> ['ab']
>>
>> Why for search() the match is 'abab' and for findall the match is 'ab'?
>
> finditer seems to be consistent with search:
> regex = re.compile('(ab){2}')
>
> for match in regex.finditer('abzzababab'):
>   print ("%s: %s" % (match.start(), match.span() ))
> ...
> 4: (4, 8)
>
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
as was already pointed out, findall "collects" the content of the
capturing groups (if present), rather than the whole matching text;

for a repeated capturing group, only the last captured content is kept
and the previous ones are discarded; cf.:

>>> re.findall('(?i)(a)x(b)+','axbB')
[('a', 'B')]
>>>
(for multiple capturing groups in the pattern, a tuple of the captured
parts is collected)

or with your example, with the parts of the string differentiated by
upper/lower case:
>>> re.findall('(?i)(ab){2}','aBzzAbAB')
['AB']
>>>
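
The same thing can be seen on a single match object; a short sketch,
assuming the mixed-case test string from above:

```python
import re

m = re.search(r'(?i)(ab){2}', 'aBzzAbAB')

print(m.group(0))   # whole match: 'AbAB'
print(m.group(1))   # last repetition of the group only: 'AB'
print(m.span(1))    # position of that last repetition: (6, 8)

# findall reports exactly what group(1) holds, not group(0).
print(re.findall(r'(?i)(ab){2}', 'aBzzAbAB'))   # ['AB']
```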

hth,
   vbr


Re: one more question on regex

2016-01-22 Thread mg
On Fri, 22 Jan 2016 15:32:57 +, mg wrote:

> python 3.4.3
> 
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>
> >>> re.findall('(ab){2}','abzzabab')
> ['ab']
> 
> Why for search() the match is 'abab' and for findall the match is 'ab'?

finditer seems to be consistent with search:
regex = re.compile('(ab){2}')

for match in regex.finditer('abzzababab'): 
  print ("%s: %s" % (match.start(), match.span() ))
... 
4: (4, 8)



Re: one more question on regex

2016-01-22 Thread Peter Otten
mg wrote:

> python 3.4.3
> 
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
> 
> >>> re.findall('(ab){2}','abzzabab')
> ['ab']
> 
> Why for search() the match is 'abab' and for findall the match is 'ab'?

I suppose someone thought it was convenient for findall to return the 
explicit groups if there are any. If you want the whole match aka group(0) 
you can get that with

>>> re.findall('(?:ab){2}','abzzabab')
['abab']




one more question on regex

2016-01-22 Thread mg
python 3.4.3 

import re
re.search('(ab){2}','abzzabab')
<_sre.SRE_Match object; span=(4, 8), match='abab'>

>>> re.findall('(ab){2}','abzzabab')
['ab']

Why for search() the match is 'abab' and for findall the match is 'ab'? 


Re: Question on regex

2006-12-23 Thread Felix Benner
Prabhu Gurumurthy wrote:

> to fix this problem, i used negative lookahead with ip pattern:
> so the ip pattern now changes to:
> \d{1,3}(\.\d{1,3}){3}(?!/\d+)
> 
> now the problem is  10.150.100.0 works fine, 10.100.4.64 subnet gets
> matched with ip pattern with the following result:
> 
> 10.100.4.6
> 
> Is there a workaround for this or what should change in ip regex pattern.
> 

I think what you want is that the match is followed neither by /\d+, nor
by another digit, nor by a dot:
\d{1,3}(\.\d{1,3}){3}(?!(/\d)|\d|\.)
This way 10.0.0.1234 won't be recognized as an IP. Neither will an address
directly followed by a dot (as in 23.12.), which could be a problem if an
IP is at the end of a sentence, so you might want to omit that alternative.
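
A quick sketch comparing the two lookaheads on the sample lines from the
original post. The helper name find_ips is just for illustration. With
the original lookahead the engine backtracks one digit ("10.100.4.6" is
followed by "4", not "/"), which is why the truncated address shows up:

```python
import re

lines = [
    "10.200.0.34", "10.200.4.5", "10.178.9.45",
    "10.200/22", "10.178/16", "10.100.4.64/26,",
    "10.150.100.0/28", "10/8",
]

# Lookahead from the original post: only rejects a directly following "/digits".
loose = re.compile(r'\d{1,3}(\.\d{1,3}){3}(?!/\d+)')
# Suggested lookahead: also rejects a following digit or dot,
# so the match cannot be shortened by backtracking.
strict = re.compile(r'\d{1,3}(\.\d{1,3}){3}(?!(/\d)|\d|\.)')

def find_ips(pattern, lines):
    """Return the first whole match of pattern in each line that has one."""
    found = []
    for line in lines:
        m = pattern.search(line)
        if m is not None:
            found.append(m.group(0))
    return found

print(find_ips(loose, lines))
# ['10.200.0.34', '10.200.4.5', '10.178.9.45', '10.100.4.6']
print(find_ips(strict, lines))
# ['10.200.0.34', '10.200.4.5', '10.178.9.45']
```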
-- 
http://mail.python.org/mailman/listinfo/python-list


Question on regex

2006-12-23 Thread Prabhu Gurumurthy

Hello all -

I have a file that contains IP addresses and subnet numbers, and I use regexes
to extract the IPs separately from the subnets.


pattern used for IP: \d{1,3}(\.\d{1,3}){3}
pattern used for subnet:((\d{1,3})|(\d{1,3}(\.\d{1,3}){1,3}))/(\d{1,2})

so I have list of ip/subnets strewn around like this

10.200.0.34
10.200.4.5
10.178.9.45
10.200/22
10.178/16
10.100.4.64/26,
10.150.100.0/28
10/8

With the above examples:
the IP regex pattern matches all the IP addresses
the subnet regex pattern matches all the subnets

The problem is that the IP pattern also matches the two four-octet subnet
entries (10.100.4.64/26 and 10.150.100.0/28), because their address part
fits the IP regex.


To fix this, I added a negative lookahead to the IP pattern, which now
becomes:
\d{1,3}(\.\d{1,3}){3}(?!/\d+)

Now 10.150.100.0 works fine (it is no longer matched), but the 10.100.4.64
subnet still gets matched by the IP pattern, with the following result:

10.100.4.6

Is there a workaround for this, or what should change in the IP regex pattern?

python script:
#!/usr/bin/env python

import re, sys

# Compile both patterns once, outside the loop (raw strings keep the
# backslashes intact).
pattIp = re.compile(r"(\d{1,3}(\.\d{1,3}){3})(?!/\d+)")
pattNet = re.compile(r"((\d{1,3})|(\d{1,3}(\.\d{1,3}){1,3}))/(\d{1,2})")

try:
   fh = open(sys.argv[1], "r")
except IOError as message:
   print("cannot open file: %s" % message)
else:
   for line in fh:
      line = line.strip()

      # group(1) is the address part captured by the outer parentheses
      match = pattIp.search(line)
      if match is not None:
         print("ipmatch: %s" % match.group(1))

      match = pattNet.search(line)
      if match is not None:
         print("subnet: %s" % match.group(1))

   # close the file only if it was actually opened
   fh.close()

Output with the above IPs/subnets in a file:

ipmatch: 10.200.0.34
ipmatch: 10.200.4.5
ipmatch: 10.178.9.45
subnet: 10.200
subnet: 10.178
ipmatch: 10.100.4.6
subnet: 10.100.4.64
subnet: 10.150.100.0
subnet: 10

TIA
Prabhu
