Re: issue with regular expressions

2019-10-22 Thread joseph pareti
Ok, thanks. It works for me.
regards,

On Tue, 22 Oct 2019 at 11:29, Matt Wheeler wrote:

>
>
> On Tue, 22 Oct 2019, 09:44 joseph pareti,  wrote:
>
>> The following code ends in an exception:
>>
>> import re
>> pattern = 'Sottoscrizione unica soluzione'
>> mylines = []  # Declare an empty list.
>> with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
>>     for myline in myfile:  # For each line in the file,
>>         mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
>> for element in mylines:  # For each element in the list,
>>     #print(element)
>>     match = re.search(pattern, element)
>>     s = match.start()
>>     e = match.end()
>>     print(element[s:e])
>>
>> F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
>> Traceback (most recent call last):
>>   File "search_0.py", line 10, in <module>
>>     s = match.start()
>> AttributeError: 'NoneType' object has no attribute 'start'
>>
>> any help? Thanks
>>
>
> Check over the docs for re.search again; you'll see it returns either a
> Match object (which is always truthy), or None.
>
> So a simple solution is to wrap your attempts to use the Match object in
>
> ```
> if match:
> ...
> ```
>
>>

-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: issue with regular expressions

2019-10-22 Thread Matt Wheeler
On Tue, 22 Oct 2019, 09:44 joseph pareti,  wrote:

> The following code ends in an exception:
>
> import re
> pattern = 'Sottoscrizione unica soluzione'
> mylines = []  # Declare an empty list.
> with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
>     for myline in myfile:  # For each line in the file,
>         mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
> for element in mylines:  # For each element in the list,
>     #print(element)
>     match = re.search(pattern, element)
>     s = match.start()
>     e = match.end()
>     print(element[s:e])
>
> F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
> Traceback (most recent call last):
>   File "search_0.py", line 10, in <module>
>     s = match.start()
> AttributeError: 'NoneType' object has no attribute 'start'
>
> any help? Thanks
>

Check over the docs for re.search again; you'll see it returns either a
Match object (which is always truthy), or None.

So a simple solution is to wrap your attempts to use the Match object in

```
if match:
...
```
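
A minimal sketch of the whole loop with that guard in place, in Python 3 as in the original post. To keep it self-contained, the lines are passed in as a list instead of being read from tmp.txt; `search_lines` and the sample lines are hypothetical, for illustration only.

```python
import re


def search_lines(lines, pattern):
    """Return the matched text from every line that contains `pattern`.

    re.search() returns a Match object when the pattern is found and
    None otherwise, so each result is checked before .start()/.end()
    are called on it.
    """
    found = []
    for element in lines:
        match = re.search(pattern, element)
        if match:  # None: this line doesn't contain the pattern, skip it
            s, e = match.start(), match.end()
            found.append(element[s:e])  # same text as match.group()
    return found


# Hypothetical sample lines standing in for the contents of tmp.txt.
lines = [
    "header line",
    "Sottoscrizione unica soluzione - 1000 EUR",
    "another line",
]
print(search_lines(lines, 'Sottoscrizione unica soluzione'))
```

Lines without a match are now skipped instead of raising AttributeError on None.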



issue with regular expressions

2019-10-22 Thread joseph pareti
The following code ends in an exception:

import re
pattern = 'Sottoscrizione unica soluzione'
mylines = []  # Declare an empty list.
with open('tmp.txt', 'rt') as myfile:  # Open tmp.txt for reading text.
    for myline in myfile:  # For each line in the file,
        mylines.append(myline.rstrip('\n'))  # strip newline and add to list.
for element in mylines:  # For each element in the list,
    #print(element)
    match = re.search(pattern, element)
    s = match.start()
    e = match.end()
    print(element[s:e])

F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
Traceback (most recent call last):
  File "search_0.py", line 10, in <module>
    s = match.start()
AttributeError: 'NoneType' object has no attribute 'start'

any help? Thanks


Re: Issue with regular expressions

2008-04-30 Thread Shawn Milochik
My stab at it:

#!/usr/bin/env python

import re

query = '   "  some words"  with and "without    quotes   "  '

# collapse every run of whitespace to one space, then trim the ends
query = re.sub(r"\s+", " ", query).strip()

words = []

while query:

    print("Current query value: '%s'" % query)
    print(words)
    print()

    if query[0] == '"':
        secondQuote = query[1:].index('"') + 2
        words.append(query[0:secondQuote].replace('"', '').strip())
        query = query[secondQuote:]

    else:
        if query.count(" ") == 0:
            words.append(query)
            query = ""
        else:
            space = query.index(" ")
            words.append(query[0:space])
            query = query[space:]

    # strip here so an all-whitespace remainder ends the loop
    query = query.strip()

print(words)
print(query)

Re: Issue with regular expressions

2008-04-30 Thread Gerard Flanagan
On Apr 29, 3:46 pm, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>

With simpleparse:

--

from simpleparse.parser import Parser
from simpleparse.common import strings  # provides the 'string' production
from simpleparse.dispatchprocessor import DispatchProcessor, getString


grammar = '''
text     := (quoted / unquoted / ws)+
quoted   := string
unquoted := -ws+
ws       := [ \t\r\n]+
'''

class MyProcessor(DispatchProcessor):

    def __init__(self, groups):
        self.groups = groups

    def quoted(self, val, buffer):
        # drop the surrounding quotes, then collapse inner whitespace
        self.groups.append(' '.join(getString(val, buffer)[1:-1].split()))

    def unquoted(self, val, buffer):
        self.groups.append(getString(val, buffer))

    def ws(self, val, buffer):
        pass

query = '   "  some words"  with and "without    quotes   "  '
groups = []
parser = Parser(grammar, 'text')
proc = MyProcessor(groups)
parser.parse(query, processor=proc)

print(groups)
--

G.


Re: Issue with regular expressions

2008-04-29 Thread George Sakkis
On Apr 29, 9:46 am, Julien <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?

As other replies mention, there is no single expression since you are
doing two things: find all matches and substitute extra spaces within
the quoted matches. It can be done with two expressions though:

import re

def normquery(text, findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
              normspace=re.compile(r'\s{2,}').sub):
    return [normspace(' ', (t[0] or t[1]).strip()) for t in
            findterms(text)]

>>> normquery('   "some words"  with and "without    quotes   "  ')
['some words', 'with', 'and', 'without quotes']


HTH,
George


Re: Issue with regular expressions

2008-04-29 Thread Paul McGuire
On Apr 29, 9:20 am, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Apr 29, 8:46 am, Julien <[EMAIL PROTECTED]> wrote:
>
> > I'd like to select terms in a string, so I can then do a search in my
> > database.
>
> > query = '   "  some words"  with and "without    quotes   "  '
> > p = re.compile(magic_regular_expression)   # <--- the magic happens
> > m = p.match(query)
>
> > I'd like m.groups() to return:
> > ('some words', 'with', 'and', 'without quotes')
>

Oh! It wasn't until Matimus's post that I saw that you wanted the
interior whitespace within the quoted strings collapsed also.  Just
add another parse action to the chain of functions on dblQuotedString:

# when a quoted string is found, remove the quotes,
# then strip whitespace from the contents, then
# collapse interior whitespace
dblQuotedString.setParseAction(removeQuotes,
   lambda s:s[0].strip(),
   lambda s:" ".join(s[0].split()))

Plugging this into the previous script now gives:
('some words', 'with', 'and', 'without quotes')

-- Paul


Re: Issue with regular expressions

2008-04-29 Thread Matimus
On Apr 29, 6:46 am, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien

I don't know if it is possible to do it all with one regex, but it
doesn't seem practical. I would check out the shlex module.

>>> import shlex
>>>
>>> query = '   "  some words"  with and "without    quotes   "  '
>>> shlex.split(query)
['  some words', 'with', 'and', 'without    quotes   ']

To get rid of the leading and trailing space you can then use strip:

>>> [s.strip() for s in shlex.split(query)]
['some words', 'with', 'and', 'without    quotes']

The only problem is getting rid of the extra white-space in the middle
of the expression, for which re might still be a good solution.

>>> import re
>>> [re.sub(r"\s+", ' ', s.strip()) for s in shlex.split(query)]
['some words', 'with', 'and', 'without quotes']

Matt


Re: Issue with regular expressions

2008-04-29 Thread harvey . thomas
On Apr 29, 2:46 pm, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien

You can't do it simply and completely with regular expressions alone
because of the requirement to strip the quotes and normalize
whitespace, but it's not too hard to write a function to do it. Viz:

import re

wordre = re.compile('"[^"]+"|[a-zA-Z]+').findall

def findwords(src):
    ret = []
    for x in wordre(src):
        if x[0] == '"':
            # strip off the quotes and normalise spaces
            ret.append(' '.join(x[1:-1].split()))
        else:
            ret.append(x)
    return ret

query = '   "  Some words"  with and "without    quotes   "  '
print(findwords(query))

Running this gives
['Some words', 'with', 'and', 'without quotes']

HTH

Harvey


Re: Issue with regular expressions

2008-04-29 Thread Hrvoje Niksic
Julien <[EMAIL PROTECTED]> writes:

> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)

I don't think you can achieve this with a single regular expression.
Your best bet is to use p.findall() to find all plausible matches, and
then rework them a bit.  For example:

p = re.compile(r'"[^"]*"|[\S]+')
p.findall(query)
['"  some words"', 'with', 'and', '"without    quotes   "']

At that point, you can easily iterate through the list and remove the
quotes and excess whitespace.
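
A minimal sketch of that clean-up step in Python 3, reusing the pattern above. `clean_terms` is a hypothetical name, and it assumes quoted terms never contain escaped double quotes. A token counts as quoted only if it both starts and ends with a double quote, so a stray unmatched quote stays an ordinary word, and empty "" pairs are dropped.

```python
import re

p = re.compile(r'"[^"]*"|[\S]+')


def clean_terms(query):
    """Find candidate terms, then strip quotes and normalise whitespace."""
    terms = []
    for term in p.findall(query):
        if len(term) >= 2 and term[0] == term[-1] == '"':
            # drop the surrounding quotes, then collapse inner whitespace
            term = " ".join(term[1:-1].split())
        if term:  # skip what is left of an empty "" pair
            terms.append(term)
    return terms


query = '   "  some words"  with and "without    quotes   "  '
print(clean_terms(query))
```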


Re: Issue with regular expressions

2008-04-29 Thread Paul Melis

Julien wrote:

> Hi,
>
> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?


Here's one way with a single regexp plus an extra filter function.

>>> import re
>>> p = re.compile('("([^"]+)")|([^ \t]+)')
>>> m = p.findall(query)
>>> m
[('"  some words"', '  some words', ''), ('', '', 'with'), ('', '', 'and'),
('"without    quotes   "', 'without    quotes   ', '')]

>>> def f(t):
...     if t[0] == '':
...         return t[2]
...     else:
...         return t[1]
...
>>> list(map(f, m))
['  some words', 'with', 'and', 'without    quotes   ']

If you want to strip away the leading/trailing whitespace from the
quoted strings, change the last return statement to
"return t[1].strip()".

Paul


Re: Issue with regular expressions

2008-04-29 Thread cokofreedom
import re

re.compile(r"""
    #  Double Quote Text
    "        # match a double quote
    (        #  - two possibilities:
    \\.      # a backslash followed by any character (including newline)
    |        # OR
    [^"]     # any character except a double quote
    )*       #  - from zero to many
    "        # finally match a double quote

    |        #  OR

    #  Single Quote Text
    '        # match a single quote
    (        #  - two possibilities:
    \\.      # a backslash followed by any character (including newline)
    |        # OR
    [^']     # any character except a single quote
    )*       #  - from zero to many
    '        # finally match a single quote
    """, re.DOTALL | re.VERBOSE)

I've used this before to find double-quoted and single-quoted strings
in a file (there is more to it that looks for C++ and C style quotes,
but that isn't needed here). Perhaps you can take it a step further so
that these matches are left unchanged?

Here it is on a single line :)

re.compile(r""""(\\.|[^"])*"|'(\\.|[^'])*'""", re.DOTALL)
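
If you apply that pattern with re.findall(), keep in mind that when a pattern contains capturing groups, findall() returns the groups rather than the whole match (and a repeated group keeps only its last repetition), so the (\\.|[^"])* groups would not give back the quoted strings themselves. A minimal sketch that sidesteps this, assuming you want whole quoted strings: it switches to non-capturing (?:...) groups, excludes the backslash from the "ordinary character" classes so an escape is always consumed by \\., and collects match.group(0) from re.finditer(). `QUOTED`, `quoted_spans` and the sample string are hypothetical.

```python
import re

# Double- or single-quoted text in which a backslash escapes the next
# character. (?:...) groups don't capture, so even findall() would return
# whole matches; finditer() + group(0) works whether or not groups capture.
QUOTED = re.compile(
    r'"(?:\\.|[^"\\])*"'    # double-quoted text
    r"|'(?:\\.|[^'\\])*'",  # OR single-quoted text
    re.DOTALL,              # lets "." after a backslash match a newline too
)


def quoted_spans(text):
    """Return every quoted substring of `text`, quotes included."""
    return [m.group(0) for m in QUOTED.finditer(text)]


sample = 'say "a \\"quoted\\" word" and \'single\' here'
print(quoted_spans(sample))
```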


Re: Issue with regular expressions

2008-04-29 Thread Robert Bossy

Julien wrote:

> Hi,
>
> I'm fairly new to Python and I haven't used regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.

Hi,

I think re is not the best tool for you. Maybe there's a regular 
expression that does what you want but it will be quite complex and hard 
to maintain.


I suggest you split the query on the double quotes and process the
alternating inside/outside chunks. Something like:


import re

def spulit(s):
    inq = False
    for term in s.split('"'):
        if inq:
            yield re.sub(r'\s+', ' ', term.strip())
        else:
            for word in term.split():
                yield word
        inq = not inq

for token in spulit('   "  some words"  with and "without    quotes   "  '):
    print(token)

Cheers,

RB


Re: Issue with regular expressions

2008-04-29 Thread Paul McGuire
On Apr 29, 8:46 am, Julien <[EMAIL PROTECTED]> wrote:
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>

Julien -

I dabbled with re's for a few minutes trying to get your solution,
then punted and used pyparsing instead.  Pyparsing will run slower
than re, but many people find it much easier to work with readable
class names and instances rather than re's typoglyphics:

from pyparsing import (OneOrMore, Word, printables, dblQuotedString,
                       removeQuotes)

# when a quoted string is found, remove the quotes,
# then strip whitespace from the contents
dblQuotedString.setParseAction(removeQuotes,
   lambda s:s[0].strip())

# define terms to be found in query string
term = dblQuotedString | Word(printables)
query_terms = OneOrMore(term)

# parse query string to extract terms
query = '   "  some words"  with and "without    quotes   "  '
print(tuple(query_terms.parseString(query)))

Gives:
('some words', 'with', 'and', 'without    quotes')

The pyparsing wiki is at http://pyparsing.wikispaces.com.  You'll find
an examples page that includes a search query parser, and pointers to
a number of online documentation and presentation sources.

-- Paul


Issue with regular expressions

2008-04-29 Thread Julien
Hi,

I'm fairly new to Python and I haven't used regular expressions
enough to be able to achieve what I want.
I'd like to select terms in a string, so I can then do a search in my
database.

query = '   "  some words"  with and "without    quotes   "  '
p = re.compile(magic_regular_expression)   # <--- the magic happens
m = p.match(query)

I'd like m.groups() to return:
('some words', 'with', 'and', 'without quotes')

Is that achievable with a single regular expression, and if so, what
would it be?

Any help would be much appreciated.

Thanks!!

Julien