Re: issue with regular expressions
Ok, thanks. It works for me.

regards,

On Tue, 22 Oct 2019 at 11:29, Matt Wheeler wrote:
> Check over the docs for re.search again, you'll see it returns either a
> Match object (which is always truthy), or None.
>
> So a simple solution is to wrap your attempts to use the Match object in
>
> ```
> if match:
>     ...
> ```

--
Regards, Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644
--
https://mail.python.org/mailman/listinfo/python-list
Re: issue with regular expressions
On Tue, 22 Oct 2019, 09:44 joseph pareti, wrote:
> the following code ends in an exception:
>
> [...]
> for element in mylines:                  # For each element in the list,
>     #print(element)
>     match = re.search(pattern, element)
>     s = match.start()
>     e = match.end()
>     print(element[s:e])
>
> Traceback (most recent call last):
>   File "search_0.py", line 10, in <module>
>     s = match.start()
> AttributeError: 'NoneType' object has no attribute 'start'
>
> any help? Thanks

Check over the docs for re.search again, you'll see it returns either a Match object (which is always truthy), or None. Any line of tmp.txt that does not contain the pattern gives None, and calling .start() on None raises the AttributeError above.

So a simple solution is to wrap your attempts to use the Match object in

```
if match:
    ...
```
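For what it's worth, here is a minimal sketch of that guard applied to the loop from the question. The function name `find_matches` and the sample lines are made up for illustration (the real script reads them from tmp.txt); only `re.search`, `Match.start()` and `Match.end()` come from the original code.

```python
import re

pattern = 'Sottoscrizione unica soluzione'


def find_matches(lines, pattern):
    """Return the matched text for every line that contains the pattern."""
    found = []
    for line in lines:
        match = re.search(pattern, line)
        if match:  # re.search returns None for lines without a match
            found.append(line[match.start():match.end()])
    return found


# Hypothetical stand-in for the contents of tmp.txt
sample_lines = [
    'A header line that does not contain the phrase',
    'Tipo: Sottoscrizione unica soluzione del 2019',
]
print(find_matches(sample_lines, pattern))
```

Lines that do not match are simply skipped instead of raising AttributeError.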
issue with regular expressions
the following code ends in an exception:

    import re
    pattern = 'Sottoscrizione unica soluzione'
    mylines = []                                  # Declare an empty list.
    with open('tmp.txt', 'rt') as myfile:         # Open tmp.txt for reading text.
        for myline in myfile:                     # For each line in the file,
            mylines.append(myline.rstrip('\n'))   # strip newline and add to list.
    for element in mylines:                       # For each element in the list,
        #print(element)
        match = re.search(pattern, element)
        s = match.start()
        e = match.end()
        print(element[s:e])

    F:\October20-2019-RECOVERY\Unicredit_recovery\tmp_re_search>c:\Users\joepareti\Miniconda3\pkgs\python-3.7.1-h8c8aaf0_6\python.exe search_0.py
    Traceback (most recent call last):
      File "search_0.py", line 10, in <module>
        s = match.start()
    AttributeError: 'NoneType' object has no attribute 'start'

any help? Thanks

--
Regards, Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644
Re: Issue with regular expressions
My stab at it:

    #!/usr/bin/env python
    import re

    query = ' " some words" with and "without quotes " '
    query = re.sub(r"\s+", " ", query)   # collapse runs of whitespace

    words = []
    while query:
        query = query.strip()
        print("Current query value: '%s'" % query)
        print(words)
        print()
        if query[0] == '"':
            # quoted term: take everything up to and including the closing quote
            secondQuote = query[1:].index('"') + 2
            words.append(query[0:secondQuote].replace('"', '').strip())
            query = query[secondQuote:]
        else:
            if query.count(" ") == 0:
                # last unquoted word
                words.append(query)
                query = ""
            else:
                space = query.index(" ")
                words.append(query[0:space])
                query = query[space:]

    print(words)
    print(query)
Re: Issue with regular expressions
On Apr 29, 3:46 pm, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.

With simpleparse:

    from simpleparse.parser import Parser
    from simpleparse.common import strings
    from simpleparse.dispatchprocessor import DispatchProcessor, getString

    grammar = '''
    text := (quoted / unquoted / ws)+
    quoted := string
    unquoted := -ws+
    ws := [ \t\r\n]+
    '''

    class MyProcessor(DispatchProcessor):

        def __init__(self, groups):
            self.groups = groups

        def quoted(self, val, buffer):
            # drop the surrounding quotes and collapse interior whitespace
            self.groups.append(' '.join(getString(val, buffer)[1:-1].split()))

        def unquoted(self, val, buffer):
            self.groups.append(getString(val, buffer))

        def ws(self, val, buffer):
            pass

    groups = []
    parser = Parser(grammar, 'text')
    proc = MyProcessor(groups)
    # TESTS is not defined in this message; substitute your own query string
    parser.parse(TESTS[1][1][0], processor=proc)
    print(groups)

G.
Re: Issue with regular expressions
On Apr 29, 9:46 am, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?

As other replies mention, there is no single expression since you are doing two things: find all matches and substitute extra spaces within the quoted matches. It can be done with two expressions though:

    import re

    def normquery(text, findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
                  normspace=re.compile(r'\s{2,}').sub):
        return [normspace(' ', (t[0] or t[1]).strip())
                for t in findterms(text)]

    >>> normquery(' "some words" with and "without quotes " ')
    ['some words', 'with', 'and', 'without quotes']

HTH,
George
Re: Issue with regular expressions
On Apr 29, 9:20 am, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Apr 29, 8:46 am, Julien <[EMAIL PROTECTED]> wrote:
>
> > I'd like to select terms in a string, so I can then do a search in my
> > database.
>
> > query = ' " some words" with and "without quotes " '
> > p = re.compile(magic_regular_expression) $ <--- the magic happens
> > m = p.match(query)
>
> > I'd like m.groups() to return:
> > ('some words', 'with', 'and', 'without quotes')

Oh! It wasn't until Matimus's post that I saw that you wanted the interior whitespace within the quoted strings collapsed also. Just add another parse action to the chain of functions on dblQuotedString:

    # when a quoted string is found, remove the quotes,
    # then strip whitespace from the contents, then
    # collapse interior whitespace
    dblQuotedString.setParseAction(removeQuotes,
                                   lambda s:s[0].strip(),
                                   lambda s:" ".join(s[0].split()))

Plugging this into the previous script now gives:

    ('some words', 'with', 'and', 'without quotes')

-- Paul
Re: Issue with regular expressions
On Apr 29, 6:46 am, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien

I don't know if it is possible to do it all with one regex, but it doesn't seem practical. I would check out the shlex module.

    >>> import shlex
    >>>
    >>> query = ' " some words" with and "without quotes " '
    >>> shlex.split(query)
    [' some words', 'with', 'and', 'without quotes ']

To get rid of the leading and trailing space you can then use strip:

    >>> [s.strip() for s in shlex.split(query)]
    ['some words', 'with', 'and', 'without quotes']

The only problem is getting rid of the extra white-space in the middle of the expression, for which re might still be a good solution.

    >>> import re
    >>> [re.sub(r"\s+", ' ', s.strip()) for s in shlex.split(query)]
    ['some words', 'with', 'and', 'without quotes']

Matt
Re: Issue with regular expressions
On Apr 29, 2:46 pm, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien

You can't do it simply and completely with regular expressions alone because of the requirement to strip the quotes and normalize whitespace, but it's not too hard to write a function to do it. Viz:

    import re

    wordre = re.compile('"[^"]+"|[a-zA-Z]+').findall

    def findwords(src):
        ret = []
        for x in wordre(src):
            if x[0] == '"':
                # strip off the quotes and normalise spaces
                ret.append(' '.join(x[1:-1].split()))
            else:
                ret.append(x)
        return ret

    query = ' " Some words" with and "without quotes " '
    print(findwords(query))

Running this gives

    ['Some words', 'with', 'and', 'without quotes']

HTH

Harvey
Re: Issue with regular expressions
Julien <[EMAIL PROTECTED]> writes:
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)

I don't think you can achieve this with a single regular expression. Your best bet is to use p.findall() to find all plausible matches, and then rework them a bit. For example:

    >>> p = re.compile(r'"[^"]*"|[\S]+')
    >>> p.findall(query)
    ['" some words"', 'with', 'and', '"without quotes "']

At that point, you can easily iterate through the list and remove the quotes and excess whitespace.
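The clean-up pass described above could look roughly like the sketch below. The pattern is the one from this message (with `[\S]+` written as the equivalent `\S+`); the helper name `clean_terms` and the sample query with extra interior spaces are my own, chosen only to show the whitespace collapsing.

```python
import re

# Same idea as the pattern above: a quoted run, or a run of non-whitespace
p = re.compile(r'"[^"]*"|\S+')


def clean_terms(query):
    """Split a query into terms, dropping quotes and excess whitespace."""
    terms = []
    for raw in p.findall(query):
        if raw.startswith('"'):
            raw = raw.strip('"')       # remove the surrounding quotes
        term = ' '.join(raw.split())   # trim and collapse interior whitespace
        if term:                       # skip empty quoted strings such as ""
            terms.append(term)
    return terms


# Hypothetical query with extra spaces inside the quotes
query = ' "  some words" with and "without   quotes  " '
print(clean_terms(query))
```

Using `' '.join(raw.split())` handles both the leading/trailing spaces and the runs of spaces in the middle, so no second regular expression is needed.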
Re: Issue with regular expressions
Julien wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?

Here's one way with a single regexp plus an extra filter function.

    >>> import re
    >>> p = re.compile('("([^"]+)")|([^ \t]+)')
    >>> m = p.findall(query)
    >>> m
    [('" some words"', ' some words', ''), ('', '', 'with'), ('', '', 'and'), ('"without quotes "', 'without quotes ', '')]
    >>> def f(t):
    ...     if t[0] == '':
    ...         return t[2]
    ...     else:
    ...         return t[1]
    ...
    >>> list(map(f, m))
    [' some words', 'with', 'and', 'without quotes ']

If you want to strip away the leading/trailing whitespace from the quoted strings, then change the last return statement to be "return t[1].strip()".

Paul
Re: Issue with regular expressions
|    # Double Quote Text
|    "           # match a double quote
|    (           # - Two Possibilities:
|    \\.         # match a backslash followed by any character (including newline)
|    |           # OR
|    [^"]        # match any character that is not a double quote
|    )*          # - from zero to many
|    "           # finally match a double quote
|
|    |           # OR
|
|    # Single Quote Text
|    '           # match a single quote
|    (           # - Two Possibilities:
|    \\.         # match a backslash followed by any character (including newline)
|    |           # OR
|    [^']        # match any character that is not a single quote
|    )*          # - from zero to many
|    '           # finally match a single quote
|    """, DOTALL|VERBOSE)

Used this before (minus those | at the beginning) to find double quotes and single quotes in a file (there is more to this that looks for C++ and C style quotes but that isn't needed here), perhaps you can take it another step to not do changes to these matches?

    r""""(\\.|[^"])*"|'(\\.|[^'])*'""", DOTALL)

is it in a single line :)
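Since the start of the snippet above is cut off, here is a self-contained sketch of the same idea for anyone who wants to try it. The name `quoted` and the sample string are made up for illustration. Two small changes are mine and not part of the original post: the groups are non-capturing so that `findall` returns whole matches, and the backslash is excluded from the character classes so an escaped quote can only be consumed by the `\\.` branch.

```python
import re

quoted = re.compile(r"""
    "(?:\\.|[^"\\])*"    # double-quoted text; the first branch eats a backslash plus one character
  |
    '(?:\\.|[^'\\])*'    # single-quoted text, same structure
""", re.DOTALL | re.VERBOSE)

# Hypothetical input: x = "a \"quoted\" word" + 'it\'s'
sample = r'x = "a \"quoted\" word" + ' + r"'it\'s'"

for text in quoted.findall(sample):
    print(text)
```

Each printed item is one complete quoted string, escaped quotes included.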
Re: Issue with regular expressions
Julien wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.

Hi,

I think re is not the best tool for you. Maybe there's a regular expression that does what you want but it will be quite complex and hard to maintain.

I suggest you split the query with the double quotes and process alternate inside/outside chunks. Something like:

    import re

    def spulit(s):
        inq = False                      # are we inside a quoted chunk?
        for term in s.split('"'):
            if inq:
                # quoted chunk: one term, with whitespace normalised
                yield re.sub(r'\s+', ' ', term.strip())
            else:
                # unquoted chunk: one term per word
                for word in term.split():
                    yield word
            inq = not inq

    for token in spulit(' " some words" with and "without quotes " '):
        print(token)

Cheers,
RB
Re: Issue with regular expressions
On Apr 29, 8:46 am, Julien <[EMAIL PROTECTED]> wrote:
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?

Julien -

I dabbled with re's for a few minutes trying to get your solution, then punted and used pyparsing instead. Pyparsing will run slower than re, but many people find it much easier to work with readable class names and instances rather than re's typoglyphics:

    from pyparsing import OneOrMore, Word, printables, dblQuotedString, removeQuotes

    # when a quoted string is found, remove the quotes,
    # then strip whitespace from the contents
    dblQuotedString.setParseAction(removeQuotes, lambda s:s[0].strip())

    # define terms to be found in query string
    term = dblQuotedString | Word(printables)
    query_terms = OneOrMore(term)

    # parse query string to extract terms
    query = ' " some words" with and "without quotes " '
    print(tuple(query_terms.parseString(query)))

Gives:

    ('some words', 'with', 'and', 'without quotes')

The pyparsing wiki is at http://pyparsing.wikispaces.com. You'll find an examples page that includes a search query parser, and pointers to a number of online documentation and presentation sources.

-- Paul
Issue with regular expressions
Hi,

I'm fairly new in Python and I haven't used the regular expressions enough to be able to achieve what I want.
I'd like to select terms in a string, so I can then do a search in my database.

    query = ' " some words" with and "without quotes " '
    p = re.compile(magic_regular_expression) $ <--- the magic happens
    m = p.match(query)

I'd like m.groups() to return:

    ('some words', 'with', 'and', 'without quotes')

Is that achievable with a single regular expression, and if so, what would it be?

Any help would be much appreciated.

Thanks!!

Julien