Re: Using dictionary to hold regex patterns?

2008-11-26 Thread Bruno Desthuilliers

André wrote:
(snip)

> you don't need to use pattern.items()...
>
> Here is something I use (straight cut-and-paste):
>
>     def parse_single_line(self, line):
>         '''Parses a given line to see if it matches a known pattern'''
>         for name in self.patterns:
>             result = self.patterns[name].match(line)

FWIW, this is more expensive than iterating over (key, value) tuples
using dict.items(), since you have one extra call to dict.__getitem__
per entry.

>             if result is not None:
>                 return name, result.groups()
>         return None, line
>
> where self.patterns is something like
>
>     self.patterns = {
>         'pattern1': re.compile(...),
>         'pattern2': re.compile(...)
>     }
>
> The one potential problem with the method as I wrote it is that
> sometimes a more generic pattern gets matched first whereas a more
> specific pattern may be desired.

As usual when order matters, the solution is to use a list of (name,
whatever) tuples instead of a dict. You can still build a dict from this
list when needed (the dict initializer accepts a list of (name, object)
tuples as argument).
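
A minimal sketch of that suggestion, written for Python 3. The names `pattern_list`, `hex_number` and `word` and both regexes are made up for illustration; they do not come from the thread.

```python
import re

# Hypothetical (name, regex) pairs, kept in the order they should be tried.
pattern_list = [
    ("hex_number", re.compile(r"0x([0-9a-fA-F]+)$")),
    ("word", re.compile(r"(\w+)$")),
]

def parse_single_line(line):
    """Return (name, groups) for the first matching pattern, else (None, line)."""
    # Iterating over the pairs directly avoids one dict lookup per entry.
    for name, regex in pattern_list:
        result = regex.match(line)
        if result is not None:
            return name, result.groups()
    return None, line

# The dict constructor accepts the same list when lookup by name is needed.
patterns_by_name = dict(pattern_list)
```

Because "0x1F" would also match the `word` regex, the position of `hex_number` in the list is what decides the result.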


--
http://mail.python.org/mailman/listinfo/python-list


Re: Using dictionary to hold regex patterns?

2008-11-25 Thread Thomas Mlynarczyk

John Machin wrote:

> No, "complicated" is more related to unused features. In
> the case of using an aeroplane to transport 3 passengers 10 km along
> the autobahn, you aren't using the radar, wheel-retractability, wings,
> pressurised cabin, etc. In your original notion of using a dict in
> your lexer, you weren't using the mapping functionality of a dict at
> all. In both cases you have perplexed bystanders asking "Why use a
> plane/dict when a car/list will do the job?".


Now the matter is getting clearer in my head.

Thanks and greetings,
Thomas

--
Just because there are many of them who are wrong doesn't make them right!
(Coluche)


Re: Using dictionary to hold regex patterns?

2008-11-24 Thread Thomas Mlynarczyk

Dennis Lee Bieber wrote:

>> Is [ ( name, regex ), ... ] really simpler than { name: regex, ... }?
>> Intuitively, I would consider the dictionary to be the simpler
>> structure.
>
> Why, when you aren't /using/ the name to retrieve the expression...

So as soon as I start retrieving a regex by its name, the dict will be
the most suitable structure?
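
For illustration, a small Python 3 sketch of the case where the name really is used for retrieval. The `regexes` dict, its keys and the `extract` helper are hypothetical, not code from the thread.

```python
import re

# Hypothetical named patterns, for illustration only.
regexes = {
    "date": re.compile(r"(\d{4})-(\d{2})-(\d{2})"),
    "time": re.compile(r"(\d{2}):(\d{2})"),
}

def extract(kind, text):
    """Look up one regex by its name and return its groups, or None."""
    m = regexes[kind].search(text)  # this is the mapping feature of a dict
    return m.groups() if m else None
```

Here the caller picks a pattern by key instead of trying every pattern in turn, which is the access pattern a dict is designed for.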


Greetings,
Thomas



Re: Using dictionary to hold regex patterns?

2008-11-24 Thread Thomas Mlynarczyk

John Machin wrote:

> Rephrasing for clarity: Don't use a data structure that is more
> complicated than that indicated by your requirements.

Could you please define "complicated" in this context? In terms of
characters to type and read, the dict is surely simpler. But I
suppose that under the hood, it is less work for Python to deal with a
list of tuples than with a dict?

> Judging which of two structures is simpler should not be independent
> of those requirements. I don't see a role for intuition in this
> process.

Maybe I should have said "upon first sight" / "judging from the outer
appearance" instead of "intuition".

> Please see my belated response in your "My first Python program -- a
> lexer" thread.

(See my answer there.) I think I should definitely read up a bit on the
implementation details of those data structures in Python. (As was
suggested earlier in my lexer thread.)


Greetings,
Thomas



Re: Using dictionary to hold regex patterns?

2008-11-24 Thread John Machin
On Nov 25, 4:38 am, Thomas Mlynarczyk [EMAIL PROTECTED]
wrote:
> John Machin wrote:
>
>> Rephrasing for clarity: Don't use a data structure that is more
>> complicated than that indicated by your requirements.
>
> Could you please define "complicated" in this context? In terms of
> characters to type and read, the dict is surely simpler. But I
> suppose that under the hood, it is less work for Python to deal with a
> list of tuples than with a dict?

The two extra parentheses per item are a trivial cosmetic factor, and
only when the data is hard-coded, i.e. they don't exist if the data is
read from a file, i.e. they have nothing to do with "complicated". The
amount of work done by Python under the hood is relevant only to a
speed/memory requirement. No, "complicated" is more related to unused
features. In the case of using an aeroplane to transport 3 passengers
10 km along the autobahn, you aren't using the radar,
wheel-retractability, wings, pressurised cabin, etc. In your original
notion of using a dict in your lexer, you weren't using the mapping
functionality of a dict at all. In both cases you have perplexed
bystanders asking "Why use a plane/dict when a car/list will do the
job?".


>> Judging which of two structures is simpler should not be independent
>> of those requirements. I don't see a role for intuition in this
>> process.
>
> Maybe I should have said "upon first sight" / "judging from the outer
> appearance" instead of "intuition".

I don't see a role for "upon first sight" or "judging from the outer
appearance" either.



Re: Using dictionary to hold regex patterns?

2008-11-24 Thread Steve Holden
John Machin wrote:
> On Nov 25, 4:38 am, Thomas Mlynarczyk [EMAIL PROTECTED]
[...]
>>> Judging which of two structures is simpler should not be independent
>>> of those requirements. I don't see a role for intuition in this
>>> process.
>> Maybe I should have said "upon first sight" / "judging from the outer
>> appearance" instead of "intuition".
>
> I don't see a role for "upon first sight" or "judging from the outer
> appearance" either.
 
They are all potentially (inadequate) substitutes for the knowledge and
experience you bring to the problem.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Arnaud Delobelle
Gilles Ganault [EMAIL PROTECTED] writes:

> Hello
>
> After downloading a web page, I need to search for several patterns,
> and if found, extract information and put it into a database.
>
> To avoid a bunch of "if m", I figured maybe I could use a dictionary
> to hold the patterns, and loop through it:
>
> ==
> pattern = {}
> pattern["pattern1"] = ".+?/td.+?(.+?)/td"

  pattern["pattern1"] = re.compile(".+?/td.+?(.+?)/td")

> for key,value in pattern.items():
>     response = "whatever/td.+?Blababla/td"
>
>     #AttributeError: 'str' object has no attribute 'search'
>     m = key.search(response)

      m = value.search(response)

>     if m:
>         print key + "#" + value
> ==
>
> Is there a way to use a dictionary this way, or am I stuck with
> copy/pasting blocks of "if m:"?

But there is no reason why you should use a dictionary; just use a list
of key-value pairs:

patterns = [
    ("pattern1", re.compile(".+?/td.+?(.+?)/td")),
    ("pattern2", re.compile("something else")),
    ...
]

for name, pattern in patterns:
    ...
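
One way the elided loop body might be filled in, as a Python 3 sketch. The pattern names, the regexes, the sample `response` and the `found` dict are all assumptions made for illustration; the thread's real regexes are not reproduced here.

```python
import re

# Hypothetical patterns and input.
patterns = [
    ("title", re.compile(r"<title>(.+?)</title>")),
    ("price", re.compile(r"<td>(\d+\.\d{2})</td>")),
]

response = "<title>Widget</title><table><tr><td>19.99</td></tr></table>"

found = {}
for name, pattern in patterns:
    m = pattern.search(response)
    if m:  # one loop replaces a copy of `if m:` per pattern
        found[name] = m.group(1)
```

The results are collected in a dict here only because they are later looked up by name; the patterns themselves stay in a list.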

-- 
Arnaud



Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Terry Reedy

Gilles Ganault wrote:

> Hello
>
> After downloading a web page, I need to search for several patterns,
> and if found, extract information and put it into a database.
>
> To avoid a bunch of "if m", I figured maybe I could use a dictionary
> to hold the patterns, and loop through it:

Good idea.

import re

> pattern = {}
> pattern["pattern1"] = ".+?/td.+?(.+?)/td"

... = re.compile(...)

> for key,value in pattern.items():

for name, regex in ...

>     response = "whatever/td.+?Blababla/td"
>
>     #AttributeError: 'str' object has no attribute 'search'

Correct, only compiled re patterns have search; better naming would make
the error obvious.

>     m = key.search(response)

m = regex.search(response)

>     if m:
>         print key + "#" + value

print name + '#' + regex



Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Gilles Ganault
On Sun, 23 Nov 2008 17:55:48 +, Arnaud Delobelle
[EMAIL PROTECTED] wrote:
> But there is no reason why you should use a dictionary; just use a list
> of key-value pairs:
>
> patterns = [
>     ("pattern1", re.compile(".+?/td.+?(.+?)/td")),

Thanks for the tip, but... I thought that lists could only use integer
indexes, while text indexes had to use dictionaries. In which case do
we need dictionaries, then?


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Vlastimil Brom
2008/11/23 Gilles Ganault [EMAIL PROTECTED]

> Hello
>
> After downloading a web page, I need to search for several patterns,
> and if found, extract information and put it into a database.
>
> To avoid a bunch of "if m", I figured maybe I could use a dictionary
> to hold the patterns, and loop through it:
>
> ==
> pattern = {}
> pattern["pattern1"] = ".+?/td.+?(.+?)/td"
> for key,value in pattern.items():
>     response = "whatever/td.+?Blababla/td"
>
>     #AttributeError: 'str' object has no attribute 'search'
>     m = key.search(response)
>     if m:
>         print key + "#" + value
> ==
>
> Is there a way to use a dictionary this way, or am I stuck with
> copy/pasting blocks of "if m:"?

 Thank you.

I'm not quite sure whether I understand correctly what should be
achieved, but it seems that you should do the searches on the dict values
instead of the keys if you want to access the re patterns.

    m = re.search(re_pattern_value, text_to_search_in)
    if m:
        print key + "#" + m.group()
    ...

In case there could be multiple matches, findall or finditer would
probably be more suitable than search.
But after all, regexes aren't very efficient for dealing with HTML unless
you know quite exactly what structure you can expect;
something like BeautifulSoup could probably be used instead.
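
To make the difference between search, findall and finditer concrete, here is a short Python 3 sketch. The `cell` pattern and the sample `text` are invented for the example.

```python
import re

cell = re.compile(r"<td>(.+?)</td>")  # hypothetical pattern
text = "<tr><td>alpha</td><td>beta</td><td>gamma</td></tr>"

# search stops at the first match.
first = cell.search(text).group(1)

# findall returns the captured group of every match as a list of strings.
all_cells = cell.findall(text)

# finditer yields match objects, so positions are available as well.
spans = [m.span(1) for m in cell.finditer(text)]
```

With a single capturing group, findall gives the group contents directly; finditer is the better fit when the match positions or several groups are needed.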
hth,
  Vlasta


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Benjamin Kaplan
On Sun, Nov 23, 2008 at 2:55 PM, Gilles Ganault [EMAIL PROTECTED] wrote:

> On Sun, 23 Nov 2008 17:55:48 +, Arnaud Delobelle
> [EMAIL PROTECTED] wrote:
>> But there is no reason why you should use a dictionary; just use a list
>> of key-value pairs:
>>
>> patterns = [
>>     ("pattern1", re.compile(".+?/td.+?(.+?)/td")),
>
> Thanks for the tip, but... I thought that lists could only use integer
> indexes, while text indexes had to use dictionaries. In which case do
> we need dictionaries, then?

Lists do use integer indexes. Since you never use the dict[key] syntax, you
don't need key value pairs like that. Instead, the example uses two-item
tuples.


>>> patterns = [("pattern1", re.compile(".+?/td.+?(.+?)/td")),
...             ("pattern2", re.compile("something else"))]
>>> patterns[0]
('pattern1', <_sre.SRE_Pattern object at 0x3c7a0>)
>>> for pattern, regex in patterns:
...     print pattern + ":" + str(regex)
...
pattern1:<_sre.SRE_Pattern object at 0x3c7a0>
pattern2:<_sre.SRE_Pattern object at 0x35860>






Re: Using dictionary to hold regex patterns?

2008-11-23 Thread John Machin
On Nov 24, 6:55 am, Gilles Ganault [EMAIL PROTECTED] wrote:
> On Sun, 23 Nov 2008 17:55:48 +, Arnaud Delobelle
> [EMAIL PROTECTED] wrote:
>> But there is no reason why you should use a dictionary; just use a list
>> of key-value pairs:
>>
>> patterns = [
>>     ("pattern1", re.compile(".+?/td.+?(.+?)/td")),
>
> Thanks for the tip, but... I thought that lists could only use integer
> indexes, while text indexes had to use dictionaries. In which case do
> we need dictionaries, then?

You don't have a requirement for indexing -- neither a text index nor
an integer index. Your requirement is met by a sequence of (name,
regex) pairs. Yes, a list is a sequence, and a list has integer
indexes, but this is irrelevant.

General tip: Don't use a data structure that is more complicated than
what you need.


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread John Machin
On Nov 24, 5:36 am, Terry Reedy [EMAIL PROTECTED] wrote:
> Gilles Ganault wrote:
>> Hello
>>
>> After downloading a web page, I need to search for several patterns,
>> and if found, extract information and put it into a database.
>>
>> To avoid a bunch of "if m", I figured maybe I could use a dictionary
>> to hold the patterns, and loop through it:
>
> Good idea.
>
> import re
>
>> pattern = {}
>> pattern["pattern1"] = ".+?/td.+?(.+?)/td"
>
> ... = re.compile(...)
>
>> for key,value in pattern.items():
>
> for name, regex in ...
>
>>     response = "whatever/td.+?Blababla/td"
>>
>>     #AttributeError: 'str' object has no attribute 'search'
>
> Correct, only compiled re patterns have search; better naming would make
> the error obvious.
>
>>     m = key.search(response)
>
> m = regex.search(response)
>
>>     if m:
>>         print key + "#" + value
>
> print name + '#' + regex

Perhaps you meant:
   print key + "#" + regex.pattern


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Thomas Mlynarczyk

John Machin wrote:

> General tip: Don't use a data structure that is more complicated than
> what you need.

Is [ ( name, regex ), ... ] really simpler than { name: regex, ... }?
Intuitively, I would consider the dictionary to be the simpler
structure.


Greetings,
Thomas



Re: Using dictionary to hold regex patterns?

2008-11-23 Thread André
On Nov 23, 1:40 pm, Gilles Ganault [EMAIL PROTECTED] wrote:
> Hello
>
> After downloading a web page, I need to search for several patterns,
> and if found, extract information and put it into a database.
>
> To avoid a bunch of "if m", I figured maybe I could use a dictionary
> to hold the patterns, and loop through it:
>
> ==
> pattern = {}
> pattern["pattern1"] = ".+?/td.+?(.+?)/td"
> for key,value in pattern.items():
>     response = "whatever/td.+?Blababla/td"
>
>     #AttributeError: 'str' object has no attribute 'search'
>     m = key.search(response)
>     if m:
>         print key + "#" + value
> ==
>
> Is there a way to use a dictionary this way, or am I stuck with
> copy/pasting blocks of "if m:"?
>
> Thank you.

Yes, it is possible, and you don't need to use pattern.items()...

Here is something I use (straight cut-and-paste):

    def parse_single_line(self, line):
        '''Parses a given line to see if it matches a known pattern'''
        for name in self.patterns:
            result = self.patterns[name].match(line)
            if result is not None:
                return name, result.groups()
        return None, line


where self.patterns is something like

    self.patterns = {
        'pattern1': re.compile(...),
        'pattern2': re.compile(...)
    }

The one potential problem with the method as I wrote it is that
sometimes a more generic pattern gets matched first whereas a more
specific pattern may be desired.
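
A small Python 3 sketch of that problem. The `first_match` helper and the two patterns (`any_text`, `comment`) are hypothetical, and the patterns are held in lists here so that the order being tried is explicit.

```python
import re

def first_match(patterns, line):
    """Return the name of the first pattern in `patterns` that matches `line`."""
    for name, regex in patterns:
        if regex.match(line):
            return name
    return None

# Hypothetical patterns: every "comment" line is also matched by "any_text".
generic = ("any_text", re.compile(r".+"))
specific = ("comment", re.compile(r"#.*"))

generic_first = [generic, specific]   # the generic pattern shadows the specific one
specific_first = [specific, generic]  # the specific pattern gets its chance first
```

With `generic_first`, the line "# note" is reported as `any_text`; with `specific_first`, it is reported as `comment`. A list makes that priority explicit regardless of how a given Python version orders dict iteration.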

André


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread John Machin
On Nov 24, 7:48 am, John Machin [EMAIL PROTECTED] wrote:
> On Nov 24, 5:36 am, Terry Reedy [EMAIL PROTECTED] wrote:
>
>> print name + '#' + regex
>
> Perhaps you meant:
>    print key + "#" + regex.pattern

I definitely meant:
   print name + '#' + regex.pattern
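
Why the `.pattern` attribute is needed, as a Python 3 sketch; the values of `name` and `regex` are invented for the example.

```python
import re

name = "pattern1"                   # hypothetical name
regex = re.compile(r"(\d+) items")  # hypothetical regex

# A compiled pattern is not a string, so concatenating it raises TypeError.
try:
    label = name + "#" + regex
except TypeError:
    # .pattern holds the source string the regex was compiled from.
    label = name + "#" + regex.pattern
```

The fallback branch is the one that runs, since `str + compiled pattern` is not a supported operation.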


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread John Machin
On Nov 24, 7:49 am, Thomas Mlynarczyk [EMAIL PROTECTED]
wrote:
> John Machin wrote:
>
>> General tip: Don't use a data structure that is more complicated than
>> what you need.
>
> Is [ ( name, regex ), ... ] really simpler than { name: regex, ...}?
> Intuitively, I would consider the dictionary to be the simpler
> structure.

Hi Thomas,

Rephrasing for clarity: Don't use a data structure that is more
complicated than that indicated by your requirements.

Judging which of two structures is simpler should not be independent
of those requirements. I don't see a role for intuition in this
process.

Please see my belated response in your "My first Python program -- a
lexer" thread.

Cheers,
John



Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Gilles Ganault
On Sun, 23 Nov 2008 17:55:48 +, Arnaud Delobelle
[EMAIL PROTECTED] wrote:
> But there is no reason why you should use a dictionary; just use a list
> of key-value pairs:

Thanks for the tip. I didn't know it was possible to use arrays to
hold more than one value. Actually, it's a better solution, as
key/value tuples in a dictionary aren't used in the order in which
they're put in the dictionary, while arrays are.

For those interested:


response = "dummy/tdblagood stuff/td"
for name, pattern in patterns:
    m = pattern.search(response)
    if m:
        print m.group(1)
        break
else:
    print "here"


Thanks guys.


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread MRAB

Gilles Ganault wrote:

> On Sun, 23 Nov 2008 17:55:48 +, Arnaud Delobelle
> [EMAIL PROTECTED] wrote:
>
>> But there is no reason why you should use a dictionary; just use a list
>> of key-value pairs:
>
> Thanks for the tip. I didn't know it was possible to use arrays to
> hold more than one value. Actually, it's a better solution, as
> key/value tuples in a dictionary aren't used in the order in which
> they're put in the dictionary, while arrays are.

[snip]
A list is an ordered collection of items. Each item can be anything: a 
string, an integer, a dictionary, a tuple, a list...



Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Gilles Ganault
On Sun, 23 Nov 2008 23:18:06 +, MRAB [EMAIL PROTECTED]
wrote:
> A list is an ordered collection of items. Each item can be anything: a
> string, an integer, a dictionary, a tuple, a list...

Yup, learned something new today. Naively, I thought a list was
index=value, where value=a single piece of data. Works like a charm.
Thanks.


Re: Using dictionary to hold regex patterns?

2008-11-23 Thread Marc 'BlackJack' Rintsch
On Mon, 24 Nov 2008 00:46:42 +0100, Gilles Ganault wrote:

> On Sun, 23 Nov 2008 23:18:06 +, MRAB [EMAIL PROTECTED]
> wrote:
>> A list is an ordered collection of items. Each item can be anything: a
>> string, an integer, a dictionary, a tuple, a list...
>
> Yup, learned something new today. Naively, I thought a list was
> index=value, where value=a single piece of data.

Your thought was correct, each value is a single piece of data: *one* 
tuple.
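
The same point as a tiny Python 3 sketch. The list contents are placeholder strings chosen for illustration, not the thread's real patterns.

```python
# Placeholder (name, regex source) pairs.
patterns = [("pattern1", "first regex"), ("pattern2", "second regex")]

item = patterns[0]   # one integer index gives one value: a tuple
name, regex = item   # the tuple unpacks into two names

# The same unpacking happens on each pass of a for loop or comprehension.
names = [n for n, _ in patterns]
```

So `for name, pattern in patterns:` is still one value per index; the loop just unpacks each tuple as it goes.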

Ciao,
Marc 'BlackJack' Rintsch