Re: [Tutor] using re to build dictionary

2009-02-24 Thread Norman Khine

Thank you all, this is great.

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] using re to build dictionary

2009-02-24 Thread Kent Johnson
On Tue, Feb 24, 2009 at 6:48 AM, Norman Khine  wrote:
> Hello,
> From my previous post on creating a dictionary from CSV, I have broken the
> problem down further and wanted the list's feedback on whether it could be
> done better:
>
> s = ('Association of British Travel Agents (ABTA) No. 56542\n'
>      'Air Travel Organisation Licence (ATOL)\n'
>      'Appointed Agents of IATA (IATA)\n'
>      'Incentive Travel & Meet. Association (ITMA)')
> licences = re.split(r"\n+", s)
> licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]

This is awkward. You can match directly on what you want:

In [7]: import re

In [8]: s = ('Association of British Travel Agents (ABTA) No. 56542\n'
   ...:      'Air Travel Organisation Licence (ATOL)\n'
   ...:      'Appointed Agents of IATA (IATA)\n'
   ...:      'Incentive Travel & Meet. Association (ITMA)')

In [9]: licenses = re.split(r"\n+", s)

In [10]: licenseRe = re.compile(r'\(([A-Z]+)\)( No\. (\d+))?')

In [11]: for license in licenses:
   ....:     m = licenseRe.search(license)
   ....:     print(m.group(1, 3))
   ....:
('ABTA', '56542')
('ATOL', None)
('IATA', None)
('ITMA', None)

Kent
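
To get from the tuples above to the dictionary asked for in the original post, one possible sketch follows. The helper name build_license_dict is made up for illustration, the inner group is made non-capturing so that only the acronym and the number are captured, and missing numbers are stored as '' (rather than None) to match the requested output.

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

# Group 1: the acronym in parentheses; group 2: the optional number after "No."
license_re = re.compile(r'\(([A-Z]+)\)(?: No\. (\d+))?')


def build_license_dict(text):
    """Hypothetical helper: map each acronym to its number, or '' if absent."""
    result = {}
    for line in re.split(r'\n+', text):
        m = license_re.search(line)
        if m is None:
            continue  # the line contains no "(ACRONYM)" at all
        code, number = m.groups()
        result[code] = number or ''  # an unmatched optional group is None
    return result


my_dictionary = build_license_dict(s)
print(my_dictionary)
```

Lines without a parenthesised acronym are skipped here; whether that is the right behaviour depends on the real data.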


Re: [Tutor] using re to build dictionary

2009-02-24 Thread spir
On Tue, 24 Feb 2009 12:48:51 +0100,
Norman Khine wrote:

> Hello,
> From my previous post on creating a dictionary from CSV, I have broken the
> problem down further and wanted the list's feedback on whether it could be
> done better:
> 
> >>> import re
> >>> s = ('Association of British Travel Agents (ABTA) No. 56542\n'
> ...      'Air Travel Organisation Licence (ATOL)\n'
> ...      'Appointed Agents of IATA (IATA)\n'
> ...      'Incentive Travel & Meet. Association (ITMA)')
> >>> licences = re.split(r"\n+", s)
> >>> licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]
> >>> association = []
> >>> for x in licence_list:
> ...     for y in x:
> ...         if y.isupper():
> ...             association.append(y)
> ...
> >>> association
> ['ABTA', 'ATOL', 'IATA', 'ITMA']
> 
> 
> In my string 's', I have 'No. 56542', how would I extract the '56542' 
> and map it against the 'ABTA' so that I can have a dictionary for example:
> 
>  >>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
>  >>>
> 
> 
> Here is what I have so far:
> 
> >>> my_dictionary = {}
> >>> for x in licence_list:
> ...     for y in x:
> ...         if y.isupper():
> ...             my_dictionary[y] = y
> ...
> >>> my_dictionary
> {'ABTA': 'ABTA', 'IATA': 'IATA', 'ITMA': 'ITMA', 'ATOL': 'ATOL'}
> 
> This is wrong, as the values should be the numbers, i.e. the 56542 that is
> in the licence_list.
>
> Here is where I am stuck: in my licence_list, not all items have
> a number (all but one are empty). For my use case, I still need to create
> the dictionary so that it is in the form:
> 
>  >>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
> 
> Any advice would be much appreciated.
> 
> Norman

I had a similar problem once. The nice solution was (I think; don't take this
for granted, as I have no time to verify it) simply to use multiple groups with
re.findall again. Build a pattern like:
r'.+(code-pattern).+(number_pattern).+\n+'
Then the results will be a list of tuples like
[
(code1, n1),
(code2, n2),
...
]
where some numbers will be missing. From this it's straightforward to
build a dict, maybe using a default None value for missing numbers. Someone
will probably confirm or refute this method.

denis
--
la vita e estrany
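
The re.findall idea above can be sketched as follows. This is only one way to fill in the placeholders: the acronym and number patterns are assumptions based on the sample string, and the number group is made optional so that lines without "No. ..." still produce a tuple. Note that re.findall reports a group that did not take part in the match as '' rather than None.

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

# Two capturing groups, so findall returns a list of (code, number) tuples.
# The literal spaces around "No." cannot match a newline, so a number on the
# following line is never attached to the wrong acronym.
pairs = re.findall(r'\(([A-Z]+)\)(?: No\. (\d+))?', s)
print(pairs)

# A list of 2-tuples converts directly into a dictionary.
my_dictionary = dict(pairs)
print(my_dictionary)
```

Because the pattern runs over the whole string, there is no need to split it into lines first.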


[Tutor] using re to build dictionary

2009-02-24 Thread Norman Khine

Hello,
From my previous post on creating a dictionary from CSV, I have broken the
problem down further and wanted the list's feedback on whether it could be
done better:


>>> import re
>>> s = ('Association of British Travel Agents (ABTA) No. 56542\n'
...      'Air Travel Organisation Licence (ATOL)\n'
...      'Appointed Agents of IATA (IATA)\n'
...      'Incentive Travel & Meet. Association (ITMA)')

>>> licences = re.split(r"\n+", s)
>>> licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]
>>> association = []
>>> for x in licence_list:
...     for y in x:
...         if y.isupper():
...             association.append(y)
...
>>> association
['ABTA', 'ATOL', 'IATA', 'ITMA']


In my string 's', I have 'No. 56542', how would I extract the '56542' 
and map it against the 'ABTA' so that I can have a dictionary for example:


>>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
>>>


Here is what I have so far:

>>> my_dictionary = {}

>>> for x in licence_list:
...     for y in x:
...         if y.isupper():
...             my_dictionary[y] = y
...
>>> my_dictionary
{'ABTA': 'ABTA', 'IATA': 'IATA', 'ITMA': 'ITMA', 'ATOL': 'ATOL'}

This is wrong, as the values should be the numbers, i.e. the 56542 that is
in the licence_list.


Here is where I am stuck: in my licence_list, not all items have
a number (all but one are empty). For my use case, I still need to create
the dictionary so that it is in the form:


>>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}

Any advice would be much appreciated.

Norman
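
For reference, the number is already present in licence_list: with one capturing group, re.split returns three parts per line (the text before the acronym, the acronym, and the text after it). A minimal sketch that keeps the re.split approach from this post, assuming every line contains exactly one parenthesised acronym:

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

licences = re.split(r"\n+", s)
licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]

my_dictionary = {}
# Assumption: each entry has exactly three parts, i.e. one "(ACRONYM)" per line.
for name, code, rest in licence_list:
    number = re.search(r"\d+", rest)  # first run of digits after the acronym
    my_dictionary[code] = number.group() if number else ''

print(my_dictionary)
```

A line with no acronym, or with more than one, would make the three-way unpacking fail, so real data may need a check first.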


