Re: [Tutor] using re to build dictionary
Thank you all, this is great.

Kent Johnson wrote:
> On Tue, Feb 24, 2009 at 6:48 AM, Norman Khine wrote:
> > Hello,
> > From my previous post on create dictionary from csv, I have broken the
> > problem further and wanted the list's feedback if it could be done better:
> >
> > s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'
> > licences = re.split("\n+", s)
> > licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]
>
> This is awkward. You can match directly on what you want:
>
> In [7]: import re
>
> In [8]: s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'
>
> In [9]: licenses = re.split("\n+", s)
>
> In [10]: licenseRe = re.compile(r'\(([A-Z]+)\)( No. (\d+))?')
>
> In [11]: for license in licenses:
>    ....:     m = licenseRe.search(license)
>    ....:     print(m.group(1, 3))
>    ....:
> ('ABTA', '56542')
> ('ATOL', None)
> ('IATA', None)
> ('ITMA', None)
>
> Kent

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] using re to build dictionary
On Tue, Feb 24, 2009 at 6:48 AM, Norman Khine wrote:
> Hello,
> From my previous post on create dictionary from csv, I have broken the
> problem further and wanted the list's feedback if it could be done better:
>
> s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'
> licences = re.split("\n+", s)
> licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]

This is awkward. You can match directly on what you want:

In [7]: import re

In [8]: s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'

In [9]: licenses = re.split("\n+", s)

In [10]: licenseRe = re.compile(r'\(([A-Z]+)\)( No. (\d+))?')

In [11]: for license in licenses:
   ....:     m = licenseRe.search(license)
   ....:     print(m.group(1, 3))
   ....:
('ABTA', '56542')
('ATOL', None)
('IATA', None)
('ITMA', None)

Kent
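To get from the (code, number) pairs printed above to the dictionary asked for in the original post, the same search can fill a dict directly. The following is only a sketch in Python 3: the function name `build_license_dict` is made up for illustration, the dot after "No" is escaped (an assumption about the intended match, not part of the pattern above), and lines without a code in parentheses are simply skipped.

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

# Same idea as the pattern above; the dot after "No" is escaped here
# so that it only matches a literal full stop (an assumption).
license_re = re.compile(r'\(([A-Z]+)\)( No\. (\d+))?')


def build_license_dict(text):
    """Map each code in parentheses to its number, or '' if none is given.

    Hypothetical helper, named for this example only.
    """
    result = {}
    for line in re.split(r'\n+', text):
        m = license_re.search(line)
        if m is None:
            # No "(CODE)" on this line: skip it.
            continue
        code, number = m.group(1, 3)
        # group(3) is None when the optional " No. NNN" part is absent.
        result[code] = number or ''
    return result


my_dictionary = build_license_dict(s)
print(my_dictionary)
```

Using `number or ''` turns the `None` returned for a missing number into the empty string requested in the original post; keeping `None` instead would only need that expression changed.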
Re: [Tutor] using re to build dictionary
On Tue, 24 Feb 2009 12:48:51 +0100, Norman Khine wrote:

> Hello,
> From my previous post on create dictionary from csv, I have broken the
> problem further and wanted the list's feedback if it could be done better:
>
> >>> import re
> >>> s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'
> >>> licences = re.split("\n+", s)
> >>> licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]
> >>> association = []
> >>> for x in licence_list:
> ...     for y in x:
> ...         if y.isupper():
> ...             association.append(y)
> ...
> >>> association
> ['ABTA', 'ATOL', 'IATA', 'ITMA']
>
> In my string 's', I have 'No. 56542', how would I extract the '56542'
> and map it against the 'ABTA' so that I can have a dictionary for example:
>
> >>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
>
> Here is what I have so far:
>
> >>> my_dictionary = {}
> >>> for x in licence_list:
> ...     for y in x:
> ...         if y.isupper():
> ...             my_dictionary[y] = y
> ...
> >>> my_dictionary
> {'ABTA': 'ABTA', 'IATA': 'IATA', 'ITMA': 'ITMA', 'ATOL': 'ATOL'}
>
> This is wrong, as the values should be the 'decimal', i.e. the 56542 that
> is in the licence_list.
>
> Here is where I miss the point: in my licence_list not all items have
> a code (all but one are empty), but for my use case I still need to create
> the dictionary so that it is in the form:
>
> >>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
>
> Any advice much appreciated.
>
> Norman

I had a similar problem once. The nice solution was, I think (don't take this for granted, I have no time to verify), simply using multiple groups with re.findall again. Build a rule like:

    r'.+(code-pattern).+(number_pattern).+\n+'

Then the result will be a list of tuples like

    [ (code1, n1), (code2, n2), ... ]

where some numbers will be missing. From this it's straightforward to instantiate a dict, maybe using a default None value for n/a numbers.

Someone will probably confirm or refute this method.

denis
-- 
la vita e estrany
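The re.findall idea can be sketched as follows. Note that the pattern used here is not the rule suggested above: it is an assumption that drops the surrounding `.+` and `\n+` parts and makes the number optional, so that lines without a number still produce a tuple. With more than one group in the pattern, re.findall returns one tuple per match, and a group that did not take part in the match comes back as an empty string rather than None.

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

# Group 1: the code in parentheses.
# Group 2: the digits after " No. ", inside an optional non-capturing group.
# This pattern is an illustrative assumption, not the rule quoted above.
pairs = re.findall(r'\(([A-Z]+)\)(?: No\. (\d+))?', s)

# A list of 2-tuples converts directly into a dictionary.
my_dictionary = dict(pairs)

print(pairs)
print(my_dictionary)
```

Because re.findall already yields '' for the missing numbers, no default value has to be filled in by hand; if None is preferred, `{code: number or None for code, number in pairs}` would do that.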
[Tutor] using re to build dictionary
Hello,

From my previous post on create dictionary from csv, I have broken the problem further and wanted the list's feedback if it could be done better:

>>> import re
>>> s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'
>>> licences = re.split("\n+", s)
>>> licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]
>>> association = []
>>> for x in licence_list:
...     for y in x:
...         if y.isupper():
...             association.append(y)
...
>>> association
['ABTA', 'ATOL', 'IATA', 'ITMA']

In my string 's', I have 'No. 56542', how would I extract the '56542' and map it against the 'ABTA' so that I can have a dictionary for example:

>>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}

Here is what I have so far:

>>> my_dictionary = {}
>>> for x in licence_list:
...     for y in x:
...         if y.isupper():
...             my_dictionary[y] = y
...
>>> my_dictionary
{'ABTA': 'ABTA', 'IATA': 'IATA', 'ITMA': 'ITMA', 'ATOL': 'ATOL'}

This is wrong, as the values should be the 'decimal', i.e. the 56542 that is in the licence_list.

Here is where I miss the point: in my licence_list not all items have a code (all but one are empty), but for my use case I still need to create the dictionary so that it is in the form:

>>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}

Any advice much appreciated.

Norman
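The attempt above stores each code as its own value because `my_dictionary[y] = y` uses the same string for key and value. Staying with the licence_list from the post, one possible fix relies on how re.split behaves with a capturing group: each entry comes back as [text before, code, text after], so the number, when present, sits in the third element. This is a sketch only, and it assumes every line contains exactly one code in parentheses; otherwise the three-way unpacking would fail.

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

licences = re.split(r"\n+", s)
# With a capturing group, re.split keeps the captured code in the result:
# each entry is [text_before, code, text_after].
licence_list = [re.split(r"\((\w+)\)", licence) for licence in licences]

my_dictionary = {}
# Assumption: exactly one "(CODE)" per line, so each entry has 3 items.
for before, code, after in licence_list:
    # Look for a run of digits in whatever follows the code.
    digits = re.search(r"\d+", after)
    my_dictionary[code] = digits.group() if digits else ''

print(my_dictionary)
```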