Have a look at "Martel", part of Biopython. The world of bioinformatics is filled with files structured like this.
http://www.biopython.org/docs/api/public/Martel-module.html

James

On Thursday 03 March 2005 12:03 pm, Yatima wrote:
> On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
> > A possible solution, using the re module:
> >
> > py> s = """\
> > ... Gibberish
> > ... 53
> > ... MoreGarbage
> > ... 12
> > ... RelevantInfo1
> > ... 10/10/04
> > ... NothingImportant
> > ... ThisDoesNotMatter
> > ... 44
> > ... RelevantInfo2
> > ... 22
> > ... BlahBlah
> > ... 343
> > ... RelevantInfo3
> > ... 23
> > ... Hubris
> > ... Crap
> > ... 34
> > ... """
> > py> import re
> > py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
> > ...                    .*
> > ...                    ^RelevantInfo2\n([^\n]*)
> > ...                    .*
> > ...                    ^RelevantInfo3\n([^\n]*)""",
> > ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
> > py> score = {}
> > py> for info1, info2, info3 in m.findall(s):
> > ...     score.setdefault(info1, {})[info3] = info2
> > ...
> > py> score
> > {'10/10/04': {'23': '22'}}
> >
> > Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE
> > to have ^ apply at the start of each line, and VERBOSE to allow me to
> > write the re in a more readable form.
> >
> > If I didn't get your dict update quite right, hopefully you can see how
> > to fix it!
>
> Thanks! That was very helpful. Unfortunately, I wasn't completely clear
> when describing the problem. Is there any way to extract multiple scores
> from the same file and from multiple files (I will probably use the
> "fileinput" module to deal with multiple files). So, if I've got say:
>
> Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34
>
> SecondSetofGarbage
> 2423
> YouGetThePicture
> 342342
> RelevantInfo1
> 10/10/04
> HoHum
> 343
> MoreStuffNotNeeded
> 232
> RelevantInfo2
> 33
> RelevantInfo3
> 44
> sdfsdf
> RelevantInfo1
> 10/11/04
> InsertBoringFillerHere
> 43234
> Stuff
> MoreStuff
> RelevantInfo2
> 45
> ExcitingIsntIt
> 324234
> RelevantInfo3
> 60
> Lalala
>
> Sorry for the long and painful example input. Notice that the first two
> "RelevantInfo1" fields have the same info but that the RelevantInfo2 and
> RelevantInfo3 fields have different info. Also, there will be cases where
> RelevantInfo3 might be the same with a different RelevantInfo2. What I'm
> hoping for is something along the lines of being able to organize it like
> so (don't worry about the format of the output -- I'll deal with that
> later; "RelevantInfo" shortened to "Info" for readability):
>
>            Info1[0],                 Info[1],                  Info[2] ...
> Info3[0]   Info2[Info1[0],Info3[0]]  Info2[Info1[1],Info3[1]]  ...
> Info3[1]   Info2[Info1[0],Info3[1]]  ...
> Info3[2]   Info2[Info1[0],Info3[2]]  ...
> ...
>
> I don't really care if it's a list, dictionary, array etc.
>
> Thanks again for your help. The multiline option in the re module is very
> useful.
>
> Take care.
>
> --
> Clarke's Conclusion:
> Never let your sense of morals interfere with doing the right thing.

--
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
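
For the multi-record input in the quoted message, here is a rough sketch that stays with the re module instead of Martel. It is only one possible approach, not anything taken from Biopython. It assumes each record always lists RelevantInfo1, RelevantInfo2 and RelevantInfo3 in that order, and it swaps the greedy .* of the quoted pattern for a non-greedy .*? so that each match stops inside its own record. The names RECORD, collect_scores and SAMPLE are made up for illustration, and SAMPLE is an abridged copy of the example input.

```python
import re

# Non-greedy .*? keeps each match inside a single record. With a greedy .*
# the pattern would swallow the records in between and yield a single triple.
RECORD = re.compile(
    r"""^RelevantInfo1\n([^\n]*)   # the line after the Info1 label
        .*?
        ^RelevantInfo2\n([^\n]*)   # the line after the Info2 label
        .*?
        ^RelevantInfo3\n([^\n]*)   # the line after the Info3 label
    """,
    re.DOTALL | re.MULTILINE | re.VERBOSE,
)


def collect_scores(text, scores=None):
    """Store each record found in text as scores[info1][info3] = info2."""
    if scores is None:
        scores = {}
    for info1, info2, info3 in RECORD.findall(text):
        scores.setdefault(info1, {})[info3] = info2
    return scores


# Abridged version of the example input (assumption: filler lines can vary).
SAMPLE = """\
Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
RelevantInfo2
22
BlahBlah
RelevantInfo3
23
Hubris

RelevantInfo1
10/10/04
HoHum
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
Stuff
RelevantInfo2
45
RelevantInfo3
60
Lalala
"""

print(collect_scores(SAMPLE))
# {'10/10/04': {'23': '22', '44': '33'}, '10/11/04': {'60': '45'}}
```

To cover several files, the same dictionary can be passed back in as the second argument for each file's full text. The pattern needs the whole text at once, so reading each file with read() is simpler here than iterating line by line. A record that repeats an existing Info1/Info3 pair overwrites the earlier Info2 value.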