On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:

> I am working with 30K+ record datasets in flat file format (.txt) that
> look like this:
>
> //-+alibaba sinage
> //-+amra damian//_9
> //-+anix anire//_
> //-+borom
> //-+bokima sun drane
> //-+ciren
> //-+cop calestieon eded
> //-+ciciban
> //-+drago kimano sole

The example seems to be sorted. Is this true for the real data too? And are
there records that don't start with a-z or 0-9?

> a) By looping thru the file the program should isolate all records
> that have letter a following the //-+
> b) The isolated dataset will contain only records that start with //-+a
> c) Save the isolated dataset as flat text file named a.txt
> d) Repeat a), b) and c) for all letters of english alphabet (a thru z)
> and numerical values (0 thru 9)

This might be a little bit inefficient because the file gets read 36 times.
If the data is already sorted you can use `itertools.groupby()` to get the
groups and write them to several files. Otherwise, if the files can be read
into memory completely, you can sort in memory and then use
`itertools.groupby()`.

Ciao,
	Marc 'BlackJack' Rintsch
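
A rough sketch of the sort-then-`itertools.groupby()` idea, in case it helps. It assumes the whole file fits in memory, that every record starts with the `//-+` marker from the sample, and that upper and lower case should land in the same file. The names `group_key`, `split_records` and `write_groups` are made up for this example, not anything from the original post.

```python
from itertools import groupby
import string

PREFIX = "//-+"  # record marker taken from the sample data
WANTED = set(string.ascii_lowercase + string.digits)  # a-z and 0-9


def group_key(record):
    """Return the lowercased character after the prefix, or "" if there is none."""
    body = record[len(PREFIX):] if record.startswith(PREFIX) else record
    return body[:1].lower()


def split_records(records):
    """Map each key character to its list of records (in-memory approach)."""
    cleaned = [r.rstrip("\n") for r in records if r.strip()]
    # groupby() only merges *adjacent* items with equal keys, so sort first.
    # The sort is stable, so records keep their original order within a group.
    cleaned.sort(key=group_key)
    return {key: list(group) for key, group in groupby(cleaned, key=group_key)}


def write_groups(groups, suffix=".txt"):
    """Write each a-z / 0-9 group to its own file (a.txt, b.txt, ...)."""
    for key, group in groups.items():
        if key in WANTED:  # assumption: records with other keys are skipped
            with open(key + suffix, "w") as out:
                out.write("\n".join(group) + "\n")
```

With a hypothetical input file `data.txt` this would be used as `with open("data.txt") as f: write_groups(split_records(f))`, which reads the input once instead of 36 times. If the real data is already grouped by first character, the `sort()` call can be dropped.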