On Sep 2, 7:06 pm, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote:
> On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:
> > I have a bunch of files consisting of moviereviews.
> >
> > For each file I construct a list of reviews and then for each new file I
> > merge the reviews so that in the end have a list of reviewers and for
> > each reviewer all their reviews.
> >
> > What is the fastest way to do this?
>
> Use the timeit module to find out.
>
> > 1. Create one file with reviews, open next file an for each review see
> > if the reviewer exists, then add the review else create new reviewer.
> >
> > 2. create all the separate files with reviews then mergesort them?
>
> The answer will depend on whether you have three reviews or three
> million, whether each review is twenty words or twenty thousand words,
> and whether you have to do the merging once only or over and over again.
>
> --
> Steven
I merge only once. Each review has three fields: date, rating, and customer ID. In total I'll be parsing between 10K and 100K reviews, and eventually 450K.
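For reference, here is a rough sketch of option 1 using a dict keyed by customer ID, so the "does this reviewer exist?" check is a hash lookup rather than a list scan. The comma-separated "date,rating,customerid" line layout, the function name merge_reviews, and the sample data are all assumptions made for illustration; the real files may be laid out differently.

```python
from collections import defaultdict

def merge_reviews(files):
    """Group reviews by customer ID in a single pass over all files.

    `files` is any iterable of iterables of lines (open file objects work).
    Assumes each line looks like "date,rating,customerid" (hypothetical format).
    """
    by_reviewer = defaultdict(list)
    for lines in files:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            date, rating, customerid = line.split(",")
            # New reviewers get an empty list automatically
            by_reviewer[customerid].append((date, int(rating)))
    return by_reviewer

# Made-up in-memory stand-ins for two review files
file_a = ["2008-01-05,3,c1", "2008-02-11,5,c2"]
file_b = ["2008-03-20,4,c1"]
merged = merge_reviews([file_a, file_b])
```

With a one-time merge of a few hundred thousand three-field reviews, everything should fit in memory, so this may well be simpler than writing separate files and merge-sorting them. Timing both with timeit on the real data is still the way to be sure.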