On Tue, Nov 3, 2009 at 7:43 AM, Siva Subramanian <elpost...@rediffmail.com> wrote: > Hello all, > > I am new on this list and computer programming > > I have two distinct statistical files (both csv) > > 1. Report_2_5 – this is a report dump containing over a 10 million > records and is different every day > > 2. Customer_id dump – this is a daily dump of customers who have made > payments. This is generally a million record > > I need to extract past history depending on customers who make regular > payments > > For example, > > Report_2_5 > > Customer ID, Plan_NO, stat1, vol2, amount3 > 2134, Ins1, 10000, 20000, 10 > 2112, Ins3, 30000, 20000, 10 > 2121, Ins3, 30000, 20000, 10 > 2145, Ins2, 15000, 10000, 5 > 2245, Ins2, 15000, 10000, 5 > 0987, Ins1, 10000, 20000, 10 > > 4546, Ins1, 10020, 21000, 10 > > 6757, Ins1, 10200, 22000, 10 > … > > customer_id dump > > > 0987 > > 4546 > > 6757 > > 2134 > > I need to process the Report_2_5 and extract the following output > > Stat1: 40220 > Vol2 : 83000 > Amount3 : 40 > > I am new to programming and I have been extracting this data using MS – > Access and I badly need a better solution.
Have you considered using a proper SQL database? (See http://en.wikipedia.org/wiki/SQL ; MySQL is one example: http://en.wikipedia.org/wiki/MySQL) Mucking around with CSV files like this is basically doing the work of some simple SQL queries, only in an ad-hoc, inefficient manner. (MS Access is essentially a non-industrial-strength SQL for non-programmers.) Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list