Re: [Ilugc] using dictreader in python

Siva Subramanian Tue, 10 Nov 2009 02:13:49 -0800

Hi Steve,

Thanks a lot. It works like a breeze when the files are small (< 100 MB).

However, on larger files it takes a jolly good 45 mins on a desktop with 2 GB
of memory and a 3.2 GHz P4.

The code I am using is given below:

import csv
readerR25 = csv.reader(open('report_2_5.csv', "rb"))
cids = [line.strip() for line in open('prov_30.csv')]

# Running totals; these must exist before the += lines below.
a = b = c = d = e = 0.0
rownum = 0

for row in readerR25:
    # Save header row.
    if rownum == 0:
        header = row
        print header
    else:
        if row and (row[0].strip() in cids):
            colnum = 0
            a += float(row[4].strip())
            b += float(row[7].strip())
            c += float(row[10].strip())
            d += float(row[13].strip())
            e += float(row[16].strip())
            ...

    rownum += 1
<< print / write to file >>

Is there some way I can optimize it to process in, say, 5 mins? How do I
optimize this code further?
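
One likely cost in the loop above is that `cids` is a list, so every `in cids` test scans it from the start for each row of the report. A minimal sketch of the same loop with a set lookup follows. It is written in Python 3 syntax rather than the Python 2 used above, and the function name `sum_columns` and its parameters are made up for illustration; the default column numbers are the ones from the code above. Whether this alone gets the run down to 5 mins is not something the sketch can promise.

```python
import csv


def sum_columns(report_path, ids_path, columns=(4, 7, 10, 13, 16)):
    """Sum the given columns over rows whose first field is in the ID file.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    # A set gives constant-time membership tests on average; a list is
    # scanned element by element for every row of the report.
    with open(ids_path) as ids_file:
        wanted = {line.strip() for line in ids_file if line.strip()}

    totals = [0.0] * len(columns)
    with open(report_path, newline="") as report_file:
        reader = csv.reader(report_file)
        header = next(reader, None)  # consume the header row once, up front
        for row in reader:
            if row and row[0].strip() in wanted:
                for i, col in enumerate(columns):
                    # float() ignores surrounding whitespace, so no strip() needed
                    totals[i] += float(row[col])
    return header, totals
```

It could be called as `header, totals = sum_columns('report_2_5.csv', 'prov_30.csv')`, after which `totals` holds the five sums in column order.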

I looked on the net and found the following link, which talks about
csv.field_size_limit, but even setting it did not help.

http://lethain.com/entry/2009/jan/22/handling-very-large-csv-and-xml-files-in-python/
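
For context, `csv.field_size_limit` only sets the largest single field the parser will accept. As far as I can tell it is not a speed setting, which would fit with it not helping here. A small sketch of what it does, in Python 3 syntax; `parses_ok` is a made-up helper name for illustration:

```python
import csv
import io


def parses_ok(text, limit):
    """Return True if `text` parses as CSV under the given field size limit."""
    old_limit = csv.field_size_limit(limit)  # returns the previous limit
    try:
        list(csv.reader(io.StringIO(text)))
        return True
    except csv.Error:
        # Raised when a single field is longer than the limit.
        return False
    finally:
        csv.field_size_limit(old_limit)  # the limit is global, so restore it
```

Here a 50-character field is rejected under a limit of 10 and accepted under a limit of 100; rows that already parse fine are read the same way in both cases.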

As an aside, Steve, I really appreciate the detailed email you sent earlier.
It has really kick-started me into Python. :)

Thanks in advance,
Siva

On Wed, Nov 4, 2009 at 4:19 PM, steve <[email protected]> wrote:

> Hi Siva,
>
> A few comments before going further:
> a. You should have continued in the same thread as before (changing the
> subject line, if necessary), nobody would've minded. People not interested
> would've ignored the thread.
>
> b. If you did want to start a new thread (since this one is not really
> about gawk), you should repeat the context (ie: restate the original
> problem). By not doing so, you've confused the people who weren't interested
> in gawk but are interested in python.
>
> c. I noticed you asked the same question on python-list. Nothing wrong with
> that per se. However, you are unlikely to get many (relevant)
> responses there because the people there who are not on ILUGC and have not
> read the gawk thread have no context information! (*)
>
> Anyways, that said, here is something that might help. Assuming the file
> you pasted in the last thread:
> [st...@laptop ~]$ cat Report_2_5
> C_ID, ID_NO, stat1, vol2, amount3
> 2134, Ins1, 10000, 20000, 10
> 2112, Ins3, 30000, 20000, 10
> 2121, Ins3, 30000, 20000, 10
> 2145, Ins2, 15000, 10000, 5
> 2245, Ins2, 15000, 10000, 5
> 0987, Ins1, 10000, 20000, 10
> [st...@laptop ~]$ cat cids
> Ins1
> Ins3
>
> Here are some basic operations using the csv module for you to try out:
> >>> import csv
> >>> r25 = csv.reader(open('Report_2_5'))
> >>> cids = [ line.strip() for line in open('cids') ]
> >>> cids
> ['Ins1', 'Ins3']
> >>> for row in r25:
> ...     if row and (row[1].strip() in cids):
> ...             print row
> ...
>
> < snip > ... </snip>
