On 2013-03-24 08:57, rusi wrote:
> On Mar 24, 6:49 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
>> After doing:
>>
>>   >>> import csv
>>   >>> original = file('friends.csv', 'rU')
>>   >>> reader = csv.reader(original, delimiter='\t')
>
> Stripping of the first line is:
>   >>> list(reader)[1:]
>   >>> [tuple(row) for row in list(reader)[1:]]
>   >>> map(tuple, list(reader)[1:])
This works for small sources, but slurps all the data into memory.
Because csv.reader is an iterator/generator, it can process huge CSV
files that wouldn't otherwise fit in memory.  Calling r.next() (or
"next(r)" in newer versions) fetches one record from the generator,
to be discarded/stored as appropriate.

> Then you can of course make your code more performant thus:
>   >>> reader.next()
>   >>> (tuple(row) for row in reader)
>
> In the majority of cases this optimization is not worth it

If the CSV file is large, using the iterator version is usually worth
the small performance penalty, as you don't have to keep the whole
file in memory.  As somebody who regularly deals with 0.5-1GB CSV
files from cellular providers, I speak from experience of having my
machine choke when reading the whole thing in.

> In any case, strewing prints all over the code is a bad habit
> (except for debugging).

Sorry if my print-statements were misinterpreted--I meant them as a
"do what you want with the data here" stand-in (thus the ellipsis).

-tkc

--
http://mail.python.org/mailman/listinfo/python-list
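For the record, a minimal sketch of the streaming approach discussed above: skip the header with next() and yield one record at a time, so the whole file never sits in memory.  The filename, the tab delimiter, and the helper name iter_records are assumptions for illustration (and this uses Python 3's open() rather than the old file()):

```python
import csv

def iter_records(path):
    """Yield one tuple per data row of a tab-delimited CSV, skipping the header."""
    # Hypothetical helper; path/delimiter choices mirror the thread's example.
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        next(reader, None)        # discard the header row
        for row in reader:        # one record at a time, never the whole file
            yield tuple(row)

# Usage: process each record as it is read, instead of list(reader)[1:]:
#   for record in iter_records('friends.csv'):
#       ...  # do what you want with the data here
```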