In article <0044bfd0-f07f-4f7b-b976-5df034b6f...@googlegroups.com>, Harsh Jha <harshjha2...@gmail.com> wrote:
> I've a huge csv file and I want to read stuff from it again and again. Is it
> useful to pickle it and keep and then unpickle it whenever I need to use that
> data? Is it faster that accessing that file simply by opening it again and
> again? Please explain, why?
>
> Thank you.

It can be. I did a project a bunch of years ago which involved reading (and parsing) SNMP MIBs before you could do any work. Startup took something like 10-20 seconds. If I pre-parsed the MIBs and wrote out the data structures as pickles, I could cut startup time to a couple of seconds.

But that's because the parsing I was doing was pretty complicated. Parsing a CSV file is much simpler, so I wouldn't expect you to see much improvement reading a pickle file vs. reading the original CSV.

The bottom line is: you should try it. Pickling a data structure is about one line of code (not counting the 'import cPickle'). Try it and see what happens. Time how long it takes to read the original file, and how long it takes to read the pickle. Let us know your results.

Also, let us know what "huge" means. 1000 rows? A million? 100 million?
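A minimal sketch of that experiment, for anyone who wants to try it. This uses Python 3, where cPickle was folded into the standard pickle module, and generates its own small sample CSV as a stand-in for your "huge" file; the file names and row counts are just placeholders:

```python
import csv
import os
import pickle
import tempfile
import time

# Build a small sample CSV (stand-in for the real "huge" file).
rows = [[str(i), "name%d" % i, str(i * 2)] for i in range(10000)]
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Time reading and parsing the CSV.
t0 = time.perf_counter()
with open(csv_path, newline="") as f:
    parsed = list(csv.reader(f))
csv_time = time.perf_counter() - t0

# Pickling the parsed structure really is about one line of code.
pkl_path = csv_path + ".pkl"
with open(pkl_path, "wb") as f:
    pickle.dump(parsed, f, protocol=pickle.HIGHEST_PROTOCOL)

# Time loading the same data back from the pickle.
t0 = time.perf_counter()
with open(pkl_path, "rb") as f:
    from_pickle = pickle.load(f)
pkl_time = time.perf_counter() - t0

print("CSV read:    %.4fs" % csv_time)
print("Pickle load: %.4fs" % pkl_time)
```

Run it against your actual file (skip the sample-generation step) and compare the two numbers; the answer will depend on how big the file is and how much parsing you do beyond the raw CSV read.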