On Thu, May 5, 2011 at 2:12 PM, Miki Tebeka <miki.teb...@gmail.com> wrote:
> Greetings,
>
> I'm reading some data from avro file using the avro library. It takes about
> a minute to load 33K objects from the file. This seem very slow to me,
> specially with the Java version reading the same file in about 1sec.
>

You might want to try an Apache mailing list, like one of those at http://avro.apache.org/mailing_lists.html , as I suspect most Python people use Python's native pickle support instead.

It looks like the Python version of Avro is doing single-byte-at-a-time I/O for some types, which is almost guaranteed to perform poorly. If you're decoding an 8-byte integer, it's much faster to at least read 8 bytes and then chop that up, and better still is to read a buffer at a time and chop that up too.

Even in C, the performance of byte-at-a-time I/O is not going to be stellar, especially if you use read() rather than fread(), since each read() is a system call while fread() goes through a user-space buffer.

A related note: Python is often more about programmer efficiency than machine efficiency. With the cost per MIPS going down and the price of programmer time going up, that seems like a good trade.
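To make the byte-at-a-time point concrete, here is a rough Python 3 sketch of the three approaches. It is not the Avro library's code: the function names are made up for illustration, and it assumes fixed-width little-endian 8-byte signed integers, which is simpler than what Avro actually writes on the wire. It only shows the difference in I/O pattern.

```python
import io
import struct


def read_longs_bytewise(stream, count):
    """Decode `count` little-endian 8-byte signed ints, one read() per byte."""
    values = []
    for _ in range(count):
        n = 0
        for shift in range(0, 64, 8):
            n |= stream.read(1)[0] << shift
        # Reinterpret the unsigned 64-bit value as signed.
        if n >= 1 << 63:
            n -= 1 << 64
        values.append(n)
    return values


def read_longs_chunked(stream, count):
    """Same result, but one read() per 8-byte integer."""
    return [struct.unpack("<q", stream.read(8))[0] for _ in range(count)]


def read_longs_buffered(stream, count):
    """Same result, but a single read() for the whole run, then chop it up."""
    data = stream.read(8 * count)
    return list(struct.unpack("<%dq" % count, data))


if __name__ == "__main__":
    import timeit

    sample = list(range(-16500, 16500))  # 33K values, as in the original post
    raw = struct.pack("<%dq" % len(sample), *sample)
    for fn in (read_longs_bytewise, read_longs_chunked, read_longs_buffered):
        elapsed = timeit.timeit(
            lambda: fn(io.BytesIO(raw), len(sample)), number=3
        )
        print("%-22s %.3fs" % (fn.__name__, elapsed))
```

Running it as a script times the three variants on an in-memory stream. I would expect the bytewise version to be the slowest by a wide margin, but the exact numbers depend on your machine and Python version, so measure rather than take my word for it.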