Tim Harig, 26.12.2010 02:05:
On 2010-12-25, Nobody <nob...@nowhere.com> wrote:
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
Of course, one advantage of XML is that with so much redundant text, it
compresses well.  We typically see gzip compression ratios of 20:1.
But that just means you can archive them efficiently; you can't do
anything useful until you unzip them.

XML is typically processed sequentially, so you don't need to create a
decompressed copy of the file before you start processing it.
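For illustration, a minimal sketch of that point (the file name data.xml.gz and the <record>/<id> tags are made up): ElementTree's iterparse() will happily read straight from a gzip stream, so no decompressed copy ever hits the disk.

import gzip
import xml.etree.ElementTree as ET

def handle(record):
    # Placeholder for the real per-record work.
    print(record.findtext("id"))

with gzip.open("data.xml.gz", "rb") as f:
    # iterparse() pulls data from the stream incrementally.
    for event, elem in ET.iterparse(f, events=("end",)):
        if elem.tag == "record":
            handle(elem)
            elem.clear()   # drop processed subtrees to keep memory bounded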

Sometimes XML is processed sequentially.  When the markup footprint is
large enough, it must be.  Quite often, as in the case of the OP, you only
want to extract a small piece out of the total data.  In those cases, being
forced to read all of the data sequentially is both inconvenient and a
performance penalty unless there is some way to address the data you want
directly.
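Even then, a sequential scan can at least stop as soon as it has found what it wants, which softens that penalty somewhat. A sketch, again with made-up element names, not a substitute for real random access:

import xml.etree.ElementTree as ET

def find_record(path, wanted_id):
    # Scan the document in order, but bail out at the first match,
    # so on average only part of the file gets read.
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "record" and elem.get("id") == wanted_id:
            return elem
        elem.clear()
    return None

print(find_record("data.xml", "42"))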

So what? If you only have to do that once, it hardly matters whether you read the whole file or just the part you need. It should make a difference of a couple of minutes at most.

If you do it a lot, you will have to find a way to make the access efficient for your specific use case. So the file format doesn't matter either, because the data will most likely end up in a fast database after reading it in sequentially *once*, just as in the case above.
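A minimal sketch of that one-off conversion (table layout and tag names invented for the example), using the sqlite3 module that ships with Python:

import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("records.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, value TEXT)")

with conn:  # one transaction for the whole sequential pass
    for event, elem in ET.iterparse("data.xml", events=("end",)):
        if elem.tag == "record":
            conn.execute("INSERT OR REPLACE INTO records VALUES (?, ?)",
                         (elem.get("id"), elem.findtext("value")))
            elem.clear()

# From here on, lookups are indexed and the XML is never touched again.
print(conn.execute("SELECT value FROM records WHERE id=?", ("42",)).fetchone())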

I really don't think there are many important use cases where you need fast random access to large data sets and cannot afford to adapt the storage layout beforehand.

Stefan

