Tim Harig, 26.12.2010 10:22:
On 2010-12-26, Stefan Behnel wrote:
Tim Harig, 26.12.2010 02:05:
On 2010-12-25, Nobody wrote:
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
Of course, one advantage of XML is that with so much redundant text, it
compresses well.  We typically see gzip compression ratios of 20:1.
But, that just means you can archive them efficiently; you can't do
anything useful until you unzip them.

XML is typically processed sequentially, so you don't need to create a
decompressed copy of the file before you start processing it.
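A minimal sketch of that with the standard library (the file name and
tag names are invented for illustration): gzip.open() hands back a
file object that decompresses on the fly, so iterparse() can walk the
document without a decompressed copy ever touching the disk.

    import gzip
    import xml.etree.ElementTree as ET

    # Decompress on the fly while parsing; no decompressed copy is
    # written to disk.  'data.xml.gz', the 'record' tag, and the
    # 'name' field are hypothetical placeholders.
    with gzip.open('data.xml.gz', 'rb') as f:
        for event, elem in ET.iterparse(f, events=('end',)):
            if elem.tag == 'record':
                print(elem.findtext('name'))
            elem.clear()  # discard processed elements to bound memory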

Sometimes XML is processed sequentially.  When the markup footprint is
large enough, it must be.  Quite often, as in the case of the OP, you only
want to extract a small piece out of the total data.  In those cases, being
forced to read all of the data sequentially is both inconvenient and a
performance penalty unless there is some way to address the data you want
directly.
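To illustrate the penalty: a streaming parser can at least stop as soon
as the wanted element turns up, but since plain XML offers no way to
seek to it directly, the worst case is still a full scan.  A sketch
with invented names:

    import xml.etree.ElementTree as ET

    # Sequential search for one element; returns as soon as it is
    # found, but the worst case still reads the whole file.
    # 'huge.xml', the 'invoice' tag, and the id are hypothetical.
    def find_invoice(path, wanted_id):
        for event, elem in ET.iterparse(path, events=('end',)):
            if elem.tag == 'invoice' and elem.get('id') == wanted_id:
                return elem
            elem.clear()
        return None

    match = find_invoice('huge.xml', '42')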
 [...]
If you do it a lot, you will have to find a way to make the access
efficient for your specific use case. So the file format doesn't matter
either, because the data will most likely end up in a fast database after
reading it in sequentially *once*, just as in the case above.
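A sketch of that one-time sequential load, here into sqlite3 (the file
name, tag, and columns are invented):

    import sqlite3
    import xml.etree.ElementTree as ET

    # One sequential pass: load the XML into an indexed table, after
    # which lookups are index probes instead of file scans.
    # 'data.xml', the 'item' tag, and the columns are hypothetical.
    conn = sqlite3.connect('items.db')
    conn.execute('CREATE TABLE IF NOT EXISTS items'
                 ' (id TEXT PRIMARY KEY, name TEXT)')
    with conn:  # commit on success, roll back on error
        for event, elem in ET.iterparse('data.xml', events=('end',)):
            if elem.tag == 'item':
                conn.execute('INSERT OR REPLACE INTO items VALUES (?, ?)',
                             (elem.get('id'), elem.findtext('name')))
                elem.clear()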

If the data is just going to end up in a database anyway, then why not
send it as a database to begin with and save the trouble of having to
convert it?

I don't think anyone would object to using a native format when copying data 1:1 from one database to another. But if the database formats differ on the two sides, it's a lot easier to map XML-formatted data to a given schema than to map, say, an SQL dump. It's a matter of use cases, not of data size.
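A sketch of what such a mapping tends to look like (all names here are
invented): because the XML side is addressed structurally, retargeting
the same input to another schema means editing a few paths, not
re-parsing another vendor's dump syntax.

    import xml.etree.ElementTree as ET

    # Target column -> source path in the XML; the whole schema
    # mapping lives in this one table.
    FIELD_MAP = {
        'customer_id': './id',
        'full_name':   './name',
        'city':        './address/city',
    }

    def to_row(elem):
        return dict((col, elem.findtext(path))
                    for col, path in FIELD_MAP.items())

    row = to_row(ET.fromstring(
        '<customer><id>7</id><name>Ann</name>'
        '<address><city>Bonn</city></address></customer>'))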

Stefan
