In article <mailman.285.1293297695.6505.python-l...@python.org>, Adam Tauno Williams <awill...@whitemice.org> wrote:
> XML works extremely well for large datasets.

Barf. I'll agree that there are some nice points to XML. It is portable. It is (to a certain extent) human readable, and in a pinch you can use standard text tools to do ad-hoc queries (e.g. grep for a particular entry). And, yes, there are plenty of toolsets for dealing with XML files.

On the other hand, the verbosity is unbelievable. I'm currently working with a data feed we get from a supplier in XML. Every day we get incremental updates of about 10-50 MB each. The total data set at this point is 61 GB. It's got stuff like this in it:

        <Parental-Advisory>FALSE</Parental-Advisory>

That's 54 bytes to store a single bit of information. I'm all for human-readable formats, but bloating the data by a factor of 432 is rather excessive. Of course, that's an extreme example. A more efficient example would be:

        <Id>1173722</Id>

which is 26 bytes to store an integer. That's only a bloat factor of 6-1/2.

Of course, one advantage of XML is that with so much redundant text, it compresses well. We typically see gzip compression ratios of 20:1. But that just means you can archive them efficiently; you can't do anything useful until you unzip them.
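
For anyone who wants to check the arithmetic, here is a rough sketch in Python. Several things in it are assumptions rather than facts from the feed: the bare elements are 44 and 16 bytes, so the 54 and 26 quoted above presumably include about 10 bytes of indentation and line ending per line (modelled by the hypothetical `padding` argument); the integer is taken to be 32 bits; and the `<Record>` wrapper and the synthetic data used for the gzip test are made up purely for illustration.

```python
import gzip


def bloat_factor(fragment: str, payload_bits: int, padding: int = 0) -> float:
    """Encoded size in bits divided by the bits of information carried.

    `padding` is an assumed per-line overhead (indentation plus line
    ending) in bytes; it is not something known about the real feed.
    """
    encoded_bits = (len(fragment.encode("utf-8")) + padding) * 8
    return encoded_bits / payload_bits


flag = "<Parental-Advisory>FALSE</Parental-Advisory>"
ident = "<Id>1173722</Id>"

flag_bytes = len(flag.encode("utf-8"))   # bare element, no whitespace
id_bytes = len(ident.encode("utf-8"))    # bare element, no whitespace

print(flag_bytes, bloat_factor(flag, 1, padding=10))    # a boolean is 1 bit
print(id_bytes, bloat_factor(ident, 32, padding=10))    # assuming a 32-bit int

# Synthetic, highly repetitive records (hypothetical layout) to show why
# this kind of markup compresses so well.
records = "".join(
    f"<Record><Id>{i}</Id>"
    f"<Parental-Advisory>FALSE</Parental-Advisory></Record>\n"
    for i in range(1173722, 1173722 + 10_000)
).encode("utf-8")

compressed = gzip.compress(records)
ratio = len(records) / len(compressed)
print(f"gzip ratio on synthetic records: {ratio:.1f}:1")
```

The ratio printed for the synthetic records says nothing about the real feed; it only illustrates that repeated tag names are cheap for gzip to encode.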