Diez B. Roggisch wrote:

> > the xml.dom.minidom object is too slow when parsing such a big XML
> > file to a DOM object. while pulldom should spend quite a long time
> > going through the whole database file. How to enhance the searching
> > speed? Are there existing solutions or algorithms? Thank you for
> > your suggestion...
>
> I've told you that before, and I tell you again: RDBMS is the way to
> go.
We've lost some context from the original post that may be relevant
here, but if populating what the original questioner calls "the
database" is an infrequent operation, then an RDBMS probably is the way
to go, in general. On the other hand, if a lot of parsing has to happen
in order to perform a search, such parsing would probably incur a lot
of overhead from SQL inserts that wouldn't be particularly desirable.

> There might be XML-parsers that work faster - I suppose cElementTree
> can gain you some speed - but ultimately the problems are inherent in
> the representation as DOM: no type-information, no indices, no
> nothing. Just a huge pile of nodes in memory.

Well, I would hope that W3C DOM operations like getElementById would be
supported by some index in the implementation: that would make some of
the searches mentioned by the questioner fairly rapid, given enough
memory.

> So all searches are linear in the number of nodes. Of course you
> might be able to create indices yourself, even devise a clever scheme
> to make using them as declarative as possible. But that would in the
> end mean nothing but re-creating RDBMS technology - why do that, if
> it's already there?

I agree that careful usage of RDBMS technology would solve the general
problems of searching large amounts of data, but the stated queries
should involve indexes and be fairly quick.

Paul

--
http://mail.python.org/mailman/listinfo/python-list
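P.S. To illustrate the streaming alternative to building a full DOM:
something along these lines with ElementTree's iterparse should keep
memory flat while searching. The tag and attribute names ("record",
"id") are hypothetical stand-ins, since we don't know the original
poster's actual schema.

```python
# Sketch: stream through a large XML file looking for one element,
# without ever holding the whole tree in memory.  "record" and "id"
# are assumed names, not the poster's real ones.
import xml.etree.ElementTree as ET

def find_record(path, wanted_id):
    for event, elem in ET.iterparse(path):
        if elem.tag == "record" and elem.get("id") == wanted_id:
            return elem
        elem.clear()  # discard content of elements we've finished with
    return None
```

The search is still linear, as Diez points out, but it avoids the
up-front cost and memory footprint of minidom.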
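P.P.S. The "create indices yourself" idea Diez mentions can be quite
small in practice: parse once, build a dict keyed on an attribute, and
subsequent lookups are O(1), much like what one would hope
getElementById does internally. Again, "id" is an assumed attribute
name.

```python
# Sketch: one linear pass to build an in-memory index, then constant-
# time lookups afterwards.  Elements are kept alive, so this trades
# memory for speed.
import xml.etree.ElementTree as ET

def build_index(path, key="id"):
    index = {}
    for event, elem in ET.iterparse(path):
        if elem.get(key) is not None:
            index[elem.get(key)] = elem
    return index
```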
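And for completeness, the RDBMS route Diez recommends needn't be
heavyweight: Python's sqlite3 module would do. This is only a sketch
under the same assumed "record"/"id" schema; declaring the lookup
column PRIMARY KEY gives the index that makes the stated queries fast.

```python
# Sketch: stream the XML once into SQLite, then let the database's
# index do the searching.  Table and column names are illustrative.
import sqlite3
import xml.etree.ElementTree as ET

def load_into_sqlite(xml_path, db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, body TEXT)")
    rows = ((e.get("id"), e.text)
            for _, e in ET.iterparse(xml_path)
            if e.tag == "record")
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

After loading, a search is just `SELECT body FROM records WHERE id = ?`,
which uses the primary-key index rather than a linear scan.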