Re: [basex-talk] Navigating a DOM

Charles Foster Mon, 15 Oct 2012 06:36:37 -0700

>> May I suggest XOM? [1].
>
> Had a look at it but don't see how it could solve my challenges.


Did you thoroughly investigate XOM?

Your requirement:
"I have a really large XML file which does not fit into memory, and I
would like to navigate it as a DOM."

XOM Website (front page):
"XOM is very memory efficient. If you read an entire document into
memory, XOM uses as little memory as possible. More importantly, XOM
allows you to filter documents as they're built so you don't have to
build the parts of the tree you aren't interested in. For instance,
you can skip building text nodes that only represent boundary white
space, if such white space is not significant in your application. You
can even process a document piece by piece and throw away each piece
when you're done with it. XOM has been used to process documents that
are gigabytes in size."

Failing that, you could check out Saxon's TinyTree implementation.
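
To make the "piece by piece" idea concrete, here is a rough sketch. It
does not use XOM; it swaps in the JDK's own StAX and JAXP classes,
streaming the big file and building a small DOM only for each record
subtree, which can then be navigated and thrown away. The class name
PiecewiseDom, the method forEachRecord and the record element name are
made up for illustration. As far as I know javax.xml.stream is not part
of the Android platform, so there you would need a pull parser instead.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: streams a document and hands each record subtree
// to a callback as a small, self-contained DOM.
final class PiecewiseDom {

    // Returns the number of record elements that were processed.
    static int forEachRecord(Reader xml, String recordName,
                             Consumer<Element> handler) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(xml);
        // Identity transformer: copies the current subtree into a DOM.
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        int count = 0;
        while (reader.hasNext()) {
            boolean atRecord =
                    reader.getEventType() == XMLStreamConstants.START_ELEMENT
                    && recordName.equals(reader.getLocalName());
            if (atRecord) {
                DOMResult piece = new DOMResult();
                // Consumes the whole record subtree from the stream.
                copier.transform(new StAXSource(reader), piece);
                Document doc = (Document) piece.getNode();
                handler.accept(doc.getDocumentElement());
                count++;
                // Do not advance here: the reader may already sit on the
                // next record's start tag, so re-check the current event.
            } else {
                reader.next();
            }
        }
        reader.close();
        return count;
    }
}
```

Only one record is held as a DOM at any time, so memory use depends on
the largest record rather than on the size of the whole file.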

>> Also, navigating a DOM tree is quite irksome in comparison to using
>> XQuery/XPath. Perhaps you could consider slicing up your large document
>> into "manageable chunks" (e.g. smaller documents) then inserting the
>> smaller documents into BaseX with a view to then running XQuery to get
>> specific parts of the logical large document when and as required. This
>> approach would use less memory and may well be more efficient.
>
> Not really, because everything is somewhat deeply nested with very
> different numbers of nodes in the various subtrees. Partitioning would
> be at least cumbersome and would have to be done each time a new version
> of the data comes along.

I find it difficult to understand how there cannot be "something" you
can do to break the XML down into something more manageable, and
perhaps put the sliced XML documents in their own collection to signify
a complete logical document. Could you perhaps give an example?
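
For what it is worth, slicing does not have to be a manual job. Below
is a minimal sketch, assuming the data contains some repeating element
that makes a sensible unit, which writes every such subtree to its own
small file using the JDK's StAX event API. The class name XmlSlicer,
the chunk-N.xml naming scheme and the record element name are all
hypothetical, and the sketch assumes the records do not rely on
namespace declarations made on their ancestors. The resulting files
could then be added to a database as one collection.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: splits a large document into one file per record.
final class XmlSlicer {

    // Writes chunk-1.xml, chunk-2.xml, ... into outDir; returns the count.
    static int slice(Reader xml, String recordName, Path outDir)
            throws XMLStreamException, IOException {
        XMLEventReader events =
                XMLInputFactory.newInstance().createXMLEventReader(xml);
        XMLOutputFactory outputs = XMLOutputFactory.newInstance();
        Writer file = null;           // file of the chunk being written
        XMLEventWriter chunk = null;  // null while outside any record
        int depth = 0;                // element depth inside the record
        int count = 0;
        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();
            if (chunk == null) {
                // Outside a record: skip events until a record starts.
                if (event.isStartElement() && recordName.equals(
                        event.asStartElement().getName().getLocalPart())) {
                    count++;
                    file = Files.newBufferedWriter(
                            outDir.resolve("chunk-" + count + ".xml"),
                            StandardCharsets.UTF_8);
                    chunk = outputs.createXMLEventWriter(file);
                    chunk.add(event);
                    depth = 1;
                }
                continue;
            }
            // Inside a record: copy every event into the current chunk.
            chunk.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement() && --depth == 0) {
                chunk.close();  // flushes, but leaves the file open
                file.close();
                chunk = null;
            }
        }
        events.close();
        return count;
    }
}
```

Because the split is driven by an element name rather than by position,
it can simply be rerun whenever a new version of the data arrives.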

If BaseX's model for storing XML documents cannot cope with such large
XML documents, then consider Sedna. As far as I am aware, Sedna is
actually ideal for storing huge single-file XML documents.

>> May I also suggest checking out the BaseX XQJ API [2], where retrieved XML
>> can be obtained as a Java DOM Node (e.g. Element / Document), StAX
>> XMLStreamReader and SAX ContentHandler.
>
> Yes, I tried XQJ, but I cannot deploy XQJ on Android because it is in
> the javax.* namespace. Sure, I could repackage interface and
> implementation, but I'd rather try to avoid it. And I guess using XQJ
> would still cause BaseX to build up the whole tree in memory.

That's a shame.

Regards,

Charles
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
