Re: [basex-talk] Navigating a DOM

Rainer Klute Mon, 15 Oct 2012 07:53:35 -0700

On 15.10.2012 15:36, Charles Foster wrote:
>>> May I suggest XOM? [1].
>> Had a look at it but don't see how it could solve my challenges.
> Did you thoroughly investigate XOM?
>
> Your requirement:
> "I have a really large XML file which does not fit into memory, and I
> would like to navigate it as a DOM."
>
> XOM Website (front page):
> "XOM is very memory efficient. If you read an entire document into
> memory, XOM uses as little memory as possible. More importantly, XOM
> allows you to filter documents as they're built so you don't have to
> build the parts of the tree you aren't interested in. For instance,
> you can skip building text nodes that only represent boundary white
> space, if such white space is not significant in your application. You
> can even process a document piece by piece and throw away each piece
> when you're done with it. XOM has been used to process documents that
> are gigabytes in size."
>
> Failing that, you could check out Saxon's TinyTree implementation.


XML documents gigabytes in size? Sounds good! I'll probably get back
to it if BaseX indeed cannot cope with my DOM navigation requirement
and my second-best approach, converting the XML document into an
SQLite database, fails as well.
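
To make the "piece by piece" idea from the XOM description concrete,
here is a minimal sketch. It does not use XOM's API; it swaps in the
SAX parser that ships with the JDK to show the same principle: keep
only one child of the root element in memory, hand it to a callback,
then discard it. The class name PieceByPiece, the PieceConsumer
interface, and the choice of "every direct child of the root is one
piece" are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class PieceByPiece {

    /** Hypothetical callback that receives one piece at a time. */
    interface PieceConsumer {
        void accept(String elementName, String text);
    }

    /**
     * Streams the document and reports each direct child of the root
     * element together with all text found inside it. Returns the number
     * of pieces reported. Only the current piece's text is buffered.
     */
    static int process(InputStream in, final PieceConsumer consumer) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;
            private StringBuilder text = null; // non-null only inside a piece

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                depth++;
                if (depth == 2) {
                    text = new StringBuilder(); // a new piece begins
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (depth == 2) {
                    consumer.accept(qName, text.toString());
                    text = null; // throw the piece away
                    count[0]++;
                }
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML processing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<root><item>a<sub>b</sub></item><item>c</item></root>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int n = process(in, (name, text) -> System.out.println(name + ": " + text));
        System.out.println(n + " pieces processed");
    }
}
```

Memory use here depends on the largest single piece rather than on the
whole document. The price is that there is no random access: a piece
is gone once the callback returns, which is exactly why this does not
replace free DOM-style navigation.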


>>> Also, navigating a DOM tree is quite irksome in comparison to using
>>> XQuery/XPath. Perhaps you could consider slicing up your large document
>>> into "manageable chunks" (e.g. smaller documents) then inserting the
>>> smaller documents into BaseX with a view to then running XQuery to get
>>> specific parts of the logical large document when and as required. This
>>> approach would use less memory and may well be more efficient.
>> Not really, because everything is somewhat deeply nested with very
>> different numbers of nodes in the various subtrees. Partitioning would
>> be at least cumbersome and would have to be done each time a new version
>> of the data comes along.
> I find it difficult to understand how there cannot be
> "something" you can do to break the XML down to something more
> manageable, and perhaps put the sliced XML documents in their own
> collection to signify a complete logical document. Could you perhaps
> give an example?
>
> If BaseX's model for storing XML documents cannot cope with such large
> XML documents, then consider Sedna. As far as I am aware, Sedna is
> actually ideal for storing huge single-file XML documents.

That would be worth a try if Sedna could run on Android. But it is
written in C, not Java, so ...


>>> May I also suggest checking out the BaseX XQJ API [2], where retrieved XML
>>> can be obtained as a Java DOM Node (e.g. Element / Document), StAX
>>> XMLStreamReader and SAX ContentHandler.
>> Yes, I tried XQJ, but I cannot deploy XQJ on Android because it is in
>> the javax.* namespace. Sure, I could repackage interface and
>> implementation, but I'd rather try to avoid it. And I guess using XQJ
>> would still cause BaseX to build up the whole tree in memory.
> That's a shame.

Yes! It can be circumvented, and I am prepared to do so, but this
probably won't help me because BaseX returns large trees and does not
load objects lazily.
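
To illustrate what "load objects lazily" means here, the following is
a minimal sketch of a tree node that fetches its children only on
first access. It is not BaseX or XQJ code. LazyNode and the Supplier
that stands in for "ask the database for this node's children" are
hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class LazyNode {
    private final String name;
    // Hypothetical stand-in for a database lookup of this node's children.
    private final Supplier<List<LazyNode>> loader;
    private List<LazyNode> children; // stays null until first requested

    LazyNode(String name, Supplier<List<LazyNode>> loader) {
        this.name = name;
        this.loader = loader;
    }

    String name() {
        return name;
    }

    /** True once the children have been fetched. */
    boolean isLoaded() {
        return children != null;
    }

    /** Fetches the children on the first call and caches them afterwards. */
    List<LazyNode> children() {
        if (children == null) {
            children = loader.get();
        }
        return children;
    }

    public static void main(String[] args) {
        LazyNode root = new LazyNode("root", () -> {
            List<LazyNode> kids = new ArrayList<>();
            kids.add(new LazyNode("a", ArrayList::new));
            kids.add(new LazyNode("b", ArrayList::new));
            return kids;
        });
        System.out.println("loaded before access: " + root.isLoaded());
        System.out.println("child count: " + root.children().size());
        System.out.println("loaded after access: " + root.isLoaded());
    }
}
```

With nodes like this, navigating into one subtree materializes only
the nodes along the path taken. A real implementation would also need
a way to release nodes that are no longer referenced.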


-- 

Best regards
Rainer Klute


_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk