On Mon, Jul 12, 2010 at 12:59 PM, Alex Ott <alex...@gmail.com> wrote:
>
> Maybe it is worth separating the metadata of top-level objects from the
> metadata of embedded objects? And allowing traversal through the hierarchy
> of embedded objects? And providing several implementations, something like:
> a collector of metadata for all embedded objects, or a collector of only
> top-level metadata, etc.
>

As long as the final solution can also handle containers with thousands or
millions of documents, each document having its own set of metadata. In other
words, I need a streaming way of accessing the metadata.

To me, that means one of two things:

1. There is some place in the SAX API where I can get the metadata for the
   current subdocument (attributes on the DIV element?), or
2. There is a point in time while using the SAX API where I can query a
   metadata object and know it contains only the metadata for the current
   subdocument (get the current metadata object from the parse context at the
   time startElement is called for a new DIV?).

Paul
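
For illustration, the "attributes on the DIV element" idea might look roughly like the sketch below. It uses only the plain JDK SAX classes, not any particular parser library's API. The `class="package-entry"` marker, the `name` and `size` attributes, and the class name `SubdocumentMetadataSketch` are all assumptions made up for this example. The point is only that the handler sees each subdocument's metadata when its DIV starts and can hand it off immediately, so nothing accumulates across millions of entries.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SubdocumentMetadataSketch {

    /** Passes each subdocument's DIV attributes to a callback as soon as the DIV starts. */
    static class SubdocumentHandler extends DefaultHandler {
        private final Consumer<Map<String, String>> sink;

        SubdocumentHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Assumption: every embedded document is wrapped in <div class="package-entry" ...>.
            if (!"div".equals(qName) || !"package-entry".equals(atts.getValue("class"))) {
                return;
            }
            // Copy the attributes: the Attributes object is only valid during this callback.
            Map<String, String> metadata = new LinkedHashMap<>();
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getQName(i);
                if (!"class".equals(name)) {
                    metadata.put(name, atts.getValue(i));
                }
            }
            // Hand the metadata off right away; the handler keeps no reference to it.
            sink.accept(metadata);
        }
    }

    /** Parses the given XHTML text and streams per-subdocument metadata to the sink. */
    static void parse(String xhtml, Consumer<Map<String, String>> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new SubdocumentHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical parser output for a container holding two documents.
        String xhtml = "<body>"
                + "<div class=\"package-entry\" name=\"a.txt\" size=\"12\"><p>first</p></div>"
                + "<div class=\"package-entry\" name=\"b.txt\" size=\"34\"><p>second</p></div>"
                + "</body>";
        parse(xhtml, metadata -> System.out.println(metadata));
    }
}
```

The second option (querying a metadata object from the parse context inside startElement) would have the same shape on the consumer side. Only the source of the map would change, and that part depends on an API that does not exist yet, so it is not sketched here.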