I'm planning to use Lucene to index scads of XML files whose data model includes replicated blocks of tags. Translation: a novice question follows.
My files have a common XML pattern (for illustrative purposes): <blocks> <block id="123">some text 1</block> <block id="456">some text 2</block> <block id="789">some text 3</block> </blocks> Each block has a unique id, but the tagname is identical. The actual data model has nested tags within these blocks - ie: metadata with the same tagnames within each block. So, in the real data model, there are multiple identical tagnames that are associated with a specific parent. Something more like this: <blocks> <block id="123"> <author>Joe Blow</author> <job>hack</job> </block> <block id="456"> <author>Jane Doe</author> <job>President</job> </block> </blocks> In latter case, I need to be able to search by author or job, for example, and get the tag's text contents as well as the parent block id. Adding a field name of "block" or "author" or "job" multiple times to the same Lucene Document, according to the Lucene javadoc, has the effect of appending the text for search purposes. I take that to mean, in order to use a 'hit' I would need to somehow uniquely identify the field from which the content came even though the content was appended for search purposes. If I searched an 'author' field name and got a hit, I would not be able to disambiguate which block id the actual hit belonged to. Or if I searched on "job", how would I know a hit belonged to block id 456 instead of block id 123 parent? What is the Lucene approach for indexing a single document that has the same field name appearing in multiple places and then using the hit to find the exact association of block id in the above example? Hope this question makes sense. I'm sure I'm missing something obvious/simple in how the API would work in this case. Thanks, Landon Cox -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>