Doug wrote:
> I'm having trouble getting a clear picture of your indexing scheme.

I've been doing a lot of thinking about this same problem, so I may be a little more in tune with what Elliot's saying. By the way, Elliot, I'm very interested in your results. I considered the basic approach you're using, but I thought it was a bit extreme in terms of having zillions of tiny Lucene Documents. I'm working on a quick kludge that may serve my immediate purposes (if it does, I'm planning to post the details here).

> Could you provide some simple examples, e.g., for the xml:
>   <tag1>this is some text
>     <tag2>and some other text</tag2>
>   </tag1>
> would you have something like the following?
>   doc1
>     node_type: tag1
>     contents: this is some text
>   doc2
>     node_type: tag2
>     contents: and some other text
>   doc3
>     node_type: all_contents
>     contents: this is some text and some other text

I think that's exactly what Elliot is intending.

> My first instinct would be to have something like:
>   doc1
>     tag1: this is some text
>     tag2: and some other text
>     all-tags: this is some text and some other text
> What do you need that that does not achieve?

Name collision: you can have multiple elements with the same name at different levels, and you may have attributes and tags with the same name. Obviously one way around this is "Don't do that", but that could get really tiresome, quickly.

If you just conflate the elements and attributes under the same name (i.e. field "blah" contains a concatenated set of values from all occurrences of both elements and attributes), then your searches become much more limited in what you can specify. This is, by the way, the approach I'm trying out, with a second stage to refine the results and drop out false positives. But I'll have to wait on saying any more about that.

All of this, of course, is in the context of having arbitrary XML documents. If you have predefined XML schemas, then you can hand-code the mappings from elements to Lucene document fields. But then you trade a heck of a lot of flexibility for a lot of maintenance.

Steven J. Owens
[EMAIL PROTECTED]
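
P.S. To make the two mappings in this thread concrete, here is a rough sketch in Python rather than against Lucene itself. Plain dicts stand in for Lucene Documents, the function names (own_text, per_node_docs, flattened_doc) are made up for illustration, and I've added a tag2 attribute to Doug's sample XML purely to force a name collision. It isn't anyone's actual indexing code, just the shape of the problem.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Doug's sample, plus a hypothetical tag2="..." attribute on <tag1>
# (an assumption added here to demonstrate the collision).
XML = '<tag1 tag2="attr value">this is some text<tag2>and some other text</tag2></tag1>'


def own_text(elem):
    """Text directly inside elem, excluding text inside child elements."""
    parts = [elem.text or ""] + [child.tail or "" for child in elem]
    return " ".join(p.strip() for p in parts if p.strip())


def per_node_docs(root):
    """One small 'document' per element, plus one holding all the text."""
    docs = [{"node_type": e.tag, "contents": own_text(e)} for e in root.iter()]
    all_text = " ".join(t.strip() for t in root.itertext() if t.strip())
    docs.append({"node_type": "all_contents", "contents": all_text})
    return docs


def flattened_doc(root):
    """One 'document' per XML file; each element or attribute name is a field.

    Elements and attributes that share a name land in the same field.
    """
    fields = defaultdict(list)
    for elem in root.iter():
        text = own_text(elem)
        if text:
            fields[elem.tag].append(text)
        for name, value in elem.attrib.items():
            fields[name].append(value)
    return {name: " ".join(values) for name, values in fields.items()}


root = ET.fromstring(XML)
for doc in per_node_docs(root):
    print(doc)
print(flattened_doc(root))
```

In the flattened version the tag2 field ends up holding both the attribute value and the element text, so a search on tag2 can no longer tell the two apart. That is the false-positive case a second refinement stage would have to weed out.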