OK, I think I'm getting a better handle here. I can't imagine how it would work to combine indexes that use *different* analyzers on the *same* field. Regardless of what Lucene did, you simply could NOT explain this to a user. To take a simple example, index part of your data for field1 with KeywordAnalyzer and part with WhitespaceAnalyzer. Now suppose the same data ends up in two different documents, one in each part, i.e. field1 has "some data" in both docs.
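
To make that concrete, here is a rough plain-Java sketch of the situation. It does NOT call Lucene: keywordTokens and whitespaceTokens are hypothetical stand-ins I made up for what KeywordAnalyzer and WhitespaceAnalyzer roughly do to a simple string, and the "index" is just a toy map from token to doc ids, with a query required to match all of its tokens. Treat it as an illustration of the mismatch, not of Lucene's actual merge or search behavior.

```java
import java.util.*;

// Toy illustration only; none of this is Lucene API.
class AnalyzerMismatchDemo {
    // Stand-in for KeywordAnalyzer: the whole field value becomes one token.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Stand-in for WhitespaceAnalyzer: split on runs of whitespace.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    static void addDoc(Map<String, Set<Integer>> index, int docId, List<String> tokens) {
        for (String t : tokens) {
            index.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Merged" index for field1: doc 1 was tokenized on whitespace,
    // doc 2 was indexed as a single keyword. Both hold "some data".
    static Map<String, Set<Integer>> buildIndex() {
        Map<String, Set<Integer>> index = new HashMap<>();
        addDoc(index, 1, whitespaceTokens("some data"));
        addDoc(index, 2, keywordTokens("some data"));
        return index;
    }

    // A doc matches only if it contains every query token.
    static Set<Integer> search(Map<String, Set<Integer>> index, List<String> queryTokens) {
        Set<Integer> result = null;
        for (String t : queryTokens) {
            Set<Integer> hits = index.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex();
        System.out.println(search(index, whitespaceTokens("some")));      // [1]
        System.out.println(search(index, whitespaceTokens("some data"))); // [1]
        System.out.println(search(index, keywordTokens("some data")));    // [2]
    }
}
```

In this toy setup no single query-time analyzer finds both docs, even though both contain exactly the same text.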

Now one doc has the tokens "some" and "data", and the other has the single token "some data". Searching for "some" would return only the first doc. Searching for "some data" would return the second doc if the query-time analyzer keeps the phrase as one token, or the first doc if the query-time analyzer splits it on whitespace. And no matter what analyzer you used, searching for "some" would never return doc 2. Even assuming Lucene allows this, how in the world could you explain what the "correct" results were to a user? Even assuming that Lucene does something reasonable, it'll still be wrong enough of the time to make the system unusable. And we're not even into stopwords, case, different languages, accent folding. Not to mention how in the world you'd be able to explain what query analyzer to use <G>....

That said, you *can* use *different* analyzers on *different* fields in the same index. See PerFieldAnalyzerWrapper. And there's no restriction that all documents have the *same* fields. So as long as all your field names were disjoint, it could work. *But* I have no real idea whether this is supported, so you'll have to experiment if you still think it's a good thing.

But it also seems that the parallel/not parallel decision is something you control on the back end, so I'm not sure the user is involved in the merge question at all. In other words, you could easily split the indexing task up amongst several machines and/or processes and combine all the results after all the sub-indexes were built, thus making your question basically irrelevant.

But you still haven't explained what the user is getting from all this flexibility. I have a hard time understanding the use-case you're trying to support. If you're trying to build a generic front-end to allow parameterized Lucene index building, have you looked at SOLR, which uses XML configuration files to drive the indexing and searching process? (I haven't used it myself, but I'm tracking the users' list.)

Best
Erick

On Jan 14, 2008 10:35 AM, <[EMAIL PROTECTED]> wrote:

> > Then why would you want to combine them?
> >
> > I really think you need to explain what you're trying to accomplish
> > rather than obsess on the details.
>
> I have to create indexes in parallel because the amount of data is very
> high.
> Then I want to merge them into bigger indexes and move them to the search
> server.
>
> (See thread "Max size of index (FSDirectory )" too.)
>
> My question now is, what will happen if one merges indexes which were
> created with different analyzers (the customer can configure the analyzer
> depending on the data which is indexed)?
>
> I think this will produce unpredictable results when searching.
> If so I have to document that only indexes created with the same analyzer
> are allowed to merge.
>
> Thank you