> But it also seems that the parallel/not parallel decision is
> something you control on the back end, so I'm not sure the user
> is involved in the merge question at all. In other words, you could
> easily split the indexing task up amongst several machines and/or
> processes and combine all the results after all the sub-indexes
> were built, thus making your question basically irrelevant.
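For reference, the combine step described above can be sketched with plain Lucene API calls. This is only a sketch: the directory paths and the choice of KeywordAnalyzer are illustrative assumptions, not part of this thread.

```java
// Sketch: merging independently built sub-indexes into one target index.
// Paths and the KeywordAnalyzer choice are illustrative assumptions.
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeSubIndexes {
    public static void main(String[] args) throws Exception {
        Directory target = FSDirectory.getDirectory("/index/merged");
        Directory[] parts = {
            FSDirectory.getDirectory("/index/part1"),
            FSDirectory.getDirectory("/index/part2")
        };
        // The analyzer passed here plays no role in the merge itself:
        // addIndexes() copies already-tokenized postings, it does not
        // re-analyze any text. That is why the analyzer hardcoded in
        // IndexMergeTool is effectively irrelevant for merging.
        IndexWriter writer = new IndexWriter(target, new KeywordAnalyzer(), true);
        writer.addIndexes(parts);
        writer.optimize();
        writer.close();
    }
}
```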
I'm just writing the tool. The customer (IT staff, not end users) uses it. Now I have to find the best strategy that allows fast indexing and searching.

> But you still haven't explained what the user is getting from all
> this flexibility.

I'm not asking about flexibility, only about indexing performance. I know that I have to index in parallel (the initial data is about 150 million pages, more than 1 TByte). Now I'm thinking about what I have to take into consideration when building the merge tool. When I saw the code of IndexMergeTool, I was astonished by the hardcoded use of SimpleAnalyzer, and I wondered why an analyzer is used at all at merge time. Hence my questions.

> I have a hard time understanding the use-case
> you're trying to support. If you're trying to build a generic
> front-end to allow parameterized Lucene index building, have you
> looked at SOLR, which uses XML configuration files to drive the
> indexing and searching process? (which I haven't used, but I'm
> tracking the user's group list.....)

No, the frontend just provides a textbox where the user can type a query like "foo*". The query is then executed against some un_tokenized fields (fixed parameters for the current user, like company etc.) and against this single tokenized field. I want to filter on these fixed fields too, to reduce the number of hits for the fulltext query. I think this is the right approach to achieve better search performance. Isn't it?

Thank you!
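The query shape described here (required clauses on fixed un_tokenized fields plus one prefix clause on the tokenized fulltext field) might look roughly like the sketch below. The field names ("company", "content") are assumptions for illustration only:

```java
// Sketch: required clauses on the fixed fields narrow the candidate set
// before the more expensive prefix/wildcard clause is evaluated.
// Field names ("company", "content") are illustrative assumptions.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TermQuery;

public class BuildQuery {
    public static BooleanQuery build(String company, String userPrefix) {
        BooleanQuery q = new BooleanQuery();
        // Exact match on an un_tokenized field: use a TermQuery directly
        // rather than running the value through an analyzer/query parser.
        q.add(new TermQuery(new Term("company", company)),
              BooleanClause.Occur.MUST);
        // A trailing-wildcard input like "foo*" maps to a PrefixQuery
        // on the tokenized fulltext field.
        q.add(new PrefixQuery(new Term("content", userPrefix)),
              BooleanClause.Occur.MUST);
        return q;
    }
}
```

If the fixed-field constraints repeat across many searches, wrapping them in a Filter (e.g. a CachingWrapperFilter around a QueryFilter, in Lucene of that era) lets the restriction be computed once and reused, which is typically where the performance win from this kind of filtering comes from.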