> But it also seems that the parallel/not parallel decision is
> something you control on the back end, so I'm not sure the user
> is involved in the merge question at all. In other words, you could
> easily split the indexing task up amongst several machines and/or
> processes and combine all the results after all the sub-indexes
> were built, thus making your question basically irrelevant.

I'm just writing the tool. The customer (IT staff, not end user) uses it.
Now I have to find the best strategy that allows fast indexing and
searching.
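
Just to make the scenario concrete, each indexing worker would do roughly
this (a sketch only, assuming the Lucene 2.x-era API; the directory layout
and the field names "company" and "body" are made up for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class SubIndexWorker {
    // Each machine/process writes its own independent sub-index
    // (e.g. /idx/part-0, /idx/part-1, ...), so no coordination is
    // needed until the final merge.
    public static void indexPages(String indexPath, Iterable<String[]> pages)
            throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory(indexPath),
                new StandardAnalyzer(), true);
        for (String[] page : pages) {      // page[0]=company, page[1]=text
            Document doc = new Document();
            // Fixed attribute: indexed as-is (not tokenized) so it can
            // later serve as an exact-match filter.
            doc.add(new Field("company", page[0],
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            // The single tokenized full-text field.
            doc.add(new Field("body", page[1],
                    Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        writer.close();
    }
}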

> But you still haven't explained what the user is getting from all
> this flexibility. 

I'm not asking about flexibility, only about indexing performance.
I know that I have to index in parallel (the initial data is about
150 million pages, more than 1 TB in total).
Now I'm thinking about what I have to take into consideration when building
the merge tool.
Then, when I saw the code of IndexMergeTool, I was astonished by the
hardcoded use of SimpleAnalyzer and wondered why an analyzer is used at
all for merging. Hence my questions.
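
For reference, the merge step itself seems to boil down to something like
this (again just a sketch against the 2.x-era API; as far as I can tell,
addIndexes copies the already-inverted segments and never re-analyzes any
text, which would explain why IndexMergeTool can get away with hardcoding
SimpleAnalyzer):

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeSubIndexes {
    public static void main(String[] args) throws Exception {
        // args[0] = target index, args[1..n] = sub-indexes to merge.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory(args[0]),
                new SimpleAnalyzer(), true);
        Directory[] parts = new Directory[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            parts[i - 1] = FSDirectory.getDirectory(args[i]);
        }
        // No text passes through the analyzer here; the segments are
        // merged as-is, so the analyzer choice is irrelevant for this step.
        writer.addIndexes(parts);
        writer.optimize();
        writer.close();
    }
}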

> I have a hard time understanding the use-case
> you're trying to support. If you're trying to build a generic 
> front-end
> to allow parameterized Lucene index building, have you looked at
> SOLR, which uses XML configuration files to drive the indexing
> and searching process? (which I haven't used, but I'm tracking
> the user's group list.....).

No, the frontend just provides a textbox where the user can type in a
query like "foo*".

The query is then executed against some UN_TOKENIZED fields (fixed
parameters for the current user, such as company) and against this single
tokenized field.
I want to filter on these fixed fields too, to reduce the number of hits
for the full-text query. I think this is the right approach to achieve
better search performance. Isn't it?
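
Concretely, I have something like this in mind (sketch only; the field
names and the "acme" value are invented, and CachingWrapperFilter is my
assumption for reusing the filter's bit set across searches):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.FSDirectory;

public class FilteredSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher =
                new IndexSearcher(FSDirectory.getDirectory("/idx/merged"));

        // Full-text part: the user's "foo*" against the tokenized field.
        WildcardQuery text = new WildcardQuery(new Term("body", "foo*"));

        // Fixed per-user constraints as a Filter rather than as extra
        // query clauses: the filter does not affect scoring, and wrapping
        // it in CachingWrapperFilter reuses the computed bit set for
        // every search with the same constraint.
        BooleanQuery constraints = new BooleanQuery();
        constraints.add(new TermQuery(new Term("company", "acme")),
                BooleanClause.Occur.MUST);
        Filter fixed =
                new CachingWrapperFilter(new QueryWrapperFilter(constraints));

        TopDocs hits = searcher.search(text, fixed, 10);
        System.out.println("total hits: " + hits.totalHits);
        searcher.close();
    }
}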

Thank you!

