On Fri, 2014-06-27 at 12:33 +0200, Sandeep Khanzode wrote:
> I have an index that runs into 200-300GB. It is not frequently updated.

"not frequently" means different things for different people. Could you
give an approximate time span? If it is updated monthly, you might
consider a full optimization after update.

> What are the best strategies to query on this index?

> 1.] Should I, at index time, split the content, like a hash based
> partition, into multiple separate smaller indexes and aggregate the
> results programmatically?

Assuming you use multiple machines or independent storage for the
multiple indexes, this will bring down latency. Do this if your searches
are too slow.
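
To make the hash-based partition concrete, here is a minimal sketch of the routing step only. ShardRouter and its methods are hypothetical names, not part of Lucene, and using String.hashCode() as the hash function is an assumption; any stable hash of the document ID would do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: routes documents to one of N smaller indexes by ID hash. */
final class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.numShards = numShards;
    }

    /** The same ID always maps to the same shard; floorMod avoids negative results. */
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Groups document IDs by target shard, e.g. before feeding one writer per shard. */
    List<List<String>> partition(List<String> docIds) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        for (String id : docIds) {
            shards.get(shardFor(id)).add(id);
        }
        return shards;
    }
}
```

At index time each document would go to the index chosen by shardFor(); at query time the same query is sent to every shard and the per-shard results are merged.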

>  2.] Should I replicate this index and provide some
> sort of document ID, and search on each node for a specific range of
> document IDs?

I don't really follow that idea. Are your searches only ID-based?

Anyway, replication increases throughput. Do this if your servers have
trouble keeping up with the full amount of work.
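
As a rough illustration of how replicas raise throughput, each incoming query can be handed to the next full copy of the index in turn. ReplicaSelector and the replica names are hypothetical; this is a sketch of plain round-robin, assuming every replica holds the complete index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over identical index replicas. */
final class ReplicaSelector {
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReplicaSelector(List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("need at least one replica");
        }
        this.replicas = new ArrayList<>(replicas);
    }

    /** Thread-safe; floorMod keeps the index valid even after the counter overflows. */
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```

Each query still takes as long as before, but N replicas can serve roughly N queries in parallel.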

>  3.] Is there any way I can split or move individual
> segments to different nodes and aggregate the results?

Copy the full index. In copy 1, delete all documents whose ID hashes to
one half of your ID-hash function; in copy 2, delete the other half. As
your corpus is semi-randomly distributed, scores should be comparable
between the indexes, so the result sets can be easily merged.
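
The merge step can be sketched as follows, assuming each index returns its own top hits and that scores are comparable as described. ResultMerger and Hit are hypothetical names for illustration; Lucene's own TopDocs.merge serves a similar purpose if you stay inside the Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical merger for per-index result lists, ordered by descending score. */
final class ResultMerger {

    /** Minimal stand-in for a search hit: a document ID and its score. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Combines the hits from all indexes and keeps the k best-scoring ones. */
    static List<Hit> mergeTopK(List<List<Hit>> perIndexHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        // List.sort is stable, so ties keep the order in which the indexes were listed.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

To get a correct global top k, each index has to return at least its own top k before merging.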

But as Jigar says, you should consider switching to SolrCloud (or
ElasticSearch), which does all this for you.

> I am not fully aware of the large scale query strategies. Can you
> please share your findings or experiences?

Depends on what you mean by large scale. You have a running system -
what do you want? Scaling up? Lowering latency? Increasing throughput?
More complex queries?

- Toke Eskildsen, State and University Library, Denmark


