Impossible to say - how big is big?  How fast is fast?  I'd start with
the simplest option and if it's fast enough, stop.
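
One way to put numbers on "fast enough" is to time the per-record search
against the replicated index before deciding anything.  Below is a
minimal sketch, not a definitive benchmark.  The class name
SearchTiming, the measure/percentile helpers and the HashMap stand-in
are all hypothetical; the Function passed to measure() is where the
real Lucene search call would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SearchTiming {

    /**
     * Runs each query once through `search` and returns the per-query
     * latencies in nanoseconds, sorted ascending.
     */
    static <Q, R> long[] measure(List<Q> queries, Function<Q, R> search) {
        long[] nanos = new long[queries.size()];
        for (int i = 0; i < nanos.length; i++) {
            long start = System.nanoTime();
            search.apply(queries.get(i));
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Nearest-rank percentile, p in (0, 100], of an ascending-sorted array. */
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Assumption: a HashMap stands in for the index so the sketch runs
        // without Lucene on the classpath. Swap in the real search call.
        Map<String, String> standIn = new HashMap<>();
        List<String> queries = new ArrayList<>();
        for (int id = 0; id < 100_000; id++) {
            standIn.put("id:" + id, "student-" + id);
            queries.add("id:" + id);
        }
        Function<String, String> search = standIn::get;

        measure(queries, search);                 // warm-up pass, result discarded
        long[] sorted = measure(queries, search); // measured pass

        System.out.println("p50 ns: " + percentile(sorted, 50));
        System.out.println("p99 ns: " + percentile(sorted, 99));
    }
}
```

If the numbers from a sample of real mapper input are comfortably
inside your budget with the whole index on each node, there is nothing
for sharding to buy you.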


--
Ian.


On Sat, May 5, 2012 at 12:47 AM, Yang <teddyyyy...@gmail.com> wrote:
> I have an index containing all students. Now I want to run an index
> search inside an Apache Hadoop mapper,
> i.e.
>
> for each (record from mapper input reader) {
>    output = lucene.search("name:" + record.name + " OR " + " id:" + record.id);
>    emit(output)
> }
>
>
> my question is whether I should shard the index (across terms, not
> splitting the same postings list for one term) or simply replicate it.
> the index for the entire dataset is not too big, so it can fit on
> my local disk, and I can copy it to every node in the cluster, and let
> them sit there all the time, so no copy overhead is incurred.
> the only argument in favor of sharding is that a smaller index might
> be faster.  but since index search is only O(lg(n)) time, maybe this
> time saving is very small.
>
> so will sharding be worth the effort?
>
> thanks
> yang
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

