Impossible to say - how big is big? How fast is fast? I'd start with the simplest option (replicating the index to every node), measure it, and if it's fast enough, stop.
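
As a rough sanity check on the O(lg(n)) point in the quoted question, here is a minimal sketch of how many lookup steps sharding across terms could save. It takes the question's binary-search-style cost model at face value and ignores postings traversal, disk I/O, and caching, which may well dominate in practice. The class name, method names, and the 10 million term count are all hypothetical, chosen only for illustration.

```java
public class ShardLookupEstimate {

    // Approximate comparison steps for a binary-search-style lookup over `terms` terms.
    // This is the O(lg(n)) model from the question, not a measurement of Lucene itself.
    public static double lookupSteps(long terms) {
        return Math.log(terms) / Math.log(2);
    }

    // Steps saved per lookup when the terms are split evenly across `shards` shards.
    // Under this model the saving is simply log2(shards), whatever the index size.
    public static double stepsSaved(long terms, int shards) {
        return lookupSteps(terms) - lookupSteps(terms / shards);
    }

    public static void main(String[] args) {
        long terms = 10_000_000L; // hypothetical number of distinct terms
        for (int shards : new int[] {1, 4, 16, 64}) {
            System.out.printf("shards=%d steps=%.2f saved=%.2f%n",
                    shards, lookupSteps(terms / shards), stepsSaved(terms, shards));
        }
    }
}
```

Under these assumptions, 16 shards save about 4 steps out of roughly 23 per lookup, so the model itself suggests the gain is small. Timing the real job on the replicated index is still the only way to know whether it is fast enough.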
--
Ian.

On Sat, May 5, 2012 at 12:47 AM, Yang <teddyyyy...@gmail.com> wrote:
> I have an index containing all students, and I want to run an index
> search inside an Apache Hadoop mapper, i.e.
>
> for each (record from mapper input reader) {
>     output = lucene.search("name:" + record.name + " OR " + "id:" + record.id);
>     emit(output)
> }
>
> My question is whether I should shard the index (across terms, not
> splitting the postings list for any one term) or simply replicate it.
>
> The index for the entire dataset is not too big, so it can fit on my
> local disk. I can copy it to every node in the cluster and let it sit
> there all the time, so no copy overhead is incurred.
>
> The only argument in favor of sharding is that a smaller index might
> be faster. But since index search takes only O(lg(n)) time, the time
> saving may be very small.
>
> So will sharding be worth the effort?
>
> Thanks,
> Yang

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org