Re: Batch searching

tsuraan Wed, 22 Jul 2009 10:42:12 -0700

> If you did this, wouldn't you be binding the processing of the results
> of all queries to that of the slowest performing one within the collection?


I would imagine it would, but I haven't seen too much variance between
lucene query speeds in our data.

> I'm guessing you are trying for some sort of performance benefit by
> batch processing, but I question whether or not you will actually get
> more performance by performing your queries in a threaded type
> environment, and then processing their results as they come in.
>
> Could you give a bit more description about what you are actually trying
> to accomplish, I'm sure this list could help better if we had more
> information.

What I'd like to do is build lots of small indices (a few thousand
documents per index) and put them into HDFS for search distribution.
We already have our own map-reduce framework for searching, but HDFS
seems to be a really good fit for an actual storage mechanism.

My concern is that when we have one searcher using thousands of
HDFS-backed indices, the seeking might get a bit nasty. HDFS
apparently has pretty good seeking performance, but it really looks
like it was designed for streaming, so if I could make my searches use
sequential index access, I would expect better performance than having
a ton of simultaneous searches making HDFS seek all over the place.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Batch searching

Reply via email to