On Thu, Feb 5, 2009 at 12:47 PM, Michael Stoppelman <stop...@gmail.com> wrote:

>
>
> On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> Google uses dedicated highlighting servers.  Maybe this architecture would
>> work for you.
>>
>
> What's your reference? I used to work at Google.
>

I think creating a separate index/service would be reasonable, and it's what
I proposed in a previous email on this thread:
"One option to get around the changing scoring would be to run a
completely separate index for highlighting (with the overlapping docs you
described)."

Still, do Lucene developers think storing the offsets is a bad idea, either
from an index-size perspective or for some other reason?
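
To make the idea concrete, here is a rough sketch of what I mean by
highlighting from stored offsets. It is plain Python, not Lucene's index
format or API; the names index_with_offsets and highlight and the regex
tokenizer are made up purely for illustration. The point is only that if each
posting carries character offsets, the highlighter can wrap hits directly
instead of re-analyzing a 300k-token document at query time.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for an analyzer: splits on word characters.
TOKEN = re.compile(r"\w+")


def index_with_offsets(text):
    """Toy per-document postings: term -> list of (position, start, end)."""
    postings = defaultdict(list)
    for position, match in enumerate(TOKEN.finditer(text)):
        term = match.group().lower()
        postings[term].append((position, match.start(), match.end()))
    return postings


def highlight(text, postings, query_terms, pre="<b>", post="</b>"):
    """Wrap each hit using the stored offsets; the text is never re-tokenized."""
    unique_terms = {term.lower() for term in query_terms}
    spans = sorted(
        (start, end)
        for term in unique_terms
        for _, start, end in postings.get(term, [])
    )
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(text[cursor:start])
        pieces.append(pre + text[start:end] + post)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


doc = "Lucene stores positions; offsets would let highlighting skip analysis."
postings = index_with_offsets(doc)
print(highlight(doc, postings, ["offsets", "highlighting"]))
```

The cost is the extra two integers per posting, which is exactly the
index-size concern raised in the 2004 thread.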

M


>
>> On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman <stop...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > My search backends are only able to eke out 13-15 qps even with the
>> > entire index in memory (this makes it very expensive to scale).
>> > According to my YourKit profiler, 80% of the program's time ends up in
>> > highlighting. With highlighting disabled my backend gets about 45-50 qps
>> > (cheaper scaling)! We're using Mark's TokenSources contrib to make
>> > reconstructing the document quicker. I was contemplating patching the
>> > index to store offsets for every term (instead of just the ordinal
>> > positions) so that I could make the highlighting faster (since you would
>> > know where you hit in the document on the search pass). I saw this
>> > thread from 2004:
>> > http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html-
>> > which asks about adding offsets to the index, but it was decided against
>> > because it would make the index too large. I can totally understand
>> > this, but as machines get more beefy it would probably be nice to make
>> > this optional, since 15 qps vs. 50 qps is quite a trade-off right now.
>> > Are other folks seeing this? My documents are quite big, sometimes up to
>> > 300k tokens. Also, my document fields are compressed, which is also a
>> > time sink for the CPU.
>> >
>> > Please let me know if you need more details, happy to share.
>> >
>> > Sincerely,
>> > M
>> >
>>
>
>
