RE: Text storing design and performance question

moraleslos Wed, 10 Jan 2007 12:14:55 -0800

Maybe keeping the data in the DB would make it quicker?  Seems like the I/O
performance would cause most of the performance issues you're seeing.


-los


Renaud Waldura-5 wrote:
> 
> We used to store a big text field for highlighting purposes too, and it
> proved a big pain. The index was gigantic, it took forever to build, and
> the
> search performance would sometimes suffer from it (just a hunch).
> 
> Now we keep this big text field on disk (in a file), and feed it to the
> highlighter. Unfortunately the highlighter has to read the file, parse it,
> etc... It's slooow, sometimes over a second on a large document.
> 
> I vaguely remember reading somewhere newer versions of the highlighter are
> able to leverage term vectors to avoid re-parsing the field. (I could be
> mistaken.) Maybe just storing term vectors would keep the index lean and
> allow for fast highlighting?
> 
> --Renaud
> 
>  
> 
> -----Original Message-----
> From: Mark Miller [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 10, 2007 9:54 AM
> To: [email protected]
> Subject: Re: Text storing design and performance question
> 
> Being stateless should not be much of an issue. As Erick mentioned, the
> highlighter just expects you to pass it the query again and the text to be
> highlighted. So when you show the pagination you just need to keep around
> what query generated the current page...then shove each piece of relevant
> text from the database through the highlighter (with the query) before
> displaying it.
> 
> - Mark
> 
> On 1/10/07, moraleslos <[EMAIL PROTECTED]> wrote:
>>
>>
>> Hi Mark,
>>
>> Looks like I've got to implement some sort of pagination for my clients.
>> Problem is everything is stateless so looks like there's some work I 
>> need to do on my end.  Thanks.
>>
>> -los
>>
>>
>>
>> markrmiller wrote:
>> >
>> > Usually a user cannot easily browse 50,000 on a single display, and 
>> > so you would only highlight the docs as they became visible to the
>> user.
>> > This is generally a small amount...often one at a time.
>> >
>> > - Mark
>> >
>> > moraleslos wrote:
>> >> Hi Erik,
>> >>
>> >> Would that slow performance a bit?  For example, say I receive 
>> >> 50,000 hits from a search.  From your explanation, I have to 
>> >> retrieve the DB id
>> from
>> >> each hit, perform a query to the DB using the id to retrieve the 
>> >> full contents for each hit, run highlighter on each content, and 
>> >> then
>> return?
>> >> Although I'll give this a shot, it will seem to slow performance on 
>> >> the searching side of things, wouldn't it?  Thanks for the reply.
>> >>
>> >> -los
>> >>
>> >>
>> >>
>> >> Erik Hatcher wrote:
>> >>
>> >>> You don't have to store a field to highlight text.  If you've got 
>> >>> it in your database, retrieve it from there and pass that string 
>> >>> to the highlighter instead.
>> >>>
>> >>>     Erik
>> >>>
>> >>>
>> >>> On Jan 10, 2007, at 10:45 AM, moraleslos wrote:
>> >>>
>> >>>
>> >>>> I'm running into a little dilemma with Lucene highlighting and 
>> >>>> indexing.  I currently index anything and everything that gets 
>> >>>> inserted into a database.
>> >>>> This database includes all the content that is searched.  Now 
>> >>>> I'll have lots and lots of content, thinking of the range of 
>> >>>> 50GB+, all stored in the DB.
>> >>>> Using Lucene, I index all of this.  But since I'm using 
>> >>>> highlighting features, I'll also need to store the content into 
>> >>>> the index.  Not sure what the performance implications are during 
>> >>>> a search but I know that indexing performance should be slower as 
>> >>>> well as the index size being
>> enormous.
>> >>>> Because I have duplicated data, one in the index and the other in 
>> >>>> the db, are there other ways of handling this situation in a more 
>> >>>> efficient and performant way?  Thanks in advance.
>> >>>>
>> >>>> -los
>> >>>> --
>> >>>> View this message in context: http://www.nabble.com/Text-storing-
>> >>>> design-and-performance-question-tf2953201.html#a8259883
>> >>>> Sent from the Lucene - Java Users mailing list archive at
>> Nabble.com.
>> >>>>
>> >>>>
>> >>>>
>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >>>> For additional commands, e-mail: [EMAIL PROTECTED]
>> >>>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >>> For additional commands, e-mail: [EMAIL PROTECTED]
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >
>> >
>> >
>>
>> --
>> View this message in context:
>>
> http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201
> .html#a8261739
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201.html#a8265458
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Text storing design and performance question

Reply via email to