We used to store a big text field for highlighting purposes too, and it proved to be a big pain. The index was gigantic, it took forever to build, and search performance seemed to suffer from it at times (just a hunch).
Now we keep this big text field on disk (in a file) and feed it to the highlighter. Unfortunately, the highlighter has to read the file, parse it, and so on. It's slow, sometimes over a second on a large document.

I vaguely remember reading somewhere that newer versions of the highlighter are able to leverage term vectors to avoid re-parsing the field. (I could be mistaken.) Maybe just storing term vectors would keep the index lean and allow for fast highlighting?

--Renaud

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 10, 2007 9:54 AM
To: java-user@lucene.apache.org
Subject: Re: Text storing design and performance question

Being stateless should not be much of an issue. As Erick mentioned, the highlighter just expects you to pass it the query again and the text to be highlighted. So when you show the pagination you just need to keep around what query generated the current page...then shove each piece of relevant text from the database through the highlighter (with the query) before displaying it.

- Mark

On 1/10/07, moraleslos <[EMAIL PROTECTED]> wrote:
>
> Hi Mark,
>
> Looks like I've got to implement some sort of pagination for my clients.
> Problem is everything is stateless so looks like there's some work I
> need to do on my end. Thanks.
>
> -los
>
> markrmiller wrote:
> >
> > Usually a user cannot easily browse 50,000 on a single display, and
> > so you would only highlight the docs as they became visible to the user.
> > This is generally a small amount...often one at a time.
> >
> > - Mark
> >
> > moraleslos wrote:
> >> Hi Erik,
> >>
> >> Would that slow performance a bit? For example, say I receive
> >> 50,000 hits from a search. From your explanation, I have to
> >> retrieve the DB id from
> >> each hit, perform a query to the DB using the id to retrieve the
> >> full contents for each hit, run highlighter on each content, and
> >> then return?
> >> Although I'll give this a shot, it will seem to slow performance on
> >> the searching side of things, wouldn't it? Thanks for the reply.
> >>
> >> -los
> >>
> >> Erik Hatcher wrote:
> >>
> >>> You don't have to store a field to highlight text. If you've got
> >>> it in your database, retrieve it from there and pass that string
> >>> to the highlighter instead.
> >>>
> >>> Erik
> >>>
> >>> On Jan 10, 2007, at 10:45 AM, moraleslos wrote:
> >>>
> >>>> I'm running into a little dilemma with Lucene highlighting and
> >>>> indexing. I currently index anything and everything that gets
> >>>> inserted into a database.
> >>>> This database includes all the content that is searched. Now
> >>>> I'll have lots and lots of content, thinking of the range of
> >>>> 50GB+, all stored in the DB.
> >>>> Using Lucene, I index all of this. But since I'm using
> >>>> highlighting features, I'll also need to store the content into
> >>>> the index. Not sure what the performance implications are during
> >>>> a search but I know that indexing performance should be slower as
> >>>> well as the index size being enormous.
> >>>> Because I have duplicated data, one in the index and the other in
> >>>> the db, are there other ways of handling this situation in a more
> >>>> efficient and performant way? Thanks in advance.
> >>>>
> >>>> -los
> >>>> --
> >>>> View this message in context: http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201.html#a8259883
> >>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> --
> View this message in context:
> http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201.html#a8261739
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
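
For anyone reading this thread in the archive, here is a rough sketch of the approach Erik and Mark describe: keep the text outside the index, have the client re-send the query and page number on each request, and highlight only the hits on the visible page. Everything in it is an assumption made for illustration. The `highlight` method is a naive single-term stand-in, not Lucene's Highlighter, and `textStore` is a hypothetical in-memory map standing in for the database lookup by id. In a real setup the Lucene highlighter would presumably take the place of `highlight`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PagedHighlightSketch {

    // Naive stand-in for a real highlighter (NOT Lucene's API): wraps each
    // case-insensitive occurrence of a single literal term in <B> tags.
    static String highlight(String term, String text) {
        if (term.isEmpty()) {
            return text;
        }
        Matcher m = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find()) {
            out.append(text, pos, m.start())
               .append("<B>").append(text, m.start(), m.end()).append("</B>");
            pos = m.end();
        }
        return out.append(text, pos, text.length()).toString();
    }

    // Highlights only the hits on the requested zero-based page. The caller
    // re-sends the term and page number on every request, so no per-user
    // state is kept. textStore is a hypothetical stand-in for the database.
    static List<String> highlightPage(String term, List<Integer> hitIds,
                                      Map<Integer, String> textStore,
                                      int page, int pageSize) {
        int from = Math.min(page * pageSize, hitIds.size());
        int to = Math.min(from + pageSize, hitIds.size());
        List<String> out = new ArrayList<>();
        for (Integer id : hitIds.subList(from, to)) {
            out.add(highlight(term, textStore.get(id)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> textStore = new LinkedHashMap<>();
        textStore.put(1, "Lucene highlighting without stored fields");
        textStore.put(2, "Store the text in the database");
        textStore.put(3, "Pass the text to the highlighter");
        // Ids in ranked order, as a search would return them.
        List<Integer> hitIds = Arrays.asList(1, 2, 3);

        // Page 1 (zero-based) with two hits per page holds only document 3.
        for (String s : highlightPage("text", hitIds, textStore, 1, 2)) {
            System.out.println(s);
        }
    }
}
```

With 50,000 hits and a page size of, say, 10, only 10 texts are fetched and highlighted per request, which is the point Mark makes above. The cost of each individual highlight (reading and parsing a large document) is unchanged by this, so it does not address the slowness Renaud reports on big files.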