It could well be. Or maybe not; one could speculate the JDBC overhead (especially over a network) would make DB access slower than disk (local filesystem). YMMV as they say.

-----Original Message-----
From: moraleslos [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 10, 2007 12:14 PM
To: java-user@lucene.apache.org
Subject: RE: Text storing design and performance question

Maybe keeping the data in the DB would make it quicker? Seems like the
I/O performance would cause most of the performance issues you're seeing.

-los

Renaud Waldura-5 wrote:
>
> We used to store a big text field for highlighting purposes too, and
> it proved a big pain. The index was gigantic, it took forever to
> build, and the search performance would sometimes suffer from it (just
> a hunch).
>
> Now we keep this big text field on disk (in a file), and feed it to
> the highlighter. Unfortunately the highlighter has to read the file,
> parse it, etc... It's slooow, sometimes over a second on a large
> document.
>
> I vaguely remember reading somewhere newer versions of the highlighter
> are able to leverage term vectors to avoid re-parsing the field. (I
> could be mistaken.) Maybe just storing term vectors would keep the
> index lean and allow for fast highlighting?
>
> --Renaud
>
> -----Original Message-----
> From: Mark Miller [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 10, 2007 9:54 AM
> To: java-user@lucene.apache.org
> Subject: Re: Text storing design and performance question
>
> Being stateless should not be much of an issue. As Erick mentioned,
> the highlighter just expects you to pass it the query again and the
> text to be highlighted. So when you show the pagination you just need
> to keep around what query generated the current page...then shove each
> piece of relevant text from the database through the highlighter (with
> the query) before displaying it.
>
> - Mark
>
> On 1/10/07, moraleslos <[EMAIL PROTECTED]> wrote:
>>
>> Hi Mark,
>>
>> Looks like I've got to implement some sort of pagination for my
>> clients. Problem is everything is stateless so looks like there's
>> some work I need to do on my end. Thanks.
>>
>> -los
>>
>> markrmiller wrote:
>> >
>> > Usually a user cannot easily browse 50,000 on a single display, and
>> > so you would only highlight the docs as they became visible to the
>> > user. This is generally a small amount...often one at a time.
>> >
>> > - Mark
>> >
>> > moraleslos wrote:
>> >> Hi Erik,
>> >>
>> >> Would that slow performance a bit? For example, say I receive
>> >> 50,000 hits from a search. From your explanation, I have to
>> >> retrieve the DB id from each hit, perform a query to the DB using
>> >> the id to retrieve the full contents for each hit, run highlighter
>> >> on each content, and then return? Although I'll give this a shot,
>> >> it will seem to slow performance on the searching side of things,
>> >> wouldn't it? Thanks for the reply.
>> >>
>> >> -los
>> >>
>> >> Erik Hatcher wrote:
>> >>
>> >>> You don't have to store a field to highlight text. If you've got
>> >>> it in your database, retrieve it from there and pass that string
>> >>> to the highlighter instead.
>> >>>
>> >>> Erik
>> >>>
>> >>> On Jan 10, 2007, at 10:45 AM, moraleslos wrote:
>> >>>
>> >>>> I'm running into a little dilemma with Lucene highlighting and
>> >>>> indexing. I currently index anything and everything that gets
>> >>>> inserted into a database. This database includes all the content
>> >>>> that is searched. Now I'll have lots and lots of content,
>> >>>> thinking of the range of 50GB+, all stored in the DB. Using
>> >>>> Lucene, I index all of this. But since I'm using highlighting
>> >>>> features, I'll also need to store the content into the index.
>> >>>> Not sure what the performance implications are during a search
>> >>>> but I know that indexing performance should be slower as well as
>> >>>> the index size being enormous. Because I have duplicated data,
>> >>>> one in the index and the other in the db, are there other ways
>> >>>> of handling this situation in a more efficient and performant
>> >>>> way? Thanks in advance.
>> >>>>
>> >>>> -los
>> >>>> --
>> >>>> View this message in context:
>> >>>> http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201.html#a8259883
>> >>>> Sent from the Lucene - Java Users mailing list archive at
>> >>>> Nabble.com.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201.html#a8261739
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
View this message in context:
http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201.html#a8265458
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
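
For what it's worth, here is a rough sketch of the approach Mark and Erik describe in the thread above: keep the query around, slice the hit ids down to the visible page, and only then fetch text and highlight it. Everything in it is a hypothetical stand-in, not Lucene API: `pageOf` and `renderPage` are made-up helpers, `loadText` plays the role of the DB lookup by id, and the regex-based `highlight` only imitates what the real highlighter would do with the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

class PagedHighlighting {

    /** Returns the hit ids on a zero-based page; empty if the page is past the end. */
    public static List<Integer> pageOf(List<Integer> hitIds, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        long from = (long) page * pageSize;
        if (from >= hitIds.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, hitIds.size());
        return hitIds.subList((int) from, to);
    }

    /** Stand-in for the real highlighter: wraps case-insensitive matches in <b> tags. */
    public static String highlight(String text, String term) {
        return Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE)
                .matcher(text)
                .replaceAll("<b>$0</b>");
    }

    /** Loads and highlights only the documents on the requested page. */
    public static List<String> renderPage(List<Integer> hitIds, int page, int pageSize,
                                          String term, Function<Integer, String> loadText) {
        List<String> out = new ArrayList<>();
        for (Integer id : pageOf(hitIds, page, pageSize)) {
            // One lookup per visible hit, not one per hit in the whole result set.
            out.add(highlight(loadText.apply(id), term));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "database" keyed by document id.
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "Lucene stores term vectors.");
        db.put(2, "The highlighter re-parses text.");
        db.put(3, "Keep big text out of the Lucene index.");

        List<Integer> hits = Arrays.asList(1, 2, 3);
        for (String line : renderPage(hits, 0, 2, "lucene", db::get)) {
            System.out.println(line);
        }
    }
}
```

Since the page is computed from nothing but the hit list, a page number and a page size, a stateless server only needs the client to send the query and page number back with each request.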

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]