cool.  i will use compression and store in index. is there anything special
i need to for decompressing the text? i presume i can just do
doc.get("content")?
thanks for your advice all!

On Sat, Mar 7, 2009 at 11:50 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> You could store the text contents compressed; I think extracting text from
> PDF files is much more time-intensive than decompressing a stored field.
> And
> text-only contents often compress very good. In my opinion, if the
> (uncompressed) contents of the docs are not very large (so I mean several
> megabytes each), I would prefer storing it in index.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Erik Hatcher [mailto:e...@ehatchersolutions.com]
> > Sent: Saturday, March 07, 2009 12:46 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene Highlighting and Dynamic Summaries
> >
> > It depends :)
> >
> > It's a trade-off.  If storing is not prohibitive, I recommend that as
> > it makes life easier for highlighting.
> >
> >       Erik
> >
> > On Mar 7, 2009, at 6:37 AM, Amin Mohammed-Coleman wrote:
> >
> > > hi
> > > that's what i was thinking about.  i would need to get the file and
> > > extract
> > > the text again and then pass through the highlighter.  The other
> > > option is
> > > storing the content in the index the downside being index is going
> > > to be
> > > large.  Which would be the recommended approach?
> > >
> > > Cheers
> > >
> > > Amin
> > >
> > > On Sat, Mar 7, 2009 at 10:50 AM, Erik Hatcher
> > <e...@ehatchersolutions.com
> > > >wrote:
> > >
> > >> With the caveat that if you're not storing the text you want
> > >> highlighted,
> > >> you'll have to retrieve it somehow and send it into the Highlighter
> > >> yourself.
> > >>
> > >>       Erik
> > >>
> > >>
> > >> On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:
> > >>
> > >>
> > >>> You should look at contrib/highlighter, which does exactly this.
> > >>>
> > >>> Mike
> > >>>
> > >>> Amin Mohammed-Coleman wrote:
> > >>>
> > >>> Hi
> > >>>> I am currently indexing documents (pdf, ms word, etc) that are
> > >>>> uploaded,
> > >>>> these documents can be searched and what the search returns to
> > >>>> the user
> > >>>> are
> > >>>> summaries of the documents.  Currently the summaries are
> > >>>> extracted when
> > >>>> indexing the file (summary constructed by taking the first 10
> > >>>> lines of
> > >>>> the
> > >>>> document and stored in the index as field).  This is not ideal
> > >>>> (static
> > >>>> summary), and I was wondering if it would be possible to create a
> > >>>> dynamic
> > >>>> summary when a hit is found and highlight the terms found.  The
> > >>>> content
> > >>>> of
> > >>>> the document is not stored in the index.
> > >>>>
> > >>>> So basically what I'm looking to do is:
> > >>>>
> > >>>> 1) PDF indexed
> > >>>> 2) PDF body contains the word "search"
> > >>>> 3) Do a search and return the hit
> > >>>> 4) Construct a summary with the term "search" included.
> > >>>>
> > >>>> I'm not sure how to go about doing this (I presume it is
> > >>>> possible).  I
> > >>>> would
> > >>>> be grateful for any advice.
> > >>>>
> > >>>>
> > >>>> Cheers
> > >>>> Amin
> > >>>>
> > >>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >>>
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >>
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

Reply via email to