Re: How do we limit the growth of a Lucene Index?

Marcus Herou Mon, 05 Nov 2007 04:24:17 -0800

As you suggest you could either roll the index on the local machine or
remote and gzip the content fileds on the archive index and provide a
GzipReader when you need to search old results.


If money is of the essence then the best solution probably is to have 1 good
box with fast SCSI disks which serves the primary index and an "archive"
slower machine with huge cheap IDE disks.
You then need to have a servlet or such on the archive server which
understands searching over http. Anyone said SOLR :)

//Marcus

On 11/5/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> You could search this list about distributing your indexes, etc.
> RemoteSearchable may be handy, but you will probably have to build
> some infrastructure around it for handling failover, etc. (would make
> for a nice contribution)
>
> How often do you think archived data will need to be accessed?  And
> how much data are you talking?  Seems to me like the main issue will
> be in managing the searchers in light of having a lot of potential
> indexes.  Just thinking out loud, though.
>
> -Grant
>
> On Nov 4, 2007, at 1:48 PM, Sandeep Mahendru wrote:
>
> > Hi ,
> >
> >   We have been developing an enterprise logging service at the
> > Wachovia
> > bank. The logs (Busines, application, error) for all the bank related
> > applications are consolidated
> > at one single location in an Oracle 10g Database.
> >
> > In our second phase, we are now building a high perforinmg report
> > viewer
> > over it. So our search algorithm does not go to the Oracle 10g DB. We
> > therfore avoid network and I/O.
> > Our serach algorith now goes to a LUCENE index. We have Lucene indexes
> > created for each application. These indexes are present on the same
> > machine,
> > where the search algorithm runs. As more applications at the bank
> > are now
> > beginning to consume this service, the Lucene Index is now growing.
> >
> > One of my team leads has suggested the following approach to resolve
> > this
> > issue:
> >
> > *I think the best approach is to restrict the Index size , is to
> > keep it for
> > some limited time and then archive the same. In case user wants to
> > search
> > against the old files then we might need to provide some
> > configuration using
> > which the lucene searcher can point to the achieved file and search
> > the
> > content. To implement this we need to rename the Index file with
> > from and to
> > date before its archived. While searching against the older files,
> > user need
> > to provide the date range and then the app can point to the relevant
> > archived index files for search. Let me know your thoughts on this. *
> > **
> > At present this sounds the most logical to me. But then we begin to
> > store
> > the Lucene indexes on a diffferent machine. This might again cause the
> > search algorithm to make a network trip, if the serach is based on old
> > archived data.
> >
> > Is there a better design to resolve the above concern. Does Lucene
> > provid
> > some sort of API to handle the above scenario's?
> >
> > Regards,
> > Sandeep.
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Boot Camp Training:
> ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


-- 
Marcus Herou Solution Architect & Core Java developer Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com

Re: How do we limit the growth of a Lucene Index?

Reply via email to