As you suggest, you could roll the index on either the local or the remote machine, gzip the content fields in the archive index, and provide a GzipReader when you need to search old results.
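For the gzip part, something along these lines might do. It is only a minimal sketch using java.util.zip from the JDK; the class name FieldGzip and its two methods are made up for illustration and are not a Lucene API. The assumption is that you store the compressed bytes as the field value in the archive index and unpack them when a hit is displayed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of Lucene): gzips a content field before it
// is stored in the archive index and restores it when a result is read back.
public class FieldGzip {

    // Compress the UTF-8 bytes of a field value.
    public static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Inverse of compress(): gunzip the bytes and decode them as UTF-8.
    public static String decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Only the stored content shrinks this way; the gain depends on how repetitive the log text is, so it is worth measuring on a real sample before committing to it.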
If money is of the essence, the best solution is probably to have one good box with fast SCSI disks serving the primary index, plus a slower "archive" machine with huge, cheap IDE disks. You then need a servlet or similar on the archive server that understands searching over HTTP. Anyone said SOLR :)

//Marcus

On 11/5/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> You could search this list about distributing your indexes, etc.
> RemoteSearchable may be handy, but you will probably have to build
> some infrastructure around it for handling failover, etc. (would make
> for a nice contribution)
>
> How often do you think archived data will need to be accessed? And
> how much data are you talking? Seems to me like the main issue will
> be in managing the searchers in light of having a lot of potential
> indexes. Just thinking out loud, though.
>
> -Grant
>
> On Nov 4, 2007, at 1:48 PM, Sandeep Mahendru wrote:
>
> > Hi,
> >
> > We have been developing an enterprise logging service at the
> > Wachovia bank. The logs (business, application, error) for all the
> > bank-related applications are consolidated at one single location
> > in an Oracle 10g database.
> >
> > In our second phase, we are now building a high-performing report
> > viewer over it. Our search algorithm does not go to the Oracle 10g
> > DB, so we avoid network and I/O. It now goes to a Lucene index
> > instead. We have Lucene indexes created for each application. These
> > indexes are present on the same machine where the search algorithm
> > runs. As more applications at the bank begin to consume this
> > service, the Lucene index is growing.
> >
> > One of my team leads has suggested the following approach to
> > resolve this issue:
> >
> > *I think the best approach to restrict the index size is to keep it
> > for some limited time and then archive it.
> > In case the user wants to search against the old files, we might
> > need to provide some configuration with which the Lucene searcher
> > can point to the archived file and search the content. To implement
> > this we need to rename the index file with from and to dates before
> > it is archived. While searching against the older files, the user
> > needs to provide the date range, and then the app can point to the
> > relevant archived index files for the search. Let me know your
> > thoughts on this.*
> >
> > At present this sounds the most logical to me. But then we begin to
> > store the Lucene indexes on a different machine. This might again
> > cause the search algorithm to make a network trip if the search is
> > based on old archived data.
> >
> > Is there a better design to resolve the above concern? Does Lucene
> > provide some sort of API to handle the above scenarios?
> >
> > Regards,
> > Sandeep.
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Boot Camp Training:
> ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

--
Marcus Herou
Solution Architect & Core Java developer
Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com
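
PS: the date-range lookup that Sandeep's team lead describes could look roughly like the sketch below. It assumes a naming scheme of my own, index_<from>_<to> with ISO dates, and the class ArchiveLocator and its methods are hypothetical; nothing here is a Lucene API. It only picks which archived indexes overlap the range the user asked for, so opening and searching them is still up to the application.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names archived indexes by their date span and picks
// the ones that overlap a requested date range.
public class ArchiveLocator {

    // Assumed naming scheme, e.g. "index_2007-01-01_2007-03-31".
    public static String archiveName(LocalDate from, LocalDate to) {
        return "index_" + from + "_" + to;
    }

    // Returns the archive names whose [from, to] span overlaps the query
    // range; both ends are treated as inclusive.
    public static List<String> select(List<String> archiveNames,
                                      LocalDate queryFrom,
                                      LocalDate queryTo) {
        List<String> hits = new ArrayList<>();
        for (String name : archiveNames) {
            String[] parts = name.split("_");
            LocalDate from = LocalDate.parse(parts[1]);
            LocalDate to = LocalDate.parse(parts[2]);
            // Two ranges overlap when each one starts before the other ends.
            if (!from.isAfter(queryTo) && !to.isBefore(queryFrom)) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

A query that spans the boundary between two archives returns both names, so the app would have to search both and merge the hits. Queries that touch only recent dates never leave the primary box, which keeps the network trip limited to the old-data case.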