Hi Marcus,

Thanks for providing these suggestions. I will work on these directions
with my team.

Regards,
Sandeep.

On 11/5/07, Marcus Herou <[EMAIL PROTECTED]> wrote:
>
> As you suggest, you could roll the index either on the local machine
> or on a remote one, gzip the content fields in the archive index, and
> provide a GzipReader when you need to search old results.
>
> If money is of the essence, then the best solution is probably to
> have one good box with fast SCSI disks which serves the primary
> index, and a slower "archive" machine with huge, cheap IDE disks.
> You then need a servlet or some such on the archive server which
> understands searching over HTTP. Anyone said SOLR :)
>
> //Marcus
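
(A side note on the gzip idea: in the Lucene 2.x releases current as
of this thread, Field.Store.COMPRESS already compresses stored field
data via java.util.zip and decompresses it transparently on retrieval,
so no custom GzipReader should be needed. A minimal sketch of writing
one compressed archive document; the index path and field names are
made up:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class ArchiveWriter {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                "/indexes/archive", new StandardAnalyzer(), true);
            Document doc = new Document();
            // COMPRESS deflates the stored bytes on disk; the field
            // is still tokenized and indexed for searching as usual.
            doc.add(new Field("content", "...log message text...",
                              Field.Store.COMPRESS,
                              Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();
            // At search time, doc.get("content") returns the original
            // text; decompression happens inside Lucene.
        }
    }

And for the HTTP part, Solr's standard /select handler is essentially
the servlet Marcus describes, so the archive box could run Solr rather
than a hand-rolled search servlet.)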

> On 11/5/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> >
> > You could search this list about distributing your indexes, etc.
> > RemoteSearchable may be handy, but you will probably have to build
> > some infrastructure around it for handling failover, etc. (would
> > make for a nice contribution)
> >
> > How often do you think archived data will need to be accessed? And
> > how much data are you talking about? Seems to me like the main
> > issue will be in managing the searchers in light of having a lot of
> > potential indexes. Just thinking out loud, though.
> >
> > -Grant
> >
> > On Nov 4, 2007, at 1:48 PM, Sandeep Mahendru wrote:
> >
> > > Hi,
> > >
> > > We have been developing an enterprise logging service at Wachovia
> > > bank. The logs (business, application, error) for all the
> > > bank-related applications are consolidated at one single location
> > > in an Oracle 10g database.
> > >
> > > In our second phase, we are now building a high-performing report
> > > viewer on top of it. Our search algorithm does not go to the
> > > Oracle 10g DB, so we avoid the network and I/O cost; it goes to a
> > > Lucene index instead. We have Lucene indexes created for each
> > > application, and these indexes are present on the same machine
> > > where the search algorithm runs. As more applications at the bank
> > > begin to consume this service, the Lucene index keeps growing.
> > >
> > > One of my team leads has suggested the following approach to
> > > resolve this issue:
> > >
> > > *I think the best approach to restrict the index size is to keep
> > > it for some limited time and then archive it. In case the user
> > > wants to search against the old files, we might need to provide
> > > some configuration with which the Lucene searcher can point to
> > > the archived file and search the content. To implement this we
> > > need to rename the index file with from and to dates before it is
> > > archived. While searching against the older files, the user needs
> > > to provide the date range, and then the app can point to the
> > > relevant archived index files for the search. Let me know your
> > > thoughts on this.*
> > >
> > > At present this sounds the most logical approach to me. But then
> > > we begin to store the Lucene indexes on a different machine,
> > > which might again cause the search algorithm to make a network
> > > trip if the search is based on old archived data.
> > >
> > > Is there a better design to resolve the above concern? Does
> > > Lucene provide some sort of API to handle the above scenarios?
> > >
> > > Regards,
> > > Sandeep.
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> >
> > Lucene Boot Camp Training:
> > ApacheCon Atlanta, Nov. 12, 2007. Sign up now!
> > http://www.apachecon.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
>
> --
> Marcus Herou
> Solution Architect & Core Java developer
> Tailsweep AB
> +46702561312
> [EMAIL PROTECTED]
> http://www.tailsweep.com
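
For reference, the RemoteSearchable route Grant mentions would look
roughly like the sketch below: the archive box exports its
IndexSearcher over RMI, and the report viewer folds it into a
MultiSearcher next to the local index. This assumes the Lucene 2.x
API; the host name, paths, field name, and term are made up, and there
is no failover handling at all:

    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.RemoteSearchable;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.TermQuery;

    public class RemoteArchiveSearch {

        // Runs on the archive box: export the archive index over RMI.
        public static void serve() throws Exception {
            Searchable archive = new IndexSearcher("/archive/index");
            LocateRegistry.createRegistry(1099);
            Naming.rebind("//archivehost/ArchiveSearchable",
                          new RemoteSearchable(archive));
        }

        // Runs in the report viewer: one query spans both indexes.
        public static void search() throws Exception {
            Searchable local = new IndexSearcher("/indexes/current");
            Searchable remote = (Searchable)
                Naming.lookup("//archivehost/ArchiveSearchable");
            Searcher searcher =
                new MultiSearcher(new Searchable[] { local, remote });
            Hits hits = searcher.search(
                new TermQuery(new Term("application", "payments")));
            System.out.println(hits.length() + " matches");
        }
    }

Only the hits travel over the wire, so the archived data never has to
live on the primary search box; the cost is the extra network round
trip per archive query that was raised as a concern above.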

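With that in place, the team lead's from/to naming scheme reduces to
picking the right index directories for a requested date range and
searching them together. Another sketch against the Lucene 2.x API,
assuming a hypothetical directory layout such as
/archive/logs_20070101-20070331, one directory per archived period:

    import java.io.File;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.Searcher;

    public class ArchivePicker {

        // Open one searcher per archive directory whose encoded date
        // range overlaps the requested [from, to] interval.
        public static Searcher openForRange(File archiveRoot,
                                            Date from,
                                            Date to) throws Exception {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
            List searchers = new ArrayList();
            File[] dirs = archiveRoot.listFiles();
            for (int i = 0; i < dirs.length; i++) {
                String name = dirs[i].getName();
                int us = name.indexOf('_');
                int dash = name.indexOf('-', us);
                if (us < 0 || dash < 0) {
                    continue; // not an archive directory, skip it
                }
                Date start = fmt.parse(name.substring(us + 1, dash));
                Date end = fmt.parse(name.substring(dash + 1));
                // Keep this index if its period overlaps the range.
                if (!start.after(to) && !end.before(from)) {
                    searchers.add(new IndexSearcher(dirs[i].getPath()));
                }
            }
            return new MultiSearcher((Searchable[]) searchers.toArray(
                new Searchable[searchers.size()]));
        }
    }

Whether those directories sit on cheap local IDE disks or on a
separate archive box only changes how each Searchable is obtained (a
local IndexSearcher versus the RMI lookup above); the range-selection
logic stays the same.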