Because I've set up Lucene as a webapp with a centralized Init file and a setup properties file, I do my sanity check in the Init. If the server crashes mid-indexing, I have to delete the lock files, optimize, and re-index the files that were being indexed when the crash occurred. There was a long discussion about this back in August; search for "Crash / Recovery Scenario" in the lucene-dev archived discussions. That should answer all your questions.
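
A minimal sketch of the first step of such a start-up check, written against the Java standard library only and not against Lucene's own API. It assumes the lock files sit in the index directory and have names ending in ".lock"; where Lucene really keeps its locks depends on the version and configuration, so treat that pattern as a placeholder. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class IndexSanityCheck {

    /**
     * Deletes leftover lock files from indexDir and returns their names.
     * A non-empty result suggests the previous run died while a writer
     * held the index, so the caller should optimize and re-index.
     */
    public static List<String> clearStaleLocks(Path indexDir) throws IOException {
        // ASSUMPTION: lock files are the entries whose names end in ".lock".
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> locks = Files.newDirectoryStream(indexDir, "*.lock")) {
            for (Path lock : locks) {
                stale.add(lock);
            }
        }
        // Delete only after the directory listing has been closed.
        List<String> removed = new ArrayList<>();
        for (Path lock : stale) {
            Files.delete(lock);
            removed.add(lock.getFileName().toString());
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index-demo");
        Files.createFile(indexDir.resolve("write.lock")); // simulated crash leftover
        Files.createFile(indexDir.resolve("segments"));   // must survive the check

        List<String> removed = clearStaleLocks(indexDir);
        if (!removed.isEmpty()) {
            System.out.println("Stale locks removed: " + removed);
            // Hypothetical next steps: optimize the index, then re-queue
            // the documents that were in flight when the server went down.
        }
    }
}
```

Returning the removed names lets the Init code decide whether the optimize and re-index steps are needed at all; a clean shutdown leaves the list empty.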
Nader Henein

-----Original Message-----
From: Gareth Griffiths [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 24, 2003 1:11 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: commercial websites powered by Lucene?

Nader,

You say you have to cope with server crash mid-indexing. I think I'm seeing lots of garbage files created by server crash mid merge/optimise while lucene is creating a new index. Did you write code specifically to handle this or is there something more automated. (I was thinking of writing a sanity check for before start-up that looked in 'segments' and 'deletable' and got rid of any files in the catalog directory that are not referenced.) Did you do something similar or have I missed something...

TIA

Gareth

----- Original Message -----
From: "Nader S. Henein" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Tuesday, June 24, 2003 9:30 AM
Subject: RE: commercial websites powered by Lucene?

> I handle updates or inserts the same way first I delete the document
> from the index and then I insert it (better safe than sorry), I batch
> my updates/inserts every twenty minutes, I would do it in smaller
> intervals but since I have to sync the XML files created from the DB
> to three machines (I maintain three separate Lucene indices on my
> three separate web-servers) it takes a little longer. You have to
> batch your changes because Updating the index takes time as opposed
> to deleted which I batch every two minutes. You won't have a problem
> updating the index and searching at the same time because lucene
> updates the index on a separate set of files and then when It's done
> it overwrites the old version. I've had to provide for Backups, and
> things like server crashes mid-indexing, but I was using Oracle
> Intermedia before and Lucene BLOWS IT AWAY.
>
> -----Original Message-----
> From: news [mailto:[EMAIL PROTECTED] On Behalf Of Chris Miller
> Sent: Tuesday, June 24, 2003 12:06 PM
> To: [EMAIL PROTECTED]
> Subject: Re: commercial websites powered by Lucene?
>
>
> Hi Nader,
>
> I was wondering if you'd mind me asking you a couple of questions
> about your implementation?
>
> The main thing I'm interested in is how you handle updates to Lucene's
> index. I'd imagine you have a fairly high turnover of CVs and jobs, so
> index updates must place a reasonable load on the CPU/disk. Do you
> keep CVs and jobs in the same index or two different ones? And what is
> the process you use to update the index(es) - do you batch-process
> updates or do you handle them in real-time as changes are made?
>
> Any insight you can offer would be much appreciated as I'm about to
> implement something similar and am a little unsure of the best
> approach to take. We need to be able to handle indexing about 60,000
> documents/day, while allowing (many) searches to continue operating
> alongside.
>
> Thanks!
> Chris
>
> "Nader S. Henein" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> > We use Lucene http://www.bayt.com , we're basically an on-line
> > Recruitment site and up until now we've got around 500 000 CVs and
> > documents indexed with results that stump Oracle Intermedia.
> >
> > Nader Henein
> > Senior Web Dev
> >
> > Bayt.com
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, June 04, 2003 6:09 PM
> > To: [EMAIL PROTECTED]
> > Subject: commercial websites powered by Lucene?
> >
> >
> >
> > Hello All,
> >
> > I've been trying to find examples of large commercial websites that
> > use Lucene to power their search. Having such examples would make
> > Lucene an easy sell to management
> >
> > Does anyone know of any good examples? The bigger the better, and the
> > more the better.
> >
> > TIA,
> > -John

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
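
The delete-then-insert batching quoted earlier in this thread could be sketched roughly as follows. This is an illustration under stated assumptions, not the code running at Bayt.com: the `Index` interface is a hypothetical stand-in for whatever wraps the real index and is not a Lucene class, and `BatchedUpdater`, `queue` and `flush` are invented names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchedUpdater {

    /** Hypothetical stand-in for the real index; NOT part of Lucene. */
    public interface Index {
        void delete(String id);
        void add(String id, String text);
    }

    // Changes waiting for the next batch, keyed by document id.
    private final Map<String, String> pending = new LinkedHashMap<>();

    /** Queues an insert or update; a later change to the same id replaces the earlier one. */
    public synchronized void queue(String id, String text) {
        pending.put(id, text);
    }

    /** Applies the batch, deleting each document before adding it, and returns the count. */
    public synchronized int flush(Index index) {
        int applied = 0;
        for (Map.Entry<String, String> change : pending.entrySet()) {
            index.delete(change.getKey());                // harmless if the document is new
            index.add(change.getKey(), change.getValue());
            applied++;
        }
        pending.clear();
        return applied;
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        Index recording = new Index() {
            public void delete(String id) { log.add("delete " + id); }
            public void add(String id, String text) { log.add("add " + id); }
        };

        BatchedUpdater updater = new BatchedUpdater();
        updater.queue("cv-1", "first draft");
        updater.queue("cv-2", "another CV");
        updater.queue("cv-1", "revised"); // replaces the earlier cv-1 entry

        System.out.println(updater.flush(recording) + " documents applied: " + log);
    }
}
```

Treating inserts and updates identically keeps the batch logic to a single code path, at the cost of one redundant delete per new document. In a webapp, `flush` would be called on a timer (every twenty minutes in the set-up described above), for example from a `java.util.concurrent.ScheduledExecutorService`.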