Very cool, Luke.  I am not quite there yet.  I am halfway through implementing 
the queue approach, but I have hit walls that are making me sit back and rethink 
my strategy.  I have a Struts/Tomcat/OJB/MySQL project that could potentially 
grow to a million records over time, with perhaps 100,000 updates per day.  
That is not the case today, but it is what I am building for.

My concerns are not just about Lucene itself, but about its surrounding effects, 
as follows.  I am finding that edge-case scenarios are making things difficult 
because I effectively have two databases (MySQL and the Lucene index) instead 
of one.

----- How to know the index on this huge database is always in sync.
----- What happens if the server crashes or is brought down <a solution might 
be a last-modified date column in the db>.
----- How to back up the database and the index in an efficient, safe manner 
on a live system.
----- How to reindex while the system is live <a solution might be building 
the new index in a different location, as a separate tool>.
----- How to handle the fact that IndexWriter is not very good with incremental 
data in a high-volume update/query system <a solution might be to query the 
database every 45 seconds or so for records that have changed and apply those 
changes; see the sketch after this list>.
----- How the IndexWriter solution above might frequently cause bad lag on 
queries. <no solution>
----- How to get Tomcat to start up a thread at startup to run this updater 
without causing memory-management problems (also covered in the sketch below).
----- How to make this all work in my startup business so that I can sleep 
at night.
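
For reference, here is the rough shape of what I have in mind for the two items 
flagged above (the 45-second poller and the Tomcat startup thread).  This is 
only a sketch against Lucene 1.4 and Servlet 2.3; the table and column names, 
the JNDI datasource name, and the index path are placeholders, not my real 
ones, and error handling is trimmed for brevity.

import java.sql.*;
import javax.naming.InitialContext;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.sql.DataSource;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IndexUpdateListener implements ServletContextListener, Runnable {

    private static final String INDEX_DIR = "/path/to/index"; // placeholder
    private volatile boolean running = true;
    private Thread worker;
    private Timestamp lastRun = new Timestamp(0); // first pass picks up everything

    public void contextInitialized(ServletContextEvent e) {
        worker = new Thread(this, "index-updater");
        worker.setDaemon(true); // don't keep Tomcat alive at shutdown
        worker.start();
    }

    public void contextDestroyed(ServletContextEvent e) {
        running = false;
        worker.interrupt();
    }

    public void run() {
        while (running) {
            try {
                Timestamp now = new Timestamp(System.currentTimeMillis());
                applyChangesSince(lastRun);
                lastRun = now;          // only advance after a successful pass
                Thread.sleep(45000);    // poll every 45 seconds
            } catch (InterruptedException ie) {
                // shutting down
            } catch (Exception ex) {
                ex.printStackTrace();   // log and retry on the next pass
            }
        }
    }

    private void applyChangesSince(Timestamp since) throws Exception {
        DataSource ds = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/mydb");                  // assumed JNDI name
        Connection con = ds.getConnection();
        PreparedStatement ps = con.prepareStatement(
                "SELECT id, title, body FROM documents WHERE last_modified > ?"); // assumed schema
        ps.setTimestamp(1, since);
        ResultSet rs = ps.executeQuery();

        // Collect the changed rows first, then delete old copies, then add the
        // new versions, so the IndexReader and IndexWriter are never open at
        // the same time (only one may modify the index at once).
        java.util.List changed = new java.util.ArrayList();
        while (rs.next()) {
            changed.add(new String[] { rs.getString("id"),
                    rs.getString("title"), rs.getString("body") });
        }
        rs.close(); ps.close(); con.close();
        if (changed.isEmpty()) return;

        IndexReader reader = IndexReader.open(INDEX_DIR);
        for (int i = 0; i < changed.size(); i++) {
            reader.delete(new Term("id", ((String[]) changed.get(i))[0]));
        }
        reader.close();

        IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), false);
        for (int i = 0; i < changed.size(); i++) {
            String[] row = (String[]) changed.get(i);
            Document doc = new Document();
            doc.add(Field.Keyword("id", row[0]));
            doc.add(Field.Text("title", row[1]));
            doc.add(Field.Text("body", row[2]));
            writer.addDocument(doc);
        }
        writer.close();
    }
}

The listener would be registered with a <listener> element in web.xml, so 
Tomcat starts the thread once at deploy time and stops it cleanly at shutdown, 
which I hope keeps the memory side of things under control.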


In general, things just got much more complicated than I was hoping for, though 
I don't see how I can do without Lucene or something like it.  This has been 
done so many times before that I would have expected it to be easy, but I don't 
see it clearly yet because it is all new to me.  I wish a database text field 
could have this sort of mechanism built into it.  MySQL (which I am using) does 
not do this, but I am going to look into other databases now.  OJB works with 
most of them, so a database-level solution that allows that sleep-at-night 
thing to happen would help!

If you have input on any of these points, I would appreciate it.  I found some 
answers in the mailing list, but not really a concept of how to manage the 
whole thing.  Is there a big open source project out there that uses Lucene 
incrementally with a database?  I don't think so.

If you have any code or ideas, I would appreciate both!  Also, a FAQ covering 
many of these common problems, a bit off topic though they are, might really 
help people choose to use Lucene.

Thanks,

JohnE




----- Original Message -----
From: Luke Shannon <[EMAIL PROTECTED]>
Date: Tuesday, November 16, 2004 10:51 pm
Subject: Index Locking Issues Resolved...I hope

> Hello;
> 
> I think I have solved my locking issues. I just made it through the set of
> test cases that previously resulted in Index Locking Errors. I just removed
> the method from my code that checks for an Index lock and forcefully removes
> it after 1 minute. Hopefully they never need to be put back in.
> 
> Here is what I changed:
> 
> I moved all my Indexer logic into a class called Index.java that
> implemented Runnable. Index's run() calls a method named go(), which is
> static and synchronized. go() kicks off all the logic to update the index
> (the reader, writer and other members involved with incremental updates are
> also static). I put logging in place that logs when a thread has executed
> the method and what the thread's name is.
> 
> Every time a client class changes the content, it can create a thread
> reference and pass it the runnable Index. The convention I have requested
> for naming the thread is a toString() of the current date. Then they start
> the thread.
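
[Inline note to myself while I digest this: the pattern Luke describes above
would look roughly like the sketch below. The layout follows his description,
but the names, index path and logging details are my guesses, not his actual
code.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class Index implements Runnable {

    private static final String INDEX_DIR = "/path/to/index"; // placeholder

    public void run() {
        go(); // every worker thread funnels into the one synchronized entry point
    }

    // static + synchronized: only one thread can be updating the index at a time
    private static synchronized void go() {
        System.out.println(Thread.currentThread().getName() + " entering go()");
        try {
            IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), false);
            // ... incremental add/delete logic, using static reader/writer members ...
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(Thread.currentThread().getName() + " finished go()");
    }
}

Client code, when content changes:

    new Thread(new Index(), new java.util.Date().toString()).start();
]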
> 
> How it worked:
> 
> A few users just tested the system: half added documents while the other
> half deleted documents at the same time. No locking issues were seen, and
> the index was current with the changes made a short time after the last
> operation (in my previous code this test resulted in an issue with index
> locking).
> 
> I was able to go through the log file and find the start of the
> synchronized go() method and the successful completion of the indexing
> operations for every request made.
> 
> The only performance issue I noticed was that if someone added a very large
> PDF, it took a while before the thread handling the request could finish. If
> this is the first operation of many, it means the operations following this
> large file take that much longer. Luckily for me, search results don't need
> to be instant.
> 
> Things are looking much better. For now...
> 
> Thanks to all that helped me up till now.
> 
> Luke
> 
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Tuesday, November 16, 2004 4:01 PM
> Subject: Re: _4c.fnm missing
> 
> 
> > 'Concurrent' and 'updates' in the same sentence sounds like a possible
> > source of the problem.  You have to use a single IndexWriter and it
> > should not overlap with an IndexReader that is doing deletes.
> >
> > Otis
> >
> > --- Luke Shannon <[EMAIL PROTECTED]> wrote:
> >
> > > It consistently breaks when I run more than 10 concurrent incremental
> > > updates.
> > >
> > > I can post the code on Bugzilla (hopefully when I get to the site it
> > > will be obvious how I can post things).
> > >
> > > Luke
> > >
> > > ----- Original Message ----- 
> > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > Sent: Tuesday, November 16, 2004 3:20 PM
> > > Subject: Re: _4c.fnm missing
> > >
> > >
> > > > Field names are stored in the field info file, with suffix .fnm; see
> > > > http://jakarta.apache.org/lucene/docs/fileformats.html
> > > >
> > > > The .fnm should be inside the .cfs file (.cfs files are compound files
> > > > that contain all index files described at the above URL).  Maybe you
> > > > can provide the code that causes this error in Bugzilla for somebody
> > > > to look at.  Does it consistently break?
> > > >
> > > > Otis
> > > >
> > > >
> > > > --- Luke Shannon <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I received the error below when I was attempting to overwhelm my
> > > > > system with incremental update requests.
> > > > >
> > > > > What is this file it is looking for? I checked the index. It
> > > > > contains:
> > > > >
> > > > > _4c.del
> > > > > _4d.cfs
> > > > > deletable
> > > > > segments
> > > > >
> > > > > Where does _4c.fnm come from?
> > > > >
> > > > > Here is the error:
> > > > >
> > > > > Unable to create the writer and/or index new content
> > > > > /usr/tomcat/fb_hub/WEB-INF/index/_4c.fnm (No such file or directory).
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Luke


