Very cool, Luke. I am not quite there yet. I am halfway through implementing
the queue approach, but I have hit walls that are making me sit back and figure
out my strategy. I have a Struts/Tomcat/OJB/MySQL project that could eventually
hold a million records, growing over time, with perhaps 100,000 updates per
day. That is not today's load, but it is what I am building for.
My concerns are not just Lucene itself but its surrounding effects, as follows. I
am finding that edge-case scenarios make things difficult because there are
two databases (the database and the index) instead of one.
- How to know the index on this huge database is always in sync.
- What happens if the server crashes or is brought down.
- How to back up the database and the index in an efficient, safe
manner on a live system.
- How to reindex with a separate tool while the system is in place.
- How to handle the fact that the IndexWriter is not very good at
incremental updates in a high-volume update/query system.
- How the IndexWriter solution above might frequently cause bad lag on
queries.
- How to get Tomcat to start a thread that runs this updater at startup
without memory management problems.
- How to make this all work in my startup business so that I feel I
can sleep at night.
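
For what it is worth, the queue approach mentioned above can be sketched roughly as follows. This is only a minimal sketch under my own assumptions, not a definitive implementation: IndexUpdateQueue, IndexSink and maxBatch are hypothetical names, the sink stands in for whatever code would actually open the index and apply the changes, and it assumes a JDK with java.util.concurrent and lambdas (Java 8 or later). The idea is that request threads only enqueue record ids after the database commit, and a single background thread applies them to the index in batches, so there is never more than one writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one background thread owns all index writes. */
class IndexUpdateQueue {
    /** Placeholder for the code that really touches the index. */
    interface IndexSink {
        void apply(List<String> recordIds) throws Exception;
    }

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final IndexSink sink;
    private final int maxBatch;
    private final Thread worker;
    private volatile boolean running = true;

    IndexUpdateQueue(IndexSink sink, int maxBatch) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be >= 1");
        }
        this.sink = sink;
        this.maxBatch = maxBatch;
        this.worker = new Thread(this::drainLoop, "index-updater");
        this.worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    /** Called by request threads; returns immediately, never waits on indexing. */
    void enqueue(String recordId) {
        pending.add(recordId);
    }

    /** Ask the worker to stop once the queue is empty, then wait for it. */
    void shutdown() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void drainLoop() {
        while (running || !pending.isEmpty()) {
            try {
                String first = pending.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing queued; re-check the running flag
                }
                List<String> batch = new ArrayList<>();
                batch.add(first);
                pending.drainTo(batch, maxBatch - 1); // group updates into one batch
                sink.apply(batch);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                // Real code would log this and mark the records for a later retry.
                System.err.println("index update failed: " + e);
            }
        }
    }
}
```

Batching is the design choice here: opening and closing the index once per batch rather than once per record is what I would hope keeps the lag on queries down, though I have not measured it. It does nothing by itself for the crash and backup questions, since the queue lives only in memory.
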
In general, things just got much more complicated than I was hoping for, though
I don't know how I can do without Lucene or something like it. This
has been done so many times before that I would have expected it to be
easy, but I have not seen it clearly yet because it is all new. I wish a database
text field could have this sort of mechanism built into it. MySQL (what I am
using) does not do this, but I am going to check into other databases now. OJB
works with most of them, so that would help if there is a database type
of solution that will allow that sleep-at-night thing to happen!
If you have input on these things, I would welcome it. I have found some answers in
the mailing list, but not really a concept of how to manage the whole thing. Is
there a big open source project out there that uses Lucene and a database with
incremental updates? I don't think so.
If you have any code or ideas, I would appreciate both! Also, a FAQ
that handles these common problems, though they are a bit off topic,
might really help people choose to use Lucene.
Thanks,
JohnE
- Original Message -
From: Luke Shannon <[EMAIL PROTECTED]>
Date: Tuesday, November 16, 2004 10:51 pm
Subject: Index Locking Issues Resolved...I hope
> Hello;
>
> I think I have solved my locking issues. I just made it through the set of
> test cases that previously resulted in index locking errors. I just removed
> the method from my code that checks for an index lock and forcefully removes
> it after 1 minute. Hopefully it never needs to be put back in.
>
> Here is what I changed:
>
> I moved all my indexer logic into a class called Index.java that implemented
> Runnable. Index's start() called a method named go(), which was static and
> synchronized. go() kicks off all the logic to update the index (the reader,
> writer and other members involved with incremental updates are also static). I
> put logging in place that records when a thread has executed the method and
> what the thread's name is.
>
> Every time a client class changes the content, it can create a thread
> reference and pass it the runnable Index. The convention I have requested
> for naming the thread is a toString() of the current date. Then they start
> the thread.
>
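
The pattern described above can be sketched roughly like this. It is a hypothetical reconstruction, not Luke's actual code: it assumes the entry point is run() (Runnable defines run(), not start()), the counters and the log list are stand-ins I added so the serialization is visible, and submit() is a made-up helper showing the date-based thread naming. The actual reader/writer work is left as a comment.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical reconstruction of the static synchronized go() pattern. */
class Index implements Runnable {
    // Stand-ins for the shared static state; they record who ran and in what order.
    static final List<String> log = new ArrayList<>();
    static int activeUpdates = 0;
    static int maxConcurrentUpdates = 0;

    @Override
    public void run() {
        go();
    }

    /** static + synchronized locks on Index.class, so one thread updates at a time. */
    static synchronized void go() {
        activeUpdates++;
        maxConcurrentUpdates = Math.max(maxConcurrentUpdates, activeUpdates);
        String name = Thread.currentThread().getName();
        log.add("start " + name);
        // ... open the reader/writer, apply the incremental update, close them ...
        log.add("done " + name);
        activeUpdates--;
    }

    /** What a client class would do after changing content (hypothetical helper). */
    static Thread submit() {
        Thread t = new Thread(new Index(), new Date().toString());
        t.start();
        return t;
    }
}
```

Because go() holds the class-wide lock for the whole update, a slow update makes every later thread wait its turn, which matches the large-PDF delay mentioned further down.
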
> How it worked:
>
> A few users just tested the system: half added documents to the system while
> the other half deleted documents at the same time. No locking issues were
> seen, and the index was current with the changes a short time after the last
> operation (in my previous code this test resulted in an issue with
> index locking).
>
> I was able to go through the log file and find the start of the synchronized
> go() method and the successful completion of the indexing operations for
> every request made.
>
> The only performance issue I noticed was that if someone added a very large
> PDF, it took a while before the thread handling the request could finish. If
> this is the first operation of many, the operations following this large
> file take that much longer. Luckily for me, search results don't need to be
> instant.
>
> Things are looking much better. For now...
>
> Thanks to all that helped me up till now.
>
> Luke
>
> - Original Message -
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Tuesday, November 16, 2004 4:01 PM
> Subject: Re: _4c.fnm missing
>
>
> > 'Concurrent' and 'updates' in the same sentence sounds like a possible
> > source of the problem. You have to use a single IndexWriter and it
> > should not overlap with an IndexReader that is