Re: Index Locking Issues Resolved...I hope

2004-11-17 Thread jeichels

I was thinking that perhaps I could pre-stem words before storing them in a 
search field in the database, possibly using Lucene's stemming code, and then 
try the Natural Language Search found in MySql 4.1.1.  I am confident the 
MySql product can't keep up with Lucene yet, but at least they have improved it 
some.  I'm not even sure my hosting company will upgrade to 4.1.1, though.  I'm 
still looking for ways to keep Lucene in synch more nicely with 
MySql as the main database...aka an easy to use way of handling 
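The pre-stemming idea only works if the exact same stemming is applied both to the text written into the search column and to the user's query before it is passed to MATCH ... AGAINST. The sketch below shows that symmetry. It is a minimal illustration under stated assumptions: the suffix stripper is a deliberately crude toy standing in for a real stemmer (such as Lucene's Porter stemmer), and the table name `documents` and column `stemmed_body` are hypothetical.

```java
import java.util.Locale;

public class PreStem {
    // Toy suffix stripper (NOT Porter): stands in for a real stemmer so the sketch is self-contained.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // keep at least three characters of stem so short words survive unchanged
            if (w.length() >= suffix.length() + 3 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Split on non-alphanumerics and stem each token; used for BOTH stored text and queries.
    static String stemText(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(stem(token));
        }
        return out.toString();
    }

    // Hypothetical table/column; the bound parameter is stemText(userQuery).
    static final String SEARCH_SQL =
        "SELECT id FROM documents WHERE MATCH(stemmed_body) AGAINST (?)";

    public static void main(String[] args) {
        System.out.println(stemText("Indexing locked files"));
    }
}
```

One caveat worth checking on the MySQL side: MyISAM full-text indexing ignores words shorter than `ft_min_word_len` (4 by default), so short stems may silently drop out of the index.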



- Original Message -
From: Chris Lamprecht <[EMAIL PROTECTED]>
Date: Wednesday, November 17, 2004 1:38 am
Subject: Re: Index Locking Issues Resolved...I hope

> MySQL does offer a basic fulltext search (with MyISAM tables), but it
> doesn't really approach the functionality of Lucene, such as pluggable
> tokenizers, stemming, etc.  I think MS SQL server has fulltext search
> as well, but I have no idea if it's any good.
> 
> See http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=mysql+fulltext
> 
> > I have not seen clear yet because it is all new.   I wish a database Text 
> > field could have this sort of mechanism built into it.   MySql does not do 
> > this (what I am using), but I am going to check into other databases now.  
> > OJB will work with most all of them so that would help if there is a database 
> > type of solution that will allow that sleep at night thing to happen!!!
> >
> 
> ---
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 





Re: Index Locking Issues Resolved...I hope

2004-11-16 Thread Chris Lamprecht
MySQL does offer a basic fulltext search (with MyISAM tables), but it
doesn't really approach the functionality of Lucene, such as pluggable
tokenizers, stemming, etc.  I think MS SQL server has fulltext search
as well, but I have no idea if it's any good.

See http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=mysql+fulltext

> I have not seen clear yet because it is all new.   I wish a database Text 
> field could have this sort of mechanism built into it.   MySql does not do 
> this (what I am using), but I am going to check into other databases now.  
> OJB will work with most all of them so that would help if there is a database 
> type of solution that will allow that sleep at night thing to happen!!!
>




Re: Index Locking Issues Resolved...I hope

2004-11-16 Thread jeichels
Very cool, Luke.  I am not quite there yet.  I am halfway through implementing 
the queue approach, but I have hit walls that are making me sit back and rethink 
my strategy.  I have a struts/tomcat/ojb/mysql project that could eventually 
hold a million records, growing over time, with perhaps 100,000 updates per day.  
That is not the load today, but it is what I am building for.
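The message doesn't spell out what "the queue approach" is. One common reading: request threads only enqueue "document X changed" notices, and a single background thread drains the queue in batches, so only one thread ever touches the IndexWriter and each writer session covers many updates instead of one. A minimal sketch of that reading, with `IndexSink` as a hypothetical hook for whatever code owns the single IndexWriter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IndexUpdateQueue {
    /** Hypothetical hook: the code that owns the single IndexWriter applies one batch. */
    public interface IndexSink {
        void applyBatch(List<String> docIds);
    }

    // Sentinel compared by identity; new String() guarantees no caller can submit this exact object.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    public IndexUpdateQueue(IndexSink sink, int maxBatch) {
        worker = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            boolean stop = false;
            try {
                while (!stop) {
                    String next = queue.take();          // block until at least one request arrives
                    while (next != null) {
                        if (next == STOP) {              // identity check on purpose
                            stop = true;
                            break;
                        }
                        batch.add(next);
                        if (batch.size() >= maxBatch) {
                            break;
                        }
                        next = queue.poll();             // drain requests that are already waiting
                    }
                    if (!batch.isEmpty()) {
                        sink.applyBatch(new ArrayList<>(batch)); // one writer session per batch
                        batch.clear();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "index-updater");
        worker.start();
    }

    /** Called by request threads (e.g. after the DB commit); returns immediately. */
    public void submit(String docId) {
        queue.add(docId);
    }

    /** Lets everything queued before this call drain, then stops the worker. */
    public void shutdown() throws InterruptedException {
        queue.add(STOP);
        worker.join();
    }
}
```

Batching is the point of the design: at 100,000 updates/day, paying the writer's open/close cost once per batch rather than once per update is what keeps queries from lagging. With Lucene 1.4-era APIs, deletes went through IndexReader, so a real `applyBatch` would do its deletes and adds in separate, non-overlapping phases.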

My concerns are not just about Lucene itself, but about its surrounding effects, 
as follows.  I am finding that edge-case scenarios make things difficult because 
there are two databases instead of one.

- How to know that the index for this huge database is always in synch.
- What happens if the server crashes or is brought down.
- How to back up the database and the index efficiently and safely on a live 
system.
- How to reindex, as a separate tool, while the system stays in place.
- How to handle the fact that the IndexWriter is not very good at incremental 
updates in a high-volume update/query system.
- How the IndexWriter solution above might frequently cause bad lag on queries.
- How to get Tomcat to start a thread that runs this updater at startup 
without causing memory-management problems.
- How to make this all work in my startup business so that I can sleep at 
night.
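For the "always in synch" and "reindex as a separate tool" concerns, one possible approach is a periodic reconciliation pass that compares the database against the index and reports what is missing, stale, or orphaned. The sketch assumes (hypothetically) that every row carries a last-modified stamp and that the same stamp is stored in a field of each Lucene document; loading the two id-to-stamp maps is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexReconciler {
    /** What the comparison found, with ids in sorted order. */
    public static final class Diff {
        public final List<String> toReindex = new ArrayList<>();
        public final List<String> toDelete = new ArrayList<>();
    }

    /**
     * @param dbVersions    id -> last-modified stamp read from the database (hypothetical column)
     * @param indexVersions id -> stamp stored in each Lucene document (hypothetical field)
     */
    public static Diff diff(Map<String, Long> dbVersions, Map<String, Long> indexVersions) {
        Diff d = new Diff();
        for (Map.Entry<String, Long> e : new TreeMap<>(dbVersions).entrySet()) {
            Long indexed = indexVersions.get(e.getKey());
            if (indexed == null || !indexed.equals(e.getValue())) {
                d.toReindex.add(e.getKey());   // missing from the index, or indexed from an older version
            }
        }
        for (String id : new TreeMap<>(indexVersions).keySet()) {
            if (!dbVersions.containsKey(id)) {
                d.toDelete.add(id);            // row no longer exists in the database
            }
        }
        return d;
    }
}
```

The same diff serves after a crash: run it at startup and feed `toReindex` and `toDelete` into the normal update path. For a table of a million rows you would more likely stream both sides sorted by id and merge them, but the comparison rule stays the same.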


In general, things just got much more complicated than I was hoping, though 
I don't see how I can do without Lucene or something like it.  This 
has been done so many times before that I expected it to be 
easy, but I have not seen clear yet because it is all new.   I wish a database 
Text field could have this sort of mechanism built into it.   MySql does not do 
this (what I am using), but I am going to check into other databases now.  OJB 
will work with most all of them so that would help if there is a database type 
of solution that will allow that sleep at night thing to happen!!!

If you have input on any of these, please share it.  I have found some answers 
on the mailing list, but not an overall concept of how to manage the whole 
thing.  Is there a big open source project out there that uses Lucene with a 
database and incremental updates?  I don't think so.

If you have any code or ideas, I would appreciate both!!!  Also, a FAQ that 
covers these common problems, even if they are a bit off topic, might really 
help people decide to use Lucene.

Thanks,

JohnE
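Luke's fix, in the message quoted below, serializes every index update through one static synchronized method. A minimal sketch of that pattern follows. `Index` and `go()` are the names from his message; the in-memory `LOG` list (standing in for his log file) and the placeholder body are hypothetical. Because `Index` implements Runnable, the sketch has `run()` call `go()`; it is `Thread.start()` that ends up invoking `run()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

public class Index implements Runnable {
    // Hypothetical stand-in for the log file: records thread names entering and leaving go().
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void run() {
        go(); // every update thread funnels into the single locked method
    }

    // static synchronized locks on Index.class, so at most one thread in the JVM
    // is inside go() at a time, however many Index objects or threads exist.
    static synchronized void go() {
        String name = Thread.currentThread().getName();
        LOG.add("start " + name);
        // Placeholder: open the shared (static) reader/writer, apply the pending
        // deletes and adds, and close them before the lock is released.
        LOG.add("done " + name);
    }

    // The client convention from the message: name the thread after the current date, then start it.
    static Thread startUpdate() {
        Thread t = new Thread(new Index(), new Date().toString());
        t.start();
        return t;
    }
}
```

Two properties of this design explain what Luke observed. A thread holding the lock for a large PDF makes every later thread wait, and Java's intrinsic locks are not fair, so waiting updates are not guaranteed to run in request order. Also, `Date.toString()` has one-second resolution, so two updates started in the same second get identical thread names in the log.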




- Original Message -
From: Luke Shannon <[EMAIL PROTECTED]>
Date: Tuesday, November 16, 2004 10:51 pm
Subject: Index Locking Issues Resolved...I hope

> Hello;
> 
> I think I have solved my locking issues. I just made it through the set of
> test cases that previously resulted in Index Locking Errors. I just removed
> the method from my code that checks for an Index lock and forcefully removes
> it after 1 minute. Hopefully it never needs to be put back in.
> 
> Here is what I changed:
> 
> I moved all my Indexer logic into a class called Index.java that implemented
> Runnable. Index's start() called a method named go(), which was static and
> synchronized. go() kicks off all the logic to update the index (the reader,
> writer and other members involved with incremental updates are also static).
> I put logging in place that logs when a thread has executed the method and
> what the thread's name is.
> 
> Every time a client class changes the content, it can create a thread and
> pass it the Runnable Index. The convention I have requested for naming the
> thread is a toString() of the current date. Then they start the thread.
> 
> How it worked:
> 
> A few users just tested the system: half added documents while the other
> half deleted documents at the same time. No locking issues were seen, and
> the index was current with the changes a short time after the last
> operation (in my previous code, this test resulted in an index locking issue).
> 
> I was able to go through the log file and find the start of the synchronized
> go() method and the successful completion of the indexing operations for
> every request made.
> 
> The only performance issue I noticed was that if someone added a very large
> PDF, it took a while before the thread handling the request could finish. If
> this is the first of many operations, the operations following this large
> file take that much longer. Luckily for me, search results don't need to be
> instant.
> 
> Things are looking much better. For now...
> 
> Thanks to all that helped me up till now.
> 
> Luke
> 
> - Original Message - 
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Tuesday, November 16, 2004 4:01 PM
> Subject: Re: _4c.fnm missing
> 
> 
> > 'Concurrent' and 'updates' in the same sentence sounds like a possible
> > source of the problem.  You have to use a single IndexWriter and it
> > should not overlap with an IndexReader that is