Re: Index Locking Issues Resolved...I hope
I was thinking that I could pre-stem words, perhaps using the Lucene stemming code, before storing them in a search field in the database, and then use the Natural Language Search found in MySql 4.1.1. I am confident the MySql product can't keep up with Lucene yet, but at least they have improved it some. I'm not even sure my hosting company will upgrade to 4.1.1, though. I am still looking for solutions to make Lucene stay in sync more nicely with MySql as the main database...aka an easy to use way of handling

- Original Message - From: Chris Lamprecht [EMAIL PROTECTED] Date: Wednesday, November 17, 2004 1:38 am Subject: Re: Index Locking Issues Resolved...I hope

MySQL does offer a basic fulltext search (with MyISAM tables), but it doesn't really approach the functionality of Lucene, such as pluggable tokenizers, stemming, etc. I think MS SQL Server has fulltext search as well, but I have no idea if it's any good. See http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=mysql+fulltext

I have not seen it clearly yet because it is all new. I wish a database Text field could have this sort of mechanism built into it. MySql does not do this (what I am using), but I am going to check into other databases now. OJB will work with most all of them so that would help if there is a database type of solution that will allow that sleep at night thing to happen!!!

To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
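The pre-stemming idea could look roughly like the sketch below. The same stemmer runs over the document text before it is written to the search column, and over the user's query before it is bound into MATCH ... AGAINST. That way both sides reduce words to the same stems. This is a minimal sketch under stated assumptions. The toy suffix-stripping `stem` method is a stand-in for Lucene's Porter stemmer, not a port of it. The table and column names (`documents`, `stemmed_body`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy pre-stemming helper. ASSUMPTION: a stand-in for Lucene's Porter
// stemmer, NOT a port of it; it only strips a few English suffixes.
class PreStemmer {

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ing") && w.length() > 5) {
            return w.substring(0, w.length() - 3);
        }
        if (w.endsWith("ed") && w.length() > 4) {
            return w.substring(0, w.length() - 2);
        }
        if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Split on anything that is not a letter or digit, stem each token and
    // join with single spaces so the result can be stored in a TEXT column.
    static String stemText(String text) {
        List<String> stems = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                stems.add(stem(token));
            }
        }
        return String.join(" ", stems);
    }

    // Hypothetical table/column names. The column needs a FULLTEXT index
    // (a MyISAM table in MySQL 4.1). With no modifier, MATCH ... AGAINST runs
    // a natural-language search. Bind stemText(userQuery) as the parameter.
    static final String SEARCH_SQL =
        "SELECT id FROM documents WHERE MATCH(stemmed_body) AGAINST (?)";

    public static void main(String[] args) {
        System.out.println(stemText("Indexing large documents")); // index large document
        System.out.println(stemText("indexed document"));         // index document
    }
}
```

One thing to watch with this design is that MySQL skips words shorter than `ft_min_word_len`, which defaults to 4. Aggressive stemming can therefore push short stems out of the full-text index entirely.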
Index Locking Issues Resolved...I hope
Hello;

I think I have solved my locking issues. I just made it through the set of test cases that previously resulted in index locking errors. I have also removed the method from my code that checked for an index lock and forcefully removed it after 1 minute. Hopefully it never needs to be put back in.

Here is what I changed: I moved all my indexer logic into a class called Index.java that implements Runnable. Index's start() calls a method named go(), which is static and synchronized. go() kicks off all the logic to update the index (the reader, writer and other members involved with incremental updates are also static). I put logging in place that records when a thread has executed the method and what the thread's name is. Every time a client class changes the content, it creates a thread, passes it the runnable Index, and starts it. The convention I have requested for naming the thread is a toString() of the current date.

How it worked: a few users just tested the system. Half added documents while the other half deleted documents at the same time. No locking issues were seen, and the index was current with the changes a short time after the last operation (with my previous code, this test resulted in an index locking issue). I was able to go through the log file and find the start of the synchronized go() method and the successful completion of the indexing operations for every request made.

The only performance issue I noticed was with very large PDFs. If someone added one, it took a while before the thread handling the request could finish. If this is the first of many operations, the operations following the large file take that much longer. Luckily for me, search results don't need to be instant.

Things are looking much better. For now... Thanks to all who have helped me up till now.
Luke

- Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 16, 2004 4:01 PM Subject: Re: _4c.fnm missing

'Concurrent' and 'updates' in the same sentence sounds like a possible source of the problem. You have to use a single IndexWriter, and it should not overlap with an IndexReader that is doing deletes.

Otis

--- Luke Shannon [EMAIL PROTECTED] wrote:

It consistently breaks when I run more than 10 concurrent incremental updates. I can post the code on Bugzilla (hopefully when I get to the site it will be obvious how I can post things).

Luke

- Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 16, 2004 3:20 PM Subject: Re: _4c.fnm missing

Field names are stored in the field info file, with suffix .fnm - see http://jakarta.apache.org/lucene/docs/fileformats.html

The .fnm should be inside the .cfs file (cfs files are compound files that contain all the index files described at the above URL). Maybe you can provide the code that causes this error in Bugzilla for somebody to look at. Does it consistently break?

Otis

--- Luke Shannon [EMAIL PROTECTED] wrote:

I received the error below when I was attempting to overwhelm my system with incremental update requests. What is this file it is looking for? I checked the index. It contains: _4c.del, _4d.cfs, deletable, segments. Where does _4c.fnm come from? Here is the error:

Unable to create the create the writer and/or index new content /usr/tomcat/fb_hub/WEB-INF/index/_4c.fnm (No such file or directory).

Thanks, Luke
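A minimal sketch of the approach Luke describes follows. `Index` implements Runnable, and every thread funnels into one static synchronized `go()`, so only one thread at a time touches the index. Inside `go()`, deletes are applied and the reader is closed before any adds start. That keeps to Otis's rule that an IndexWriter must not overlap an IndexReader doing deletes.

The sketch makes several assumptions. It uses `run()`, the Runnable entry point that `Thread.start()` invokes. The `go()` parameters, the `LOG` list and `submit()` are hypothetical. The Lucene calls are replaced by log entries so it compiles without the Lucene jar. In Lucene 1.4 the delete would go through `IndexReader.delete(Term)` and the add through `IndexWriter.addDocument(Document)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the "one static synchronized go()" pattern. ASSUMPTION: the
// Lucene reader/writer work is replaced by hypothetical log entries.
class Index implements Runnable {

    // Static, like the shared reader/writer members: one log for all threads.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());
    private static final AtomicInteger active = new AtomicInteger();
    private static final AtomicInteger maxActive = new AtomicInteger();

    private final String deleteId; // hypothetical: document to delete, or null
    private final String addId;    // hypothetical: document to add, or null

    Index(String deleteId, String addId) {
        this.deleteId = deleteId;
        this.addId = addId;
    }

    @Override
    public void run() {
        go(deleteId, addId);
    }

    // static synchronized locks on Index.class, so at most one update runs at
    // a time no matter how many threads or Index instances exist.
    static synchronized void go(String deleteId, String addId) {
        maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            String name = Thread.currentThread().getName();
            // Phase 1: deletes through the reader, then close the reader
            // before any writer is opened.
            if (deleteId != null) {
                LOG.add(name + " reader delete " + deleteId);
            }
            LOG.add(name + " reader close");
            // Phase 2: adds through a single writer, then close it.
            if (addId != null) {
                LOG.add(name + " writer add " + addId);
            }
            LOG.add(name + " writer close");
        } finally {
            active.decrementAndGet();
        }
    }

    static int maxConcurrentUpdates() {
        return maxActive.get();
    }

    // What a client does after changing content: start a thread whose name
    // is the current date, following the naming convention described above.
    static Thread submit(String deleteId, String addId) {
        Thread t = new Thread(new Index(deleteId, addId), new Date().toString());
        t.start();
        return t;
    }
}
```

`Date.toString()` only has one-second resolution, so threads started within the same second share a name. Appending a counter would keep the log lines distinguishable. The single class-level lock is also why one large PDF delays every update queued behind it.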
Re: Index Locking Issues Resolved...I hope
Very cool, Luke. I am not quite there yet. I am halfway through implementing the queue approach, but I have hit walls that are making me sit back and figure out my strategy. I have a Struts/Tomcat/OJB/MySQL project that could potentially hold a million records, growing over time, with perhaps 100,000 updates/day. That is not today, but it is what I am building for. My concerns are not just Lucene itself, but its surrounding effects, as follows. I am finding that edge-case scenarios are making things difficult because there are two databases instead of one.

- How to know the index on this huge database is always in sync.
- What happens if the server crashes or is brought down. (Solution might be a DB last-modified date.)
- How to back up the database and the index efficiently and safely on a live system.
- How to reindex while the system is in place. (Solution might be building a new index in a different location with a separate tool.)
- How to handle the fact that the IndexWriter is not very good with incremental data in a high-volume update/query system. (Solution might be to query the database every 45 seconds or so for records that have changed and apply the changes.)
- How the IndexWriter solution above might frequently cause bad lag on queries. (No solution.)
- How to get Tomcat to start a thread to run this updater at startup without causing memory management problems.
- How to make this all work in my startup business so I can sleep at night.

In general, things got much more complicated than I was hoping, though I don't know how I can do without Lucene or something like it. This has been done so many times before that I would have expected it to be easy, but I have not seen it clearly yet because it is all new. I wish a database Text field could have this sort of mechanism built into it. MySql (what I am using) does not do this, but I am going to check into other databases now.
OJB works with almost all of them, so that would help if there is a database-type solution that will allow that sleep-at-night thing to happen!!! I have found some answers on the mailing list, but not really a concept of how to manage the whole thing, so any input on these points is welcome. Is there a big open source project out there that uses Lucene with a database and incremental updates? I don't think so. If you have any code or ideas, I would appreciate both!!! Also, a FAQ that covers lots of these common problems might really help people choose to use Lucene, even though they are a bit off topic.

Thanks, JohnE

- Original Message - From: Luke Shannon [EMAIL PROTECTED] Date: Tuesday, November 16, 2004 10:51 pm Subject: Index Locking Issues Resolved...I hope

Hello; I think I have solved my locking issues. I just made it through the set of test cases that previously resulted in Index Locking Errors. I just removed the method from my code that checks for an Index lock and forcefully removes it after 1 minute. Hopefully they never need to be put back in. Here is what I changed: I moved all my Indexer logic into a class called Index.java that implemented Runnable. Index's start() called a method named go() which was static and synchronized. go() kicks off all the logic to update the index (the reader, writer and other members involved with incremental updates also static). I put logging in place that logs when a thread has executed the method and what the thread's name is. Every time a client class changes the content it can create a thread reference and pass it the runnable Index. The convention I have requested for naming the thread is a toString() of the current date. Then they start the thread. How it worked: A few users just tested the system, half added documents to the system while another half deleted documents at the same time.
No locking issues were seen and the index was current with the changes made a short time after the last operation (in my previous code this test resulted in an issue with index locking). I was able to go through the log file and find the start of the synchronized go() method and the successful completion of the indexing operations for every request made. The only performance issue I noticed was if someone added a very large PDF it took a while before the thread handling the request could finish. If this is the first operation of many it means the operations following this large file take that much longer. Luckily for me search results don't need to be instant. Things are looking much better. For now... Thanks to all that helped me up till now. Luke

- Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 16, 2004 4:01 PM Subject: Re: _4c.fnm missing

'Concurrent' and 'updates' in the same
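JohnE's idea of querying the database every 45 seconds or so for changed records can be sketched as below. A last-modified date covers the crash question. This is a minimal sketch under assumptions. The `Row` shape, the in-memory list standing in for a `SELECT ... WHERE last_modified > ?` query, and the `applyToIndex` hook are hypothetical. Only `ScheduledExecutorService` and the other JDK classes are real API. Persisting the watermark is what would let the updater resume after a crash, because rows changed while the server was down carry a newer last-modified time than the saved value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch: every N seconds, find rows whose last-modified time is newer than
// the saved watermark, push them to the index, then advance the watermark.
class ChangePoller {

    // Hypothetical row shape; in practice it would come from a JDBC query such as
    // SELECT id, last_modified FROM documents WHERE last_modified > ?
    static final class Row {
        final long id;
        final long lastModified; // epoch millis
        Row(long id, long lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    private final List<Row> source;                 // stand-in for the database table
    private final Consumer<List<Row>> applyToIndex; // hypothetical index-update hook
    private long watermark;                         // persist this to survive restarts
    private ScheduledExecutorService scheduler;

    ChangePoller(List<Row> source, Consumer<List<Row>> applyToIndex, long savedWatermark) {
        this.source = source;
        this.applyToIndex = applyToIndex;
        this.watermark = savedWatermark;
    }

    // One pass. The watermark moves only after applyToIndex returns, so if it
    // is saved at that point, a crash mid-pass re-applies rows instead of skipping them.
    synchronized int pollOnce() {
        List<Row> changed = new ArrayList<>();
        long newest = watermark;
        for (Row r : source) {
            if (r.lastModified > watermark) {
                changed.add(r);
                newest = Math.max(newest, r.lastModified);
            }
        }
        if (!changed.isEmpty()) {
            applyToIndex.accept(changed);
            watermark = newest;
        }
        return changed.size();
    }

    synchronized long watermark() {
        return watermark;
    }

    // A single daemon thread polls with a fixed delay between passes.
    synchronized void start(long delaySeconds) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "index-poller");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report and continue.
                System.err.println("index poll failed: " + e);
            }
        }, 0, delaySeconds, TimeUnit.SECONDS);
    }

    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdown();
        }
    }
}
```

In a Tomcat webapp, `start(45)` and `stop()` could be called from a `ServletContextListener`'s `contextInitialized` and `contextDestroyed` (not shown), so the thread's lifetime follows the webapp. A strict `>` on timestamps can miss a row that commits late with an older timestamp. Re-reading a small overlap window and making index updates idempotent (delete-then-add by id) is one way to hedge against that.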
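JohnE's idea of building a new index in a different location usually ends with a swap. The fresh index is built off to the side, and searchers are then pointed at it in one step. The sketch below assumes searchers look up the current index directory through a shared `AtomicReference` each time they open a searcher. `IndexLocation` and the `buildIndex` callback (standing in for a full reindex from the database) are hypothetical. The old directory is returned rather than deleted, because searchers may still have it open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Sketch: rebuild into a fresh directory, then publish it with one atomic swap.
class IndexLocation {
    private final AtomicReference<Path> current;

    IndexLocation(Path initial) {
        current = new AtomicReference<>(initial);
    }

    // Searchers call this every time they open a new searcher.
    Path current() {
        return current.get();
    }

    // If buildIndex throws, the live directory is left untouched.
    // Returns the previous directory so the caller can delete it later.
    Path rebuild(Path parent, Consumer<Path> buildIndex) {
        try {
            Path fresh = Files.createTempDirectory(parent, "index-");
            buildIndex.accept(fresh); // hypothetical full reindex from the database
            return current.getAndSet(fresh);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Changes that reach the database while the rebuild is running still have to be replayed into the new index. One option is to take a last-modified date when the rebuild starts and run incremental updates from that point once the swap is done.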
Re: Index Locking Issues Resolved...I hope
MySQL does offer a basic fulltext search (with MyISAM tables), but it doesn't really approach the functionality of Lucene, such as pluggable tokenizers, stemming, etc. I think MS SQL Server has fulltext search as well, but I have no idea if it's any good. See http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=mysql+fulltext

I have not seen it clearly yet because it is all new. I wish a database Text field could have this sort of mechanism built into it. MySql does not do this (what I am using), but I am going to check into other databases now. OJB will work with most all of them so that would help if there is a database type of solution that will allow that sleep at night thing to happen!!!