Re: Re-Indexing a moving target???
details?

Yousef Ourabi wrote:
Saad, here is what I got. I will post again and be more specific. -Y

--- Nader Henein <[EMAIL PROTECTED]> wrote:
We'll need a little more detail to help you: what are the sizes of your updates, and how often do they happen?

1) No; just re-open the IndexWriter every time you re-index. Since yours is, as you say, a moderately changing index, keep a flag on the rows and batch your indexing every so often.

2) It all comes down to your needs; more detail would help us help you.

Nader Henein

Yousef Ourabi wrote:
Hey, we are using Lucene to index a moderately changing database, and I have a couple of questions about performance strategy.

1) Should we just keep one IndexWriter open until the system comes down, or create a new IndexWriter each time we re-index our data set?

2) Does anyone have any thoughts on multi-threading and multiple segments instead of one index?

Thanks for your time and help. Best, Yousef

--
Nader S. Henein
Senior Applications Developer
Bayt.com

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Re-Indexing a moving target???
We'll need a little more detail to help you: what are the sizes of your updates, and how often do they happen?

1) No; just re-open the IndexWriter every time you re-index. Since yours is a moderately changing index, keep a flag on the rows and batch your indexing every so often.

2) It all comes down to your needs; more detail would help us help you.

Nader Henein

Yousef Ourabi wrote:
Hey, we are using Lucene to index a moderately changing database, and I have a couple of questions about performance strategy.

1) Should we just keep one IndexWriter open until the system comes down, or create a new IndexWriter each time we re-index our data set?

2) Does anyone have any thoughts on multi-threading and multiple segments instead of one index?

Thanks for your time and help. Best, Yousef

--
Nader S. Henein
Senior Applications Developer
Bayt.com
Re: QUERYPARSIN & BOOSTING
From the text on the Lucene Jakarta site: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Lucene provides the relevance level of matching documents based on the terms found. To boost a term, use the caret symbol, "^", with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be. Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for jakarta apache and you want the term "jakarta" to be more relevant, boost it using the ^ symbol along with the boost factor next to the term. You would type:

jakarta^4 apache

This will make documents with the term jakarta appear more relevant. You can also boost phrase terms, as in:

"jakarta apache"^4 "jakarta lucene"

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).

Regards,
Nader Henein

Karthik N S wrote:
Hi guys, apologies; this question may have been asked a million times on this forum, but I need some clarification. I have:

1) FieldType = keyword, name = vendor
2) FieldType = text, name = contents

Questions:
1) How do I construct a query that makes the hits available for the VENDOR appear first?
2) If boosting is to be applied, how?
3) Is the query constructed below correct?

+Contents:shoes +((vendor:nike)^10)

Please advise. Thanks in advance.
WITH WARM REGARDS, HAVE A NICE DAY
[ N.S.KARTHIK ]
Re: Advice on indexing content from a database
Hibernate + Lucene. Use Hibernate to read from your DB; it will pull out the data you need in nice, clean objects. Then loop through your object collection and create Lucene documents. You can add Quartz to the equation and have this process run on a schedule over chunks of your data until it has all been indexed, then continue on with incremental updates and deletes.

Nader Henein

[EMAIL PROTECTED] wrote:
Hi, I'm working on integrating Lucene with a CMS. All the data is stored in a database; I'm looking at about 2 million records. Any advice on an effective technique to index this (incrementally or using threads) that would not overload my server?

Thanks, Aneesha
Re: time of indexer
Download Luke; it makes life easy when you inspect the index, so you can actually look at what you've indexed, as opposed to what you may think you indexed.

Nader

Daniel Cortes wrote:
Hi everybody, and merry Christmas to all (especially the people who, like me, are "working" today instead of staying with the family). I don't understand why my search of the index gives such bad results. I index 112 PHP files as plain text, on a Pentium 4 2.4 GHz with 512 MB RAM running Windows XP and Eclipse during indexing.

Total search time: 80882 ms

The fields that I use are:

doc.add(Field.Keyword("filename", file.getCanonicalPath()));
doc.add(Field.UnStored("body", bodyText));
doc.add(Field.Text("titulo", title));

What am I doing wrong? Thanks.
Re: index question
OK, so you can index the whole document in one shot, but you should also store certain fields in the index, like the ones you display in the search results, to avoid a round trip to the DB. For example, you would store "title", "synopsis", "link", "doc_id" and "date", and then index only what you want to be searchable. The reason you would have title stored in one field and indexed again in another is that if you stem the indexed field, it becomes useless for display purposes. So the logical representation of your index would look something like this:

title: stored / indexed
synopsis: stored / un-indexed
link: stored / un-indexed
doc_id: stored / indexed
date: indexed / un-stored

Enjoy,
Nader Henein

Daniel Cortes wrote:
Thanks, Nader. I need a general search of documents; that's why I'm asking for your recommendations, because the fields are only for info in the search results. A typical search, on Google for example:

search: casa
La casa roja ... había una vez una casa roja que tenía ... htttp:\\go.to\casa ... Modification date: 25-12-04
("The red house ... once upon a time there was a red house that had ...")

To do this, which fields and options (Keyword, Text, UnIndexed, UnStored) should I use? Thanks.

Nader Henein wrote:
It comes down to your searching needs: do you need to have your documents searchable by these fields, or do you need a general search of the whole document? Your decisions will impact the size of the index and the speed of indexing and searching, so give it due thought. Start from your GUI requirements and design the index that responds to your users' needs best.

Nader

Daniel Cortes wrote:
I want to know, in the case that you use Lucene to index files as a general searcher, which fields (or keys) you use for indexing. For example, in my case the files are html, pdf, doc, ppt and txt, and I'm thinking of using fields for author, title, url, content and modification date. Anything more? Any recommendations? Thanks, and merry Christmas to all.
Re: index question
It comes down to your searching needs: do you need to have your documents searchable by these fields, or do you need a general search of the whole document? Your decisions will impact the size of the index and the speed of indexing and searching, so give it due thought. Start from your GUI requirements and design the index that responds to your users' needs best.

Nader

Daniel Cortes wrote:
I want to know, in the case that you use Lucene to index files as a general searcher, which fields (or keys) you use for indexing. For example, in my case the files are html, pdf, doc, ppt and txt, and I'm thinking of using fields for author, title, url, content and modification date. Anything more? Any recommendations? Thanks, and merry Christmas to all.
Re: MergerIndex + Searchables
As obvious as it may seem, you could always store the ID of the index in which you are indexing the document in the document itself, and have that fetched with the search results. Or is there something stopping you from doing that?

Nader Henein

Karthik N S wrote:
Hi guys, apologies. I have several merged indexes [ MGR1, MGR2, MGR3 ]. For searching across these MERGERINDEXES I use the following code:

IndexSearcher[] indexToSearch = new IndexSearcher[CNTINDXDBOOK];
for(int all=0;all
MultiSearcher searcher = new MultiSearcher(indexToSearch);

Question: during the search process, how do I display which MRG a relevant document ID originated from? (Something like: the search word 'ISBN12345' is available from "MRGx".)

WITH WARM REGARDS, HAVE A NICE DAY
[ N.S.KARTHIK ]
Re: LUCENE1.4.1 - LUCENE1.4.2 - LUCENE1.4.3 Exception
This is an OS file-system error, not a Lucene issue (and not one for this board). Google it for Gentoo specifically and you get a whole bunch of results, one of which is this thread on the Gentoo forums: http://forums.gentoo.org/viewtopic.php?t=9620

Good luck,
Nader Henein

Karthik N S wrote:
Hi guys, can somebody tell me why I am getting this exception? Please.

System specifications:
O/S: Linux Gentoo
Appserver: Apache Tomcat/4.1.24
JDK: build 1.4.2_03-b02
Lucene: 1.4.1, 2, 3

Note: this exception is displayed on every 2nd query after Tomcat is started.

java.io.IOException: Stale NFS file handle
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:142)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:137)
at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:253)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69)
at org.apache.lucene.search.Similarity.idf(Similarity.java:255)
at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47)
at org.apache.lucene.search.Query.weight(Query.java:86)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.MultiSearcherThread.run(ParallelMultiSearcher.java:251)

WITH WARM REGARDS, HAVE A NICE DAY
[ N.S.KARTHIK ]
Re: Opinions: Using Lucene as a thin database
How big do you expect it to get, and how often do you expect to update it? We've been using Lucene for about 1 M records (19 fields each) with incremental updates every 10 minutes. The performance during updates wasn't wonderful, so it took some seriously intense code to sort that out. As you mentioned, it comes down to what you need the thin DB for: Lucene is a wonderful search engine, but if I were looking for a fast and dirty relational DB, MySQL wins hands down. Put them both together and you've really got something.

My 2 cents,
Nader Henein

Kevin L. Cobb wrote:
I use Lucene as a legitimate search engine, which is cool. But I am also using it as a simple database too. I build an index with a couple of keyword fields that allows me to retrieve values based on exact matches in those fields. This is all I need to do, so it works just fine for my needs. I also love the speed; the index is small enough that it is wicked fast. I was wondering if anyone out there is doing the same, or if there are any dissenting opinions on using Lucene for this purpose.
Re: HITCOLLECTOR+SCORE+DELIMMA
Dude, and I say this with love: it's open source, you've got the code, so take the initiative, DIY, be creative, and share your findings with the rest of us. Personally I would be interested to see how you do this; keep your changes documented and share.

Nader Henein

Karthik N S wrote:
Hi Erik, apologies, I got confused with the last mail. Iterating over Hits returns a large number of hits, and iterating over Hits for scores consumes time, so how do I limit my search to scores between [ X.xf to Y.yf ] prior to getting the Hits? Note: the search is being done on a field of type Text consisting of 'contents' from various HTML documents. Please advise me.

Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 13, 2004 5:05 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMA

On Dec 13, 2004, at 1:16 AM, Karthik N S wrote:
So u say I have to Build a Filter to Collect all the Scores between the 2 Ranges [ 0.2f to 1.0f ]

My message is being misinterpreted. I said "filter" as a verb, not a noun. :) In other words, I was not intending to mean write a Filter; a Filter would not be able to filter on score.

so the API for the same would be Hits hit = search(Query query, Filter filtertoGetScore) But while writing the Filter Score again depends on Hits > Score = hits.score(x);

Again, you cannot write a Filter (capital 'F') to deal with score. Please re-read what I said below: Hits are in descending score order, so you may just want to use Hits and filter based on the score provided by hits.score(i). Iterate over Hits; when you encounter scores below your desired range, stop iterating. Why is this simple procedure not good enough for what you are trying to achieve?

Erik
Re: SEARCH CRITERIA
They probably create a list of similar results by doing some sort of data mining on the search criteria that people use in succession. Or they have a list of searches that are too general (a search for the word "kid" is, at best, stupid), but since you can't call your users stupid, you try to guess what they're searching for based on other searches conducted that contain the initial search string "kid" (kid rock, kid games, star wars kid, karate kid). You can use fuzzy search in Lucene, but that won't really do this; the short answer is DIY, depending on your needs.

My two galiuns,
Nader Henein

Karthik N S wrote:
Hi guys, apologies. On Yahoo and AltaVista, a search on a word like 'kid' returns results with a suggestion like: Also try: kid rock, kid games, star wars kid, karate kid, More... How can I obtain similar search suggestions using Lucene?

Thanks in advance. Warm regards, Karthik
Re: disadvantages
You may singe your fingers if you touch the keyboard during indexing.

Nader

Miguel Angel wrote:
What are the disadvantages of Lucene?
Re: Optimized??
The down-and-dirty answer is that it's like defragmenting your hard drive: you're basically compacting the index and sorting out index references. What you need to know is that it makes searching much faster after you've updated the index.

Nader Henein

Miguel Angel wrote:
What does an optimized index mean in Lucene?
Re: Need help with filtering
Well, if the document ID is a number (even if it isn't stored as one) you could use a range query, or just rebuild your index using that specific field as a sorted field. But if it's numeric, be aware that if you use an integer, it limits how high your numbers can get.

Nader

Edwin Tang wrote:
Hello, I have been using DateFilter to limit my search results to a certain date range. I am now asked to replace this filter with one where my search results have document IDs greater than a given document ID. This document ID is assigned during indexing and is a Keyword field.

I've browsed around the FAQs and archives and see that I can use either QueryFilter or BooleanQuery. I've tried both approaches to limit the document ID range, but am getting the BooleanQuery.TooManyClauses exception in both cases. I've also tried bumping the max number of clauses via setMaxClauseCount(), but that number has gotten pretty big. Is there another approach to this? Or am I setting this up incorrectly? A snippet of one of my approaches follows:

queryFilter = new QueryFilter(new RangeQuery(new Term("id", sLastSearchedId), null, false));
docs = searcher.search(parser.parse(sSearchPhrase), queryFilter, utility.iMaxResults, new Sort(sortFields));

Thanks in advance, Ed
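One wrinkle with the range-query suggestion: Lucene range queries compare terms lexicographically, so numeric IDs stored in a Keyword field only range correctly if they are zero-padded to a fixed width at indexing time. A minimal, Lucene-free sketch of the idea (the width of 10 and the pad() helper name are assumptions for illustration):

```java
public class PaddedId {
    // Zero-pad a numeric ID so that lexicographic term order matches
    // numeric order, which is what a range query actually compares.
    static String pad(long id) {
        return String.format("%010d", id);
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "10"; padded, the order is numeric.
        System.out.println("9".compareTo("10") > 0);       // true (wrong order)
        System.out.println(pad(9).compareTo(pad(10)) < 0); // true (right order)
        System.out.println(pad(42));                       // 0000000042
    }
}
```

If you index the padded string in the Keyword field and build the range-query terms through the same helper, both sides agree on the width and TooManyClauses-free filtering by ID range becomes feasible.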
Re: _4c.fnm missing
That's it; you need to batch your updates. It comes down to whether you need to give your users search accuracy to the second. Take your database, put an is_dirty column on the master table of the object you're indexing, run a scheduled task every x minutes that reads the objects flagged as dirty, and reset the flag once they've been indexed correctly.

My two cents,
Nader

Otis Gospodnetic wrote:
'Concurrent' and 'updates' in the same sentence sounds like a possible source of the problem. You have to use a single IndexWriter, and it should not overlap with an IndexReader that is doing deletes.

Otis

--- Luke Shannon <[EMAIL PROTECTED]> wrote:
It consistently breaks when I run more than 10 concurrent incremental updates. I can post the code on Bugzilla (hopefully when I get to the site it will be obvious how I can post things).

Luke

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 16, 2004 3:20 PM
Subject: Re: _4c.fnm missing

Field names are stored in the field info file, with suffix .fnm; see http://jakarta.apache.org/lucene/docs/fileformats.html. The .fnm should be inside the .cfs file (.cfs files are compound files that contain all the index files described at the above URL). Maybe you can provide the code that causes this error in Bugzilla for somebody to look at. Does it consistently break?

Otis

--- Luke Shannon <[EMAIL PROTECTED]> wrote:
I received the error below when I was attempting to overwhelm my system with incremental update requests. What is this file it is looking for? I checked the index. It contains:

_4c.del
_4d.cfs
deletable
segments

Where does _4c.fnm come from? Here is the error:

Unable to create the writer and/or index new content: /usr/tomcat/fb_hub/WEB-INF/index/_4c.fnm (No such file or directory).

Thanks, Luke
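The is_dirty batching described above can be sketched without any Lucene calls; the Row class and the indexing step here are hypothetical stand-ins for the master table and the single IndexWriter doing the work:

```java
import java.util.ArrayList;
import java.util.List;

public class DirtyBatch {
    // Hypothetical stand-in for a row in the master table.
    static class Row {
        final String id;
        boolean dirty;
        Row(String id) { this.id = id; this.dirty = true; }
    }

    // One scheduled run: pick up every dirty row, index it, clear the flag.
    static List<String> runBatch(List<Row> table) {
        List<String> indexed = new ArrayList<>();
        for (Row r : table) {
            if (r.dirty) {
                // Real code would delete the old document and add the new
                // one through the single IndexWriter here.
                indexed.add(r.id);
                r.dirty = false;
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("a"));
        table.add(new Row("b"));
        System.out.println(runBatch(table)); // [a, b]
        System.out.println(runBatch(table)); // []  (nothing dirty second time)
    }
}
```

The point of the pattern is that concurrency lives entirely in the database (many writers flip flags), while only one serialized batch job ever touches the IndexWriter.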
Re: _4c.fnm missing
What kind of incremental updates are you doing? We update our index every 15 minutes with 100 ~ 200 documents, writing to a 6 GB memory-resident index, with the IndexWriter running one instance at a time. So what kind of increments are we talking about? It takes a bit of doing to overwhelm Lucene. What's your update schedule, how big is the index, and after how many updates does the system crash?

Nader Henein

Luke Shannon wrote:
It consistently breaks when I run more than 10 concurrent incremental updates. I can post the code on Bugzilla (hopefully when I get to the site it will be obvious how I can post things).

Luke

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 16, 2004 3:20 PM
Subject: Re: _4c.fnm missing

Field names are stored in the field info file, with suffix .fnm; see http://jakarta.apache.org/lucene/docs/fileformats.html. The .fnm should be inside the .cfs file (.cfs files are compound files that contain all the index files described at the above URL). Maybe you can provide the code that causes this error in Bugzilla for somebody to look at. Does it consistently break?

Otis

--- Luke Shannon <[EMAIL PROTECTED]> wrote:
I received the error below when I was attempting to overwhelm my system with incremental update requests. What is this file it is looking for? I checked the index. It contains:

_4c.del
_4d.cfs
deletable
segments

Where does _4c.fnm come from? Here is the error:

Unable to create the writer and/or index new content: /usr/tomcat/fb_hub/WEB-INF/index/_4c.fnm (No such file or directory).

Thanks, Luke
Re: Backup strategies
We've recently implemented something similar, with the backup process creating a file (much like the lock files during indexing) that the IndexWriter recognizes (a tweak) so that it doesn't attempt to start an indexing run or a delete while the file is there. It wasn't that much work, actually.

Nader

Doug Cutting wrote:
Christoph Kiehl wrote:
I'm curious about your strategy for backing up indexes based on FSDirectory. If I do a file-based copy, I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state, but I'm not sure about the performance of this solution. How do you make your backups?

A safe way to back up is to have your indexing process, when it knows the index is stable (e.g., just after calling IndexWriter.close()), make a checkpoint copy of the index by running a shell command like "cp -lpr index index.YYYYMMDDHHmmSS". This is very fast and requires little disk space, since it creates only a new directory of hard links. Then you can separately back this up and subsequently remove it.

This is also a useful way to replicate indexes. On the master indexing server, periodically perform "cp -lpr" as above; then search slaves can use rsync to pull down the latest version of the index. If a very small merge factor is used (e.g., 2) then the index will have only a few segments, so that searches are fast. On the slave, periodically find the latest index.YYYYMMDDHHmmSS, use "cp -lpr index/ index.YYYYMMDDHHmmSS" and "rsync --delete master:index.YYYYMMDDHHmmSS index.YYYYMMDDHHmmSS" to efficiently get a local copy, and finally "ln -fsn index.YYYYMMDDHHmmSS index" to publish the new version of the index.

Doug
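Doug's hard-link checkpoint can be tried out with nothing but cp. In this sketch, demo_index is a placeholder for the real index path, and the fixed .backup suffix stands in for the timestamped name:

```shell
#!/bin/sh
# Checkpoint an index directory with hard links (cp -l): near-instant
# and almost no extra disk, since file contents are not duplicated.
set -e
rm -rf demo_index demo_index.backup
mkdir -p demo_index
echo "segment data" > demo_index/_1.cfs   # stand-in for a real segment file
cp -lpr demo_index demo_index.backup
ls demo_index.backup
```

Because the checkpoint shares inodes with the live index, it stays consistent only if the writer does not modify segment files in place; Lucene's write-new-then-swap segment behavior is what makes this trick safe.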
Re: How to efficiently get # of search results, per attribute
It depends on how many results they're looking through. Here are the two scenarios I see:

1) If you don't have that many records, you can fetch all the results and then do a post-parsing step to determine the totals.

2) If you have a lot of entries in each category and you're worried about fetching thousands of records every time, you can have separate indices per category and search them in parallel (not Lucene's parallel search), fetching up to 100 hits for each one (for efficiency) while still getting the total from each search to display.

Either way you can boost speed using RAMDirectory if you need more out of the search. Whichever approach you choose, I recommend you sit down and do some number crunching to figure out which way to go.

Hope this helps,
Nader Henein

Chris Lamprecht wrote:
I'd like to implement a search across several types of "entities", let's say classes, professors, and departments. I want the user to be able to enter a simple, single query and not have to specify what they're looking for. Then I want the search results to be something like this:

Search results for: "philosophy boyer"
Found: 121 classes - 5 professors - 2 departments

I know I could iterate through every hit returned and count them up myself, but that seems inefficient if there are lots of results. Is there some other way to get this kind of information from the search result set? My other ideas are doing a separate search per result type, or storing different types in different indexes. Any suggestions? Thanks for your help!

-Chris
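Scenario 1 above (fetch everything, then tally in a post-parsing step) is just a single pass over the hits. The list of category strings here is a hypothetical stand-in for reading a stored "type" field off each hit:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryTally {
    // Count search hits per category in one pass (scenario 1 above).
    static Map<String, Integer> tally(List<String> hitCategories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String c : hitCategories) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("class", "class", "professor",
                                    "class", "department");
        // First-seen order is preserved by LinkedHashMap.
        System.out.println(tally(hits)); // {class=3, professor=1, department=1}
    }
}
```

This is O(n) in the number of hits, which is exactly why Nader suggests scenario 2 once the result sets grow large: the per-index searches give you the totals without pulling every hit across.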
Re: UPDATION+MERGERINDEX
Well, if you do all the steps in one run, I guess optimizing once at the end would be faster overall, but all you have to do is test it out and time it. Performance-wise, I don't think that step 3 (OPTIMISE) in scenario (a) will really improve the performance of the new index merge.

My 2 cents,
Nader Henein

Karthik N S wrote:
Hi guys, apologies. Which is the better choice?

a)
1) SEARCH FOR SUBINDEX IN A OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) OPTIMISE THE MERGERINDEX
4) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGER INDEX
5) OPTIMISE THE MERGERINDEX

b)
1) SEARCH FOR SUBINDEX IN A OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGER INDEX
4) OPTIMISE THE MERGERINDEX

THANKS IN ADVANCE
WITH WARM REGARDS, HAVE A NICE DAY
[ N.S.KARTHIK ]
Re: commit lock, graceful handler
Graceful? No. I started a discussion on this about two years ago. What I'm doing is batched indexing, so if a crash occurs, the next time the application starts up a LuceneInit class goes and ensures that all indices have no locks on them, by simply deleting the lock file and optimizing the index. This has worked well for us in a production environment for the past two years. The next indexing run picks up the same batch and re-indexes it, which doesn't hurt the index, because every time I add a document to the index I actually delete it first, to ensure there are no repetitions. We've never had an index go corrupt on us, and we have six indices being updated in parallel, in addition to nightly backups by our hosting facility during a one-hour window in which we do no updates or deletes on the index, to ensure that the backup is kosher. It may not be as graceful as Oracle rollback tables, but it's functional and a lot less complicated.

Nader

Jackson Earnst wrote:
I'm testing fault-tolerance aspects of an application using Lucene. Consider if power is pulled from the server/workstation and it immediately shuts down hard or crashes. I'm faced with a situation of a commit.lock file existing in the temp directory. Lucene throws an exception when a writer is first created against this index: an IOException with a "Lock obtain timed out" error. Reading some docs and FAQs I see that this file could be deleted and the index will be in a usable state. Any advice/comments/thoughts? Is there a graceful way to handle this?

Thanks
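The LuceneInit step Nader describes, clearing stale locks before anything opens the index, needs only java.io. The .lock suffix check below is an assumption modeled on lock-file names like commit.lock from the era of Lucene discussed here, so verify it (and the lock directory location) against your version:

```java
import java.io.File;
import java.io.IOException;

public class LuceneInit {
    // Remove stale lock files left behind by a hard crash, before any
    // IndexWriter or IndexReader touches the directory.
    static int clearStaleLocks(File lockDir) {
        int removed = 0;
        File[] files = lockDir.listFiles();
        if (files == null) return 0;
        for (File f : files) {
            if (f.getName().endsWith(".lock") && f.delete()) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File("demo_lock_dir");
        dir.mkdirs();
        new File(dir, "commit.lock").createNewFile();
        System.out.println(clearStaleLocks(dir)); // 1
    }
}
```

Only run this at startup, before any writer or reader thread exists; deleting a lock file that a live process still holds is exactly how indexes get corrupted.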
Re: Atomicity in Lucene operations
As soon as I've cleaned up the code I'll publish it; it needs a little more documentation as well.

Nader

Roy Shan wrote:
Maybe you can contribute it to the sandbox?

On Mon, 18 Oct 2004 08:31:30 -0700 (PDT), Yonik Seeley <[EMAIL PROTECTED]> wrote:
Hi Nader, I would greatly appreciate it if you could CC me on the docs or the code. Thanks!

Yonik

--- Nader Henein <[EMAIL PROTECTED]> wrote:
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I can give you the documents, and if you still want the code I'll slap together a rough copy for you and ship it across.

Nader Henein

Roy Shan wrote:
Hello, Nader: I am very interested in how you implement the atomicity. Could you send me a copy of your code? Thanks in advance.

Roy
Re: simultanous search and indexing
You can do both at the same time; it's thread-safe. You will face different issues depending on the frequency of your indexing and the load on the search, but that shouldn't come into play until your index gets nice and heavy. So basically, code on.

Nader Henein

Miro Max wrote:
Hi, I'm using a servlet to search my index, and I wish to be able to create an index at the same time. Do I have to use threads? I'm a beginner. Thanks.
Re: Atomicity in Lucene operations
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I can give you the documents, and if you still want the code I'll slap together a rough copy for you and ship it across.

Nader Henein

Roy Shan wrote:
Hello, Nader: I am very interested in how you implement the atomicity. Could you send me a copy of your code? Thanks in advance.

Roy

On Sat, 16 Oct 2004 01:20:09 +0400, Nader Henein <[EMAIL PROTECTED]> wrote:
We use Lucene over 4 replicated indices, and we have to maintain atomicity on deletions and updates with multiple fallback points. I'll send you the write-up; it's too big to CC to the entire board.

Nader Henein

Christian Rodriguez wrote:
Hello guys, I need additions and deletions of documents to the index to be ATOMIC (they either happen to completion or not at all). On top of this, I need updates (which I currently implement as a deletion of the document followed by an addition) to be ATOMIC and DURABLE (once I return from the "update" function, it's because the operation happened to completion and stays in the index). Notice that I don't really need all the ACID properties for all the operations.

I have tried to solve the problem by using the Lucene + BDB package written by Andi Vajda and using transactions, but the BDB database gets corrupted if I insert random System.exit() calls to simulate a crash of the application before aborting or committing transactions. So I have two questions:

1. Has anyone been able to use Lucene + BDB WITH transactions, simulated random crashes at different points in the process of adding items, and found it to be robust (specifically, have you always been able to recover after a crash, with uncommitted txns rolled back and committed ones present in the DB)?

2. Can anyone suggest other solutions (besides using BDB) that may work? For example: are any of these operations already atomic in Lucene (using an FSDirectory)?

Thanks for any help you can give me! Xtian
Re: Atomicity in Lucene operations
We use Lucene over 4 replicated indices and we have to maintain atomicity on deletions and updates with multiple fallback points. I'll send you the write-up; it's too big to CC to the entire list. Nader Henein Christian Rodriguez wrote: Hello guys, I need additions and deletions of documents in the index to be ATOMIC (they either happen to completion or not at all). On top of this, I need updates (which I currently implement as a deletion of the document followed by an addition) to be ATOMIC and DURABLE (once the "update" function returns, the operation has happened to completion and stays in the index). Notice that I don't really need all the ACID properties for all operations. I have tried to solve the problem using the Lucene + BDB package written by Andi Vajda with transactions, but the BDB database gets corrupted if I insert random System.exit() calls to simulate an application crash before aborting or committing transactions. So I have two questions: 1. Has anyone been able to use Lucene + BDB WITH transactions, simulated random crashes at different points in the process of adding items, and found it to be robust (especially: have you always been able to recover after a crash, with uncommitted txns rolled back and committed ones present in the DB)? 2. Can anyone suggest other solutions (besides BDB) that may work? For example: are any of these operations already atomic in Lucene (using an FSDirectory)? Thanks for any help you can give me! Xtian
Re: Encrypted indexes
Well, are you "storing" any data for retrieval from the index? If so, you could encrypt the actual data, and then encrypt the search string, public-key style. Nader Henein Weir, Michael wrote: We need to have index files that can't be reverse-engineered, etc. An obvious approach would be to write an 'FSEncryptedDirectory' class, but that sounds like a performance killer. Does anyone have experience in making an index secure? Thanks for any help, Michael Weir
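A sketch of the "encrypt the actual data" approach Nader suggests, assuming the index only needs to return the stored field rather than search inside it: encrypt each stored-field value before it goes into the Document, decrypt on retrieval. The class and method names below are made up; it uses plain AES from `javax.crypto`, and a real deployment would want proper key management and an authenticated cipher mode.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FieldCrypto {
    // Encrypt a stored-field value; the Base64 string is what would go
    // into the Lucene Document instead of the plaintext.
    static String encrypt(SecretKey key, String plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES"); // AES/ECB/PKCS5Padding by default
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] enc = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(enc);
    }

    // Decrypt a value read back from the index at display time.
    static String decrypt(SecretKey key, String encoded) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] dec = cipher.doFinal(Base64.getDecoder().decode(encoded));
        return new String(dec, StandardCharsets.UTF_8);
    }

    static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }
}
```

Note this only protects stored data; indexed terms would still be readable unless they are hashed or encrypted as well, which is the harder half of the problem.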
Re: sorting and score ordering
As far as my testing showed, the Sort takes priority, because it's basically an opt-in sort as opposed to the default score sort. So you're basically displaying a sorted set over all your results, as opposed to sorting only the most relevant results. Hope this helps. Nader Henein Chris Fraschetti wrote: If I use a Sort instance on my searcher, what will have priority, score or Sort? Assuming I have pages with .9, .9, and .5 scores: if the .5 page has a higher 'sort' value, will it rank above one of the .9 pages?
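A quick way to see the behavior Nader describes, outside of Lucene itself: an explicit sort key replaces relevance order entirely; it does not merely break ties within it. The `Hit` class and values below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortVsScore {
    static class Hit {
        final String id;
        final double score; // relevance score, as a search engine would compute
        final long sortKey; // the opt-in sort value, e.g. a date field

        Hit(String id, double score, long sortKey) {
            this.id = id; this.score = score; this.sortKey = sortKey;
        }
    }

    // Sorting by the explicit key ignores score completely, which is why a
    // 0.5-score hit can outrank a 0.9-score hit once a Sort is supplied.
    static List<String> sortByKey(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparingLong((Hit h) -> h.sortKey).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) ids.add(h.id);
        return ids;
    }
}
```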
Re: Arabic analyzer
I'd be happy to help anyone test this out; my Arabic is pretty good. Nader Andrzej Bialecki wrote: Dawid Weiss wrote: nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter, called diacritics, that change the way you pronounce the word, which in turn changes the word's meaning; so two words spelled exactly the same way with different diacritics will mean two separate things. Just to point out the fact: most Slavic languages also use diacritic marks (above, like the 'acute' or 'dot' marks, or below, like the Polish 'ogonek' mark). Some people argue that they can be stripped off the text upon indexing and that the queries usually disambiguate the context of the word. Hmm. This brings up a question: the algorithmic stemmer package from Egothor works quite well for Polish (http://www.getopt.org/stempel); wouldn't it work well for Arabic, too? I lack the necessary expertise to evaluate results (knowing only two or three Arabic words ;-) ), but I can certainly help someone get started with testing...
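The diacritic-stripping idea mentioned above is easy to prototype in Java: decompose with `java.text.Normalizer` and drop nonspacing combining marks, which covers both Slavic accents and the Arabic harakat. (Whether stripping is linguistically appropriate for Arabic is exactly the open question in this thread; this only shows the mechanics.)

```java
import java.text.Normalizer;

public class Diacritics {
    // NFD splits precomposed letters (Polish \u0105 -> a + combining ogonek);
    // Arabic harakat are already separate combining marks. \p{Mn} then removes
    // all nonspacing marks in one pass.
    static String strip(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                         .replaceAll("\\p{Mn}", "");
    }
}
```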
Re: Arabic analyzer
There is a way of writing an Arabic stemmer; it's just not a weekend project. I've seen the translate/stem option as well, and even tried it with Lucene. We've implemented Lucene on our database, which has about a million records with 19 indexed fields (some of which are CLOBs) per record; the free-text fields are in many cases Arabic. We do not provide stemming on those simply because I couldn't find a valid stemming or translation option that held up to proper testing. Some were OK, but after collecting data from user searches (averaging 5 searches per second), the Arabic stemming options could not manage user expectations, which is what it comes down to: sometimes theory does not translate well to practice. Nader Henein Dawid Weiss wrote: nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter, called diacritics, that change the way you pronounce the word, which in turn changes the word's meaning; so two words spelled exactly the same way with different diacritics will mean two separate things. Just to point out the fact: most Slavic languages also use diacritic marks (above, like the 'acute' or 'dot' marks, or below, like the Polish 'ogonek' mark). Some people argue that they can be stripped off the text upon indexing and that the queries usually disambiguate the context of the word. It is just a digression. Now back to the Arabic stemmer -- there has to be a way of doing it. I know Vivisimo has clustering options for Arabic. They must be using a stemmer (and an English translation dictionary), although it might be a commercial one. Take a look: http://vivisimo.com/search?v:file=cnnarabic D.
Re: Arabic analyzer
I worked on trying to develop one, and it became a colossal pain. A comprehensive Arabic dictionary is about 20 volumes, roughly the size of an encyclopedia. To give you some background: when you look up a word, you first have to reduce it to its two- or three-letter root, then look for the desired word under that root. Reducing words to that root as part of stemming is useless, because words belonging to the same root more often than not have nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter, called diacritics, that change the way you pronounce a word, which in turn changes its meaning; two words spelled exactly the same way with different diacritics mean two separate things. I've seen Arabic stemmers that kind of work, but none of them are open source. This is a good paper from Berkeley that outlines the work and the challenges: http://metadata.sims.berkeley.edu/papers/trec2002.pdf. Hope it helps. Nader Henein Scott Smith wrote: Is anyone aware of an open source (non-GPL, i.e., free for commercial use) Arabic analyzer for Lucene? Does Arabic really require a stemmer as well? (Some of the reading I've seen on the web would suggest that a stemmer is almost a necessity with Arabic to get anything useful, whereas it is not with other languages.) Scott
Re: Moving from a single server to a cluster
It'd be a pleasure; I just didn't want to mislead someone down the wrong path. Give me a few days and I'll have the new version up. Nader
Re: Moving from a single server to a cluster
Hey Ben, We've been using a distributed environment with three servers and three separate indices for the past two years, since the first stable Lucene release, and it has been great. For the past two months I've been working on a redesign of our Lucene app, and I've shared my findings and plans with Otis, Doug, and Erik; they pointed out a few faults in my logic, which you will probably come across soon enough, mainly to do with keeping your updates atomic (not too hard) and your deletes atomic (a little more tricky). Give me a few days and I'll send you both the early document and the newer version that deals squarely with Lucene in a distributed environment with a high-volume index. Regards, Nader Henein Ben Sinclair wrote: My application currently uses Lucene with an index living on the filesystem, and it works fine. I'm moving to a clustered environment soon and need to figure out how to keep my indexes together. Since the index is on the filesystem, each machine in the cluster will end up with a different index. I looked into JDBC Directory, but it's not tested under Oracle and doesn't seem like a very mature project. What are other people doing to solve this problem?
Re: Most efficient way to index 14M documents (out of memory/file handles)
Here's the thread you want: http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1722573 Nader Henein Kevin A. Burton wrote: I'm trying to burn an index of 14M documents. I have two problems. 1. I have to run optimize() every 50k documents or I run out of file handles. This takes TIME, and of course is linear in the size of the index, so it just gets slower as the index grows; it starts to crawl at about 3M documents. 2. I will eventually run out of memory in this configuration. I KNOW this has been covered before, but for the life of me I can't find it in the archives, the FAQ, or the wiki. I'm using an IndexWriter with a mergeFactor of 5k and then optimizing every 50k documents. Does it make sense to just create a new IndexWriter for every 50k docs and then do one big optimize() at the end? Kevin
Re: incrementally indexing a million documents
How are your documents named? Is it alphabetical or numerical? Mine were numerical, so I created n directories like so: 11, 12, 13, 14, 19, 21, 22, 23 .. 99 -- you get the idea -- and I stored each file into the directory it belonged to based on the last two digits of the file name. (You could use file size to shuffle the files around as well, i.e. use the two rightmost digits of the file size in bytes.) At this point you'll have shuffled your million docs into 100 directories, and Lucene can then spider through each directory, indexing, say, 5000 files at a time and then deleting them or moving them to another location. If you get to 100 million files, simply up the precision to a three- or four-digit setup (once you automate it, the sky's the limit). Hope this helps. Nader Henein Michael Wechner wrote: I try to index around a million documents. The problem is that I run out of memory during sorting by uid when I go through the directory recursively. Well, I could add more memory, but this wouldn't really solve my problem, because at some point I will always run out of memory (e.g. 10 million documents). Is there another approach than sorting by uid? Thanks Michi
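The bucketing scheme above is trivial to code. This is a hypothetical helper (names made up) that maps a numerically named file to its two-digit bucket directory:

```java
public class Buckets {
    // "123456.txt" -> bucket "56": the last two digits of the numeric name
    // spread files roughly evenly across the 100 directories 00-99.
    static String bucketFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot >= 0 ? fileName.substring(0, dot) : fileName;
        return base.length() <= 2 ? base : base.substring(base.length() - 2);
    }
}
```

An indexer can then walk one bucket at a time, keeping each batch small enough to sort in memory.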
Re: Devnagari Search?
Have faith in the Unicode standard; it's well thought out. If you have any internationalization queries, there was an excellent article in JavaWorld entitled "End-to-end internationalization"; here's the link: http://www.javaworld.com/javaworld/jw-05-2004/jw-0524-i18n_p.html. Have a read; it helps clear up some myths. Nader Henein
RE: Disappearing segments
Could you share your indexing code? And just to make sure, is there anything running on your machine that could delete these files, like a cron job that backs up the index? You could go by process of elimination: shut down your server and see if the files still disappear, because if the problem is contained within the server, you know you can safely go on the DEBUG rampage. Nader

-Original Message- From: Kelvin Tan [mailto:[EMAIL PROTECTED] Sent: Friday, April 30, 2004 9:15 AM To: Lucene Users List Subject: Re: Disappearing segments

An update: Daniel Naber suggested using IndexWriter.setUseCompoundFile() to see if it happens with the compound index format. Before I had a chance to try it out, this happened:

java.io.FileNotFoundException: C:\index\segments (The system cannot find the file specified)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.(RandomAccessFile.java:200)
    at org.apache.lucene.store.FSInputStream$Descriptor.(FSDirectory.java:321)
    at org.apache.lucene.store.FSInputStream.(FSDirectory.java:329)
    at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:71)
    at org.apache.lucene.index.IndexWriter$1.doBody(IndexWriter.java:154)
    at org.apache.lucene.store.Lock$With.run(Lock.java:116)
    at org.apache.lucene.index.IndexWriter.(IndexWriter.java:149)
    at org.apache.lucene.index.IndexWriter.(IndexWriter.java:131)

so even the segments file somehow got deleted. Hoping someone can shed some light on this... Kelvin

On Thu, 29 Apr 2004 11:45:36 +0800, Kelvin Tan said:
> Errr, sorry for the cross-post to lucene-dev as well, but I realized
> this mail really belongs on lucene-user...
> I've been experiencing intermittent disappearing segments which result
> in the following stacktrace:
>
> Caused by: java.io.FileNotFoundException: C:\index\_1ae.fnm (The
> system cannot find the file specified)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.(RandomAccessFile.java:200)
>     at org.apache.lucene.store.FSInputStream$Descriptor.(FSDirectory.java:321)
>     at org.apache.lucene.store.FSInputStream.(FSDirectory.java:329)
>     at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
>     at org.apache.lucene.index.FieldInfos.(FieldInfos.java:78)
>     at org.apache.lucene.index.SegmentReader.(SegmentReader.java:104)
>     at org.apache.lucene.index.SegmentReader.(SegmentReader.java:95)
>     at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:112)
>     at org.apache.lucene.store.Lock$With.run(Lock.java:116)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
>     at org.apache.lucene.search.IndexSearcher.(IndexSearcher.java:75)
>
> The segment that disappears (_1ae.fnm) varies.
>
> I can't seem to reproduce this error consistently, so I don't have a
> clue what might cause it, but it usually happens after the application
> has been running for some time. Has anyone experienced something
> similar, or can anyone point me in the right direction?
>
> When this occurs, I need to rebuild the entire index for it to be
> usable. Very troubling indeed...
>
> Kelvin
RE: read only file system
I hate to speak after Otis, but the way we deal with this is by clearing locks on server restart, in case a server crash occurred mid-indexing, and we also optimize on server restart. It doesn't happen often (God bless Resin), but when it has, we've faced no problems from Lucene. Just for the record, we have a validate function that LuceneInit calls; it looks something like this:

try {
    Directory directory = FSDirectory.getDirectory(indexPath, false);
    if (directory.list().length == 0) clear();
    Lock writeLock = directory.makeLock(writeFileName);
    if (!writeLock.obtain()) {
        // couldn't obtain the lock: assume it's stale and force-remove it
        IndexReader.unlock(directory);
    } else {
        writeLock.release();
    }
} catch (IOException e) {
    logger.error("Index Validate", e);
}

Nader

-Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, April 30, 2004 4:09 PM To: Lucene Users List; [EMAIL PROTECTED] Subject: Re: read only file system

If you have a very recent Lucene, you can disable locks with command-line parameters. I believe a page describing the various command-line parameters is on Lucene's Wiki. Otis

--- Supun Edirisinghe <[EMAIL PROTECTED]> wrote:
> I think I'm a little confused about how an index is put into use on a
> read-only file system
>
> I'm using Lucene in my web application. Our indexes are built off our
> database nightly and copied to our web app servers.
>
> I think our web app dies from time to time, and sometimes a lock is
> left behind by Lucene in /tmp/.
>
> I have read that there is a disableLuceneLocks system property (is that
> the full name, or is it something like
> org.apache.jakarta...disableLuceneLocks?). But I'm still not sure how
> I can set that. Do I give it as a command-line arg to the java VM?
> thanks
Re: Multi-Threading
Why do you have concurrency problems? Are you trying to have each user initiate the indexing himself? Because that will create issues. How about putting all the new files you want to index in a directory, and then having a scheduled procedure on the webserver run the Lucene indexer on that directory? Our application hasn't had any concurrency problems at all, because we index based on a pull system rather than the user pushing documents to the indexer. I hope I understood your problem correctly, so that the answer is useful. Nader

On Tue, 19 Aug 2003 12:55:09 +0200, Damien Lust wrote:
>
> Hello,
>
> I developed a client-server application on the web, with a search
> module using Lucene. In the same application, users can index new
> text.
>
> So multiple sessions can access the index, and concurrency problems
> are possible.
>
> I used threads in Java. Is this the best solution?
>
> I call:
>
> IndexFiles indexFiles = new IndexFiles();
> indexFiles.run();
>
> Here is an extract of my code.
>
> Thanks.
>
> public class IndexFiles extends Thread {
>     public IndexFiles() {
>     }
>
>     public void run() {
>         SynchronizedIndexWriter.insertDocument(currentIndexDocument(),
>                 "tmp/IndexPath", new MainAnalyser());
>     }
> }
>
> public class SynchronizedIndexWriter {
>
>     static synchronized void insertDocument(IndexDocument document,
>             String indexLocValue, Analyzer analyzerValue) {
>         File f = new File(indexLocValue);
>         if (f.exists())
>             addDocumentToIndex(document, indexLocValue, analyzerValue, false);
>         else
>             addDocumentToIndex(document, indexLocValue, analyzerValue, true);
>     }
>
>     static synchronized void addDocumentToIndex(IndexDocument document,
>             String indexLocValue, Analyzer analyzerValue, boolean createNewIndex) {
>         try {
>             IndexWriter indexWriter = new IndexWriter(indexLocValue,
>                     analyzerValue, createNewIndex);
>             indexWriter.addDocument(document.getDocument());
>             indexWriter.optimize();
>             indexWriter.close();
>         } catch (IOException io) {
>             // If IndexWriter can't write to the index because it's locked,
>             // recall the function => it's not very safe
>             addDocumentToIndex(document, indexLocValue, analyzerValue, createNewIndex);
>         } catch (Exception e) {
>
>         }
>     }
> }
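A safer shape for the pull-based pattern Nader recommends: funnel every index mutation through one worker thread, so only a single writer ever touches the index and no retry-on-lock hack is needed. The sketch below is pure `java.util.concurrent`; the `indexed` list is a stand-in for the real IndexWriter.addDocument call, and the class name is made up.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IndexQueue {
    // One thread owns the (conceptual) IndexWriter, so web sessions never
    // contend for the index lock -- they just enqueue work.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final List<String> indexed = new CopyOnWriteArrayList<>();

    public void submit(String docId) {
        writer.submit(() -> indexed.add(docId)); // placeholder for addDocument
    }

    public List<String> indexedDocs() {
        return indexed;
    }

    public void shutdown() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Usage: each web session calls `submit()` and returns immediately; the single worker drains the queue in order, which is exactly the serialization the `synchronized` methods above were trying to achieve, without blocking request threads.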