Updating Lucene index from two different threads in a web application
Hi,
I have a web application that uses Lucene for its company search functionality. When a registered user adds a new company, it is saved to the database and also added to the Lucene company search index in real time.
When adding a company to the Lucene index, how do I handle the case of two or more logged-in users posting a new company at the same time? Will both companies get indexed without file lock or lock timeout issues? I would appreciate help with code as well.
Thanks.
-- View this message in context: http://www.nabble.com/Updating-Lucene-index-from-two-different-threads-in-a-web-application-tp25231264p25231264.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
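A possible way to approach the question above, offered as a sketch rather than an answer from this thread: Lucene's IndexWriter is safe to call from several threads, and only one writer can hold an index's write lock at a time, so a web application usually keeps one shared writer per index instead of opening a new one per request. The example below only illustrates that single-shared-instance pattern. The class `CompanyIndex` and its methods are hypothetical, and an in-memory list stands in for the real writer so the code runs without Lucene.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical process-wide index holder: every request thread uses the same
// instance, mirroring the "one writer per index" rule.
final class CompanyIndex {
    private static final CompanyIndex INSTANCE = new CompanyIndex();
    // Stand-in for the real writer; a thread-safe list keeps the sketch runnable.
    private final List<String> documents = new CopyOnWriteArrayList<>();

    private CompanyIndex() {}

    static CompanyIndex getInstance() { return INSTANCE; }

    // In a real application this is where the shared writer would add a document.
    void addCompany(String name) { documents.add(name); }

    int size() { return documents.size(); }
}

class ConcurrentAddDemo {
    // Simulates two logged-in users posting a company at the same time and
    // returns the number of entries held by the shared index afterwards.
    static int postConcurrently(String first, String second) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> CompanyIndex.getInstance().addCompany(first));
        pool.execute(() -> CompanyIndex.getInstance().addCompany(second));
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return CompanyIndex.getInstance().size();
    }

    public static void main(String[] args) {
        System.out.println("Indexed companies: " + postConcurrently("Acme", "Globex"));
    }
}
```

Lock timeouts typically appear when a second writer is opened on an index that already has one, which is the situation the shared instance is meant to avoid.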
How to know when Lucene Index generation process is completed
Hi,
I have a batch job that generates Lucene search indexes every night. I first fetch all the records from the database, add them to the Lucene index using IndexWriter's AddDocument method, and then call Optimize before returning from the method. Since the number of records fetched is fairly large, indexing takes around 2-3 minutes to complete.
As you already know, Lucene writes intermediate segment files while it is generating the index and compresses the whole index into 3 files when Optimize is called. Is there any way to know that Lucene has finished generating the index and that the index is available for search? I need to know this because I want to call another method when the process is completed.
-- View this message in context: http://www.nabble.com/How-to-know-when-Lucene-Index-generation-process-is-completed-tp24175423p24175423.html
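A hedged note on the question above: as far as I know, IndexWriter calls such as AddDocument, Optimize and Close block until their work is done, so the index is complete at the point where the batch method returns after closing the writer. Under that assumption, "knowing when it is finished" reduces to running the follow-up step after the method returns. The sketch below shows one way to chain such a step. `NightlyIndexJob`, `buildIndex` and `buildThen` are hypothetical names, and the loop is only a placeholder for the real indexing work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

class NightlyIndexJob {
    // Hypothetical placeholder for the batch method: the real job would add
    // every record, optimize, and close the writer before returning.
    static int buildIndex(int recordCount) {
        int indexed = 0;
        for (int i = 0; i < recordCount; i++) {
            indexed++; // stand-in for adding one document
        }
        return indexed;
    }

    // Runs the build on a background thread and invokes the callback only
    // after buildIndex has returned.
    static CompletableFuture<Integer> buildThen(int recordCount, Runnable onComplete) {
        return CompletableFuture
                .supplyAsync(() -> buildIndex(recordCount))
                .thenApply(count -> {
                    onComplete.run();
                    return count;
                });
    }

    public static void main(String[] args) {
        AtomicBoolean notified = new AtomicBoolean(false);
        int count = buildThen(1000, () -> notified.set(true)).join();
        System.out.println("Indexed " + count + " records, notified=" + notified.get());
    }
}
```

If the batch job is already a single synchronous method, simply calling the next method on the line after it achieves the same thing without any threading.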
Synchronizing Lucene indexes across 2 application servers
I have a web application that uses Lucene for its search functionality. Search requests are served by web services running on 2 application servers (IIS 7), which are load balanced using NetScaler. Each server runs a nightly batch job that updates the search indexes on that server.
I need to synchronize the search indexes on these 2 servers so that both have up-to-date indexes at any point in time. What would be the best architecture or design strategy for this, given that either of the 2 application servers could be serving a search request depending on its availability?
Any inputs, please? Thanks for reading!
-- View this message in context: http://www.nabble.com/Synchronizing-Lucene-indexes-across-2-application-servers-tp24086961p24086961.html
Re: Problem using Lucene RangeQuery
Thanks for your message. Yes, I was able to get this working!
Danil ŢORIN wrote:
Lucene stores and searches STRINGS, so the range [0..2] may return 0, 1, 101, ..., 109, 11, 110, ..., 119, 12, ..., 2.
Prefix and normalize your numbers, like 001, 002, ..., 011, 012, ..., 113, etc. If you will have bigger numbers, put in more 0's.
All of this and much more is documented on the wiki, in the javadocs and so on; please read them first.
-- View this message in context: http://www.nabble.com/Problem-using-Lucene-RangeQuery-tp22839692p22997951.html
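To make the reply above concrete, here is a small sketch that needs no Lucene at all: it compares values as strings, which is what a term-based range query effectively does, first without and then with zero padding. The class and method names and the padding width of 5 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PaddedRangeDemo {
    // Left-pads a non-negative number with zeros to a fixed width so that
    // string order matches numeric order.
    static String pad(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Inclusive range check using plain string comparison.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Returns the amounts that fall inside [min, max] when compared as strings,
    // either as raw decimal text or as zero-padded text.
    static List<Integer> matches(int[] amounts, int min, int max, boolean padded) {
        List<Integer> hits = new ArrayList<>();
        String lower = padded ? pad(min, 5) : Integer.toString(min);
        String upper = padded ? pad(max, 5) : Integer.toString(max);
        for (int amount : amounts) {
            String term = padded ? pad(amount, 5) : Integer.toString(amount);
            if (inRange(term, lower, upper)) {
                hits.add(amount);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] amounts = {0, 1, 2, 11, 105, 3};
        System.out.println("unpadded: " + matches(amounts, 0, 2, false)); // [0, 1, 2, 11, 105]
        System.out.println("padded:   " + matches(amounts, 0, 2, true));  // [0, 1, 2]
    }
}
```

The same padded form has to be used both when the field is indexed and when the range bounds are built, otherwise the comparison is again between strings of different shapes.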
Problem using Lucene RangeQuery
I'm using RangeQuery to get all the documents that have an amount between, say, 0 and 2. When I execute the query, Lucene also gives me documents that have an amount greater than 2. What am I missing here?
Here is my code:
    Term lowerTerm = new Term("amount", minAmount);
    Term upperTerm = new Term("amount", maxAmount);
    RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);
    finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);
and here is what goes into my index:
    doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
Thanks.
-- View this message in context: http://www.nabble.com/Problem-using-Lucene-RangeQuery-tp22839692p22839692.html
Reading document in Lucene
My indexed document in Lucene has multiple cities assigned to it, i.e.:
    doc.Add(new Field("city", city1.Trim(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.Add(new Field("city", city2.Trim(), Field.Store.YES, Field.Index.TOKENIZED));
and so on. How do I iterate through them and read the values after executing the Lucene search query?
Thanks
-- View this message in context: http://www.nabble.com/Reading-document-in-Lucene-tp22795893p22795893.html
Lucene analyzer and dots
Is there any way to make the Lucene analyzer not ignore dots in a string? For example, if my search criterion is A.B.C.D, Lucene should return only those documents that contain A.B.C.D and not ABCD.
-- View this message in context: http://www.nabble.com/Lucene-analyzer-and-dots-tp22795889p22795889.html
Using Highlighter for highlighting Phrase query
I am using this version of the Lucene Highlighter.Net API. I want a phrase to be highlighted only when ALL of its words are present in the search result, but I am not able to do so. For example, if my search string is "Leading telecom company", the API still highlights "telecom" in a result even when that result does not contain the words "leading" and "company".
Here is the code I'm using:
    SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
    var appData = (string)AppDomain.CurrentDomain.GetData("DataDirectory");
    var folderpath = System.IO.Path.Combine(appData, "MyFolder");
    indexReader = IndexReader.Open(folderpath);
    Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(finalQuery.Rewrite(indexReader)));
    highlighter.SetTextFragmenter(new SimpleFragmenter(800));
    int maxNumFragmentsRequired = 5;
    string highlightedText = string.Empty;
    TokenStream tokenStream = this._analyzer.TokenStream(fieldName, new System.IO.StringReader(fieldText));
    highlightedText = highlighter.GetBestFragments(tokenStream, fieldText, maxNumFragmentsRequired, "...");
    return highlightedText;
-- View this message in context: http://www.nabble.com/Using-Highlighter-for-highlighting-Phrase-query-tp22560334p22560334.html
Using MultiFieldQueryParser
Hi,
I am working on a book search API using Lucene. A user can search for a book whose title or description field contains "C.F.A.". I am using Lucene's MultiFieldQueryParser, but after parsing it removes the dots from the string. What am I missing here?
Thanks.
-- View this message in context: http://www.nabble.com/Using-MultiFieldQueryParser-tp22562134p22562134.html
Deleting and updating documents in Lucene index
Hi,
I am using Lucene.Net dll version 2.0.0.4. It looks like its IndexWriter class does not have DeleteDocument and UpdateDocument methods. Am I missing something here? How do I achieve delete and update functionality in this version of the dll?
The version 2.1 Lucene dll seems to support deleting and updating documents:
    public virtual void DeleteDocuments(Term term);
    public virtual void UpdateDocument(Term term, Document doc);
For updating a document, shall I use a delete followed by an insert, or a direct update command?
The following URL has the source code for version 2.1, but I would have to download all the files one by one and then build a dll out of them:
https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_1_0/
Can I download the latest Lucene dll and Highlighter from some site?
Thanks.
-- View this message in context: http://www.nabble.com/Deleting-and-updating-documents-in-Lucene-index-tp22449134p22449134.html
Re: Sorting lucene search results
Hi, thanks for your answer. I tried using TopFieldDocCollector, but I got an error saying the value is too small or too large when I passed 5000 as the numHits argument. Please suggest a valid value to pass.
Thanks.
Anshum wrote:
Hi Mitu,
Could we have usage/implementation based questions at the user forum? It would help keep things segregated :).
About your problem though, I wouldn't know about the .NET port. You could (in Java Lucene) use:
    public TopFieldDocCollector(IndexReader reader, Sort sort, int numHits)
i.e.:
    mySearcher.search(query, TopFieldDocCollector(IndexReader reader, Sort sort, int numHits), true);
Perhaps you could try doing something of this sort. It should work, as I tried something like this successfully a long time ago!
-- Anshum
-- View this message in context: http://www.nabble.com/Sorting-lucene-search-results-tp21759077p21766377.html
Sorting lucene search results
Hi,
I'm using the following code to execute a search query in Lucene.Net:
    var collector = new GroupingHitCollector(searcher.GetIndexReader());
    searcher.Search(myQuery, collector);
    resultsCount = collector.Hits.Count;
How do I sort these search results based on a field? I need to use a collector object (instead of using hits), and I don't see any overloaded Searcher.Search method that takes a collector object and also sorts on a field.
Thanks.
-- View this message in context: http://www.nabble.com/Sorting-lucene-search-results-tp21759077p21759077.html
RE: Bubbling up newer records
Thanks Steven! Appreciate your help!
Regards,
Ed
Steven A Rowe wrote:
Hi Ed,
Here's an example, based on the code from http://wiki.apache.org/lucene-java/TheBasics (UNTESTED):
    public class LuceneFreshnessTest {
      public static void main(String[] args) throws IOException {
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        // 1 day since epoch = "1"
        Field daysField = new Field("days", "1", Field.Store.NO, Field.Index.ANALYZED);
        daysField.setOmitNorms(true);
        doc.add(daysField);
        writer.addDocument(doc);
        doc = new Document();
        doc.add(new Field("id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
        // 3 days since epoch = "1 1 1"
        daysField = new Field("days", "1 1 1", Field.Store.NO, Field.Index.ANALYZED);
        daysField.setOmitNorms(true);
        doc.add(daysField);
        writer.addDocument(doc);
        doc = new Document();
        doc.add(new Field("id", "3", Field.Store.YES, Field.Index.NOT_ANALYZED));
        // 5 days since epoch = "1 1 1 1 1"
        daysField = new Field("days", "1 1 1 1 1", Field.Store.NO, Field.Index.ANALYZED);
        daysField.setOmitNorms(true);
        doc.add(daysField);
        writer.addDocument(doc);
        writer.close();
        IndexSearcher searcher = new IndexSearcher(directory);
        Query query = new TermQuery(new Term("days", "1"));
        TopDocs rs = searcher.search(query, null, 10);
        System.out.println("Total hits: " + rs.totalHits);
        Document firstHit = searcher.doc(rs.scoreDocs[0].doc);
        System.out.println("First hit ID (newest=3): " + firstHit.getField("id").toString());
      }
    }
Steve
On 01/15/2009 at 10:41 PM, mitu2009 wrote:
Hi,
Thanks for your suggestions! I am new to Lucene and would appreciate it if you could elaborate on your following point with an example:
Add a separate field, say "days", in which you will put as many "1" as many days elapsed since the epoch (not necessarily since 1 Jan 1970 - pick a date that makes sense for you). Then, if you want to prioritize newer documents, just add "+days:1" to your query. Voila - the final results are a sum of other score factors plus a score factor that is higher for more recent documents, containing more "1"-s.
Thanks again!
Ed
Steven A Rowe wrote:
On 01/14/2009 at 10:44 PM, mitu2009 wrote:
Is it possible to bubble up newer records in Lucene search results? i.e., I want Lucene to give a higher score to records which are closer to today's date.
In addition to the fine ideas given by previous posters, Andrzej Bialecki has described a technique that uses term frequency alone to affect the score, from http://www.gossamer-threads.com/lists/lucene/java-user/43457:
Here's the trick that works for me, without the issues of boost resolution or FunctionQuery. Add a separate field, say "days", in which you will put as many "1" as many days elapsed since the epoch (not necessarily since 1 Jan 1970 - pick a date that makes sense for you). Then, if you want to prioritize newer documents, just add "+days:1" to your query. Voila - the final results are a sum of other score factors plus a score factor that is higher for more recent documents, containing more "1"-s. If you are dealing with large time spans, you can split this into years and days-in-a-year, and apply query boosts, like "+years:1^10.0 +days:1^0.02". Do some experiments and find what works best for you.
As noted in a later thread discussing this issue, http://www.gossamer-threads.com/lists/lucene/java-user/64482, you should turn norms off on the "days" field: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/document/Fieldable.html#setOmitNorms(boolean)
Steve
-- View this message in context: http://www.nabble.com/Bubbling-up-newer-records-tp21470766p21504735.html
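The example quoted above hard-codes the "days" values. As a small complement, here is a sketch of how such a value might be generated from a document's date; it uses only the JDK. The class name `FreshnessField`, the method `daysValue`, and the epoch of 1 Jan 2009 are assumptions made for illustration, not part of the thread.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class FreshnessField {
    // Builds the value for the "days" field: one "1" token per day elapsed
    // between the chosen epoch and the document's date, separated by spaces.
    static String daysValue(LocalDate epoch, LocalDate documentDate) {
        long days = ChronoUnit.DAYS.between(epoch, documentDate);
        if (days < 0) {
            throw new IllegalArgumentException("document date precedes epoch");
        }
        StringBuilder sb = new StringBuilder();
        for (long i = 0; i < days; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append('1');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LocalDate epoch = LocalDate.of(2009, 1, 1); // assumed epoch; pick your own
        System.out.println(daysValue(epoch, LocalDate.of(2009, 1, 4))); // prints "1 1 1"
    }
}
```

Because the field grows by one token per day, a recent epoch keeps the values short; the years and days-in-a-year split mentioned in the quoted message serves the same purpose for long time spans.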
RE: Bubbling up newer records
I am using Lucene.Net version 2.0.4 and am not able to find Field.Index.ANALYZED in the code.
Thanks,
Ed
-- View this message in context: http://www.nabble.com/Bubbling-up-newer-records-tp21470766p21508712.html
Lucene index updation and performance
I am working on a job portal site and have been using Lucene for the job search functionality. Users will be posting a number of jobs on our site on a daily basis. We need to make sure that a newly posted job is searchable on the site as soon as possible.
In this context, how do I update the Lucene index when a new job is posted or when an existing job is edited? Can Lucene index updating and searching work in parallel?
Also, are there any tips or best practices with respect to Lucene indexing, optimizing, performance, etc.?
Appreciate your help! Thanks!
-- View this message in context: http://www.nabble.com/Lucene-index-updation-and-performance-tp21491992p21491992.html
RE: Bubbling up newer records
Hi,
Thanks for your suggestions! I am new to Lucene and would appreciate it if you could elaborate on your following point with an example:
Add a separate field, say "days", in which you will put as many "1" as many days elapsed since the epoch (not necessarily since 1 Jan 1970 - pick a date that makes sense for you). Then, if you want to prioritize newer documents, just add "+days:1" to your query. Voila - the final results are a sum of other score factors plus a score factor that is higher for more recent documents, containing more "1"-s.
Thanks again!
Ed
-- View this message in context: http://www.nabble.com/Bubbling-up-newer-records-tp21470766p21492085.html
Maximum boost factor
Does anyone know the maximum boost factor value for a field in Lucene?
Thanks!
-- View this message in context: http://www.nabble.com/Maximum-boost-factor-tp21492116p21492116.html
Bubbling up newer records
Hi,
Is it possible to bubble up newer records in Lucene search results? i.e., I want Lucene to give a higher score to records which are closer to today's date.
-- View this message in context: http://www.nabble.com/Bubbling-up-newer-records-tp21470766p21470766.html