upgrading to lucene 5.5.5
I'm trying to upgrade from Lucene 4.10.4 to Lucene 5.5.5 and I'm having trouble converting some of my code. Here's the function I'm having trouble with:

    public Query rewrite(IndexReader reader) throws IOException {
        WildcardQuery wildQuery = new WildcardQuery(term);
        wildQuery.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        Query q = wildQuery.rewrite(reader);
        if (q instanceof BooleanQuery || q instanceof WildcardQuery
                || q instanceof PrefixQuery || q instanceof ConstantScoreQuery) {
        } else {
            return new SpanTermQuery(term);
        }
        BooleanQuery bq = (BooleanQuery) q;
        List<BooleanClause> clauses = bq.clauses();
        SpanQuery[] sqs = new SpanQuery[clauses.size()];
        for (int i = 0; i < clauses.size(); i++) {
            BooleanClause clause = clauses.get(i);
            TermQuery tq = (TermQuery) clause.getQuery();
            sqs[i] = new SpanTermQuery(tq.getTerm());
            sqs[i].setBoost(tq.getBoost());
        }
        SpanOrQuery query = new SpanOrQuery(sqs);
        query.setBoost(wildQuery.getBoost());
        return query;
    }

The problem is that clause.getQuery() doesn't return a TermQuery anymore; it returns a BoostQuery. How would I get it to return a TermQuery? Or how would I get the term from a BoostQuery?

Thanks for your help.

Thanks,
Chris Salem
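One possible approach, assuming Lucene 5.3 or later (where BoostQuery exposes getQuery() and getBoost()), is to unwrap each clause before casting it. The sketch below is only an illustration: Query, TermQuery, BoostQuery and the Unwrapped holder are small stand-in classes defined here so the example compiles on its own, and they are not the Lucene types. With Lucene on the classpath the same loop would presumably be applied to org.apache.lucene.search.BoostQuery.

```java
// Stand-in types (NOT the Lucene classes); they mirror only the accessors used below.
interface Query {}

final class TermQuery implements Query {
    private final String term;
    TermQuery(String term) { this.term = term; }
    String getTerm() { return term; }
}

final class BoostQuery implements Query {
    private final Query query;
    private final float boost;
    BoostQuery(Query query, float boost) { this.query = query; this.boost = boost; }
    Query getQuery() { return query; }
    float getBoost() { return boost; }
}

final class BoostUnwrapSketch {
    /** Hypothetical result holder: the innermost query plus the accumulated boost. */
    static final class Unwrapped {
        final Query query;
        final float boost;
        Unwrapped(Query query, float boost) { this.query = query; this.boost = boost; }
    }

    /** Peels off any BoostQuery wrappers, multiplying their boosts together. */
    static Unwrapped unwrap(Query q) {
        float boost = 1.0f;
        while (q instanceof BoostQuery) {
            BoostQuery bq = (BoostQuery) q;
            boost *= bq.getBoost();
            q = bq.getQuery();
        }
        return new Unwrapped(q, boost);
    }
}
```

In the loop from the question, the unwrapped query would then be the one cast to TermQuery, and the accumulated boost the value passed on to the SpanTermQuery.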
RE: escaping characters
Thanks! That worked. We recently upgraded from 2.9 to 4.9; was true the default in 2.9?

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, August 11, 2014 5:54 PM
To: java-user@lucene.apache.org
Subject: Re: escaping characters

You need to manually enable automatic generation of phrase queries. It defaults to disabled, which simply treats the sub-terms as individual terms subject to the default operator. See:
http://lucene.apache.org/core/4_9_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)

-- Jack Krupansky

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
escaping characters
Hi everyone, I'm trying to escape special characters and it doesn't seem to be working. If I do a search like resume_text: (LS\/MS) it searches for LS AND MS instead of LS/MS. How would I escape the slash so it searches for LS/MS? Thanks
RE: escaping characters
I'm not using Solr. Here's my code:

    FSDirectory fsd = FSDirectory.open(new File("C:\\indexes\\Lucene4"));
    IndexReader reader = DirectoryReader.open(fsd);
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_9, getStopWords());
    BooleanQuery.setMaxClauseCount(10);
    QueryParser qptemp = new QueryParser(Version.LUCENE_4_9, "resume_text", analyzer);
    qptemp.setAllowLeadingWildcard(true);
    qptemp.setDefaultOperator(QueryParser.AND_OPERATOR);
    Query querytemp = qptemp.parse("resume_text: (LS\\/MS)");
    System.out.println(querytemp.toString());
    TopFieldCollector tfcollector = TopFieldCollector.create(new Sort(), 20, false, true, false, true);
    ScoreDoc[] hits;
    searcher.search(querytemp, tfcollector);
    hits = tfcollector.topDocs().scoreDocs;
    long resultCount = tfcollector.getTotalHits();
    reader.close();

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, August 11, 2014 12:27 PM
To: java-user
Subject: Re: escaping characters

Take a look at the admin/analysis page for the field in question. The next bit of critical information is adding debug=query to the URL. The former will tell you what happens to the input stream at query and index time; the latter will tell you how the query got through the query parsing process. My guess is that you have WordDelimiterFilterFactory in your analysis chain and that's breaking things up.

Best,
Erick
Re: ComplexPhraseQueryParser with multiple fields
That seems to work. Thank you!

Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
ch...@mainsequence.net
P: 440.946.5214 ext 5458
F: 440.856.0312

This email and any files transmitted with it may contain confidential information intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. Please note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the company. Finally, the recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

Main Sequence Technologies, Inc.
4420 Sherwin Rd.
Willoughby OH 44094
www.pcrecruiter.net

- Original Message -
To: java-user@lucene.apache.org java-user@lucene.apache.org, Chris Salem ch...@mainsequence.net
From: Ahmet Arslan iori...@yahoo.com
Sent: 5/2/2011 6:42:37 AM
Subject: Re: ComplexPhraseQueryParser with multiple fields

There is "Lucene-1486 non default field.patch" for that, but it requires the following (quoted from the issue): "Fixing this would require changing the package name of ComplexPhraseQueryParser or changing the visibility of field in the QueryParser base class to protected. Anyone have any strong feelings about which of these is the most acceptable?"

https://issues.apache.org/jira/browse/LUCENE-1486

(The following links were included with this email:)
http://www.pcrecruiter.net/
http://www.pcrecruiter.net/support.htm
mailto:ch...@mainsequence.net
ComplexPhraseQueryParser with multiple fields
Hi,

I've just started using the ComplexPhraseQueryParser and it works great with one field, but is there a way for it to work with multiple fields? For example, right now the query:

    job_title: "sales man*" AND NOT contact_name: "Chris Salem"

throws this exception:

    Caused by: org.apache.lucene.queryParser.ParseException: Cannot have clause for field "job_title" nested in phrase for field "contact_name"

What is the best way to work around this?

Sincerely,
Chris Salem
custom scorer
Hello,

I'm trying to write a custom scorer that only uses the term frequency function from the DefaultSimilarity class. The problem is that documents with lower frequencies are returning with higher scores than documents with higher frequencies. Here's the code:

    searcher.setSimilarity(new DefaultSimilarity() {
        public float lengthNorm(String field, int numTerms) {
            return 1;
        }
        public float idf(int docFreq, int numDocs) {
            return 1;
        }
        public float coord(int overlap, int maxoverlap) {
            return 1;
        }
        public float queryNorm(float sumOfSquaredWeights) {
            return 1;
        }
        public float sloppyFreq(int distance) {
            return 1;
        }
    });

Any idea why this wouldn't be working?

Sincerely,
Chris Salem
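One possible explanation, which is an assumption and not something confirmed in this thread: in Lucene versions with this lengthNorm(String, int) signature, the length norm is computed when a document is indexed and stored in the index, so overriding lengthNorm only on the searcher would not undo norms that were already written with the default similarity. The sketch below uses the DefaultSimilarity formulas tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms) in plain Java (the class and method names are illustrative) to show how a stored norm could rank a short document with one occurrence above a long document with four.

```java
final class TfNormSketch {
    /** DefaultSimilarity-style term frequency factor: sqrt(freq). */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** DefaultSimilarity-style length norm: 1 / sqrt(number of terms in the field). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Score with every other factor fixed at 1, as in the overridden similarity. */
    static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }
}
```

With these formulas, a 400-term field containing the term four times scores about 0.1, while a 4-term field containing it once scores 0.5. If that is what is happening, indexing with the same custom similarity (or with norms omitted for the field) would be the thing to try.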
Re: searching for c++, c#, etc...
I'm using the StandardAnalyzer for both searching and indexing. Here's the code to parse the query:

    Searcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(stopwords);
    System.out.println(queryString);
    QueryParser qp = new QueryParser(searchField, analyzer);
    Query query = qp.parse(queryString);
    queryString = query.toString();
    System.out.println(queryString);

And here's the output from the println's:

    r2_resume_text:c\+\+ AND r2_resume_text: c\#
    +r2_resume_text:c +r2_resume_text:c

Also, the documentation doesn't say anything about # having to be escaped. Do I have to escape during indexing too?

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org, Chris Salem ch...@mainsequence.net
From: Ian Lea ian@gmail.com
Sent: 7/16/2009 5:12:53 AM
Subject: Re: searching for c++, c#, etc...

Hi

Escaping should work. See http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and QueryParser.escape(). And you need to be sure that your analyzer isn't removing the plus signs and that you use the same analyzer for indexing and searching. Googling for something like "lucene escape" will find you more info. Luke will tell you what is actually in your index.

-- Ian.
Re: searching for c++, c#, etc...
That seems to be working. You don't have to escape the pluses, though. Also, it appears that the WhitespaceAnalyzer is case sensitive, but I guess I could lowercase everything that gets indexed. Thanks a lot for your help.

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org, Chris Salem ch...@mainsequence.net
From: Danil TORIN torin...@gmail.com
Sent: 7/16/2009 10:28:37 AM
Subject: Re: searching for c++, c#, etc...

Try WhitespaceAnalyzer for both indexing and searching. At search time you may also need to escape +, (, ) with \. # shouldn't need escaping.
Re: searching for c++, c#, etc...
I figured "c++." would be a problem. Here's what I did to get around it:

    value.toLowerCase().replaceAll("\\.( ?\t?\n?\r?)+", " ")

I'm not escaping +'s from the query, so I should be good there. Thanks a lot.

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org, Chris Salem ch...@mainsequence.net
From: John Wang john.w...@gmail.com
Sent: 7/16/2009 12:09:05 PM
Subject: Re: searching for c++, c#, etc...

If you escape the character + or #, the sentence "I know java + c++" would not skip +; furthermore, it breaks query parsing, where + is reserved.

-John

On Thu, Jul 16, 2009 at 9:04 AM, John Wang john.w...@gmail.com wrote:

This runs into problems when you have a sentence such as: "I dislike c++." If you use WSA, then the last token is "c++.", not "c++", and the query would not find this document.

-John
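The normalization step discussed in this thread (lowercase the text and drop sentence-ending periods before whitespace tokenization) can be sketched with the JDK alone. This is a variation on the regex in the message, not the original: it removes a period only when whitespace or the end of the input follows, so a token such as ".net" keeps its leading dot. The class and method names are made up for the example.

```java
import java.util.Locale;

final class IndexTextNormalizer {
    /**
     * Lowercases the text and replaces each period that is followed by
     * whitespace (or by the end of the input) with a single space.
     */
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT)
                .replaceAll("\\.(\\s+|$)", " ")
                .trim();
    }

    /** Splits on runs of whitespace, roughly what a whitespace tokenizer would emit. */
    static String[] tokens(String value) {
        return normalize(value).split("\\s+");
    }
}
```

For the sentence from John's reply, "I dislike C++.", the last token becomes "c++" and no longer carries the trailing period. The same normalization would need to be applied to query text so that both sides produce matching tokens.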
searching for c++, c#, etc...
Hello, I'm trying to search for the terms like c++ but the parser is stripping off the ++. I tried escaping the ++ with slashes but it's still stripping it off. I could replace + with plus, is that the best way to do it? How come escaping isn't working? thanks Sincerely, Chris Salem
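Escaping and analysis are two separate steps: a backslash only stops the query parser from treating a character as an operator, while the analyzer still decides which characters survive in the final term. A minimal sketch of the escaping half, in plain Java, is below. The character list is an assumption based on the classic query syntax, and the helper is illustrative rather than Lucene's own QueryParser.escape().

```java
final class QueryEscapeSketch {
    // Characters assumed to be operators in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every assumed-special character with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Here escape("c++") yields c\+\+, and "c#" is left unchanged because # is not in the list. Even with correct escaping, an analyzer that discards punctuation will still reduce both to "c", so the analyzer choice at index and search time matters as much as the escaping.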
Re: LUCENE-1453 not fixed?
Oops, the Lucene 2.4.0 jar was in the jre/lib/ext directory (I don't remember putting it there). When I updated to Lucene 2.4.1 I put the jar in the tomcat/lib directory (which also had the Lucene 2.4.0 jar). I deleted the old Lucene 2.4.0 jar. Changing the code to use FSDirectory instead of the index path string seemed to work though, although it may have blown up if I tested it more extensively. Sorry for wasting your time.

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org
From: Michael McCandless luc...@mikemccandless.com
Sent: 3/19/2009 6:47:39 PM
Subject: Re: LUCENE-1453 not fixed?

That exception looks like it's from 2.4.0, not 2.4.1. Can you double check your CLASSPATH?

Mike

Chris Salem wrote:

Sure. The method that does the reopening of the index is synchronized. It would be possible for in-flight searches to be using the reader, but that wasn't the problem since I was the only one testing it. Here's the full exception that was thrown:

    org.apache.lucene.store.AlreadyClosedException: this Directory is closed
        at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
        at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
        at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
        at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
        at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
        at net.mainsequence.pcr.lucene.LuceneHandler.reopen(LuceneHandler.java:450)
        at net.mainsequence.pcr.lucene.LuceneServlet.searchIndex(LuceneServlet.java:578)
        at net.mainsequence.pcr.lucene.LuceneServlet.processRequest(LuceneServlet.java:114)
        at net.mainsequence.pcr.lucene.LuceneServlet.doPost(LuceneServlet.java:99)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
        at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
        at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
        at java.lang.Thread.run(Unknown Source)

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org
From: Michael McCandless luc...@mikemccandless.com
Sent: 3/19/2009 4:41:35 PM
Subject: Re: LUCENE-1453 not fixed?

Hmm, that's good that it resolves your issue, but not good in that it means the bug may in fact still be there. Can you answer the other questions below?

Mike

Chris Salem wrote:

Changing it to use the FSDirectory instead of the indexPath string seems to work. Thanks a lot!

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org
From: Michael McCandless luc...@mikemccandless.com
Sent: 3/19/2009 2:17:33 PM
Subject: Re: LUCENE-1453 not fixed?

Hmm... the code looks OK. Though: can multiple threads call that method at the same time? And: could in-flight searches be using the reader, when you close it? If instead of opening with String indexPath, you pass in an FSDirectory that you opened
LUCENE-1453 not fixed?
I'm using Lucene 2.4.1 and I'm still getting an AlreadyClosedException when trying to reopen an IndexReader. Here's the code I'm using, in case I'm doing something wrong. There isn't an error if I don't close the old reader.

    String indexPath = "C:\\Lucene\\test";
    IndexReader reader = IndexReader.open(indexPath);
    ...
    IndexReader tempReader;
    try {
        tempReader = reader.reopen();
        if (tempReader != reader) {
            System.out.println("reader reopened");
            reader.close();
        } else {
            System.out.println("reader has not changed");
        }
        reader = tempReader;
        return this;
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

Sincerely,
Chris Salem
Re: lucene explanation
That worked perfectly. Thanks a lot!

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org
From: Erick Erickson erickerick...@gmail.com
Sent: 12/22/2008 5:00:51 PM
Subject: Re: lucene explanation

Warning! I'm really reaching on this.

But it seems you could use TermDocs/TermEnum to good effect here. Basically, you should be able, for a given term, to use the above to determine whether doc N had a hit in one of your fields pretty efficiently. There's even a WildcardTermEnum that will iterate over wildcards.

Filters are surprisingly fast to construct, so you could use the above to construct a filter on each term for each field. Then determining whether the doc is a hit for a particular field is just a matter of seeing if that bit is on in the relevant filter.

Either one should be way under 30 seconds, although I don't know how big your index is or how encompassing your wildcard searches are...

FWIW
Erick
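The per-field filter idea from Erick's reply can be sketched with java.util.BitSet: keep one bit set per field, with bit N on when document N matches the query in that field, and then test each hit against every field's bits. How the bit sets get filled (for example from TermDocs) is outside this sketch, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

final class FieldMatchSketch {
    /**
     * Returns the names of the fields whose bit set has the bit for docId
     * switched on, in the iteration order of the supplied map.
     */
    static List<String> matchingFields(Map<String, BitSet> bitsByField, int docId) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : bitsByField.entrySet()) {
            if (entry.getValue().get(docId)) {
                matched.add(entry.getKey());
            }
        }
        return matched;
    }
}
```

Each lookup is a constant-time bit test, so checking 50 hits against three fields costs 150 bit tests once the bit sets exist, instead of 150 explain() calls.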
lucene explanation
Hello,

I'm wondering what the best way to accomplish this is. When a user enters text to search on, it customarily searches 3 fields: resume_text, profile_text, and summary_text. So a standard query would be something like:

    (resume_text:(query) OR profile_text:(query) OR summary_text:(query))

For each hit (up to 50) I'd like to find out which part of the query matched the document. Right now I use the Explanation object. Here's the code:

    int len = hits.length();
    if (len > 50) len = 50;
    for (int i = 0; i < len; i++) {
        Explanation ex = searcher.explain(Query.parse(resume_text:(query)), hits.id(i));
        if (ex.isMatch()) ...
        ex = searcher.explain(Query.parse(profile_text:(query)), hits.id(i));
        if (ex.isMatch()) ...
        ex = searcher.explain(Query.parse(summary_text:(query)), hits.id(i));
        if (ex.isMatch()) ...
    }

This works fine with regular queries, but if someone does a query with a wildcard, search times increase to more than 30 seconds. Is there a better way to do this?

Thanks

Sincerely,
Chris Salem
Re: lucene 2.4 sorting slowness
That makes it much faster (100ms after the first run). Thanks a lot. Also, the index will be updated often throughout the day. Will keeping the IndexReader open recognize updates to the index?

Sincerely,
Chris Salem

- Original Message -
To: java-user@lucene.apache.org
From: Michael McCandless luc...@mikemccandless.com
Sent: 12/17/2008 4:46:18 PM
Subject: Re: lucene 2.4 sorting slowness

Are you warming the searcher first, and then testing the sort performance? (The first query is slow because it populates the FieldCache, internally, which is then reused for subsequent queries as long as you don't close that reader/searcher.)

Mike
lucene 2.4 sorting slowness
Hello, I have an index with ~400 documents and some 200 fields. Searching without sorting takes around 300 - 500 ms; when sorting on dates (formatted as 'yyyy-mm-dd'), searching takes on average 15 seconds. Here's the code that does the search:

hits = searcher.search(query, new Sort(new SortField("slast_modified", false)));

and here's how that field is being indexed:

doc.add(new Field("slast_modified", "2008-12-12", Field.Store.NO, Field.Index.NOT_ANALYZED));

Am I doing something wrong? Is there a bug in Lucene, and if so, is there a way to work around it so that search speed is increased to something reasonable? Thanks

Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
ch...@mainsequence.net
P: 440.946.5214 ext 5458 F: 440.856.0312
toomanyclauses exception
Hi All, I'm getting a 'TooManyClauses' exception and I'm not sure how to fix it. Here's a sample query that I'm using:

+(+freeform_text:exhibit* +(+freeform_text:dispaly +freeform_text:event*) +(+freeform_text:sale* +freeform_text:sells +freeform_text:develop*) +(+freeform_text:trade +freeform_text:show +freeform_text:trade +freeform_text:shows)) +degree_type:5 +position_desired:ftp +city:washington~0.5 +state:dc +ncountry:usa +last_modified:[2005-12-26 TO 2006-12-26]

Here's the exception I'm getting:

org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:160)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:151)
at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:52)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:137)
at org.apache.lucene.search.Query.weight(Query.java:93)
at org.apache.lucene.search.Hits.<init>(Hits.java:41)
at org.apache.lucene.search.Searcher.search(Searcher.java:44)
at org.apache.lucene.search.Searcher.search(Searcher.java:36)
at net.mainsequence.pcr.lucene.LuceneHandler.multiSearch(LuceneHandler.java:382)
at net.mainsequence.pcr.lucene.LuceneServlet.searchIndex(LuceneServlet.java:169)
at net.mainsequence.pcr.lucene.LuceneServlet.processRequest(LuceneServlet.java:83)
at net.mainsequence.pcr.lucene.LuceneServlet.doPost(LuceneServlet.java:72)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Unknown Source)

Is there any way to increase the number of clauses Lucene can take? This kind of large query is not uncommon, so any help would be greatly appreciated.

Chris Salem
440.946.5214 x5458
[EMAIL PROTECTED]
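The trace above fails inside PrefixQuery.rewrite, which suggests that one of the trailing-wildcard terms (exhibit*, event*, sale* or develop*) expanded into more clauses than BooleanQuery allows. The following is a toy model in plain Java of that expansion, not Lucene code: PrefixExpansion and its methods are hypothetical, and the assumption is that each indexed term starting with the prefix becomes one clause, checked against Lucene's default limit of 1024. In Lucene itself the limit can be raised with the static BooleanQuery.setMaxClauseCount(int), at the cost of more memory and time per query.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: PrefixExpansion is a hypothetical class, not part of Lucene.
class PrefixExpansion {
    // Default clause limit of Lucene's BooleanQuery.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Counts the terms that start with the prefix; in this model each one
    // becomes a clause. Assumes the set uses natural String ordering, so all
    // matching terms sit next to each other starting at tailSet(prefix).
    static int clausesFor(SortedSet<String> indexedTerms, String prefix) {
        int count = 0;
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term with this prefix
            }
            count++;
        }
        return count;
    }

    // True when expanding the prefix would need more clauses than allowed.
    static boolean tooManyClauses(SortedSet<String> indexedTerms, String prefix, int maxClauses) {
        return clausesFor(indexedTerms, prefix) > maxClauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 1500; i++) {
            terms.add("sale" + i); // made-up terms standing in for a large vocabulary
        }
        terms.add("sells");
        System.out.println(clausesFor(terms, "sale"));                           // prints 1500
        System.out.println(tooManyClauses(terms, "sale", DEFAULT_MAX_CLAUSES));  // prints true
    }
}
```

Counting terms per prefix this way, against the real index vocabulary, is one way to decide whether to raise the limit or to require longer prefixes before the wildcard.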
spell checker
Does anyone have sample code on how to build a dictionary? I found this article online, but it uses version 1.4.3 and it doesn't seem to work on 2.0.0: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1

Here's the code I have:

indexReader = IndexReader.open(originalIndexDirectory);
Dictionary dictionary = new LuceneDictionary(indexReader, "experience_desired");
SpellChecker spellChckr = new SpellChecker(spellIndexDirectory);
spellChckr.indexDictionary(dictionary);

I'm getting a null pointer exception when I call indexDictionary(). Here's how I index the field experience_desired:

doc.add(new Field("experience_desired", value, Field.Store.NO, Field.Index.TOKENIZED));

Is there another way I should index it so that a dictionary can be built on that field? Thanks

Chris Salem
440.946.5214 x5458
[EMAIL PROTECTED]
FWD: Re: parser question
Any help with this?

Chris Salem
440.946.5214 x5458
[EMAIL PROTECTED]

- Forwarded Message -
To: Mark Miller [EMAIL PROTECTED]
From: Chris Salem [EMAIL PROTECTED]
Sent: Wed 9/6/2006 3:58:49 PM
Subject: Re: parser question

It's an index of 10 fields and about 10,000 records.

- Original Message -
To: Chris Salem [EMAIL PROTECTED]
From: Mark Miller [EMAIL PROTECTED]
Sent: Wed 9/6/2006 2:32:24 PM
Subject: Re: parser question

What are you using as a test index?

- Mark

Chris Salem wrote:
Yes, it's ANDing them. The queries 'software engineer', 'software OR engineer' and 'software AND engineer' all return the same results. The generated queries for them are, respectively, '(field:software field:engineer)', '(field:software field:engineer)' and '(+field:software +field:engineer)'. I do set the default operator to AND, and I'm using the MultiFieldQueryParser, if that makes a difference (it was doing the same thing with the QueryParser as well).

- Original Message -
To: java-user@lucene.apache.org
From: Mark Miller [EMAIL PROTECTED]
Sent: Wed 9/6/2006 12:57:44 PM
Subject: Re: parser question

Are you sure it is ANDing them? field:software field:engineer indicates an OR operation. +field:software +field:engineer indicates an AND operation.

- Mark

Chris Salem wrote:
I set the default operator to AND, but a query with an OR in it doesn't work. For example, for the query 'software OR engineer' the parser produces 'field:software field:engineer' and ANDs them. How would I fix this?

- Original Message -
To: java-user@lucene.apache.org
From: Mark Miller [EMAIL PROTECTED]
Sent: Tue 9/5/2006 5:38:50 PM
Subject: Re: parser question

QueryParser.setDefaultOperator(Operator op)

Chris Salem wrote:
With all the parsers I have tried, a space in a query (for example, a search for 'sales manager') is interpreted as an OR. Is there a way to change it so that a space is interpreted as an AND?
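Mark's point about reading the printed query can be sketched with a toy model in plain Java. ToyBooleanQuery and its Occur enum are hypothetical stand-ins, not Lucene's BooleanQuery; the sketch only assumes what the thread states, namely that a clause printed with a leading + is required (AND) and a clause printed without one is optional (OR).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: a hypothetical class, not Lucene's BooleanQuery or BooleanClause.
class ToyBooleanQuery {
    enum Occur { MUST, SHOULD }

    private final List<String> terms = new ArrayList<>();
    private final List<Occur> occurs = new ArrayList<>();

    ToyBooleanQuery add(String term, Occur occur) {
        terms.add(term);
        occurs.add(occur);
        return this;
    }

    // Mimics the printed form quoted in the thread: "+" marks a required clause.
    String render(String field) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (occurs.get(i) == Occur.MUST) {
                sb.append('+');
            }
            sb.append(field).append(':').append(terms.get(i));
        }
        return sb.toString();
    }

    // A document matches when every MUST term is present and, if the query
    // has no MUST terms at all, at least one SHOULD term is present.
    boolean matches(Set<String> docTerms) {
        boolean anyMust = false;
        boolean anyShouldHit = false;
        for (int i = 0; i < terms.size(); i++) {
            boolean present = docTerms.contains(terms.get(i));
            if (occurs.get(i) == Occur.MUST) {
                anyMust = true;
                if (!present) {
                    return false;
                }
            } else if (present) {
                anyShouldHit = true;
            }
        }
        return anyMust || anyShouldHit;
    }

    public static void main(String[] args) {
        ToyBooleanQuery or = new ToyBooleanQuery()
                .add("software", Occur.SHOULD).add("engineer", Occur.SHOULD);
        ToyBooleanQuery and = new ToyBooleanQuery()
                .add("software", Occur.MUST).add("engineer", Occur.MUST);
        Set<String> doc = new HashSet<>(Arrays.asList("software", "sales"));
        System.out.println(or.render("field"));   // prints field:software field:engineer
        System.out.println(and.render("field"));  // prints +field:software +field:engineer
        System.out.println(or.matches(doc));      // prints true
        System.out.println(and.matches(doc));     // prints false
    }
}
```

Under this reading, the two forms return identical results only when every matching document happens to contain both terms, which is one reason the size and content of the test index matter when comparing them.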
parser question
I set the default operator to AND, but a query with an OR in it doesn't work. For example, for the query 'software OR engineer' the parser produces 'field:software field:engineer' and ANDs them. How would I fix this?

Chris Salem
440.946.5214 x5458
[EMAIL PROTECTED]

- Original Message -
To: java-user@lucene.apache.org
From: Mark Miller [EMAIL PROTECTED]
Sent: Tue 9/5/2006 5:38:50 PM
Subject: Re: parser question

QueryParser.setDefaultOperator(Operator op)

Chris Salem wrote:
With all the parsers I have tried, a space in a query (for example, a search for 'sales manager') is interpreted as an OR. Is there a way to change it so that a space is interpreted as an AND?
parser question
With all the parsers I have tried, a space in a query (for example, a search for 'sales manager') is interpreted as an OR. Is there a way to change it so that a space is interpreted as an AND?

Chris Salem
440.946.5214 x5458
[EMAIL PROTECTED]