RE: Query syntax on Keyword field question
Great info Morus,

After making the "escape the dash" change to the QueryParser:

    Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE", "description", analyzer);
    Hits hits = searcher.search(query);
    System.out.println("query.ToString = " + query.toString("description"));
    // Note that this passes with the escape put in, so the term is not kept as-is.
    assertEquals("HW-NCI_TOPICS kept as-is", "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
    assertEquals("doc found!", 1, hits.length());

I'm still getting this output:

    domain.lucenesearch.KeywordAnalyzer: [HW-NCI_TOPICS]
    query.ToString = +category:HW\-NCI_TOPICS +space
    junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>

It looks like bug http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 was fixed today:

--- Additional Comments From Otis Gospodnetic mailto:[EMAIL PROTECTED] 2004-03-24 10:10 ---
Although tft-monitor should not really result in a phrase query "tft monitor", I agree that this is better than converting it to tft AND NOT monitor (tft -monitor). Moreover, I have seen query syntax where '-' characters are used for phrase queries instead of or in addition to quotes, so one could use either morus-walter or "morus walter". I applied your change, as it doesn't look like it breaks anything, and I hope nobody relied on the ill behaviour where tft-monitor would result in an AND NOT query.
---

But I assume this fix won't come out for some time. Is there a way I can get this fix sooner? I'm up against a deadline and would very much like this functionality.
And to go one more step with the KeywordAnalyzer that I wrote, changing this method to skip the escape:

    protected boolean isTokenChar(char c) {
        if (c == '\\') {
            return false;
        } else {
            return true;
        }
    }

The test then returns with a space:

    healthecare.domain.lucenesearch.KeywordAnalyzer: [HW-NCI_TOPICS]
    query.ToString = +category:HW -NCI_TOPICS +space
    junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
    Expected:+category:HW\-NCI_TOPICS +space
    Actual  :+category:HW -NCI_TOPICS +space

Note the space where the escape was.

thanks,
chad.

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Wed 3/24/2004 1:43 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

Chad Small writes:
> Here is my attempt at a KeywordAnalyzer - although is not working? Excuse the length of the message, but wanted to give actual code.
> With this output:
>     Analzying HW-NCI_TOPICS
>     org.apache.lucene.analysis.WhitespaceAnalyzer: [HW-NCI_TOPICS]
>     org.apache.lucene.analysis.SimpleAnalyzer: [hw] [nci] [topics]
>     org.apache.lucene.analysis.StopAnalyzer: [hw] [nci] [topics]
>     org.apache.lucene.analysis.standard.StandardAnalyzer: [hw] [nci] [topics]
>     healthecare.domain.lucenesearch.KeywordAnalyzer: [HW-NCI_TOPICS]
>     query.ToString = category:HW -nci topics +space
>     junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>     Expected:+category:HW-NCI_TOPICS +space
>     Actual  :category:HW -nci topics +space

Well, the query parser currently does not allow `-' within words. So before your analyzer is called, the query parser reads one word HW, a `-' operator, and one word NCI_TOPICS. The latter is analyzed as "nci topics" because it is no longer in the field category, I guess. I suggested changing this; see http://issues.apache.org/bugzilla/show_bug.cgi?id=27491

Either you escape the - using category:HW\-NCI_TOPICS in your query (untested, and I don't know where the escape character will be removed), or you apply my suggested change.
Another option for using keywords with query parser might be adding a keyword syntax to the query parser. Something like category:key(HW-NCI_TOPICS) or category=HW-NCI_TOPICS.

HTH
Morus

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
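For readers who want to try the escaping route Morus mentions, here is a minimal sketch of a helper that backslash-escapes a keyword before it is pasted into a query string. The class name QueryEscapeSketch, the method escapeTerm and the list of special characters are assumptions for illustration, not part of Lucene; check the list against the query syntax of the version you run. Note that, as reported in this thread, the escape alone did not produce hits with 1.3 final, because the backslash stayed in the term.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes characters that
// the query parser syntax treats specially, so a keyword is read as one word.
final class QueryEscapeSketch {

    // Assumed set of special characters; verify against your Lucene version.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    static String escapeTerm(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 4);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Builds the query text discussed above from the raw keyword.
        System.out.println("category:" + escapeTerm("HW-NCI_TOPICS"));
    }
}
```

The underscore is left alone, so only the dash in HW-NCI_TOPICS is escaped.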
RE: Query syntax on Keyword field question
Hi Chad,

> But I assume this fix won't come out for some time. Is there a way I can get this fix sooner? I'm up against a deadline and would very much like this functionality.

Just get Lucene's sources, change the line and recompile. The difficult part is getting a copy of JavaCC 2 (3 won't do), but I think it can be found in the archives.

> And to go one more step with the KeywordAnalyzer that I wrote, changing this method to skip the escape:
>     protected boolean isTokenChar(char c) { if (c == '\\') { return false; } else { return true; } }
> The test then returns with a space:
>     healthecare.domain.lucenesearch.KeywordAnalyzer: [HW-NCI_TOPICS]
>     query.ToString = +category:HW -NCI_TOPICS +space
>     junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
>     Expected:+category:HW\-NCI_TOPICS +space
>     Actual  :+category:HW -NCI_TOPICS +space
> note space where escape was.

Sure. If \ isn't a token char, it ends the token. So you will have to look for a different way of implementing the analyzer. That shouldn't be difficult, since you have only one token. Maybe it should be the job of the query parser to remove the escape character (that would make more sense to me at least), but that would be another change to the query parser...

Morus
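To make Morus's point concrete, the sketch below contrasts the two behaviours without depending on Lucene. splitAtBackslash is a simplified stand-in (an assumption, not Lucene's code) for a character tokenizer whose isTokenChar rejects the backslash; unescape is one hypothetical "different way": keep the input as a single token and drop only the escaping backslashes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: neither method is Lucene API.
final class EscapeTokenSketch {

    // A token ends wherever a non-token character (here: the backslash) appears.
    static List<String> splitAtBackslash(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\') {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Keeps one token: each escaping backslash is dropped and the character
    // after it is kept. A trailing lone backslash is left untouched.
    static String unescape(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                c = input.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With the input HW\-NCI_TOPICS the first method yields two tokens, HW and -NCI_TOPICS, which matches the space seen in the test output; the second yields the single token HW-NCI_TOPICS.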
RE: Query syntax on Keyword field question
thanks. I was in the process of getting javacc3.2 setup. I'll have to hunt for 2.x.

chad.

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Wed 3/24/2004 8:00 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

> Just get lucenes sources, change the line and recompile. The difficult part is to get a copy of JavaCC 2 (3 won't do), but I think this can be found in the archives.
RE: Query syntax on Keyword field question
For others' reference, here is the URL for the old version:

https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=212

-Original Message-
From: Chad Small
Sent: Wed 3/24/2004 8:07 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

> thanks. I was in the process of getting javacc3.2 setup. I'll have to hunt for 2.x.
> chad.
RE: Query syntax on Keyword field question
JavaCC 3.2 works for me.

Otis

--- Chad Small [EMAIL PROTECTED] wrote:
> thanks. I was in the process of getting javacc3.2 setup. I'll have to hunt for 2.x.
> chad.
RE: Query syntax on Keyword field question
I'm getting this with 3.2:

    javacc-check:
    BUILD FAILED
    file:D:/applications/lucene-1.3-final/build.xml:97:
    ##
    JavaCC not found.
    JavaCC Home: /applications/javacc-3.2/bin
    JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
    Please download and install JavaCC from: http://javacc.dev.java.net
    Then, create a build.properties file either in your home directory, or within the Lucene directory and set the javacc.home property to the path where JavaCC is installed. For example, if you installed JavaCC in /usr/local/java/javacc-3.2, then set the javacc.home property to:
    javacc.home=/usr/local/java/javacc-3.2
    If you get an error like the one below, then you have not installed things correctly. Please check all your paths and try again.
    java.lang.NoClassDefFoundError: org.javacc.parser.Main
    ##

even though I put a build.properties file in my root lucene directory with this in it:

    javacc.home=/applications/javacc-3.2/bin

hmm?

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wed 3/24/2004 8:29 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

> JavaCC 3.2 works for me.
> Otis
RE: Query syntax on Keyword field question
Chad Small writes:
> I'm getting this with 3.2:
>     javacc-check: BUILD FAILED file:D:/applications/lucene-1.3-final/build.xml:97:
>     JavaCC not found.
>     JavaCC Home: /applications/javacc-3.2/bin
>     JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
> even though I put a build.properties file in my root lucene directory with this in it:
>     javacc.home=/applications/javacc-3.2/bin

I never tried JavaCC 3.2, but I thought there were issues with the query parser and/or the standard analyzer. It seems I'm wrong or out of date.

In your case the problem seems to be the JavaCC installation. I guess the /bin directory should not be part of javacc.home.

Morus
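The doubled bin\bin in the reported JAR path is what gives the misconfiguration away. As a small sketch of that reasoning, assuming (from the build output only, not from reading build.xml) that the build looks for bin/lib/javacc.jar underneath javacc.home, the following hypothetical helper reproduces both paths:

```java
// Illustration of the path arithmetic only; class and method names are
// hypothetical and the "bin/lib/javacc.jar" suffix is inferred from the
// build output quoted in this thread.
final class JavaccHomeSketch {

    static String expectedJar(String javaccHome) {
        String home = javaccHome.endsWith("/")
                ? javaccHome.substring(0, javaccHome.length() - 1)
                : javaccHome;
        return home + "/bin/lib/javacc.jar";
    }

    // True when javacc.home already ends in bin, which doubles the segment.
    static boolean looksMisconfigured(String javaccHome) {
        return expectedJar(javaccHome).contains("/bin/bin/");
    }

    public static void main(String[] args) {
        System.out.println(expectedJar("/applications/javacc-3.2/bin")); // doubled bin
        System.out.println(expectedJar("/applications/javacc-3.2"));     // intended layout
    }
}
```

Under that assumption, setting javacc.home to the installation root rather than its bin directory gives the path the build expects.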
RE: Query syntax on Keyword field question
Ahh, without the bin on the javacc.home, 3.2 seems to work for me too.

-Original Message-
From: Chad Small
Sent: Wed 3/24/2004 8:34 AM
To: Lucene Users List
Subject: RE: Query syntax on Keyword field question

> I'm getting this with 3.2:
>     JavaCC not found.
>     JavaCC Home: /applications/javacc-3.2/bin
>     JavaCC JAR: D:\applications\javacc-3.2\bin\bin\lib\javacc.jar
> even though I put a build.properties file in my root lucene directory with this in it:
>     javacc.home=/applications/javacc-3.2/bin
lucene usage without website
I want to create a knowledgebase, but it needs to be something that does not require a server to run constantly (as it would with JSP). It just needs to run on the Windows platform. Lucene works well with Windows using an applet, right?
RE: lucene usage without website
Lucene is not tied to a particular type of application. You can integrate its functionality into any program that can invoke Java APIs. However, I don't think that Lucene can be invoked from an applet, as the applet API does not permit reading and writing local files.

-Original Message-
From: Pleasant, Tracy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 24 March 2004 17:41
To: Lucene Users List
Subject: lucene usage without website

> I want to create a knowledgebase but it needs to be something that does not require a server to run constantly (like with using jsp). I just needs to run on the Windows platform. Lucene works well with Windows using an applet right?
multiple indices seacher
Hi,

The MultiSearcher in 1.3 final keeps throwing an exception when rewriting a query:

    java.lang.UnsupportedOperationException
        org.apache.lucene.search.Query:combine:139
        org.apache.lucene.search.MultiSearcher:rewrite:203

I still use the Query object from before the rewriting, so the search seems to work fine. Does anyone know how to avoid this problem? Thx. I have to call rewrite in order to avoid the cached searcher's I/O problem.

Regards,
Hui
analyzer for word perfect?
Is there an analyzer for WordPerfect files? I have a need to be able to index WP files as well as MS files, pdfs, etc.
Re: Query syntax on Keyword field question
On Tue, Mar 23, 2004 at 08:48:11PM -0600, Chad Small wrote:
> Thank you, Erik and Incze. I now understand the issue and I'm trying to create a KeywordAnalyzer as suggested in your book excerpt, Erik:
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6727
> However, not being all that familiar with the Analyzer framework, I'm not sure how to implement the KeywordAnalyzer even though it might be trivial :) Any hints, code, or messages to look at?

Actually, what I've written was not an analyzer but a NotTokenizingTokenizer, as I have a very special analyzer (different needs for different field categories) and this is used in that (the code is far from any kind of optimization, but you can see the logic):

---
    package hu.emnl.lucene.analyzer;

    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.Tokenizer;

    public class NotTokenizingTokenizer extends Tokenizer {

        public NotTokenizingTokenizer() {
            super();
        }

        public NotTokenizingTokenizer(Reader input) {
            super(input);
        }

        // Returns the whole input as one token, then null once the reader is exhausted.
        public Token next() throws IOException {
            Token t = null;
            int c = input.read();
            if (c >= 0) {
                StringBuffer sb = new StringBuffer();
                do {
                    sb.append((char) c);
                    c = input.read();
                } while (c >= 0);
                t = new Token(new String(sb), 0, sb.length());
            }
            return t;
        }
    }
---

incze
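The core of the tokenizer above is "read the reader to the end and hand back everything as one token". A minimal sketch of just that loop, using only the JDK so it can be run without Lucene on the classpath, might look as follows. The class and method names are hypothetical, and it reads in chunks rather than one character at a time, which is a small deviation from the posted code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// JDK-only illustration of the "whole input is one token" idea.
final class WholeInputSketch {

    // Returns the complete remaining input as a single string, or null when
    // the reader is already exhausted (mirroring next() returning null).
    static String readSingleToken(Reader input) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = input.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("HW-NCI_TOPICS");
        System.out.println(readSingleToken(reader)); // the whole keyword, dash included
        System.out.println(readSingleToken(reader)); // nothing left to read
    }
}
```

Because nothing is split or lower-cased, the dash and the case of the keyword both survive.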
Searching for a phrase that contains quote character
I'd like to search for a phrase that contains the quote character. I've tried escaping the quote character, but am receiving a ParseException from the QueryParser.

For example, to search for the phrase:

    this is a test

I'm trying the following:

    QueryParser.parse(field:\This is a \\\test, field, new StandardAnalyzer());

This results in:

    org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 31. Encountered: EOF after :
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:111)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
        ...

What is the proper way to accomplish this?

--Dan
Cannot access hits
Greetings.

I have recently had to re-install my web server. Once completed, however, I cannot get the Lucene search to work. It worked before the crash and it works on my laptop. When conducting searches now, I get the following message:

    org.apache.cocoon.ProcessingException: Cannot access hits: java.io.IOException: Permission denied

For the full message see: http://archives.mc.duke.edu/search?queryString=Davison

Can anyone suggest a place to start looking to add the correct permissions?

Thank you,
Russell
Changes to QueryParser.jj: Status?
Dear All,

Some time ago there was a discussion on modifying the definitions of tokens in QueryParser so that the character '-' (dash), and others, will be treated as part of a word. Can someone please tell me the status of that discussion? Will these changes actually be reflected in the code...soon?

Thanks,
-- Ravi/

PS: The title of the thread in the previous discussion was 'Problem with search results'

Ravi(ndra) Rao
AlterPoint Inc.
Re: Cannot access hits
The source of your problem is simple UNIX permissions:

    java.io.IOException: Permission denied
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:688)
        at org.apache.lucene.store.FSDirectory$1.obtain(Unknown Source)

Figure out what directory Java's java.io.tmpdir system property points to, and make sure that directory is writable by the user that runs the Tomcat server.

Otis

--- Russell S Koonts [EMAIL PROTECTED] wrote:
> Greetings. I have recently had to re-install my web server. Once completed, however, I cannot get the Lucene search to work. It worked before the crash and it works on my laptop. When conducting searches now, I get the following message:
> org.apache.cocoon.ProcessingException: Cannot access hits: java.io.IOException: Permission denied
> for full message see: http://archives.mc.duke.edu/search?queryString=Davison
> Can anyone suggest a place to start looking to add the correct permissions?
> Thank you, Russell
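A quick way to follow that advice is a tiny program that prints the directory and whether the current user can write to it. This is only a sketch with hypothetical names; it has to be run as the same user, and with the same JVM options, as the servlet container, otherwise the answer says nothing about the failing process.

```java
import java.io.File;

// Prints where java.io.tmpdir points and whether it is a writable directory.
final class TmpDirCheckSketch {

    static boolean isWritableDir(File dir) {
        return dir.isDirectory() && dir.canWrite();
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp.getAbsolutePath() + " writable: " + isWritableDir(tmp));
    }
}
```

If the check reports that the directory is not writable, fixing its ownership or permissions, or pointing java.io.tmpdir at a writable directory with -Djava.io.tmpdir=..., are the usual options.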
Re: analyzer for word perfect?
I just finished writing a chapter for Lucene in Action that deals with that.

PDF: pdfbox.org
MS Word/Excel: jakarta.apache.org/poi
WP: http://www.google.com/search?q=java+word+perfect+parser

Note that what you need are parsers. The term Analyzer has a special meaning in the Lucene realm.

Otis

--- Charlie Smith [EMAIL PROTECTED] wrote:
> Is there an analyzer for WordPerfect files? I have a need to be able to index WP files as well as MS files, pdfs, etc.
Re: Changes to QueryParser.jj: Status?
I committed those changes to CVS today. There is a bug entry in Bugzilla from Morus Walter, which is now marked as fixed.

Otis

--- Ravi Rao [EMAIL PROTECTED] wrote:
> Dear All, Some time ago there was a discussion on modifying the definitions of tokens in QueryParser so that the character '-' (dash), and others, will be treated as part of a word. Can someone please tell me the status of that discussion. Will these changes actually be reflected in the code...soon?
> Thanks, -- Ravi/
> PS: The title of the thread in the previous discussion was 'Problem with search results'
> Ravi(ndra) Rao AlterPoint Inc.
RE: Zero hits for queries ending with a number
Thanks to Otis, Morus, and Erik for their responses to my question. I see that my question is also related to the posting "Query syntax on Keyword field question".

I tried all of your suggestions. When I used

a) the tokens generated by the analyzer and
b) the parsed query (using the to_string method)

to debug StandardAnalyzer, I saw that it does properly pass in the string with the number attached to it. I don't understand why Field.Text did not work with StandardAnalyzer. I tried WhitespaceAnalyzer and that did not work either. I have tried implementing a custom analyzer like KeywordAnalyzer, and using PerFieldAnalyzerWrapper. I think the custom analyzer I created is not properly doing what a KeywordAnalyzer would do. Erik, could you please post what KeywordAnalyzer should look like?

I can't wait until the book you guys are developing comes out.

Thanks very much.
Morris

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 13, 2004 3:14 AM
To: Lucene Users List
Subject: Re: Zero hits for queries ending with a number

On Mar 13, 2004, at 6:02 AM, Morus Walter wrote:
Otis Gospodnetic writes:
Field.Keyword is suitable for storing data like Url. Give that a try.

Hmm. I don't think keyword fields can be used with query parser, which is probably one of the problems here. He did try keyword fields.

Look in the archives for KeywordAnalyzer (custom) and PerFieldAnalyzerWrapper (built-in); using a combination of these you can use keyword fields. Or, first try just using WhitespaceAnalyzer. It is almost always the analyzer that is the cause of confusion - folks just get lulled into forgetting about its role because Lucene is so easy to use... until this type of issue bites you. It is a wacky combination though - and notorious for causing confusion. Perhaps someone could create a wiki page for this scenario where we can flesh out examples/solutions?

Erik
possible parse problem
I get distinctly different results (Java exception versus request completion) for two queries:

    this AND is
    this OR is

I realize these are dumb queries, but they illustrate the problem. The first gets:

    error: java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.Vector.elementAt(Vector.java:434)
        at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:181)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:493)
        at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:525)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:464)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:108)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
        at ((MY CODE))

The second finds no results. I used the latest stable release, 1.3 final, downloaded today.

Please accept this as an observation on a surprise, not a complaint.

Thanks
Bill
Re: Zero hits for queries ending with a number
On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote:
> I think the custom analyzer I created is not properly doing what a KeywordAnalyzer would do. Erik, could you please post what KeywordAnalyzer should look like?

It should simply tokenize the entire input as a single token. Incze Lajos posted a NotTokenizingTokenizer earlier today, in fact, that does the trick.

Erik
RE: Zero hits for queries ending with a number
Thanks Erik and Incze. Sorry for this lengthy post. Here is the class:

    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardFilter;

    import java.io.Reader;
    import java.util.Hashtable;

    public class KeywordAnalyzer extends Analyzer {

        public static final String[] STOP_WORDS = StopAnalyzer.ENGLISH_STOP_WORDS;

        private Hashtable stopTable;

        public KeywordAnalyzer() {
            this(STOP_WORDS);
        }

        public KeywordAnalyzer(String[] stopWords) {
            stopTable = StopFilter.makeStopTable(stopWords);
        }

        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = new NotTokenizingTokenizer(reader);
            result = new StandardFilter(result);
            result = new LowerCaseFilter(result);
            result = new StopFilter(result, stopTable);
            return result;
        }
    }

I have retried everything with the new KeywordAnalyzer class, PerFieldAnalyzerWrapper, and with Field.Keyword. I don't get results for any searches; it doesn't even matter whether there is a number at the end or not.

Using query.toString("url"):

    Query query = QueryParser.parse(terms, "contents", analyzer);
    logger.info("search method: query.toString for url=" + query.toString("url"));

I can see what the analyzer is searching for. How do I determine what value is stored in the index by Field.Keyword? I've tried:

    doc.add(Field.Keyword("url", url));
    System.out.println("url: doc toString method=" + doc.toString());

But I don't know if this is the correct value that is compared with what the analyzer sends in.

Thanks for the help.
Morris

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 24, 2004 4:45 PM
To: Lucene Users List
Subject: Re: Zero hits for queries ending with a number

> It should simply tokenize the entire input as a single token. Incze Lajos posted a NotTokenizingTokenizer earlier today, in fact, that does the trick.
> Erik
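One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis of the problem above: a value added with Field.Keyword is indexed as-is, while the analyzer shown lower-cases the single token on the query side. If the stored value contains any upper-case characters, an exact term comparison between the two can never succeed. The JDK-only sketch below illustrates that mismatch; the class, the methods and the example URL are all hypothetical.

```java
import java.util.Locale;

// Stand-ins for exact term matching and a lower-casing filter; not Lucene API.
final class KeywordMatchSketch {

    // Exact comparison, as between a stored keyword and a query term.
    static boolean matches(String storedKeyword, String queryTerm) {
        return storedKeyword.equals(queryTerm);
    }

    // What a lower-casing filter would do to the single keyword token.
    static String lowerCased(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String stored = "http://Example.com/Page7"; // hypothetical stored value
        System.out.println(matches(stored, stored));             // untouched token
        System.out.println(matches(stored, lowerCased(stored))); // lower-cased token
    }
}
```

If this is the cause, a keyword analyzer that returns the tokenizer's output without the lower-case and stop filters would keep the query term identical to the stored value.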
How to order search results by Field value?
Was there any conclusion to this message:

http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6762

regarding ordering by a field? I have a similar need and didn't see the resolution in that thread. Is there a current patch against 1.3-final that I could look at?

My other option, I guess, is just to code a comparator on a collection built off of the Hits.

thanks,
chad.
Re: How to order search results by Field value?
Chad,

> Was there any conclusion to message: http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=6762 Regarding Ordering by a Field? I have a similar need and didn't see the resolution in that thread. Is it a current patch to the 1.3-final, I could see one?

You can see the resolution in the latest CVS ;-)

yo

> My other option, I guess, is just to code a comparator on a collection built off of the Hits.
> thanks,
> chad.
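For anyone staying on 1.3 final, the fallback Chad describes (copy what you need out of the hits into a collection and sort it with a comparator) could be sketched roughly as below. Row is a hypothetical holder for the field value and score you would read from each hit; nothing here is Lucene API, and copying every hit is only reasonable for modest result sizes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sorting copied-out hit data by a field value instead of by score.
final class SortHitsSketch {

    // Hypothetical holder for the data taken from one hit.
    static final class Row {
        final String fieldValue;
        final float score;

        Row(String fieldValue, float score) {
            this.fieldValue = fieldValue;
            this.score = score;
        }
    }

    // Returns a new list ordered by field value; the input list is untouched.
    static List<Row> sortByField(List<Row> rows) {
        List<Row> sorted = new ArrayList<Row>(rows);
        Collections.sort(sorted, new Comparator<Row>() {
            public int compare(Row a, Row b) {
                return a.fieldValue.compareTo(b.fieldValue);
            }
        });
        return sorted;
    }
}
```

The comparator uses plain String ordering; dates or numbers stored as text would need a comparator that parses them first.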