Hello, I am not sure whether this belongs on the Nutch-User list instead, but I felt that the Nutch developers should be aware of this issue.

A patch was submitted in JIRA: http://issues.apache.org/jira/browse/NUTCH-479. I used this patch to fix NutchAnalysis.jj, and OR queries now work in most cases. However, I have noticed a small bug in the code. Parsing breaks in the following cases:

1) OR is followed by whitespace and no other search terms, e.g. "patent OR ":
   http://mysite.com/search.jsp?lang=en&query=patent+OR+
2) There is nothing else after the OR operator:
   http://mysite.com/search.jsp?lang=en&query=patent+OR
3) OR is the only search term:
   http://mysite.com/search.jsp?lang=en&query=OR
4) OR+, OR_, OR-; basically OR with any trailing characters:
   http://mysite.com/search.jsp?lang=en&query=OR-
   http://mysite.com/search.jsp?lang=en&query=OR+

I get this error message from Tomcat:

java.io.IOException: Parse exception: org.apache.nutch.analysis.ParseException: Encountered "<EOF>" at line 1, column 12.
Was expecting one of:
    <WORD> ...
    <ACRONYM> ...
    <SIGRAM> ...
    "\"" ...
    <WHITE> ...
    ":" ...
    "/" ...
    "." ...
    "@" ...
    "\'" ...
    "+" ...
    "-" ...

    org.apache.nutch.analysis.NutchAnalysis.parseQuery(NutchAnalysis.java:62)
    org.apache.nutch.searcher.Query.parse(Query.java:468)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:172)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

Any suggestions or pointers on how to fix this would be great.

Thanks.

--
View this message in context: http://www.nabble.com/Bug-in-NutchAnalysis.java-tp17261004p17261004.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
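
For the failing cases listed above, one possible stopgap until the grammar itself is fixed is to strip a dangling OR from the raw query string before it reaches Query.parse. The following is only a sketch under that assumption: QuerySanitizer and stripDanglingOr are hypothetical names, not part of Nutch or of the NUTCH-479 patch, and the set of trailing characters it removes (whitespace, +, -, _) is taken from the examples in this report rather than from the grammar.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Nutch: removes an OR operator that has
// no right-hand operand, so the parser never sees the dangling operator.
final class QuerySanitizer {

    // Matches an upper-case OR that starts the string or follows whitespace
    // and is followed only by whitespace, '+', '-' or '_' up to the end.
    private static final Pattern DANGLING_OR =
        Pattern.compile("(^|\\s)OR[\\s+\\-_]*$");

    private QuerySanitizer() {}

    static String stripDanglingOr(String query) {
        if (query == null) {
            return "";
        }
        String cleaned = query.trim();
        String previous;
        // Repeat so that inputs such as "patent OR OR" are fully cleaned.
        do {
            previous = cleaned;
            cleaned = DANGLING_OR.matcher(cleaned).replaceFirst("").trim();
        } while (!cleaned.equals(previous));
        return cleaned;
    }
}
```

With this sketch, "patent OR " and "patent OR" both reduce to "patent", while "OR", "OR-" and "OR+" reduce to the empty string, which the caller would still have to handle as an empty query. A well-formed query such as "patent OR nutch" is returned unchanged. This only hides the symptom; the parse exception itself would still need a fix in NutchAnalysis.jj.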