[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756995#action_12756995 ] Jason Rutherglen commented on SOLR-908:

It looks like our problem could be due to Analyzer.reusableTokenStream and how it reuses token streams from a thread-local variable. This would explain the random behavior (i.e. depending on the thread one was assigned for a query, the associated token stream, if it were in an invalid state, would return incorrect results). I'm thinking reusableTokenStream can be overridden to return a new token stream each time, and so bypass whatever resetting issue is occurring from the mixture of the old and new tokenizer APIs.

Port of Nutch CommonGrams filter to Solr - Key: SOLR-908 URL: https://issues.apache.org/jira/browse/SOLR-908 Project: Solr Issue Type: Wish Components: Analysis Reporter: Tom Burton-West Priority: Minor Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch

Phrase queries containing common words are extremely slow. We are reluctant to just use stop words due to various problems with false hits and some things becoming impossible to search with stop words turned on. (For example "to be or not to be", "the who", "man in the moon" vs. "man on the moon", etc.) Several postings regarding slow phrase queries have suggested using the approach used by Nutch. Perhaps someone with more Java/Solr experience might take this on. It should be possible to port the Nutch CommonGrams code to Solr and create a suitable Solr FilterFactory so that it could be used in Solr by listing it in the Solr schema.xml. Construct n-grams for frequently occurring terms and phrases while indexing. Optimize phrase queries to use the n-grams. Single terms are still indexed too, with n-grams overlaid.
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
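Jason's hypothesis above rests on Analyzer.reusableTokenStream handing each thread one cached stream. The pattern can be pictured with a small, self-contained Java sketch (the class and method names here are made up for illustration, not Lucene's actual code): the reusable path returns the same per-thread instance, stale state and all, while the proposed override always builds a fresh one.

```java
// Plain-Java miniature of the per-thread caching that reusableTokenStream
// performs; StreamFactory/reusableStream/newStream are illustrative names.
class StreamFactory {
    // One cached "token stream" per thread.
    private final ThreadLocal<Object> cached = new ThreadLocal<Object>();

    Object reusableStream() {
        Object s = cached.get();
        if (s == null) {
            s = newStream();
            cached.set(s);
        }
        return s; // same instance on every call from this thread
    }

    // The suggested workaround: bypass the cache and return a fresh stream.
    Object newStream() {
        return new Object();
    }
}
```

If a cached stream is left in an invalid state, every later query on that thread inherits it, which matches the "random" per-thread failures described; always calling the non-caching path trades the reuse optimization for a known-clean stream.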
[jira] Resolved: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1423. Resolution: Fixed. Committed revision 816502. Thanks, Uwe!

Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others - Key: SOLR-1423 URL: https://issues.apache.org/jira/browse/SOLR-1423 Project: Solr Issue Type: Task Components: Analysis Affects Versions: 1.4 Reporter: Uwe Schindler Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch

Because of some backwards-compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has an input field of type java.io.Reader (as before the CharStream code). To correct offsets, it is now necessary to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is a subclass of CharStream), and otherwise returns an uncorrected offset. Normally it is enough to change all occurrences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked whether custom Tokenizers in Solr correct their offsets.
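The delegation Uwe describes can be sketched with a self-contained miniature (plain Java; the Mini* names and ShiftedStream are hypothetical stand-ins, not the real Lucene classes): the tokenizer's own correctOffset forwards to the wrapped stream only when that stream is a CharStream-like reader, and otherwise returns the offset unchanged.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical stand-in for CharStream: a Reader that can map an offset in
// the filtered text back to an offset in the original input.
abstract class MiniCharStream extends Reader {
    public abstract int correctOffset(int currentOff);
}

// A trivial correcting stream that pretends `shift` chars were stripped
// from the front of the input by a char filter.
class ShiftedStream extends MiniCharStream {
    private final Reader in;
    private final int shift;
    ShiftedStream(Reader in, int shift) { this.in = in; this.shift = shift; }
    public int correctOffset(int off) { return off + shift; }
    public int read(char[] buf, int off, int len) throws IOException { return in.read(buf, off, len); }
    public void close() throws IOException { in.close(); }
}

// Tokenizer-like class: token code calls this.correctOffset(...), never
// input.correctOffset(...), so a plain Reader input still works.
class MiniTokenizer {
    private final Reader input;
    MiniTokenizer(Reader input) { this.input = input; }
    public final int correctOffset(int off) {
        return (input instanceof MiniCharStream)
                ? ((MiniCharStream) input).correctOffset(off)
                : off; // no correcting stream wrapped: offset is already right
    }
}
```

The point of the indirection is exactly what the issue states: tokenizers no longer need to know whether their input corrects offsets; the base-class method makes that decision.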
Solr nightly build failure
init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
    [javac] Compiling 86 source files to /tmp/apache-solr-nightly/build/solrj
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
    [javac] Compiling 382 source files to /tmp/apache-solr-nightly/build/solr
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
    [javac] Compiling 170 source files to /tmp/apache-solr-nightly/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

solr-cell-example:

init:
    [mkdir] Created dir: /tmp/apache-solr-nightly/contrib/extraction/build/classes
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/docs/api

init-forrest-entities:

compile-solrj:

compile:
    [javac] Compiling 1 source file to /tmp/apache-solr-nightly/build/solr
    [javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

make-manifest:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/META-INF

compile:
    [javac] Compiling 6 source files to /tmp/apache-solr-nightly/contrib/extraction/build/classes
    [javac] Note: /tmp/apache-solr-nightly/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

build:
    [jar] Building jar: /tmp/apache-solr-nightly/contrib/extraction/build/apache-solr-cell-nightly.jar

example:
    [copy] Copying 1 file to /tmp/apache-solr-nightly/example/solr/lib
    [copy] Copying 26 files to /tmp/apache-solr-nightly/example/solr/lib

junit:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
    [junit] Running org.apache.solr.BasicFunctionalityTest
    [junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 41.498 sec
    [junit] Running org.apache.solr.ConvertedLegacyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 24.701 sec
    [junit] Running org.apache.solr.DisMaxRequestHandlerTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 19.198 sec
    [junit] Running org.apache.solr.EchoParamsTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 6.827 sec
    [junit] Running org.apache.solr.MinimalSchemaTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 8.555 sec
    [junit] Running org.apache.solr.OutputWriterTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.662 sec
    [junit] Running org.apache.solr.SampleTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 7.934 sec
    [junit] Running org.apache.solr.SolrInfoMBeanTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.359 sec
    [junit] Running org.apache.solr.TestDistributedSearch
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 111.856 sec
    [junit] Running org.apache.solr.TestPluginEnable
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.218 sec
    [junit] Running org.apache.solr.TestSolrCoreProperties
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.499 sec
    [junit] Running org.apache.solr.TestTrie
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 20.647 sec
    [junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.097 sec
    [junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.652 sec
    [junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.653 sec
    [junit] Running org.apache.solr.analysis.HTMLStripCharFilterTest
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 5.37 sec
    [junit] Running org.apache.solr.analysis.LengthFilterTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.497 sec
    [junit] Running
Hudson build is back to normal: Solr-trunk #928
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/928/changes
[jira] Updated: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvements.
[ https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fergus McMenemie updated SOLR-1437: Attachment: SOLR-1437.patch

Good to see you reuse your own code! This new patch is the same as the previous version except that the references to SOLR and datasource etc. have been rewritten. Also, Noble, can you check over and review my comments around line 237 in the file XPathRecordReader.java. Is this correct?

{code}
} else {
  // can we ever get here? This means we are collecting for an Xpath
  // that is outwith any forEach expression
  if (attributes != null || hasText)
    valuesAddedinThisFrame = new HashSet<String>();
  stack.push(valuesAddedinThisFrame);
}
{code}

DIH: Enhance XPathRecordReader to deal with //tagname and other improvements - Key: SOLR-1437 URL: https://issues.apache.org/jira/browse/SOLR-1437 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Fergus McMenemie Assignee: Noble Paul Priority: Minor Fix For: 1.5 Attachments: SOLR-1437.patch, SOLR-1437.patch Original Estimate: 672h Remaining Estimate: 672h

As per http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html it would be nice to be able to use expressions such as //tagname when parsing XML documents.
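The //tagname support being discussed would let a DataImportHandler field mapping reach elements at any depth. A hypothetical data-config.xml fragment (entity, file, and column names are made up for illustration):

{code}
<!-- match every <tagname> element anywhere under the document -->
<entity name="rec" processor="XPathEntityProcessor"
        url="example.xml" forEach="/records/record">
  <field column="tag" xpath="//tagname" />
</entity>
{code}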
[jira] Commented: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvements.
[ https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757091#action_12757091 ] Noble Paul commented on SOLR-1437: committed r816577. Thanks, Fergus.
[jira] Commented: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.
[ https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757106#action_12757106 ] Simon Lachinger commented on SOLR-758:

First of all, thanks for providing wildcard matching for the dismax query handler; that is exactly what I need. However, the WILDCARD_STRIP_CHARS regex in UserQParser.java does not work with umlauts, which makes the patch useless for languages such as German. I will attach a diff file with the changes I have made to get it working with umlauts.

Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies - Key: SOLR-758 URL: https://issues.apache.org/jira/browse/SOLR-758 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: David Smiley Fix For: 1.5 Attachments: AdvancedQParserPlugin.java, AdvancedQParserPlugin.java, DisMaxQParserPlugin.java, DisMaxQParserPlugin.java, UserQParser.java, UserQParser.java

The DisMaxQParserPlugin has a variety of nice features; chief among them is that it uses the DisjunctionMaxQueryParser. However, it imposes limitations on the syntax. I've enhanced the DisMax QParser plugin to use a pluggable query-string re-writer (via subclass extension) instead of hard-coding the logic currently embedded within it (i.e. the escape-nearly-everything logic). Additionally, I've made this QParser have a notion of a simple syntax (the default) or non-simple, in which case some of the logic in this QParser doesn't occur because it's irrelevant (phrase boosting and min-should-match in particular). As part of my work I significantly moved the code around to make it clearer and more extensible. I also chose to rename it to suggest its role as a parser for user queries. Attachment to follow...
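The failure Simon describes comes from Java character classes that enumerate ASCII ranges, so an umlaut falls into the "strip it" bucket. A self-contained sketch (both patterns here are illustrative; the actual WILDCARD_STRIP_CHARS expression in the patch may differ):

```java
// Illustrative strip helpers; neither pattern is the patch's actual regex.
class UmlautStripDemo {
    // ASCII-only class: any non-ASCII letter is treated as junk to strip.
    static String stripAscii(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Unicode-aware categories keep letters and digits from any script.
    static String stripUnicode(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]", "");
    }
}
```

stripAscii("Müller*") drops the umlaut along with the wildcard and yields "Mller", while stripUnicode("Müller*") yields "Müller"; the same idea applies to any pattern built from ASCII ranges instead of Unicode categories.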
[jira] Updated: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.
[ https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Lachinger updated SOLR-758: Attachment: UserQParser.java-umlauts.patch

Making the UserQParser.java work with umlauts and other special characters.
acts_as_solr integration with Solr separately
Hi, I have set up a Solr search server in Tomcat. I am able to fire queries (of any kind) and get results in XML format. Now I want to integrate it (Solr) with Ruby on Rails. I know Ruby on Rails has the plugin acts_as_solr, which helps in integrating (talking) with Solr. acts_as_solr comes bundled with a Solr web application running on a Jetty server, but I don't want to use this bundled Solr web application, e.g. I don't want to do rake solr:start. I am running Solr as a separate search server in Tomcat on port 8983 (the URL http://localhost:8983/solr/ and all the other URLs are responding).

Now, I want to talk to this separate Solr server using the acts_as_solr plugin. Questions:

1) Can anybody point me to how to do this? Any tutorial?
2) What changes do I have to make in the acts_as_solr plugin?
3) Any good pointers (URLs) will be appreciated...

Regards
Abhay
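One common approach (hedged: the key names and file location below vary across acts_as_solr versions, so treat this as a sketch rather than the plugin's documented API) is to point the plugin's per-environment configuration at the external Tomcat instance and simply never run rake solr:start:

{code}
# config/solr.yml (sketch): point each Rails environment at the
# already-running Tomcat Solr instead of the bundled Jetty instance.
development:
  url: http://localhost:8983/solr
test:
  url: http://localhost:8983/solr
production:
  url: http://localhost:8983/solr
{code}

The model classes keep using acts_as_solr as before; only the HTTP endpoint the plugin talks to changes.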
[jira] Commented: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.
[ https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757139#action_12757139 ] David Smiley commented on SOLR-758: Thanks for the update, Simon. I forgot you can do things like \w within a regex character class [...]
[jira] Updated: (SOLR-1445) Leading term in a multi-word synonym replaced with the token that follows it
[ https://issues.apache.org/jira/browse/SOLR-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1445: Fix Version/s: 1.4

Leading term in a multi-word synonym replaced with the token that follows it - Key: SOLR-1445 URL: https://issues.apache.org/jira/browse/SOLR-1445 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Environment: Solr 1.4 nightly (09/14/2009) Reporter: Gregg Donovan Fix For: 1.4 Attachments: TestMultiWordSynonmys.java

I'm running into an odd issue with multi-word synonyms. Things generally seem to work as expected, but I sometimes see words that are the leading term in a multi-word synonym being replaced with the token that follows them in the stream when they should just be ignored (i.e. there's no synonym match for just that token). When I preview the analysis at admin/analysis.jsp it looks fine, but at runtime I see problems like the one in the attached unit test.
Re: svn commit: r816202 - in /lucene/solr/trunk/src: java/org/apache/solr/schema/ java/org/apache/solr/search/ java/org/apache/solr/search/function/ test/org/apache/solr/search/function/
2009/9/17 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
> do we have some type info for the context param in SolrFilter#createWeight(Map context, Searcher searcher)

Nope... it's specifically opaque so we don't have to change it down the road, or force the creation of custom weight classes just to store extra info, or force the creation of a fake/custom ValueSource just to use a different key.

-Yonik
http://www.lucidimagination.com
[jira] Commented: (SOLR-1427) SearchComponents aren't listed on registry.jsp
[ https://issues.apache.org/jira/browse/SOLR-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757185#action_12757185 ] Grant Ingersoll commented on SOLR-1427:

I'm guessing the problem is most likely in loading the SearchComponents, not in the SolrResourceLoader, the reason being, as Yonik said, that the core is not ready yet at that point. Also, we need to address the possible double loading in SolrResourceLoader.

SearchComponents aren't listed on registry.jsp - Key: SOLR-1427 URL: https://issues.apache.org/jira/browse/SOLR-1427 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Grant Ingersoll Priority: Minor Fix For: 1.4 Attachments: SOLR-1427.patch, SOLR-1427.patch

SearchComponent implements SolrInfoMBean using getCategory() of OTHER, but they aren't listed on the registry.jsp display of loaded plugins. This may be a one-off glitch because of the way SearchComponents get loaded, or it may indicate some other problem with the infoRegistry.
[jira] Commented: (SOLR-1427) SearchComponents aren't listed on registry.jsp
[ https://issues.apache.org/jira/browse/SOLR-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757195#action_12757195 ] Grant Ingersoll commented on SOLR-1427: Hoss, where in the SolrResourceLoader do you see other puts into the infoRegistry happening?
NPE
Anyone else seeing:

SEVERE: java.lang.NullPointerException
    at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)
    at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:619)
    at org.apache.solr.schema.TextField.write(TextField.java:45)
    at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
    at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
    at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
    at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
    at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
    at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
    at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
    at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)

when running the example and doing a simple query?
[jira] Updated: (SOLR-1427) SearchComponents aren't listed on registry.jsp
[ https://issues.apache.org/jira/browse/SOLR-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1427: Attachment: SOLR-1427.patch

Patch that defers registering the components until later. I can't reproduce the problem, so this is just an educated guess.
Re: NPE
Never mind. Operator error.

On Sep 18, 2009, at 8:15 AM, Grant Ingersoll wrote:
> Anyone else seeing: SEVERE: java.lang.NullPointerException at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761) [...] When running the example and doing a simple query?
Re: NPE
Looks like one of the hazards of changing the schema w/o deleting the index and re-indexing. I bet this field was something like a numeric type that would return null from Field.getStringValue(), and then it was changed to a text type.

-Yonik
http://www.lucidimagination.com

On Fri, Sep 18, 2009 at 11:15 AM, Grant Ingersoll gsing...@apache.org wrote:
> Anyone else seeing: SEVERE: java.lang.NullPointerException at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761) [...] When running the example and doing a simple query?
[jira] Commented: (SOLR-1294) SolrJS/Javascript client fails in IE8!
[ https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757235#action_12757235 ] Alex Dergachev commented on SOLR-1294:

Hi guys... we have worked extensively at integrating solrjs and Drupal over the last few months, and have had to rewrite much of the code to fix bugs and allow extensibility. We're hoping to release our fork in the coming weeks, at this URL: http://drupal.org/project/solrjs Because we're sticking closely to the original solrjs model (JavaScript that communicates directly with Solr), we're hoping to eventually merge the two branches, and have brought up the possibility with Matthias Epheser. Solrjs is a killer app, and every Solr user we talked to is incredibly excited about it. However, given that the current code base is very alpha, I don't think a few browser bugs with solrjs should hold up the release of Solr 1.4. Regards, Alex Dergachev, Co-founder, Evolving Web http://evolvingweb.ca

SolrJS/Javascript client fails in IE8! - Key: SOLR-1294 URL: https://issues.apache.org/jira/browse/SOLR-1294 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Eric Pugh Assignee: Ryan McKinley Fix For: 1.4 Attachments: SOLR-1294-IE8.patch, SOLR-1294.patch, solrjs-ie8-html-syntax-error.patch

SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test whether this occurs in IE 6 and 7 as well. This happens on both the sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show-stopper from the standpoint of really using this library!
I have posted a screenshot of the error at http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png The error is just a whole bunch of repeated messages in the vein of:

Message: 'jQuery.solrjs' is null or not an object
Line: 24 Char: 1 Code: 0
URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js

Message: 'jQuery.solrjs' is null or not an object
Line: 37 Char: 1 Code: 0
URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js

Message: 'jQuery.solrjs' is null or not an object
Line: 24 Char: 1 Code: 0
URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js

Message: 'jQuery.solrjs' is null or not an object
Line: 27 Char: 1 Code: 0
URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js
Re: NPE
Grant Ingersoll wrote:
> Anyone else seeing: SEVERE: java.lang.NullPointerException at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)

I saw that symptom when the schema seriously didn't match the index (e.g. the schema didn't specify a field type and then XMLWriter assumes Text, or the schema specified a stored field, whereas the index had the same field unstored).

--
Best regards,
Andrzej Bialecki
http://www.sigram.com Contact: info at sigram dot com
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757267#action_12757267 ] Yonik Seeley commented on SOLR-908: Jason, at a quick look, I see that this filter maintains state, but doesn't implement reset() - could that be the issue?
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757271#action_12757271 ] Robert Muir commented on SOLR-908: Just my opinion, but I do not think this problem is due to mixed tokenizer APIs (LUCENE-1919), because this BufferedTokenStream does not mix the APIs that cause that issue... it only uses TokenStream.next(). I think instead Yonik might be on the right track; I could be wrong.
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757344#action_12757344 ] Uwe Schindler commented on SOLR-908: In my opinion, the problem is BufferedTokenStream (shouldn't its name be BufferedTokenFilter?). It has the linked list but does not implement reset(). So the problem is not this issue, but rather the use of reset() when you reuse the token stream. As long as BufferedTokenStream is not fixed to support reset(), you have to create new instances.
[jira] Created: (SOLR-1446) BufferedTokenStream keeps state, but does not implement reset
BufferedTokenStream keeps state, but does not implement reset - Key: SOLR-1446 URL: https://issues.apache.org/jira/browse/SOLR-1446 Project: Solr Issue Type: Bug Components: Analysis Reporter: Robert Muir Priority: Minor Attachments: SOLR-1446.patch BufferedTokenStream needs a reset() impl that clears its internal lists. Otherwise, there could be problems when using reusable tokenstreams.
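The failure mode the issue describes can be sketched with a toy token stream (a simplified model, not Solr's actual BufferedTokenStream; the class and field names here are illustrative): leftover buffered tokens survive into the next use unless reset() clears them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of a token stream that buffers read-ahead tokens internally.
// Without reset(), leftover tokens in the buffer leak into the next reuse.
class ToyBufferedStream {
    private final Deque<String> output = new ArrayDeque<>(); // buffered tokens
    private Iterator<String> source;

    void setSource(List<String> tokens) { source = tokens.iterator(); }

    // Buffer a token for later emission (what a filter subclass would do).
    void write(String token) { output.addLast(token); }

    String next() {
        if (!output.isEmpty()) return output.pollFirst();
        return source != null && source.hasNext() ? source.next() : null;
    }

    // The kind of fix SOLR-1446 asks for: clear internal state before reuse.
    void reset() { output.clear(); }
}
```

With reset() in place, a stale buffered token cannot appear as an "excess token" on the next pass.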
[jira] Updated: (SOLR-1446) BufferedTokenStream keeps state, but does not implement reset
[ https://issues.apache.org/jira/browse/SOLR-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-1446: -- Attachment: SOLR-1446.patch
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757364#action_12757364 ] Robert Muir commented on SOLR-908: -- Uwe, I opened an issue for this: SOLR-1446. I think even if it is not the cause of this problem, BufferedTokenStream should implement reset() since it keeps internal state.
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757381#action_12757381 ] Robert Muir commented on SOLR-908: -- similar to the BufferedTokenStream reset, the CommonGramsQueryFilter here has its own internal state: {code} private Token prev; {code} so this filter too should implement reset() (and it must call super.reset() so the BufferedTokenStream lists get reset too).
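The shape of that fix can be sketched with simplified stand-ins (these are not the real Solr classes): the subclass resets its own state and chains to the parent so the parent's buffers are cleared too.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for a buffering parent stream with its own lists.
class ParentStream {
    protected final Deque<String> buffer = new ArrayDeque<>();
    void reset() { buffer.clear(); }
}

// Simplified stand-in for a CommonGramsQueryFilter-style subclass that
// keeps its own 'prev' state between tokens.
class QueryFilterSketch extends ParentStream {
    String prev = "";            // initial state, set in the ctor in the real code

    @Override
    void reset() {
        super.reset();           // clears the parent's buffered lists
        prev = "";               // restore this filter's own initial state
    }
}
```

Forgetting the super.reset() call would leave the parent's lists populated even though the subclass state looks clean.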
[jira] Created: (SOLR-1447) Simple property injection
Simple property injection -- Key: SOLR-1447 URL: https://issues.apache.org/jira/browse/SOLR-1447 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 MergePolicy and MergeScheduler require property injection. We'll allow these and probably other cases in this patch using Java reflection.
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757413#action_12757413 ] Jason Rutherglen commented on SOLR-908: --- Interesting - the whole reusableTokenStream model is new to me, so it wasn't in my mental model of how Lucene analyzers work. It seems that if BTS is caching tokens, then being reused without a reset, there would be excess tokens rather than deletions? Or perhaps reset is being called from another analyzer? It's quite confusing. I started work on a LoggingTokenizer that could be inserted between tokenizers in the Solr schema, but I have also been working on reproducing the issue (which hasn't worked either). Uwe, Yonik, and Robert, thanks for taking a look!
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757432#action_12757432 ] Robert Muir commented on SOLR-908: -- {quote} It seems if BTS is caching tokens, then being reused, and isn't reset, then there would be excess tokens instead of deletions? {quote} right, that's what the test case I added for BufferedTokenStream showed. This would be more of a corner case, as I think most BufferedTokenStreams would have empty lists anyway by the time they are reset(), so it's likely not causing your problem (though it should be fixed!). Your problem, again, is probably the internal state kept in CommonGramsQueryFilter. As you can see, CommonGramsQueryFilter has hairy logic involving the buffered token 'prev'; a lot of this logic has to do with what happens at end of stream. Unfortunately there is no reset() for CommonGramsQueryFilter to set 'prev' back to its initial state, so when something like QueryParser tries to reuse it, it is probably not behaving correctly.
[jira] Commented: (SOLR-1444) Add option in solrconfig.xml to override the LogMergePolicy calibrateSizeByDeletes
[ https://issues.apache.org/jira/browse/SOLR-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757431#action_12757431 ] Jason Rutherglen commented on SOLR-1444: I think this is barking up the wrong tree; I think we'll want to support any setter methods a class has to offer. I opened an issue to address this: SOLR-1447. Otherwise we're writing custom code for each config class? Add option in solrconfig.xml to override the LogMergePolicy calibrateSizeByDeletes Key: SOLR-1444 URL: https://issues.apache.org/jira/browse/SOLR-1444 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Environment: NA Reporter: Jibo John Priority: Minor A patch was committed in Lucene (http://issues.apache.org/jira/browse/LUCENE-1634) that considers the number of deleted documents as a criterion when deciding which segments to merge. By default, calibrateSizeByDeletes = false in LogMergePolicy, so currently there is no way in Solr to set calibrateSizeByDeletes = true.
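The generic approach being proposed can be sketched with plain Java reflection: match each configured property to a setXxx method on the target bean and invoke it, instead of writing custom code per config class. This is a hedged sketch under stated assumptions - SetterInjector and DemoPolicy are illustrative names, not the actual SOLR-1447 patch.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Generic setter injection: match each property name to a setXxx method
// on the target bean and invoke it via reflection.
class SetterInjector {
    static void inject(Object bean, Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            String setter = "set" + Character.toUpperCase(e.getKey().charAt(0))
                    + e.getKey().substring(1);
            for (Method m : bean.getClass().getMethods()) {
                if (!m.getName().equals(setter) || m.getParameterCount() != 1) continue;
                Class<?> t = m.getParameterTypes()[0];
                try {
                    // Coerce the config string to the setter's parameter type.
                    if (t == String.class) m.invoke(bean, e.getValue());
                    else if (t == int.class) m.invoke(bean, Integer.parseInt(e.getValue()));
                    else if (t == boolean.class) m.invoke(bean, Boolean.parseBoolean(e.getValue()));
                } catch (ReflectiveOperationException ex) {
                    throw new RuntimeException(ex);
                }
            }
        }
    }
}

// Hypothetical config bean, standing in for a merge policy with settable options.
class DemoPolicy {
    boolean calibrateSizeByDeletes;
    public void setCalibrateSizeByDeletes(boolean b) { calibrateSizeByDeletes = b; }
}
```

With this pattern, a solrconfig.xml property like calibrateSizeByDeletes=true would reach setCalibrateSizeByDeletes(boolean) without any per-class wiring.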
[jira] Resolved: (SOLR-1446) BufferedTokenStream keeps state, but does not implement reset
[ https://issues.apache.org/jira/browse/SOLR-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-1446. Resolution: Fixed Fix Version/s: 1.4 I had missed that one... Thanks!
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757445#action_12757445 ] Yonik Seeley commented on SOLR-908: --- I guess if something causes an exception during analysis, things like BufferedTokenStream can be left with unwanted state. Note that BufferedTokenStream didn't inherit from TokenFilter and thus wouldn't automatically chain the reset() to its input... so any upstream filters wouldn't be reset(). I just fixed that.
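The chaining problem described here can be sketched with toy classes (not the real Lucene API): a filter's reset() has to propagate to the stream it wraps, or upstream position state survives reuse.

```java
// Toy source with resettable position state.
class ToySource {
    private final String[] tokens;
    private int pos = 0;
    ToySource(String... tokens) { this.tokens = tokens; }
    String next() { return pos < tokens.length ? tokens[pos++] : null; }
    void reset() { pos = 0; }
}

// Toy filter: before a fix like Yonik's, reset() would not chain, so the
// upstream source kept its old position across reuse.
class ToyFilter {
    private final ToySource input;
    ToyFilter(ToySource input) { this.input = input; }
    String next() { return input.next(); }
    void reset() { input.reset(); } // the fix: chain reset() to the input
}
```

Without the chained call, a reused filter would immediately return null because the exhausted source never rewound.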
[jira] Updated: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-908: -- Attachment: SOLR-908.patch Added reset overrides to CommonGramsFilter and CommonGramsQueryFilter.
[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757472#action_12757472 ] Robert Muir commented on SOLR-908: -- Jason, I took a glance. I think the reset() for CommonGramsQueryFilter should not set prev = null; this is because the initial state is not null: in the ctor, prev = new Token(). With the current logic, this is what reset() must do also. Also, FYI, CommonGramsFilter does not need a reset() since the StringBuffer isn't used to keep state. The best way to ensure it's correct, I think, is to add tests that consume and then reuse/reset().
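The kind of reuse test suggested here can be sketched with a toy common-grams-style filter (illustrative, not the actual Solr classes or tests): consume to exhaustion, call reset(), feed the same input again, and assert both passes match.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy common-grams-style filter: joins each token with the previous one.
// 'prev' is internal state that must be restored on reset().
class ToyGramsFilter {
    private Iterator<String> in;
    private String prev = "";                 // initial state, as set in the ctor

    void setInput(List<String> tokens) { in = tokens.iterator(); }

    String next() {
        if (in == null || !in.hasNext()) return null;
        String cur = in.next();
        String out = prev.isEmpty() ? cur : prev + "_" + cur;
        prev = cur;
        return out;
    }

    void reset() { prev = ""; }               // back to the initial state, not null

    // Consume everything into a list, as a reuse test would.
    List<String> consumeAll() {
        List<String> out = new ArrayList<>();
        for (String t = next(); t != null; t = next()) out.add(t);
        return out;
    }
}
```

If reset() were missing, the second pass would start by pairing the new first token with the stale 'prev' left over from the first pass, so the two passes would differ.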
[jira] Updated: (SOLR-908) Port of Nutch CommonGrams filter to Solr
[ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated SOLR-908: -- Attachment: SOLR-908.patch Robert, thanks. I added the new Token in CGQF.reset() and reset test cases.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757592#action_12757592 ] Jason Rutherglen commented on SOLR-1316: The DAWG seems like a potential fit as a replacement for the Lucene term dictionary. It would provide the extra benefit of faster prefix (and similar) lookups. I believe it could be stored on disk by writing file pointers to the locations of the letters. I found the Stanford lecture on them interesting, though the papers seem to overcomplicate them. I could not find an existing Java implementation. As a generic library I think it could be useful for a variety of Lucene-based use cases (i.e. storing terms in a compact form that allows fast lookups, prefix and otherwise). Create autosuggest component Key: SOLR-1316 URL: https://issues.apache.org/jira/browse/SOLR-1316 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: TernarySearchTree.tar.gz Original Estimate: 96h Remaining Estimate: 96h Autosuggest is a common search function that can be integrated into Solr as a SearchComponent. Our first implementation will use the TernaryTree found in Lucene contrib. * Enable creation of the dictionary from the index or via Solr's RPC mechanism * What types of parameters and settings are desirable? * Hopefully in the future we can include user click-through rates to boost those terms/phrases higher
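The ternary-search-tree structure the issue plans to build on can be sketched briefly; this toy version supports insert and exact lookup, the operations an autosuggest prefix walk extends (illustrative code, not the Lucene contrib TernaryTree implementation):

```java
// Toy ternary search tree: each node holds one char, with lo/eq/hi children.
class TST {
    private static class Node {
        char c;
        Node lo, eq, hi;
        boolean word;       // true if a dictionary word ends at this node
    }
    private Node root;

    void insert(String s) { root = insert(root, s, 0); }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) { n = new Node(); n.c = c; }
        if (c < n.c) n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i < s.length() - 1) n.eq = insert(n.eq, s, i + 1);
        else n.word = true;
        return n;
    }

    boolean contains(String s) {
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = s.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (i == s.length() - 1) return n.word;
            else { n = n.eq; i++; }
        }
        return false;
    }
}
```

The same node layout is what makes an on-disk encoding plausible: each child link can become a file pointer to the child node's position.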
[jira] Commented: (SOLR-1447) Simple property injection
[ https://issues.apache.org/jira/browse/SOLR-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757627#action_12757627 ] Noble Paul commented on SOLR-1447: -- +1.