[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756995#action_12756995
 ] 

Jason Rutherglen commented on SOLR-908:
---

It looks like our problem could be due to
Analyzer.reusableTokenStream and how it reuses tokenstreams from
a thread local variable. This would explain the random behavior
(i.e. depending on the thread one was assigned for a query, the
associated token stream, if it were in an invalid state, would
return incorrect results). I'm thinking reusableTokenStream can
be overridden to return a new token stream each time? And so
bypass whatever resetting issue is occurring from the mixture of
the old and new tokenizer APIs.

 Port of Nutch  CommonGrams filter to Solr
 -

 Key: SOLR-908
 URL: https://issues.apache.org/jira/browse/SOLR-908
 Project: Solr
  Issue Type: Wish
  Components: Analysis
Reporter: Tom Burton-West
Priority: Minor
 Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch, 
 SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch


 Phrase queries containing common words are extremely slow.  We are reluctant 
 to just use stop words due to various problems with false hits and some 
 things becoming impossible to search with stop words turned on. (For example 
 to be or not to be, the who, man in the moon vs man on the moon etc.) 
  
 Several postings regarding slow phrase queries have suggested using the 
 approach used by Nutch.  Perhaps someone with more Java/Solr experience might 
 take this on.
 It should be possible to port the Nutch CommonGrams code to Solr  and create 
 a suitable Solr FilterFactory so that it could be used in Solr by listing it 
 in the Solr schema.xml.
 Construct n-grams for frequently occurring terms and phrases while indexing.
 Optimize phrase queries to use the n-grams. Single terms are still indexed 
 too, with n-grams overlaid.
 http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
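
The indexing idea described above (single terms kept, with a bigram overlaid wherever a common word is involved) can be sketched roughly as follows. This is an illustrative sketch only, not the Nutch or Solr code: the class name, the `Gram` type, the underscore separator and the rule "pair two adjacent tokens when either one is common" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the Nutch or Solr implementation. */
final class CommonGramsSketch {

    /** One emitted token: its text and its position increment (0 = overlaid on the previous token). */
    static final class Gram {
        final String text;
        final int positionIncrement;

        Gram(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits every single term, plus a bigram overlaid at the same position
     * whenever the term or its successor is in the common-word set (assumed rule).
     */
    static List<Gram> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<Gram> out = new ArrayList<Gram>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(new Gram(current, 1)); // the single term is always indexed
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(new Gram(current + "_" + next, 0)); // overlaid bigram
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<String>(Arrays.asList("the", "in", "on"));
        System.out.println(commonGrams(Arrays.asList("man", "in", "the", "moon"), common));
    }
}
```

With bigrams such as man_in and in_the in the index, a phrase query could match on the much rarer bigram terms instead of intersecting the long posting lists of the common words themselves.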

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others

2009-09-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1423.
--

Resolution: Fixed

Committed revision 816502. Thanks, Uwe!

 Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & 
 others
 

 Key: SOLR-1423
 URL: https://issues.apache.org/jira/browse/SOLR-1423
 Project: Solr
  Issue Type: Task
  Components: Analysis
Affects Versions: 1.4
Reporter: Uwe Schindler
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: SOLR-1423-FieldType.patch, 
 SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, 
 SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, 
 SOLR-1423.patch


 Because of some backwards compatibility problems (LUCENE-1906) we changed the 
 CharStream/CharFilter API a little bit. Tokenizer now only has an input field 
 of type java.io.Reader (as before the CharStream code). To correct offsets, 
 it is now necessary to call the Tokenizer.correctOffset(int) method, which 
 delegates to the CharStream (if input is a subclass of CharStream), else 
 returns an uncorrected offset. Normally it is enough to change all occurrences 
 of input.correctOffset() to this.correctOffset() in Tokenizers. It should 
 also be checked whether custom Tokenizers in Solr correct their offsets.
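
The delegation described above can be modelled in a few lines. This is a simplified, self-contained sketch and not the Lucene classes: `OffsetMappingReader` stands in for CharStream, `SketchTokenizer` stands in for Tokenizer, and `PrefixStrippedReader` is a made-up reader that pretends a fixed number of characters were removed before tokenizing.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

/** Simplified model of the described delegation; class names are hypothetical, not Lucene's. */
final class CorrectOffsetSketch {

    /** Stand-in for a CharStream: a Reader that can map offsets back to the original text. */
    static abstract class OffsetMappingReader extends FilterReader {
        OffsetMappingReader(Reader in) {
            super(in);
        }

        abstract int correctOffset(int currentOff);
    }

    /** Example reader that pretends a fixed-length prefix was stripped from the input. */
    static final class PrefixStrippedReader extends OffsetMappingReader {
        private final int strippedChars;

        PrefixStrippedReader(Reader in, int strippedChars) {
            super(in);
            this.strippedChars = strippedChars;
        }

        @Override
        int correctOffset(int currentOff) {
            return currentOff + strippedChars;
        }
    }

    /** Stand-in for a Tokenizer: holds a plain Reader and exposes correctOffset(int). */
    static class SketchTokenizer {
        protected final Reader input;

        SketchTokenizer(Reader input) {
            this.input = input;
        }

        /** Delegates to the reader if it can map offsets, else returns the offset unchanged. */
        final int correctOffset(int currentOff) {
            if (input instanceof OffsetMappingReader) {
                return ((OffsetMappingReader) input).correctOffset(currentOff);
            }
            return currentOff;
        }
    }

    public static void main(String[] args) {
        SketchTokenizer plain = new SketchTokenizer(new StringReader("abc"));
        SketchTokenizer mapped =
            new SketchTokenizer(new PrefixStrippedReader(new StringReader("abc"), 3));
        System.out.println(plain.correctOffset(5));  // plain Reader: offset is unchanged
        System.out.println(mapped.correctOffset(5)); // mapping reader: offset shifted by 3
    }
}
```

Because the tokenizer only sees a Reader, calling the method on the tokenizer itself keeps working whether or not a mapping reader is in the chain.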

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Solr nightly build failure

2009-09-18 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 86 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 382 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 170 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

solr-cell-example:

init:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/contrib/extraction/build/classes
[mkdir] Created dir: /tmp/apache-solr-nightly/build/docs/api

init-forrest-entities:

compile-solrj:

compile:
[javac] Compiling 1 source file to /tmp/apache-solr-nightly/build/solr
[javac] Note: 
/tmp/apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

make-manifest:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/META-INF

compile:
[javac] Compiling 6 source files to 
/tmp/apache-solr-nightly/contrib/extraction/build/classes
[javac] Note: 
/tmp/apache-solr-nightly/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java
 uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

build:
  [jar] Building jar: 
/tmp/apache-solr-nightly/contrib/extraction/build/apache-solr-cell-nightly.jar

example:
 [copy] Copying 1 file to /tmp/apache-solr-nightly/example/solr/lib
 [copy] Copying 26 files to /tmp/apache-solr-nightly/example/solr/lib

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 41.498 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 24.701 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 19.198 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 6.827 sec
[junit] Running org.apache.solr.MinimalSchemaTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 8.555 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.662 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 7.934 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.359 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 111.856 sec
[junit] Running org.apache.solr.TestPluginEnable
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.218 sec
[junit] Running org.apache.solr.TestSolrCoreProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.499 sec
[junit] Running org.apache.solr.TestTrie
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 20.647 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.097 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.652 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.653 sec
[junit] Running org.apache.solr.analysis.HTMLStripCharFilterTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 5.37 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.497 sec
[junit] Running 

Hudson build is back to normal: Solr-trunk #928

2009-09-18 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/928/changes




[jira] Updated: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.

2009-09-18 Thread Fergus McMenemie (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fergus McMenemie updated SOLR-1437:
---

Attachment: SOLR-1437.patch

Good to see you reuse your own code!

This new patch is the same as the previous version except that the 
references to SOLR and datasource etc. have been rewritten.

Also, Noble, can you check over and review my comments around line 237 in the 
file XPathRecordReader.java. Is this correct?

{code}
  } else {
    // can we ever get here? This means we are collecting for an Xpath
    // that is outwith any forEach expression
    if (attributes != null || hasText)
      valuesAddedinThisFrame = new HashSet<String>();
    stack.push(valuesAddedinThisFrame);
  }
{code}


 DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
 

 Key: SOLR-1437
 URL: https://issues.apache.org/jira/browse/SOLR-1437
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Fergus McMenemie
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1437.patch, SOLR-1437.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 As per 
 http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
  it would be nice to be able to use expressions such as //tagname when 
 parsing XML documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.

2009-09-18 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757091#action_12757091
 ] 

Noble Paul commented on SOLR-1437:
--

committed r816577
thanks Fergus

 DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
 

 Key: SOLR-1437
 URL: https://issues.apache.org/jira/browse/SOLR-1437
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Fergus McMenemie
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1437.patch, SOLR-1437.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 As per 
 http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
  it would be nice to be able to use expressions such as //tagname when 
 parsing XML documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.

2009-09-18 Thread Simon Lachinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757106#action_12757106
 ] 

Simon Lachinger commented on SOLR-758:
--

First of all, thanks for providing wildcard matching for the dismax query 
handler; that is exactly what I need. However, the WILDCARD_STRIP_CHARS regex 
in UserQParser.java does not work with umlauts, which makes the patch useless 
for languages such as German.

I will attach a diff file with the changes I have made to get it working with 
umlauts.
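
To illustrate the kind of problem described, here is a small sketch. Both patterns are hypothetical stand-ins, since the actual WILDCARD_STRIP_CHARS expression and the attached fix are not shown in this thread. What is certain is that in Java's regex engine `\w` matches only `[a-zA-Z_0-9]` by default, so a strip pattern built on it removes umlauts, while `\p{L}` matches letters from any script.

```java
import java.util.regex.Pattern;

/** Illustrative sketch; both patterns are assumptions, not the ones in UserQParser.java. */
final class WildcardStripSketch {

    // Strips everything except ASCII word characters and the wildcards * and ?.
    static final Pattern ASCII_ONLY = Pattern.compile("[^\\w*?]");

    // Strips everything except Unicode letters, Unicode digits, underscore, * and ?.
    static final Pattern UNICODE_AWARE = Pattern.compile("[^\\p{L}\\p{N}_*?]");

    static String strip(Pattern stripChars, String term) {
        return stripChars.matcher(term).replaceAll("");
    }

    public static void main(String[] args) {
        String term = "M\u00fcll*!"; // "Müll*!" written with a Unicode escape
        System.out.println(strip(ASCII_ONLY, term));    // the umlaut is removed along with '!'
        System.out.println(strip(UNICODE_AWARE, term)); // only '!' is removed
    }
}
```

An alternative with the same effect would be compiling the `\w` pattern with the `Pattern.UNICODE_CHARACTER_CLASS` flag, which is available from Java 7 onward.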

 Enhance DisMaxQParserPlugin to support full-Solr syntax and to support 
 alternate escaping strategies.
 -

 Key: SOLR-758
 URL: https://issues.apache.org/jira/browse/SOLR-758
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: 1.5

 Attachments: AdvancedQParserPlugin.java, AdvancedQParserPlugin.java, 
 DisMaxQParserPlugin.java, DisMaxQParserPlugin.java, UserQParser.java, 
 UserQParser.java


 The DisMaxQParserPlugin has a variety of nice features; chief among them is 
 that it uses the DisjunctionMaxQueryParser.  However, it imposes limitations 
 on the syntax.  
 I've enhanced the DisMax QParser plugin to use a pluggable query string 
 re-writer (via subclass extension) instead of hard-coding the logic currently 
 embedded within it (i.e. the escape nearly everything logic). Additionally, 
 I've made this QParser have a notion of a simple syntax (the default) or 
 non-simple in which case some of the logic in this QParser doesn't occur 
 because it's irrelevant (phrase boosting and min-should-max in particular). 
 As part of my work I significantly moved the code around to make it clearer 
 and more extensible.  I also chose to rename it to suggest its role as a 
 parser for user queries.
 Attachment to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.

2009-09-18 Thread Simon Lachinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Lachinger updated SOLR-758:
-

Attachment: UserQParser.java-umlauts.patch

Making the UserQParser.java work with umlauts and other special characters.

 Enhance DisMaxQParserPlugin to support full-Solr syntax and to support 
 alternate escaping strategies.
 -

 Key: SOLR-758
 URL: https://issues.apache.org/jira/browse/SOLR-758
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: 1.5

 Attachments: AdvancedQParserPlugin.java, AdvancedQParserPlugin.java, 
 DisMaxQParserPlugin.java, DisMaxQParserPlugin.java, UserQParser.java, 
 UserQParser.java, UserQParser.java-umlauts.patch


 The DisMaxQParserPlugin has a variety of nice features; chief among them is 
 that it uses the DisjunctionMaxQueryParser.  However, it imposes limitations 
 on the syntax.  
 I've enhanced the DisMax QParser plugin to use a pluggable query string 
 re-writer (via subclass extension) instead of hard-coding the logic currently 
 embedded within it (i.e. the escape nearly everything logic). Additionally, 
 I've made this QParser have a notion of a simple syntax (the default) or 
 non-simple in which case some of the logic in this QParser doesn't occur 
 because it's irrelevant (phrase boosting and min-should-max in particular). 
 As part of my work I significantly moved the code around to make it clearer 
 and more extensible.  I also chose to rename it to suggest its role as a 
 parser for user queries.
 Attachment to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



acts_as_solr integration with solr separately

2009-09-18 Thread abhay kumar
Hi,

I have set up a Solr search server in Tomcat.

I am able to fire queries (of any kind) and get results in XML format.

Now I want to integrate it (Solr) with Ruby on Rails.

I know Ruby on Rails has the acts_as_solr plugin, which helps in
integrating (talking) with Solr.

acts_as_solr comes bundled with a Solr web application on a Jetty server.

But I don't want to use this bundled Solr web application,

e.g. I don't want to do rake solr:start.

I am running Solr as a separate search server in Tomcat at port 8983 (URL
http://localhost:8983/solr/; all other URLs are listening).

Now I want to talk to this separate Solr server using the acts_as_solr
plugin.

Questions:
1) Can anybody point me to how to do this?
Any tutorial?
2) What changes would I have to make in the acts_as_solr plugin?

3) Any good pointers (URLs) will be appreciated...

Regards
Abhay


[jira] Commented: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.

2009-09-18 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757139#action_12757139
 ] 

David Smiley commented on SOLR-758:
---

Thanks for the update Simon.  I forget you can do things like \w within a regex 
character class -- [...]

 Enhance DisMaxQParserPlugin to support full-Solr syntax and to support 
 alternate escaping strategies.
 -

 Key: SOLR-758
 URL: https://issues.apache.org/jira/browse/SOLR-758
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: 1.5

 Attachments: AdvancedQParserPlugin.java, AdvancedQParserPlugin.java, 
 DisMaxQParserPlugin.java, DisMaxQParserPlugin.java, UserQParser.java, 
 UserQParser.java, UserQParser.java-umlauts.patch


 The DisMaxQParserPlugin has a variety of nice features; chief among them is 
 that it uses the DisjunctionMaxQueryParser.  However, it imposes limitations 
 on the syntax.  
 I've enhanced the DisMax QParser plugin to use a pluggable query string 
 re-writer (via subclass extension) instead of hard-coding the logic currently 
 embedded within it (i.e. the escape nearly everything logic). Additionally, 
 I've made this QParser have a notion of a simple syntax (the default) or 
 non-simple in which case some of the logic in this QParser doesn't occur 
 because it's irrelevant (phrase boosting and min-should-max in particular). 
 As part of my work I significantly moved the code around to make it clearer 
 and more extensible.  I also chose to rename it to suggest its role as a 
 parser for user queries.
 Attachment to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1445) Leading term in a multi-word synonym replaced with the token that follows it

2009-09-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1445:
--

Fix Version/s: 1.4

 Leading term in a multi-word synonym replaced with the token that follows it
 

 Key: SOLR-1445
 URL: https://issues.apache.org/jira/browse/SOLR-1445
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
 Environment: Solr 1.4 nightly (09/14/2009)
Reporter: Gregg Donovan
 Fix For: 1.4

 Attachments: TestMultiWordSynonmys.java


 I'm running into an odd issue with multi-word synonyms. Things generally seem 
 to work as expected, but I sometimes see words that are the leading term in a 
 multi-word synonym being replaced with the token that follows them in the 
 stream when they should just be ignored (i.e. there's no synonym match for 
 just that token). When I preview the analysis at admin/analysis.jsp it looks 
 fine, but at runtime I see problems like the one in the attached unit test.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: svn commit: r816202 - in /lucene/solr/trunk/src: java/org/apache/solr/schema/ java/org/apache/solr/search/ java/org/apache/solr/search/function/ test/org/apache/solr/search/function/

2009-09-18 Thread Yonik Seeley
2009/9/17 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 do we have some type info for the context param in
 SolrFilter#createWeight(Map context, Searcher searcher)

Nope... it's specifically opaque so we don't have to change it down
the road or force the creation of custom weight classes just to store
extra info, or force the creation of a fake/custom ValueSource just to
use a different key.

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1427) SearchComponents aren't listed on registry.jsp

2009-09-18 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757185#action_12757185
 ] 

Grant Ingersoll commented on SOLR-1427:
---

I'm guessing the problem is most likely in loading the SearchComponents, not in 
the SolrResourceLoader.  The reason being what Yonik said in that the core is 
not ready yet at that point.

Also, need to address the possible double loading in SolrResourceLoader.


 SearchComponents aren't listed on registry.jsp
 --

 Key: SOLR-1427
 URL: https://issues.apache.org/jira/browse/SOLR-1427
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1427.patch, SOLR-1427.patch


 SearchComponent implements SolrInfoMBean using getCategory() of OTHER but 
 they aren't listed on the registry.jsp display of loaded plugins.
 This may be a one-off glitch because of the way SearchComponents get loaded, 
 or it may indicate some other problem with the infoRegistry.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1427) SearchComponents aren't listed on registry.jsp

2009-09-18 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757195#action_12757195
 ] 

Grant Ingersoll commented on SOLR-1427:
---

Hoss, where in the SolrResourceLoader do you see other puts into the 
infoRegistry happening?

 SearchComponents aren't listed on registry.jsp
 --

 Key: SOLR-1427
 URL: https://issues.apache.org/jira/browse/SOLR-1427
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1427.patch, SOLR-1427.patch


 SearchComponent implements SolrInfoMBean using getCategory() of OTHER but 
 they aren't listed on the registry.jsp display of loaded plugins.
 This may be a one-off glitch because of the way SearchComponents get loaded, 
 or it may indicate some other problem with the infoRegistry.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



NPE

2009-09-18 Thread Grant Ingersoll

Anyone else seeing:
SEVERE: java.lang.NullPointerException
    at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)
    at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:619)
    at org.apache.solr.schema.TextField.write(TextField.java:45)
    at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
    at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
    at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
    at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
    at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
    at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
    at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
    at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)


When running the example and doing a simple query?


[jira] Updated: (SOLR-1427) SearchComponents aren't listed on registry.jsp

2009-09-18 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1427:
--

Attachment: SOLR-1427.patch

Patch that defers registering the components until later.  I can't reproduce 
the problem, so this is just an educated guess.

 SearchComponents aren't listed on registry.jsp
 --

 Key: SOLR-1427
 URL: https://issues.apache.org/jira/browse/SOLR-1427
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1427.patch, SOLR-1427.patch, SOLR-1427.patch


 SearchComponent implements SolrInfoMBean using getCategory() of OTHER but 
 they aren't listed on the registry.jsp display of loaded plugins.
 This may be a one-off glitch because of the way SearchComponents get loaded, 
 or it may indicate some other problem with the infoRegistry.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: NPE

2009-09-18 Thread Grant Ingersoll

Never mind.  Operator error.

On Sep 18, 2009, at 8:15 AM, Grant Ingersoll wrote:


Anyone else seeing:
SEVERE: java.lang.NullPointerException
   at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)
   at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:619)
   at org.apache.solr.schema.TextField.write(TextField.java:45)
   at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
   at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
   at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
   at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
   at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
   at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
   at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
   at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
   at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)


When running the example and doing a simple query?





Re: NPE

2009-09-18 Thread Yonik Seeley
Looks like one of the hazards of changing the schema w/o deleting the
index and re-indexing.
I bet this field was something like a numeric type that would return
null from Field.getStringValue() and then it was changed to a text
type.

-Yonik
http://www.lucidimagination.com



On Fri, Sep 18, 2009 at 11:15 AM, Grant Ingersoll gsing...@apache.org wrote:
 Anyone else seeing:
 SEVERE: java.lang.NullPointerException
        at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)
        at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:619)
        at org.apache.solr.schema.TextField.write(TextField.java:45)
        at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
        at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
        at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
        at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
        at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
        at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
        at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
        at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)

 When running the example and doing a simple query?



[jira] Commented: (SOLR-1294) SolrJS/Javascript client fails in IE8!

2009-09-18 Thread Alex Dergachev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757235#action_12757235
 ] 

Alex Dergachev commented on SOLR-1294:
--

Hi guys... we have worked extensively on integrating solrjs and drupal over the 
last few months, and have had to rewrite much of the code to fix bugs and allow 
extensibility. We're hoping to release our fork in the coming weeks, at this 
URL: http://drupal.org/project/solrjs

Because we're sticking closely to the original solrjs model--javascript that 
communicates directly with solr, we're hoping to eventually merge the two 
branches, and have brought up the possibility with Matthias Epheser. 

Solrjs is a killer app, and every solr user we talked to is incredibly excited 
about it. However, given that the current code base is very alpha, I don't 
think a few browser bugs with solrjs should hold up the release of solr 1.4.

Regards,
Alex Dergachev
Co-founder, Evolving Web
http://evolvingweb.ca




 SolrJS/Javascript client fails in IE8!
 --

 Key: SOLR-1294
 URL: https://issues.apache.org/jira/browse/SOLR-1294
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Eric Pugh
Assignee: Ryan McKinley
 Fix For: 1.4

 Attachments: SOLR-1294-IE8.patch, SOLR-1294.patch, 
 solrjs-ie8-html-syntax-error.patch


 SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors 
 under IE8.  I am continuing to test if this occurs in IE 6 and 7 as well.  
 This happens on both the Sample online site at 
 http://solrjs.solrstuff.org/test/reuters/ as well as the 
 /trunk/contrib/javascript library.   Seems to be a show stopper from the 
 standpoint of really using this library!
 I have posted a screenshot of the error at 
 http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png
 The error is just a whole bunch of repeated messages in the vein of:
 Message: 'jQuery.solrjs' is null or not an object
 Line: 24
 Char: 1
 Code: 0
 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js
 Message: 'jQuery.solrjs' is null or not an object
 Line: 37
 Char: 1
 Code: 0
 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js
 Message: 'jQuery.solrjs' is null or not an object
 Line: 24
 Char: 1
 Code: 0
 URI: 
 file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js
 Message: 'jQuery.solrjs' is null or not an object
 Line: 27
 Char: 1
 Code: 0
 URI: 
 file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: NPE

2009-09-18 Thread Andrzej Bialecki

Grant Ingersoll wrote:

Anyone else seeing:
SEVERE: java.lang.NullPointerException
at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)


I saw that symptom when the schema seriously didn't match the index (e.g. 
the schema didn't specify a field type and then XMLWriter assumes Text, or 
the schema specified a stored field, whereas the index had the same field 
unstored).



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757267#action_12757267
 ] 

Yonik Seeley commented on SOLR-908:
---

Jason, at a quick look, I see that this filter maintains state, but doesn't 
implement reset() - could that be the issue?

 Port of Nutch  CommonGrams filter to Solr
 -

 Key: SOLR-908
 URL: https://issues.apache.org/jira/browse/SOLR-908
 Project: Solr
  Issue Type: Wish
  Components: Analysis
Reporter: Tom Burton-West
Priority: Minor
 Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch, 
 SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch


 Phrase queries containing common words are extremely slow.  We are reluctant 
 to just use stop words due to various problems with false hits and some 
 things becoming impossible to search with stop words turned on. (For example 
 to be or not to be, the who, man in the moon vs man on the moon etc.) 
  
 Several postings regarding slow phrase queries have suggested using the 
 approach used by Nutch.  Perhaps someone with more Java/Solr experience might 
 take this on.
 It should be possible to port the Nutch CommonGrams code to Solr  and create 
 a suitable Solr FilterFactory so that it could be used in Solr by listing it 
 in the Solr schema.xml.
 Construct n-grams for frequently occurring terms and phrases while indexing. 
 Optimize phrase queries to use the n-grams. Single terms are still indexed 
 too, with n-grams overlaid.
 http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
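
The indexing idea described above can be sketched roughly as follows. This is a simplified illustration and not the Nutch or Solr code: the class name, the "_" separator, and the rule that a bigram is emitted whenever either neighbouring term is a common word are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified illustration of the common-grams idea; hypothetical, not the Nutch or Solr code. */
class CommonGramsSketch {

    /**
     * Emits every single term, plus a bigram for each adjacent pair in which
     * at least one term is a common word. The "_" separator is an assumption.
     */
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // single terms are still indexed
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next); // n-gram overlaid on the single term
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "in", "on"));
        // prints [man, man_in, in, in_the, the, the_moon, moon]
        System.out.println(commonGrams(Arrays.asList("man", "in", "the", "moon"), common));
    }
}
```

A phrase query for "man in the moon" could then match on the much rarer terms man_in, in_the and the_moon instead of intersecting the long posting lists of "in" and "the".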




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757271#action_12757271
 ] 

Robert Muir commented on SOLR-908:
--

Just my opinion: I do not think this problem is due to mixed tokenizer APIs 
(LUCENE-1919).

That is because this BufferedTokenStream does not mix the APIs that cause that 
issue... it only uses TokenStream.next().

I think instead Yonik might be on the right track, though I could be wrong.





[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757344#action_12757344
 ] 

Uwe Schindler commented on SOLR-908:


In my opinion, the problem is BufferedTokenStream (shouldn't it be named 
BufferedTokenFilter?). It has the linked list but does not implement reset(). 
So the problem is not this issue but rather the use of reset(), because you reuse 
the token stream. As long as BufferedTokenStream is not fixed to support reset(), 
you have to create new instances.




[jira] Created: (SOLR-1446) BufferedTokenStream keeps state, but does not implement reset

2009-09-18 Thread Robert Muir (JIRA)
BufferedTokenStream keeps state, but does not implement reset
-

 Key: SOLR-1446
 URL: https://issues.apache.org/jira/browse/SOLR-1446
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Reporter: Robert Muir
Priority: Minor
 Attachments: SOLR-1446.patch

BufferedTokenStream needs a reset() implementation that clears its internal lists.
Otherwise, there could be problems when using reusable token streams.
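
To illustrate the kind of problem meant here, the following is a minimal sketch using a hypothetical stand-in class (it is not Solr's BufferedTokenStream and does not use the Lucene API). It assumes a stream that buffers one extra token per input token, and shows that reusing it after a partial read without clearing the buffer leaks a stale token into the next run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical stand-in for a buffering token stream; not Solr's BufferedTokenStream. */
class BufferingStreamSketch {
    private Iterator<String> input;
    private final LinkedList<String> outQueue = new LinkedList<>(); // internal state

    BufferingStreamSketch(List<String> tokens) {
        this.input = tokens.iterator();
    }

    /** Returns the next token, or null at end of stream. Each input token yields two outputs. */
    String next() {
        if (outQueue.isEmpty() && input.hasNext()) {
            String t = input.next();
            outQueue.add(t);
            outQueue.add(t + "'"); // extra token that sits in the buffer
        }
        return outQueue.poll();
    }

    /** Buggy reuse path: switches to new input but keeps whatever is still buffered. */
    void reuseWithoutReset(List<String> tokens) {
        this.input = tokens.iterator();
    }

    /** Correct reuse path: clears buffered state before switching to new input. */
    void reset(List<String> tokens) {
        outQueue.clear();
        this.input = tokens.iterator();
    }

    /** Reads the stream to the end and returns everything it produced. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        for (String t = next(); t != null; t = next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        BufferingStreamSketch stale = new BufferingStreamSketch(Arrays.asList("a", "b"));
        stale.next(); // consumer stops early, e.g. after an exception
        stale.reuseWithoutReset(Arrays.asList("x"));
        System.out.println(stale.drain()); // [a', x, x'] : one excess token

        BufferingStreamSketch clean = new BufferingStreamSketch(Arrays.asList("a", "b"));
        clean.next();
        clean.reset(Arrays.asList("x"));
        System.out.println(clean.drain()); // [x, x']
    }
}
```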





[jira] Updated: (SOLR-1446) BufferedTokenStream keeps state, but does not implement reset

2009-09-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1446:
--

Attachment: SOLR-1446.patch




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757364#action_12757364
 ] 

Robert Muir commented on SOLR-908:
--

Uwe, I opened an issue for this: SOLR-1446

I think that even if it is not the cause of this problem, BufferedTokenStream 
should implement reset() since it keeps internal state.




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757381#action_12757381
 ] 

Robert Muir commented on SOLR-908:
--

Similar to the BufferedTokenStream reset, the CommonGramsQueryFilter here has 
its own internal state:
{code}
private Token prev;
{code}

So this filter too should implement reset() (and must call super.reset() so the 
BufferedTokenStream lists get reset too).




[jira] Created: (SOLR-1447) Simple property injection

2009-09-18 Thread Jason Rutherglen (JIRA)
Simple property injection 
--

 Key: SOLR-1447
 URL: https://issues.apache.org/jira/browse/SOLR-1447
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5


MergePolicy and MergeScheduler require property injection.  We'll allow these 
and probably other cases in this patch using Java reflection.
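
As a rough illustration of setter-based property injection via reflection, here is a minimal sketch. It is not the SOLR-1447 patch: the class names, the inject and convert helpers, and the DemoPolicy target with its two properties are hypothetical and exist only for the example.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of setter injection via reflection; not the SOLR-1447 patch. */
class SetterInjectorSketch {

    /** Hypothetical target; stands in for a configurable class such as a merge policy. */
    static class DemoPolicy {
        private boolean calibrateSizeByDeletes;
        private int mergeFactor = 10;

        public void setCalibrateSizeByDeletes(boolean v) { calibrateSizeByDeletes = v; }
        public boolean isCalibrateSizeByDeletes() { return calibrateSizeByDeletes; }
        public void setMergeFactor(int v) { mergeFactor = v; }
        public int getMergeFactor() { return mergeFactor; }
    }

    /** Calls setXxx(value) on the target for every "xxx" -> value entry. */
    static void inject(Object target, Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            String name = e.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            boolean applied = false;
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterTypes().length == 1) {
                    try {
                        m.invoke(target, convert(e.getValue(), m.getParameterTypes()[0]));
                    } catch (ReflectiveOperationException ex) {
                        throw new IllegalStateException("Could not set property: " + name, ex);
                    }
                    applied = true;
                    break;
                }
            }
            if (!applied) {
                throw new IllegalArgumentException("No setter for property: " + name);
            }
        }
    }

    /** Converts the raw config string to the setter's parameter type (a few types only). */
    static Object convert(String raw, Class<?> type) {
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(raw);
        if (type == int.class || type == Integer.class) return Integer.valueOf(raw);
        if (type == double.class || type == Double.class) return Double.valueOf(raw);
        if (type == String.class) return raw;
        throw new IllegalArgumentException("Unsupported property type: " + type);
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("calibrateSizeByDeletes", "true");
        props.put("mergeFactor", "20");
        DemoPolicy policy = new DemoPolicy();
        inject(policy, props);
        System.out.println(policy.isCalibrateSizeByDeletes() + " " + policy.getMergeFactor());
    }
}
```

With something along these lines, a new setter on a config class would become configurable without writing class-specific parsing code, at the cost of failing at runtime rather than compile time when a property name is misspelled.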




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757413#action_12757413
 ] 

Jason Rutherglen commented on SOLR-908:
---

Interesting, the whole reusableTokenStream model is new to me,
so it wasn't in my mental view of how Lucene analyzers work. It
seems if BTS is caching tokens, then being reused, and isn't
reset, then there would be excess tokens instead of deletions?
Or perhaps the reset is being called from another analyzer? It's
quite confusing. I started work on a LoggingTokenizer that could
be inserted between tokenizers in the Solr schema, but have since
been trying to reproduce the issue (which hasn't worked
either). 

Uwe, Yonik, and Robert, thanks for taking a look! 




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757432#action_12757432
 ] 

Robert Muir commented on SOLR-908:
--

{quote}
It seems if BTS is caching tokens, then being reused, and isn't
reset, then there would be excess tokens instead of deletions?
{quote}

Right, that's what the test case I added for BufferedTokenStream showed. 
This would be more of a corner case, as I think most BufferedTokenStreams would 
have empty lists anyway by the time they are reset(), so it's likely not causing 
your problem (though it should be fixed!).

Your problem, again, is probably the internal state kept in 
CommonGramsQueryFilter. As you can see, CommonGramsQueryFilter has hairy logic 
involving the buffered token 'prev', and a lot of this logic has to do with what 
happens at end of stream.

Unfortunately there is no reset() for CommonGramsQueryFilter to set 'prev' back 
to its initial state, so when something like QueryParser tries to reuse it, it 
is probably not behaving correctly. 




[jira] Commented: (SOLR-1444) Add option in solrconfig.xml to override the LogMergePolicy calibrateSizeByDeletes

2009-09-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757431#action_12757431
 ] 

Jason Rutherglen commented on SOLR-1444:


I think this is barking up the wrong tree; I think we'll want to support any 
setter methods a class has to offer.  I opened SOLR-1447 to address this. 
Otherwise we're writing custom code for each config class?

 Add option in solrconfig.xml to override the LogMergePolicy 
 calibrateSizeByDeletes
 

 Key: SOLR-1444
 URL: https://issues.apache.org/jira/browse/SOLR-1444
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
 Environment: NA
Reporter: Jibo John
Priority: Minor

 A patch was committed in Lucene  
 (http://issues.apache.org/jira/browse/LUCENE-1634) that would consider the 
 number of deleted documents as a criterion when deciding which segments to 
 merge.
 By default, calibrateSizeByDeletes = false in LogMergePolicy. So, currently, 
 there is no way in Solr to set calibrateSizeByDeletes = true.




[jira] Resolved: (SOLR-1446) BufferedTokenStream keeps state, but does not implement reset

2009-09-18 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1446.


   Resolution: Fixed
Fix Version/s: 1.4

I had missed that one... Thanks!




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757445#action_12757445
 ] 

Yonik Seeley commented on SOLR-908:
---

I guess if something causes an exception during analysis, things like 
BufferedTokenStream can be left with unwanted state.
Note that BufferedTokenStream didn't inherit from TokenFilter and thus wouldn't 
automatically chain the reset() to its input... so any upstream filters 
wouldn't be reset().  I just fixed that.




[jira] Updated: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-908:
--

Attachment: SOLR-908.patch

Added reset overrides to CommonGramsFilter and CommonGramsQueryFilter.  




[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757472#action_12757472
 ] 

Robert Muir commented on SOLR-908:
--

Jason, I took a glance. I think the reset() for CommonGramsQueryFilter should 
not set prev = null, because the initial state is not null: in the ctor, 
prev = new Token(). With the current logic, this is what reset() must do also.

Also, FYI, CommonGramsFilter does not need a reset() since the StringBuffer isn't 
used to keep state.

I think the best way to ensure it's correct is to add tests that consume the 
stream and then reuse/reset() it.
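
A consume-then-reuse test of the kind suggested here could look roughly like the sketch below. It uses hypothetical stand-in classes rather than the real CommonGramsQueryFilter or the Lucene API: BufferedBaseSketch plays the role of the buffering parent, and PairingFilterSketch keeps a 'prev' field whose initial state is a non-null empty value, mirroring the prev = new Token() point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical parent that owns buffered state; stands in for a buffering base class. */
class BufferedBaseSketch {
    protected final LinkedList<String> buffer = new LinkedList<>();

    void reset() {
        buffer.clear();
    }
}

/** Hypothetical filter that joins each token with the previous one; not CommonGramsQueryFilter. */
class PairingFilterSketch extends BufferedBaseSketch {
    private String prev = ""; // initial state is non-null and empty

    /** Emits prev_current for every token that has a predecessor. */
    List<String> process(List<String> tokens) {
        for (String t : tokens) {
            if (!prev.isEmpty()) { // would throw NullPointerException if reset() set prev = null
                buffer.add(prev + "_" + t);
            }
            prev = t;
        }
        List<String> out = new ArrayList<>(buffer);
        buffer.clear();
        return out;
    }

    @Override
    void reset() {
        super.reset(); // parent state must be cleared too
        prev = "";     // restore the same initial state the constructor sets up
    }

    public static void main(String[] args) {
        PairingFilterSketch reused = new PairingFilterSketch();
        System.out.println(reused.process(Arrays.asList("a", "b"))); // [a_b]
        reused.reset();
        System.out.println(reused.process(Arrays.asList("c", "d"))); // [c_d]

        PairingFilterSketch stale = new PairingFilterSketch();
        stale.process(Arrays.asList("a", "b"));
        System.out.println(stale.process(Arrays.asList("c", "d"))); // [b_c, c_d] : stale 'prev'
    }
}
```

The useful assertion in such a test is that a reset instance produces exactly the same output as a freshly constructed one for the same input.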





[jira] Updated: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-09-18 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-908:
--

Attachment: SOLR-908.patch

Robert, thanks.  I added the new token in CGQF.reset and added reset test cases.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757592#action_12757592
 ] 

Jason Rutherglen commented on SOLR-1316:


The DAWG seems like a potential fit as a replacement for the
Lucene term dictionary. It would provide the extra benefit of
faster prefix etc lookups. I believe it could be stored on disk
by writing file pointers to the locations of the letters. I
found the Stanford lecture on them interesting, though the
papers seem to overcomplicate them. I could not find an existing
Java implementation. 

As a generic library I think it could be useful for a variety of
Lucene based use cases (i.e. storing terms in a compact form
that allows fast lookups, prefix and otherwise). 

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher
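
For readers unfamiliar with the data structure, prefix lookup in a ternary search tree can be sketched as below. This is a minimal illustration written for this thread, not the TernaryTree class from Lucene contrib; the class and method names are hypothetical, and there is no ranking or click-through boosting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal ternary search tree; an illustration, not the Lucene contrib TernaryTree. */
class TernarySearchTreeSketch {

    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // smaller char, next char of the word, larger char
        boolean isWord;    // true if a stored word ends at this node

        Node(char ch) {
            this.ch = ch;
        }
    }

    private Node root;

    void insert(String word) {
        if (!word.isEmpty()) {
            root = insert(root, word, 0);
        }
    }

    private Node insert(Node node, String word, int i) {
        char c = word.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = insert(node.lo, word, i);
        else if (c > node.ch) node.hi = insert(node.hi, word, i);
        else if (i + 1 < word.length()) node.eq = insert(node.eq, word, i + 1);
        else node.isWord = true;
        return node;
    }

    /** Returns all stored words starting with the prefix, in sorted order. */
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) return out;
        Node node = root;
        int i = 0;
        while (node != null) { // walk down to the node matching the last prefix char
            char c = prefix.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i + 1 < prefix.length()) { node = node.eq; i++; }
            else break;
        }
        if (node == null) return out;
        if (node.isWord) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    /** In-order traversal of a subtree, appending completed words to out. */
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node == null) return;
        collect(node.lo, path, out);
        path.append(node.ch);
        if (node.isWord) out.add(path.toString());
        collect(node.eq, path, out);
        path.setLength(path.length() - 1);
        collect(node.hi, path, out);
    }

    public static void main(String[] args) {
        TernarySearchTreeSketch tree = new TernarySearchTreeSketch();
        for (String w : Arrays.asList("solr", "solar", "search", "sort")) {
            tree.insert(w);
        }
        System.out.println(tree.suggest("so")); // [solar, solr, sort]
    }
}
```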




[jira] Commented: (SOLR-1447) Simple property injection

2009-09-18 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757627#action_12757627
 ] 

Noble Paul commented on SOLR-1447:
--

+1 . 



 Simple property injection 
 --

 Key: SOLR-1447
 URL: https://issues.apache.org/jira/browse/SOLR-1447
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

   Original Estimate: 48h
  Remaining Estimate: 48h

 MergePolicy and MergeScheduler require property injection.  We'll allow these 
 and probably other cases in this patch using Java reflection.
