Solr nightly build failure
init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build

compile-common:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/common
    [javac] Compiling 31 source files to /tmp/apache-solr-nightly/build/common
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/core
    [javac] Compiling 274 source files to /tmp/apache-solr-nightly/build/core
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
    [javac] Compiling 22 source files to /tmp/apache-solr-nightly/build/client/solrj
    [javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
    [javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj

compileTests:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
    [javac] Compiling 78 source files to /tmp/apache-solr-nightly/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
junit:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
    [junit] Running org.apache.solr.BasicFunctionalityTest
    [junit] Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 23.125 sec
    [junit] Running org.apache.solr.ConvertedLegacyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.71 sec
    [junit] Running org.apache.solr.DisMaxRequestHandlerTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 5.308 sec
    [junit] Running org.apache.solr.EchoParamsTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.063 sec
    [junit] Running org.apache.solr.OutputWriterTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.18 sec
    [junit] Running org.apache.solr.SampleTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.824 sec
    [junit] Running org.apache.solr.analysis.HTMLStripReaderTest
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.653 sec
    [junit] Running org.apache.solr.analysis.TestBufferedTokenStream
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.631 sec
    [junit] Running org.apache.solr.analysis.TestCapitalizationFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.421 sec
    [junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.415 sec
    [junit] Running org.apache.solr.analysis.TestKeepWordFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.504 sec
    [junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.242 sec
    [junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.435 sec
    [junit] Running org.apache.solr.analysis.TestPhoneticFilter
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.776 sec
    [junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.867 sec
    [junit] Running org.apache.solr.analysis.TestSynonymFilter
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.276 sec
    [junit] Running org.apache.solr.analysis.TestTrimFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.736 sec
    [junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 7.932 sec
    [junit] Running org.apache.solr.common.SolrDocumentTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.07 sec
    [junit] Running org.apache.solr.common.params.SolrParamTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
    [junit] Running org.apache.solr.common.util.ContentStreamTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.221 sec
    [junit] Running org.apache.solr.common.util.IteratorChainTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.051 sec
    [junit] Running org.apache.solr.common.util.TestXMLEscaping
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.059 sec
    [junit] Running
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12566951#action_12566951 ]

Thomas Peuss commented on SOLR-127:
-----------------------------------

Think of two scenarios:

* An AJAX-ified browser client sending requests to Solr. Caching of unchanged data in the client and in corporate caching proxies speeds things up.
* A cluster of Solr servers behind a load balancer with caching functionality. The middleware sends requests to Solr through the load balancer. Repeated requests for unchanged data are answered directly from the LB cache without putting load on the Solr servers. This is, for example, our scenario.

Our code works fine with BlueCoat WebCache, the Apache HTTPD proxy cache, the Squid proxy cache, and many other solutions _because_ we are following the standards here. So I don't really get the point of your comment. Besides that, you can completely disable this HTTP header stuff in solrconfig.xml if you don't want it.

Make Solr more friendly to external HTTP caches
-----------------------------------------------

Key: SOLR-127
URL: https://issues.apache.org/jira/browse/SOLR-127
Project: Solr
Issue Type: Wish
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 1.3
Attachments: CacheUnitTest.patch, CacheUnitTest.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch

an offhand comment I saw recently reminded me of something that really bugged me about the search solution i used *before* Solr -- it didn't play nicely with HTTP caches that might be sitting in front of it.
at the moment, Solr doesn't put particularly useful info in the HTTP response headers to aid in caching (ie: Last-Modified), responds to all HEAD requests with a 400, and doesn't do anything special with If-Modified-Since.

At the very least, we can set a Last-Modified based on when the current IndexReader was opened (if not the Date on the IndexReader) and use the same info to determine how to respond to If-Modified-Since requests.

(for the record, i think the reason this hasn't occurred to me in the 2+ years i've been using Solr, is because with the internal caching, i've yet to need to put a proxy cache in front of Solr)

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
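A minimal sketch of the conditional-GET logic being proposed (the class and method names are hypothetical, not Solr's actual code): compare the If-Modified-Since value against the time the current IndexReader was opened, truncating to whole seconds since HTTP dates have one-second resolution.

```java
// Hypothetical helper illustrating the proposed If-Modified-Since check;
// not actual Solr code. Decide whether a request can be answered with
// 304 Not Modified based on when the current IndexReader was opened.
public class HttpCacheSketch {
    /**
     * HTTP dates have one-second resolution, so truncate the reader's
     * open time to seconds before comparing against If-Modified-Since.
     */
    public static boolean notModified(long readerOpenedMillis, long ifModifiedSinceMillis) {
        long lastModifiedSec = readerOpenedMillis / 1000;
        long sinceSec = ifModifiedSinceMillis / 1000;
        return lastModifiedSec <= sinceSec;
    }

    public static void main(String[] args) {
        // Reader opened before the client's cached copy -> can send 304
        System.out.println(notModified(5000L, 7000L)); // true
        // Index reopened after the client's cached copy -> full 200 response
        System.out.println(notModified(9000L, 7000L)); // false
    }
}
```

The same timestamp would be sent as the Last-Modified header on full responses, so clients and proxies have something to revalidate against.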
concurrency while indexing
Hi all,

I have the following use case: one Solr instance which constantly receives add/commit calls from 3 different clients.

The machine:
Model: HP ProLiant DL360
Memory: 2 GB
CPU: 1 Intel Xeon 3.02 GHz
Disk: 2 x 36 GB SCSI in RAID

I need to raise the number of clients to about 10. Can this be a problem for the indexing machine?

salu2
--
Thorsten Scherler
thorsten.at.apache.org
Open Source Java consulting, training and solutions
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567067#action_12567067 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

In my configuration I do not need SOLR caching at all, but I use HTTP caching more effectively. The HTTPD memory and disk cache is used between the client and the middleware. There is no caching between the middleware and SOLR. The middleware responds to HTTPD with 304 if necessary, with a correct Last-Modified etc., and the request does not reach SOLR. This caching configuration works fine with AJAX too, without SOLR's caching headers. I've seen unnecessary extra work with this implementation... taking a long time... and tried to point out some of the meanings of the response codes (for the Web).
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567072#action_12567072 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

Regarding an HTTP-caching load balancer between SOLR and the middleware: you need to deal with an additional internal HTTP cache at the middleware. In most cases the middleware generates content from different sources and can't reroute an If-Modified-Since request to SOLR without internal caching. For instance, if you are using SOLRJ, you have to implement an *additional* cache for SolrDocument...
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567111#action_12567111 ]

Yonik Seeley commented on SOLR-342:
-----------------------------------

Yikes! Thanks for the report Will. It certainly sounds like a Lucene issue to me (esp because removal of this patch fixes things... that means it only happens under certain Lucene settings). Could you perhaps try the very latest Lucene trunk (there were some seemingly unrelated fixes recently)?

Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
-----------------------------------------------------------------------------------------------

Key: SOLR-342
URL: https://issues.apache.org/jira/browse/SOLR-342
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz

LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need a trunk version of Lucene (or wait for the next official release of Lucene). A side effect of this is that Lucene's new, faster StandardTokenizer will also be incorporated. We also need to think about how we want to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567077#action_12567077 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

Thomas, Walter,

Finally I agree, thanks! The middleware should not send/reroute If-Modified-Since, and should not implement an internal cache (as in the counter-example I provided): with caching enabled, it will simply retrieve the cached content. I do not agree with 400; it is an opening for DoS attacks. A query parsing error should be a 200 with caching response codes. Of course, I know RFC 2616.
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567068#action_12567068 ]

Walter Underwood commented on SOLR-127:
---------------------------------------

Two reasons to do HTTP caching for Solr: First, Solr is HTTP and needs to implement that correctly. Second, caches are much harder to implement and test than the cache information in HTTP. HTTP caches already exist and are well tested, so the implementation cost is zero and deployment is very easy.

The HTTP spec already covers which responses should be cached. A 400 response may only be cached if it includes explicit cache control headers which allow that. See RFC 2616.

We are using a caching load balancer and caching in Apache front ends to Tomcat. We see an increase of more than 2X in the capacity of our search farm.

I would recommend against Solr-specific cache information in the XML part of the responses. Distributed caching is extremely difficult to get right. Around 25% of the HTTP 1.1 spec is devoted to caching and there are still grey areas.
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567064#action_12567064 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

I agree. A caching load balancer between SOLR and the app servers is an excellent idea, and it can be a black box without any knowledge of the SOLR API. AJAX can use the web browser's internal cache; FLEX probably too...

Question: do we need caching of static (unchanged) content from SOLR, such as a 400: Query parsing error?..
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567081#action_12567081 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

Fortunately, we are not using 404 when trying to retrieve a removed document... In the initial design (I believe) the SOLR developers simply wrapped all exceptions into a 400, and an empty result set is not an exception.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567099#action_12567099 ]

Will Johnson commented on SOLR-342:
-----------------------------------

i think we're running into a very serious issue with trunk + this patch. either the document summaries are not matched or the overall matching is 'wrong'. i did find this in the lucene jira:

LUCENE-994: Note that these changes will break users of ParallelReader because the parallel indices will no longer have matching docIDs. Such users need to switch IndexWriter back to flushing by doc count, and switch the MergePolicy back to LogDocMergePolicy. It's likely also necessary to switch the MergeScheduler back to SerialMergeScheduler to ensure deterministic docID assignment.

we're seeing rather consistent bad results, but only after 20-30k documents and multiple commits, and wondering if anyone else is seeing anything. i've verified that the results are bad even through Luke, which would seem to remove the search side of the solr equation. the basic test case is to search for title:foo and get back documents that only have title:bar. we're going to start on a unit test, but given the document counts and the corpus we're testing against it may be a while, so i thought i'd ask to see if anyone had any hints. removing this patch seems to remove the issue, so it doesn't appear to be a lucene problem.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567152#action_12567152 ]

Yonik Seeley commented on SOLR-342:
-----------------------------------

Thanks Will. My guess at this point is a merging bug in Lucene, so you might be able to reproduce it by forcing more merges. Make mergeFactor=2 and lower how many docs it takes to do a merge (set maxBufferedDocs to 2, or set ramBufferSizeMB to 1).
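In solrconfig.xml terms, the merge-forcing settings suggested above would go in the indexing section; a sketch (element names as used in this era's solrconfig, values chosen only to stress merging, not for production):

```xml
<indexDefaults>
  <!-- Deliberately tiny values so merges happen constantly, to help
       reproduce merge-related bugs quickly. Not production settings. -->
  <mergeFactor>2</mergeFactor>
  <maxBufferedDocs>2</maxBufferedDocs>
  <ramBufferSizeMB>1</ramBufferSizeMB>
</indexDefaults>
```

With mergeFactor=2, every other flush triggers a cascade of merges, which maximizes the chance of hitting a merge-time docID bug.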
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567198#action_12567198 ]

Will Johnson commented on SOLR-342:
-----------------------------------

we have:

<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>64</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>

and i'm working on a unit test, but just adding a few terms per doc doesn't seem to trigger it, at least not 'quickly.'
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567184#action_12567184 ]

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Also, is field collapse going to be a part of the upcoming Solr 1.3 release, or will we need to run a patch on it?

Field collapsing
----------------

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch

This patch includes a new feature called field collapsing, used to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr 1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
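For illustration, a request exercising the three parameters described in the patch might look like this (host, port, and the `site` field name are made up for the example):

```
http://localhost:8983/solr/select?q=foo&collapse.field=site&collapse.type=adjacent&collapse.max=1
```

Here adjacent duplicates on `site` beyond the first result would be collapsed into a single entry.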
[jira] Updated: (SOLR-475) multi-valued faceting via un-inverted field
[ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-475:
------------------------------

    Attachment: UnInvertedField.java

Prototype attached. This is completely untested code, and is still missing the Solr interface + caching. The approach is described in the comments (cut-n-pasted here). Any thoughts or comments on the approach? I may not have time to immediately work on this (fix the bugs, add tests, hook up to Solr, add caching of the un-inverted field, etc.), so additional contributions in this direction are welcome!

{code}
/**
 * Final form of the un-inverted field:
 * Each document points to a list of term numbers that are contained in that document.
 *
 * Term numbers are in sorted order, and are encoded as variable-length deltas from the
 * previous term number.  Real term numbers start at 2 since 0 and 1 are reserved.  A
 * term number of 0 signals the end of the termNumber list.
 *
 * There is a single int[maxDoc()] which either contains a pointer into a byte[] for
 * the termNumber lists, or directly contains the termNumber list if it fits in the 4
 * bytes of an integer.  If the first byte in the integer is 1, the next 3 bytes
 * are a pointer into a byte[] where the termNumber list starts.
 *
 * There are actually 256 byte arrays, to compensate for the fact that the pointers
 * into the byte arrays are only 3 bytes long.  The correct byte array for a document
 * is a function of its id.
 *
 * To save space and speed up faceting, any term that matches enough documents will
 * not be un-inverted... it will be skipped while building the un-inverted field structure,
 * and will use a set intersection method during faceting.
 *
 * To further save memory, the terms (the actual string values) are not all stored in
 * memory, but a TermIndex is used to convert term numbers to term values only
 * for the terms needed after faceting has completed.  Only every 128th term value
 * is stored, along with its corresponding term number, and this is used as an
 * index to find the closest term and iterate until the desired number is hit (very
 * much like Lucene's own internal term index).
 */
{code}

> multi-valued faceting via un-inverted field
> -------------------------------------------
>
>                 Key: SOLR-475
>                 URL: https://issues.apache.org/jira/browse/SOLR-475
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: UnInvertedField.java
>
> Facet multi-valued fields via a counting method (like the FieldCache method) on an un-inverted representation of the field. For each doc, look at its terms and increment a count for that term.

-- 
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
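The variable-length delta encoding described in the comment above (sorted term numbers starting at 2, deltas in Lucene-style vInt form, a delta of 0 terminating the list) can be sketched as follows. This is an illustration of the encoding scheme only, not code from the attached prototype:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class VIntDelta {
    // Encode a sorted list of term numbers (all >= 2) as variable-length
    // deltas from the previous term number, terminated by a 0 byte.
    static byte[] encode(int[] termNums) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int t : termNums) {
            int delta = t - prev;   // always >= 1 for a sorted, distinct list
            prev = t;
            // vInt: low 7 bits first, high bit set means "more bytes follow"
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        out.write(0);               // a term number delta of 0 ends the list
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] buf) {
        List<Integer> terms = new ArrayList<>();
        int pos = 0, prev = 0;
        while (true) {
            int delta = 0, shift = 0, b;
            do {
                b = buf[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (delta == 0) break;  // terminator
            prev += delta;
            terms.add(prev);
        }
        return terms;
    }

    public static void main(String[] args) {
        int[] terms = {2, 5, 300, 100000};
        System.out.println(decode(encode(terms))); // [2, 5, 300, 100000]
    }
}
```

Because deltas between sorted, distinct term numbers are always at least 1, the 0 byte is unambiguous as a terminator, which is what lets a short list pack directly into the 4 bytes of the int[maxDoc()] slot.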
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567207#action_12567207 ]

Grant Ingersoll commented on SOLR-342:
--------------------------------------

You mentioned ParallelReader; are you using that, or any other patches?

{quote}
problem to happen before we get 20-30k large docs
{quote}

What is "large" in your terms?

> Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-342
>                 URL: https://issues.apache.org/jira/browse/SOLR-342
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz
>
> LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need the trunk version of Lucene (or wait for the next official release of Lucene). A side effect of this is that Lucene's new, faster StandardTokenizer will also be incorporated. We also need to think about how we want to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567140#action_12567140 ]

Yonik Seeley commented on SOLR-342:
-----------------------------------

Will, are you using term vectors anywhere, or any customizations to Solr (at the Lucene level)? When you say document summaries are not matched, do you mean that the incorrect documents are matched, or that the correct documents are matched but just the highlighting is wrong?
RE: /example/solr/bin is empty in trunk
> Try "ant example" in the base dir to build the example.

Thanks, it works.
[jira] Created: (SOLR-475) multi-valued faceting via un-inverted field
multi-valued faceting via un-inverted field
-------------------------------------------

                 Key: SOLR-475
                 URL: https://issues.apache.org/jira/browse/SOLR-475
             Project: Solr
          Issue Type: New Feature
            Reporter: Yonik Seeley

Facet multi-valued fields via a counting method (like the FieldCache method) on an un-inverted representation of the field. For each doc, look at its terms and increment a count for that term.

-- 
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
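The counting method proposed above reduces to a simple loop once the field has been un-inverted into a doc-to-terms mapping. A minimal sketch (the `docTerms` list stands in for the un-inverted field; names are illustrative, not from the prototype):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountingFacet {
    // For each doc matching the query, walk its terms and increment a
    // counter per term — the multi-valued analogue of FieldCache faceting.
    static Map<String, Integer> facet(List<List<String>> docTerms, BitSet matches) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            for (String term : docTerms.get(doc)) {
                counts.merge(term, 1, Integer::sum); // one increment per (doc, term)
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docTerms = List.of(
            List.of("red", "blue"),  // doc 0
            List.of("blue"),         // doc 1
            List.of("green"));       // doc 2
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(1);              // docs 0 and 1 match the query
        System.out.println(facet(docTerms, matches).get("blue")); // 2
    }
}
```

The cost is proportional to the total number of (doc, term) pairs in the match set, which is why the prototype special-cases very common terms with set intersections instead.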
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567235#action_12567235 ]

Will Johnson commented on SOLR-342:
-----------------------------------

We're using SolrCore in terms of:

{code}
SolrCore core = new SolrCore(foo, dataDir, solrConfig, solrSchema);
UpdateHandler updateHandler = core.getUpdateHandler();
updateHandler.addDoc(command);
{code}

which is a bit more low-level than normal. However, when we flipped back to Solr trunk + Lucene 2.3 everything was fine, so it leads me to believe that we are OK in that respect. I was going to try to reproduce with Lucene directly as well, but that too is a bit outside the scope of what I have time for at the moment. And we're not getting any exceptions, just bad search results.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567218#action_12567218 ]

Will Johnson commented on SOLR-342:
-----------------------------------

We're not using ParallelReader, but we are using direct core access instead of going over HTTP. As for doc size, we're indexing Wikipedia but creating a number of extra fields. They are only "large" in comparison to the 'large volume' tests I've seen in most of the Solr and Lucene tests.

- Will
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567224#action_12567224 ]

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

OK, I think I have the first issue figured out. If the current result set (let's say the first 10 rows) doesn't have the field that we are collapsing on, the counts don't show up. Is that correct?
Re: concurrency while indexing
On Feb 8, 2008 3:53 AM, Thorsten Scherler [EMAIL PROTECTED] wrote:
> I have the following use case: one Solr instance which receives add/commit calls constantly from 3 different clients.
> The machine:
> Model: HP ProLiant DL 360
> Memory: 2 GB
> CPU: 1 Intel Xeon 3.02 GHz
> Disk: 2 x 36 GB SCSI in RAID
> I need to raise the number of clients to about 10; can this be a problem for the indexing machine?

I'd stop the clients from doing commits themselves unless it's really necessary, and use some form of time-based autocommit (see the example solrconfig.xml).

-Yonik
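[For reference, the time-based autocommit Yonik mentions is configured in the update handler section of solrconfig.xml; the thresholds below are illustrative values, not recommendations:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit after this many pending documents... -->
    <maxDocs>10000</maxDocs>
    <!-- ...or after this many milliseconds, whichever comes first -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With this in place, clients can send adds freely and let the server batch commits, instead of each of the 10 clients issuing its own commit calls.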
Re: /example/solr/bin is empty in trunk
On Feb 8, 2008 1:13 AM, Fuad Efendi [EMAIL PROTECTED] wrote:
> Is it correct?.. I want to try distribution/replication in v.2.3

Try "ant example" in the base dir to build the example.

-Yonik
[jira] Updated: (SOLR-475) multi-valued faceting via un-inverted field
[ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-475:
------------------------------

    Attachment: UnInvertedField.java

Fix single-line oops.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567221#action_12567221 ]

Grant Ingersoll commented on SOLR-342:
--------------------------------------

Direct core meaning embedded, right? It's interesting, b/c I have done a fair amount of Lucene 2.3 testing w/ Wikipedia (nothing like a free, fairly large dataset). Can you reproduce the problem using Lucene directly? (Have a look at contrib/benchmark for a way to get Lucene/Wikipedia up and running quickly.) Also, are there any associated exceptions anywhere in the chain? Or is it just that your index is bad? Are you starting from a clean index or updating an existing one?