Solr nightly build failure

2008-02-08 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build

compile-common:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/common
[javac] Compiling 31 source files to /tmp/apache-solr-nightly/build/common
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/core
[javac] Compiling 274 source files to /tmp/apache-solr-nightly/build/core
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
[javac] Compiling 22 source files to /tmp/apache-solr-nightly/build/client/solrj
[javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
[javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 78 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 23.125 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.71 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 5.308 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.063 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.18 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.824 sec
[junit] Running org.apache.solr.analysis.HTMLStripReaderTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.653 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.631 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.421 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.415 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.504 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.242 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.435 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.776 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.867 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.276 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.736 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 7.932 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.07 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.221 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.051 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.059 sec
[junit] Running 

[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Thomas Peuss (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566951#action_12566951 ]

Thomas Peuss commented on SOLR-127:
---

Think of two scenarios:
* An AJAXified browser client sending requests to Solr. Caching of unchanged 
data in the client and in corporate caching proxies speeds things up.
* A cluster of Solr servers behind a load balancer with caching functionality. 
Middleware sends requests to Solr through the load balancer. Repeated requests 
for unchanged data are answered directly from the LB cache without putting load 
on the Solr servers. This is, for example, our scenario.

Our code works fine with the BlueCoat web cache, the Apache HTTPD proxy cache, 
the Squid proxy cache and many other solutions _because_ we are following the 
standards here. So I don't really get the point of your comment.

Besides that, you can completely disable this HTTP header stuff in 
solrconfig.xml if you don't want it.
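For reference, the solrconfig.xml switch mentioned above looks roughly like this. This is a sketch based on the SOLR-127 patch series; the element and attribute names (never304, lastModifiedFrom, etagSeed, cacheControl) may differ from what is finally committed:

```xml
<!-- Hypothetical fragment; names follow the SOLR-127 patches and may
     not match the committed syntax exactly. -->

<!-- Disable HTTP cache header generation entirely: -->
<httpCaching never304="true"/>

<!-- Or, with header generation enabled: derive Last-Modified from the
     time the current IndexReader was opened, and emit a Cache-Control
     header for downstream caches. -->
<httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
  <cacheControl>max-age=30, public</cacheControl>
</httpCaching>
```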

 Make Solr more friendly to external HTTP caches
 ---

 Key: SOLR-127
 URL: https://issues.apache.org/jira/browse/SOLR-127
 Project: Solr
  Issue Type: Wish
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: CacheUnitTest.patch, CacheUnitTest.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch


 an offhand comment I saw recently reminded me of something that really bugged 
 me about the search solution i used *before* Solr -- it didn't play nicely 
 with HTTP caches that might be sitting in front of it.
 at the moment, Solr doesn't put particularly useful info in the HTTP 
 Response headers to aid in caching (ie: Last-Modified), responds to all HEAD 
 requests with a 400, and doesn't do anything special with If-Modified-Since.
 at the very least, we can set a Last-Modified based on when the current 
 IndexReader was opened (if not the Date on the IndexReader) and use the same 
 info to determine how to respond to If-Modified-Since requests.
 (for the record, i think the reason this hasn't occurred to me in the 2+ years 
 i've been using Solr, is because with the internal caching, i've yet to need 
 to put a proxy cache in front of Solr)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



concurrency while indexing

2008-02-08 Thread Thorsten Scherler
Hi all,

I have the following use case: one Solr instance which receives add/commit
calls constantly from 3 different clients.

The machine:
Model: HP ProLiant DL360
Memory: 2 GB
CPU: 1 Intel Xeon 3.02 GHz
Disk: 2 x 36 GB SCSI in RAID

I need to raise the number of clients to about 10; can this be a problem
for the indexing machine?

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions
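A toy model of the scenario above, to illustrate what changes when the client count grows. The "server" here is a deliberate stand-in, not the SolrJ API: Solr funnels all updates into a single IndexWriter, so extra clients mostly add request concurrency, not write parallelism, and batching adds with infrequent commits keeps the commit cost independent of the client count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexingClients {
    // Stand-in for a Solr update endpoint: just a thread-safe counter.
    static final AtomicInteger docsIndexed = new AtomicInteger();

    static void add(String doc) { docsIndexed.incrementAndGet(); }

    /** Run nClients concurrent clients, each adding docsPerClient docs. */
    static int run(int nClients, int docsPerClient) throws Exception {
        docsIndexed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(nClients);
        for (int c = 0; c < nClients; c++) {
            pool.submit(() -> {
                for (int d = 0; d < docsPerClient; d++) add("doc");
                // In real code: commit once per batch, never per document.
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return docsIndexed.get();
    }

    public static void main(String[] args) throws Exception {
        // 10 clients, as in the question above
        System.out.println(run(10, 1000));
    }
}
```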



[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Fuad Efendi (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567067#action_12567067 ]

Fuad Efendi commented on SOLR-127:
--

In my configuration I do not need SOLR caching at all, but I use HTTP caching 
more effectively.

The HTTPD memory and disk cache is used between the client and the middleware. 
There is no caching between the middleware and SOLR. The middleware responds to 
HTTPD with 304 when appropriate, with a correct Last-Modified etc., and such 
requests do not reach SOLR. This caching configuration works fine with AJAX too, 
without SOLR's caching headers.

I've seen unnecessary extra work in this implementation... taking a long 
time... and tried to point out what the response codes mean (for the Web).
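The conditional-GET check described above can be sketched in a few lines. This is illustrative plain Java, not Solr or middleware API; the method names are made up:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ConditionalGet {
    /**
     * Decide 304 vs 200: respond 304 Not Modified when the client's
     * If-Modified-Since is at or after the resource's Last-Modified.
     * HTTP dates have one-second resolution, so compare whole seconds.
     */
    static int statusFor(long lastModifiedMillis, Long ifModifiedSinceMillis) {
        if (ifModifiedSinceMillis != null
                && lastModifiedMillis / 1000 <= ifModifiedSinceMillis / 1000) {
            return 304;  // Not Modified: the body can be omitted
        }
        return 200;      // send the full response with a Last-Modified header
    }

    /** Format a Last-Modified header value (RFC 1123 date, always GMT). */
    static String httpDate(long millis) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long opened = 1202480000000L;  // e.g. when the IndexReader was opened
        System.out.println(statusFor(opened, opened));         // 304
        System.out.println(statusFor(opened, opened - 5000L)); // 200
        System.out.println(httpDate(opened));
    }
}
```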




[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Fuad Efendi (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567072#action_12567072 ]

Fuad Efendi commented on SOLR-127:
--

Regarding an HTTP-caching load balancer between SOLR and the middleware:
you need to deal with an additional internal HTTP cache at the middleware. In most 
cases the middleware generates content from different sources and can't reroute an 
If-Modified-Since request to SOLR without internal caching. For instance, if 
you are using SOLRJ, you have to implement an *additional* cache for 
SolrDocument... 





[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Yonik Seeley (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567111#action_12567111 ]

Yonik Seeley commented on SOLR-342:
---

Yikes!  Thanks for the report, Will.  It certainly sounds like a Lucene issue to 
me (especially because removing this patch fixes things... that means it only 
happens under certain Lucene settings).  Could you perhaps try the very latest 
Lucene trunk? (There were some seemingly unrelated fixes recently.)

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To use this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.  
 We also need to think about how we want to incorporate the new merge scheduling 
 functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Fuad Efendi (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567077#action_12567077 ]

Fuad Efendi commented on SOLR-127:
--

Thomas, Walter,

Finally I agree, thanks!

The middleware should not send/reroute If-Modified-Since, and should not 
implement an internal cache (as in the counter-example I provided): with caching 
enabled, it will simply retrieve the cached content.

I do not agree with 400, though; it is an opening for DoS attacks. A query 
parsing error should be a 200 with cacheable response headers. Of course, I know 
RFC 2616. 




[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Walter Underwood (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567068#action_12567068 ]

Walter Underwood commented on SOLR-127:
---

Two reasons to do HTTP caching for Solr: First, Solr is HTTP and needs to 
implement that correctly. Second, caches are much harder to implement and test 
than the cache information in HTTP. HTTP caches already exist and are well 
tested, so the implementation cost is zero and deployment is very easy.

The HTTP spec already covers which responses should be cached.  A 400 response 
may only be cached if it includes explicit cache control headers which allow 
that. See RFC 2616.

We are using a caching load balancer and caching in Apache front ends to 
Tomcat. We see an increase of more than 2X in the capacity of our search farm.

I would recommend against Solr-specific cache information in the XML part of 
the responses. Distributed caching is extremely difficult to get right. Around 
25% of the HTTP 1.1 spec is devoted to caching and there are still grey areas.
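The default-cacheability rule Walter cites can be written out explicitly. A sketch of RFC 2616's rule (sections 13.4 and 14.9); the method name is illustrative, not from any library:

```java
import java.util.Set;

public class Cacheability {
    // Status codes a shared cache may store by default, per RFC 2616 sec. 13.4.
    static final Set<Integer> DEFAULT_CACHEABLE =
        Set.of(200, 203, 206, 300, 301, 410);

    /**
     * Anything outside the default set (including 400) may be cached only
     * if the response carries explicit permission, e.g. a Cache-Control
     * or Expires header that allows it.
     */
    static boolean mayCache(int status, boolean hasExplicitCacheHeaders) {
        return DEFAULT_CACHEABLE.contains(status) || hasExplicitCacheHeaders;
    }

    public static void main(String[] args) {
        System.out.println(mayCache(400, false)); // false: not cacheable by default
        System.out.println(mayCache(400, true));  // true: explicit headers allow it
        System.out.println(mayCache(200, false)); // true: cacheable by default
    }
}
```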




[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Fuad Efendi (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567064#action_12567064 ]

Fuad Efendi commented on SOLR-127:
--

I agree.
A caching load balancer between SOLR and the app servers is an excellent idea, 
and it can be a black box without any knowledge of the SOLR API.
AJAX can use the web browser's internal cache; FLEX probably can too...
Question: do we need caching of static (unchanged) content from SOLR, such as a 
400: Query parsing error?.. 






[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Fuad Efendi (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567081#action_12567081 ]

Fuad Efendi commented on SOLR-127:
--

Fortunately, we are not using 404 when trying to retrieve a removed document... 
In the initial design (I believe) the SOLR developers simply wrapped all 
exceptions in a 400, and an empty result set is not an exception.




[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567099#action_12567099 ]

Will Johnson commented on SOLR-342:
---

I think we're running into a very serious issue with trunk + this patch.  
Either the document summaries are not matched or the overall matching is 
'wrong'.  I did find this in the Lucene JIRA: LUCENE-994 

Note that these changes will break users of ParallelReader because the
parallel indices will no longer have matching docIDs. Such users need
to switch IndexWriter back to flushing by doc count, and switch the
MergePolicy back to LogDocMergePolicy. It's likely also necessary to
switch the MergeScheduler back to SerialMergeScheduler to ensure
deterministic docID assignment.

We're seeing rather consistently bad results, but only after 20-30k documents 
and multiple commits, and are wondering if anyone else is seeing anything.  I've 
verified that the results are bad even through Luke, which would seem to remove 
the search side of the Solr equation.   The basic test case is to search for 
title:foo and get back documents that only have title:bar.  We're going to 
start on a unit test, but given the document counts and the corpus we're testing 
against it may be a while, so I thought I'd ask to see if anyone had any hints.

Removing this patch seems to remove the issue, so it doesn't appear to be a 
Lucene problem.






[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Yonik Seeley (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567152#action_12567152 ]

Yonik Seeley commented on SOLR-342:
---

Thanks, Will.  My guess at this point is a merging bug in Lucene, so you might 
be able to reproduce it by forcing more merges.  Set mergeFactor=2 and lower how 
many docs it takes to do a merge (set maxBufferedDocs to 2, or set 
ramBufferSizeMB to 1).
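In solrconfig.xml terms, Yonik's stress settings would look roughly like this. This is a hypothetical fragment; the element names follow the Solr 1.x indexDefaults section and may differ on trunk with the SOLR-342 patch applied:

```xml
<indexDefaults>
  <!-- merge whenever two segments of the same size exist -->
  <mergeFactor>2</mergeFactor>
  <!-- flush (and hence trigger merges) after very few buffered docs... -->
  <maxBufferedDocs>2</maxBufferedDocs>
  <!-- ...or, alternatively, flush by RAM usage -->
  <ramBufferSizeMB>1</ramBufferSizeMB>
</indexDefaults>
```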




[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567198#action_12567198 ]

Will Johnson commented on SOLR-342:
---

we have: 

<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>64</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>

and I'm working on a unit test, but just adding a few terms per doc doesn't seem 
to trigger it, at least not 'quickly.'





[jira] Commented: (SOLR-236) Field collapsing

2008-02-08 Thread Oleg Gnatovskiy (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567184#action_12567184 ]

Oleg Gnatovskiy commented on SOLR-236:
--

Also, is field collapse going to be a part of the upcoming Solr 1.3 release, or 
will we need to run a patch on it?

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-475) multi-valued faceting via un-inverted field

2008-02-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-475:
--

Attachment: UnInvertedField.java

Prototype attached.
This is completely untested code, and is still missing the solr interface + 
caching.
The approach is described in the comments (cut-n-pasted here).
Any thoughts or comments on the approach?

I may not have time to immediately work on this (fix the bugs, add tests, hook 
up to solr, add caching of un-inverted field, etc), so additional contributions 
in this direction are welcome!

{code}
/**
 * Final form of the un-inverted field:
 *   Each document points to a list of term numbers that are contained in that
 *   document.
 *
 *   Term numbers are in sorted order, and are encoded as variable-length
 *   deltas from the previous term number.  Real term numbers start at 2 since
 *   0 and 1 are reserved.  A term number of 0 signals the end of the
 *   termNumber list.
 *
 *   There is a single int[maxDoc()] which either contains a pointer into a
 *   byte[] for the termNumber lists, or directly contains the termNumber list
 *   if it fits in the 4 bytes of an integer.  If the first byte in the integer
 *   is 1, the next 3 bytes are a pointer into a byte[] where the termNumber
 *   list starts.
 *
 *   There are actually 256 byte arrays, to compensate for the fact that the
 *   pointers into the byte arrays are only 3 bytes long.  The correct byte
 *   array for a document is a function of its id.
 *
 *   To save space and speed up faceting, any term that matches enough
 *   documents will not be un-inverted... it will be skipped while building the
 *   un-inverted field structure, and will use a set intersection method during
 *   faceting.
 *
 *   To further save memory, the terms (the actual string values) are not all
 *   stored in memory, but a TermIndex is used to convert term numbers to term
 *   values only for the terms needed after faceting has completed.  Only every
 *   128th term value is stored, along with its corresponding term number, and
 *   this is used as an index to find the closest term and iterate until the
 *   desired number is hit (very much like Lucene's own internal term index).
 */
{code}
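The variable-length delta encoding the comment describes can be sketched roughly as follows. This is an illustrative, hypothetical codec (not code from the attached prototype): sorted term numbers are stored as 7-bits-per-byte varint deltas from the previous term, with a lone 0 byte as the terminator.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the delta/varint scheme described in the comment.
public class TermNumberCodec {
    // Encode sorted term numbers (>= 2, strictly increasing) as
    // variable-length deltas, terminated by a 0 byte.
    static byte[] encode(int[] sortedTermNums) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int t : sortedTermNums) {
            int delta = t - prev;
            prev = t;
            // 7 bits per byte, high bit set on all but the last byte
            while ((delta & ~0x7f) != 0) {
                out.write((delta & 0x7f) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        out.write(0);  // a term number of 0 signals the end of the list
        return out.toByteArray();
    }

    // Decode back to absolute term numbers.
    static List<Integer> decode(byte[] buf) {
        List<Integer> terms = new ArrayList<>();
        int pos = 0, prev = 0;
        for (;;) {
            int delta = 0, shift = 0, b;
            do {
                b = buf[pos++] & 0xff;
                delta |= (b & 0x7f) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (delta == 0) break;  // terminator: deltas are otherwise >= 1
            prev += delta;
            terms.add(prev);
        }
        return terms;
    }
}
```

Since real term numbers start at 2 and are strictly increasing, every real delta is at least 1, so the 0 terminator is unambiguous.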

 multi-valued faceting via un-inverted field
 ---

 Key: SOLR-475
 URL: https://issues.apache.org/jira/browse/SOLR-475
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Attachments: UnInvertedField.java


 Facet multi-valued fields via a counting method (like the FieldCache method) 
 on an un-inverted representation of the field.  For each doc, look at its 
 terms and increment a count for each term.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567207#action_12567207
 ] 

Grant Ingersoll commented on SOLR-342:
--

You mentioned ParallelReader, are you using that, or any other patches?
{quote}
problem to happen before we get 20-30k large docs
{quote}

What is large, in your terms?

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To fix this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.
 We also need to think about how we want to incorporate the new merge scheduling 
 functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567140#action_12567140
 ] 

Yonik Seeley commented on SOLR-342:
---

Will, are you using term vectors anywhere, or any customizations to Solr (at 
the lucene level)?
When you say document summaries are not matched, do you mean that the 
incorrect documents are matched, or that the correct documents are matched but 
just the highlighting is wrong?


 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To fix this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.
 We also need to think about how we want to incorporate the new merge scheduling 
 functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: /example/solr/bin is empty in trunk

2008-02-08 Thread Fuad Efendi
 Try "ant example" in the base dir to build the example.

Thanks, it works



[jira] Created: (SOLR-475) multi-valued faceting via un-inverted field

2008-02-08 Thread Yonik Seeley (JIRA)
multi-valued faceting via un-inverted field
---

 Key: SOLR-475
 URL: https://issues.apache.org/jira/browse/SOLR-475
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley


Facet multi-valued fields via a counting method (like the FieldCache method) on 
an un-inverted representation of the field.  For each doc, look at its terms 
and increment a count for each term.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567235#action_12567235
 ] 

Will Johnson commented on SOLR-342:
---

we're using SolrCore in terms of:

core = new SolrCore("foo", dataDir, solrConfig, solrSchema);
UpdateHandler updateHandler = core.getUpdateHandler();
updateHandler.addDoc(command);

which is a bit more low-level than normal; however, when we flipped back to solr 
trunk + lucene 2.3 everything was fine, so it leads me to believe that we are ok 
in that respect.

i was going to try and reproduce with lucene directly also but that too is a 
bit outside the scope of what i have time for at the moment.  

and we're not getting any exceptions, just bad search results.

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To fix this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.
 We also need to think about how we want to incorporate the new merge scheduling 
 functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567218#action_12567218
 ] 

Will Johnson commented on SOLR-342:
---

we're not using parallel reader but we are using direct core access instead of 
going over http.  as for doc size, we're indexing wikipedia but creating 
a number of extra fields.  they are only large in comparison to any of the 
'large volume' tests i've seen in most of the solr and lucene tests.

- will

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To fix this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.
 We also need to think about how we want to incorporate the new merge scheduling 
 functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2008-02-08 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567224#action_12567224
 ] 

Oleg Gnatovskiy commented on SOLR-236:
--

OK, I think I have the first issue figured out. If the current result set (let's 
say the first 10 rows) doesn't have the field that we are collapsing on, the 
counts don't show up. Is that correct?

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: concurrency while indexing

2008-02-08 Thread Yonik Seeley
On Feb 8, 2008 3:53 AM, Thorsten Scherler
[EMAIL PROTECTED] wrote:
 I have following usecase, one solr instance which receives add/commit
 calls constantly from 3 different clients.

 The machine:
 Model: HP Proliant DL 360
 Memory: 2 Gb
 CPU: 1 Intel Xeon 3.02 Ghz
 Disk: 2 x 36 GB SCSI en RAID

 I need to raise the number of clients to about 10, can this be a problem
 for the indexing machine?

I'd stop the clients from doing commits themselves unless it's really
necessary, and use some form of time-based autocommit (see the example
solrconfig.xml).
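For reference, time-based autocommit goes in the update handler section of
solrconfig.xml, roughly like this (values illustrative):

{code}
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- commit automatically at most once per 60 seconds -->
  <autoCommit>
    <maxTime>60000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
{code}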

-Yonik


Re: /example/solr/bin is empty in trunk

2008-02-08 Thread Yonik Seeley
On Feb 8, 2008 1:13 AM, Fuad Efendi [EMAIL PROTECTED] wrote:

 Is it correct?.. I want to try distribution/replication in v.2.3


Try "ant example" in the base dir to build the example.

-Yonik


[jira] Updated: (SOLR-475) multi-valued faceting via un-inverted field

2008-02-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-475:
--

Attachment: UnInvertedField.java

fix single line oops

 multi-valued faceting via un-inverted field
 ---

 Key: SOLR-475
 URL: https://issues.apache.org/jira/browse/SOLR-475
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Attachments: UnInvertedField.java, UnInvertedField.java


 Facet multi-valued fields via a counting method (like the FieldCache method) 
 on an un-inverted representation of the field.  For each doc, look at its 
 terms and increment a count for each term.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567221#action_12567221
 ] 

Grant Ingersoll commented on SOLR-342:
--

Direct core meaning embedded, right?  It's interesting, b/c I have done a fair 
amount of Lucene 2.3 testing w/ Wikipedia (nothing like a free, fairly large 
dataset).

Can you reproduce the problem using Lucene directly? (have a look at 
contrib/benchmark for a way to get Lucene/Wikipedia up and running quickly)

Also, are there any associated exceptions anywhere in the chain?  Or is it just 
that your index is bad?  Are you starting from a clean index or updating an 
existing one?

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To fix this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.
 We also need to think about how we want to incorporate the new merge scheduling 
 functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.