[jira] [Updated] (SOLR-3948) Calculate/display deleted documents in admin interface

2012-10-16 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-3948:
---

Attachment: SOLR-3948.patch

A patch against branch_4x that puts Deleted Docs into the admin interface.  I 
may not have gotten everything that needs to be touched; this is my first look 
at the code that builds the GUI.

 Calculate/display deleted documents in admin interface
 --

 Key: SOLR-3948
 URL: https://issues.apache.org/jira/browse/SOLR-3948
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 4.1

 Attachments: SOLR-3948.patch


 The admin interface shows you two totals that let you infer how many deleted 
 documents exist in the index by subtracting Num Docs from Max Doc.  It would 
 make things much easier for novice users and for automated statistics 
 gathering if the number of deleted documents were calculated for you and 
 displayed.
 Last Modified: 3 minutes ago
 Num Docs: 12924551
 Max Doc: 13011778
 Version: 862
 Segment Count: 23
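 With the statistics above, the new value would simply be Deleted Docs = Max Doc - Num Docs = 13011778 - 12924551 = 87227.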




[jira] [Updated] (SOLR-3951) wt=json should set application/json as content-type

2012-10-16 Thread Fredrik Rodland (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredrik Rodland updated SOLR-3951:
--

Description: 
the result with wt=json has content-type text/plain.  Should be 
application/json. 

see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).

reproduce:
load all tutorial data.

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true


info on request/response:

{code}9:42:14.681[31ms][total 69ms] Status: 200[OK]
GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 

Content Size[-1] Mime Type[text/plain]
   Request Headers:
  Host[localhost:8983]
  User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
Gecko/20100101 Firefox/16.0]
  Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
  Accept-Language[en-US,en;q=0.5]
  Accept-Encoding[gzip, deflate]
  Connection[keep-alive]
  Referer[http://localhost:8983/solr/]
  Cache-Control[max-age=0]
   Response Headers:
  Content-Type[text/plain;charset=UTF-8]
  Transfer-Encoding[chunked]{code}




  was:
the result with wt=json has content-type text/plain.  Should be 
application/json. 

see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).

reproduce:
load all tutorial data.

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true

9:42:14.681[31ms][total 69ms] Status: 200[OK]
GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 

info on request/response:

Content Size[-1] Mime Type[text/plain]
   Request Headers:
  Host[localhost:8983]
  User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
Gecko/20100101 Firefox/16.0]
  Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
  Accept-Language[en-US,en;q=0.5]
  Accept-Encoding[gzip, deflate]
  Connection[keep-alive]
  Referer[http://localhost:8983/solr/]
  Cache-Control[max-age=0]
   Response Headers:
  Content-Type[text/plain;charset=UTF-8]
  Transfer-Encoding[chunked]





 wt=json should set application/json as content-type
 ---

 Key: SOLR-3951
 URL: https://issues.apache.org/jira/browse/SOLR-3951
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: Darwin SCH-BP-2003.local 11.4.2 Darwin Kernel Version 
 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 
 x86_64, SOLR 4.0.0
Reporter: Fredrik Rodland

 the result with wt=json has content-type text/plain.  Should be 
 application/json. 
 see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).
 reproduce:
 load all tutorial data.
 http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true
 info on request/response:
 {code}9:42:14.681[31ms][total 69ms] Status: 200[OK]
 GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 
 Content Size[-1] Mime Type[text/plain]
Request Headers:
   Host[localhost:8983]
   User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
 Gecko/20100101 Firefox/16.0]
   Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
   Accept-Language[en-US,en;q=0.5]
   Accept-Encoding[gzip, deflate]
   Connection[keep-alive]
   Referer[http://localhost:8983/solr/]
   Cache-Control[max-age=0]
Response Headers:
   Content-Type[text/plain;charset=UTF-8]
   Transfer-Encoding[chunked]{code}




[jira] [Created] (SOLR-3951) wt=json should set application/json as content-type

2012-10-16 Thread Fredrik Rodland (JIRA)
Fredrik Rodland created SOLR-3951:
-

 Summary: wt=json should set application/json as content-type
 Key: SOLR-3951
 URL: https://issues.apache.org/jira/browse/SOLR-3951
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: Darwin SCH-BP-2003.local 11.4.2 Darwin Kernel Version 
11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 
x86_64, SOLR 4.0.0
Reporter: Fredrik Rodland


the result with wt=json has content-type text/plain.  Should be 
application/json. 

see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).

reproduce:
load all tutorial data.

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true

9:42:14.681[31ms][total 69ms] Status: 200[OK]
GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 

info on request/response:

Content Size[-1] Mime Type[text/plain]
   Request Headers:
  Host[localhost:8983]
  User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
Gecko/20100101 Firefox/16.0]
  Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
  Accept-Language[en-US,en;q=0.5]
  Accept-Encoding[gzip, deflate]
  Connection[keep-alive]
  Referer[http://localhost:8983/solr/]
  Cache-Control[max-age=0]
   Response Headers:
  Content-Type[text/plain;charset=UTF-8]
  Transfer-Encoding[chunked]







[jira] [Updated] (SOLR-3951) wt=json should set application/json as content-type

2012-10-16 Thread Fredrik Rodland (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredrik Rodland updated SOLR-3951:
--

Environment: Mac OS X 10.7.5, SOLR 4.0.0  (was: Darwin SCH-BP-2003.local 
11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; 
root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64, SOLR 4.0.0)

 wt=json should set application/json as content-type
 ---

 Key: SOLR-3951
 URL: https://issues.apache.org/jira/browse/SOLR-3951
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: Mac OS X 10.7.5, SOLR 4.0.0
Reporter: Fredrik Rodland

 the result with wt=json has content-type text/plain.  Should be 
 application/json. 
 see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).
 reproduce:
 load all tutorial data.
 http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true
 info on request/response:
 {code}9:42:14.681[31ms][total 69ms] Status: 200[OK]
 GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 
 Content Size[-1] Mime Type[text/plain]
Request Headers:
   Host[localhost:8983]
   User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
 Gecko/20100101 Firefox/16.0]
   Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
   Accept-Language[en-US,en;q=0.5]
   Accept-Encoding[gzip, deflate]
   Connection[keep-alive]
   Referer[http://localhost:8983/solr/]
   Cache-Control[max-age=0]
Response Headers:
   Content-Type[text/plain;charset=UTF-8]
   Transfer-Encoding[chunked]{code}




[jira] [Resolved] (SOLR-3951) wt=json should set application/json as content-type

2012-10-16 Thread Fredrik Rodland (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredrik Rodland resolved SOLR-3951.
---

Resolution: Not A Problem

hm - reading a bit more - it seems that this is intended, and that you must 
manually specify that you want the content-type to be application/json when you 
use wt=json.  Seems like an awkward decision.

{code}
  <queryResponseWriter name="json" class="solr.JSONResponseWriter">
    <!-- For the purposes of the tutorial, JSON responses are written as
      plain text so that they are easy to read in *any* browser.
      If you expect a MIME type of application/json just remove this override.
    -->
    <str name="content-type">application/json; charset=UTF-8</str>
  </queryResponseWriter>
{code}

resolving issue as not a problem


 wt=json should set application/json as content-type
 ---

 Key: SOLR-3951
 URL: https://issues.apache.org/jira/browse/SOLR-3951
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: Mac OS X 10.7.5, SOLR 4.0.0
Reporter: Fredrik Rodland

 the result with wt=json has content-type text/plain.  Should be 
 application/json. 
 see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).
 reproduce:
 load all tutorial data.
 http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true
 info on request/response:
 {code}9:42:14.681[31ms][total 69ms] Status: 200[OK]
 GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 
 Content Size[-1] Mime Type[text/plain]
Request Headers:
   Host[localhost:8983]
   User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
 Gecko/20100101 Firefox/16.0]
   Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
   Accept-Language[en-US,en;q=0.5]
   Accept-Encoding[gzip, deflate]
   Connection[keep-alive]
   Referer[http://localhost:8983/solr/]
   Cache-Control[max-age=0]
Response Headers:
   Content-Type[text/plain;charset=UTF-8]
   Transfer-Encoding[chunked]{code}




[jira] [Comment Edited] (SOLR-3951) wt=json should set application/json as content-type

2012-10-16 Thread Fredrik Rodland (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476833#comment-13476833
 ] 

Fredrik Rodland edited comment on SOLR-3951 at 10/16/12 7:53 AM:
-

hm - reading a bit more - it seems that this is intended, and that you must 
manually specify that you want the content-type to be application/json when you 
use wt=json.  Seems like an awkward decision.

{code}
from solrconfig.xml
...
  For the purposes of the tutorial, JSON responses are written as
  plain text so that they are easy to read in *any* browser.
  If you expect a MIME type of application/json just remove this override.
...
{code}

resolving issue as not a problem


  was (Author: fmr):
hm - reading a bit more - it seems that this is intended, and that you must 
manually specify that you want the content-type to be application/json when you 
use wt=json.  Seems like an awkward decision.

{code}
  <queryResponseWriter name="json" class="solr.JSONResponseWriter">
    <!-- For the purposes of the tutorial, JSON responses are written as
      plain text so that they are easy to read in *any* browser.
      If you expect a MIME type of application/json just remove this override.
    -->
    <str name="content-type">application/json; charset=UTF-8</str>
  </queryResponseWriter>
{code}

resolving issue as not a problem

  
 wt=json should set application/json as content-type
 ---

 Key: SOLR-3951
 URL: https://issues.apache.org/jira/browse/SOLR-3951
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: Mac OS X 10.7.5, SOLR 4.0.0
Reporter: Fredrik Rodland

 the result with wt=json has content-type text/plain.  Should be 
 application/json. 
 see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).
 reproduce:
 load all tutorial data.
 http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true
 info on request/response:
 {code}9:42:14.681[31ms][total 69ms] Status: 200[OK]
 GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 
 Content Size[-1] Mime Type[text/plain]
Request Headers:
   Host[localhost:8983]
   User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
 Gecko/20100101 Firefox/16.0]
   Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
   Accept-Language[en-US,en;q=0.5]
   Accept-Encoding[gzip, deflate]
   Connection[keep-alive]
   Referer[http://localhost:8983/solr/]
   Cache-Control[max-age=0]
Response Headers:
   Content-Type[text/plain;charset=UTF-8]
   Transfer-Encoding[chunked]{code}




[jira] [Comment Edited] (SOLR-3951) wt=json should set application/json as content-type

2012-10-16 Thread Fredrik Rodland (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476833#comment-13476833
 ] 

Fredrik Rodland edited comment on SOLR-3951 at 10/16/12 7:53 AM:
-

hm - reading a bit more - it seems that this is intended, and that you must 
manually specify that you want the content-type to be application/json when you 
use wt=json.  Seems like an awkward decision.

{code}
from solrconfig.xml
...
  For the purposes of the tutorial, JSON responses are written as
  plain text so that they are easy to read in *any* browser.
  If you expect a MIME type of application/json just remove this override.
...
{code}

resolving issue as not a problem


  was (Author: fmr):
hm - reading a bit more - it seems that this is intended, and that you must 
manually specify that you want the content-type to be application/json when you 
use wt=json.  Seems like an awkward decision.

{code}
from solrconfig.xml
...
  For the purposes of the tutorial, JSON responses are written as
  plain text so that they are easy to read in *any* browser.
  If you expect a MIME type of application/json just remove this override.
...
{code}

resolving issue as not a problem

  
 wt=json should set application/json as content-type
 ---

 Key: SOLR-3951
 URL: https://issues.apache.org/jira/browse/SOLR-3951
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: Mac OS X 10.7.5, SOLR 4.0.0
Reporter: Fredrik Rodland

 the result with wt=json has content-type text/plain.  Should be 
 application/json. 
 see SOLR-1123 (which seemed to be fixed for 4.0-ALPHA).
 reproduce:
 load all tutorial data.
 http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true
 info on request/response:
 {code}9:42:14.681[31ms][total 69ms] Status: 200[OK]
 GET http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true 
 Content Size[-1] Mime Type[text/plain]
Request Headers:
   Host[localhost:8983]
   User-Agent[Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
 Gecko/20100101 Firefox/16.0]
   Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
   Accept-Language[en-US,en;q=0.5]
   Accept-Encoding[gzip, deflate]
   Connection[keep-alive]
   Referer[http://localhost:8983/solr/]
   Cache-Control[max-age=0]
Response Headers:
   Content-Type[text/plain;charset=UTF-8]
   Transfer-Encoding[chunked]{code}




[jira] [Commented] (SOLR-3950) Attempting postings=BloomFilter results in UnsupportedOperationException

2012-10-16 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476854#comment-13476854
 ] 

Mark Harwood commented on SOLR-3950:


BloomFilterPostingsFormat is designed to wrap another choice of PostingsFormat 
and adds .blm files alongside the files created by the chosen delegate.

However your code has instantiated a BloomFilterPostingsFormat without passing 
a choice of delegate - presumably using the zero-arg constructor. 
The comments in the code for this zero-arg constructor state:

  // Used only by core Lucene at read-time via Service Provider instantiation -
  // do not use at Write-time in application code.
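
For illustration, here is a minimal write-time sketch that passes a delegate (it 
assumes the Lucene 4.x codec API; Lucene41Codec, Lucene41PostingsFormat and the 
"id" field are illustrative choices, not something mandated by this issue):

{code}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41Codec;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;

// Sketch only: wrap a concrete delegate instead of relying on the zero-arg
// constructor, which is reserved for SPI instantiation at read time.
public class BloomCodecSketch {
  public static Codec bloomCodec() {
    final PostingsFormat bloom =
        new BloomFilteringPostingsFormat(new Lucene41PostingsFormat());
    return new Lucene41Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        // Bloom-filter only the unique-key field; other fields keep the default.
        return "id".equals(field) ? bloom : super.getPostingsFormatForField(field);
      }
    };
  }
}
{code}

The returned Codec would then be set on the IndexWriterConfig used to open the writer.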





 Attempting postings=BloomFilter results in UnsupportedOperationException
 --

 Key: SOLR-3950
 URL: https://issues.apache.org/jira/browse/SOLR-3950
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
 Environment: Linux bigindy5 2.6.32-279.9.1.el6.centos.plus.x86_64 #1 
 SMP Wed Sep 26 03:52:55 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 [root@bigindy5 ~]# java -version
 java version 1.7.0_07
 Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
Reporter: Shawn Heisey
 Fix For: 4.1


 Tested on branch_4x, checked out after BlockPostingsFormat was made the 
 default by LUCENE-4446.
 I used 'ant generate-maven-artifacts' to create the lucene-codecs jar, and 
 copied it into my sharedLib directory.  When I subsequently tried 
 postings=BloomFilter I got the following exception in the log:
 {code}
 Oct 15, 2012 11:14:02 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.UnsupportedOperationException: Error - 
 org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
 constructed without a choice of PostingsFormat
 {code}




Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 62 - Failure

2012-10-16 Thread Michael McCandless
I opened LUCENE-4484 for this.

Mike McCandless

http://blog.mikemccandless.com

On Sun, Oct 14, 2012 at 2:21 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/62/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.Test4GBStoredFields.test

 Error Message:
 Java heap space

 Stack Trace:
 java.lang.OutOfMemoryError: Java heap space
 at 
 __randomizedtesting.SeedInfo.seed([2D89DD229CD304F5:A5DDE2F8322F690D]:0)
 at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:75)
 at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:48)
 at 
 org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:139)
 at 
 org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:125)
 at 
 org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:123)
 at 
 org.apache.lucene.codecs.lucene40.Lucene40StoredFieldsWriter.writeField(Lucene40StoredFieldsWriter.java:180)
 at 
 org.apache.lucene.index.StoredFieldsConsumer.finishDocument(StoredFieldsConsumer.java:120)
 at 
 org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:339)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:263)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
 at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1443)
 at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1122)
 at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1103)
 at 
 org.apache.lucene.index.Test4GBStoredFields.test(Test4GBStoredFields.java:80)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)




 Build Log:
 [...truncated 419 lines...]
 [junit4:junit4] Suite: org.apache.lucene.index.Test4GBStoredFields
 [junit4:junit4]   2 NOTE: download the large Jenkins line-docs file by 
 running 'ant get-jenkins-line-docs' in the lucene directory.
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   3.95s J0 | Test4GBStoredFields.test 
 [junit4:junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space
 [junit4:junit4]at 
 __randomizedtesting.SeedInfo.seed([2D89DD229CD304F5:A5DDE2F8322F690D]:0)
 [junit4:junit4]at 
 org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:75)
 [junit4:junit4]at 
 org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:48)
 [junit4:junit4]at 
 org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:139)
 [junit4:junit4]at 
 org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:125)
 [junit4:junit4]at 
 

[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476908#comment-13476908
 ] 

Michael McCandless commented on LUCENE-4484:


bq. Can uncache() be changed to return the still-open newly created IndexOutput?

I think we'd have to wrap the RAMOutputStream .. then we could 1) know when too 
many bytes have been written, 2) close the wrapped RAMOutputStream and call 
uncache to move it to disk, 3) fix uncache to not close the IO (return it), 4) 
cutover the wrapper to the new on-disk IO.  And all of this would have to be 
done inside a writeByte/s call (from the caller's standpoint) ... it seems 
hairy.

We could also just leave it be, ie advertise this limitation.  NRTCachingDir is 
already hairy enough...  The purpose of this directory is to be used in an NRT 
setting where you have relatively frequent reopens compared to the indexing 
rate, and this naturally keeps files plenty small.  It's also particularly 
unusual to index only stored fields in an NRT setting (what this test is doing).

Yet another option would be to somehow have the indexer be able to flush based 
on size of stored fields / term vectors files ... today of course we completely 
disregard these from the RAM accounting since we write their bytes directly to 
disk.  Maybe ... the app could pass the indexer an AtomicInt/Long recording 
bytes held elsewhere in RAM, and indexer would add that in its logic for when 
to trigger a flush...

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.
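
To make the mechanism concrete, a rough sketch of that up-front decision (this 
is not the NRTCachingDirectory source; the IOContext field names used here are 
assumptions about the 4.x API):

{code}
import org.apache.lucene.store.IOContext;

// Sketch only: the RAM-vs-disk choice is made once, from the size *expected*
// at createOutput time.  A stored fields file that later grows far beyond the
// estimate therefore stays in the RAMDirectory, and RAM use is unbounded.
class CacheDecisionSketch {
  static boolean cacheInRam(IOContext ctx, long alreadyCachedBytes,
                            long maxMergeSizeBytes, long maxCachedBytes) {
    long expected = 0;
    if (ctx.mergeInfo != null) {
      expected = ctx.mergeInfo.estimatedMergeBytes;       // merge output
    } else if (ctx.flushInfo != null) {
      expected = ctx.flushInfo.estimatedSegmentSize;      // flushed segment
    }
    return expected <= maxMergeSizeBytes
        && expected + alreadyCachedBytes <= maxCachedBytes;
  }
}
{code}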




[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument

2012-10-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476915#comment-13476915
 ] 

Michael McCandless commented on LUCENE-4472:


I like the new MergeCause enum!

But, instead of folding all parameters into a MergeContext, and exposing a 
single MergePolicy.findMerges methods, can we keep the methods we have today 
and just add MergeCause as another parameter?  This is a very expert API and I 
think it's fine to simply change it.  I think this approach is more type-safe 
for the future, ie if we need to add something important such that a custom 
merge policy should pay attention to it ... apps will see compilation errors on 
upgrading and know they have to handle the new parameter.

 Add setting that prevents merging on updateDocument
 ---

 Key: LUCENE-4472
 URL: https://issues.apache.org/jira/browse/LUCENE-4472
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4472.patch, LUCENE-4472.patch


 Currently we always call maybeMerge if a segment was flushed after 
 updateDocument. Some apps, and in particular ElasticSearch, use some hacky 
 workarounds to disable that, e.g. for merge throttling. It should be easier to 
 enable this kind of behavior. 




[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument

2012-10-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476920#comment-13476920
 ] 

Michael McCandless commented on LUCENE-4472:


Actually I think we only need to add the MergeCause (maybe rename this to 
MergeTrigger?) param to findMerges?  That method is invoked for natural merges, 
and knowing the trigger for the natural merge is useful...

The other two methods (findForcedMerges, findForcedDeletesMerges) are only 
triggered when the app explicitly asks IndexWriter to do so.
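
A rough sketch of what that could look like (the names here, including the 
trigger values, are illustrative assumptions rather than the committed API):

{code}
import java.io.IOException;
import org.apache.lucene.index.MergePolicy.MergeSpecification;
import org.apache.lucene.index.SegmentInfos;

// Sketch only: the trigger is passed to the natural-merge hook, while
// findForcedMerges/findForcedDeletesMerges keep their existing signatures
// because the app invokes them explicitly.
public abstract class SketchMergePolicy {

  public enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, EXPLICIT, MERGE_FINISHED, CLOSING }

  /** Natural merges: IndexWriter says why it is asking. */
  public abstract MergeSpecification findMerges(MergeTrigger trigger,
                                                SegmentInfos segmentInfos)
      throws IOException;
}
{code}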

 Add setting that prevents merging on updateDocument
 ---

 Key: LUCENE-4472
 URL: https://issues.apache.org/jira/browse/LUCENE-4472
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4472.patch, LUCENE-4472.patch


 Currently we always call maybeMerge if a segment was flushed after 
 updateDocument. Some apps, and in particular ElasticSearch, use some hacky 
 workarounds to disable that, e.g. for merge throttling. It should be easier to 
 enable this kind of behavior. 




Re: [JENKINS] Lucene-trunk-Linux-Java6-64-test-only - Build # 9932 - Failure!

2012-10-16 Thread Michael McCandless
Hmmm I'll dig.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Oct 15, 2012 at 7:35 PM,  buil...@flonkings.com wrote:
 Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java6-64-test-only/9932/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

 Error Message:
 saw non-zero open-but-deleted count

 Stack Trace:
 java.lang.AssertionError: saw non-zero open-but-deleted count
 at 
 __randomizedtesting.SeedInfo.seed([447148DE18F87BA8:DFA85CC559036DC3]:0)
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.assertTrue(Assert.java:43)
 at org.junit.Assert.assertFalse(Assert.java:68)
 at 
 org.apache.lucene.index.TestNRTThreads.doSearching(TestNRTThreads.java:89)
 at 
 org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase.runTest(ThreadedIndexingAndSearchingTestCase.java:507)
 at 
 org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:127)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at java.lang.Thread.run(Thread.java:662)




 Build Log:
 [...truncated 335 lines...]
 [junit4:junit4] Suite: 

[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument

2012-10-16 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476928#comment-13476928
 ] 

Simon Willnauer commented on LUCENE-4472:
-

bq. The other two methods (findForceMerges, findForcedDeletesMerges) are only 
triggered when the app explicitly asked IndexWriter to do so.

I am not sure if we should really do that. I'd rather make those two methods 
protected and make them an impl detail of the merge policy. I think the 
specialized methods are a poor man's approach to the MergeContext and the API 
is rather clumsy along those lines. I'd be happy to not break backwards compat 
but only add a more flexible API that is the authoritative source / single 
entry point for the IndexWriter. If you think this through, 
findForcedDeletesMerges and findForcedMerges are really an impl detail of the 
current IndexWriter, and if we were to modularize it this would become even 
more obvious.
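
For contrast, a sketch of the single-entry-point shape being described 
(hypothetical names only; nothing here is the committed API):

{code}
import java.io.IOException;
import org.apache.lucene.index.MergePolicy.MergeSpecification;
import org.apache.lucene.index.SegmentInfos;

// Sketch only: one authoritative hook fed by a context object; the forced and
// forced-deletes cases become details the context carries rather than separate
// public methods on the policy.
public abstract class SketchContextMergePolicy {

  public enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, EXPLICIT, MERGE_FINISHED, CLOSING }

  public static final class MergeContext {
    public final MergeTrigger trigger;
    public final SegmentInfos segmentInfos;
    public final int maxSegmentCount;   // > 0 only for an explicit forced merge

    public MergeContext(MergeTrigger trigger, SegmentInfos segmentInfos, int maxSegmentCount) {
      this.trigger = trigger;
      this.segmentInfos = segmentInfos;
      this.maxSegmentCount = maxSegmentCount;
    }
  }

  /** Single entry point for IndexWriter. */
  public abstract MergeSpecification findMerges(MergeContext context) throws IOException;
}
{code}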

 Add setting that prevents merging on updateDocument
 ---

 Key: LUCENE-4472
 URL: https://issues.apache.org/jira/browse/LUCENE-4472
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4472.patch, LUCENE-4472.patch


 Currently we always call maybeMerge if a segment was flushed after 
 updateDocument. Some apps, and in particular ElasticSearch, use some hacky 
 workarounds to disable that, e.g. for merge throttling. It should be easier to 
 enable this kind of behavior. 




[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2012-10-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476942#comment-13476942
 ] 

Jan Høydahl commented on SOLR-3881:
---

I'm sure it's possible to optimize the memory footprint somehow. The reason we 
originally concatenate all fl fields before detection is that Tika's detector 
gets better the longer the input text is. So while detection on individual 
short fields has a high risk of mis-detection, the concatenated string has a 
better chance.

A configurable max cap on the concatenation may make sense, as detection 
accuracy flattens out after some threshold.

Perhaps we could also avoid the expandCapacity() and Arrays.copyOf() calls if 
we pre-allocate the buffer with the theoretical max size, i.e. the size of our 
SolrInputDoc. If the StringBuffer is at 10 kB and needs an extra 10 bytes for 
an append, it allocates a new buffer of (10 kB + 1) * 2 capacity, which is a 
waste. We should also switch to StringBuilder, which is more performant.
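
A sketch of what that could look like in the concatenation step (hypothetical 
helper and parameter names, not the actual Solr code; maxTotalChars stands in 
for the suggested configurable cap):

{code}
import org.apache.solr.common.SolrInputDocument;

// Sketch only: pre-size a StringBuilder from the document's own field lengths
// and stop appending once a configurable cap is reached, so the builder rarely
// has to grow its backing array via Arrays.copyOf().
class ConcatFieldsSketch {
  static String concatFields(SolrInputDocument doc, String[] fields, int maxTotalChars) {
    int estimate = 1;
    for (String f : fields) {
      Object v = doc.getFieldValue(f);
      if (v instanceof String) {
        estimate += ((String) v).length() + 1;
      }
    }
    StringBuilder sb = new StringBuilder(Math.min(estimate, maxTotalChars));
    for (String f : fields) {
      Object v = doc.getFieldValue(f);
      if (!(v instanceof String)) {
        continue;
      }
      String s = (String) v;
      int remaining = maxTotalChars - sb.length();
      if (remaining <= 0) {
        break;                 // past the cap, more text barely helps detection
      }
      sb.append(s, 0, Math.min(s.length(), remaining)).append(' ');
    }
    return sb.toString();
  }
}
{code}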

 frequent OOM in LanguageIdentifierUpdateProcessor
 -

 Key: SOLR-3881
 URL: https://issues.apache.org/jira/browse/SOLR-3881
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
Reporter: Rob Tulloh

 We are seeing frequent failures from Solr causing it to OOM. Here is the 
 stack trace we observe when this happens:
 {noformat}
 Caused by: java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2882)
 at 
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
 at 
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
 at java.lang.StringBuffer.append(StringBuffer.java:224)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
 at 
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at 
 

[jira] [Created] (LUCENE-4485) CheckIndex's term stats should not include deleted docs

2012-10-16 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4485:
--

 Summary: CheckIndex's term stats should not include deleted docs
 Key: LUCENE-4485
 URL: https://issues.apache.org/jira/browse/LUCENE-4485
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless


I was looking at the CheckIndex output on an index that has deletions, e.g.:
{noformat}
  4 of 30: name=_90 docCount=588408
codec=Lucene41
compound=false
numFiles=14
size (MB)=265.318
diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10, 
source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64, 
mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
has deletions [delGen=1]
test: open reader.OK [39351 deleted docs]
test: fields..OK [8 fields]
test: field norms.OK [2 fields]
test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 
65597188 tokens]
test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238 
terms/docs pairs; 70293065 tokens]
test: stored fields...OK [1647171 total field count; avg 3 fields per 
doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq vector 
fields per doc]
test: docvalues...OK [0 total doc count; 1 docvalues fields]
{noformat}

If you compare the {{test: terms, freq, prox}} (includes deletions) and the 
next line (doesn't include deletions), it's confusing because only the 3rd 
number (tokens) reflects deletions.  I think the first two numbers should also 
reflect deletions?  This way an app could get a sense of how much deadweight 
is in the index due to un-reclaimed deletions...
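
(In the sample output above the confusion is visible: both lines report the 
identical 4910342 terms and 61319238 term/doc pairs, and only the token counts 
differ, 65597188 with deletions applied versus 70293065 ignoring them.)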




[jira] [Created] (SOLR-3952) TextResponseWriter/XMLWriter: Make escaping deactivatable

2012-10-16 Thread Sebastian Lutze (JIRA)
Sebastian Lutze created SOLR-3952:
-

 Summary: TextResponseWriter/XMLWriter: Make escaping deactivatable
 Key: SOLR-3952
 URL: https://issues.apache.org/jira/browse/SOLR-3952
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.4
Reporter: Sebastian Lutze
Priority: Minor
 Fix For: 4.1
 Attachments: disable_escape.patch

Since we have full control over what is stored in our indexes, we want to 
retrieve highlighted terms or phrases in real XML-tags ...

{code:xml}
<str>
  <em>Napoleon</em>
</str>
{code}

... rather than in escaped sequences:

{code:xml}
<str>
  &lt;em&gt;Napoleon&lt;/em&gt;
</str>
{code}

Until now I haven't discovered any solution to solve this problem 
out-of-the-box. We patched together a very crude workaround involving Cocoon's 
ServletService, an XSLT stylesheet and disableOutputEscaping=yes. 

Therefore this patch provides:

- a field doEscape in TextResponseWriter and corresponding getters/setters
- support for a request-parameter escape=off to disable escaping 

I'm not sure if I have chosen the optimal approach to address this issue or if 
the issue is even an issue. Maybe there is a better way with Formatters/Encoders 
or something else? 
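
For illustration, the heart of that switch could look roughly like this in the 
XML string-writing path (a sketch under the assumption of a doEscape field 
populated from the escape request parameter; this is not the attached patch):

{code}
import java.io.IOException;
import java.io.Writer;
import org.apache.solr.common.util.XML;

// Sketch only: honor a per-request doEscape flag before escaping character data.
final class EscapeSketch {
  static void writeStrValue(Writer writer, String val, boolean doEscape) throws IOException {
    if (doEscape) {
      XML.escapeCharData(val, writer);   // default: <em> is emitted as &lt;em&gt;
    } else {
      writer.write(val);                 // escape=off: highlighting markup passes through
    }
  }
}
{code}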





[jira] [Updated] (SOLR-3952) TextResponseWriter/XMLWriter: Make escaping deactivatable

2012-10-16 Thread Sebastian Lutze (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Lutze updated SOLR-3952:
--

Attachment: disable_escape.patch

 TextResponseWriter/XMLWriter: Make escaping deactivatable
 -

 Key: SOLR-3952
 URL: https://issues.apache.org/jira/browse/SOLR-3952
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 3.6
Reporter: Sebastian Lutze
Priority: Minor
  Labels: escaping, response, xml
 Fix For: 4.1

 Attachments: disable_escape.patch


 Since we have full control over what is stored in our indexes, we want to 
 retrieve highlighted terms or phrases in real XML-tags ...
 {code:xml}
 <str>
   <em>Napoleon</em>
 </str>
 {code}
 ... rather than in escaped sequences:
 {code:xml}
 <str>
   &lt;em&gt;Napoleon&lt;/em&gt;
 </str>
 {code}
 Until now I haven't discovered any solution to solve this problem 
 out-of-the-box. We patched together a very crude workaround involving 
 Cocoon's ServletService, an XSLT stylesheet and disableOutputEscaping=yes. 
 Therefore this patch provides:
 - a field doEscape in TextResponseWriter and corresponding getters/setters
 - support for a request-parameter escape=off to disable escaping 
 I'm not sure if I have chosen the optimal approach to address this issue or 
 if the issue is even an issue. Maybe there is a better way with 
 Formatters/Encoders or something else? 




[jira] [Updated] (SOLR-3952) TextResponseWriter/XMLWriter: Make escaping deactivatable

2012-10-16 Thread Sebastian Lutze (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Lutze updated SOLR-3952:
--

Affects Version/s: (was: 1.4)
   3.6

 TextResponseWriter/XMLWriter: Make escaping deactivatable
 -

 Key: SOLR-3952
 URL: https://issues.apache.org/jira/browse/SOLR-3952
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 3.6
Reporter: Sebastian Lutze
Priority: Minor
  Labels: escaping, response, xml
 Fix For: 4.1

 Attachments: disable_escape.patch


 Since we have full control over what is stored in our indexes, we want to 
 retrieve highlighted terms or phrases in real XML-tags ...
 {code:xml}
 <str>
   <em>Napoleon</em>
 </str>
 {code}
 ... rather than in escaped sequences:
 {code:xml}
 <str>
   &lt;em&gt;Napoleon&lt;/em&gt;
 </str>
 {code}
 Until now I haven't discovered any solution to solve this problem 
 out-of-the-box. We patched together a very crude workaround involving 
 Cocoon's ServletService, an XSLT stylesheet and disableOutputEscaping=yes. 
 Therefore this patch provides:
 - a field doEscape in TextResponseWriter and corresponding getters/setters
 - support for a request-parameter escape=off to disable escaping 
 I'm not sure if I have chosen the optimal approach to address this issue or 
 if the issue is even an issue. Maybe there is a better way with 
 Formatters/Encoders or something else? 




[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476964#comment-13476964
 ] 

Robert Muir commented on LUCENE-4484:
-

{quote}
... it seems hairy.

We could also just leave it be, ie advertise this limitation. NRTCachingDir is 
already hairy enough... The purpose of this directory is to be used in an NRT 
setting where you have relatively frequent reopens compared to the indexing 
rate, and this naturally keeps files plenty small.
{quote}

This seems fine to me. I think let's just do javadocs?

Because in general there are lots of other combinations of stupid parameters 
that can cause OOM/Out of Open Files/etc and we can't prevent all of them.

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.




[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476969#comment-13476969
 ] 

Mark Miller commented on LUCENE-4484:
-

Doesn't seem like a great answer to me - if you want to use NRTCachingDir, 
please make sure you are constantly indexing and reopening so that you don't 
run into problems...that sounds hairy as well...

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.




[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476973#comment-13476973
 ] 

Robert Muir commented on LUCENE-4484:
-

The test in question is extreme in that it doesn't actually index anything; it's 
just adding stored fields.



 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.




[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476977#comment-13476977
 ] 

Mark Miller commented on LUCENE-4484:
-

Yeah, I know - it's a special case of stored fields and term vectors - but it 
would still be great if it were a special case you didn't have to worry about.

It's not the end of the world - if someone has problems we can tell them to 
stop using NRTCachingDir - but it would also be great if it just worked well in 
that case too.

(Solr defaults to NRTCachingDir)

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476981#comment-13476981
 ] 

Robert Muir commented on LUCENE-4484:
-

I know it does: I think a much safer general solution to keep e.g. file counts 
low would be to just match the Lucene defaults:
FSDirectory.open and CFS enabled.

I tend to agree with Mike that NRTCachingDirectory can really be specialized for 
the NRT use case, because otherwise I think it's going to be ugly to make it 
work well for all use-cases... and even then, not OOM'ing doesn't necessarily 
mean working well. If it's always overflowing its cache and having to uncache 
files because it's not really an NRT use case, that doesn't seem great.

But I don't disagree with trying to make it more general either; I just think 
that this should be done in NRTCachingDir itself and not hacked into 
IndexWriter (flushing when stored files get too large is illogical outside of 
hacking around this).
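For illustration, here is a minimal sketch of the up-front decision the issue 
describes. The names cacheInRam, sizeHint and maxCachedBytes are hypothetical, 
not the actual NRTCachingDirectory API; the point is only that the RAM-vs-disk 
choice is made once, from a size estimate, at createOutput time.

{code}
// Illustrative only: the kind of up-front check the issue describes.
// sizeHint and maxCachedBytes are hypothetical names, not Lucene API.
private boolean cacheInRam(String fileName, long sizeHint, long maxCachedBytes) {
  // Decided once when createOutput is called; if the file later grows far
  // beyond sizeHint (e.g. a stored-fields .fdt file that never flushes),
  // it stays in the RAMDirectory and memory use is unbounded.
  return sizeHint < maxCachedBytes;
}
{code}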


 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476989#comment-13476989
 ] 

Robert Muir commented on LUCENE-4484:
-

{quote}
And all of this would have to be done inside a writeByte/s call (from the 
caller's standpoint)
{quote}

In trunk at least this could be done in switchBuffer or whatever instead. Not 
that it makes
it cleaner, just less ugly.

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3950) Attempting postings=BloomFilter results in UnsupportedOperationException

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476999#comment-13476999
 ] 

Shawn Heisey commented on SOLR-3950:


bq. However your code has instantiated a BloomFilterPostingsFormat without 
passing a choice of delegate - presumably using the zero-arg constructor. 

In this case, my code is Solr, source code unmodified.  From my schema.xml:

{code}
<fieldType name="bloomLong" class="solr.TrieLongField" precisionStep="0" 
omitNorms="true" positionIncrementGap="0" postingsFormat="BloomFilter"/>
<fieldType name="bloomLowercase" class="solr.TextField" sortMissingLast="true" 
positionIncrementGap="0" omitNorms="true" postingsFormat="BloomFilter">
.
. snip
.
</fieldType>
{code}

If there is some schema config that will tell Solr to do the right thing, 
please let me know.


 Attempting postings=BloomFilter results in UnsupportedOperationException
 --

 Key: SOLR-3950
 URL: https://issues.apache.org/jira/browse/SOLR-3950
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
 Environment: Linux bigindy5 2.6.32-279.9.1.el6.centos.plus.x86_64 #1 
 SMP Wed Sep 26 03:52:55 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 [root@bigindy5 ~]# java -version
 java version 1.7.0_07
 Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
Reporter: Shawn Heisey
 Fix For: 4.1


 Tested on branch_4x, checked out after BlockPostingsFormat was made the 
 default by LUCENE-4446.
 I used 'ant generate-maven-artifacts' to create the lucene-codecs jar, and 
 copied it into my sharedLib directory.  When I subsequently tried 
 postings=BloomFilter I got the following exception in the log:
 {code}
 Oct 15, 2012 11:14:02 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.UnsupportedOperationException: Error - 
 org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
 constructed without a choice of PostingsFormat
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3953) postingsFormat doesn't work on field, only on fieldType

2012-10-16 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-3953:
--

 Summary: postingsFormat doesn't work on field, only on fieldType
 Key: SOLR-3953
 URL: https://issues.apache.org/jira/browse/SOLR-3953
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.1
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 4.1


The following schema config (adding postingsFormat) produces no changes in 
Solr's behavior.  If postingsFormat=BloomFilter is instead added to a new 
fieldType and that fieldType is used, then Solr's behavior changes.  In my 
pre-deployment tests, it results in SOLR-3950.

<field name="did" type="long" indexed="true" stored="true" 
postingsFormat="BloomFilter"/>

Having to add a new fieldType for an alternate codec leads to configuration 
duplication and the potential for confusing problems.  I would imagine that 
most people that are interested in alternate codecs will want to continue using 
an existing type.
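For contrast, a hedged sketch of the fieldType-level declaration that Solr does 
honor today (type and field names here are illustrative):

{code:xml}
<!-- postingsFormat on the fieldType is picked up; on the field it is ignored. -->
<fieldType name="bloomLong" class="solr.TrieLongField" precisionStep="0"
           positionIncrementGap="0" postingsFormat="BloomFilter"/>
<field name="did" type="bloomLong" indexed="true" stored="true"/>
{code}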


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3926) solrj should support better way of finding active sorts

2012-10-16 Thread Eirik Lygre (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eirik Lygre updated SOLR-3926:
--

Affects Version/s: (was: 4.0-BETA)
   4.0

 solrj should support better way of finding active sorts
 ---

 Key: SOLR-3926
 URL: https://issues.apache.org/jira/browse/SOLR-3926
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.0
Reporter: Eirik Lygre
Priority: Minor

 The Solrj API uses orthogonal concepts for setting/removing and getting sort 
 information. Setting/removing uses a combination of (name, order), while 
 getters return a String "name order":
 {code}
 public SolrQuery setSortField(String field, ORDER order);
 public SolrQuery addSortField(String field, ORDER order);
 public SolrQuery removeSortField(String field, ORDER order);
 public String[] getSortFields();
 public String getSortField();
 {code}
 If you want to use the current sort information to present a list of active 
 sorts, with the possibility to remove them, you need to manually parse the 
 string(s) returned from getSortFields() to recreate the information required 
 by removeSortField(). Not difficult, but not convenient either :-)
 Therefore this suggestion: Add a new method {{public Map<String, ORDER> 
 getSortFieldMap();}} which returns an ordered map of active sort fields. An 
 example implementation is shown below (here as a utility method living 
 outside SolrQuery; the rewrite should be trivial)
 {code}
 public Map<String, ORDER> getSortFieldMap(SolrQuery query) {
   String[] actualSortFields = query.getSortFields();
   if (actualSortFields == null || actualSortFields.length == 0)
     return Collections.emptyMap();
   Map<String, ORDER> sortFieldMap = new LinkedHashMap<String, ORDER>();
   for (String sortField : actualSortFields) {
     String[] fieldSpec = sortField.split(" ");
     sortFieldMap.put(fieldSpec[0], ORDER.valueOf(fieldSpec[1]));
   }
   return sortFieldMap;
 }
 {code}
 For what it's worth, this is possible client code:
 {code}
 System.out.println("Active sorts");
 Map<String, ORDER> fieldMap = getSortFieldMap(query);
 for (String field : fieldMap.keySet()) {
   System.out.println("- " + field + "; dir=" + fieldMap.get(field));
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3950) Attempting postings=BloomFilter results in UnsupportedOperationException

2012-10-16 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477036#comment-13477036
 ] 

Mark Harwood commented on SOLR-3950:


bq. If there is some schema config that will tell Solr to do the right thing, 
please let me know.

Right now BloomPF is like an abstract class - you need to fill-in-the-blanks as 
to what delegate it will use before you can use it at write-time.
I think we have 3 options:

1) Solr (or you) provide a new PF impl that weds BloomPF with a choice of PF, 
e.g. Lucene40PF, so you would have a zero-arg-constructor class named something 
like BloomLucene40PF, or...
2) Solr extends its config file format to provide a generic means of assembling 
wrapper PFs like Bloom in the config, e.g.:
   postingsFormat="BloomFilter" delegatePostingsFormat="FooPF"
   and Solr then does reflection magic to call constructors appropriately, or...
3) Core Lucene is changed so that BloomPF is wedded to a default PF (e.g. 
Lucene40PF) if users (e.g. Solr) fail to nominate a choice of delegate for 
BloomPF.

Of these 1) feels like the right thing.

Cheers
Mark
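For illustration, a rough sketch of option 1): a zero-arg PostingsFormat that 
hard-wires BloomFilteringPostingsFormat around a concrete delegate so it can be 
loaded by name. Class and package names are recalled from the 4.x codebase and 
may not be exact, and the new class would also need to be registered with the 
PostingsFormat SPI (META-INF/services) to be discoverable.

{code}
import java.io.IOException;

import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat;
import org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

/** Sketch of a BloomPF wedded to a fixed delegate, usable via a zero-arg constructor. */
public class BloomLucene40PostingsFormat extends PostingsFormat {
  private final PostingsFormat delegate =
      new BloomFilteringPostingsFormat(new Lucene40PostingsFormat());

  public BloomLucene40PostingsFormat() {
    super("BloomLucene40"); // the name a schema would reference
  }

  @Override
  public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
    return delegate.fieldsConsumer(state);
  }

  @Override
  public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
    return delegate.fieldsProducer(state);
  }
}
{code}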

 Attempting postings=BloomFilter results in UnsupportedOperationException
 --

 Key: SOLR-3950
 URL: https://issues.apache.org/jira/browse/SOLR-3950
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
 Environment: Linux bigindy5 2.6.32-279.9.1.el6.centos.plus.x86_64 #1 
 SMP Wed Sep 26 03:52:55 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 [root@bigindy5 ~]# java -version
 java version 1.7.0_07
 Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
Reporter: Shawn Heisey
 Fix For: 4.1


 Tested on branch_4x, checked out after BlockPostingsFormat was made the 
 default by LUCENE-4446.
 I used 'ant generate-maven-artifacts' to create the lucene-codecs jar, and 
 copied it into my sharedLib directory.  When I subsequently tried 
 postings=BloomFilter I got the following exception in the log:
 {code}
 Oct 15, 2012 11:14:02 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.UnsupportedOperationException: Error - 
 org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
 constructed without a choice of PostingsFormat
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4226) Efficient compression of small to medium stored fields

2012-10-16 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477038#comment-13477038
 ] 

Radim Kolar commented on LUCENE-4226:
-

Is there an example config provided?

 Efficient compression of small to medium stored fields
 --

 Key: LUCENE-4226
 URL: https://issues.apache.org/jira/browse/LUCENE-4226
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
 Fix For: 4.1, 5.0

 Attachments: CompressionBenchmark.java, CompressionBenchmark.java, 
 LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, 
 LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, 
 SnappyCompressionAlgorithm.java


 I've been doing some experiments with stored fields lately. It is very common 
 for an index with stored fields enabled to have most of its space used by the 
 .fdt index file. To prevent this .fdt file from growing too much, one option 
 is to compress stored fields. Although compression works rather well for 
 large fields, this is not the case for small fields and the compression ratio 
 can be very close to 100%, even with efficient compression algorithms.
 In order to improve the compression ratio for small fields, I've written a 
 {{StoredFieldsFormat}} that compresses several documents in a single chunk of 
 data. To see how it behaves in terms of document deserialization speed and 
 compression ratio, I've run several tests with different index compression 
 strategies on 100,000 docs from Mike's 1K Wikipedia articles (title and text 
 were indexed and stored):
  - no compression,
  - docs compressed with deflate (compression level = 1),
  - docs compressed with deflate (compression level = 9),
  - docs compressed with Snappy,
  - using the compressing {{StoredFieldsFormat}} with deflate (level = 1) and 
 chunks of 6 docs,
  - using the compressing {{StoredFieldsFormat}} with deflate (level = 9) and 
 chunks of 6 docs,
  - using the compressing {{StoredFieldsFormat}} with Snappy and chunks of 6 
 docs.
 For those who don't know Snappy, it is a compression algorithm from Google 
 which does not achieve very high compression ratios, but compresses and 
 decompresses data very quickly.
 {noformat}
 Format            Compression ratio   IndexReader.document time
 ----------------------------------------------------------------
 uncompressed      100%                100%
 doc/deflate 1      59%                616%
 doc/deflate 9      58%                595%
 doc/snappy         80%                129%
 index/deflate 1    49%                966%
 index/deflate 9    46%                938%
 index/snappy       65%                264%
 {noformat}
 (doc = doc-level compression, index = index-level compression)
 I find it interesting because it allows trading speed for space (with 
 deflate, the .fdt file shrinks by a factor of 2, much better than with 
 doc-level compression). One other interesting thing is that {{index/snappy}} 
 is almost as compact as {{doc/deflate}} while it is more than 2x faster at 
 retrieving documents from disk.
 These tests have been done on a hot OS cache, which is the worst case for 
 compressed fields (one can expect better results for formats that have a high 
 compression ratio since they probably require fewer read/write operations 
 from disk).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4226) Efficient compression of small to medium stored fields

2012-10-16 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477041#comment-13477041
 ] 

Simon Willnauer commented on LUCENE-4226:
-

@adrien I deleted the jenkins job for this.

 Efficient compression of small to medium stored fields
 --

 Key: LUCENE-4226
 URL: https://issues.apache.org/jira/browse/LUCENE-4226
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
 Fix For: 4.1, 5.0

 Attachments: CompressionBenchmark.java, CompressionBenchmark.java, 
 LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, 
 LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, 
 SnappyCompressionAlgorithm.java


 I've been doing some experiments with stored fields lately. It is very common 
 for an index with stored fields enabled to have most of its space used by the 
 .fdt index file. To prevent this .fdt file from growing too much, one option 
 is to compress stored fields. Although compression works rather well for 
 large fields, this is not the case for small fields and the compression ratio 
 can be very close to 100%, even with efficient compression algorithms.
 In order to improve the compression ratio for small fields, I've written a 
 {{StoredFieldsFormat}} that compresses several documents in a single chunk of 
 data. To see how it behaves in terms of document deserialization speed and 
 compression ratio, I've run several tests with different index compression 
 strategies on 100,000 docs from Mike's 1K Wikipedia articles (title and text 
 were indexed and stored):
  - no compression,
  - docs compressed with deflate (compression level = 1),
  - docs compressed with deflate (compression level = 9),
  - docs compressed with Snappy,
  - using the compressing {{StoredFieldsFormat}} with deflate (level = 1) and 
 chunks of 6 docs,
  - using the compressing {{StoredFieldsFormat}} with deflate (level = 9) and 
 chunks of 6 docs,
  - using the compressing {{StoredFieldsFormat}} with Snappy and chunks of 6 
 docs.
 For those who don't know Snappy, it is a compression algorithm from Google 
 which does not achieve very high compression ratios, but compresses and 
 decompresses data very quickly.
 {noformat}
 Format            Compression ratio   IndexReader.document time
 ----------------------------------------------------------------
 uncompressed      100%                100%
 doc/deflate 1      59%                616%
 doc/deflate 9      58%                595%
 doc/snappy         80%                129%
 index/deflate 1    49%                966%
 index/deflate 9    46%                938%
 index/snappy       65%                264%
 {noformat}
 (doc = doc-level compression, index = index-level compression)
 I find it interesting because it allows trading speed for space (with 
 deflate, the .fdt file shrinks by a factor of 2, much better than with 
 doc-level compression). One other interesting thing is that {{index/snappy}} 
 is almost as compact as {{doc/deflate}} while it is more than 2x faster at 
 retrieving documents from disk.
 These tests have been done on a hot OS cache, which is the worst case for 
 compressed fields (one can expect better results for formats that have a high 
 compression ratio since they probably require fewer read/write operations 
 from disk).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 63 - Still Failing

2012-10-16 Thread Robert Muir
I think this is really https://issues.apache.org/jira/browse/LUCENE-4182 ?

It seemed to be triggered several times before by NGramTokenizer with
crazy params: e.g. large docs. So maybe this test is provoking it
too for the same reason.

I've never been able to reproduce these fails.

On Sun, Oct 14, 2012 at 11:50 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/63/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestBagOfPositions.test

 Error Message:
 Captured an uncaught exception in thread: Thread[id=644, name=Thread-561, 
 state=RUNNABLE, group=TGRP-TestBagOfPositions]

 Stack Trace:
 com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
 uncaught exception in thread: Thread[id=644, name=Thread-561, state=RUNNABLE, 
 group=TGRP-TestBagOfPositions]
 Caused by: java.lang.AssertionError: ram was 33879456 expected: 33851840 
 flush mem: 18092896 activeMem: 15786560 pendingMem: 0 flushingMem: 3 
 blockedMem: 0 peakDeltaMem: 99136
 at __randomizedtesting.SeedInfo.seed([11A534B74B63930E]:0)
 at 
 org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:114)
 at 
 org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:181)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:384)
 at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1443)
 at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1122)
 at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:201)
 at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
 at 
 org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:110)




 Build Log:
 [...truncated 420 lines...]
 [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
 [junit4:junit4]   2 NOTE: download the large Jenkins line-docs file by 
 running 'ant get-jenkins-line-docs' in the lucene directory.
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=TestBagOfPositions -Dtests.method=test 
 -Dtests.seed=11A534B74B63930E -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=fi -Dtests.timezone=Africa/Conakry 
 -Dtests.file.encoding=ISO-8859-1
 [junit4:junit4] ERROR206s J0 | TestBagOfPositions.test 
 [junit4:junit4] Throwable #1: 
 com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
 uncaught exception in thread: Thread[id=644, name=Thread-561, state=RUNNABLE, 
 group=TGRP-TestBagOfPositions]
 [junit4:junit4] Caused by: java.lang.AssertionError: ram was 33879456 
 expected: 33851840 flush mem: 18092896 activeMem: 15786560 pendingMem: 0 
 flushingMem: 3 blockedMem: 0 peakDeltaMem: 99136
 [junit4:junit4]at 
 __randomizedtesting.SeedInfo.seed([11A534B74B63930E]:0)
 [junit4:junit4]at 
 org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:114)
 [junit4:junit4]at 
 org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:181)
 [junit4:junit4]at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:384)
 [junit4:junit4]at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1443)
 [junit4:junit4]at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1122)
 [junit4:junit4]at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:201)
 [junit4:junit4]at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
 [junit4:junit4]at 
 org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:110)
 [junit4:junit4] Throwable #2: 
 com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
 uncaught exception in thread: Thread[id=643, name=Thread-560, state=RUNNABLE, 
 group=TGRP-TestBagOfPositions]
 [junit4:junit4] Caused by: java.lang.AssertionError: ram was 33879456 
 expected: 33851840 flush mem: 18092896 activeMem: 15786560 pendingMem: 0 
 flushingMem: 3 blockedMem: 0 peakDeltaMem: 99136
 [junit4:junit4]at 
 __randomizedtesting.SeedInfo.seed([11A534B74B63930E]:0)
 [junit4:junit4]at 
 org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:114)
 [junit4:junit4]at 
 org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:181)
 [junit4:junit4]at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:384)
 [junit4:junit4]at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1443)
 [junit4:junit4]   

[jira] [Commented] (LUCENE-4226) Efficient compression of small to medium stored fields

2012-10-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477062#comment-13477062
 ] 

Adrien Grand commented on LUCENE-4226:
--

@radim you can have a look at CompressingCodec in lucene/test-framework
@Simon ok, thanks!
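For anyone experimenting outside Solr, a hedged sketch of how a non-default 
codec is wired in at the Lucene level. The CompressingCodec instance itself 
would come from lucene/test-framework and its constructors are deliberately not 
shown here, since they may change; the rest uses standard 4.x API.

{code}
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

class CompressingCodecSketch {
  /** Illustrative only: select a custom codec for newly written segments. */
  static IndexWriter openWriter(Directory dir, Analyzer analyzer, Codec codec)
      throws IOException {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
    iwc.setCodec(codec); // e.g. an instance of CompressingCodec from test-framework
    return new IndexWriter(dir, iwc);
  }
}
{code}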

 Efficient compression of small to medium stored fields
 --

 Key: LUCENE-4226
 URL: https://issues.apache.org/jira/browse/LUCENE-4226
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
 Fix For: 4.1, 5.0

 Attachments: CompressionBenchmark.java, CompressionBenchmark.java, 
 LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, 
 LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, LUCENE-4226.patch, 
 SnappyCompressionAlgorithm.java


 I've been doing some experiments with stored fields lately. It is very common 
 for an index with stored fields enabled to have most of its space used by the 
 .fdt index file. To prevent this .fdt file from growing too much, one option 
 is to compress stored fields. Although compression works rather well for 
 large fields, this is not the case for small fields and the compression ratio 
 can be very close to 100%, even with efficient compression algorithms.
 In order to improve the compression ratio for small fields, I've written a 
 {{StoredFieldsFormat}} that compresses several documents in a single chunk of 
 data. To see how it behaves in terms of document deserialization speed and 
 compression ratio, I've run several tests with different index compression 
 strategies on 100,000 docs from Mike's 1K Wikipedia articles (title and text 
 were indexed and stored):
  - no compression,
  - docs compressed with deflate (compression level = 1),
  - docs compressed with deflate (compression level = 9),
  - docs compressed with Snappy,
  - using the compressing {{StoredFieldsFormat}} with deflate (level = 1) and 
 chunks of 6 docs,
  - using the compressing {{StoredFieldsFormat}} with deflate (level = 9) and 
 chunks of 6 docs,
  - using the compressing {{StoredFieldsFormat}} with Snappy and chunks of 6 
 docs.
 For those who don't know Snappy, it is a compression algorithm from Google 
 which does not achieve very high compression ratios, but compresses and 
 decompresses data very quickly.
 {noformat}
 Format            Compression ratio   IndexReader.document time
 ----------------------------------------------------------------
 uncompressed      100%                100%
 doc/deflate 1      59%                616%
 doc/deflate 9      58%                595%
 doc/snappy         80%                129%
 index/deflate 1    49%                966%
 index/deflate 9    46%                938%
 index/snappy       65%                264%
 {noformat}
 (doc = doc-level compression, index = index-level compression)
 I find it interesting because it allows trading speed for space (with 
 deflate, the .fdt file shrinks by a factor of 2, much better than with 
 doc-level compression). One other interesting thing is that {{index/snappy}} 
 is almost as compact as {{doc/deflate}} while it is more than 2x faster at 
 retrieving documents from disk.
 These tests have been done on a hot OS cache, which is the worst case for 
 compressed fields (one can expect better results for formats that have a high 
 compression ratio since they probably require fewer read/write operations 
 from disk).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3952) TextResponseWriter/XMLWriter: Make escaping deactivatable

2012-10-16 Thread Sebastian Lutze (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Lutze updated SOLR-3952:
--

Attachment: disable_escape.patch

 TextResponseWriter/XMLWriter: Make escaping deactivatable
 -

 Key: SOLR-3952
 URL: https://issues.apache.org/jira/browse/SOLR-3952
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 3.6
Reporter: Sebastian Lutze
Priority: Minor
  Labels: escaping, response, xml
 Fix For: 4.1

 Attachments: disable_escape.patch, disable_escape.patch


 Since we have full control over what is stored in our indexes, we want to 
 retrieve highlighted terms or phrases in real XML-tags ...
 {code:xml}
 <str>
  <em>Napoleon</em> 
 </str>
 {code}
 ... rather than in escaped sequences:
 {code:xml}
 <str>
  &lt;em&gt;Napoleon&lt;/em&gt; 
 </str>
 {code}
 Until now I haven't discovered any solution to solve this problem 
 out-of-the-box. We patched together a very crude workaround involving 
 Cocoon's ServletService, an XSLT stylesheet and disableOutputEscaping=yes. 
 Therefore this patch provides:
 - a field doEscape in TextResponseWriter and corresponding getters/setters
 - support for a request-parameter escape=off to disable escaping 
 I'm not sure if I have chosen the optimal approach to address this issue or 
 if the issue is even an issue. Maybe there is a better way with 
 Formatters/Encoders or something else? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-trunk-Linux-Java6-64-test-only - Build # 9932 - Failure!

2012-10-16 Thread Michael McCandless
I committed a fix.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Oct 16, 2012 at 7:23 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Hmmm I'll dig.

 Mike McCandless

 http://blog.mikemccandless.com


 On Mon, Oct 15, 2012 at 7:35 PM,  buil...@flonkings.com wrote:
 Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java6-64-test-only/9932/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

 Error Message:
 saw non-zero open-but-deleted count

 Stack Trace:
 java.lang.AssertionError: saw non-zero open-but-deleted count
 at 
 __randomizedtesting.SeedInfo.seed([447148DE18F87BA8:DFA85CC559036DC3]:0)
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.assertTrue(Assert.java:43)
 at org.junit.Assert.assertFalse(Assert.java:68)
 at 
 org.apache.lucene.index.TestNRTThreads.doSearching(TestNRTThreads.java:89)
 at 
 org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase.runTest(ThreadedIndexingAndSearchingTestCase.java:507)
 at 
 org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:127)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 

Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 9691 - Failure!

2012-10-16 Thread Michael McCandless
On Sat, Oct 13, 2012 at 12:05 PM, Robert Muir rcm...@gmail.com wrote:
 This one is now a nightly-only test! So maybe we can safely enable
 this for the hourly builds?

+1

Seems like we just need something to prune them if disk is getting full?

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4485) CheckIndex's term stats should not include deleted docs

2012-10-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4485:
---

Attachment: LUCENE-4485.patch

Simple patch ...

 CheckIndex's term stats should not include deleted docs
 ---

 Key: LUCENE-4485
 URL: https://issues.apache.org/jira/browse/LUCENE-4485
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4485.patch


 I was looking at the CheckIndex output on an index that has deletions, e.g.:
 {noformat}
   4 of 30: name=_90 docCount=588408
 codec=Lucene41
 compound=false
 numFiles=14
 size (MB)=265.318
 diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10, 
 source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64, 
 mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
 has deletions [delGen=1]
 test: open reader.OK [39351 deleted docs]
 test: fields..OK [8 fields]
 test: field norms.OK [2 fields]
 test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 
 65597188 tokens]
 test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238 
 terms/docs pairs; 70293065 tokens]
 test: stored fields...OK [1647171 total field count; avg 3 fields per 
 doc]
 test: term vectorsOK [0 total vector count; avg 0 term/freq 
 vector fields per doc]
 test: docvalues...OK [0 total doc count; 1 docvalues fields]
 {noformat}
 If you compare the {{test: terms, freq, prox}} (includes deletions) and the 
 next line (doesn't include deletions), it's confusing because only the 3rd 
 number (tokens) reflects deletions.  I think the first two numbers should 
 also reflect deletions?  This way an app could get a sense of how much 
 deadweight is in the index due to un-reclaimed deletions...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2012-10-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477104#comment-13477104
 ] 

Hoss Man commented on SOLR-3881:


bq. The reason why we concat all fl fields before detection was originally 
because Tika's detector gets better and better the longer input text you have.

But is it possible to give Tika a String[] or List<String> instead of 
concatenating everything into a single String?

 frequent OOM in LanguageIdentifierUpdateProcessor
 -

 Key: SOLR-3881
 URL: https://issues.apache.org/jira/browse/SOLR-3881
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
Reporter: Rob Tulloh

 We are seeing frequent failures from Solr causing it to OOM. Here is the 
 stack trace we observe when this happens:
 {noformat}
 Caused by: java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2882)
 at 
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
 at 
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
 at java.lang.StringBuffer.append(StringBuffer.java:224)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
 at 
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-3954:
--

 Summary: Option to have updateHandler and DIH skip updateLog
 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


The updateLog feature makes updates take longer, likely because of the I/O time 
required to write the additional information to disk.  It may take as much as 
three times as long for the indexing portion of the process.  I'm not sure 
whether it affects the time to commit, but I would imagine that the difference 
there is small or zero.  When doing incremental updates/deletes on an existing 
index, the time lag is probably very small and unimportant.

When doing a full reindex (which may happen via DIH), especially if this is 
done in a build core that is then swapped with a live core, this performance 
hit is unacceptable.  It seems to make the import take about three times as 
long.

An option to have an update skip the updateLog would be very useful for these 
situations.  It should have a method in SolrJ and be exposed in DIH as well.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477106#comment-13477106
 ] 

Shawn Heisey commented on SOLR-3954:


I was unsure what to put for the priority.  Minor seems slightly too low and 
Major seems too high.

 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4485) CheckIndex's term stats should not include deleted docs

2012-10-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477112#comment-13477112
 ] 

Robert Muir commented on LUCENE-4485:
-

+1

 CheckIndex's term stats should not include deleted docs
 ---

 Key: LUCENE-4485
 URL: https://issues.apache.org/jira/browse/LUCENE-4485
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4485.patch


 I was looking at the CheckIndex output on an index that has deletions, e.g.:
 {noformat}
   4 of 30: name=_90 docCount=588408
 codec=Lucene41
 compound=false
 numFiles=14
 size (MB)=265.318
 diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10, 
 source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64, 
 mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
 has deletions [delGen=1]
 test: open reader.OK [39351 deleted docs]
 test: fields..OK [8 fields]
 test: field norms.OK [2 fields]
 test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 
 65597188 tokens]
 test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238 
 terms/docs pairs; 70293065 tokens]
 test: stored fields...OK [1647171 total field count; avg 3 fields per 
 doc]
 test: term vectorsOK [0 total vector count; avg 0 term/freq 
 vector fields per doc]
 test: docvalues...OK [0 total doc count; 1 docvalues fields]
 {noformat}
 If you compare the {{test: terms, freq, prox}} (includes deletions) and the 
 next line (doesn't include deletions), it's confusing because only the 3rd 
 number (tokens) reflects deletions.  I think the first two numbers should 
 also reflect deletions?  This way an app could get a sense of how much 
 deadweight is in the index due to un-reclaimed deletions...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2012-10-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477123#comment-13477123
 ] 

Robert Muir commented on SOLR-3881:
---

The langdetect implementation can append each piece at a time.

It can also take a Reader: append(Reader), but that is really just syntactic 
sugar that forwards to append(String) while not exceeding 
Detector.max_text_length.

Seems like the concatenating stuff should be pushed out of the base class into 
the Tika impl.
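For illustration, roughly how the langdetect path can avoid building one giant 
buffer. Class names come from the com.cybozu.labs.langdetect library; the 
surrounding Solr plumbing and error handling are omitted, so treat this as a 
sketch, not the processor's actual code.

{code}
import java.util.Collection;

import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;

class LangDetectSketch {
  /** Sketch: append each field value separately; the detector stops
   *  accumulating input once max_text_length is reached.
   *  Assumes DetectorFactory.loadProfile(...) was already called. */
  static String detectLanguage(Collection<String> fieldValues) throws LangDetectException {
    Detector detector = DetectorFactory.create();
    for (String value : fieldValues) {
      detector.append(value); // no concatenated StringBuffer of all fl fields
    }
    return detector.detect(); // most probable language code
  }
}
{code}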

 frequent OOM in LanguageIdentifierUpdateProcessor
 -

 Key: SOLR-3881
 URL: https://issues.apache.org/jira/browse/SOLR-3881
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
Reporter: Rob Tulloh

 We are seeing frequent failures from Solr causing it to OOM. Here is the 
 stack trace we observe when this happens:
 {noformat}
 Caused by: java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2882)
 at 
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
 at 
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
 at java.lang.StringBuffer.append(StringBuffer.java:224)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
 at 
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-3843) Add lucene-codecs to Solr libs?

2012-10-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reopened SOLR-3843:



Reopening.

Core codecs and Solr should just work w/o requiring users to copy any jar files 
around.

 Add lucene-codecs to Solr libs?
 ---

 Key: SOLR-3843
 URL: https://issues.apache.org/jira/browse/SOLR-3843
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.0
Reporter: Adrien Grand
Priority: Minor
 Fix For: 4.1


 Solr gives the ability to its users to select the postings format to use on a 
 per-field basis but only Lucene40PostingsFormat is available by default 
 (unless users add lucene-codecs to the Solr lib directory). Maybe we should 
 add lucene-codecs to Solr libs (I mean in the WAR file) so that people can 
 try our non-default postings formats with minimum effort?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3843) Add lucene-codecs to Solr libs?

2012-10-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-3843:
---

 Priority: Critical  (was: Minor)
Affects Version/s: 4.0
Fix Version/s: 4.1

 Add lucene-codecs to Solr libs?
 ---

 Key: SOLR-3843
 URL: https://issues.apache.org/jira/browse/SOLR-3843
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.0
Reporter: Adrien Grand
Priority: Critical
 Fix For: 4.1


 Solr gives the ability to its users to select the postings format to use on a 
 per-field basis but only Lucene40PostingsFormat is available by default 
 (unless users add lucene-codecs to the Solr lib directory). Maybe we should 
 add lucene-codecs to Solr libs (I mean in the WAR file) so that people can 
 try our non-default postings formats with minimum effort?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477196#comment-13477196
 ] 

Mark Miller commented on SOLR-3954:
---

What config are you using? The updateLog should not normally have this kind of 
performance penalty.

In any case, I don't think we would add an option to skip the update log - you 
can remove it if the performance is unacceptable.
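
(For reference, the piece of solrconfig.xml in question is roughly the stock
4.0 example below; removing or commenting out the updateLog element disables
the transaction log.)

{code}
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- the transaction log; remove this element to turn the updateLog off -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
{code}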

 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

2012-10-16 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477222#comment-13477222
 ] 

Joel Bernstein commented on SOLR-3939:
--

It looks like after the leader is unloaded, the replica attempts to sync to the 
unloaded leader as part of the process to determine if it can be leader. When 
this fails, it thinks that there are better candidates to become leader. Then 
it goes into a recovery loop.

 Solr Cloud recovery and leader election when unloading leader core
 --

 Key: SOLR-3939
 URL: https://issues.apache.org/jira/browse/SOLR-3939
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: cloud.log, SOLR-3939.patch


 When a leader core is unloaded using the core admin api, the followers in the 
 shard go into recovery but do not come out. Leader election doesn't take 
 place and the shard goes down.
 This affects the ability to move a micro-shard from one Solr instance to 
 another Solr instance.
 The problem does not occur 100% of the time, but a large % of the time. 
 To set up a test, start up Solr Cloud with a single shard. Add cores to that 
 shard as replicas using core admin. Then unload the leader core using core 
 admin. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477247#comment-13477247
 ] 

Mark Miller commented on SOLR-3939:
---

That's what I see when I have an empty index. The leader sync fails because 
sync always fails with no local versions.

The case with docs is perhaps a bit trickier since my simple test passes. I'll 
take a look at the logs.

 Solr Cloud recovery and leader election when unloading leader core
 --

 Key: SOLR-3939
 URL: https://issues.apache.org/jira/browse/SOLR-3939
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: cloud.log, SOLR-3939.patch


 When a leader core is unloaded using the core admin api, the followers in the 
 shard go into recovery but do not come out. Leader election doesn't take 
 place and the shard goes down.
 This affects the ability to move a micro-shard from one Solr instance to 
 another Solr instance.
 The problem does not occur 100% of the time, but a large % of the time. 
 To set up a test, start up Solr Cloud with a single shard. Add cores to that 
 shard as replicas using core admin. Then unload the leader core using core 
 admin. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477250#comment-13477250
 ] 

Mark Miller commented on SOLR-3939:
---

I think I see the issue. While we have talked about it, we don't currently try 
to populate the transaction log after a replication.

So, the second core replica is replicating, it's got docs but no versions, then 
it tries to become the leader - but just like with the empty index, it cannot 
successfully sync with no versions as a frame of reference.

 Solr Cloud recovery and leader election when unloading leader core
 --

 Key: SOLR-3939
 URL: https://issues.apache.org/jira/browse/SOLR-3939
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: cloud.log, SOLR-3939.patch


 When a leader core is unloaded using the core admin api, the followers in the 
 shard go into recovery but do not come out. Leader election doesn't take 
 place and the shard goes down.
 This affects the ability to move a micro-shard from one Solr instance to 
 another Solr instance.
 The problem does not occur 100% of the time, but a large % of the time. 
 To set up a test, start up Solr Cloud with a single shard. Add cores to that 
 shard as replicas using core admin. Then unload the leader core using core 
 admin. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

2012-10-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3939:
--

Priority: Critical  (was: Major)

 Solr Cloud recovery and leader election when unloading leader core
 --

 Key: SOLR-3939
 URL: https://issues.apache.org/jira/browse/SOLR-3939
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Critical
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: cloud.log, SOLR-3939.patch


 When a leader core is unloaded using the core admin api, the followers in the 
 shard go into recovery but do not come out. Leader election doesn't take 
 place and the shard goes down.
 This affects the ability to move a micro-shard from one Solr instance to 
 another Solr instance.
 The problem does not occur 100% of the time, but a large % of the time. 
 To set up a test, start up Solr Cloud with a single shard. Add cores to that 
 shard as replicas using core admin. Then unload the leader core using core 
 admin. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477252#comment-13477252
 ] 

Mark Miller commented on SOLR-3939:
---

(My test was passing because I had the replica up initially, so it got the docs 
from the leader, not through replication.)

 Solr Cloud recovery and leader election when unloading leader core
 --

 Key: SOLR-3939
 URL: https://issues.apache.org/jira/browse/SOLR-3939
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Critical
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: cloud.log, SOLR-3939.patch


 When a leader core is unloaded using the core admin api, the followers in the 
 shard go into recovery but do not come out. Leader election doesn't take 
 place and the shard goes down.
 This affects the ability to move a micro-shard from one Solr instance to 
 another Solr instance.
 The problem does not occur 100% of the time, but a large % of the time. 
 To set up a test, start up Solr Cloud with a single shard. Add cores to that 
 shard as replicas using core admin. Then unload the leader core using core 
 admin. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3955) Return only matched multiValued field

2012-10-16 Thread Dotan Cohen (JIRA)
Dotan Cohen created SOLR-3955:
-

 Summary: Return only matched multiValued field
 Key: SOLR-3955
 URL: https://issues.apache.org/jira/browse/SOLR-3955
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Dotan Cohen


Assuming a multivalued, stored and indexed field named comment. When 
performing a search, it would be very helpful if there were a way to return 
only the values of comment which contain the match. For example:

When searching for "gold", instead of getting this result:

<doc>
<arr name="comment">
  <str>Theres a lady whos sure</str>
  <str>all that glitters is gold</str>
  <str>and shes buying a stairway to heaven</str>
</arr>
</doc>

I would prefer to get this result:

<doc>
<arr name="comment">
  <str>all that glitters is gold</str>
</arr>
</doc>

(pseudo-XML from memory, may not be accurate but illustrates the point)

Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3956) group.facet and facet.limit=-1 returns no facet counts

2012-10-16 Thread Mike Spencer (JIRA)
Mike Spencer created SOLR-3956:
--

 Summary: group.facet and facet.limit=-1 returns no facet counts
 Key: SOLR-3956
 URL: https://issues.apache.org/jira/browse/SOLR-3956
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0
Reporter: Mike Spencer


Attempting to use group.facet=true and facet.limit=-1 to return all facets from 
a grouped result ends up with the counts not being returned. Adjusting the 
facet.limit to any number greater than 0 returns the facet counts as expected.

This does not appear to be limited to a specific field type, as I have tried it 
on text, string, boolean, and double types (both multiValued and not).
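
A request of roughly this shape reproduces it (the field name is only a
placeholder, not taken from the actual index):

http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=category&group.facet=true&facet=true&facet.field=category&facet.limit=-1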

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477283#comment-13477283
 ] 

Shawn Heisey commented on SOLR-3954:


Which specific configuration bits would you like to see?  My solrconfig.xml 
file is heavily split into separate files and uses xinclude.  I will go ahead 
and paste my best guesses now.

{code}
<directoryFactory name="DirectoryFactory"
    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>0</maxDocs>
    <maxTime>0</maxTime>
  </autoCommit>
<!--
  <updateLog />
-->
</updateHandler>
{code}

My schema has 47 fields defined.  Not all fields in a typical document will be 
there, but at least half of them usually will be present.  I use the ICU 
classes for lowercasing and most of the text fieldTypes are using 
WordDelimiterFilter.

{code}
  <fields>
   <field name="catchall" type="genText" indexed="true" stored="false" multiValued="true" termVectors="true"/>
   <field name="doc_date" type="tdate" indexed="true" stored="true"/>
   <field name="pd" type="tdate" indexed="true" stored="true"/>
   <field name="ft_text" type="ignored"/>
   <field name="mime_type" type="mimeText" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="ft_dname" type="genText" indexed="true" stored="true"/>
   <field name="ft_subject" type="genText" indexed="true" stored="true"/>
   <field name="action" type="keyText" indexed="true" stored="true"/>
   <field name="attribute" type="keyText" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="category" type="keyText" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="caption_writer" type="keyText" indexed="true" stored="true"/>
   <field name="doc_id" type="keyText" indexed="true" stored="true"/>
   <field name="ft_owner" type="keyText" indexed="true" stored="true"/>
   <field name="location" type="keyText" indexed="true" stored="true"/>
   <field name="special" type="keyText" indexed="true" stored="true"/>
   <field name="special_cats" type="keyText" indexed="true" stored="true"/>
   <field name="selector" type="keyText" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="scode" type="keyText" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="byline" type="sourceText" indexed="true" stored="true"/>
   <field name="credit" type="sourceText" indexed="true" stored="false"/>
   <field name="keywords" type="sourceText" indexed="true" stored="true"/>
   <field name="source" type="sourceText" indexed="true" stored="true"/>
   <field name="sg" type="lcsemi" indexed="true" stored="false" omitTermFreqAndPositions="true"/>
   <field name="aimcode" type="lowercase" indexed="true" stored="false" omitTermFreqAndPositions="true"/>
   <field name="nc_lang" type="lowercase" indexed="true" stored="false" omitTermFreqAndPositions="true"/>
   <field name="tag_id" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="collection" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="feature" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="ip" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="longdim" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="webtable" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="set_name" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
   <field name="did" type="long" indexed="true" stored="true" postingsFormat="BloomFilter"/>
   <field name="doc_size" type="long" indexed="true" stored="true"/>
   <field name="post_date" type="tlong" indexed="true" stored="true"/>
   <field name="post_hour" type="tlong" indexed="true" stored="true"/>
   <field name="set_count" type="int" indexed="false" stored="true"/>
   <field name="set_lead" type="boolean" indexed="true" stored="true" default="true"/>
   <field name="format" type="string" indexed="false" stored="true"/>
   <field name="ft_sfname" type="string" indexed="false" stored="true"/>
   <field name="text_preview" type="string" indexed="false" stored="true"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
   <field name="headline" type="keyText" indexed="true" stored="true"/>
   <field name="mood" type="keyText" indexed="true" stored="true"/>
   <field name="object" type="keyText" indexed="true" stored="true"/>
   <field name="personality" type="keyText" indexed="true" stored="true"/>
   <field name="poster" type="keyText" indexed="true" stored="true"/>
  </fields>
{code}

[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477289#comment-13477289
 ] 

Shawn Heisey commented on SOLR-3954:


You'll notice that one field has postingsFormat.  This was for another bug that 
I filed.  It's not causing any difference in the config.  I will set up my 
import again so I can illustrate the performance impact from updateLog.


 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477293#comment-13477293
 ] 

Shawn Heisey commented on SOLR-3954:


This is my most intense fieldType definition:

{code}
<fieldType name="genText" class="solr.TextField" sortMissingLast="true"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
      replacement="$2"
      allowempty="false"
    />
    <filter class="solr.WordDelimiterFilterFactory"
      splitOnCaseChange="1"
      splitOnNumerics="1"
      stemEnglishPossessive="1"
      generateWordParts="1"
      generateNumberParts="1"
      catenateWords="1"
      catenateNumbers="1"
      catenateAll="0"
      preserveOriginal="1"
    />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
      replacement="$2"
      allowempty="false"
    />
    <filter class="solr.WordDelimiterFilterFactory"
      splitOnCaseChange="1"
      splitOnNumerics="1"
      stemEnglishPossessive="1"
      generateWordParts="1"
      generateNumberParts="1"
      catenateWords="0"
      catenateNumbers="0"
      catenateAll="0"
      preserveOriginal="1"
    />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
</fieldType>
{code}


 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477326#comment-13477326
 ] 

Shawn Heisey commented on SOLR-3954:


A completed import with updateLog turned off:

{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">12947488</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-10-16 07:46:01</str>
  <str name="">Indexing completed. Added/Updated: 12947488 documents. Deleted 0 documents.</str>
  <str name="Committed">2012-10-16 11:17:48</str>
  <str name="Total Documents Processed">12947488</str>
  <str name="Time taken">3:31:47.508</str>
</lst>
<str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>
{code}


 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477333#comment-13477333
 ] 

David Smiley commented on SOLR-3954:


FWIW I've seen the updateLog grow to huge sizes for my bulk import.  I commit 
at the end (of course); no soft commits or auto commits in between.  The 
updateLog is a hindrance during bulk imports.

 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3957) Remove response WARNING of This response format is experimental

2012-10-16 Thread Erik Hatcher (JIRA)
Erik Hatcher created SOLR-3957:
--

 Summary: Remove response WARNING of This response format is 
experimental
 Key: SOLR-3957
 URL: https://issues.apache.org/jira/browse/SOLR-3957
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.0
Reporter: Erik Hatcher
Priority: Minor
 Fix For: 5.0


Remove all the useless (which I daresay is all of them) response WARNINGs 
stating "This response format is experimental."

At this point, all of these are more than just experimental, and even if some 
things are still subject to change, the changes can in most cases be made in a 
compatible manner anyway.

Less noise.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument

2012-10-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477418#comment-13477418
 ] 

Michael McCandless commented on LUCENE-4472:


I think forced merges or forcing reclaiming of deletions, both invoked
by explicit app request, are very different use cases than the
natural merging Lucene does during indexing (not directly invoked by
the app, but as a side effect of other API calls).

So I think it makes sense that the MP has separate methods to handle
these very different use cases.

I don't think we should use a single-param / single-method XXXContext
approach to bypass back compat.  We already tried this with
ScorerContext but backed it out because of the loss of type
safety... for expert APIs like this one I think it's actually good to
require apps to revisit their impls on upgrading, if we've added
parameters: it gives them a chance to improve their impls.  Plus this
API is already marked @experimental...

Also, a single method taking a single XXXContext obj means that method
will have to have a switch or a bunch of if statements to handle what
are in fact very different use cases, which is rather awkward.

Still, separately I would love to make forceMerge/Deletes un-public so
you have to work harder to invoke them (eg maybe you invoke the merge
policy directly and then call IW.maybeMerge ... or something).  We can
do that separately...


 Add setting that prevents merging on updateDocument
 ---

 Key: LUCENE-4472
 URL: https://issues.apache.org/jira/browse/LUCENE-4472
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4472.patch, LUCENE-4472.patch


 Currently we always call maybeMerge if a segment was flushed after 
 updateDocument. Some apps and in particular ElasticSearch uses some hacky 
 workarounds to disable that ie for merge throttling. It should be easier to 
 enable this kind of behavior. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2012-10-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477426#comment-13477426
 ] 

Jan Høydahl commented on SOLR-3881:
---

Probably built-in truncation is enough to avoid the OOMs, and we could refactor 
the multi-string append if necessary later.

 frequent OOM in LanguageIdentifierUpdateProcessor
 -

 Key: SOLR-3881
 URL: https://issues.apache.org/jira/browse/SOLR-3881
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
Reporter: Rob Tulloh

 We are seeing frequent failures from Solr causing it to OOM. Here is the 
 stack trace we observe when this happens:
 {noformat}
 Caused by: java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2882)
 at 
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
 at 
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
 at java.lang.StringBuffer.append(StringBuffer.java:224)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
 at 
 org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
 at 
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4485) CheckIndex's term stats should not include deleted docs

2012-10-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4485.


   Resolution: Fixed
Fix Version/s: 5.0
   4.1

 CheckIndex's term stats should not include deleted docs
 ---

 Key: LUCENE-4485
 URL: https://issues.apache.org/jira/browse/LUCENE-4485
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4485.patch


 I was looking at the CheckIndex output on an index that has deletions, eg:
 {noformat}
   4 of 30: name=_90 docCount=588408
 codec=Lucene41
 compound=false
 numFiles=14
 size (MB)=265.318
 diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10, 
 source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64, 
 mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
 has deletions [delGen=1]
 test: open reader.OK [39351 deleted docs]
 test: fields..OK [8 fields]
 test: field norms.OK [2 fields]
 test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 
 65597188 tokens]
 test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238 
 terms/docs pairs; 70293065 tokens]
 test: stored fields...OK [1647171 total field count; avg 3 fields per 
 doc]
 test: term vectorsOK [0 total vector count; avg 0 term/freq 
 vector fields per doc]
 test: docvalues...OK [0 total doc count; 1 docvalues fields]
 {noformat}
 If you compare the {{test: terms, freq, prox}} (includes deletions) and the 
 next line (doesn't include deletions), it's confusing because only the 3rd 
 number (tokens) reflects deletions.  I think the first two numbers should 
 also reflect deletions?  This way an app could get a sense of how much 
 deadweight is in the index due to un-reclaimed deletions...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477434#comment-13477434
 ] 

Michael McCandless commented on LUCENE-4484:


bq. (Solr defaults to NRTCachingDir)

Maybe it shouldn't?

Or ... does it also default to NRT searching, like ElasticSearch (I think), 
i.e. frequently opening a new searcher?  In which case it's a good default I 
think...

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3958) Solr should log a warning when old healthcheck method configured

2012-10-16 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-3958:
--

 Summary: Solr should log a warning when old healthcheck method 
configured
 Key: SOLR-3958
 URL: https://issues.apache.org/jira/browse/SOLR-3958
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 4.1


The old (3.x and earlier) way of handling a health check (with enable/disable 
functionality) has changed in Solr 4.0.  If you are upgrading and still have 
the old method in the admin section, I believe that Solr should put a warning 
in the log.  Currently it is just ignored.  I do not believe it should keep 
Solr from starting, just log a warning.
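
For reference, the change is roughly the following; both snippets are sketches
from memory rather than exact configs.

{code}
<!-- 3.x and earlier: the enable/disable healthcheck lived in the admin section -->
<admin>
  <defaultQuery>*:*</defaultQuery>
  <healthcheck type="file">server-enabled</healthcheck>
</admin>
{code}

{code}
<!-- 4.0: the ping handler owns the enable/disable file instead -->
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <str name="healthcheckFile">server-enabled</str>
</requestHandler>
{code}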


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477445#comment-13477445
 ] 

Shawn Heisey commented on SOLR-3954:


Here's a direct comparison on the same hardware.  It might be important to know 
that when my import gets kicked off, there are actually four imports running.  
One of them is small -- during the second test (updateLog off), it imported 
687765 rows in 10 minutes and 08 seconds.  I did not check how long it took 
during the first test.  The other three imports are all nearly 13 million 
records each.

A du on the completed index directory with 12.9 million records shows 23520900 
KB.

I ran the first test and grabbed stats after an hour.  Then I killed Solr, 
commented out updateLog, started it up again, kicked off the full-import, and 
again grabbed stats after an hour.  Comparing the two shows that it is about 
twice as fast with updateLog turned off.

With updateLog turned on:

{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</lst>
<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
  <str name="Time Elapsed">1:0:1.762</str>
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">2052096</str>
  <str name="Total Documents Processed">2052095</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-10-16 14:59:01</str>
</lst>
<str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>
{code}

With updateLog turned off:

{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</lst>
<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
  <str name="Time Elapsed">1:0:0.434</str>
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">4167525</str>
  <str name="Total Documents Processed">4167524</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-10-16 16:05:01</str>
</lst>
<str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>
{code}


 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog

2012-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477462#comment-13477462
 ] 

Shawn Heisey commented on SOLR-3954:


bq. In any case, I don't think we would add an option to skip the update log - 
you can remove it if the performance is unacceptable.

When I revamp my SolrJ application, I plan to use soft commit on a very short 
interval (maybe 10 seconds) but only do a hard commit every five minutes, 
possibly even less often.
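
Something like the following is what I have in mind; this is only a sketch,
with the intervals taken from the numbers above.

{code}
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog/>
  <!-- soft commit every 10 seconds for visibility -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
  <!-- hard commit every five minutes for durability, without opening a searcher -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
{code}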

If I understand the updateLog functionality right, and I don't claim that I do, 
it would mean that my SolrJ code would not need to keep separate track of which 
updates succeeded with soft commit and which ones succeeded with hard commit.  
If the server went down four minutes and 55 seconds after the last hard commit, 
I would have a reasonable expectation that when it came back up, all those soft 
commits would get properly applied to my index.

Assuming I have a proper understanding above, I want the updateLog for my 
incremental updates.  It makes the bulk import take at least twice as long, and 
I do not need it there because if that fails, I will just start it over.  If I 
am going to benefit from updateLog, I need to be able to turn it off for bulk 
indexing.

Is there a way to create a second updateHandler that does not have updateLog 
enabled and tell DIH to use that handler?


 Option to have updateHandler and DIH skip updateLog
 ---

 Key: SOLR-3954
 URL: https://issues.apache.org/jira/browse/SOLR-3954
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


 The updateLog feature makes updates take longer, likely because of the I/O 
 time required to write the additional information to disk.  It may take as 
 much as three times as long for the indexing portion of the process.  I'm not 
 sure whether it affects the time to commit, but I would imagine that the 
 difference there is small or zero.  When doing incremental updates/deletes on 
 an existing index, the time lag is probably very small and unimportant.
 When doing a full reindex (which may happen via DIH), especially if this is 
 done in a build core that is then swapped with a live core, this performance 
 hit is unacceptable.  It seems to make the import take about three times as 
 long.
 An option to have an update skip the updateLog would be very useful for these 
 situations.  It should have a method in SolrJ and be exposed in DIH as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3959) csv output is invalid csv if there is a currency field

2012-10-16 Thread Robert Muir (JIRA)
Robert Muir created SOLR-3959:
-

 Summary: csv output is invalid csv if there is a currency field
 Key: SOLR-3959
 URL: https://issues.apache.org/jira/browse/SOLR-3959
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir


Like in the example:

http://localhost:8983/solr/collection1/select?q=*%3A*&fl=price_c&wt=csv

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 10035 - Failure!

2012-10-16 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/10035/

1 tests failed.
REGRESSION:  
org.apache.lucene.search.TestTimeLimitingCollector.testSearchMultiThreaded

Error Message:
Captured an uncaught exception in thread: Thread[id=255, name=Thread-198, 
state=RUNNABLE, group=TGRP-TestTimeLimitingCollector]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=255, name=Thread-198, state=RUNNABLE, 
group=TGRP-TestTimeLimitingCollector]
Caused by: java.lang.OutOfMemoryError: Java heap space
at __randomizedtesting.SeedInfo.seed([41D53676D3187506]:0)
at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.init(BlockTreeTermsReader.java:2266)
at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.init(BlockTreeTermsReader.java:1275)
at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.iterator(BlockTreeTermsReader.java:525)
at 
org.apache.lucene.index.FilterAtomicReader$FilterTerms.iterator(FilterAtomicReader.java:86)
at 
org.apache.lucene.index.AssertingAtomicReader$AssertingTerms.iterator(AssertingAtomicReader.java:99)
at org.apache.lucene.index.MultiTerms.iterator(MultiTerms.java:103)
at org.apache.lucene.index.TermContext.build(TermContext.java:94)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:167)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:186)
at 
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:400)
at 
org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:648)
at 
org.apache.lucene.search.AssertingIndexSearcher.createNormalizedWeight(AssertingIndexSearcher.java:60)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:293)
at 
org.apache.lucene.search.TestTimeLimitingCollector.search(TestTimeLimitingCollector.java:124)
at 
org.apache.lucene.search.TestTimeLimitingCollector.doTestSearch(TestTimeLimitingCollector.java:139)
at 
org.apache.lucene.search.TestTimeLimitingCollector.access$200(TestTimeLimitingCollector.java:42)
at 
org.apache.lucene.search.TestTimeLimitingCollector$1.run(TestTimeLimitingCollector.java:292)




Build Log:
[...truncated 1072 lines...]
[junit4:junit4] Suite: org.apache.lucene.search.TestTimeLimitingCollector
[junit4:junit4]   2 oct 16, 2012 5:21:37 P.M. 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
[junit4:junit4]   2 Advertencia: Uncaught exception in thread: 
Thread[Thread-198,5,TGRP-TestTimeLimitingCollector]
[junit4:junit4]   2 java.lang.OutOfMemoryError: Java heap space
[junit4:junit4]   2at 
__randomizedtesting.SeedInfo.seed([41D53676D3187506]:0)
[junit4:junit4]   2at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.init(BlockTreeTermsReader.java:2266)
[junit4:junit4]   2at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.init(BlockTreeTermsReader.java:1275)
[junit4:junit4]   2at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.iterator(BlockTreeTermsReader.java:525)
[junit4:junit4]   2at 
org.apache.lucene.index.FilterAtomicReader$FilterTerms.iterator(FilterAtomicReader.java:86)
[junit4:junit4]   2at 
org.apache.lucene.index.AssertingAtomicReader$AssertingTerms.iterator(AssertingAtomicReader.java:99)
[junit4:junit4]   2at 
org.apache.lucene.index.MultiTerms.iterator(MultiTerms.java:103)
[junit4:junit4]   2at 
org.apache.lucene.index.TermContext.build(TermContext.java:94)
[junit4:junit4]   2at 
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:167)
[junit4:junit4]   2at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:186)
[junit4:junit4]   2at 
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:400)
[junit4:junit4]   2at 
org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:648)
[junit4:junit4]   2at 
org.apache.lucene.search.AssertingIndexSearcher.createNormalizedWeight(AssertingIndexSearcher.java:60)
[junit4:junit4]   2at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:293)
[junit4:junit4]   2at 
org.apache.lucene.search.TestTimeLimitingCollector.search(TestTimeLimitingCollector.java:124)
[junit4:junit4]   2at 
org.apache.lucene.search.TestTimeLimitingCollector.doTestSearch(TestTimeLimitingCollector.java:139)
[junit4:junit4]   2at 
org.apache.lucene.search.TestTimeLimitingCollector.access$200(TestTimeLimitingCollector.java:42)
[junit4:junit4]   2at 
org.apache.lucene.search.TestTimeLimitingCollector$1.run(TestTimeLimitingCollector.java:292)
[junit4:junit4]   2 
[junit4:junit4]   2 oct 16, 2012 5:21:43 P.M. 

[jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files

2012-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477536#comment-13477536
 ] 

Mark Miller commented on LUCENE-4484:
-

Right, we have changed the defaults to favor NRT.

You can always tell someone to switch it if they run into a problem, but of 
course it would be nicer if NRTCachingDir were more versatile and could deal 
well with term vectors / stored fields.

I agree it's more of a niche situation (it's not likely a common problem), but 
it would be my preference.

 NRTCachingDir can't handle large files
 --

 Key: LUCENE-4484
 URL: https://issues.apache.org/jira/browse/LUCENE-4484
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless

 I dug into this OOME, which easily repros for me on rev 1398268:
 {noformat}
 ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
 -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
 -Dtests.slow=true 
 -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
 -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
 -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
 {noformat}
 The problem is the test got NRTCachingDir ... which cannot handle large files 
 because it decides up front (when createOutput is called) whether the file 
 will be in RAMDir vs wrapped dir ... so if that file turns out to be immense 
 (which this test does since stored fields files can grow arbitrarily huge w/o 
 any flush happening) then it takes unbounded RAM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Welcome Alan Woodward as Lucene/Solr committer

2012-10-16 Thread Robert Muir
I'm pleased to announce that the Lucene PMC has voted Alan as a
Lucene/Solr committer.

Alan has been contributing patches on various tricky stuff: positions
iterators, span queries, highlighters, codecs, and so on.

Alan: it's tradition that you introduce yourself with your background.

I think your account is fully working and you should be able to add
yourself to the "who we are" page on the website as well.

Congratulations!

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4479) TokenSources.getTokenStream() doesn't return correctly for termvectors with positions but no offsets

2012-10-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-4479:
---

Assignee: Alan Woodward

 TokenSources.getTokenStream() doesn't return correctly for termvectors with 
 positions but no offsets
 

 Key: LUCENE-4479
 URL: https://issues.apache.org/jira/browse/LUCENE-4479
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: LUCENE-4479.patch, LUCENE-4479.patch


 The javadocs for TokenSources.getTokenStream(Terms, boolean) state:
 "Low level api. Returns a token stream or null if no offset info available
 in index. This can be used to feed the highlighter with a pre-parsed token
 stream."
 However, if the Terms instance passed in has positions but no offsets stored, 
 a TokenStream is incorrectly returned, rather than null.
 This has the effect of incorrectly highlighting fields with term vectors and 
 positions, but no offsets.  All highlighting markup is prepended to the 
 beginning of the field.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org