[jira] [Created] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect - Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.6, 4.0

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote}

While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods. It's not perfect, there's some room for improvement I'm sure. Here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
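The primitive-array helpers Shai proposes at the end of the report could be sketched as follows. This is an illustrative sketch, not Lucene's code: the constants are assumptions for one particular layout (a 64-bit JVM without compressed oops), real values vary by JVM and flags, and only the corrected array-header formula (object header + length int, without the object-ref component) is taken from the issue itself.

```java
// Hypothetical sketch of a sizeOf(int[]) helper using the corrected
// NUM_BYTES_ARRAY_HEADER formula from LUCENE-3867. Constants are assumed
// values for a 64-bit JVM without compressed oops.
public class ArraySizeEstimate {
    static final int NUM_BYTES_OBJECT_HEADER = 16; // assumed: 8-byte mark word + 8-byte class pointer
    static final int NUM_BYTES_INT = 4;
    // Corrected formula: object header plus the 4-byte array length, with no
    // NUM_BYTES_OBJECT_REF term.
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    /** Approximate heap size of an int[], rounded up to 8-byte object alignment. */
    public static long sizeOf(int[] arr) {
        long size = NUM_BYTES_ARRAY_HEADER + (long) arr.length * NUM_BYTES_INT;
        return (size + 7) & ~7L; // JVM objects are 8-byte aligned
    }

    public static void main(String[] args) {
        // header 20 + 10*4 data = 60, aligned up to 64
        System.out.println(sizeOf(new int[10])); // -> 64
    }
}
```

The same pattern would cover byte[]/long[]/double[] by swapping the per-element size; String[] would additionally charge NUM_BYTES_OBJECT_REF per slot plus the referenced Strings.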
[jira] [Closed] (SOLR-3242) QueryElevateComponent should support blacklist and de-elevate
[ https://issues.apache.org/jira/browse/SOLR-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog closed SOLR-3242. --- Resolution: Invalid 'blacklist' is already a feature, via the 'exclude' syntax. Pushing a result to the bottom is, I suppose, interesting, but I will not ask for it.

QueryElevateComponent should support blacklist and de-elevate - Key: SOLR-3242 URL: https://issues.apache.org/jira/browse/SOLR-3242 Project: Solr Issue Type: New Feature Components: SearchComponents - other Reporter: Lance Norskog Priority: Minor

The QueryElevateComponent should allow you to ban some results, and push down other results.
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229046#comment-13229046 ] Bill Bell commented on SOLR-3230: - Yonik... I am not that familiar with this code. I do notice 2 methods in LatLonType.java. Is this the right place?

public Query getFieldQuery(QParser parser, SchemaField field, String externalVal) {
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2, boolean minInclusive, boolean maxInclusive) {

I did not see how these 2 functions are called. In class SpatialDistanceQuery I did not see where you said we are using range or fc...? Maybe example code?

Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch

This changes {!geofilt} to use a bounding box and then does an accurate filter. See attachment.
storing lucene index in hbase (Hbase Directory implementation) as GSoC
Hi, I know there are a lot of attempts to make Lucene searches distributed, but I haven't seen one that tries to implement a Lucene Directory in HBase/Hadoop, except one discussion in this article [1]. I've worked with HBase and I believe this is a good approach to combining the two. The thing with this concept is that you could very easily build a distributed search by running multiple search slaves that could each search a part of the index and then aggregate the results. If you dig deep enough, you could make those searches take advantage of data locality (run searches on the node/region server that has your index data), and then you really are in business. A combined HBase/Hadoop solution is also possible: store some data in HBase and bigger parts directly in Hadoop inside a file structure, to overcome HDFS small-file issues. This could allow HBase queries to perform better but will complicate the design a bit. I'm interested in hearing your opinions on this, and I also wish to propose this as a GSoC idea that I'm interested in implementing.

[1] http://www.infoq.com/articles/LuceneHbase

-- Ioan Eugen Stan http://ieugen.blogspot.com/
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229061#comment-13229061 ] Dawid Weiss commented on LUCENE-3867: - One can provide exact object allocation size (including alignments) by running with an agent (acquired from Instrumentation). This is shown here, for example: http://www.javaspecialists.eu/archive/Issue142.html I don't think it makes sense to be perfect here because there is a tradeoff between being accurate and being fast. One thing to possibly improve would be to handle reference size (4 vs. 8 bytes; in particular with compact references while running under 64-bit JVMs).
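The agent technique Dawid references (from the Java Specialists article) could look roughly like this. A hedged sketch, not the article's actual code: the class and jar names are hypothetical, and Instrumentation is only supplied by the JVM when the class is registered as a Premain-Class in a jar manifest and the VM is started with -javaagent; otherwise the sketch falls back to -1.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent sketch: the JVM calls premain() before main() when this
// class is named as Premain-Class in a jar manifest and the VM is started
// with -javaagent:sizeof.jar. getObjectSize() then reports the VM's exact
// (alignment-included) shallow allocation size.
public class SizeOfAgent {
    private static volatile Instrumentation inst;

    // Invoked by the JVM, not by user code.
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        inst = instrumentation;
    }

    /** Exact shallow size of an object as reported by the JVM, or -1 if the
     *  agent was not installed. */
    public static long sizeOf(Object o) {
        return inst != null ? inst.getObjectSize(o) : -1L;
    }

    public static void main(String[] args) {
        // Prints -1 without -javaagent; with the agent, the exact size.
        System.out.println(sizeOf(new Object()));
    }
}
```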
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229062#comment-13229062 ] Dawid Weiss commented on LUCENE-3867: - Oh, one thing that I had in the back of my mind was to run a side-by-side comparison of Lucene's memory estimator and exact memory occupation via agent and see what the real difference is (on various VMs and with compact vs. non-compact refs). This would be a 2-hour effort I guess, fun, but I don't have the time for it.
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12736 - Failure
Now that I think of it, the changes to LuceneTestCase may be blamed for some of these errors, because the uncaught exceptions rule is above the routine where lingering threads are interrupted. It was the opposite order before (understandably). The good news is that I don't see any recent errors on the LUCENE-3808 branch (merged with current trunk), where threads are handled internally by the runner. I'll see what I can do about the issue above in trunk.

Dawid

On Wed, Mar 14, 2012 at 5:50 AM, Apache Jenkins Server jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12736/

1 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.TestGroupingSearch

Error Message: Uncaught exception by thread: Thread[TimeLimitedCollector timer thread,5,]

Stack Trace:
org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[TimeLimitedCollector timer thread,5,]
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:60)
    at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
    at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
Caused by: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
    at org.apache.lucene.search.TimeLimitingCollector$TimerThread.run(TimeLimitingCollector.java:268)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.lucene.search.TimeLimitingCollector$TimerThread.run(TimeLimitingCollector.java:266)

Build Log (for compile errors): [...truncated 9456 lines...]

attachment: LUCENE-3808.png
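The failure mode discussed above can be reproduced in miniature with plain Java (this is a stand-alone sketch, not Lucene test-framework code): interrupting a lingering thread that is asleep makes it die with an "uncaught" unchecked exception, which an uncaught-exception handler (or, in Lucene's case, the uncaught-exceptions rule) then observes as a failure.

```java
// Minimal reproduction of the pattern behind the TimeLimitedCollector
// timer-thread failure: tearDown-style interruption of a sleeping background
// thread surfaces as an uncaught RuntimeException.
public class InterruptUncaught {
    /** Starts a sleeping thread, interrupts it, and returns whatever its
     *  uncaught-exception handler observed. */
    public static Throwable interruptSleeper() {
        final Throwable[] caught = new Throwable[1];
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Rethrown unchecked, analogous to Lucene wrapping it in
                // ThreadInterruptedException.
                throw new RuntimeException("sleep interrupted", e);
            }
        });
        sleeper.setUncaughtExceptionHandler((t, e) -> caught[0] = e);
        sleeper.start();
        sleeper.interrupt(); // what tearDown does to lingering threads
        try {
            sleeper.join(); // join() makes the handler's write visible here
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        }
        return caught[0];
    }

    public static void main(String[] args) {
        Throwable t = interruptSleeper();
        System.out.println(t.getClass().getSimpleName() + " caused by " + t.getCause());
    }
}
```

Note that the reproduction works even if interrupt() lands before the thread reaches sleep(): Thread.sleep throws InterruptedException immediately when the interrupt status is already set.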
[jira] [Created] (LUCENE-3868) Thread interruptions shouldn't cause unhandled thread errors (or should they?).
Thread interruptions shouldn't cause unhandled thread errors (or should they?). --- Key: LUCENE-3868 URL: https://issues.apache.org/jira/browse/LUCENE-3868 Project: Lucene - Java Issue Type: Bug Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.6, 4.0 This is a result of pulling uncaught exception catching to a rule above interrupt in internalTearDown(); check how it was before and restore previous behavior?
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229066#comment-13229066 ] Uwe Schindler commented on LUCENE-3867: --- I was talking with Shai already about the OBJECT_REF size of 8, in RamUsageEstimator it is:

{code:java}
public final static int NUM_BYTES_OBJECT_REF = Constants.JRE_IS_64BIT ? 8 : 4;
{code}

...which does not take the CompressedOops into account. Can we detect those oops, so we can change the above ternary to return 4 on newer JVMs with compressed oops enabled?
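One possible way to answer Uwe's question from inside the running VM is HotSpot's diagnostic MXBean, which exposes the UseCompressedOops VM option. A hedged sketch only: it is HotSpot-specific (com.sun.management), may throw on other VMs or if the option is absent, and this fallback-to-false behavior is my own assumption, not anything in RamUsageEstimator.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Sketch: query HotSpot's diagnostic bean for UseCompressedOops so a
// NUM_BYTES_OBJECT_REF-style constant could be narrowed to 4 on 64-bit VMs
// running with compressed oops.
public class CompressedOopsCheck {
    /** True if the (HotSpot) VM reports compressed oops enabled; false if the
     *  diagnostic bean or the option is unavailable. */
    public static boolean compressedOopsEnabled() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return Boolean.parseBoolean(bean.getVMOption("UseCompressedOops").getValue());
        } catch (RuntimeException e) {
            return false; // non-HotSpot VM, or option not defined
        }
    }

    public static void main(String[] args) {
        // The ternary Uwe describes, refined with the compressed-oops check.
        int numBytesObjectRef = compressedOopsEnabled() ? 4 : 8;
        System.out.println("estimated reference size: " + numBytesObjectRef);
    }
}
```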
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229067#comment-13229067 ] Dawid Weiss commented on LUCENE-3867: - If you're running with an agent then it will tell you how many bytes a reference is, so this would fix the issue. I don't think you can test this from within the Java VM itself, but this is an interesting question. What you could do is spawn a child VM process with identical arguments (and an agent) and check it there, but this is quite awful... I'll ask on the hotspot mailing list, maybe they know how to do this.
[jira] [Created] (SOLR-3243) eDismax and non-fielded range query
eDismax and non-fielded range query --- Key: SOLR-3243 URL: https://issues.apache.org/jira/browse/SOLR-3243 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.5, 3.4, 3.3, 3.2, 3.1 Reporter: Jan Høydahl Priority: Critical Fix For: 3.6, 4.0

Reported by Bill Bell in SOLR-3085: If you enter a non-fielded open-ended range in the search box, like [* TO *], eDismax will expand it to all fields:

{noformat}
+DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0))
{noformat}

This does not make sense, and a side effect is that range queries on strings are very expensive (open-ended ones even more so), and you can totally crash the search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) a few times...
[jira] [Commented] (SOLR-3085) Fix the dismax/edismax stopwords mm issue
[ https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229083#comment-13229083 ] Jan Høydahl commented on SOLR-3085: --- @Bill, since this is a bit off topic, I moved your loophole to SOLR-3243. It is certainly something that is dangerous, and I cannot see a single use case for allowing an un-fielded range! Good catch.

Fix the dismax/edismax stopwords mm issue - Key: SOLR-3085 URL: https://issues.apache.org/jira/browse/SOLR-3085 Project: Solr Issue Type: Bug Components: search Reporter: Jan Høydahl Labels: MinimumShouldMatch, dismax, stopwords Fix For: 3.6, 4.0

As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here http://search-lucene.com/m/Yne042qEyCq1 and here http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if not all fields used in QF have exactly the same stopword lists. The typical solution is to not use stopwords, to harmonize stopword lists across all fields in your QF, or to relax the MM to a lower percentage. Sometimes these are not acceptable workarounds, and we should find a better solution.
JIRA components update
Hi, I've not found a suitable JIRA component to put DisMax issues in, so I added a new component called query parsers. There may be other components missing as well, such as SolrCloud? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
[jira] [Updated] (SOLR-3086) eDismax: Allow controlling what query features to support
[ https://issues.apache.org/jira/browse/SOLR-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3086: -- Component/s: query parsers eDismax: Allow controlling what query features to support - Key: SOLR-3086 URL: https://issues.apache.org/jira/browse/SOLR-3086 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Jan Høydahl Fix For: 4.0 As per request from Hoss in SOLR-2368, this issue will add configuration parameters to eDisMax to give user control over what query syntax will be allowed and disallowed. This will allow us to effectively lobotomize eDisMax to behave the same way as the old DisMax and accept all kinds of weird input and correctly escape it to match literally, even if it's valid syntax for a query feature.
[JENKINS] Solr-trunk - Build # 1793 - Still Failing
Build: https://builds.apache.org/job/Solr-trunk/1793/

1 tests failed.

FAILED: org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message: Uncaught exception by thread: Thread[Thread-661,5,]

Stack Trace:
org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[Thread-661,5,]
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:60)
    at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:618)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
    at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:20)
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:51)
    at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
    at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
Caused by: java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: http://localhost:40647/solr
    at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:374)
Caused by: org.apache.solr.client.solrj.SolrServerException: http://localhost:40647/solr
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:496)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:312)
    at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:369)
Caused by: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 100 ms
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:155)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:426)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
    at java.net.Socket.connect(Socket.java:546)
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)

Build Log (for compile errors): [...truncated 9671 lines...]
[jira] [Commented] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229094#comment-13229094 ] Aliaksandr Zhuhrou commented on SOLR-3162: -- Sure, I can do this. Continue work on new admin UI - Key: SOLR-3162 URL: https://issues.apache.org/jira/browse/SOLR-3162 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Affects Versions: 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 4.0 Attachments: SOLR-3162-index.png, SOLR-3162-schema-browser.png, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch There have been more improvements to how the new UI works, but the current open bugs are getting hard to keep straight. This is the new catch-all JIRA for continued improvements.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229093#comment-13229093 ] Shai Erera commented on LUCENE-3867: bq. I don't think it makes sense to be perfect here because there is a tradeoff between being accurate and being fast. I agree. We should be fast, and as accurate as we can get while preserving speed. I will fix the constant's value, as it is wrong. The helper methods are just that - helpers. Someone can use other techniques to compute the size of objects. Will post a patch shortly.
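The corrected array accounting can be sketched as below. This is a hedged illustration only, not the committed patch: the class and constant values are assumptions for a 64-bit HotSpot JVM with compressed oops, and the point is simply that an array costs its header (object header plus a four-byte length, with no extra object reference) plus the element data, rounded up to the JVM's object alignment.

```java
// Illustrative sketch only (not Lucene's RamUsageEstimator): size of an int[]
// is the array header plus the element data, rounded up to the object
// alignment boundary. Constants assume 64-bit HotSpot with compressed oops.
public class ArraySizeSketch {
    static final int NUM_BYTES_ARRAY_HEADER = 16;     // 12-byte header padded to 16
    static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;  // default HotSpot alignment
    static final int NUM_BYTES_INT = 4;

    static long sizeOf(int[] arr) {
        long size = NUM_BYTES_ARRAY_HEADER + (long) arr.length * NUM_BYTES_INT;
        long rem = size % NUM_BYTES_OBJECT_ALIGNMENT;
        // round up to the next alignment boundary if not already aligned
        return rem == 0 ? size : size + NUM_BYTES_OBJECT_ALIGNMENT - rem;
    }
}
```

Under these assumed constants, a three-element int[] costs 16 + 12 = 28 bytes, padded to 32.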
[jira] [Created] (SOLR-3244) New Admin UI doesn't work on tomcat
New Admin UI doesn't work on tomcat --- Key: SOLR-3244 URL: https://issues.apache.org/jira/browse/SOLR-3244 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Reporter: Aliaksandr Zhuhrou I am currently unable to open the admin interface when using a war deployment under a Tomcat server. The stack trace: SEVERE: Servlet.service() for servlet [LoadAdminUI] in context with path [/solr] threw exception java.lang.NullPointerException at java.io.File.<init>(File.java:251) at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:50) at javax.servlet.http.HttpServlet.service(HttpServlet.java:621) at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:292) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Tomcat version: Apache Tomcat/7.0.23 Java version: jdk1.7.0_02 I did some debugging and found that the problem is that resolving the resource path is delegated to org.apache.naming.resources.WARDirContext, which simply returns null for any input: /** * Return the real path for a given virtual path, if possible; otherwise * return <code>null</code>. * * @param path The path to the desired resource */ @Override protected String doGetRealPath(String path) { return null; } Need to check the specification, because it may actually be a Tomcat bug.
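The failure mode above can be reproduced in isolation: getRealPath() may legally return null when a container serves the WAR unexpanded, and passing that null to the File constructor throws exactly the NullPointerException seen in the stack trace. A hedged, self-contained sketch (the helper names here are hypothetical, not Solr code; only the null-path behavior of java.io.File is real):

```java
import java.io.File;

// Hypothetical illustration of the bug mechanism: File's constructor rejects
// a null path name, which is what happens when getRealPath() returns null
// for an unexpanded WAR (as Tomcat's WARDirContext does).
public class RealPathNpeDemo {
    /** Mimics ServletContext.getRealPath(): null when the container cannot map the path to a file. */
    static String getRealPath(String path, boolean warExpanded) {
        return warExpanded ? "/webapps/solr" + path : null;
    }

    /** Returns true if constructing a File from the resolved path throws NPE. */
    static boolean lookupThrowsNpe(String path, boolean warExpanded) {
        try {
            new File(getRealPath(path, warExpanded));
            return false;
        } catch (NullPointerException e) {
            return true; // same NPE as java.io.File.<init> in the stack trace
        }
    }
}
```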
[jira] [Updated] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksandr Zhuhrou updated SOLR-3244: - Description: updated to add: We may try use the getResourceAsStream(java.lang.String path) method, which should work even for a war.
[jira] [Issue Comment Edited] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229109#comment-13229109 ] Uwe Schindler edited comment on SOLR-3244 at 3/14/12 10:01 AM: --- It seems that the file is missing in the WAR file? {code:java} File f = new File(getServletContext().getRealPath("admin.html")); if (f.exists()) { // This attribute is set by the SolrDispatchFilter CoreContainer cores = (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer"); String html = IOUtils.toString(new FileInputStream(f), "UTF-8"); {code} In general I am a little bit sceptical about the whole code. In my opinion, using File and getRealPath is not the best idea. The simplest, filesystem-independent way to get the file is the following (there may be a servlet container that does not extract WAR files at all and simply returns the resource from *inside* the war file): {code:java} InputStream in = getServletContext().getResourceAsStream("/admin.html"); if (in != null) try { // This attribute is set by the SolrDispatchFilter CoreContainer cores = (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer"); String html = IOUtils.toString(in, "UTF-8"); ... } finally { IOUtils.closeSafely(in); } {code} Please note the "/" in the path; according to the JavaDocs of getResource: The path must begin with a "/" and is interpreted as relative to the current context root, or relative to the /META-INF/resources directory of a JAR file inside the web application's /WEB-INF/lib directory. This method will first search the document root of the web application for the requested resource, before searching any of the JAR files inside /WEB-INF/lib. The order in which the JAR files inside /WEB-INF/lib are searched is undefined. This also applies to getRealPath, so I think Tomcat is more picky about that than Jetty.
[jira] [Updated] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-3244: Attachment: SOLR-3244.patch Hi, can you apply the attached patch and rebuild the WAR? This fixes this bug and also another security issue: - The inlined paths are not correctly escaped according to JavaScript rules; this can lead to security problems if you deploy to a path with strange characters in it...
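For context on the escaping issue the patch comment mentions: embedding a context path into inline JavaScript requires escaping any characters that could terminate the string literal or the surrounding script tag. A minimal hedged sketch of such an escaper follows; this is illustrative only and is not the code in SOLR-3244.patch.

```java
// Hedged sketch, not the actual patch: escape a string for safe embedding
// inside a single- or double-quoted inline JavaScript string literal.
public class JsEscapeSketch {
    static String escapeJsString(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\");   break; // backslash first
                case '\'': sb.append("\\'");    break; // string delimiter
                case '"':  sb.append("\\\"");   break; // string delimiter
                case '<':  sb.append("\\u003C"); break; // avoid closing </script>
                case '>':  sb.append("\\u003E"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```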
[jira] [Assigned] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-3244: --- Assignee: Uwe Schindler
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229115#comment-13229115 ] Uwe Schindler commented on SOLR-3244: - It would be nice if you could test this, I have no Tomcat available...
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229125#comment-13229125 ] Michael McCandless commented on LUCENE-3867: Nice catch on the overcounting of the array's RAM usage! And +1 for additional sizeOf(...) methods.
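A sizeOf(String[]) along the proposed lines might look roughly like the sketch below. Hedged: the per-String formula follows the sizeOf(String) snippet quoted in the issue, the constants are assumptions for a 64-bit JVM with compressed oops, and none of this is committed code.

```java
// Hedged sketch following the sizeOf(String) formula from the issue: a
// String[] costs its own array header plus one object reference per slot,
// plus the approximate size of each String it points to.
public class StringArraySizeSketch {
    // Assumed constants (64-bit JVM, compressed oops); not authoritative.
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_ARRAY_HEADER = 16;
    static final int NUM_BYTES_OBJECT_REF = 4;
    static final int NUM_BYTES_INT = 4;

    static int sizeOf(String str) {
        return 2 * str.length() + 6      // chars + alignment slack
            + 3 * NUM_BYTES_INT          // String's three int fields
            + NUM_BYTES_ARRAY_HEADER     // the backing char[] array
            + NUM_BYTES_OBJECT_HEADER;   // the String object itself
    }

    static int sizeOf(String[] arr) {
        int size = NUM_BYTES_ARRAY_HEADER + arr.length * NUM_BYTES_OBJECT_REF;
        for (String s : arr) {
            if (s != null) size += sizeOf(s); // ignores sharing/interning
        }
        return size;
    }
}
```

Note this deliberately overcounts when two slots reference the same String; as discussed above, the tradeoff is speed over perfect accuracy.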
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229127#comment-13229127 ] Uwe Schindler commented on LUCENE-3867: --- Hi Mike, Dawid and I have already contacted the Hotspot list. There is an easy way to get the CompressedOops setting from inside the JVM, using MXBeans from the ManagementFactory. I think we will provide a patch later! I think by that we could also optimize the check for 64 bit, because that one should also be reported by the MXBean without looking into strange sysprops (see the TODO in the code for JRE_IS_64BIT). Uwe
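The MXBean route can be sketched roughly as follows. Hedged: this queries the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean via reflection so it degrades gracefully on other vendors' JVMs; the class name and the "unknown" fallback are illustrative choices, not the eventual patch.

```java
import java.lang.management.ManagementFactory;

// Hedged sketch: read HotSpot's UseCompressedOops VM option through the
// diagnostic MXBean. Reflection avoids a compile-time dependency on
// com.sun.management, so on JVMs without it (J9, JRockit, ...) the probe
// simply reports "unknown" instead of failing to load.
public class CompressedOopsProbe {
    @SuppressWarnings("unchecked")
    public static String useCompressedOops() {
        try {
            Class<?> beanClazz = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
            Object bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                (Class<Object>) beanClazz);
            Object option = beanClazz.getMethod("getVMOption", String.class)
                .invoke(bean, "UseCompressedOops");
            return option.getClass().getMethod("getValue").invoke(option).toString();
        } catch (Exception e) {
            return "unknown"; // bean unavailable or access denied
        }
    }
}
```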
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229128#comment-13229128 ] Dawid Weiss commented on LUCENE-3867: - Sysprops should be a fallback, though, because (to be verified) they're supported by other vendors, whereas the MX bean may not be. It needs to be verified by running under J9, JRockit, etc.
[jira] [Commented] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229134#comment-13229134 ] Tomás Fernández Löbbe commented on SOLR-3241:
-
I found the issue with copyFields as you mentioned, Robert. foo is omitNorms=false, and bar is omitNorms=true. I have a copyField foo to bar and I add a document like:
{code:xml}
<document boost="X">
  <field name="foo">AA</field>
</document>
{code}
This case is fixed by the patch. Testing it, I found a similar situation where field1 is a poly type with omitNorms=false, and its subtype has omitNorms=true. In this case, it fails even without a copyField, just by adding a document like:
{code:xml}
<document boost="X">
  <field name="poly">AAA,BBB</field>
</document>
{code}
I don't know if it makes sense to have a poly field where the subtype has a different value for the omitNorms attribute; probably this should fail even before the document is added.

Document boost fail if a field copy omit the norms
--
Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch

After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's document index-time boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example:
{code:xml}
<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>
{code}
I'm attaching a possible fix.
[jira] [Created] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
Poor performance of Hunspell with Polish Dictionary
---
Key: SOLR-3245 URL: https://issues.apache.org/jira/browse/SOLR-3245 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M Reporter: Agnieszka

In Solr 4.0 the Hunspell stemmer with the Polish dictionary has poor performance, whereas the performance of hunspell from http://code.google.com/p/lucene-hunspell/ in Solr 3.4 is very good. Tests show:

Solr 3.4, full import of 489017 documents:
- StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec
- HunspellStemFilterFactory - 3922 seconds, 125 docs/sec

Solr 4.0, full import of 489017 documents:
- StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec
- HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec

My schema is quite easy. For Hunspell I have one text field that 14 text fields are copied to:
{code:xml}
<field name="text" type="text_pl_hunspell" indexed="true" stored="false" multiValued="true"/>
<copyField source="field1" dest="text"/>
<copyField source="field14" dest="text"/>
{code}
The text_pl_hunspell configuration:
{code:xml}
<fieldType name="text_pl_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <!--<filter class="solr.KeywordMarkerFilterFactory" protected="protwords_pl.txt"/>-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
I use the Polish dictionary files pl_PL.dic and pl_PL.aff (the files stopwords_pl.txt, protwords_pl.txt, and synonyms_pl.txt are empty). These are the same files I used in the 3.4 version. For the Polish Stemmer the difference is only in the definition of the text field:
{code:xml}
<field name="text" type="text_pl" indexed="true" stored="false" multiValued="true"/>
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
One document has 23 fields:
- 14 text fields copied to one text field (above) that is only indexed
- 8 other indexed fields (2 strings, 2 tdates, 3 tint, 1 tfloat)
The size of one document is 3-4 kB.
[jira] [Updated] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Agnieszka updated SOLR-3245:
-
Attachment: pl_PL.zip

Polish dictionary for Hunspell
[jira] [Commented] (SOLR-3161) Use of 'qt' should be restricted to searching and should not start with a '/'
[ https://issues.apache.org/jira/browse/SOLR-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229154#comment-13229154 ] Erik Hatcher commented on SOLR-3161:

bq. As hoss points out, not all searching request handlers inherit from SearchHandler.

Then use /-prefixed handlers for those rather than qt. Or simply add whatever logic to the DispatchingRequestHandler makes sense and let qt dispatching happen there, not from SDF. The DispatchingRequestHandler in my patch was merely an example; I really don't care what logic is in there to determine what can be dispatched to, as I'd never use it myself.

bq. The ability to distinguish an update handler from a request handler doesn't sound complex

Again, I'd say stuff whatever smarts are desired down into a dispatching request handler rather than making Solr's top-level dispatching logic more complicated than it needs to be. But I will say that having a better separated class hierarchy for search vs. update handlers is a good thing in general.

Use of 'qt' should be restricted to searching and should not start with a '/'
-
Key: SOLR-3161 URL: https://issues.apache.org/jira/browse/SOLR-3161 Project: Solr Issue Type: Improvement Components: search, web gui Reporter: David Smiley Assignee: David Smiley Fix For: 3.6, 4.0 Attachments: SOLR-3161-disable-qt-by-default.patch, SOLR-3161-dispatching-request-handler.patch, SOLR-3161-dispatching-request-handler.patch

I haven't yet looked at the code involved for the suggestions here; I'm speaking based on how I think things should and shouldn't work, based on intuitiveness and security. In general I feel it is best practice to use '/'-leading request handler names and not use qt, but I don't hate it enough when used in limited (search-only) circumstances to propose its demise. But if someone proposes its deprecation, then I am +1 for that. Here is my proposal:
- Solr should error if the parameter qt is supplied with a leading '/'.
- (trunk only) Solr should only honor qt if the target request handler extends solr.SearchHandler.
- The new admin UI should only use 'qt' when it has to. For the query screen, it could present a little pop-up menu of handlers to choose from, including /select?qt=mycustom for handlers that aren't named with a leading '/'. This choice should be positioned at the top.
And before I forget, someone (or I) should investigate whether there are any similar security problems with the shards.qt parameter. Perhaps shards.qt can abide by the same rules outlined above. Does anyone foresee any problems with this proposal?
On a related subject, I think the notion of a default request handler is bad - the default=true thing. Honestly I'm not sure what it does anymore, since I noticed Solr trunk redirects '/solr/' to the new admin UI at '/solr/#/'. Assuming it doesn't do anything useful anymore, I think it would be clearer to use <requestHandler name="/select" class="solr.SearchHandler"> instead of what's there now. The delta is to put the leading '/' on this request handler name and remove the default attribute.
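For illustration, the solrconfig.xml delta described in the last paragraph might look like the following sketch (handler names and attributes as discussed above; the before/after pairing is an assumption about the existing config):

```xml
<!-- before: default handler selected via default="true" -->
<requestHandler name="standard" class="solr.SearchHandler" default="true"/>

<!-- after: '/'-named handler, no default attribute -->
<requestHandler name="/select" class="solr.SearchHandler"/>
```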
Re: Exposing Solr routing to SolrJ client
FYI (if it is of any interest), we just hacked CloudSolrServer locally to support routing of realtime-get requests. Limitations are:
- Only the id parameter (not the ids parameter) is supported in realtime-get requests.
- Only schemas with uniqueKey on a field named "id", and only an id field of type string, are supported.

We did this to be able to start performance tests on our own system building on SolrCloud. The performance of our own system depends on being able to do realtime-gets from the client (our system), because we often update documents very quickly after they have been indexed for the first time (and we run with soft-commit = 1 sec - we can't wait for that). We use version control (for optimistic locking) and a unique key constraint where you fail instead of overwrite if the document already exists (http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics) in our highly concurrent performance test, so that will also be tested with respect to performance.

What we did in CloudSolrServer was:
* Added the following to the request method, between the "if (collection == null)" statement and the "LBHttpSolrServer.Req req = new LBHttpSolrServer.Req(request, urlList);" statement:
{code}
List<String> urlList = new ArrayList<String>();
if (reqParams.get(CommonParams.QT) != null
    && reqParams.get(CommonParams.QT).equals("/get")) {
  String id = reqParams.get("id");
  int hash = hash(id);
  String shardId = getShard(hash, collection, cloudState);
  ZkCoreNodeProps leaderProps = null;
  try {
    leaderProps = new ZkCoreNodeProps(zkStateReader.getLeaderProps(collection, shardId));
  } catch (InterruptedException ie) {
    throw new SolrServerException(ie);
  }
  String fullUrl = ensureUrlHasProtocolIdentifier(leaderProps.getCoreUrl());
  urlList.add(fullUrl);
} else {
  // the code that was already in request() between those two statements
}
{code}
* Added the following helper methods (stolen from DistributedUpdateProcessor etc.)
{code}
private String ensureUrlHasProtocolIdentifier(String url) {
  if (!url.startsWith("http://") && !url.startsWith("https://")) {
    url = "http://" + url;
  }
  return url;
}

private String getShard(int hash, String collection, CloudState cloudState) {
  return cloudState.getShard(hash, collection);
}

private int hash(String id) {
  BytesRef indexedId = new BytesRef();
  UnicodeUtil.UTF16toUTF8(id, 0, id.length(), indexedId);
  return Hash.murmurhash3_x86_32(indexedId.bytes, indexedId.offset, indexedId.length, 0);
}
{code}
It seems to work for us, but we look very much forward to the real solution.

Regards, Per Steffensen

Per Steffensen wrote:
> Right, you can't yet even with CloudSolrServer - but I think it will be done soon - certainly before the 4 release anyway.
Ok, I will cross my fingers for it to be done soon. Thanks for your kind help.

Regards, Steff
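The id-to-shard mapping above can be sketched independently of Solr's classes. In this toy version, String.hashCode() is purely a stand-in for Lucene's Hash.murmurhash3_x86_32 over the id's UTF-8 bytes, and the modulo assignment is a stand-in for SolrCloud's hash-range mapping; a real client must use exactly the same hash function and shard assignment as the server, or it will route to the wrong leader.

```java
// Toy sketch of client-side shard routing. String.hashCode() and the
// simple modulo are stand-ins (assumptions); SolrCloud actually uses
// murmurhash3 over UTF-8 bytes and a hash-range to shard mapping.
public class ShardRouterSketch {
    /** Map a document id to a shard index in [0, numShards). */
    public static int shardFor(String id, int numShards) {
        int hash = id.hashCode(); // stand-in hash, see note above
        return Math.floorMod(hash, numShards); // floorMod handles negative hashes
    }

    public static void main(String[] args) {
        // The same id always routes to the same shard.
        System.out.println(shardFor("doc1", 4));
    }
}
```

The point of the sketch is the invariant, not the hash: routing only works because client and server agree on a deterministic id-to-shard function.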
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3241:
--
Attachment: SOLR-3241.patch

Updating the patch from Tomás to include an additional test for the field boost + copyField case. We still need to add tests for the polyField case, and any other possibilities (can you polyField+copyField?).
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229160#comment-13229160 ] Michael McCandless commented on LUCENE-3867:

Consulting the MXBean sounds great!

bq. Sysprops should be a fallback though

+1
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3867:
--
Attachment: LUCENE-3867-compressedOops.patch

Here is the patch for detecting compressedOops on Sun JVMs. For other JVMs it will simply use false, so object refs will be assumed to have 64 bits, which is fine as an upper memory limit. The code uses only public Java APIs and falls back to false if anything fails.
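The detection approach Uwe describes can be sketched using only public APIs: ask HotSpot's diagnostic MXBean for the UseCompressedOops VM option via reflection (so the code still compiles and degrades gracefully on non-HotSpot JVMs). This is a sketch of the idea, not the attached patch; the class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

// Sketch: probe HotSpot's UseCompressedOops flag reflectively, falling
// back to "unknown" on JVMs that don't expose the diagnostic MXBean.
public class CompressedOopsProbe {
    /** Returns "true"/"false" for UseCompressedOops, or "unknown" off HotSpot. */
    public static String compressedOops() {
        try {
            // Reflection keeps the com.sun.* dependency out of compile time.
            Class<?> beanClazz = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
            Object bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                beanClazz);
            Method getVMOption = beanClazz.getMethod("getVMOption", String.class);
            Object option = getVMOption.invoke(bean, "UseCompressedOops");
            return option.getClass().getMethod("getValue").invoke(option).toString();
        } catch (Throwable t) {
            return "unknown"; // non-HotSpot JVM or restricted environment
        }
    }
}
```

A caller estimating reference sizes would treat "unknown" the same as false, i.e. assume 64-bit references as a safe upper bound.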
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229167#comment-13229167 ] Aliaksandr Zhuhrou commented on SOLR-3244:
--
Sure, I will test it this evening. Thank you very much.

New Admin UI doesn't work on tomcat
---
Key: SOLR-3244 URL: https://issues.apache.org/jira/browse/SOLR-3244 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Reporter: Aliaksandr Zhuhrou Assignee: Uwe Schindler Attachments: SOLR-3244.patch

I am currently unable to open the admin interface when using a WAR deployment under Tomcat. The stack trace:
{code}
SEVERE: Servlet.service() for servlet [LoadAdminUI] in context with path [/solr] threw exception
java.lang.NullPointerException
	at java.io.File.<init>(File.java:251)
	at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:50)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:292)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{code}
Tomcat version: Apache Tomcat/7.0.23. Java version: jdk1.7.0_02.

I did some debugging and found that the problem is related to the fact that resolving the resource path is delegated to org.apache.naming.resources.WARDirContext, which simply returns null for any input:
{code}
/**
 * Return the real path for a given virtual path, if possible; otherwise
 * return <code>null</code>.
 *
 * @param path The path to the desired resource
 */
@Override
protected String doGetRealPath(String path) {
    return null;
}
{code}
Need to check the specification, because this may actually be a Tomcat bug. We may try using the getResourceAsStream(java.lang.String path) method, which should work even for a WAR.
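The getResourceAsStream() suggestion boils down to streaming the resource instead of resolving a filesystem path. A hedged sketch: the servlet wiring in the comment is illustrative (not the actual LoadAdminUiServlet patch); only the stream-copy helper itself is concrete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class StreamCopy {
    // In the servlet, the idea would be (illustrative resource name):
    //   InputStream in = getServletContext().getResourceAsStream("/admin.html");
    //   copy(in, response.getOutputStream());
    // getResourceAsStream works even from an unexpanded WAR, where
    // getRealPath may legitimately return null.

    /** Copy all bytes from in to out; returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // keep the helper's signature simple
        }
        return total;
    }
}
```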
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-3867:
---
Attachment: LUCENE-3867.patch

Patch adds RUE.sizeOf(String) and various sizeOf(arr[]) methods. Also fixes the ARRAY_HEADER. Uwe, I merged with your patch, with one difference -- the System.out prints in the test are printed only if VERBOSE.
[jira] [Commented] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229180#comment-13229180 ] Agnieszka commented on SOLR-3245: - I made one more test for Hunspell with english dictionary (from OpenOffice.org) in Solr 4.0. It seems that the problem not exists with the english dictionary. Solr 4.0, full import 489017 documents, hunspell, english dictionary: 3146 seconds, 155 docs/sec But I'm not sure if it is reliable because I use documents with polish text to test english dictionary. Poor performance of Hunspell with Polish Dictionary --- Key: SOLR-3245 URL: https://issues.apache.org/jira/browse/SOLR-3245 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M Reporter: Agnieszka Labels: performance Attachments: pl_PL.zip In Solr 4.0 Hunspell stemmer with polish dictionary has poor performance whereas performance of hunspell from http://code.google.com/p/lucene-hunspell/ in solr 3.4 is very good. Tests shows: Solr 3.4, full import 489017 documents: StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec HunspellStemFilterFactory - 3922 seconds, 125 docs/sec Solr 4.0, full import 489017 documents: StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec My schema is quit easy. 
For Hunspell I have one text field that I copy 14 text fields to:
{code:xml}
<field name="text" type="text_pl_hunspell" indexed="true" stored="false" multiValued="true"/>
<copyField source="field1" dest="text"/>
<copyField source="field14" dest="text"/>
{code}
The text_pl_hunspell configuration:
{code:xml}
<fieldType name="text_pl_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <!--filter class="solr.KeywordMarkerFilterFactory" protected="protwords_pl.txt"/-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
I use a Polish dictionary - pl_PL.dic, pl_PL.aff (the files stopwords_pl.txt, protwords_pl.txt, synonyms_pl.txt are empty). These are the same files I used in the 3.4 version.
For the Polish Stemmer the difference is only in the definition of the text field:
{code}
<field name="text" type="text_pl" indexed="true" stored="false" multiValued="true"/>
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
One document has 23 fields: - 14 text fields copied to one text field
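As a quick sanity check, the docs/sec figures reported in this thread follow directly from the import times; a trivial worked example:

```java
// Verify the reported throughput numbers against the import times
// (489017 documents, times in seconds, all taken from the report above).
public class ThroughputCheck {
    public static void main(String[] args) {
        int docs = 489017;
        System.out.println(docs / 3146);  // Hunspell EN, Solr 4.0 -> 155 docs/sec
        System.out.println(docs / 3016);  // Stempel, Solr 4.0     -> 162 docs/sec
        System.out.println(docs / 44580); // Hunspell PL, Solr 4.0 -> 10 (the 11 in the report is 10.97 rounded)
    }
}
```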
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229182#comment-13229182 ] Uwe Schindler commented on LUCENE-3867: --- Shai: Thanks! I am on a train at the moment, so internet is slow/not working. I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops (which may have been modified by user code, so not really safe to use...). I left the non-verbose printlns in, so people reviewing the patch can quickly see, by running that test, what happens on their JVM. It would be interesting to see what your jRockit does... :-)
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3241: -- Attachment: SOLR-3241.patch Updated patch with fixes for the polyField case (untested!). After reviewing the code: Tomás had the correct fix for the copyField case; his patch fixes a logic bug, nothing more! The polyField case is different: it's too late in DocumentBuilder to do anything here after the creation of IndexableFields; moreover, we cannot nuke the whole boost for the field, because we cannot assume anything just because isPolyField() == true. For example, a custom field type might not even be instanceof AbstractSubTypeField! Because of this I think these field types should really treat the fact that they use 'real fields' as an impl detail, so I added the logic to their subfield creation. Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's index-time document boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them, and a document boost is used. For example:
{code:xml}
<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>
{code}
I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229188#comment-13229188 ] Shai Erera commented on LUCENE-3867: I tried the IBM and Oracle 1.6 JVMs, and both printed the same:
{code}
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] - ---
{code}
So no CompressedOops for me :). bq. I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops
Ok. If you make it, we can add these changes to that patch; otherwise we can also do them in a separate issue.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229191#comment-13229191 ] Uwe Schindler commented on LUCENE-3867: --- Hm, for me it prints true. What JVMs are you using, and what settings?
[jira] [Issue Comment Edited] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229191#comment-13229191 ] Uwe Schindler edited comment on LUCENE-3867 at 3/14/12 1:47 PM: Hm, for me (1.6.0_31, 7u3) it prints true. What JVMs are you using and what settings? was (Author: thetaphi): Hm, for it prints true. What JVMs are you using and what settings?
[jira] [Commented] (SOLR-3221) Make Shard handler threadpool configurable
[ https://issues.apache.org/jira/browse/SOLR-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229197#comment-13229197 ] Erick Erickson commented on SOLR-3221: -- Yeah, just go ahead and edit the Wiki; all you need to do is create an account. As for CHANGES.txt, just attach it to this JIRA and I'll go ahead and commit it without a new JIRA. Two tricky things: 1) there are two files, one in the 3x branch and one in the 4x branch; both need the text. 2) The 4x file needs to be edited in two places: once in the 3.x section and once in the 4x section. I'll see it when this JIRA changes and check them in. Thanks! Make Shard handler threadpool configurable -- Key: SOLR-3221 URL: https://issues.apache.org/jira/browse/SOLR-3221 Project: Solr Issue Type: Improvement Affects Versions: 3.6, 4.0 Reporter: Greg Bowyer Assignee: Erick Erickson Labels: distributed, http, shard Fix For: 3.6, 4.0 Attachments: SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch From profiling of monitor contention, as well as observations of the 95th and 99th percentile response times for nodes that perform distributed search (or "aggregator" nodes), it would appear that the HttpShardHandler code currently does a suboptimal job of managing outgoing shard-level requests. Presently the code contained within Lucene 3.5's SearchHandler and Lucene trunk / 3x's ShardHandlerFactory creates arbitrary threads in order to service distributed search requests. This is done presently to limit the size of the threadpool, such that it does not consume resources in deployment configurations that do not use distributed search. This unfortunately has two impacts on the response time if the node coordinating the distribution is under high load.
The usage of the MaxConnectionsPerHost configuration option results in aggressive activity on semaphores within HttpCommons; it has been observed that the aggregator can have a response time far greater than that of the searchers. The monitor contention above suggests that in some cases liveness issues can occur, and that simple queries can be starved of resources simply due to a lack of attention from the viewpoint of context switching, with, as mentioned above, the HttpCommons connections being hotly contended. The fair, queue-based configuration eliminates this, at the cost of throughput. This patch aims to make the threadpool largely configurable, allowing those using Solr to choose the throughput vs. latency balance they desire.
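The throughput-vs-latency trade-off described above can be sketched with plain java.util.concurrent primitives. This is a generic illustration, not the actual HttpShardHandlerFactory code; the class name and pool parameters are hypothetical:

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the two extremes a configurable shard-handler pool could expose.
public class ShardPoolSketch {
    // Throughput-oriented: a SynchronousQueue hands each task straight to a
    // thread, growing the pool on demand (low latency, unbounded thread count).
    static ThreadPoolExecutor throughputPool() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                5, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    }

    // Fairness-oriented: a fixed pool with a bounded FIFO queue caps resource
    // use, but requests can queue under load (higher latency, bounded threads).
    static ThreadPoolExecutor fairPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads,
                0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(queueSize));
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = fairPool(4, 100);
        Future<Integer> f = pool.submit(() -> 40 + 2); // stand-in for a shard request
        System.out.println(f.get()); // prints 42
        pool.shutdown();
    }
}
```

Exposing the core/max size and queue choice as configuration is what lets a deployment pick its point on this spectrum.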
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229200#comment-13229200 ] Uwe Schindler commented on LUCENE-3867: --- Here are my results:
{noformat}
* JAVA_HOME = C:\Program Files\Java\jdk1.7.0_03
java version 1.7.0_03
Java(TM) SE Runtime Environment (build 1.7.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)

* C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam*
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,561 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] - ---

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,5 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] - ---

* JAVA_HOME = C:\Program Files\Java\jdk1.6.0_31
java version 1.6.0_31
Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

* C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam*
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,453 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] - ---

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,421 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] - ---

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam* -Dargs=-XX:+UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,422 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] - ---
{noformat}
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229211#comment-13229211 ] Shai Erera commented on LUCENE-3867: Oracle:
{code}
java version 1.6.0_21
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b17, mixed mode)
{code}
IBM:
{code}
java version 1.6.0
Java(TM) SE Runtime Environment (build pwa6460sr9fp3-2022_05(SR9 FP3))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Windows 7 amd64-64 jvmwa6460sr9-2011_94827 (JIT enabled, AOT enabled) J9VM - 2011_094827 JIT - r9_20101028_17488ifx45 GC - 20101027_AA)
JCL - 20110727_07
{code}
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229215#comment-13229215 ] Shai Erera commented on LUCENE-3867: I ran ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops, and with the Oracle JVM I get CompressedOops: true, but with the IBM JVM I still get 'false'.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229217#comment-13229217 ] Uwe Schindler commented on LUCENE-3867: --- OK, that is expected. 1.6.0_21 does not enable compressed oops by default, so false is correct. If you enable it manually, it reports true. jRockit is jRockit and not Sun/Oracle, so the result is somewhat expected; it seems to not have that MXBean. But the code does not produce strange exceptions, so at least on the Sun VM we can detect compressed oops and guess the reference size better. 8 is still not bad, as it gives an upper limit.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229218#comment-13229218 ] Uwe Schindler commented on LUCENE-3867: --- By the way, here is the code from the hotspot mailing list member (my code is based on it); it also shows the outputs for different JVMs: https://gist.github.com/1333043 (I just removed the com.sun.* imports and replaced them with reflection)
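The approach in that gist can be sketched without any com.sun.* imports as follows. This is a hedged illustration, not the committed patch: it queries the HotSpot diagnostic MBean by its well-known JMX name and returns null on any non-HotSpot VM or error, in which case a caller should keep the pessimistic 8-byte reference size as an upper bound.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hedged sketch: read a HotSpot VM option (e.g. UseCompressedOops) via the
// platform MBeanServer, so no com.sun.* classes are referenced directly.
final class CompressedOopsCheck {
    static String vmOptionValue(String option) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName hotspot =
                new ObjectName("com.sun.management:type=HotSpotDiagnostic");
            // getVMOption returns a VMOption, mapped to CompositeData over JMX
            CompositeData vmOption = (CompositeData) server.invoke(
                hotspot, "getVMOption",
                new Object[] { option },
                new String[] { "java.lang.String" });
            return (String) vmOption.get("value");
        } catch (Exception e) {
            return null; // not HotSpot, or option unknown: stay pessimistic
        }
    }
}
```

On HotSpot this yields "true" or "false"; on jRockit the quoted advice below suggests asking for CompressedRefs instead.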
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229226#comment-13229226 ] Shai Erera commented on LUCENE-3867: bq. 8 is still not bad as it gives an upper limit. I agree. Better to over-estimate here than under-estimate. Would appreciate it if someone could take a look at the sizeOf() impls before I commit.
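A hedged sketch of how the proposed primitive-array sizeOf() helpers could look, using the corrected array header (object header plus length int, no object ref). The constants below are illustrative stand-ins for RamUsageEstimator's fields, assuming a 64-bit HotSpot VM with compressed oops and 8-byte object alignment; the real values are VM-dependent.

```java
// Hedged sketch, not the committed impl: estimate the shallow size of
// primitive arrays with the corrected array header.
final class SizeOfSketch {
    // mark word + compressed class pointer (assumed 64-bit, compressed oops)
    static final int NUM_BYTES_OBJECT_HEADER = 12;
    // array header = object header + 4-byte length field, NO object ref
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + Integer.BYTES;
    static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;

    // round up to the VM's object alignment boundary
    static long alignObjectSize(long size) {
        return (size + NUM_BYTES_OBJECT_ALIGNMENT - 1)
                & ~(NUM_BYTES_OBJECT_ALIGNMENT - 1L);
    }

    static long sizeOf(byte[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length);
    }

    static long sizeOf(int[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length * Integer.BYTES);
    }

    static long sizeOf(long[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length * Long.BYTES);
    }
}
```

With these assumptions, an empty byte[] costs 16 bytes and an int[3] costs 32 (28 rounded up to the 8-byte boundary).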
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-3241: Attachment: SOLR-3241.patch I'm attaching another patch with some more tests. Also updated the DocumentBuilder to use the existing logic instead of replicating it where the fix is applied. Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's index-time document boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example: <field name="author" type="text" indexed="true" stored="false" omitNorms="false"/> <field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/> <copyField source="author" dest="author_display"/> I'm attaching a possible fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229238#comment-13229238 ] Robert Muir commented on LUCENE-3869: - Stacktrace: {noformat} junit-sequential: [junit] Testsuite: org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.603 sec [junit] [junit] Testsuite: org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzerTest [junit] 2012-03-14 10:42:27 [junit] Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode): [junit] [junit] Thread-9 prio=10 tid=0x7f3ba44af000 nid=0x34d1 runnable [0x7f3ba3a72000] [junit]java.lang.Thread.State: RUNNABLE [junit] at java.util.HashMap.transfer(HashMap.java:484) [junit] at java.util.HashMap.resize(HashMap.java:463) [junit] at java.util.HashMap.addEntry(HashMap.java:755) [junit] at java.util.HashMap.put(HashMap.java:385) [junit] at java.util.HashSet.add(HashSet.java:200) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.setName(AnalysisEngineManagementImpl.java:245) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:181) [junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:127) [junit] at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) [junit] at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) [junit] at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267) [junit] at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335) [junit] at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73) [junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.init(BaseUIMATokenizer.java:45) [junit] at 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.init(UIMATypeAwareAnnotationsTokenizer.java:54) [junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzer.createComponents(UIMATypeAwareAnalyzer.java:40) [junit] at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:368) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:338) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:330) [junit] [junit] Low Memory Detector daemon prio=10 tid=0x41a0d000 nid=0x34c7 runnable [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread1 daemon prio=10 tid=0x41a0a800 nid=0x34c6 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread0 daemon prio=10 tid=0x41a08000 nid=0x34c5 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Signal Dispatcher daemon prio=10 tid=0x41a05800 nid=0x34c4 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Finalizer daemon prio=10 tid=0x419e9000 nid=0x34c3 in Object.wait() [0x7f3ba092] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] - waiting on 0xe6920d88 (a java.lang.ref.ReferenceQueue$Lock) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [junit] - locked 0xe6920d88 (a java.lang.ref.ReferenceQueue$Lock) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [junit] at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) [junit] [junit] Reference Handler daemon prio=10 tid=0x419e6800 nid=0x34c2 in Object.wait() [0x7f3ba8a21000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] - waiting on 0xe6920d20 
(a java.lang.ref.Reference$Lock) [junit] at java.lang.Object.wait(Object.java:485) [junit] at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) [junit] - locked 0xe6920d20 (a java.lang.ref.Reference$Lock) [junit] [junit] LTC-main#seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd prio=10 tid=0x4197a800 nid=0x34b8 in Object.wait() [0x7f3badfc6000] [junit]
[jira] [Created] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
possible hang in UIMATypeAwareAnalyzerTest -- Key: LUCENE-3869 URL: https://issues.apache.org/jira/browse/LUCENE-3869 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: Robert Muir Just testing an unrelated patch, I was hung (with 100% cpu) in UIMATypeAwareAnalyzerTest. I'll attach stacktrace at the moment of the hang. The fact we get a seed in the actual stacktraces for cases like this is awesome! Thanks Dawid! I don't think it reproduces 100%, but I'll try beasting this seed to see if i can reproduce the hang: should be 'ant test -Dtestcase=UIMATypeAwareAnalyzerTest -Dtests.seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd' from what I can see. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229241#comment-13229241 ] Tommaso Teofili commented on LUCENE-3869: - Thanks Robert, I'm taking a look
[jira] [Commented] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229243#comment-13229243 ] Robert Muir commented on SOLR-3241: --- Thanks, patch looks good! I'll wait a bit for any other input.
[jira] [Issue Comment Edited] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229244#comment-13229244 ] Uwe Schindler edited comment on LUCENE-3867 at 3/14/12 3:01 PM: On the Hotspot mailing list some people also seem to have an idea about jRockit and IBM J9: {quote} From: Krystal Mok Sent: Wednesday, March 14, 2012 3:46 PM To: Uwe Schindler Cc: Dawid Weiss; hotspot compiler Subject: Re: How to detect if the VM is running with compact refs from within the VM (no agent)? Hi, Just in case you'd care, the same MXBean could be used to detect compressed references on JRockit, too. It's probably available starting from JRockit R28. Instead of UseCompressedOops, use CompressedRefs as the VM option name on JRockit. Don't know how to extract this information for J9 without another whole bunch of hackeries... well, you could try this, on a best-effort basis for platform detection: IBM J9's VM version string contains the compressed reference information. Example: $ export JAVA_OPTS='-Xcompressedrefs' $ groovysh Groovy Shell (1.7.7, JVM: 1.7.0) Type 'help' or '\h' for help. groovy:000> System.getProperty 'java.vm.info' ===> JRE 1.7.0 Linux amd64-64 Compressed References 20110810_88604 (JIT enabled, AOT enabled) J9VM - R26_Java726_GA_20110810_1208_B88592 JIT - r11_20110810_20466 GC - R26_Java726_GA_20110810_1208_B88592_CMPRSS J9CL - 20110810_88604 groovy:000> quit So grepping for Compressed References in the java.vm.info system property gives you the clue. - Kris {quote}
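The best-effort J9 heuristic Kris describes above can be sketched in a few lines. This is a hedged illustration, not an authoritative check: it only greps the java.vm.info system property, which carries "Compressed References" on IBM J9 builds started with -Xcompressedrefs.

```java
// Hedged sketch: heuristic compressed-references detection for IBM J9,
// based purely on the VM version string (no MXBean involved).
final class VmInfoCheck {
    static boolean j9CompressedRefs() {
        String info = System.getProperty("java.vm.info", "");
        return info.contains("Compressed References");
    }
}
```

On HotSpot VMs java.vm.info is typically just "mixed mode", so the check returns false there and a caller would fall back to the MXBean approach.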
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229255#comment-13229255 ] Robert Muir commented on LUCENE-3869: - If you try this seed a lot of times, eventually it will reproduce. I tried adding -Dtests.iter=10 to my command-line, so 'ant test -Dtestcase=UIMATypeAwareAnalyzerTest -Dtests.seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd -Dtests.iter=10' after about 3 or 4 runs it hung... though interestingly it hung at 200% cpu usage (as if two of the analysis threads were stuck). Stacktrace is in the same place, just for both threads: {noformat} [junit] Thread-48 prio=10 tid=0x7fbec0b9c000 nid=0x4979 runnable [0x7fbebfa7d000] [junit]java.lang.Thread.State: RUNNABLE [junit] at java.util.HashMap.getEntry(HashMap.java:347) [junit] at java.util.HashMap.containsKey(HashMap.java:335) [junit] at java.util.HashSet.contains(HashSet.java:184) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.setName(AnalysisEngineManagementImpl.java:242) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:181) [junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:127) [junit] at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) [junit] at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) [junit] at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267) [junit] at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335) [junit] at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73) [junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.init(BaseUIMATokenizer.java:45) [junit] at 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.init(UIMATypeAwareAnnotationsTokenizer.java:54) [junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzer.createComponents(UIMATypeAwareAnalyzer.java:40) [junit] at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:368) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:338) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:330) [junit] [junit] Thread-46 prio=10 tid=0x7fbec10df000 nid=0x4977 runnable [0x7fbebfb7e000] [junit]java.lang.Thread.State: RUNNABLE [junit] at java.util.HashMap.getEntry(HashMap.java:347) [junit] at java.util.HashMap.containsKey(HashMap.java:335) [junit] at java.util.HashSet.contains(HashSet.java:184) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.setName(AnalysisEngineManagementImpl.java:242) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:181) [junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:127) [junit] at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) [junit] at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) [junit] at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267) [junit] at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335) [junit] at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73) [junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.init(BaseUIMATokenizer.java:45) [junit] at 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.init(UIMATypeAwareAnnotationsTokenizer.java:54) [junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzer.createComponents(UIMATypeAwareAnalyzer.java:40) [junit] at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:368) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:338) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:330) {noformat}
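The traces show several analysis threads concurrently inside AnalysisEngineManagementImpl.setName, which mutates an unsynchronized HashSet; a HashMap resize racing with another writer can corrupt the bucket chains and spin at 100% CPU per thread. A hypothetical guard on the caller's side (not the actual UIMA or Lucene fix, and the Factory type below is invented for the sketch) is to serialize engine creation:

```java
import java.util.function.Supplier;

// Hypothetical sketch: serialize engine creation so the unsynchronized
// UIMA bookkeeping is only ever touched by one thread at a time.
final class SafeAEProvider<T> {
    private final Object lock = new Object();
    private T cached; // stands in for UIMA's AnalysisEngine

    T getAE(Supplier<T> factory) {
        synchronized (lock) { // one thread initializes at a time
            if (cached == null) {
                cached = factory.get();
            }
            return cached;
        }
    }
}
```

With this guard, concurrent callers either create the engine once or reuse the cached one; the unsafe HashSet is never mutated from two threads.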
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229264#comment-13229264 ] Michael McCandless commented on LUCENE-3867: Patch looks good! Maybe just explain in the sizeOf(String) javadoc that this method assumes the String is standalone (ie, does not reference a larger char[] than itself)? Because... if you call String.substring, the returned string references a slice of the char[] of the original one... and so technically the RAM it's tying up could be (much) larger than expected. (At least, this used to be the case... not sure if it's changed...)
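On JVMs whose substring shares the parent's char[] (older Oracle/Sun JDKs; later releases copy on substring), a caller can force a compact copy before estimating. A hedged sketch, assuming the old sharing behavior:

```java
// Hedged sketch: give a String its own right-sized backing array before
// estimating its footprint. On sharing JVMs, new String(String) copies the
// chars exactly when the original points into a larger array; on copying
// JVMs it is a cheap no-op for estimation purposes.
final class StringCompact {
    static String compact(String s) {
        return new String(s);
    }
}
```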
[jira] [Resolved] (LUCENE-3841) CloseableThreadLocal does not work well with Tomcat thread pooling
[ https://issues.apache.org/jira/browse/LUCENE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3841. Resolution: Fixed Thanks Matthew!

CloseableThreadLocal does not work well with Tomcat thread pooling -- Key: LUCENE-3841 URL: https://issues.apache.org/jira/browse/LUCENE-3841 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 3.5 Environment: Lucene/Tika/Snowball running in a Tomcat web application Reporter: Matthew Bellew Assignee: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3841.patch

We tracked down a large memory leak (effectively a leak, anyway) caused by how Analyzer uses CloseableThreadLocal. CloseableThreadLocal.hardRefs holds references to Thread objects as keys. The problem is that it only frees these references in the set() method, and SnowballAnalyzer will only call set() when it is used by a NEW thread. The problem scenario is as follows: the server experiences a spike in usage (say, by robots or whatever) and many threads are created and referenced by CloseableThreadLocal.hardRefs. The server quiesces and lets many of these threads expire normally. Now we have a smaller, but adequate, thread pool. So CloseableThreadLocal.set() may not be called by SnowballAnalyzer (via Analyzer) for a _long_ time. The purge code is never called, and these threads, along with their thread-local storage (Lucene related or not), are never cleaned up. I think calling the purge code in both get() and set() would have avoided this problem, but is potentially expensive. Perhaps using WeakHashMap instead of HashMap may also have helped; WeakHashMap purges on get() and set(). So this might be an efficient way to clean up threads in get(), while set() might do the more expensive Map.keySet() iteration. Our current workaround is to not share SnowballAnalyzer instances among HTTP searcher threads. We open and close one on every request.

Thanks, Matt
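The WeakHashMap idea from the report can be sketched as follows. This is an illustrative toy, not the actual Lucene patch: Thread objects are used as weakly held keys, so once a pool thread expires, its entry becomes eligible for collection and WeakHashMap expunges it internally during get() and put(), with no explicit purge pass.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Toy per-thread value holder (not CloseableThreadLocal itself). Thread keys
// are weakly referenced, so entries for dead threads are purged automatically.
// Caveat: a stored value must not strongly reference its Thread key, or the
// entry can never be collected.
public class WeakThreadValues<T> {
    private final Map<Thread, T> values =
        Collections.synchronizedMap(new WeakHashMap<Thread, T>());

    public T get() {
        // WeakHashMap expunges stale (collected-key) entries on access
        return values.get(Thread.currentThread());
    }

    public void set(T value) {
        values.put(Thread.currentThread(), value);
    }
}
```

The trade-off the reporter mentions still applies: weak-key purging on get() is cheap, while any remaining eager cleanup (the keySet() iteration) can be deferred to the rarer set() calls.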
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229273#comment-13229273 ] Shai Erera commented on LUCENE-3867: Good point. I clarified the jdocs with this:

{code}
/**
 * Returns the approximate size of a String object. This computation relies on
 * {@link String#length()} to compute the number of bytes held by the char[].
 * However, if the String object passed to this method is the result of e.g.
 * {@link String#substring}, the computation may be entirely inaccurate
 * (depending on the difference between length() and the actual char[]
 * length).
 */
{code}

If there are no objections, I'd like to commit this.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229275#comment-13229275 ] Dawid Weiss commented on LUCENE-3867: I would opt for sizeOf to return the actual size of the object, including underlying string buffers... We can take interned buffers into account, but other than that I wouldn't skew the result, because it can be misleading.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229279#comment-13229279 ] Dawid Weiss commented on LUCENE-3867: I don't like this special handling of Strings, to be honest. Why do we need/do it?
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229286#comment-13229286 ] Shai Erera commented on LUCENE-3867: bq. I don't like this special handling of Strings, to be honest. Why do we need/do it?

Because I wrote it, and it seemed useful to me, so why not? We know what Strings look like, at least in their worst case. If there is ever a better implementation, we can fix it in RUE, rather than having many impls try to do it on their own.
[jira] [Commented] (SOLR-1970) need to customize location of dataimport.properties
[ https://issues.apache.org/jira/browse/SOLR-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229289#comment-13229289 ] Terrance A. Snyder commented on SOLR-1970: +1 I am looking at implementing this with current trunk. I'll hopefully submit a patch for this. Keep in mind this may have to be added to the core configuration area to keep backward compatibility. Something like:

<core name="users_001" instanceDir="users" config="solrconfig.xml" dataDir="../users_001" dataImportPropertiesFile="../users_001/di.properties"/>
<core name="users_002" instanceDir="users" config="solrconfig.xml" dataDir="../users_002" dataImportPropertiesFile="../users_001/di.properties"/>

This would operate just like today's core config:

$SOLR_HOME/users
/users/conf
/users/conf/solrconfig.xml
/users/conf/schema.xml
/users_001
/users_001/data
/users_001/di.properties
/users_002
/users_002/data
/users_002/di.properties

This allows the core configuration and sharding to work effectively. The core question is how this would play with zookeeper / cloud support. I would think this should already be baked into SolrCloud, but I could be wrong. Any thoughts?

need to customize location of dataimport.properties --- Key: SOLR-1970 URL: https://issues.apache.org/jira/browse/SOLR-1970 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Chris Book

By default dataimport.properties is written to {solr.home}/conf/. However, when using multiple solr cores, it is currently useful to use the same conf directory for all of the cores and use solr.xml to specify a different schema.xml. I can then specify a different data-config.xml for each core to define how the data gets from the database to each core's schema. However, all the solr cores will fight over writing to the dataimport.properties file. There should be an option in solrconfig.xml to specify the location or name of this file so that a different one can be used for each core.
[jira] [Issue Comment Edited] (SOLR-1970) need to customize location of dataimport.properties
[ https://issues.apache.org/jira/browse/SOLR-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229289#comment-13229289 ] Terrance A. Snyder edited comment on SOLR-1970 at 3/14/12 4:08 PM: +1 I am looking at implementing this with current trunk. I'll hopefully submit a patch for this. Keep in mind this may have to be added to the core configuration area to keep backward compatibility. Something like:

<core name="users_001" instanceDir="users" config="solrconfig.xml" dataDir="../users_001" dataImportPropertiesFile="../users_001/di.properties"/>
<core name="users_002" instanceDir="users" config="solrconfig.xml" dataDir="../users_002" dataImportPropertiesFile="../users_002/di.properties"/>

This would operate just like today's core config:

$SOLR_HOME/users
/users/conf
/users/conf/solrconfig.xml
/users/conf/schema.xml
/users_001
/users_001/data
/users_001/di.properties
/users_002
/users_002/data
/users_002/di.properties

This allows the core configuration and sharding to work effectively. The core question is how this would play with zookeeper / cloud support. I would think this should already be baked into SolrCloud, but I could be wrong. Any thoughts?

was (Author: terrance.snyder): +1 I am looking at implementing this with current trunk. I'll hopefully submit a patch for this. Keep in mind this may have to be added to the core configuration area to keep backward compatibility. Something like:

<core name="users_001" instanceDir="users" config="solrconfig.xml" dataDir="../users_001" dataImportPropertiesFile="../users_001/di.properties"/>
<core name="users_002" instanceDir="users" config="solrconfig.xml" dataDir="../users_002" dataImportPropertiesFile="../users_001/di.properties"/>

This would operate just like today's core config:

$SOLR_HOME/users
/users/conf
/users/conf/solrconfig.xml
/users/conf/schema.xml
/users_001
/users_001/data
/users_001/di.properties
/users_002
/users_002/data
/users_002/di.properties

This allows the core configuration and sharding to work effectively. The core question is how this would play with zookeeper / cloud support. I would think this should already be baked into SolrCloud, but I could be wrong. Any thoughts?
Re: UpdateRequestProcessor to extract Solr XML from rich documents
On Mar 14, 2012, at 12:07 PM, Emmanuel Espina wrote: XmlWritingUpdateProcessorFactory.java

+1 - looks like a useful update proc. I'd make a couple of minor suggestions, like looking at the return value of mkdirs and logging an error or warning if the directory doesn't exist and can't be made, and closing the file writer in a finally block. I'd go straight to a JIRA though. - Mark Miller lucidimagination.com
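The two review suggestions (check the boolean returned by mkdirs(), and close the writer in a finally block) can be sketched like this. The class, method, and path names below are invented for the example and are not from the proposed XmlWritingUpdateProcessorFactory.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

// Minimal sketch of defensive file writing:
//  - mkdirs() does not throw on failure, it returns false, so the result
//    must be checked (and logged or escalated);
//  - the writer is closed in a finally block so a failed write cannot
//    leak the file handle.
public class SafeXmlWrite {
    public static void write(File dir, String name, String xml) throws IOException {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("could not create directory: " + dir);
        }
        Writer out = new FileWriter(new File(dir, name));
        try {
            out.write(xml);
        } finally {
            out.close(); // runs even if write() throws
        }
    }
}
```

(On Java 7+ the try block would naturally become try-with-resources, but the finally form matches the suggestion as written.)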
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229296#comment-13229296 ] Michael McCandless commented on LUCENE-3867: bq. I don't like this special handling of Strings, to be honest.

I'm confused: what special handling of Strings are we talking about...? You mean that sizeOf(String) doesn't return the correct answer if the string came from a previous .substring (.split too) call...? If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?
[jira] [Commented] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229298#comment-13229298 ] Hoss Man commented on SOLR-3241: Patch looks fine ... I wish there was a way to make it easier for poly fields so they wouldn't have to do the check themselves, but when I tried the idea I had it didn't work, so better to go with this for now and maybe refactor a helper method later. The few changes I would make:

1) Make the new tests grab the IndexSchema object and assert that every field (that the test cares about) has the expected omitNorms value -- future-proof ourselves against someone neutering the test without realizing it by tweaking the test schema, because they don't know that there is a specific reason for those omitNorms settings.

2) Add a test that explicitly verifies the failure case of someone setting a field boost on a field with omitNorms==true, and assert that we get the expected error message (doesn't look like this was added when LUCENE-3796 was committed, and we want to make sure we don't inadvertently break that error check).

Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch

After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's document index-time boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example:

<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>

I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229311#comment-13229311 ] Dawid Weiss commented on LUCENE-3867:

{code}
/** Returns the size in bytes of the String[] object. */
public static int sizeOf(String[] arr) {
  int size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
  for (String s : arr) {
    size += sizeOf(s);
  }
  return size;
}

/** Returns the approximate size of a String object. */
public static int sizeOf(String str) {
  // String's char[] size
  int arraySize = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_CHAR * str.length());

  // String's raw object size
  int objectSize = alignObjectSize(NUM_BYTES_OBJECT_REF /* array reference */
      + 3 * NUM_BYTES_INT /* String holds 3 integers */
      + NUM_BYTES_OBJECT_HEADER /* String object header */);

  return objectSize + arraySize;
}
{code}

What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings. If they point to a single char[], this should simply count the object overhead, not count every character N times as it would do now. This isn't sizeOf(); this is sum(string lengths * 2) + epsilon to me. I'd keep RamUsageEstimator exactly what the name says -- an estimation of the actual memory taken by a given object. A string can point to a char[], and if so this should be traversed as an object and counted once.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229313#comment-13229313 ] Dawid Weiss commented on LUCENE-3867: - bq. If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]? Same as with other objects -- traverse its fields and count them (once, building an identity set for all objects reachable from the root)? RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect - Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While I was at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods. It's not perfect, and there's some room for improvement I'm sure; here it is: {code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safety for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code} If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
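For readers following along, the arithmetic in the quoted passage can be sketched as plain Java. This is a hypothetical standalone helper, not Lucene's code; the 12-byte header and 8-byte alignment match the layout the quoted page describes, and real values vary by JVM vendor and pointer size.

```java
// Hypothetical sketch of array memory accounting, not Lucene's actual code.
// Assumes the layout described in the quoted page: a 12-byte array header
// (8-byte object header + 4-byte length, with no extra object ref) and
// 8-byte object alignment.
public class ArraySizeSketch {
    static final int NUM_BYTES_ARRAY_HEADER = 12;
    static final int OBJECT_ALIGNMENT = 8;

    /** Rounds a raw size up to the next object-alignment boundary. */
    static long alignObjectSize(long size) {
        return (size + OBJECT_ALIGNMENT - 1) & ~(OBJECT_ALIGNMENT - 1L);
    }

    /** Array size = header + numElements * bytesPerElement, then aligned. */
    static long sizeOfArray(int numElements, int bytesPerElement) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) numElements * bytesPerElement);
    }

    public static void main(String[] args) {
        System.out.println(sizeOfArray(1, 1));   // new byte[1]: 12 + 1 = 13 -> 16
        System.out.println(sizeOfArray(10, 4));  // new int[10]: 12 + 40 = 52 -> 56
    }
}
```

Note that the header already accounts for the length field, which is why adding NUM_BYTES_OBJECT_REF on top of it over-counts.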
Getting bug fix from Lucene into Lucene.Net
I am running into the following problem discussed and fixed in the current Lucene Java version: While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get a 0 (ZERO) value in the sort. (Tested against Double, Float, Int and Long numeric fields, in ascending and descending order.) This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would be either allowing the user to define such a non-value default, or always returning those document results last. Is this something that can get into Lucene.Net soon? See https://issues.apache.org/jira/browse/LUCENE-3390 for complete information on the Lucene issue and fix. Tom Cabanski
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229332#comment-13229332 ] Shai Erera commented on LUCENE-3867: bq. What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings. So you mean you'd want sizeOf(String[]) to be just that? {code}
return alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
{code} I don't mind. I just thought that since we know how to compute sizeOf(String), we can use that. It's an extreme case, I think, that someone will want to compute the size of a String[] whose elements share the same char[] instance ... but if it bothers you that much, I don't mind simplifying it and documenting that it computes the raw size of the String[]. But I don't think that we should change sizeOf(String) to not count the char[] size. It's part of the object, and really it's a String; it's not like we're trying to compute the size of a general object. bq. Same as with other objects – traverse its fields and count them RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating.
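For illustration, the "shallow" variant in the one-line snippet above could look like this as a standalone class. The constant values here are assumptions for 64-bit HotSpot with compressed oops (16-byte array header, 4-byte refs); they are not portable, and Lucene computes its real constants differently.

```java
// Hypothetical sketch of the "shallow" sizeOf(String[]) discussed above:
// count only the array object itself (header + one reference per slot),
// not the String objects or char[] buffers it points to.
public class ShallowStringArraySize {
    // Assumed values for 64-bit HotSpot with compressed oops.
    static final int NUM_BYTES_ARRAY_HEADER = 16;
    static final int NUM_BYTES_OBJECT_REF = 4;
    static final int ALIGNMENT = 8;

    static long alignObjectSize(long size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1L);
    }

    static long shallowSizeOf(String[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER
            + (long) NUM_BYTES_OBJECT_REF * arr.length);
    }

    public static void main(String[] args) {
        String[] three = { "a", "bb", "ccc" };
        // 16 + 3 * 4 = 28, aligned up to 32 -- regardless of string contents
        System.out.println(shallowSizeOf(three));
    }
}
```

The result depends only on arr.length, which is exactly the point of the shallow definition: shared or sub-sliced char[] buffers can no longer cause over-counting.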
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3867: -- Attachment: LUCENE-3867.patch Hi Shai, can you try this patch with J9 or maybe JRockit (Robert)? If you use one of those JVMs you may have to explicitly enable compressed Oops/refs!
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3241: -- Attachment: SOLR-3241.patch updated patch with hossman's suggested test improvements. I'll commit soon. Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's document index-time boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example: <field name="author" type="text" indexed="true" stored="false" omitNorms="false"/> <field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/> <copyField source="author" dest="author_display"/> I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229339#comment-13229339 ] Dawid Weiss commented on LUCENE-3867: - bq. RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating. Yeah, that's exactly what I didn't like. All the primitive/primitive array methods are fine, but why make things inconsistent with sizeOf(String)? I'd rather have the reflection-based method estimate the size of a String/String[]. Like we mentioned, it's always a matter of speed vs. accuracy, but here I'd opt for accuracy, because the output can be off by a lot if you make substrings along the way (not to mention it assumes details about String's internal implementation which may or may not be true, depending on the vendor). Do you have a need for this method, Shai? If you don't, then why not wait (with this part) until such a need arises?
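The reflection-based traversal discussed in these comments (walk everything reachable from a root exactly once, using an identity set so a shared char[] is not double-counted) can be sketched roughly as below. This is a hypothetical simplification, not RUE's estimateRamUsage implementation: it counts reachable objects rather than bytes, to stay JVM-vendor-neutral, where a real estimator would add a per-object shallow size at each visit.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: visit every object reachable from a root exactly
// once, de-duplicating by identity so a shared buffer (e.g. one char[]
// backing several substrings) is counted only once.
public class ReachableObjects {

    public static int countReachable(Object root) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        push(stack, root);
        int count = 0;
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            if (!seen.add(o)) continue; // already counted (identity comparison)
            count++;
            Class<?> clazz = o.getClass();
            if (clazz.isArray()) {
                if (!clazz.getComponentType().isPrimitive()) {
                    for (int i = 0; i < Array.getLength(o); i++) {
                        push(stack, Array.get(o, i));
                    }
                }
                continue; // primitive arrays hold no references
            }
            for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                    try {
                        f.setAccessible(true);
                        push(stack, f.get(o));
                    } catch (ReflectiveOperationException | RuntimeException e) {
                        // JDK-internal fields may be inaccessible on newer JVMs; skip them
                    }
                }
            }
        }
        return count;
    }

    private static void push(Deque<Object> stack, Object o) {
        if (o != null) stack.push(o); // ArrayDeque rejects nulls
    }

    public static void main(String[] args) {
        char[] buffer = new char[8];
        Object[] holders = { buffer, buffer }; // two refs, one shared buffer
        System.out.println(countReachable(holders)); // array + buffer = 2
    }
}
```

This is the accuracy-over-speed trade-off Dawid argues for: the traversal sees the true shared structure, where a formula-based sizeOf(String) cannot.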
Re: Getting bug fix from Lucene Java into Lucene.Net
That largely depends on how different the 3.4/3.5 Java API is from the current version in our trunk (3.0.3). From a maintainability standpoint, it would be easier for us to fix this bug when we've ported Lucene.NET to the same version that Java was at when it was fixed. It looks like they added a class and changed another's signature, and then changed all of the comparator classes to inherit from the new one instead of the old. I don't know what that would do to our current version of the code, or if it can be worked in without major changes. Worst case scenario, we can't get it in until we reach 3.4, but I can certainly see what we can do to get it in earlier. In the meantime, it seems that Hoss Man has noted a workaround here https://issues.apache.org/jira/browse/LUCENE-3390?focusedCommentId=13088832page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13088832 that can be used. Thanks, Christopher On Wed, Mar 14, 2012 at 6:04 AM, Tom Cabanski t...@cabanski.com wrote: > I am running into the following problem discussed and fixed in the current Lucene Java version: [...]
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ] Harley Parks edited comment on SOLR-2155 at 3/14/12 5:00 PM: - For some reason package solr2155.lucene.spatial.geometry.shape; is misnamed, and there are some other issues with the build... but I'm trying to use Eclipse with a Maven build, and might be missing something else. So: downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin to PATH, unzipped and cd'd to Solr2155-1.0.3-project, executed mvn package in a cmd window, and it built nicely. Then added Solr2155-1.0.3.jar to tomcat/solr/lib. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes, so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229344#comment-13229344 ] Shai Erera commented on LUCENE-3867: bq. Do you have a need for this method, Shai? I actually started this issue because of this method :). I wrote the method for my own code, then spotted the bug in ARRAY_HEADER, and along the way thought it would be good if RUE offered it, so other people can benefit from it too. Because from my experience, after I put code in Lucene, very smart people improve and optimize it, and I benefit from it in new releases. So while I could keep sizeOf(String) in my own code, I know that Uwe/Robert/Mike/You will make it more efficient when Java 7/8/9 comes out, while I'll totally forget about it! :)
[jira] [Created] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
UpdateRequestProcessor to extract Solr XML from rich documents -- Key: SOLR-3246 URL: https://issues.apache.org/jira/browse/SOLR-3246 Project: Solr Issue Type: New Feature Components: update Reporter: Emmanuel Espina Priority: Minor This would be an update request handler to save a file with the XML that represents the document in an external directory. The original idea behind this was to add it to the processing chain of the ExtractingRequestHandler to store an already-parsed version of the docs. This storage of pre-parsed documents will make re-indexing of the entire index faster (avoiding the Tika phase, and just sending the XML to the standard update processor). As a side effect, extracting the XML can make debugging of rich docs easier.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229363#comment-13229363 ] Dawid Weiss commented on LUCENE-3867: - Yeah... well... I'm flattered :) I'm still -1 on adding this particular method, because I don't like being surprised at how a method works, and this is surprising behavior to me, especially in this class (even if it's documented in the javadoc, but who reads it anyway, right?). If others don't share my opinion, then can we at least rename this method to sizeOfBlah(..), where Blah is something that would indicate it's not actually taking char buffer sharing or sub-slicing into account (suggestions for Blah welcome)?
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ] Harley Parks edited comment on SOLR-2155 at 3/14/12 5:14 PM: - For some reason package solr2155.lucene.spatial.geometry.shape; is misnamed, and there are some other issues with the build... but I'm trying to use Eclipse with a Maven build, and might be missing something else. So: downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin to PATH, unzipped and cd'd to Solr2155-1.0.3-project, executed mvn package in a cmd window, and it built nicely. Then added Solr2155-1.0.3.jar to tomcat/solr/lib and followed the readme.txt instructions to update the Solr schema.
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ]

Harley Parks edited comment on SOLR-2155 at 3/14/12 5:16 PM:
-

For some reason the package solr2155.lucene.spatial.geometry.shape is misnamed, and there were some other issues with the build. I'm trying to use Eclipse with a Maven build and might be missing something else. So: I downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin directory to PATH, unzipped and cd'd into Solr2155-1.0.3-project, ran mvn package in a cmd window, and it built nicely. I then added Solr2155-1.0.3.jar to tomcat/solr/lib and followed the readme.txt instructions to update the Solr schema. It's now working, and the GeoHash field no longer shows a lat,long but a geohash.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229400#comment-13229400 ]

Mark Miller commented on LUCENE-3867:
-

estimateSizeOf(..)
guessSizeOf(..)
wildGuessSizeOf(..)
incorrectSizeOf(..)
sizeOfWeiss(..)
weissSize(..)
sizeOfButWithoutTakingIntoAccountCharBufferSharingOrSubSlicingSeeJavaDoc(..)

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
-

Key: LUCENE-3867
URL: https://issues.apache.org/jira/browse/LUCENE-3867
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote}
A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
{quote}

While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods. It's not perfect -- there's some room for improvement, I'm sure -- but here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6                        // chars + additional safety for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT          // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER     // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER;   // String object
}
{code}

If people are not against it, I'd also like to add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
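The corrected formula the issue argues for -- array header (object header plus a four-byte length, with no extra object-ref term) plus element data, rounded up to the JVM's 8-byte alignment -- amounts to simple arithmetic. A back-of-the-envelope sketch, where the 16-byte header is an assumed 64-bit HotSpot value, not a portable constant:

```java
// Back-of-the-envelope array footprint, following the corrected formula from
// this issue: array header + element data, rounded up to 8-byte alignment.
// The 16-byte header is an assumption (64-bit HotSpot); real values vary by JVM.
public final class ArrayFootprint {
    static final int NUM_BYTES_ARRAY_HEADER = 16;

    public static long sizeOfArray(int length, int bytesPerElement) {
        long size = NUM_BYTES_ARRAY_HEADER + (long) length * bytesPerElement;
        return (size + 7) & ~7L; // round up to an 8-byte boundary
    }
}
```

Under these assumptions a byte[10] comes out as align8(16 + 10) = 32 bytes, and an int[3] as align8(16 + 12) = 32 bytes.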
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229410#comment-13229410 ]

Michael McCandless commented on LUCENE-3867:
-

{quote}
bq. If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

Same as with other objects -- traverse its fields and count them (once, building an identity set for all objects reachable from the root)?
{quote}

Aha, cool! I hadn't realized RUE can crawl into the private char[] inside String and count up the RAM usage correctly. That's nice. Maybe lowerBoundSizeOf(...)? Or maybe we don't add the new String methods (sizeOf(String), sizeOf(String[])) and document somewhere that you should do new RUE().size(String/String[]) instead...? Hmm, or maybe we do add the methods, but implement them under the hood with that?
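The traversal Dawid describes -- walk every reachable object's fields, using an identity set so shared objects are counted once -- can be sketched as below. This is a simplified illustration, not Lucene's RamUsageEstimator: the per-slot constants are assumed 64-bit HotSpot values, and reflective access to JDK-internal classes (such as String's private array) may be refused on modern JVMs, so the sketch simply skips inaccessible fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;

public final class GraphSizeSketch {
    // Assumed 64-bit HotSpot costs with compressed oops; not portable constants.
    static final int OBJECT_HEADER = 16, ARRAY_HEADER = 16, OBJECT_REF = 4;

    static int primitiveBytes(Class<?> t) {
        if (t == long.class || t == double.class) return 8;
        if (t == int.class || t == float.class) return 4;
        if (t == char.class || t == short.class) return 2;
        return 1; // byte, boolean
    }

    static long align(long n) { return (n + 7) & ~7L; }

    /** Walks the graph once; the identity map ensures shared objects are counted a single time. */
    public static long sizeOf(Object root) {
        if (root == null) return 0;
        IdentityHashMap<Object, Boolean> seen = new IdentityHashMap<>();
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(root);
        long total = 0;
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            if (seen.put(o, Boolean.TRUE) != null) continue; // already counted
            Class<?> c = o.getClass();
            if (c.isArray()) {
                int len = java.lang.reflect.Array.getLength(o);
                Class<?> comp = c.getComponentType();
                if (comp.isPrimitive()) {
                    total += align(ARRAY_HEADER + (long) len * primitiveBytes(comp));
                } else {
                    total += align(ARRAY_HEADER + (long) len * OBJECT_REF);
                    for (int i = 0; i < len; i++) {
                        Object e = java.lang.reflect.Array.get(o, i);
                        if (e != null) stack.push(e);
                    }
                }
                continue;
            }
            long shallow = OBJECT_HEADER;
            for (Class<?> k = c; k != null; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (f.getType().isPrimitive()) {
                        shallow += primitiveBytes(f.getType());
                    } else {
                        shallow += OBJECT_REF;
                        try {
                            f.setAccessible(true);
                            Object v = f.get(o);
                            if (v != null) stack.push(v);
                        } catch (Exception e) {
                            // JDK-internal fields may be inaccessible; skip them in this sketch.
                        }
                    }
                }
            }
            total += align(shallow);
        }
        return total;
    }
}
```

Because the identity set records objects, not values, a char[] referenced from two Strings contributes its bytes only once, which is the behavior the comment above relies on.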
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ]

Harley Parks edited comment on SOLR-2155 at 3/14/12 5:36 PM:
-

For some reason the package solr2155.lucene.spatial.geometry.shape is misnamed, and there were some other issues with the build. I'm trying to use Eclipse with a Maven build and might be missing something else. So: I downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin directory to PATH, unzipped and cd'd into Solr2155-1.0.3-project, ran mvn package in a cmd window, and it built nicely. I then added Solr2155-1.0.3.jar to tomcat/solr/lib and followed the readme.txt instructions to update the Solr schema. It's now working, and the GeoHash field no longer shows a lat,long but a geohash... is that expected? Example:

{noformat}
<doc><float name="score">1.0</float><arr name="GeoTagGeoHash"><str>87zdk9gyt4kz</str>
{noformat}
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229423#comment-13229423 ]

Harley Parks commented on SOLR-2155:
-

Did I need to rebuild the index after making changes to the schema? What does a valid URL query for a geohash look like?
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229440#comment-13229440 ]

Dawid Weiss commented on LUCENE-3867:
-

bq. sizeOfWeiss(..)

We're talking some serious dimensions here, beware of buffer overflows!

bq. Or maybe we don't add the new string methods (sizeOf(String), sizeOf(String[])) and somewhere document that you should do new RUE().size(String/String[]) instead..

This is something I would go for -- it's consistent with what I would consider this class's logic. I would even change it to sizeOf(Object) -- this would be a static shortcut to just measure an object's size, no strings attached. Kabutz's code also distinguishes interned strings / cached boxed integers and enums. This could be a switch, much like it is now with interned Strings. Then this would really be either an upper (why lower, Mike?) bound or something that tries to be close to the exact memory consumption. A fun way to determine whether we're right would be to run a benchmark with -Xmx20m and test how close we can get to the main memory pool's maximum value before an OOM is thrown. :)
[jira] [Resolved] (SOLR-3241) Document boost fails if a field copy omits the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-3241.
---
Resolution: Fixed
Assignee: Robert Muir

Thanks Tomás!

Document boost fails if a field copy omits the norms
--

Key: SOLR-3241
URL: https://issues.apache.org/jira/browse/SOLR-3241
Project: Solr
Issue Type: Bug
Reporter: Tomás Fernández Löbbe
Assignee: Robert Muir
Fix For: 3.6, 4.0
Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch

After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's index-time document boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and a document boost is used. For example:

{code}
<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>
{code}

I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229456#comment-13229456 ]

Michael McCandless commented on LUCENE-3867:
-

bq. (why lower, Mike?)

Oh, I just meant that the sizeOf(String) impl in the current patch is a lower bound, since it guesses the private char[] length by calling String.length(), which is a lower bound on the actual char[] length.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229460#comment-13229460 ]

Dawid Weiss commented on LUCENE-3867:
-

John Rose just replied to my question -- there are fields in Unsafe that allow array scaling (1.7). Check these out:

{noformat}
ARRAY_BOOLEAN_INDEX_SCALE = theUnsafe.arrayIndexScale(boolean[].class);
ARRAY_BYTE_INDEX_SCALE    = theUnsafe.arrayIndexScale(byte[].class);
ARRAY_SHORT_INDEX_SCALE   = theUnsafe.arrayIndexScale(short[].class);
ARRAY_CHAR_INDEX_SCALE    = theUnsafe.arrayIndexScale(char[].class);
ARRAY_INT_INDEX_SCALE     = theUnsafe.arrayIndexScale(int[].class);
ARRAY_LONG_INDEX_SCALE    = theUnsafe.arrayIndexScale(long[].class);
ARRAY_FLOAT_INDEX_SCALE   = theUnsafe.arrayIndexScale(float[].class);
ARRAY_DOUBLE_INDEX_SCALE  = theUnsafe.arrayIndexScale(double[].class);
ARRAY_OBJECT_INDEX_SCALE  = theUnsafe.arrayIndexScale(Object[].class);
ADDRESS_SIZE = theUnsafe.addressSize();
{noformat}

So... there is a (theoretical?) possibility that, say, byte[] is machine word-aligned :) I bet any RAM estimator written so far will be screwed if this happens :)
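These scaling constants can be queried directly on a running JVM. A hedged sketch: sun.misc.Unsafe is unsupported API, fetched here via the usual theUnsafe reflection trick (which works on HotSpot but may warn or fail elsewhere), and the printed values are JVM-specific, not guaranteed:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class ArrayScaleProbe {
    // sun.misc.Unsafe is unsupported API; obtaining it reflectively works on
    // HotSpot but may emit warnings or fail on other/future JVMs.
    static Unsafe unsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    public static void main(String[] args) throws Exception {
        Unsafe u = unsafe();
        // Per-element stride for each array type; on HotSpot this matches the
        // element size (and 4 for Object[] with compressed oops).
        System.out.println("byte[]   scale = " + u.arrayIndexScale(byte[].class));
        System.out.println("char[]   scale = " + u.arrayIndexScale(char[].class));
        System.out.println("long[]   scale = " + u.arrayIndexScale(long[].class));
        System.out.println("Object[] scale = " + u.arrayIndexScale(Object[].class));
        // Offset of element 0, i.e. the actual array header size on this JVM.
        System.out.println("byte[]   base  = " + u.arrayBaseOffset(byte[].class));
        System.out.println("address size   = " + u.addressSize());
    }
}
```

arrayBaseOffset is the interesting one for this issue: it reports the real array header size on the running JVM, so an estimator could read it at startup instead of hard-coding NUM_BYTES_ARRAY_HEADER.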
[jira] [Commented] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
[ https://issues.apache.org/jira/browse/SOLR-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229468#comment-13229468 ]

Emmanuel Espina commented on SOLR-3246:
---

This is similar to https://issues.apache.org/jira/browse/SOLR-903, but this would be a server-side component.

UpdateRequestProcessor to extract Solr XML from rich documents
--

Key: SOLR-3246
URL: https://issues.apache.org/jira/browse/SOLR-3246
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Emmanuel Espina
Priority: Minor

This would be an update request handler that saves a file with the XML representing the document in an external directory. The original idea behind this was to add it to the processing chain of the ExtractingRequestHandler to store an already-parsed version of the docs. This storage of pre-parsed documents will make re-indexing of the entire index faster (avoiding the Tika phase and just sending the XML to the standard update processor). As a side effect, extracting the XML can make debugging of rich docs easier.
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229477#comment-13229477 ]

Aliaksandr Zhuhrou commented on SOLR-3244:
--

I checked your patch and it works on Tomcat.

New Admin UI doesn't work on tomcat
---

Key: SOLR-3244
URL: https://issues.apache.org/jira/browse/SOLR-3244
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 4.0
Reporter: Aliaksandr Zhuhrou
Assignee: Uwe Schindler
Attachments: SOLR-3244.patch

I am currently unable to open the admin interface when using a WAR deployment under a Tomcat server. The stack trace:

{noformat}
SEVERE: Servlet.service() for servlet [LoadAdminUI] in context with path [/solr] threw exception
java.lang.NullPointerException
	at java.io.File.<init>(File.java:251)
	at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:50)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:292)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{noformat}

Tomcat version: Apache Tomcat/7.0.23
Java version: jdk1.7.0_02

I did some debugging and found that the problem is that resolving the resource path is delegated to org.apache.naming.resources.WARDirContext, which simply returns null for any input:

{code}
/**
 * Return the real path for a given virtual path, if possible; otherwise
 * return <code>null</code>.
 *
 * @param path The path to the desired resource
 */
@Override
protected String doGetRealPath(String path) {
    return null;
}
{code}

Need to check the specification, because it may actually be a Tomcat bug. We may try using the getResourceAsStream(java.lang.String path) method, which should work even for a WAR.
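The NPE at the top of that trace is easy to reproduce: java.io.File's String constructor throws NullPointerException for a null pathname, which is what happens when getRealPath(...) returns null and its result is passed straight to new File(...). A minimal reproduction (illustrative only, not the servlet code; the resolve helper is hypothetical):

```java
import java.io.File;

public final class RealPathNpeDemo {
    /** Mimics the servlet passing getRealPath()'s result straight to File. */
    static File resolve(String realPath) {
        // When the container (here, Tomcat's WARDirContext) returns null for
        // the real path, this constructor throws NullPointerException,
        // matching the reported stack trace. Checking for null -- or reading
        // the resource via getResourceAsStream(path) -- avoids the problem.
        return new File(realPath);
    }

    public static void main(String[] args) {
        try {
            resolve(null);
            System.out.println("no exception (unexpected)");
        } catch (NullPointerException expected) {
            System.out.println("NullPointerException, as in the reported trace");
        }
    }
}
```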
[jira] [Updated] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
[ https://issues.apache.org/jira/browse/SOLR-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Espina updated SOLR-3246:
----------------------------------
    Attachment: SOLR-3246.patch

Initial code for this component (with a very simple test).

UpdateRequestProcessor to extract Solr XML from rich documents
--------------------------------------------------------------

Key: SOLR-3246
URL: https://issues.apache.org/jira/browse/SOLR-3246
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Emmanuel Espina
Priority: Minor
Attachments: SOLR-3246.patch

This would be an update request processor that saves a file with the XML representation of the document to an external directory. The original idea was to add it to the processing chain of the ExtractingRequestHandler to store an already-parsed version of the docs. Storing pre-parsed documents will make re-indexing the entire index faster (avoiding the Tika phase and just sending the XML to the standard update processor). As a side effect, extracting the XML can make debugging of rich docs easier.
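The XML such a processor would write is Solr's standard `<doc>`/`<field>` update format. A minimal, hypothetical sketch of producing it follows — this is not code from SOLR-3246.patch, and a real processor would iterate a SolrInputDocument rather than a plain map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrXmlSketch {

  // Builds the Solr XML update format for one document:
  // <doc><field name="...">value</field>...</doc>
  static String toSolrXml(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder("<doc>");
    for (Map.Entry<String, String> e : fields.entrySet()) {
      sb.append("<field name=\"").append(e.getKey()).append("\">")
        .append(escape(e.getValue()))
        .append("</field>");
    }
    return sb.append("</doc>").toString();
  }

  // Escapes XML entities; & must be replaced first so we don't
  // double-escape the &lt;/&gt; replacements.
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("id", "1");
    fields.put("title", "Tom & Jerry");
    System.out.println(toSolrXml(fields));
    // <doc><field name="id">1</field><field name="title">Tom &amp; Jerry</field></doc>
  }
}
```

The output could then be written to the external directory and later re-posted to the standard update handler, skipping Tika.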
[jira] [Created] (SOLR-3247) LBHttpSolrServer constructor ignores passed in ResponseParser
LBHttpSolrServer constructor ignores passed in ResponseParser
-------------------------------------------------------------

Key: SOLR-3247
URL: https://issues.apache.org/jira/browse/SOLR-3247
Project: Solr
Issue Type: Bug
Reporter: Grant Ingersoll
Priority: Minor
Fix For: 4.0

The constructor on line 191 accepts a ResponseParser object, but it ignores it. We should either drop that constructor or honor the parser it is given.
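The bug pattern here — a constructor parameter that is accepted but never applied — and the "honor it" fix can be sketched without SolrJ. Everything below is a hypothetical slimmed-down stand-in, not the real LBHttpSolrServer:

```java
public class ParserSketch {

  // Stand-in for org.apache.solr.client.solrj.ResponseParser.
  interface ResponseParser {
    String contentType();
  }

  // Hypothetical slimmed-down client. The fix for the SOLR-3247 pattern is
  // the single assignment in the constructor: the parameter must actually
  // be stored instead of silently dropped.
  static class LoadBalancedClient {
    private final ResponseParser parser;

    LoadBalancedClient(ResponseParser parser) {
      this.parser = parser; // the buggy constructor omitted this step
    }

    ResponseParser getParser() {
      return parser;
    }
  }

  public static void main(String[] args) {
    ResponseParser xml = () -> "application/xml";
    LoadBalancedClient client = new LoadBalancedClient(xml);
    System.out.println(client.getParser().contentType()); // application/xml
  }
}
```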
[jira] [Created] (SOLR-3248) CloudSolrServer should add methods to make it easier to set the collection on a per request basis
CloudSolrServer should add methods to make it easier to set the collection on a per request basis
-------------------------------------------------------------------------------------------------

Key: SOLR-3248
URL: https://issues.apache.org/jira/browse/SOLR-3248
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Fix For: 4.0

It would be good if CloudSolrServer added methods that make it easier to specify the collection, such as when adding documents. Right now, one has to use the UpdateRequest approach, which is more cumbersome.
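One possible shape for the requested convenience is an `add(...)` overload that takes the target collection directly and delegates to the request-object path internally. The sketch below is entirely hypothetical — the class, method names, and string-based "documents" are stand-ins, not SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionApiSketch {

  // Hypothetical slimmed-down cloud client.
  static class CloudClient {
    final List<String> sent = new ArrayList<>();

    // Existing, cumbersome path: the caller builds the request object
    // itself (UpdateRequest in real SolrJ) and sets the collection on it.
    void request(String collection, String doc) {
      sent.add(collection + ":" + doc);
    }

    // Proposed sugar: the collection is specified per call, and the
    // method delegates to the verbose path.
    void add(String collection, String doc) {
      request(collection, doc);
    }
  }

  public static void main(String[] args) {
    CloudClient client = new CloudClient();
    client.add("collection1", "doc-42");
    System.out.println(client.sent); // [collection1:doc-42]
  }
}
```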
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229575#comment-13229575 ]

Uwe Schindler commented on SOLR-3244:
-------------------------------------

Fine, I will commit this now!
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229579#comment-13229579 ]

Uwe Schindler commented on LUCENE-3867:
---------------------------------------

So the whole Oops MBean magic is obsolete...

ADDRESS_SIZE = theUnsafe.addressSize();

woooah, so simple - works on more platforms for guessing! I will check this out with the usual reflection magic :-)

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
-----------------------------------------------------

Key: LUCENE-3867
URL: https://issues.apache.org/jira/browse/LUCENE-3867
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote}
A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
{quote}

While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods. It's not perfect, there's some room for improvement I'm sure; here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
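For the proposed sizeOf(int[] / byte[] / long[] / ...) helpers, the corrected accounting would be: array header (object header plus the four-byte length, with no extra object reference) plus the elements, rounded up to the JVM's object alignment. A minimal sketch, assuming fixed constants (NUM_BYTES_ARRAY_HEADER = 12 as the javamex page describes, 8-byte alignment) — the real RamUsageEstimator derives these at runtime, so the numbers here are illustrative only:

```java
public class ArraySizeSketch {
  // Assumed constants for a JVM with an 8-byte object header plus a
  // 4-byte array length (NUM_BYTES_ARRAY_HEADER = 12) and 8-byte
  // object alignment. Real JVMs vary; RamUsageEstimator probes them.
  static final int NUM_BYTES_ARRAY_HEADER = 12;
  static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;

  // Size of a primitive array: header + elements, rounded up to alignment.
  static long sizeOfArray(int length, int bytesPerElement) {
    long size = NUM_BYTES_ARRAY_HEADER + (long) length * bytesPerElement;
    long rem = size % NUM_BYTES_OBJECT_ALIGNMENT;
    return rem == 0 ? size : size + (NUM_BYTES_OBJECT_ALIGNMENT - rem);
  }

  public static void main(String[] args) {
    System.out.println(sizeOfArray(10, 4)); // int[10]: 12 + 40 = 52, aligned to 56
    System.out.println(sizeOfArray(3, 8));  // long[3]: 12 + 24 = 36, aligned to 40
  }
}
```

A sizeOf(String[]) would additionally add NUM_BYTES_OBJECT_REF per element plus the size of each referenced String.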
[jira] [Updated] (LUCENE-3842) Analyzing Suggester
[ https://issues.apache.org/jira/browse/LUCENE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3842:
--------------------------------
    Attachment: LUCENE-3842.patch

Updated patch, tying in Mike's patch too. Currently my silly test fails because it trips Mike's assert: it starts with a stopword :)

Analyzing Suggester
-------------------

Key: LUCENE-3842
URL: https://issues.apache.org/jira/browse/LUCENE-3842
Project: Lucene - Java
Issue Type: New Feature
Components: modules/spellchecker
Affects Versions: 3.6, 4.0
Reporter: Robert Muir
Attachments: LUCENE-3842-TokenStream_to_Automaton.patch, LUCENE-3842.patch, LUCENE-3842.patch

Since we added shortest-path wFSA search in LUCENE-3714, and generified the comparator in LUCENE-3801, I think we should look at implementing suggesters that have more capabilities than just basic prefix matching. In particular, I think the most flexible approach is to integrate with Analyzer at both build and query time, such that we build a wFST with:

input: analyzed text such as ghost0christmas0past -- byte 0 here is an optional token separator
output: surface form such as the ghost of christmas past
weight: the weight of the suggestion

We make an FST with PairOutputs<weight,output>, but only do the shortest-path operation on the weight side (like the test in LUCENE-3801), at the same time accumulating the output (surface form), which will be the actual suggestion. This allows a lot of flexibility:

* Even using StandardAnalyzer means you can offer suggestions that ignore stopwords, e.g. if you type in "ghost of chr...", it will suggest "the ghost of christmas past"
* We can add support for synonyms/wdf/etc at both index and query time (there are tradeoffs here, and this is not implemented!)
* This is a basis for more complicated suggesters such as Japanese suggesters, where the analyzed form is in fact the reading, so we would add a TokenFilter that copies ReadingAttribute into term text to support that...
* Other general things like offering suggestions that are more fuzzy, like using a plural stemmer or ignoring accents or whatever.

According to my benchmarks, suggestions are still very fast with the prototype (e.g. ~100,000 QPS), and the FST size does not explode (it's short of twice that of a regular wFST, but this is still far smaller than TST or JaSpell, etc).
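The analyzed-key-to-surface-form idea above can be illustrated with a sorted map standing in for the wFST. This is purely a conceptual sketch, not the LUCENE-3842 patch: a TreeMap plays the role of the FST with PairOutputs, the '\u0000' separator is as described in the issue, and weights are omitted:

```java
import java.util.Map;
import java.util.TreeMap;

public class SuggesterSketch {

  // Prefix search on the analyzed side: returns the surface form of the
  // first entry whose analyzed key starts with the analyzed prefix, or
  // null if none. A real wFST would return the lowest-weight completion.
  static String suggest(TreeMap<String, String> fst, String analyzedPrefix) {
    Map.Entry<String, String> hit = fst.ceilingEntry(analyzedPrefix);
    return (hit != null && hit.getKey().startsWith(analyzedPrefix))
        ? hit.getValue() : null;
  }

  public static void main(String[] args) {
    // Keys are analyzed forms (stopwords removed, '\u0000' as the token
    // separator); values are the surface forms to suggest.
    TreeMap<String, String> fst = new TreeMap<>();
    fst.put("ghost\u0000christmas\u0000past", "the ghost of christmas past");
    fst.put("ghost\u0000rider", "ghost rider");

    // Analyzing "ghost of chr" with a stopword-removing analyzer would
    // yield "ghost\u0000chr".
    System.out.println(suggest(fst, "ghost\u0000chr"));
    // the ghost of christmas past
  }
}
```

The point of the design is exactly this decoupling: the lookup key is whatever the analysis chain produces, while the stored output is the human-readable suggestion.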
[jira] [Commented] (LUCENE-3842) Analyzing Suggester
[ https://issues.apache.org/jira/browse/LUCENE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229584#comment-13229584 ]

Robert Muir commented on LUCENE-3842:
-------------------------------------

I also don't think we really need this generic getFiniteStrings; it's just to get it off the ground. We can just write the possibilities on the fly, I think, and it will be simpler...
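For context on what "getFiniteStrings" produces: enumerating every string accepted by an acyclic automaton is a plain depth-first walk. A toy sketch over an adjacency-map automaton — not Lucene's automaton API, and the state/label representation here is invented for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FiniteStringsSketch {

  // Enumerates all strings accepted by an acyclic automaton given as
  // state -> (label -> target state), with a set of accept states.
  static List<String> finiteStrings(Map<Integer, Map<Character, Integer>> arcs,
                                    Set<Integer> accept, int start) {
    List<String> out = new ArrayList<>();
    dfs(arcs, accept, start, new StringBuilder(), out);
    Collections.sort(out);
    return out;
  }

  // DFS: append each arc label, recurse, then backtrack; emit the current
  // path whenever an accept state is reached.
  static void dfs(Map<Integer, Map<Character, Integer>> arcs, Set<Integer> accept,
                  int state, StringBuilder path, List<String> out) {
    if (accept.contains(state)) out.add(path.toString());
    for (Map.Entry<Character, Integer> arc
        : arcs.getOrDefault(state, Map.of()).entrySet()) {
      path.append(arc.getKey());
      dfs(arcs, accept, arc.getValue(), path, out);
      path.deleteCharAt(path.length() - 1);
    }
  }

  public static void main(String[] args) {
    // Automaton for a(b|c): 0 -a-> 1, 1 -b-> 2, 1 -c-> 2, accept {2}.
    Map<Integer, Map<Character, Integer>> arcs = new HashMap<>();
    arcs.put(0, Map.of('a', 1));
    arcs.put(1, Map.of('b', 2, 'c', 2));
    System.out.println(finiteStrings(arcs, Set.of(2), 0)); // [ab, ac]
  }
}
```

Writing the possibilities "on the fly", as suggested above, would interleave this walk with FST construction instead of materializing the full list first.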
[jira] [Resolved] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved SOLR-3244.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 4.0

Committed trunk revision: 1300710

Thanks Aliaksandr!