[jira] [Created] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect - Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.6, 4.0

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote}

While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods. It's not perfect, there's some room for improvement I'm sure. Here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
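The primitive-array helpers Shai proposes at the end of the report could be sketched as follows. This is an illustrative sketch, not Lucene's code: the constants are assumptions for one particular layout (a 64-bit JVM without compressed oops), real values vary by JVM and flags, and only the corrected array-header formula (object header + length int, without the object-ref component) is taken from the issue itself.

```java
// Hypothetical sketch of a sizeOf(int[]) helper using the corrected
// NUM_BYTES_ARRAY_HEADER formula from LUCENE-3867. Constants are assumed
// values for a 64-bit JVM without compressed oops.
public class ArraySizeEstimate {
    static final int NUM_BYTES_OBJECT_HEADER = 16; // assumed: 8-byte mark word + 8-byte class pointer
    static final int NUM_BYTES_INT = 4;
    // Corrected formula: object header plus the 4-byte array length, with no
    // NUM_BYTES_OBJECT_REF term.
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    /** Approximate heap size of an int[], rounded up to 8-byte object alignment. */
    public static long sizeOf(int[] arr) {
        long size = NUM_BYTES_ARRAY_HEADER + (long) arr.length * NUM_BYTES_INT;
        return (size + 7) & ~7L; // JVM objects are 8-byte aligned
    }

    public static void main(String[] args) {
        // header 20 + 10*4 data = 60, aligned up to 64
        System.out.println(sizeOf(new int[10])); // -> 64
    }
}
```

The same pattern would cover byte[]/long[]/double[] by swapping the per-element size; String[] would additionally charge NUM_BYTES_OBJECT_REF per slot plus the referenced Strings.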
[jira] [Closed] (SOLR-3242) QueryElevateComponent should support blacklist and de-elevate
[ https://issues.apache.org/jira/browse/SOLR-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog closed SOLR-3242. --- Resolution: Invalid 'blacklist' is already a feature, via the 'exclude' syntax. Pushing a result to the bottom is, I suppose, interesting, but I will not ask for it.

QueryElevateComponent should support blacklist and de-elevate - Key: SOLR-3242 URL: https://issues.apache.org/jira/browse/SOLR-3242 Project: Solr Issue Type: New Feature Components: SearchComponents - other Reporter: Lance Norskog Priority: Minor

The QueryElevateComponent should allow you to ban some results, and push down other results.
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229046#comment-13229046 ] Bill Bell commented on SOLR-3230: - Yonik... I am not that familiar with this code. I do notice 2 methods in LatLonType.java. Is this the right place?

public Query getFieldQuery(QParser parser, SchemaField field, String externalVal) {
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2, boolean minInclusive, boolean maxInclusive) {

I did not see how these 2 functions are called. In class SpatialDistanceQuery I did not see where you said we are using range or fc...? Maybe example code?

Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch

This changes {!geofilt} to use a bounding box and then does an accurate filter. See attachment.
storing lucene index in hbase (Hbase Directory implementation) as GSoC
Hi, I know there are a lot of attempts to make Lucene searches distributed, but I haven't seen one that tries to implement a Lucene Directory in HBase/Hadoop, except one discussion in this article [1]. I've worked with HBase and I believe this is a good approach to combining the two. The thing with this concept is that you could very easily build a distributed search by running multiple search slaves that could each search a part of the index and then aggregate the results. If you dig deep enough, you could make those searches take advantage of data locality (run searches on the node/region server that has your index data), and then you really are in business. A combined HBase/Hadoop solution is also possible: store some data in HBase and bigger parts directly in Hadoop inside a file structure, to overcome HDFS small-file issues. This could allow HBase queries to perform better but will complicate the design a bit. I'm interested in hearing your opinions on this, and I also wish to propose this as a GSoC idea that I'm interested in implementing.

[1] http://www.infoq.com/articles/LuceneHbase

-- Ioan Eugen Stan http://ieugen.blogspot.com/
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229061#comment-13229061 ] Dawid Weiss commented on LUCENE-3867: - One can provide exact object allocation size (including alignments) by running with an agent (acquired from Instrumentation). This is shown here, for example: http://www.javaspecialists.eu/archive/Issue142.html I don't think it makes sense to be perfect here because there is a tradeoff between being accurate and being fast. One thing to possibly improve would be to handle reference size (4 vs. 8 bytes; in particular with compact references while running under 64-bit JVMs).
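The agent technique Dawid references (from the Java Specialists article) could look roughly like this. A hedged sketch, not the article's actual code: the class and jar names are hypothetical, and Instrumentation is only supplied by the JVM when the class is registered as a Premain-Class in a jar manifest and the VM is started with -javaagent; otherwise the sketch falls back to -1.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent sketch: the JVM calls premain() before main() when this
// class is named as Premain-Class in a jar manifest and the VM is started
// with -javaagent:sizeof.jar. getObjectSize() then reports the VM's exact
// (alignment-included) shallow allocation size.
public class SizeOfAgent {
    private static volatile Instrumentation inst;

    // Invoked by the JVM, not by user code.
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        inst = instrumentation;
    }

    /** Exact shallow size of an object as reported by the JVM, or -1 if the
     *  agent was not installed. */
    public static long sizeOf(Object o) {
        return inst != null ? inst.getObjectSize(o) : -1L;
    }

    public static void main(String[] args) {
        // Prints -1 without -javaagent; with the agent, the exact size.
        System.out.println(sizeOf(new Object()));
    }
}
```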
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229062#comment-13229062 ] Dawid Weiss commented on LUCENE-3867: - Oh, one thing that I had in the back of my mind was to run a side-by-side comparison of Lucene's memory estimator and exact memory occupation via agent and see what the real difference is (on various VMs and with compact vs. non-compact refs). This would be a 2-hour effort I guess, fun, but I don't have the time for it.
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12736 - Failure
Now that I think of it, the changes to LuceneTestCase may be blamed for some of these errors, because the uncaught exceptions rule is above the routine where lingering threads are interrupted. It was the opposite order before (understandably). The good news is that I don't see any recent errors on the LUCENE-3808 branch (merged with current trunk), where threads are handled internally by the runner. I'll see what I can do about the issue above in trunk.

Dawid

On Wed, Mar 14, 2012 at 5:50 AM, Apache Jenkins Server jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12736/

1 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.TestGroupingSearch

Error Message: Uncaught exception by thread: Thread[TimeLimitedCollector timer thread,5,]

Stack Trace:
org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[TimeLimitedCollector timer thread,5,]
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:60)
    at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
    at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
Caused by: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
    at org.apache.lucene.search.TimeLimitingCollector$TimerThread.run(TimeLimitingCollector.java:268)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.lucene.search.TimeLimitingCollector$TimerThread.run(TimeLimitingCollector.java:266)

Build Log (for compile errors): [...truncated 9456 lines...]

attachment: LUCENE-3808.png
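The failure mode discussed above can be reproduced in miniature with plain Java (this is a stand-alone sketch, not Lucene test-framework code): interrupting a lingering thread that is asleep makes it die with an "uncaught" unchecked exception, which an uncaught-exception handler (or, in Lucene's case, the uncaught-exceptions rule) then observes as a failure.

```java
// Minimal reproduction of the pattern behind the TimeLimitedCollector
// timer-thread failure: tearDown-style interruption of a sleeping background
// thread surfaces as an uncaught RuntimeException.
public class InterruptUncaught {
    /** Starts a sleeping thread, interrupts it, and returns whatever its
     *  uncaught-exception handler observed. */
    public static Throwable interruptSleeper() {
        final Throwable[] caught = new Throwable[1];
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Rethrown unchecked, analogous to Lucene wrapping it in
                // ThreadInterruptedException.
                throw new RuntimeException("sleep interrupted", e);
            }
        });
        sleeper.setUncaughtExceptionHandler((t, e) -> caught[0] = e);
        sleeper.start();
        sleeper.interrupt(); // what tearDown does to lingering threads
        try {
            sleeper.join(); // join() makes the handler's write visible here
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        }
        return caught[0];
    }

    public static void main(String[] args) {
        Throwable t = interruptSleeper();
        System.out.println(t.getClass().getSimpleName() + " caused by " + t.getCause());
    }
}
```

Note that the reproduction works even if interrupt() lands before the thread reaches sleep(): Thread.sleep throws InterruptedException immediately when the interrupt status is already set.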
[jira] [Created] (LUCENE-3868) Thread interruptions shouldn't cause unhandled thread errors (or should they?).
Thread interruptions shouldn't cause unhandled thread errors (or should they?). --- Key: LUCENE-3868 URL: https://issues.apache.org/jira/browse/LUCENE-3868 Project: Lucene - Java Issue Type: Bug Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.6, 4.0 This is a result of pulling uncaught exception catching to a rule above interrupt in internalTearDown(); check how it was before and restore previous behavior?
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229066#comment-13229066 ] Uwe Schindler commented on LUCENE-3867: --- I was talking with Shai already about the OBJECT_REF size of 8, in RamUsageEstimator it is:

{code:java}
public final static int NUM_BYTES_OBJECT_REF = Constants.JRE_IS_64BIT ? 8 : 4;
{code}

...which does not take the CompressedOops into account. Can we detect those oops, so we can change the above ternary to return 4 on newer JVMs with compressed oops enabled?
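One possible way to answer Uwe's question from inside the running VM is HotSpot's diagnostic MXBean, which exposes the UseCompressedOops VM option. A hedged sketch only: it is HotSpot-specific (com.sun.management), may throw on other VMs or if the option is absent, and this fallback-to-false behavior is my own assumption, not anything in RamUsageEstimator.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Sketch: query HotSpot's diagnostic bean for UseCompressedOops so a
// NUM_BYTES_OBJECT_REF-style constant could be narrowed to 4 on 64-bit VMs
// running with compressed oops.
public class CompressedOopsCheck {
    /** True if the (HotSpot) VM reports compressed oops enabled; false if the
     *  diagnostic bean or the option is unavailable. */
    public static boolean compressedOopsEnabled() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return Boolean.parseBoolean(bean.getVMOption("UseCompressedOops").getValue());
        } catch (RuntimeException e) {
            return false; // non-HotSpot VM, or option not defined
        }
    }

    public static void main(String[] args) {
        // The ternary Uwe describes, refined with the compressed-oops check.
        int numBytesObjectRef = compressedOopsEnabled() ? 4 : 8;
        System.out.println("estimated reference size: " + numBytesObjectRef);
    }
}
```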
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229067#comment-13229067 ] Dawid Weiss commented on LUCENE-3867: - If you're running with an agent then it will tell you how many bytes a reference is, so this would fix the issue. I don't think you can test this from within the Java VM itself, but this is an interesting question. What you could do is spawn a child VM process with identical arguments (and an agent) and check it there, but this is quite awful... I'll ask on the hotspot mailing list, maybe they know how to do this.
[jira] [Created] (SOLR-3243) eDismax and non-fielded range query
eDismax and non-fielded range query --- Key: SOLR-3243 URL: https://issues.apache.org/jira/browse/SOLR-3243 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.5, 3.4, 3.3, 3.2, 3.1 Reporter: Jan Høydahl Priority: Critical Fix For: 3.6, 4.0

Reported by Bill Bell in SOLR-3085: If you enter a non-fielded open-ended range in the search box, like [* TO *], eDismax will expand it to all fields:

{noformat}
+DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0))
{noformat}

This does not make sense, and a side effect is that range queries on strings are very expensive (open-ended ones even more so), and you can totally crash the search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) a few times...
[jira] [Commented] (SOLR-3085) Fix the dismax/edismax stopwords mm issue
[ https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229083#comment-13229083 ] Jan Høydahl commented on SOLR-3085: --- @Bill, since this is a bit off topic, I moved your loophole to SOLR-3243. It is certainly something that is dangerous, and I cannot see a single use case for allowing an un-fielded range! Good catch.

Fix the dismax/edismax stopwords mm issue - Key: SOLR-3085 URL: https://issues.apache.org/jira/browse/SOLR-3085 Project: Solr Issue Type: Bug Components: search Reporter: Jan Høydahl Labels: MinimumShouldMatch, dismax, stopwords Fix For: 3.6, 4.0

As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here http://search-lucene.com/m/Yne042qEyCq1 and here http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if not all fields used in QF have exactly the same stopword lists. The typical solution is to not use stopwords, to harmonize stopword lists across all fields in your QF, or to relax the MM to a lower percentage. Sometimes these are not acceptable workarounds, and we should find a better solution.
JIRA components update
Hi, I've not found a suitable JIRA component to put DisMax issues in, so I added a new component called query parsers. There may be other components missing as well, such as SolrCloud? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
[jira] [Updated] (SOLR-3086) eDismax: Allow controlling what query features to support
[ https://issues.apache.org/jira/browse/SOLR-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3086: -- Component/s: query parsers eDismax: Allow controlling what query features to support - Key: SOLR-3086 URL: https://issues.apache.org/jira/browse/SOLR-3086 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Jan Høydahl Fix For: 4.0 As per request from Hoss in SOLR-2368, this issue will add configuration parameters to eDisMax to give user control over what query syntax will be allowed and disallowed. This will allow us to effectively lobotomize eDisMax to behave the same way as the old DisMax and accept all kinds of weird input and correctly escape it to match literally, even if it's valid syntax for a query feature.
[JENKINS] Solr-trunk - Build # 1793 - Still Failing
Build: https://builds.apache.org/job/Solr-trunk/1793/

1 tests failed.

FAILED: org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message: Uncaught exception by thread: Thread[Thread-661,5,]

Stack Trace:
org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[Thread-661,5,]
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:60)
    at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:618)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
    at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:20)
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:51)
    at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
    at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
Caused by: java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: http://localhost:40647/solr
    at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:374)
Caused by: org.apache.solr.client.solrj.SolrServerException: http://localhost:40647/solr
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:496)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:312)
    at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:369)
Caused by: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 100 ms
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:155)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:426)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
    at java.net.Socket.connect(Socket.java:546)
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)

Build Log (for compile errors): [...truncated 9671 lines...]
[jira] [Commented] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229094#comment-13229094 ] Aliaksandr Zhuhrou commented on SOLR-3162: -- Sure, I can do this. Continue work on new admin UI - Key: SOLR-3162 URL: https://issues.apache.org/jira/browse/SOLR-3162 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Affects Versions: 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 4.0 Attachments: SOLR-3162-index.png, SOLR-3162-schema-browser.png, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch There have been more improvements to how the new UI works, but the current open bugs are getting hard to keep straight. This is the new catch-all JIRA for continued improvements.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229093#comment-13229093 ] Shai Erera commented on LUCENE-3867: bq. I don't think it makes sense to be perfect here because there is a tradeoff between being accurate and being fast. I agree. We should be fast, and as accurate as we can get while preserving speed. I will fix the constant's value, as it is wrong. The helper methods are just that - helpers. Someone can use other techniques to compute the size of objects. Will post a patch shortly.
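The corrected array accounting can be sketched as below. This is a hedged illustration only, not the committed patch: the class and constant values are assumptions for a 64-bit HotSpot JVM with compressed oops, and the point is simply that an array costs its header (object header plus a four-byte length, with no extra object reference) plus the element data, rounded up to the JVM's object alignment.

```java
// Illustrative sketch only (not Lucene's RamUsageEstimator): size of an int[]
// is the array header plus the element data, rounded up to the object
// alignment boundary. Constants assume 64-bit HotSpot with compressed oops.
public class ArraySizeSketch {
    static final int NUM_BYTES_ARRAY_HEADER = 16;     // 12-byte header padded to 16
    static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;  // default HotSpot alignment
    static final int NUM_BYTES_INT = 4;

    static long sizeOf(int[] arr) {
        long size = NUM_BYTES_ARRAY_HEADER + (long) arr.length * NUM_BYTES_INT;
        long rem = size % NUM_BYTES_OBJECT_ALIGNMENT;
        // round up to the next alignment boundary if not already aligned
        return rem == 0 ? size : size + NUM_BYTES_OBJECT_ALIGNMENT - rem;
    }
}
```

Under these assumed constants, a three-element int[] costs 16 + 12 = 28 bytes, padded to 32.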
[jira] [Created] (SOLR-3244) New Admin UI doesn't work on tomcat
New Admin UI doesn't work on tomcat --- Key: SOLR-3244 URL: https://issues.apache.org/jira/browse/SOLR-3244 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Reporter: Aliaksandr Zhuhrou I am currently unable to open the admin interface when using a war deployment under a Tomcat server. The stack trace: SEVERE: Servlet.service() for servlet [LoadAdminUI] in context with path [/solr] threw exception java.lang.NullPointerException at java.io.File.<init>(File.java:251) at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:50) at javax.servlet.http.HttpServlet.service(HttpServlet.java:621) at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:292) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Tomcat version: Apache Tomcat/7.0.23 Java version: jdk1.7.0_02 I did some debugging and found that the problem is that resolving the resource path is delegated to org.apache.naming.resources.WARDirContext, which simply returns null for any input: /** * Return the real path for a given virtual path, if possible; otherwise * return <code>null</code>. * * @param path The path to the desired resource */ @Override protected String doGetRealPath(String path) { return null; } Need to check the specification, because it may actually be a Tomcat bug.
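The failure mode above can be reproduced in isolation: getRealPath() may legally return null when a container serves the WAR unexpanded, and passing that null to the File constructor throws exactly the NullPointerException seen in the stack trace. A hedged, self-contained sketch (the helper names here are hypothetical, not Solr code; only the null-path behavior of java.io.File is real):

```java
import java.io.File;

// Hypothetical illustration of the bug mechanism: File's constructor rejects
// a null path name, which is what happens when getRealPath() returns null
// for an unexpanded WAR (as Tomcat's WARDirContext does).
public class RealPathNpeDemo {
    /** Mimics ServletContext.getRealPath(): null when the container cannot map the path to a file. */
    static String getRealPath(String path, boolean warExpanded) {
        return warExpanded ? "/webapps/solr" + path : null;
    }

    /** Returns true if constructing a File from the resolved path throws NPE. */
    static boolean lookupThrowsNpe(String path, boolean warExpanded) {
        try {
            new File(getRealPath(path, warExpanded));
            return false;
        } catch (NullPointerException e) {
            return true; // same NPE as java.io.File.<init> in the stack trace
        }
    }
}
```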
[jira] [Updated] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksandr Zhuhrou updated SOLR-3244: - Description: updated to add: We may try use the getResourceAsStream(java.lang.String path) method, which should work even for a war.
[jira] [Issue Comment Edited] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229109#comment-13229109 ] Uwe Schindler edited comment on SOLR-3244 at 3/14/12 10:01 AM: --- It seems that the file is missing in the WAR file? {code:java} File f = new File(getServletContext().getRealPath("admin.html")); if (f.exists()) { // This attribute is set by the SolrDispatchFilter CoreContainer cores = (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer"); String html = IOUtils.toString(new FileInputStream(f), "UTF-8"); {code} In general I am a little bit sceptical about the whole code. In my opinion, using File and getRealPath is not the best idea. The simplest, filesystem-independent way to get the file is the following (there may be a servlet container that does not extract WAR files at all and simply returns the resource from *inside* the war file): {code:java} InputStream in = getServletContext().getResourceAsStream("/admin.html"); if (in != null) try { // This attribute is set by the SolrDispatchFilter CoreContainer cores = (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer"); String html = IOUtils.toString(in, "UTF-8"); ... } finally { IOUtils.closeSafely(in); } {code} Please note the "/" in the path; according to the JavaDocs of getResource: The path must begin with a "/" and is interpreted as relative to the current context root, or relative to the /META-INF/resources directory of a JAR file inside the web application's /WEB-INF/lib directory. This method will first search the document root of the web application for the requested resource, before searching any of the JAR files inside /WEB-INF/lib. The order in which the JAR files inside /WEB-INF/lib are searched is undefined. This also applies to getRealPath, so I think Tomcat is more picky about that than Jetty.
[jira] [Updated] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-3244: Attachment: SOLR-3244.patch Hi, can you apply the attached patch and rebuild the WAR? This fixes this bug and also another security issue: - The inlined paths are not correctly escaped according to JavaScript rules; this can lead to security problems if you deploy to a path with strange characters in it...
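For context on the escaping issue the patch comment mentions: embedding a context path into inline JavaScript requires escaping any characters that could terminate the string literal or the surrounding script tag. A minimal hedged sketch of such an escaper follows; this is illustrative only and is not the code in SOLR-3244.patch.

```java
// Hedged sketch, not the actual patch: escape a string for safe embedding
// inside a single- or double-quoted inline JavaScript string literal.
public class JsEscapeSketch {
    static String escapeJsString(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\");   break; // backslash first
                case '\'': sb.append("\\'");    break; // string delimiter
                case '"':  sb.append("\\\"");   break; // string delimiter
                case '<':  sb.append("\\u003C"); break; // avoid closing </script>
                case '>':  sb.append("\\u003E"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```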
[jira] [Assigned] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-3244: --- Assignee: Uwe Schindler
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229115#comment-13229115 ] Uwe Schindler commented on SOLR-3244: - It would be nice if you could test this, I have no Tomcat available...
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229125#comment-13229125 ] Michael McCandless commented on LUCENE-3867: Nice catch on the overcounting of the array's RAM usage! And +1 for additional sizeOf(...) methods.
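A sizeOf(String[]) along the proposed lines might look roughly like the sketch below. Hedged: the per-String formula follows the sizeOf(String) snippet quoted in the issue, the constants are assumptions for a 64-bit JVM with compressed oops, and none of this is committed code.

```java
// Hedged sketch following the sizeOf(String) formula from the issue: a
// String[] costs its own array header plus one object reference per slot,
// plus the approximate size of each String it points to.
public class StringArraySizeSketch {
    // Assumed constants (64-bit JVM, compressed oops); not authoritative.
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_ARRAY_HEADER = 16;
    static final int NUM_BYTES_OBJECT_REF = 4;
    static final int NUM_BYTES_INT = 4;

    static int sizeOf(String str) {
        return 2 * str.length() + 6      // chars + alignment slack
            + 3 * NUM_BYTES_INT          // String's three int fields
            + NUM_BYTES_ARRAY_HEADER     // the backing char[] array
            + NUM_BYTES_OBJECT_HEADER;   // the String object itself
    }

    static int sizeOf(String[] arr) {
        int size = NUM_BYTES_ARRAY_HEADER + arr.length * NUM_BYTES_OBJECT_REF;
        for (String s : arr) {
            if (s != null) size += sizeOf(s); // ignores sharing/interning
        }
        return size;
    }
}
```

Note this deliberately overcounts when two slots reference the same String; as discussed above, the tradeoff is speed over perfect accuracy.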
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229127#comment-13229127 ] Uwe Schindler commented on LUCENE-3867: --- Hi Mike, Dawid and I have already contacted the Hotspot list. There is an easy way to get the CompressedOops setting from inside the JVM, using MXBeans from the ManagementFactory. I think we will provide a patch later! I think by that we could also optimize the check for 64 bit, because that one should also be reported by the MXBean without looking into strange sysprops (see the TODO in the code for JRE_IS_64BIT). Uwe
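The MXBean route can be sketched roughly as follows. Hedged: this queries the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean via reflection so it degrades gracefully on other vendors' JVMs; the class name and the "unknown" fallback are illustrative choices, not the eventual patch.

```java
import java.lang.management.ManagementFactory;

// Hedged sketch: read HotSpot's UseCompressedOops VM option through the
// diagnostic MXBean. Reflection avoids a compile-time dependency on
// com.sun.management, so on JVMs without it (J9, JRockit, ...) the probe
// simply reports "unknown" instead of failing to load.
public class CompressedOopsProbe {
    @SuppressWarnings("unchecked")
    public static String useCompressedOops() {
        try {
            Class<?> beanClazz = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
            Object bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                (Class<Object>) beanClazz);
            Object option = beanClazz.getMethod("getVMOption", String.class)
                .invoke(bean, "UseCompressedOops");
            return option.getClass().getMethod("getValue").invoke(option).toString();
        } catch (Exception e) {
            return "unknown"; // bean unavailable or access denied
        }
    }
}
```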
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229128#comment-13229128 ] Dawid Weiss commented on LUCENE-3867: - Sysprops should be a fallback, though, because (to be verified) they're supported by other vendors, whereas the MX bean may not be. It needs to be verified by running under J9, JRockit, etc.
[jira] [Commented] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229134#comment-13229134 ] Tomás Fernández Löbbe commented on SOLR-3241:
-
I found the issue with copyFields as you mentioned, Robert. foo is omitNorms=false, and bar is omitNorms=true. I have a copyField foo to bar and I add a document like:
{code:xml}
<document boost="X">
  <field name="foo">AA</field>
</document>
{code}
This case is fixed by the patch. Testing it, I found a similar situation where field1 is a poly type with omitNorms=false, and its subtype has omitNorms=true. In this case, it fails even without a copyField, just by adding a document like:
{code:xml}
<document boost="X">
  <field name="poly">AAA,BBB</field>
</document>
{code}
I don't know if it makes sense to have a poly field where the subtype has a different value for the omitNorms attribute; probably this should fail even before the document is added.

Document boost fail if a field copy omit the norms
--
Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch

After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's document index-time boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example:
{code:xml}
<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>
{code}
I'm attaching a possible fix.
[jira] [Created] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
Poor performance of Hunspell with Polish Dictionary
---
Key: SOLR-3245 URL: https://issues.apache.org/jira/browse/SOLR-3245 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M Reporter: Agnieszka

In Solr 4.0 the Hunspell stemmer with the Polish dictionary has poor performance, whereas the performance of hunspell from http://code.google.com/p/lucene-hunspell/ in Solr 3.4 is very good. Tests show:

Solr 3.4, full import of 489017 documents:
- StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec
- HunspellStemFilterFactory - 3922 seconds, 125 docs/sec

Solr 4.0, full import of 489017 documents:
- StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec
- HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec

My schema is quite easy. For Hunspell I have one text field that 14 text fields are copied to:
{code:xml}
<field name="text" type="text_pl_hunspell" indexed="true" stored="false" multiValued="true"/>
<copyField source="field1" dest="text"/>
<copyField source="field14" dest="text"/>
{code}
The text_pl_hunspell configuration:
{code:xml}
<fieldType name="text_pl_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <!--<filter class="solr.KeywordMarkerFilterFactory" protected="protwords_pl.txt"/>-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
I use the Polish dictionary files pl_PL.dic and pl_PL.aff (the files stopwords_pl.txt, protwords_pl.txt, and synonyms_pl.txt are empty). These are the same files I used in the 3.4 version. For the Polish Stemmer the difference is only in the definition of the text field:
{code:xml}
<field name="text" type="text_pl" indexed="true" stored="false" multiValued="true"/>
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
One document has 23 fields:
- 14 text fields copied to one text field (above) that is only indexed
- 8 other indexed fields (2 strings, 2 tdates, 3 tint, 1 tfloat)
The size of one document is 3-4 kB.
[jira] [Updated] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Agnieszka updated SOLR-3245:
-
Attachment: pl_PL.zip

Polish dictionary for Hunspell
[jira] [Commented] (SOLR-3161) Use of 'qt' should be restricted to searching and should not start with a '/'
[ https://issues.apache.org/jira/browse/SOLR-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229154#comment-13229154 ] Erik Hatcher commented on SOLR-3161:

bq. As hoss points out, not all searching request handlers inherit from SearchHandler.

Then use /-prefixed handlers for those rather than qt. Or simply add whatever logic to the DispatchingRequestHandler makes sense and let qt dispatching happen there, not from SDF. The DispatchingRequestHandler in my patch was merely an example; I really don't care what logic is in there to determine what can be dispatched to, as I'd never use it myself.

bq. The ability to distinguish an update handler from a request handler doesn't sound complex

Again, I'd say stuff whatever smarts are desired down into a dispatching request handler rather than making Solr's top-level dispatching logic more complicated than it needs to be. But I will say that having a better separated class hierarchy for search vs. update handlers is a good thing in general.

Use of 'qt' should be restricted to searching and should not start with a '/'
-
Key: SOLR-3161 URL: https://issues.apache.org/jira/browse/SOLR-3161 Project: Solr Issue Type: Improvement Components: search, web gui Reporter: David Smiley Assignee: David Smiley Fix For: 3.6, 4.0 Attachments: SOLR-3161-disable-qt-by-default.patch, SOLR-3161-dispatching-request-handler.patch, SOLR-3161-dispatching-request-handler.patch

I haven't yet looked at the code involved for the suggestions here; I'm speaking based on how I think things should and shouldn't work, based on intuitiveness and security. In general I feel it is best practice to use '/'-leading request handler names and not use qt, but I don't hate it enough when used in limited (search-only) circumstances to propose its demise. But if someone proposes its deprecation, then I am +1 for that. Here is my proposal:
- Solr should error if the parameter qt is supplied with a leading '/'.
- (trunk only) Solr should only honor qt if the target request handler extends solr.SearchHandler.
- The new admin UI should only use 'qt' when it has to. For the query screen, it could present a little pop-up menu of handlers to choose from, including /select?qt=mycustom for handlers that aren't named with a leading '/'. This choice should be positioned at the top.
And before I forget, someone (or I) should investigate whether there are any similar security problems with the shards.qt parameter. Perhaps shards.qt can abide by the same rules outlined above. Does anyone foresee any problems with this proposal?
On a related subject, I think the notion of a default request handler is bad - the default=true thing. Honestly I'm not sure what it does anymore, since I noticed Solr trunk redirects '/solr/' to the new admin UI at '/solr/#/'. Assuming it doesn't do anything useful anymore, I think it would be clearer to use <requestHandler name="/select" class="solr.SearchHandler"> instead of what's there now. The delta is to put the leading '/' on this request handler name and remove the default attribute.
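For illustration, the solrconfig.xml delta described in the last paragraph might look like the following sketch (handler names and attributes as discussed above; the before/after pairing is an assumption about the existing config):

```xml
<!-- before: default handler selected via default="true" -->
<requestHandler name="standard" class="solr.SearchHandler" default="true"/>

<!-- after: '/'-named handler, no default attribute -->
<requestHandler name="/select" class="solr.SearchHandler"/>
```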
Re: Exposing Solr routing to SolrJ client
FYI (if it is of any interest), we just hacked CloudSolrServer locally to support routing of realtime-get requests. Limitations are:
- Only the id parameter (not the ids parameter) is supported in realtime-get requests.
- Only schemas with uniqueKey on a field named "id", and only an id field of type string, are supported.

We did this to be able to start performance tests on our own system building on SolrCloud. The performance of our own system depends on being able to do realtime-gets from the client (our system), because we often update documents very quickly after they have been indexed for the first time (and we run with soft-commit = 1 sec - we can't wait for that). We use version control (for optimistic locking) and a unique key constraint where you fail instead of overwrite if the document already exists (http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics) in our highly concurrent performance test, so that will also be tested with respect to performance.

What we did in CloudSolrServer was:
* Added the following to the request method, between the "if (collection == null)" statement and the "LBHttpSolrServer.Req req = new LBHttpSolrServer.Req(request, urlList);" statement:
{code}
List<String> urlList = new ArrayList<String>();
if (reqParams.get(CommonParams.QT) != null
    && reqParams.get(CommonParams.QT).equals("/get")) {
  String id = reqParams.get("id");
  int hash = hash(id);
  String shardId = getShard(hash, collection, cloudState);
  ZkCoreNodeProps leaderProps = null;
  try {
    leaderProps = new ZkCoreNodeProps(zkStateReader.getLeaderProps(collection, shardId));
  } catch (InterruptedException ie) {
    throw new SolrServerException(ie);
  }
  String fullUrl = ensureUrlHasProtocolIdentifier(leaderProps.getCoreUrl());
  urlList.add(fullUrl);
} else {
  // the code that was already in request() between those two statements
}
{code}
* Added the following helper methods (stolen from DistributedUpdateProcessor etc.)
{code}
private String ensureUrlHasProtocolIdentifier(String url) {
  if (!url.startsWith("http://") && !url.startsWith("https://")) {
    url = "http://" + url;
  }
  return url;
}

private String getShard(int hash, String collection, CloudState cloudState) {
  return cloudState.getShard(hash, collection);
}

private int hash(String id) {
  BytesRef indexedId = new BytesRef();
  UnicodeUtil.UTF16toUTF8(id, 0, id.length(), indexedId);
  return Hash.murmurhash3_x86_32(indexedId.bytes, indexedId.offset, indexedId.length, 0);
}
{code}
It seems to work for us, but we look very much forward to the real solution.

Regards, Per Steffensen

Per Steffensen wrote:
> Right, you can't yet even with CloudSolrServer - but I think it will be done soon - certainly before the 4 release anyway.
Ok, I will cross my fingers for it to be done soon. Thanks for your kind help.

Regards, Steff
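The id-to-shard mapping above can be sketched independently of Solr's classes. In this toy version, String.hashCode() is purely a stand-in for Lucene's Hash.murmurhash3_x86_32 over the id's UTF-8 bytes, and the modulo assignment is a stand-in for SolrCloud's hash-range mapping; a real client must use exactly the same hash function and shard assignment as the server, or it will route to the wrong leader.

```java
// Toy sketch of client-side shard routing. String.hashCode() and the
// simple modulo are stand-ins (assumptions); SolrCloud actually uses
// murmurhash3 over UTF-8 bytes and a hash-range to shard mapping.
public class ShardRouterSketch {
    /** Map a document id to a shard index in [0, numShards). */
    public static int shardFor(String id, int numShards) {
        int hash = id.hashCode(); // stand-in hash, see note above
        return Math.floorMod(hash, numShards); // floorMod handles negative hashes
    }

    public static void main(String[] args) {
        // The same id always routes to the same shard.
        System.out.println(shardFor("doc1", 4));
    }
}
```

The point of the sketch is the invariant, not the hash: routing only works because client and server agree on a deterministic id-to-shard function.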
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3241:
--
Attachment: SOLR-3241.patch

Updating the patch from Tomás to include an additional test for the field boost + copyField case. We still need to add tests for the polyField case, and any other possibilities (can you polyField+copyField?).
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229160#comment-13229160 ] Michael McCandless commented on LUCENE-3867:

Consulting the MXBean sounds great!

bq. Sysprops should be a fallback though

+1
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3867:
--
Attachment: LUCENE-3867-compressedOops.patch

Here is the patch for detecting compressedOops on Sun JVMs. For other JVMs it will simply use false, so object refs will be assumed to have 64 bits, which is fine as an upper memory limit. The code uses only public Java APIs and falls back to false if anything fails.
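The detection approach Uwe describes can be sketched using only public APIs: ask HotSpot's diagnostic MXBean for the UseCompressedOops VM option via reflection (so the code still compiles and degrades gracefully on non-HotSpot JVMs). This is a sketch of the idea, not the attached patch; the class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

// Sketch: probe HotSpot's UseCompressedOops flag reflectively, falling
// back to "unknown" on JVMs that don't expose the diagnostic MXBean.
public class CompressedOopsProbe {
    /** Returns "true"/"false" for UseCompressedOops, or "unknown" off HotSpot. */
    public static String compressedOops() {
        try {
            // Reflection keeps the com.sun.* dependency out of compile time.
            Class<?> beanClazz = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
            Object bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                beanClazz);
            Method getVMOption = beanClazz.getMethod("getVMOption", String.class);
            Object option = getVMOption.invoke(bean, "UseCompressedOops");
            return option.getClass().getMethod("getValue").invoke(option).toString();
        } catch (Throwable t) {
            return "unknown"; // non-HotSpot JVM or restricted environment
        }
    }
}
```

A caller estimating reference sizes would treat "unknown" the same as false, i.e. assume 64-bit references as a safe upper bound.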
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229167#comment-13229167 ] Aliaksandr Zhuhrou commented on SOLR-3244:
--
Sure, I will test it this evening. Thank you very much.

New Admin UI doesn't work on tomcat
---
Key: SOLR-3244 URL: https://issues.apache.org/jira/browse/SOLR-3244 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Reporter: Aliaksandr Zhuhrou Assignee: Uwe Schindler Attachments: SOLR-3244.patch

I am currently unable to open the admin interface when using a WAR deployment under Tomcat. The stack trace:
{code}
SEVERE: Servlet.service() for servlet [LoadAdminUI] in context with path [/solr] threw exception
java.lang.NullPointerException
	at java.io.File.<init>(File.java:251)
	at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:50)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:292)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{code}
Tomcat version: Apache Tomcat/7.0.23. Java version: jdk1.7.0_02.

I did some debugging and found that the problem is related to the fact that resolving the resource path is delegated to org.apache.naming.resources.WARDirContext, which simply returns null for any input:
{code}
/**
 * Return the real path for a given virtual path, if possible; otherwise
 * return <code>null</code>.
 *
 * @param path The path to the desired resource
 */
@Override
protected String doGetRealPath(String path) {
    return null;
}
{code}
Need to check the specification, because this may actually be a Tomcat bug. We may try using the getResourceAsStream(java.lang.String path) method, which should work even for a WAR.
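The getResourceAsStream() suggestion boils down to streaming the resource instead of resolving a filesystem path. A hedged sketch: the servlet wiring in the comment is illustrative (not the actual LoadAdminUiServlet patch); only the stream-copy helper itself is concrete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class StreamCopy {
    // In the servlet, the idea would be (illustrative resource name):
    //   InputStream in = getServletContext().getResourceAsStream("/admin.html");
    //   copy(in, response.getOutputStream());
    // getResourceAsStream works even from an unexpanded WAR, where
    // getRealPath may legitimately return null.

    /** Copy all bytes from in to out; returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // keep the helper's signature simple
        }
        return total;
    }
}
```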
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-3867:
---
Attachment: LUCENE-3867.patch

Patch adds RUE.sizeOf(String) and various sizeOf(arr[]) methods. Also fixes the ARRAY_HEADER. Uwe, I merged with your patch, with one difference -- the System.out prints in the test are printed only if VERBOSE.
[jira] [Commented] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229180#comment-13229180 ] Agnieszka commented on SOLR-3245: - I made one more test for Hunspell with english dictionary (from OpenOffice.org) in Solr 4.0. It seems that the problem not exists with the english dictionary. Solr 4.0, full import 489017 documents, hunspell, english dictionary: 3146 seconds, 155 docs/sec But I'm not sure if it is reliable because I use documents with polish text to test english dictionary. Poor performance of Hunspell with Polish Dictionary --- Key: SOLR-3245 URL: https://issues.apache.org/jira/browse/SOLR-3245 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M Reporter: Agnieszka Labels: performance Attachments: pl_PL.zip In Solr 4.0 Hunspell stemmer with polish dictionary has poor performance whereas performance of hunspell from http://code.google.com/p/lucene-hunspell/ in solr 3.4 is very good. Tests shows: Solr 3.4, full import 489017 documents: StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec HunspellStemFilterFactory - 3922 seconds, 125 docs/sec Solr 4.0, full import 489017 documents: StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec My schema is quit easy. 
For Hunspell I have one text field that I copy 14 text fields to:
{code:xml}
<field name="text" type="text_pl_hunspell" indexed="true" stored="false" multiValued="true"/>
<copyField source="field1" dest="text"/>
<copyField source="field14" dest="text"/>
{code}
The text_pl_hunspell configuration:
{code:xml}
<fieldType name="text_pl_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <!--filter class="solr.KeywordMarkerFilterFactory" protected="protwords_pl.txt"/-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
I use a Polish dictionary - pl_PL.dic, pl_PL.aff (the files stopwords_pl.txt, protwords_pl.txt, synonyms_pl.txt are empty). These are the same files I used in the 3.4 version.
For the Polish Stemmer the difference is only in the definition of the text field:
{code}
<field name="text" type="text_pl" indexed="true" stored="false" multiValued="true"/>
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
One document has 23 fields: - 14 text fields copied to one text field
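As a quick sanity check, the docs/sec figures reported in this thread follow directly from the import times; a trivial worked example:

```java
// Verify the reported throughput numbers against the import times
// (489017 documents, times in seconds, all taken from the report above).
public class ThroughputCheck {
    public static void main(String[] args) {
        int docs = 489017;
        System.out.println(docs / 3146);  // Hunspell EN, Solr 4.0 -> 155 docs/sec
        System.out.println(docs / 3016);  // Stempel, Solr 4.0     -> 162 docs/sec
        System.out.println(docs / 44580); // Hunspell PL, Solr 4.0 -> 10 (the 11 in the report is 10.97 rounded)
    }
}
```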
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229182#comment-13229182 ] Uwe Schindler commented on LUCENE-3867: --- Shai: Thanks! I am on a train at the moment, so internet is slow/not working. I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops (which may have been modified by user code, so not really safe to use...). I left the non-verbose printlns in, so people reviewing the patch can quickly see, by running that test, what happens on their JVM. It would be interesting to see what your jRockit does... :-)
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3241: -- Attachment: SOLR-3241.patch Updated patch with fixes for the polyField case (untested!). After reviewing the code: Tomás had the correct fix for the copyField case; his patch fixes a logic bug, nothing more! The polyField case is different: it's too late in DocumentBuilder to do anything here after the creation of IndexableFields; moreover, we cannot nuke the whole boost for the field, because we cannot assume anything just because isPolyField() == true. For example, a custom field type might not even be instanceof AbstractSubTypeField! Because of this I think these field types should really treat the fact that they use 'real fields' as an impl detail, so I added the logic to their subfield creation. Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's index-time document boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them, and a document boost is used. For example:
{code:xml}
<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>
{code}
I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229188#comment-13229188 ] Shai Erera commented on LUCENE-3867: I tried the IBM and Oracle 1.6 JVMs, and both printed the same:
{code}
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] - ---
{code}
So no CompressedOops for me :). bq. I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops
Ok. If you make it, we can add these changes to that patch; otherwise we can also do them in a separate issue.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229191#comment-13229191 ] Uwe Schindler commented on LUCENE-3867: --- Hm, for me it prints true. What JVMs are you using, and what settings?
[jira] [Issue Comment Edited] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229191#comment-13229191 ] Uwe Schindler edited comment on LUCENE-3867 at 3/14/12 1:47 PM: Hm, for me (1.6.0_31, 7u3) it prints true. What JVMs are you using and what settings? was (Author: thetaphi): Hm, for it prints true. What JVMs are you using and what settings?
[jira] [Commented] (SOLR-3221) Make Shard handler threadpool configurable
[ https://issues.apache.org/jira/browse/SOLR-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229197#comment-13229197 ] Erick Erickson commented on SOLR-3221: -- Yeah, just go ahead and edit the Wiki; all you need to do is create an account. As for CHANGES.txt, just attach it to this JIRA and I'll go ahead and commit it without a new JIRA. Two tricky things: 1) there are two files, one in the 3x branch and one in the 4x branch; both need the text. 2) The 4x file needs to be edited in two places: once in the 3.x section and once in the 4x section. I'll see it when this JIRA changes and check them in. Thanks! Make Shard handler threadpool configurable -- Key: SOLR-3221 URL: https://issues.apache.org/jira/browse/SOLR-3221 Project: Solr Issue Type: Improvement Affects Versions: 3.6, 4.0 Reporter: Greg Bowyer Assignee: Erick Erickson Labels: distributed, http, shard Fix For: 3.6, 4.0 Attachments: SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch From profiling of monitor contention, as well as observations of the 95th and 99th percentile response times for nodes that perform distributed search (or "aggregator" nodes), it would appear that the HttpShardHandler code currently does a suboptimal job of managing outgoing shard-level requests. Presently the code contained within Lucene 3.5's SearchHandler and Lucene trunk / 3x's ShardHandlerFactory creates arbitrary threads in order to service distributed search requests. This is done presently to limit the size of the threadpool, such that it does not consume resources in deployment configurations that do not use distributed search. This unfortunately has two impacts on the response time if the node coordinating the distribution is under high load.
The usage of the MaxConnectionsPerHost configuration option results in aggressive activity on semaphores within HttpCommons; it has been observed that the aggregator can have a response time far greater than that of the searchers. The monitor contention above suggests that in some cases liveness issues can occur, and that simple queries can be starved of resources simply due to a lack of attention from the viewpoint of context switching, with, as mentioned above, the HttpCommons connections being hotly contended. The fair, queue-based configuration eliminates this, at the cost of throughput. This patch aims to make the threadpool largely configurable, allowing those using Solr to choose the throughput vs. latency balance they desire.
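The throughput-vs-latency trade-off described above can be sketched with plain java.util.concurrent primitives. This is a generic illustration, not the actual HttpShardHandlerFactory code; the class name and pool parameters are hypothetical:

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the two extremes a configurable shard-handler pool could expose.
public class ShardPoolSketch {
    // Throughput-oriented: a SynchronousQueue hands each task straight to a
    // thread, growing the pool on demand (low latency, unbounded thread count).
    static ThreadPoolExecutor throughputPool() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                5, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    }

    // Fairness-oriented: a fixed pool with a bounded FIFO queue caps resource
    // use, but requests can queue under load (higher latency, bounded threads).
    static ThreadPoolExecutor fairPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads,
                0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(queueSize));
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = fairPool(4, 100);
        Future<Integer> f = pool.submit(() -> 40 + 2); // stand-in for a shard request
        System.out.println(f.get()); // prints 42
        pool.shutdown();
    }
}
```

Exposing the core/max size and queue choice as configuration is what lets a deployment pick its point on this spectrum.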
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229200#comment-13229200 ] Uwe Schindler commented on LUCENE-3867: --- Here are my results:
{noformat}
* JAVA_HOME = C:\Program Files\Java\jdk1.7.0_03
java version 1.7.0_03
Java(TM) SE Runtime Environment (build 1.7.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)

* C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam*
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,561 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] - ---

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,5 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] - ---

* JAVA_HOME = C:\Program Files\Java\jdk1.6.0_31
java version 1.6.0_31
Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

* C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam*
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,453 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] - ---

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,421 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: false
[junit] - ---

C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core> ant test -Dtestcase=TestRam* -Dargs=-XX:+UseCompressedOops
[junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,422 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: This JVM uses CompressedOops: true
[junit] - ---
{noformat}
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229211#comment-13229211 ] Shai Erera commented on LUCENE-3867: Oracle:
{code}
java version 1.6.0_21
Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b17, mixed mode)
{code}
IBM:
{code}
java version 1.6.0
Java(TM) SE Runtime Environment (build pwa6460sr9fp3-2022_05(SR9 FP3))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Windows 7 amd64-64 jvmwa6460sr9-2011_94827 (JIT enabled, AOT enabled) J9VM - 2011_094827 JIT - r9_20101028_17488ifx45 GC - 20101027_AA)
JCL - 20110727_07
{code}
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229215#comment-13229215 ] Shai Erera commented on LUCENE-3867: I ran ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops, and with the Oracle JVM I get CompressedOops: true, but with the IBM JVM I still get 'false'.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229217#comment-13229217 ] Uwe Schindler commented on LUCENE-3867: --- OK, that is expected. 1.6.0_21 does not enable compressed oops by default, so false is correct. If you enable it manually, it reports true. jRockit is jRockit and not Sun/Oracle, so the result is somewhat expected; it seems to not have that MXBean. But the code does not produce strange exceptions, so at least on the Sun VM we can detect compressed oops and guess the reference size better. 8 is still not bad, as it gives an upper limit.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229218#comment-13229218 ] Uwe Schindler commented on LUCENE-3867: --- By the way, here is the code from the hotspot mailing list member (my code is based on it); it also shows the outputs for different JVMs: https://gist.github.com/1333043 (I just removed the com.sun.* imports and replaced them with reflection)
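The approach in that gist can be sketched without any com.sun.* imports as follows. This is a hedged illustration, not the committed patch: it queries the HotSpot diagnostic MBean by its well-known JMX name and returns null on any non-HotSpot VM or error, in which case a caller should keep the pessimistic 8-byte reference size as an upper bound.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hedged sketch: read a HotSpot VM option (e.g. UseCompressedOops) via the
// platform MBeanServer, so no com.sun.* classes are referenced directly.
final class CompressedOopsCheck {
    static String vmOptionValue(String option) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName hotspot =
                new ObjectName("com.sun.management:type=HotSpotDiagnostic");
            // getVMOption returns a VMOption, mapped to CompositeData over JMX
            CompositeData vmOption = (CompositeData) server.invoke(
                hotspot, "getVMOption",
                new Object[] { option },
                new String[] { "java.lang.String" });
            return (String) vmOption.get("value");
        } catch (Exception e) {
            return null; // not HotSpot, or option unknown: stay pessimistic
        }
    }
}
```

On HotSpot this yields "true" or "false"; on jRockit the quoted advice below suggests asking for CompressedRefs instead.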
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229226#comment-13229226 ] Shai Erera commented on LUCENE-3867: bq. 8 is still not bad as it gives an upper limit. I agree. Better to over-estimate here than under-estimate. Would appreciate it if someone could take a look at the sizeOf() impls before I commit.
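A hedged sketch of how the proposed primitive-array sizeOf() helpers could look, using the corrected array header (object header plus length int, no object ref). The constants below are illustrative stand-ins for RamUsageEstimator's fields, assuming a 64-bit HotSpot VM with compressed oops and 8-byte object alignment; the real values are VM-dependent.

```java
// Hedged sketch, not the committed impl: estimate the shallow size of
// primitive arrays with the corrected array header.
final class SizeOfSketch {
    // mark word + compressed class pointer (assumed 64-bit, compressed oops)
    static final int NUM_BYTES_OBJECT_HEADER = 12;
    // array header = object header + 4-byte length field, NO object ref
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + Integer.BYTES;
    static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;

    // round up to the VM's object alignment boundary
    static long alignObjectSize(long size) {
        return (size + NUM_BYTES_OBJECT_ALIGNMENT - 1)
                & ~(NUM_BYTES_OBJECT_ALIGNMENT - 1L);
    }

    static long sizeOf(byte[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length);
    }

    static long sizeOf(int[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length * Integer.BYTES);
    }

    static long sizeOf(long[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length * Long.BYTES);
    }
}
```

With these assumptions, an empty byte[] costs 16 bytes and an int[3] costs 32 (28 rounded up to the 8-byte boundary).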
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-3241: Attachment: SOLR-3241.patch I'm attaching another patch with some more tests. Also updated the DocumentBuilder to use the existing logic instead of replicating it where the fix is applied. Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's index-time document boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example: <field name="author" type="text" indexed="true" stored="false" omitNorms="false"/> <field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/> <copyField source="author" dest="author_display"/> I'm attaching a possible fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229238#comment-13229238 ] Robert Muir commented on LUCENE-3869: - Stacktrace: {noformat} junit-sequential: [junit] Testsuite: org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.603 sec [junit] [junit] Testsuite: org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzerTest [junit] 2012-03-14 10:42:27 [junit] Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode): [junit] [junit] Thread-9 prio=10 tid=0x7f3ba44af000 nid=0x34d1 runnable [0x7f3ba3a72000] [junit]java.lang.Thread.State: RUNNABLE [junit] at java.util.HashMap.transfer(HashMap.java:484) [junit] at java.util.HashMap.resize(HashMap.java:463) [junit] at java.util.HashMap.addEntry(HashMap.java:755) [junit] at java.util.HashMap.put(HashMap.java:385) [junit] at java.util.HashSet.add(HashSet.java:200) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.setName(AnalysisEngineManagementImpl.java:245) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:181) [junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:127) [junit] at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) [junit] at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) [junit] at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267) [junit] at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335) [junit] at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73) [junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.init(BaseUIMATokenizer.java:45) [junit] at 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.init(UIMATypeAwareAnnotationsTokenizer.java:54) [junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzer.createComponents(UIMATypeAwareAnalyzer.java:40) [junit] at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:368) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:338) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:330) [junit] [junit] Low Memory Detector daemon prio=10 tid=0x41a0d000 nid=0x34c7 runnable [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread1 daemon prio=10 tid=0x41a0a800 nid=0x34c6 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread0 daemon prio=10 tid=0x41a08000 nid=0x34c5 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Signal Dispatcher daemon prio=10 tid=0x41a05800 nid=0x34c4 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Finalizer daemon prio=10 tid=0x419e9000 nid=0x34c3 in Object.wait() [0x7f3ba092] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] - waiting on 0xe6920d88 (a java.lang.ref.ReferenceQueue$Lock) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [junit] - locked 0xe6920d88 (a java.lang.ref.ReferenceQueue$Lock) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [junit] at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) [junit] [junit] Reference Handler daemon prio=10 tid=0x419e6800 nid=0x34c2 in Object.wait() [0x7f3ba8a21000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] - waiting on 0xe6920d20 
(a java.lang.ref.Reference$Lock) [junit] at java.lang.Object.wait(Object.java:485) [junit] at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) [junit] - locked 0xe6920d20 (a java.lang.ref.Reference$Lock) [junit] [junit] LTC-main#seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd prio=10 tid=0x4197a800 nid=0x34b8 in Object.wait() [0x7f3badfc6000] [junit]
[jira] [Created] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
possible hang in UIMATypeAwareAnalyzerTest -- Key: LUCENE-3869 URL: https://issues.apache.org/jira/browse/LUCENE-3869 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: Robert Muir Just testing an unrelated patch, I was hung (with 100% cpu) in UIMATypeAwareAnalyzerTest. I'll attach stacktrace at the moment of the hang. The fact we get a seed in the actual stacktraces for cases like this is awesome! Thanks Dawid! I don't think it reproduces 100%, but I'll try beasting this seed to see if i can reproduce the hang: should be 'ant test -Dtestcase=UIMATypeAwareAnalyzerTest -Dtests.seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd' from what I can see. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229241#comment-13229241 ] Tommaso Teofili commented on LUCENE-3869: - Thanks Robert, I'm taking a look
[jira] [Commented] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229243#comment-13229243 ] Robert Muir commented on SOLR-3241: --- Thanks, patch looks good! I'll wait a bit for any other input.
[jira] [Issue Comment Edited] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229244#comment-13229244 ] Uwe Schindler edited comment on LUCENE-3867 at 3/14/12 3:01 PM: On the Hotspot mailing list some people also seem to have an idea about jRockit and IBM J9: {quote} From: Krystal Mok Sent: Wednesday, March 14, 2012 3:46 PM To: Uwe Schindler Cc: Dawid Weiss; hotspot compiler Subject: Re: How to detect if the VM is running with compact refs from within the VM (no agent)? Hi, Just in case you'd care, the same MXBean could be used to detect compressed references on JRockit, too. It's probably available starting from JRockit R28. Instead of UseCompressedOops, use CompressedRefs as the VM option name on JRockit. Don't know how to extract this information for J9 without another whole bunch of hackeries... well, you could try this, on a best-effort basis for platform detection: IBM J9's VM version string contains the compressed reference information. Example: $ export JAVA_OPTS='-Xcompressedrefs' $ groovysh Groovy Shell (1.7.7, JVM: 1.7.0) Type 'help' or '\h' for help. groovy:000> System.getProperty 'java.vm.info' ===> JRE 1.7.0 Linux amd64-64 Compressed References 20110810_88604 (JIT enabled, AOT enabled) J9VM - R26_Java726_GA_20110810_1208_B88592 JIT - r11_20110810_20466 GC - R26_Java726_GA_20110810_1208_B88592_CMPRSS J9CL - 20110810_88604 groovy:000> quit So grepping for Compressed References in the java.vm.info system property gives you the clue. - Kris {quote}
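The best-effort J9 heuristic Kris describes above can be sketched in a few lines. This is a hedged illustration, not an authoritative check: it only greps the java.vm.info system property, which carries "Compressed References" on IBM J9 builds started with -Xcompressedrefs.

```java
// Hedged sketch: heuristic compressed-references detection for IBM J9,
// based purely on the VM version string (no MXBean involved).
final class VmInfoCheck {
    static boolean j9CompressedRefs() {
        String info = System.getProperty("java.vm.info", "");
        return info.contains("Compressed References");
    }
}
```

On HotSpot VMs java.vm.info is typically just "mixed mode", so the check returns false there and a caller would fall back to the MXBean approach.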
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229255#comment-13229255 ] Robert Muir commented on LUCENE-3869: - If you try this seed a lot of times, eventually it will reproduce. I tried adding -Dtests.iter=10 to my command-line, so 'ant test -Dtestcase=UIMATypeAwareAnalyzerTest -Dtests.seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd -Dtests.iter=10' after about 3 or 4 runs it hung... though interestingly it hung at 200% cpu usage (as if two of the analysis threads were stuck). Stacktrace is in the same place, just for both threads: {noformat} [junit] Thread-48 prio=10 tid=0x7fbec0b9c000 nid=0x4979 runnable [0x7fbebfa7d000] [junit]java.lang.Thread.State: RUNNABLE [junit] at java.util.HashMap.getEntry(HashMap.java:347) [junit] at java.util.HashMap.containsKey(HashMap.java:335) [junit] at java.util.HashSet.contains(HashSet.java:184) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.setName(AnalysisEngineManagementImpl.java:242) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:181) [junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:127) [junit] at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) [junit] at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) [junit] at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267) [junit] at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335) [junit] at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73) [junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.init(BaseUIMATokenizer.java:45) [junit] at 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.init(UIMATypeAwareAnnotationsTokenizer.java:54) [junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzer.createComponents(UIMATypeAwareAnalyzer.java:40) [junit] at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:368) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:338) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:330) [junit] [junit] Thread-46 prio=10 tid=0x7fbec10df000 nid=0x4977 runnable [0x7fbebfb7e000] [junit]java.lang.Thread.State: RUNNABLE [junit] at java.util.HashMap.getEntry(HashMap.java:347) [junit] at java.util.HashMap.containsKey(HashMap.java:335) [junit] at java.util.HashSet.contains(HashSet.java:184) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.setName(AnalysisEngineManagementImpl.java:242) [junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:181) [junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:127) [junit] at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) [junit] at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) [junit] at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267) [junit] at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335) [junit] at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73) [junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.init(BaseUIMATokenizer.java:45) [junit] at 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.init(UIMATypeAwareAnnotationsTokenizer.java:54) [junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzer.createComponents(UIMATypeAwareAnalyzer.java:40) [junit] at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:368) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:338) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:330) {noformat}
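The traces show several analysis threads concurrently inside AnalysisEngineManagementImpl.setName, which mutates an unsynchronized HashSet; a HashMap resize racing with another writer can corrupt the bucket chains and spin at 100% CPU per thread. A hypothetical guard on the caller's side (not the actual UIMA or Lucene fix, and the Factory type below is invented for the sketch) is to serialize engine creation:

```java
import java.util.function.Supplier;

// Hypothetical sketch: serialize engine creation so the unsynchronized
// UIMA bookkeeping is only ever touched by one thread at a time.
final class SafeAEProvider<T> {
    private final Object lock = new Object();
    private T cached; // stands in for UIMA's AnalysisEngine

    T getAE(Supplier<T> factory) {
        synchronized (lock) { // one thread initializes at a time
            if (cached == null) {
                cached = factory.get();
            }
            return cached;
        }
    }
}
```

With this guard, concurrent callers either create the engine once or reuse the cached one; the unsafe HashSet is never mutated from two threads.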
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229264#comment-13229264 ] Michael McCandless commented on LUCENE-3867: Patch looks good! Maybe just explain in the sizeOf(String) javadoc that this method assumes the String is standalone (ie, does not reference a larger char[] than itself)? Because... if you call String.substring, the returned string references a slice of the char[] of the original one... and so technically the RAM it's tying up could be (much) larger than expected. (At least, this used to be the case... not sure if it's changed...)
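On JVMs whose substring shares the parent's char[] (older Oracle/Sun JDKs; later releases copy on substring), a caller can force a compact copy before estimating. A hedged sketch, assuming the old sharing behavior:

```java
// Hedged sketch: give a String its own right-sized backing array before
// estimating its footprint. On sharing JVMs, new String(String) copies the
// chars exactly when the original points into a larger array; on copying
// JVMs it is a cheap no-op for estimation purposes.
final class StringCompact {
    static String compact(String s) {
        return new String(s);
    }
}
```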
[jira] [Resolved] (LUCENE-3841) CloseableThreadLocal does not work well with Tomcat thread pooling
[ https://issues.apache.org/jira/browse/LUCENE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3841. Resolution: Fixed Thanks Matthew!

CloseableThreadLocal does not work well with Tomcat thread pooling -- Key: LUCENE-3841 URL: https://issues.apache.org/jira/browse/LUCENE-3841 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 3.5 Environment: Lucene/Tika/Snowball running in a Tomcat web application Reporter: Matthew Bellew Assignee: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3841.patch

We tracked down a large memory leak (effectively a leak, anyway) caused by how Analyzer uses CloseableThreadLocal. CloseableThreadLocal.hardRefs holds references to Thread objects as keys. The problem is that it only frees these references in the set() method, and SnowballAnalyzer will only call set() when it is used by a NEW thread. The problem scenario is as follows: the server experiences a spike in usage (say, by robots or whatever) and many threads are created and referenced by CloseableThreadLocal.hardRefs. The server quiesces and lets many of these threads expire normally. Now we have a smaller, but adequate, thread pool. So CloseableThreadLocal.set() may not be called by SnowballAnalyzer (via Analyzer) for a _long_ time. The purge code is never called, and these threads, along with their thread-local storage (Lucene related or not), are never cleaned up. I think calling the purge code in both get() and set() would have avoided this problem, but is potentially expensive. Perhaps using WeakHashMap instead of HashMap may also have helped; WeakHashMap purges on get() and set(). So this might be an efficient way to clean up threads in get(), while set() might do the more expensive Map.keySet() iteration. Our current workaround is to not share SnowballAnalyzer instances among HTTP searcher threads. We open and close one on every request.

Thanks, Matt
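The WeakHashMap idea from the report can be sketched as follows. This is an illustrative toy, not the actual Lucene patch: Thread objects are used as weakly held keys, so once a pool thread expires, its entry becomes eligible for collection and WeakHashMap expunges it internally during get() and put(), with no explicit purge pass.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Toy per-thread value holder (not CloseableThreadLocal itself). Thread keys
// are weakly referenced, so entries for dead threads are purged automatically.
// Caveat: a stored value must not strongly reference its Thread key, or the
// entry can never be collected.
public class WeakThreadValues<T> {
    private final Map<Thread, T> values =
        Collections.synchronizedMap(new WeakHashMap<Thread, T>());

    public T get() {
        // WeakHashMap expunges stale (collected-key) entries on access
        return values.get(Thread.currentThread());
    }

    public void set(T value) {
        values.put(Thread.currentThread(), value);
    }
}
```

The trade-off the reporter mentions still applies: weak-key purging on get() is cheap, while any remaining eager cleanup (the keySet() iteration) can be deferred to the rarer set() calls.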
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229273#comment-13229273 ] Shai Erera commented on LUCENE-3867: Good point. I clarified the jdocs with this:

{code}
/**
 * Returns the approximate size of a String object. This computation relies on
 * {@link String#length()} to compute the number of bytes held by the char[].
 * However, if the String object passed to this method is the result of e.g.
 * {@link String#substring}, the computation may be entirely inaccurate
 * (depending on the difference between length() and the actual char[]
 * length).
 */
{code}

If there are no objections, I'd like to commit this.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229275#comment-13229275 ] Dawid Weiss commented on LUCENE-3867: I would opt for sizeOf to return the actual size of the object, including underlying string buffers... We can take interned buffers into account, but other than that I wouldn't skew the result, because it can be misleading.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229279#comment-13229279 ] Dawid Weiss commented on LUCENE-3867: I don't like this special handling of Strings, to be honest. Why do we need/do it?
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229286#comment-13229286 ] Shai Erera commented on LUCENE-3867: bq. I don't like this special handling of Strings, to be honest. Why do we need/do it?

Because I wrote it, and it seemed useful to me, so why not? We know what Strings look like, at least in their worst case. If there is ever a better implementation, we can fix it in RUE, rather than having many impls try to do it on their own.
[jira] [Commented] (SOLR-1970) need to customize location of dataimport.properties
[ https://issues.apache.org/jira/browse/SOLR-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229289#comment-13229289 ] Terrance A. Snyder commented on SOLR-1970: +1 I am looking at implementing this with current trunk. I'll hopefully submit a patch for this. Keep in mind this may have to be added to the core configuration area to keep backward compatibility. Something like:

<core name="users_001" instanceDir="users" config="solrconfig.xml" dataDir="../users_001" dataImportPropertiesFile="../users_001/di.properties"/>
<core name="users_002" instanceDir="users" config="solrconfig.xml" dataDir="../users_002" dataImportPropertiesFile="../users_001/di.properties"/>

This would operate just like today's core config:

$SOLR_HOME/users
/users/conf
/users/conf/solrconfig.xml
/users/conf/schema.xml
/users_001
/users_001/data
/users_001/di.properties
/users_002
/users_002/data
/users_002/di.properties

This allows the core configuration and sharding to work effectively. The core question is how this would play with zookeeper / cloud support. I would think this should already be baked into SolrCloud, but I could be wrong. Any thoughts?

need to customize location of dataimport.properties --- Key: SOLR-1970 URL: https://issues.apache.org/jira/browse/SOLR-1970 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Chris Book

By default dataimport.properties is written to {solr.home}/conf/. However, when using multiple solr cores, it is currently useful to use the same conf directory for all of the cores and use solr.xml to specify a different schema.xml. I can then specify a different data-config.xml for each core to define how the data gets from the database to each core's schema. However, all the solr cores will fight over writing to the dataimport.properties file. There should be an option in solrconfig.xml to specify the location or name of this file so that a different one can be used for each core.
[jira] [Issue Comment Edited] (SOLR-1970) need to customize location of dataimport.properties
[ https://issues.apache.org/jira/browse/SOLR-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229289#comment-13229289 ] Terrance A. Snyder edited comment on SOLR-1970 at 3/14/12 4:08 PM: +1 I am looking at implementing this with current trunk. I'll hopefully submit a patch for this. Keep in mind this may have to be added to the core configuration area to keep backward compatibility. Something like:

<core name="users_001" instanceDir="users" config="solrconfig.xml" dataDir="../users_001" dataImportPropertiesFile="../users_001/di.properties"/>
<core name="users_002" instanceDir="users" config="solrconfig.xml" dataDir="../users_002" dataImportPropertiesFile="../users_002/di.properties"/>

This would operate just like today's core config:

$SOLR_HOME/users
/users/conf
/users/conf/solrconfig.xml
/users/conf/schema.xml
/users_001
/users_001/data
/users_001/di.properties
/users_002
/users_002/data
/users_002/di.properties

This allows the core configuration and sharding to work effectively. The core question is how this would play with zookeeper / cloud support. I would think this should already be baked into SolrCloud, but I could be wrong. Any thoughts?

was (Author: terrance.snyder): +1 I am looking at implementing this with current trunk. I'll hopefully submit a patch for this. Keep in mind this may have to be added to the core configuration area to keep backward compatibility. Something like:

<core name="users_001" instanceDir="users" config="solrconfig.xml" dataDir="../users_001" dataImportPropertiesFile="../users_001/di.properties"/>
<core name="users_002" instanceDir="users" config="solrconfig.xml" dataDir="../users_002" dataImportPropertiesFile="../users_001/di.properties"/>

This would operate just like today's core config:

$SOLR_HOME/users
/users/conf
/users/conf/solrconfig.xml
/users/conf/schema.xml
/users_001
/users_001/data
/users_001/di.properties
/users_002
/users_002/data
/users_002/di.properties

This allows the core configuration and sharding to work effectively. The core question is how this would play with zookeeper / cloud support. I would think this should already be baked into SolrCloud, but I could be wrong. Any thoughts?
Re: UpdateRequestProcessor to extract Solr XML from rich documents
On Mar 14, 2012, at 12:07 PM, Emmanuel Espina wrote: XmlWritingUpdateProcessorFactory.java

+1 - looks like a useful update proc. I'd make a couple of minor suggestions, like looking at the return value of mkdirs and logging an error or warning if the directory doesn't exist and can't be made, and closing the file writer in a finally block. I'd go straight to a JIRA though. - Mark Miller lucidimagination.com
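The two review suggestions (check the boolean returned by mkdirs(), and close the writer in a finally block) can be sketched like this. The class, method, and path names below are invented for the example and are not from the proposed XmlWritingUpdateProcessorFactory.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

// Minimal sketch of defensive file writing:
//  - mkdirs() does not throw on failure, it returns false, so the result
//    must be checked (and logged or escalated);
//  - the writer is closed in a finally block so a failed write cannot
//    leak the file handle.
public class SafeXmlWrite {
    public static void write(File dir, String name, String xml) throws IOException {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("could not create directory: " + dir);
        }
        Writer out = new FileWriter(new File(dir, name));
        try {
            out.write(xml);
        } finally {
            out.close(); // runs even if write() throws
        }
    }
}
```

(On Java 7+ the try block would naturally become try-with-resources, but the finally form matches the suggestion as written.)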
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229296#comment-13229296 ] Michael McCandless commented on LUCENE-3867: bq. I don't like this special handling of Strings, to be honest.

I'm confused: what special handling of Strings are we talking about...? You mean that sizeOf(String) doesn't return the correct answer if the string came from a previous .substring (.split too) call...? If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?
[jira] [Commented] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229298#comment-13229298 ] Hoss Man commented on SOLR-3241: Patch looks fine ... I wish there was a way to make it easier for poly fields so they wouldn't have to do the check themselves, but when I tried the idea I had it didn't work, so better to go with this for now and maybe refactor a helper method later. The few changes I would make:

1) Make the new tests grab the IndexSchema object and assert that every field (that the test cares about) has the expected omitNorms value -- future-proof ourselves against someone neutering the test without realizing it by tweaking the test schema, because they don't know that there is a specific reason for those omitNorms settings.

2) Add a test that explicitly verifies the failure case of someone setting a field boost on a field with omitNorms==true, and assert that we get the expected error message (doesn't look like this was added when LUCENE-3796 was committed, and we want to make sure we don't inadvertently break that error check).

Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch

After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's document index-time boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example:

<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>

I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229311#comment-13229311 ] Dawid Weiss commented on LUCENE-3867:

{code}
/** Returns the size in bytes of the String[] object. */
public static int sizeOf(String[] arr) {
  int size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
  for (String s : arr) {
    size += sizeOf(s);
  }
  return size;
}

/** Returns the approximate size of a String object. */
public static int sizeOf(String str) {
  // String's char[] size
  int arraySize = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_CHAR * str.length());

  // String's raw object size
  int objectSize = alignObjectSize(NUM_BYTES_OBJECT_REF /* array reference */
      + 3 * NUM_BYTES_INT /* String holds 3 integers */
      + NUM_BYTES_OBJECT_HEADER /* String object header */);

  return objectSize + arraySize;
}
{code}

What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings. If they point to a single char[], this should simply count the object overhead, not count every character N times as it would do now. This isn't sizeOf(); this is sum(string lengths * 2) + epsilon to me. I'd keep RamUsageEstimator exactly what the name says -- an estimation of the actual memory taken by a given object. A string can point to a char[], and if so this should be traversed as an object and counted once.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229313#comment-13229313 ] Dawid Weiss commented on LUCENE-3867: - bq. If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]? Same as with other objects -- traverse its fields and count them (once, building an identity set for all objects reachable from the root)? RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect - Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While I was at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods. It's not perfect, and there's some room for improvement I'm sure; here it is: {code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safety for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code} If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
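For readers following along, the arithmetic in the quoted passage can be sketched as plain Java. This is a hypothetical standalone helper, not Lucene's code; the 12-byte header and 8-byte alignment match the layout the quoted page describes, and real values vary by JVM vendor and pointer size.

```java
// Hypothetical sketch of array memory accounting, not Lucene's actual code.
// Assumes the layout described in the quoted page: a 12-byte array header
// (8-byte object header + 4-byte length, with no extra object ref) and
// 8-byte object alignment.
public class ArraySizeSketch {
    static final int NUM_BYTES_ARRAY_HEADER = 12;
    static final int OBJECT_ALIGNMENT = 8;

    /** Rounds a raw size up to the next object-alignment boundary. */
    static long alignObjectSize(long size) {
        return (size + OBJECT_ALIGNMENT - 1) & ~(OBJECT_ALIGNMENT - 1L);
    }

    /** Array size = header + numElements * bytesPerElement, then aligned. */
    static long sizeOfArray(int numElements, int bytesPerElement) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) numElements * bytesPerElement);
    }

    public static void main(String[] args) {
        System.out.println(sizeOfArray(1, 1));   // new byte[1]: 12 + 1 = 13 -> 16
        System.out.println(sizeOfArray(10, 4));  // new int[10]: 12 + 40 = 52 -> 56
    }
}
```

Note that the header already accounts for the length field, which is why adding NUM_BYTES_OBJECT_REF on top of it over-counts.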
Getting bug fix from Lucene into Lucene.Net
I am running into the following problem discussed and fixed in the current Lucene Java version: While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get a 0 (ZERO) value in the sort. (Tested against Double, Float, Int and Long numeric fields, in ascending and descending order.) This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would be either allowing the user to define such a non-value default, or always returning those document results last. Is this something that can get into Lucene.Net soon? See https://issues.apache.org/jira/browse/LUCENE-3390 for complete information on the Lucene issue and fix. Tom Cabanski
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229332#comment-13229332 ] Shai Erera commented on LUCENE-3867: bq. What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings. So you mean you'd want sizeOf(String[]) to be just that? {code}
return alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
{code} I don't mind. I just thought that since we know how to compute sizeOf(String), we can use that. It's an extreme case, I think, that someone will want to compute the size of a String[] whose elements share the same char[] instance ... but if it bothers you that much, I don't mind simplifying it and documenting that it computes the raw size of the String[]. But I don't think that we should change sizeOf(String) to not count the char[] size. It's part of the object, and really it's a String; it's not like we're trying to compute the size of a general object. bq. Same as with other objects – traverse its fields and count them RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating.
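For illustration, the "shallow" variant in the one-line snippet above could look like this as a standalone class. The constant values here are assumptions for 64-bit HotSpot with compressed oops (16-byte array header, 4-byte refs); they are not portable, and Lucene computes its real constants differently.

```java
// Hypothetical sketch of the "shallow" sizeOf(String[]) discussed above:
// count only the array object itself (header + one reference per slot),
// not the String objects or char[] buffers it points to.
public class ShallowStringArraySize {
    // Assumed values for 64-bit HotSpot with compressed oops.
    static final int NUM_BYTES_ARRAY_HEADER = 16;
    static final int NUM_BYTES_OBJECT_REF = 4;
    static final int ALIGNMENT = 8;

    static long alignObjectSize(long size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1L);
    }

    static long shallowSizeOf(String[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER
            + (long) NUM_BYTES_OBJECT_REF * arr.length);
    }

    public static void main(String[] args) {
        String[] three = { "a", "bb", "ccc" };
        // 16 + 3 * 4 = 28, aligned up to 32 -- regardless of string contents
        System.out.println(shallowSizeOf(three));
    }
}
```

The result depends only on arr.length, which is exactly the point of the shallow definition: shared or sub-sliced char[] buffers can no longer cause over-counting.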
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3867: -- Attachment: LUCENE-3867.patch Hi Shai, can you try this patch with J9 or maybe JRockit (Robert)? If you use one of those JVMs you may have to explicitly enable compressed Oops/refs!
[jira] [Updated] (SOLR-3241) Document boost fail if a field copy omit the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3241: -- Attachment: SOLR-3241.patch updated patch with hossman's suggested test improvements. I'll commit soon. Document boost fail if a field copy omit the norms -- Key: SOLR-3241 URL: https://issues.apache.org/jira/browse/SOLR-3241 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Fix For: 3.6, 4.0 Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's document index-time boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example: <field name="author" type="text" indexed="true" stored="false" omitNorms="false"/> <field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/> <copyField source="author" dest="author_display"/> I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229339#comment-13229339 ] Dawid Weiss commented on LUCENE-3867: - bq. RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating. Yeah, that's exactly what I didn't like. All the primitive/primitive array methods are fine, but why make things inconsistent with sizeOf(String)? I'd rather have the reflection-based method estimate the size of a String/String[]. Like we mentioned, it's always a matter of speed vs. accuracy, but here I'd opt for accuracy, because the output can be off by a lot if you make substrings along the way (not to mention it assumes details about String's internal implementation which may or may not be true, depending on the vendor). Do you have a need for this method, Shai? If you don't, then why not wait (with this part) until such a need arises?
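The reflection-based traversal discussed in these comments (walk everything reachable from a root exactly once, using an identity set so a shared char[] is not double-counted) can be sketched roughly as below. This is a hypothetical simplification, not RUE's estimateRamUsage implementation: it counts reachable objects rather than bytes, to stay JVM-vendor-neutral, where a real estimator would add a per-object shallow size at each visit.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: visit every object reachable from a root exactly
// once, de-duplicating by identity so a shared buffer (e.g. one char[]
// backing several substrings) is counted only once.
public class ReachableObjects {

    public static int countReachable(Object root) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        push(stack, root);
        int count = 0;
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            if (!seen.add(o)) continue; // already counted (identity comparison)
            count++;
            Class<?> clazz = o.getClass();
            if (clazz.isArray()) {
                if (!clazz.getComponentType().isPrimitive()) {
                    for (int i = 0; i < Array.getLength(o); i++) {
                        push(stack, Array.get(o, i));
                    }
                }
                continue; // primitive arrays hold no references
            }
            for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                    try {
                        f.setAccessible(true);
                        push(stack, f.get(o));
                    } catch (ReflectiveOperationException | RuntimeException e) {
                        // JDK-internal fields may be inaccessible on newer JVMs; skip them
                    }
                }
            }
        }
        return count;
    }

    private static void push(Deque<Object> stack, Object o) {
        if (o != null) stack.push(o); // ArrayDeque rejects nulls
    }

    public static void main(String[] args) {
        char[] buffer = new char[8];
        Object[] holders = { buffer, buffer }; // two refs, one shared buffer
        System.out.println(countReachable(holders)); // array + buffer = 2
    }
}
```

This is the accuracy-over-speed trade-off Dawid argues for: the traversal sees the true shared structure, where a formula-based sizeOf(String) cannot.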
Re: Getting bug fix from Lucene Java into Lucene.Net
That largely depends on how different the 3.4/3.5 Java API is from the current version in our trunk (3.0.3). From a maintainability standpoint, it would be easier for us to fix this bug when we've ported Lucene.NET to the same version that Java was at when it was fixed. It looks like they added a class and changed another's signature, and then changed all of the comparator classes to inherit from the new one instead of the old. I don't know what that would do to our current version of the code, or if it can be worked in without major changes. Worst case scenario, we can't get it in until we reach 3.4, but I can certainly see what we can do to get it in earlier. In the meantime, it seems that Hoss Man has noted a workaround here https://issues.apache.org/jira/browse/LUCENE-3390?focusedCommentId=13088832page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13088832 that can be used. Thanks, Christopher On Wed, Mar 14, 2012 at 6:04 AM, Tom Cabanski t...@cabanski.com wrote: > I am running into the following problem discussed and fixed in the current Lucene Java version: [...]
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ] Harley Parks edited comment on SOLR-2155 at 3/14/12 5:00 PM: - For some reason package solr2155.lucene.spatial.geometry.shape; is misnamed, and there are some other issues with the build... but I'm trying to use Eclipse with a Maven build, and might be missing something else. So: downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin to PATH, unzipped and cd'd to Solr2155-1.0.3-project, executed mvn package in a cmd window, and it built nicely. Then added Solr2155-1.0.3.jar to tomcat/solr/lib. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes, so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229344#comment-13229344 ] Shai Erera commented on LUCENE-3867: bq. Do you have a need for this method, Shai? I actually started this issue because of this method :). I wrote the method for my own code, then spotted the bug in ARRAY_HEADER, and along the way thought it would be good if RUE offered it, so other people can benefit from it too. Because from my experience, after I put code in Lucene, very smart people improve and optimize it, and I benefit from it in new releases. So while I could keep sizeOf(String) in my own code, I know that Uwe/Robert/Mike/You will make it more efficient when Java 7/8/9 comes out, while I'll totally forget about it! :)
[jira] [Created] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
UpdateRequestProcessor to extract Solr XML from rich documents -- Key: SOLR-3246 URL: https://issues.apache.org/jira/browse/SOLR-3246 Project: Solr Issue Type: New Feature Components: update Reporter: Emmanuel Espina Priority: Minor This would be an update request handler to save a file with the XML that represents the document in an external directory. The original idea behind this was to add it to the processing chain of the ExtractingRequestHandler to store an already-parsed version of the docs. This storage of pre-parsed documents will make re-indexing of the entire index faster (avoiding the Tika phase, and just sending the XML to the standard update processor). As a side effect, extracting the XML can make debugging of rich docs easier.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229363#comment-13229363 ] Dawid Weiss commented on LUCENE-3867: - Yeah... well... I'm flattered :) I'm still -1 on adding this particular method, because I don't like being surprised at how a method works, and this is surprising behavior to me, especially in this class (even if it's documented in the javadoc, but who reads it anyway, right?). If others don't share my opinion, then can we at least rename this method to sizeOfBlah(..), where Blah is something that would indicate it's not actually taking char buffer sharing or sub-slicing into account (suggestions for Blah welcome)?
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ] Harley Parks edited comment on SOLR-2155 at 3/14/12 5:14 PM: - For some reason package solr2155.lucene.spatial.geometry.shape; is misnamed, and there are some other issues with the build... but I'm trying to use Eclipse with a Maven build, and might be missing something else. So: downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin to PATH, unzipped and cd'd to Solr2155-1.0.3-project, executed mvn package in a cmd window, and it built nicely. Then added Solr2155-1.0.3.jar to tomcat/solr/lib and followed the readme.txt instructions to update the Solr schema.
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ]

Harley Parks edited comment on SOLR-2155 at 3/14/12 5:16 PM:
-

For some reason the package solr2155.lucene.spatial.geometry.shape is misnamed, and there were some other issues with the build. I'm trying to use Eclipse with a Maven build and might be missing something else. So: I downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin directory to PATH, unzipped and cd'd into Solr2155-1.0.3-project, ran mvn package in a cmd window, and it built nicely. I then added Solr2155-1.0.3.jar to tomcat/solr/lib and followed the readme.txt instructions to update the Solr schema. It's now working, and the GeoHash field no longer shows a lat,long but a geohash.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229400#comment-13229400 ]

Mark Miller commented on LUCENE-3867:
-

estimateSizeOf(..)
guessSizeOf(..)
wildGuessSizeOf(..)
incorrectSizeOf(..)
sizeOfWeiss(..)
weissSize(..)
sizeOfButWithoutTakingIntoAccountCharBufferSharingOrSubSlicingSeeJavaDoc(..)

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
-

Key: LUCENE-3867
URL: https://issues.apache.org/jira/browse/LUCENE-3867
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote}
A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
{quote}

While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods. It's not perfect -- there's some room for improvement, I'm sure -- but here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6                        // chars + additional safety for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT          // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER     // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER;   // String object
}
{code}

If people are not against it, I'd also like to add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
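The corrected formula the issue argues for -- array header (object header plus a four-byte length, with no extra object-ref term) plus element data, rounded up to the JVM's 8-byte alignment -- amounts to simple arithmetic. A back-of-the-envelope sketch, where the 16-byte header is an assumed 64-bit HotSpot value, not a portable constant:

```java
// Back-of-the-envelope array footprint, following the corrected formula from
// this issue: array header + element data, rounded up to 8-byte alignment.
// The 16-byte header is an assumption (64-bit HotSpot); real values vary by JVM.
public final class ArrayFootprint {
    static final int NUM_BYTES_ARRAY_HEADER = 16;

    public static long sizeOfArray(int length, int bytesPerElement) {
        long size = NUM_BYTES_ARRAY_HEADER + (long) length * bytesPerElement;
        return (size + 7) & ~7L; // round up to an 8-byte boundary
    }
}
```

Under these assumptions a byte[10] comes out as align8(16 + 10) = 32 bytes, and an int[3] as align8(16 + 12) = 32 bytes.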
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229410#comment-13229410 ]

Michael McCandless commented on LUCENE-3867:
-

{quote}
bq. If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

Same as with other objects -- traverse its fields and count them (once, building an identity set for all objects reachable from the root)?
{quote}

Aha, cool! I hadn't realized RUE can crawl into the private char[] inside String and count up the RAM usage correctly. That's nice. Maybe lowerBoundSizeOf(...)? Or maybe we don't add the new String methods (sizeOf(String), sizeOf(String[])) and document somewhere that you should do new RUE().size(String/String[]) instead...? Hmm, or maybe we do add the methods, but implement them under the hood with that?
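The traversal Dawid describes -- walk every reachable object's fields, using an identity set so shared objects are counted once -- can be sketched as below. This is a simplified illustration, not Lucene's RamUsageEstimator: the per-slot constants are assumed 64-bit HotSpot values, and reflective access to JDK-internal classes (such as String's private array) may be refused on modern JVMs, so the sketch simply skips inaccessible fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;

public final class GraphSizeSketch {
    // Assumed 64-bit HotSpot costs with compressed oops; not portable constants.
    static final int OBJECT_HEADER = 16, ARRAY_HEADER = 16, OBJECT_REF = 4;

    static int primitiveBytes(Class<?> t) {
        if (t == long.class || t == double.class) return 8;
        if (t == int.class || t == float.class) return 4;
        if (t == char.class || t == short.class) return 2;
        return 1; // byte, boolean
    }

    static long align(long n) { return (n + 7) & ~7L; }

    /** Walks the graph once; the identity map ensures shared objects are counted a single time. */
    public static long sizeOf(Object root) {
        if (root == null) return 0;
        IdentityHashMap<Object, Boolean> seen = new IdentityHashMap<>();
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(root);
        long total = 0;
        while (!stack.isEmpty()) {
            Object o = stack.pop();
            if (seen.put(o, Boolean.TRUE) != null) continue; // already counted
            Class<?> c = o.getClass();
            if (c.isArray()) {
                int len = java.lang.reflect.Array.getLength(o);
                Class<?> comp = c.getComponentType();
                if (comp.isPrimitive()) {
                    total += align(ARRAY_HEADER + (long) len * primitiveBytes(comp));
                } else {
                    total += align(ARRAY_HEADER + (long) len * OBJECT_REF);
                    for (int i = 0; i < len; i++) {
                        Object e = java.lang.reflect.Array.get(o, i);
                        if (e != null) stack.push(e);
                    }
                }
                continue;
            }
            long shallow = OBJECT_HEADER;
            for (Class<?> k = c; k != null; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (f.getType().isPrimitive()) {
                        shallow += primitiveBytes(f.getType());
                    } else {
                        shallow += OBJECT_REF;
                        try {
                            f.setAccessible(true);
                            Object v = f.get(o);
                            if (v != null) stack.push(v);
                        } catch (Exception e) {
                            // JDK-internal fields may be inaccessible; skip them in this sketch.
                        }
                    }
                }
            }
            total += align(shallow);
        }
        return total;
    }
}
```

Because the identity set records objects, not values, a char[] referenced from two Strings contributes its bytes only once, which is the behavior the comment above relies on.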
[jira] [Issue Comment Edited] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228938#comment-13228938 ]

Harley Parks edited comment on SOLR-2155 at 3/14/12 5:36 PM:
-

For some reason the package solr2155.lucene.spatial.geometry.shape is misnamed, and there were some other issues with the build. I'm trying to use Eclipse with a Maven build and might be missing something else. So: I downloaded Maven and JDK 6, set up the JAVA_HOME path, added the Maven bin directory to PATH, unzipped and cd'd into Solr2155-1.0.3-project, ran mvn package in a cmd window, and it built nicely. I then added Solr2155-1.0.3.jar to tomcat/solr/lib and followed the readme.txt instructions to update the Solr schema. It's now working, and the GeoHash field no longer shows a lat,long but a geohash... is that expected? Example:

{noformat}
<doc><float name="score">1.0</float><arr name="GeoTagGeoHash"><str>87zdk9gyt4kz</str>
{noformat}
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229423#comment-13229423 ]

Harley Parks commented on SOLR-2155:
-

Did I need to rebuild the index after making changes to the schema? What does a valid URL query for a geohash look like?
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229440#comment-13229440 ]

Dawid Weiss commented on LUCENE-3867:
-

bq. sizeOfWeiss(..)

We're talking some serious dimensions here, beware of buffer overflows!

bq. Or maybe we don't add the new string methods (sizeOf(String), sizeOf(String[])) and somewhere document that you should do new RUE().size(String/String[]) instead..

This is something I would go for -- it's consistent with what I would consider this class's logic. I would even change it to sizeOf(Object) -- this would be a static shortcut to just measure an object's size, no strings attached. Kabutz's code also distinguishes interned strings / cached boxed integers and enums. This could be a switch, much like it is now with interned Strings. Then this would really be either an upper (why lower, Mike?) bound or something that tries to be close to the exact memory consumption. A fun way to determine whether we're right would be to run a benchmark with -Xmx20m and test how close we can get to the main memory pool's maximum value before an OOM is thrown. :)
[jira] [Resolved] (SOLR-3241) Document boost fails if a field copy omits the norms
[ https://issues.apache.org/jira/browse/SOLR-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-3241.
---
Resolution: Fixed
Assignee: Robert Muir

Thanks Tomás!

Document boost fails if a field copy omits the norms
--

Key: SOLR-3241
URL: https://issues.apache.org/jira/browse/SOLR-3241
Project: Solr
Issue Type: Bug
Reporter: Tomás Fernández Löbbe
Assignee: Robert Muir
Fix For: 3.6, 4.0
Attachments: SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch, SOLR-3241.patch

After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost on a field that has omitNorms set to true. This makes Solr's index-time document boost fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and a document boost is used. For example:

{code}
<field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
<field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
<copyField source="author" dest="author_display"/>
{code}

I'm attaching a possible fix.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229456#comment-13229456 ]

Michael McCandless commented on LUCENE-3867:
-

bq. (why lower, Mike?)

Oh, I just meant that the sizeOf(String) impl in the current patch is a lower bound, since it guesses the private char[] length by calling String.length(), which is a lower bound on the actual char[] length.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229460#comment-13229460 ]

Dawid Weiss commented on LUCENE-3867:
-

John Rose just replied to my question -- there are fields in Unsafe that allow array scaling (1.7). Check these out:

{noformat}
ARRAY_BOOLEAN_INDEX_SCALE = theUnsafe.arrayIndexScale(boolean[].class);
ARRAY_BYTE_INDEX_SCALE    = theUnsafe.arrayIndexScale(byte[].class);
ARRAY_SHORT_INDEX_SCALE   = theUnsafe.arrayIndexScale(short[].class);
ARRAY_CHAR_INDEX_SCALE    = theUnsafe.arrayIndexScale(char[].class);
ARRAY_INT_INDEX_SCALE     = theUnsafe.arrayIndexScale(int[].class);
ARRAY_LONG_INDEX_SCALE    = theUnsafe.arrayIndexScale(long[].class);
ARRAY_FLOAT_INDEX_SCALE   = theUnsafe.arrayIndexScale(float[].class);
ARRAY_DOUBLE_INDEX_SCALE  = theUnsafe.arrayIndexScale(double[].class);
ARRAY_OBJECT_INDEX_SCALE  = theUnsafe.arrayIndexScale(Object[].class);
ADDRESS_SIZE = theUnsafe.addressSize();
{noformat}

So... there is a (theoretical?) possibility that, say, byte[] is machine word-aligned :) I bet any RAM estimator written so far will be screwed if this happens :)
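These scaling constants can be queried directly on a running JVM. A hedged sketch: sun.misc.Unsafe is unsupported API, fetched here via the usual theUnsafe reflection trick (which works on HotSpot but may warn or fail elsewhere), and the printed values are JVM-specific, not guaranteed:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class ArrayScaleProbe {
    // sun.misc.Unsafe is unsupported API; obtaining it reflectively works on
    // HotSpot but may emit warnings or fail on other/future JVMs.
    static Unsafe unsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    public static void main(String[] args) throws Exception {
        Unsafe u = unsafe();
        // Per-element stride for each array type; on HotSpot this matches the
        // element size (and 4 for Object[] with compressed oops).
        System.out.println("byte[]   scale = " + u.arrayIndexScale(byte[].class));
        System.out.println("char[]   scale = " + u.arrayIndexScale(char[].class));
        System.out.println("long[]   scale = " + u.arrayIndexScale(long[].class));
        System.out.println("Object[] scale = " + u.arrayIndexScale(Object[].class));
        // Offset of element 0, i.e. the actual array header size on this JVM.
        System.out.println("byte[]   base  = " + u.arrayBaseOffset(byte[].class));
        System.out.println("address size   = " + u.addressSize());
    }
}
```

arrayBaseOffset is the interesting one for this issue: it reports the real array header size on the running JVM, so an estimator could read it at startup instead of hard-coding NUM_BYTES_ARRAY_HEADER.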
[jira] [Commented] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
[ https://issues.apache.org/jira/browse/SOLR-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229468#comment-13229468 ]

Emmanuel Espina commented on SOLR-3246:
---

This is similar to https://issues.apache.org/jira/browse/SOLR-903, but this would be a server-side component.

UpdateRequestProcessor to extract Solr XML from rich documents
--

Key: SOLR-3246
URL: https://issues.apache.org/jira/browse/SOLR-3246
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Emmanuel Espina
Priority: Minor

This would be an update request handler that saves a file with the XML representing the document in an external directory. The original idea behind this was to add it to the processing chain of the ExtractingRequestHandler to store an already-parsed version of the docs. This storage of pre-parsed documents will make re-indexing of the entire index faster (avoiding the Tika phase and just sending the XML to the standard update processor). As a side effect, extracting the XML can make debugging of rich docs easier.
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229477#comment-13229477 ]

Aliaksandr Zhuhrou commented on SOLR-3244:
--

I checked your patch and it works on Tomcat.

New Admin UI doesn't work on tomcat
---

Key: SOLR-3244
URL: https://issues.apache.org/jira/browse/SOLR-3244
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 4.0
Reporter: Aliaksandr Zhuhrou
Assignee: Uwe Schindler
Attachments: SOLR-3244.patch

I am currently unable to open the admin interface when using a WAR deployment under a Tomcat server. The stack trace:

{noformat}
SEVERE: Servlet.service() for servlet [LoadAdminUI] in context with path [/solr] threw exception
java.lang.NullPointerException
	at java.io.File.<init>(File.java:251)
	at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:50)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:292)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{noformat}

Tomcat version: Apache Tomcat/7.0.23
Java version: jdk1.7.0_02

I did some debugging and found that the problem is that resolving the resource path is delegated to org.apache.naming.resources.WARDirContext, which simply returns null for any input:

{code}
/**
 * Return the real path for a given virtual path, if possible; otherwise
 * return <code>null</code>.
 *
 * @param path The path to the desired resource
 */
@Override
protected String doGetRealPath(String path) {
    return null;
}
{code}

Need to check the specification, because it may actually be a Tomcat bug. We may try using the getResourceAsStream(java.lang.String path) method, which should work even for a WAR.
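The NPE at the top of that trace is easy to reproduce: java.io.File's String constructor throws NullPointerException for a null pathname, which is what happens when getRealPath(...) returns null and its result is passed straight to new File(...). A minimal reproduction (illustrative only, not the servlet code; the resolve helper is hypothetical):

```java
import java.io.File;

public final class RealPathNpeDemo {
    /** Mimics the servlet passing getRealPath()'s result straight to File. */
    static File resolve(String realPath) {
        // When the container (here, Tomcat's WARDirContext) returns null for
        // the real path, this constructor throws NullPointerException,
        // matching the reported stack trace. Checking for null -- or reading
        // the resource via getResourceAsStream(path) -- avoids the problem.
        return new File(realPath);
    }

    public static void main(String[] args) {
        try {
            resolve(null);
            System.out.println("no exception (unexpected)");
        } catch (NullPointerException expected) {
            System.out.println("NullPointerException, as in the reported trace");
        }
    }
}
```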
[jira] [Updated] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
[ https://issues.apache.org/jira/browse/SOLR-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Espina updated SOLR-3246:
----------------------------------
    Attachment: SOLR-3246.patch

Initial code for this component (with a very simple test).

UpdateRequestProcessor to extract Solr XML from rich documents
--------------------------------------------------------------

Key: SOLR-3246
URL: https://issues.apache.org/jira/browse/SOLR-3246
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Emmanuel Espina
Priority: Minor
Attachments: SOLR-3246.patch

This would be an update request processor that saves a file with the XML representation of the document to an external directory. The original idea was to add it to the processing chain of the ExtractingRequestHandler to store an already-parsed version of the docs. Storing pre-parsed documents will make re-indexing the entire index faster (avoiding the Tika phase and just sending the XML to the standard update processor). As a side effect, extracting the XML can make debugging of rich docs easier.
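The XML such a processor would write is Solr's standard `<doc>`/`<field>` update format. A minimal, hypothetical sketch of producing it follows — this is not code from SOLR-3246.patch, and a real processor would iterate a SolrInputDocument rather than a plain map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrXmlSketch {

  // Builds the Solr XML update format for one document:
  // <doc><field name="...">value</field>...</doc>
  static String toSolrXml(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder("<doc>");
    for (Map.Entry<String, String> e : fields.entrySet()) {
      sb.append("<field name=\"").append(e.getKey()).append("\">")
        .append(escape(e.getValue()))
        .append("</field>");
    }
    return sb.append("</doc>").toString();
  }

  // Escapes XML entities; & must be replaced first so we don't
  // double-escape the &lt;/&gt; replacements.
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("id", "1");
    fields.put("title", "Tom & Jerry");
    System.out.println(toSolrXml(fields));
    // <doc><field name="id">1</field><field name="title">Tom &amp; Jerry</field></doc>
  }
}
```

The output could then be written to the external directory and later re-posted to the standard update handler, skipping Tika.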
[jira] [Created] (SOLR-3247) LBHttpSolrServer constructor ignores passed in ResponseParser
LBHttpSolrServer constructor ignores passed in ResponseParser
-------------------------------------------------------------

Key: SOLR-3247
URL: https://issues.apache.org/jira/browse/SOLR-3247
Project: Solr
Issue Type: Bug
Reporter: Grant Ingersoll
Priority: Minor
Fix For: 4.0

The constructor on line 191 accepts a ResponseParser object, but it ignores it. We should either drop that constructor or honor the parser it is given.
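The bug pattern here — a constructor parameter that is accepted but never applied — and the "honor it" fix can be sketched without SolrJ. Everything below is a hypothetical slimmed-down stand-in, not the real LBHttpSolrServer:

```java
public class ParserSketch {

  // Stand-in for org.apache.solr.client.solrj.ResponseParser.
  interface ResponseParser {
    String contentType();
  }

  // Hypothetical slimmed-down client. The fix for the SOLR-3247 pattern is
  // the single assignment in the constructor: the parameter must actually
  // be stored instead of silently dropped.
  static class LoadBalancedClient {
    private final ResponseParser parser;

    LoadBalancedClient(ResponseParser parser) {
      this.parser = parser; // the buggy constructor omitted this step
    }

    ResponseParser getParser() {
      return parser;
    }
  }

  public static void main(String[] args) {
    ResponseParser xml = () -> "application/xml";
    LoadBalancedClient client = new LoadBalancedClient(xml);
    System.out.println(client.getParser().contentType()); // application/xml
  }
}
```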
[jira] [Created] (SOLR-3248) CloudSolrServer should add methods to make it easier to set the collection on a per request basis
CloudSolrServer should add methods to make it easier to set the collection on a per request basis
-------------------------------------------------------------------------------------------------

Key: SOLR-3248
URL: https://issues.apache.org/jira/browse/SOLR-3248
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Fix For: 4.0

It would be good if CloudSolrServer added methods that make it easier to specify the collection, such as when adding documents. Right now, one has to use the UpdateRequest approach, which is more cumbersome.
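One possible shape for the requested convenience is an `add(...)` overload that takes the target collection directly and delegates to the request-object path internally. The sketch below is entirely hypothetical — the class, method names, and string-based "documents" are stand-ins, not SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionApiSketch {

  // Hypothetical slimmed-down cloud client.
  static class CloudClient {
    final List<String> sent = new ArrayList<>();

    // Existing, cumbersome path: the caller builds the request object
    // itself (UpdateRequest in real SolrJ) and sets the collection on it.
    void request(String collection, String doc) {
      sent.add(collection + ":" + doc);
    }

    // Proposed sugar: the collection is specified per call, and the
    // method delegates to the verbose path.
    void add(String collection, String doc) {
      request(collection, doc);
    }
  }

  public static void main(String[] args) {
    CloudClient client = new CloudClient();
    client.add("collection1", "doc-42");
    System.out.println(client.sent); // [collection1:doc-42]
  }
}
```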
[jira] [Commented] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229575#comment-13229575 ]

Uwe Schindler commented on SOLR-3244:
-------------------------------------

Fine, I will commit this now!
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229579#comment-13229579 ]

Uwe Schindler commented on LUCENE-3867:
---------------------------------------

So the whole Oops MBean magic is obsolete...

ADDRESS_SIZE = theUnsafe.addressSize();

woooah, so simple - works on more platforms for guessing! I will check this out with the usual reflection magic :-)

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
-----------------------------------------------------

Key: LUCENE-3867
URL: https://issues.apache.org/jira/browse/LUCENE-3867
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote}
A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
{quote}

While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods. It's not perfect, there's some room for improvement I'm sure; here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
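For the proposed sizeOf(int[] / byte[] / long[] / ...) helpers, the corrected accounting would be: array header (object header plus the four-byte length, with no extra object reference) plus the elements, rounded up to the JVM's object alignment. A minimal sketch, assuming fixed constants (NUM_BYTES_ARRAY_HEADER = 12 as the javamex page describes, 8-byte alignment) — the real RamUsageEstimator derives these at runtime, so the numbers here are illustrative only:

```java
public class ArraySizeSketch {
  // Assumed constants for a JVM with an 8-byte object header plus a
  // 4-byte array length (NUM_BYTES_ARRAY_HEADER = 12) and 8-byte
  // object alignment. Real JVMs vary; RamUsageEstimator probes them.
  static final int NUM_BYTES_ARRAY_HEADER = 12;
  static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;

  // Size of a primitive array: header + elements, rounded up to alignment.
  static long sizeOfArray(int length, int bytesPerElement) {
    long size = NUM_BYTES_ARRAY_HEADER + (long) length * bytesPerElement;
    long rem = size % NUM_BYTES_OBJECT_ALIGNMENT;
    return rem == 0 ? size : size + (NUM_BYTES_OBJECT_ALIGNMENT - rem);
  }

  public static void main(String[] args) {
    System.out.println(sizeOfArray(10, 4)); // int[10]: 12 + 40 = 52, aligned to 56
    System.out.println(sizeOfArray(3, 8));  // long[3]: 12 + 24 = 36, aligned to 40
  }
}
```

A sizeOf(String[]) would additionally add NUM_BYTES_OBJECT_REF per element plus the size of each referenced String.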
[jira] [Updated] (LUCENE-3842) Analyzing Suggester
[ https://issues.apache.org/jira/browse/LUCENE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3842:
--------------------------------
    Attachment: LUCENE-3842.patch

Updated patch, tying in Mike's patch too. Currently my silly test fails because it trips Mike's assert: it starts with a stopword :)

Analyzing Suggester
-------------------

Key: LUCENE-3842
URL: https://issues.apache.org/jira/browse/LUCENE-3842
Project: Lucene - Java
Issue Type: New Feature
Components: modules/spellchecker
Affects Versions: 3.6, 4.0
Reporter: Robert Muir
Attachments: LUCENE-3842-TokenStream_to_Automaton.patch, LUCENE-3842.patch, LUCENE-3842.patch

Since we added shortest-path wFSA search in LUCENE-3714, and generified the comparator in LUCENE-3801, I think we should look at implementing suggesters that have more capabilities than just basic prefix matching. In particular, I think the most flexible approach is to integrate with Analyzer at both build and query time, such that we build a wFST with:

input: analyzed text such as ghost0christmas0past -- byte 0 here is an optional token separator
output: surface form such as the ghost of christmas past
weight: the weight of the suggestion

We make an FST with PairOutputs<weight,output>, but only do the shortest-path operation on the weight side (like the test in LUCENE-3801), at the same time accumulating the output (surface form), which will be the actual suggestion. This allows a lot of flexibility:

* Even using StandardAnalyzer means you can offer suggestions that ignore stopwords, e.g. if you type in "ghost of chr...", it will suggest "the ghost of christmas past"
* We can add support for synonyms/wdf/etc at both index and query time (there are tradeoffs here, and this is not implemented!)
* This is a basis for more complicated suggesters such as Japanese suggesters, where the analyzed form is in fact the reading, so we would add a TokenFilter that copies ReadingAttribute into term text to support that...
* Other general things like offering suggestions that are more fuzzy, like using a plural stemmer or ignoring accents or whatever.

According to my benchmarks, suggestions are still very fast with the prototype (e.g. ~100,000 QPS), and the FST size does not explode (it's short of twice that of a regular wFST, but this is still far smaller than TST or JaSpell, etc).
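The analyzed-key-to-surface-form idea above can be illustrated with a sorted map standing in for the wFST. This is purely a conceptual sketch, not the LUCENE-3842 patch: a TreeMap plays the role of the FST with PairOutputs, the '\u0000' separator is as described in the issue, and weights are omitted:

```java
import java.util.Map;
import java.util.TreeMap;

public class SuggesterSketch {

  // Prefix search on the analyzed side: returns the surface form of the
  // first entry whose analyzed key starts with the analyzed prefix, or
  // null if none. A real wFST would return the lowest-weight completion.
  static String suggest(TreeMap<String, String> fst, String analyzedPrefix) {
    Map.Entry<String, String> hit = fst.ceilingEntry(analyzedPrefix);
    return (hit != null && hit.getKey().startsWith(analyzedPrefix))
        ? hit.getValue() : null;
  }

  public static void main(String[] args) {
    // Keys are analyzed forms (stopwords removed, '\u0000' as the token
    // separator); values are the surface forms to suggest.
    TreeMap<String, String> fst = new TreeMap<>();
    fst.put("ghost\u0000christmas\u0000past", "the ghost of christmas past");
    fst.put("ghost\u0000rider", "ghost rider");

    // Analyzing "ghost of chr" with a stopword-removing analyzer would
    // yield "ghost\u0000chr".
    System.out.println(suggest(fst, "ghost\u0000chr"));
    // the ghost of christmas past
  }
}
```

The point of the design is exactly this decoupling: the lookup key is whatever the analysis chain produces, while the stored output is the human-readable suggestion.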
[jira] [Commented] (LUCENE-3842) Analyzing Suggester
[ https://issues.apache.org/jira/browse/LUCENE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229584#comment-13229584 ]

Robert Muir commented on LUCENE-3842:
-------------------------------------

I also don't think we really need this generic getFiniteStrings; it's just to get it off the ground. We can just write the possibilities on the fly, I think, and it will be simpler...
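For context on what "getFiniteStrings" produces: enumerating every string accepted by an acyclic automaton is a plain depth-first walk. A toy sketch over an adjacency-map automaton — not Lucene's automaton API, and the state/label representation here is invented for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FiniteStringsSketch {

  // Enumerates all strings accepted by an acyclic automaton given as
  // state -> (label -> target state), with a set of accept states.
  static List<String> finiteStrings(Map<Integer, Map<Character, Integer>> arcs,
                                    Set<Integer> accept, int start) {
    List<String> out = new ArrayList<>();
    dfs(arcs, accept, start, new StringBuilder(), out);
    Collections.sort(out);
    return out;
  }

  // DFS: append each arc label, recurse, then backtrack; emit the current
  // path whenever an accept state is reached.
  static void dfs(Map<Integer, Map<Character, Integer>> arcs, Set<Integer> accept,
                  int state, StringBuilder path, List<String> out) {
    if (accept.contains(state)) out.add(path.toString());
    for (Map.Entry<Character, Integer> arc
        : arcs.getOrDefault(state, Map.of()).entrySet()) {
      path.append(arc.getKey());
      dfs(arcs, accept, arc.getValue(), path, out);
      path.deleteCharAt(path.length() - 1);
    }
  }

  public static void main(String[] args) {
    // Automaton for a(b|c): 0 -a-> 1, 1 -b-> 2, 1 -c-> 2, accept {2}.
    Map<Integer, Map<Character, Integer>> arcs = new HashMap<>();
    arcs.put(0, Map.of('a', 1));
    arcs.put(1, Map.of('b', 2, 'c', 2));
    System.out.println(finiteStrings(arcs, Set.of(2), 0)); // [ab, ac]
  }
}
```

Writing the possibilities "on the fly", as suggested above, would interleave this walk with FST construction instead of materializing the full list first.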
[jira] [Resolved] (SOLR-3244) New Admin UI doesn't work on tomcat
[ https://issues.apache.org/jira/browse/SOLR-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved SOLR-3244.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 4.0

Committed trunk revision: 1300710

Thanks Aliaksandr!