[jira] [Updated] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager
[ https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ricardo Merizalde updated SOLR-5215:
    Component/s: SolrCloud

> Deadlock in Solr Cloud ConnectionManager
>
>                 Key: SOLR-5215
>                 URL: https://issues.apache.org/jira/browse/SOLR-5215
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java, SolrCloud
>    Affects Versions: 4.2.1
>         Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
>                      java version "1.6.0_18"
>                      Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
>                      Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>            Reporter: Ricardo Merizalde
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked, holding the ConnectionManager lock and preventing thread A from getting out of the wait state.
>
> Here is part of the thread dump:
>
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 nid=0x3e81 waiting for monitor entry [0x57169000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
>         - waiting to lock <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 nid=0x3d9a waiting for monitor entry [0x42821000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
>         - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - locked <0x2aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
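The lock cycle in the report above (wait() releases only the monitor it is called on, not the other lock the thread holds) can be sketched with explicit locks. This is an illustration only, not Solr's actual ConnectionManager code: the two ReentrantLocks below stand in for the ConnectionManager monitor and connectionUpdateLock, and the timed tryLock lets the demo terminate, whereas the intrinsic `synchronized` monitors in the real code have no timeout, hence the production hang.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    // Stand-ins for the two locks in the report (names are illustrative).
    static final ReentrantLock mgr = new ReentrantLock();     // ConnectionManager monitor
    static final ReentrantLock update = new ReentrantLock();  // connectionUpdateLock
    static final Condition connected = mgr.newCondition();

    public static void main(String[] args) throws Exception {
        Thread a = new Thread(() -> {
            mgr.lock();                         // process(): take manager lock
            try {
                update.lock();                  // update callback: take update lock
                try {
                    mgr.lock();                 // waitForConnected(): re-entrant acquire
                    try {
                        // await() releases ONLY mgr; `update` stays held by A
                        connected.await(1, TimeUnit.SECONDS);
                    } catch (InterruptedException ignored) {
                    } finally {
                        mgr.unlock();
                    }
                } finally {
                    update.unlock();
                }
            } finally {
                mgr.unlock();
            }
        });
        a.start();
        Thread.sleep(100);                      // let thread A reach await()

        // Thread B's view: mgr is free, because await() released it ...
        boolean gotMgr = mgr.tryLock(1, TimeUnit.SECONDS);
        // ... but the update lock is still held by the waiting thread A.
        // With intrinsic monitors B would block here forever while holding mgr.
        boolean gotUpdate = update.tryLock(200, TimeUnit.MILLISECONDS);
        System.out.println("B got mgr=" + gotMgr + " update=" + gotUpdate);
        if (gotUpdate) update.unlock();
        if (gotMgr) mgr.unlock();
        a.join();
    }
}
```

The timeout on tryLock is what breaks the cycle in this demo; the reported deadlock arises precisely because the intrinsic monitors offer no such escape.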
Re: I believe in a project using Lucene
On 5 September 2013 06:53, Alberto Marques wrote:
> Hello
> My question is simple: I am creating a project using Lucene, to be able to
> index a website, such as
> http://boc.cantabria.es/boces/boletines.do?boton=UltimoBOCPublicado, searching for
> information in pdf files. Is it possible?

Yes, it is eminently possible. I would suggest using Solr instead of Lucene directly. You should be able to get started by searching Google on the topic, or by looking at the Solr Wiki, e.g., http://wiki.apache.org/solr/ExtractingRequestHandler

If you need further help, such a question is better addressed to the solr-user mailing list rather than this one, which is meant for discussions related to development.

Regards,
Gora

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
I believe in a project using Lucene
Hello. My question is simple: I am creating a project using Lucene, to be able to index a website, such as http://boc.cantabria.es/boces/boletines.do?boton=UltimoBOCPublicado, searching for information in pdf files. Is it possible?
[jira] [Commented] (SOLR-4277) Spellchecker sometimes falsely reports a spelling error and correction
[ https://issues.apache.org/jira/browse/SOLR-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758364#comment-13758364 ] scott hobson commented on SOLR-4277: Hi, I am having this same issue. The "correctlySpelled" flag is always false. I understand that it should still be giving suggestions for the "did you mean..." searches, but shouldn't the correctlySpelled flag at least be accurate? It could easily say true and still give you suggested words, and that would be even better because you can differentiate between a suggestion and a correction. Right now you cannot, unless I'm missing something... Thanks, Scott > Spellchecker sometimes falsely reports a spelling error and correction > -- > > Key: SOLR-4277 > URL: https://issues.apache.org/jira/browse/SOLR-4277 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.0 >Reporter: Jack Krupansky > > In some cases, the Solr spell checker improperly reports query terms as being > misspelled. > Using the Solr example for 4.0, I added these mini documents: > {code} > curl http://localhost:8983/solr/update?commit=true -H > 'Content-type:application/csv' -d ' > id,name > spel-1,aardvark abacus ball bill cat cello > spel-2,abate accord band bell cattle check > spel-3,adorn border clean clock' > {code} > I then issued this request: > {code} > curl "http://localhost:8983/solr/spell/?q=check&indent=true"; > {code} > The spell checker falsely concluded that "check" was misspelled and > improperly corrected it to "clock": > {code} > > > > 1 > 0 > 5 > 1 > > > clock > 1 > > > > false > > clock > 1 > > clock > > > > > {code} > And if I query for "clock", it gets corrected to "check"! > {code} > curl "http://localhost:8983/solr/spell/?q=clock&indent=true"; > {code} > {code} > > > 1 > 0 > 5 > 1 > > > check > 1 > > > > false > > check > 1 > > check > > > > {code} > Note: This appears to be only because "clock" is so close to "check". 
With > other terms I don't see the problem: > {code} > curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&indent=true"; > {code} > {code} > > > 1 > 13 > 18 > 1 > > > clock > 1 > > > > false > > cattle abate clock > 2 > > cattle > abate > clock > > > > {code} > Although, it inappropriately lists "cattle" and "abate" in the "misspellings" > section even though no suggestions were offered. > Finally, I can workaround this issue by removing the following line from > solrconfig.xml: > {code} > 5 > {code} > Which responds to the previous request with: > {code} > > false > > {code} > Which makes the original problem go away. Although, it does beg the question > as to why my 100% correct query is still tagged as "correctlySpelled" = > "false", but that's a separate Jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758328#comment-13758328 ]

Shai Erera commented on LUCENE-5189:

Correct, that's a problem that Rob identified a few days ago, and it can be solved if we gen FieldInfos, because ReaderAndLiveDocs will detect that case and add a new FieldInfo, as well as create a new gen for this segment's FIS. I have two tests in TestNumericDVUpdates which currently test that this is not supported -- once we gen FIS, we'll change them to assert that it is supported.

> Numeric DocValues Updates
> -
>                 Key: LUCENE-5189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5189
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
> In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the changes.

-- This message is automatically generated by JIRA.
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758362#comment-13758362 ]

Erick Erickson commented on SOLR-2548:
--

Still have a test error in TestDistributedGrouping; no clue why, and I can't look right now. It's certainly a result of the changes in UnInvertedField, since if I put them into a clean trunk the same problem occurs. My guess is that I can't synchronize on the cache for some reason, but there's not much in the way of evidence for that right now.

> Multithreaded faceting
> -
>                 Key: SOLR-2548
>                 URL: https://issues.apache.org/jira/browse/SOLR-2548
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 3.1
>            Reporter: Janne Majaranta
>            Assignee: Erick Erickson
>            Priority: Minor
>              Labels: facet
>         Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
> Add multithreading support for faceting.

-- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758311#comment-13758311 ]

Shai Erera commented on LUCENE-5189:

I think global FIS is an interesting idea, but per-segment FIS.gen is a lower hanging fruit. I did it once and it was quite straightforward (maybe someone will have reservations on how I did it though):

* SIS tracks fieldInfosGen (in this patch, rename all dvGen in SIS to fisGen)
* FI tracks dvGen
* A new FIS45Format reads/writes each FI's dvGen
* ReaderAndLiveDocs writes a new FIS gen, containing the entire FIS, so SR only reads the latest gen to load FIS

I think we should explore global FIS separately, because it brings its own issues, e.g. do we keep FISFormat or nuke it? Who invokes it (probably SIS)? It's also quite orthogonal to this issue, or at least, we can proceed with it and improve FIS gen'ing later with global FIS.

As for SI.attributes(), I think we can move them under SIS. We should open an issue to do that.

> Numeric DocValues Updates
> -
>                 Key: LUCENE-5189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5189
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
> In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the changes.

-- This message is automatically generated by JIRA.
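The per-segment gen'ing discussed above builds on the generational file naming Lucene already uses for live docs (a base-36 generation suffix on the segment's file name). The sketch below is a simplified stand-in for that convention, modeled loosely on Lucene's IndexFileNames.fileNameFromGeneration; it is an illustration, not the actual Lucene method, and the ".fnm" extension for a gen'd FieldInfos file is an assumption for the example.

```java
public class GenFileName {
    // Simplified model of generational per-segment file naming: the
    // generation is appended in base 36, so segment "_0" at gen 10 of a
    // hypothetical gen'd FieldInfos file becomes "_0_a.fnm".
    static String fileNameFromGeneration(String segmentName, String ext, long gen) {
        if (gen < 0) {
            return null;                          // no file exists for this gen
        }
        if (gen == 0) {
            return segmentName + "." + ext;       // gen 0: plain name, no suffix
        }
        return segmentName + "_" + Long.toString(gen, Character.MAX_RADIX) + "." + ext;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration("_0", "fnm", 0));
        System.out.println(fileNameFromGeneration("_0", "fnm", 1));
        System.out.println(fileNameFromGeneration("_0", "fnm", 10));
    }
}
```

On reopen, a reader would then only need to look up the latest recorded generation in SegmentInfos to load the freshest FieldInfos, which is the point of the ReaderAndLiveDocs step above.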
[jira] [Updated] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2548:
    Attachment: SOLR-2548.patch

OK, maybe this time.
1> Put back the passing in of base.
2> Took out the sleep.
3> Changed how exceptions are propagated up past the new threads, which fixed another test that this code broke.
4> Added a non-deterministic test that forces parallel uninverting of the fields to make sure we exercise the synchronize/notify code. This test can't _guarantee_ to execute that code every time, but it did manage with some printlns.

Running tests again, precommit, all that. Won't check in until at least tomorrow. And thank heaven for "local history" in IntelliJ ;)

> Multithreaded faceting
> -
>                 Key: SOLR-2548
>                 URL: https://issues.apache.org/jira/browse/SOLR-2548
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 3.1
>            Reporter: Janne Majaranta
>            Assignee: Erick Erickson
>            Priority: Minor
>              Labels: facet
>         Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
> Add multithreading support for faceting.

-- This message is automatically generated by JIRA.
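Point 3> above (propagating exceptions up past the new threads) can be sketched generically; this is not the actual patch, just the standard pattern: submit the per-field work through an ExecutorService and let Future.get() rethrow any worker failure as an ExecutionException on the coordinating thread. The field names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PropagateWorkerErrors {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<String>> futures = new ArrayList<>();
        for (String field : new String[] {"ok_field", "bad_field"}) {
            futures.add(pool.submit(() -> {
                // Stand-in for the per-field uninvert work.
                if (field.startsWith("bad")) {
                    throw new IllegalStateException("uninvert failed: " + field);
                }
                return field + ":done";
            }));
        }
        for (Future<String> f : futures) {
            try {
                // get() rethrows the worker's exception on this thread,
                // wrapped in ExecutionException.
                System.out.println(f.get());
            } catch (ExecutionException e) {
                System.out.println("propagated: " + e.getCause().getMessage());
            }
        }
        pool.shutdown();
    }
}
```

The coordinating thread can then unwrap getCause() and rethrow it as whatever the request handler expects, instead of the failure dying silently inside a worker thread.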
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758325#comment-13758325 ] Yonik Seeley commented on LUCENE-5189: -- The problem that Mike highlights "some segments might be missing the field entirely so you cannot update those", is pretty bad though. Things work differently (i.e. your update may fail) depending on exactly how segment flushes and merges are done. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758314#comment-13758314 ] Yonik Seeley commented on SOLR-2548: One issue with a static "pending" set on UnInvertedField is that it will block different cores trying to un-invert the same field. This should probably be implemented the same way the FieldCache does it (insertion of a placeholder). > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
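The FieldCache-style placeholder insertion Yonik describes can be sketched with a ConcurrentHashMap of FutureTasks: the first thread to insert the placeholder runs the expensive work, concurrent requests for the same key wait on that task only, and requests for different keys (different core/field pairs) proceed independently instead of serializing on one static "pending" set. This is a generic illustration, not the UnInvertedField code; the key format is made up.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.FutureTask;

public class PlaceholderCache {
    private final ConcurrentHashMap<String, FutureTask<String>> cache =
        new ConcurrentHashMap<>();

    String getOrUninvert(String key) throws Exception {
        FutureTask<String> task = cache.get(key);
        if (task == null) {
            // Insert a placeholder before doing the expensive work.
            FutureTask<String> placeholder =
                new FutureTask<>(() -> "uninverted(" + key + ")");
            task = cache.putIfAbsent(key, placeholder);
            if (task == null) {
                task = placeholder;
                task.run();   // we won the race: compute the value ourselves
            }
        }
        // Waiters block only on this key's task, not on a global lock.
        return task.get();
    }

    public static void main(String[] args) throws Exception {
        PlaceholderCache c = new PlaceholderCache();
        System.out.println(c.getOrUninvert("core1/fieldA"));
        System.out.println(c.getOrUninvert("core2/fieldA")); // independent of core1
    }
}
```

A static pending-set, by contrast, would make the core2 request above wait for core1's uninvert of the same field name, which is the blocking Yonik points out.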
[jira] [Commented] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
[ https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758323#comment-13758323 ]

Christine Poerschke commented on SOLR-5214:
---

Hello. Here's one of the stack traces. And in case it's useful context, during the shard split indexing into the cloud had been stopped, but periodic admin/luke and admin/mbeans cat=CACHE stats requests were happening.

{noformat}
2013-09-03 07:27:51,947 ERROR [qtp1533478516-49] o.a.s.s.SolrDispatchFilter [SolrException.java:119] null:java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.decode(StringCoding.java:215)
        at java.lang.String.<init>(String.java:453)
        at java.lang.String.<init>(String.java:505)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:154)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:272)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:133)
        at org.apache.lucene.index.FilterAtomicReader.document(FilterAtomicReader.java:365)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:332)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:298)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:86)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2448)
        at org.apache.solr.update.SolrIndexSplitter.split(SolrIndexSplitter.java:118)
        at org.apache.solr.update.DirectUpdateHandler2.split(DirectUpdateHandler2.java:749)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:282)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:185)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
{noformat}

> collections?action=SPLITSHARD running out of heap space due to merge
>
>                 Key: SOLR-5214
>                 URL: https://issues.apache.org/jira/browse/SOLR-5214
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.3
>            Reporter: Christine Poerschke
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-5214.patch
>
> The problem we saw was that splitting a shard with many segments and documents failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one overall segment does not seem practical. Running the split without segment merging worked.
> Could split always run without merging, or merge=true/false be an optional > parameter for the SPLITSHARD action? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758271#comment-13758271 ] Yonik Seeley commented on SOLR-2548: bq. It appears to be useless complexity, perhaps a remnant from the original patch against 3.1. I took them out. Actually, I see now (and it's absolutely needed ;-) The base docset can change from one facet request to another (think excludes), hence if we go multi-threaded, we can't reference "SimpleFacets.docs" in anything that could be executed from a separate thread. > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5142) Block Indexing / Join Improvements
[ https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758286#comment-13758286 ]

Mikhail Khludnev commented on SOLR-5142:

bq. they are needed for things like distributed search.

I don't think children participate in distributed search; everything is handled at the parent level. I suppose the uniqueKey field should span the whole block, instead of \_root_.

> Block Indexing / Join Improvements
> -
>                 Key: SOLR-5142
>                 URL: https://issues.apache.org/jira/browse/SOLR-5142
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Yonik Seeley
>             Fix For: 4.5, 5.0
>
> Follow-on main issue for general block indexing / join improvements

-- This message is automatically generated by JIRA.
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758261#comment-13758261 ] Erick Erickson commented on SOLR-2548: -- bq: Was there a bug that these changes fixed? Nope, I thought it was a refactoring and didn't look closely. It appears to be useless complexity, perhaps a remnant from the original patch against 3.1. I took them out. bq: please let's not do that for multi-threaded code. I can always count on you to call me on sleeping, don't know why I even try to put a sleep in any more :). OK, took it out and substituted a notifyAll. And added a test that gets into this code while actually doing the inverting rather than just pulls stuff from the cache. I'll attach a new patch in a few. > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5197: - Attachment: LUCENE-5197.patch Took into account termsCache in SimpleTextFieldReader as discussed with Michael. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch, > LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758257#comment-13758257 ] Michael McCandless commented on LUCENE-5189: Actually, that would also solve the other problems as well? Ie, the global FieldInfos would be gen'd: on commit we'd write a new FIS file, which all segments in that commit point would use. Any attribute changes to a FieldInfo would be saved, even on update; new fields could be created via update; any segments that have no documents with the field won't be an issue. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758251#comment-13758251 ] Michael McCandless commented on LUCENE-5189: One option, to solve the "some segments might be missing the field entirely so you cannot update those" would be to have the FieldInfos accumulate across segments, i.e. a more global FieldInfos, maybe written to a separate global file (not per segment). This way, if any doc in any segment has added the field, then the global FieldInfos would contain it. Not saying this is an appealing option (there are tons of tradeoffs), but I think it would address that limitation. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. 
> I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758234#comment-13758234 ] Michael McCandless commented on LUCENE-5197: bq. But only under the assumption that SimpleTextTerms implementation will be used for the SimpleTextFieldsReader (it uses the abstract Terms class in the termsCache). comments? I think it's fine to change its termsCache to be SimpleTextTerms. Thanks! > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758233#comment-13758233 ] Michael McCandless commented on LUCENE-5189: bq. Frankly I am tired of hearing this phrase being used in this way Actually, I think this is a fair use of "progress not perfection". Either that or I don't understand what you're calling "broken APIs" in the current patch. As I understand it, what's "broken" here is that you cannot set the attributes in SegmentInfo nor FieldInfo from your DocValuesFormat writer when it's an update being written: the changes won't be saved. So, I proposed that we document this as a limitation of the SI/FI attributes API: when writing updates, any changes will be lost. For "normal" segment flushes, they work correctly. It'd be a documented limitation, and we can later fix it. I think this situation is very similar to LUCENE-5197, which I would also call "progress not perfection": we are adding a new API (SegmentReader.ramBytesUsed), with an initial implementation that we think might be improved by later cutting over to RamUsageEstimator. But I think we should commit the initial approach (it's useful, it should work well) and later improve the implementation. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. 
There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager
Ricardo Merizalde created SOLR-5215: --- Summary: Deadlock in Solr Cloud ConnectionManager Key: SOLR-5215 URL: https://issues.apache.org/jira/browse/SOLR-5215 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.2.1 Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux java version "1.6.0_18" Java(TM) SE Runtime Environment (build 1.6.0_18-b07) Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode) Reporter: Ricardo Merizalde We are constantly seeing a deadlock in our production application servers. The problem seems to be that a thread A: - tries to process an event and acquires the ConnectionManager lock - the update callback acquires connectionUpdateLock and invokes waitForConnected - waitForConnected tries to acquire the ConnectionManager lock (which it already holds) - waitForConnected calls wait, releasing the ConnectionManager lock (but still holding the connectionUpdateLock) Then thread B: - tries to process an event and acquires the ConnectionManager lock - the update callback tries to acquire connectionUpdateLock but gets blocked, holding the ConnectionManager lock and preventing thread A from getting out of the wait state. 
Here is part of the thread dump: "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 nid=0x3e81 waiting for monitor entry [0x57169000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71) - waiting to lock <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 nid=0x3e67 waiting for monitor entry [0x4dbd4000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98) - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91) - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 nid=0x3d9a waiting for monitor entry [0x42821000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165) - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98) - locked <0x2aab1b0e0f78> (a java.lang.Object) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91) - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Found one Java-level deadlock: = "http-0.0.0.0-8080-82-EventThread": waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager), which is held by "http-0.0.0.0-8080-82-EventThread" "http-0.0.0.0-8080-82-EventThread": waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a java.lang.Object), which is held by "http-0.0.0.0-8080-82-EventThread" "http-0.0.0.0-8080-82-EventThread": waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager), which is held by "http-0.0.0.0-8080-82-EventThread" Java stack information for the threads listed above: === "http-0.0.0.0-8080-82-EventThread": at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71) - waiting to lock <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.Clien
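The interaction above can be reduced to a small standalone demonstration. This is a minimal sketch, not Solr's actual ConnectionManager code: the two plain Object locks are hypothetical stand-ins for the ConnectionManager monitor and connectionUpdateLock. It shows the key ingredient of the deadlock: Object.wait() releases only the monitor being waited on, while any other locks the waiting thread holds stay held.

```java
// Sketch only: demonstrates that wait() releases a single monitor.
// Lock names are hypothetical stand-ins, not Solr's real fields.
public class WaitReleasesOneMonitorDemo {
    static final Object managerLock = new Object(); // stands in for the ConnectionManager monitor
    static final Object updateLock = new Object();  // stands in for connectionUpdateLock

    // Returns {couldLockManager, couldLockUpdate} as observed by a second
    // thread while the first thread is parked in wait().
    static boolean[] demo() {
        Thread a = new Thread(() -> {
            synchronized (updateLock) {          // thread A holds "connectionUpdateLock"...
                synchronized (managerLock) {     // ...and the "ConnectionManager" monitor
                    long deadline = System.currentTimeMillis() + 2000;
                    long remaining;
                    try {
                        // loop guards against spurious wakeups
                        while ((remaining = deadline - System.currentTimeMillis()) > 0) {
                            managerLock.wait(remaining); // releases managerLock ONLY; updateLock stays held
                        }
                    } catch (InterruptedException ignored) { }
                }
            }
        });
        a.start();
        try {
            Thread.sleep(200);                         // let A reach wait()
            boolean gotManager = tryLock(managerLock); // succeeds: wait() released it
            boolean gotUpdate = tryLock(updateLock);   // fails: still held by A -- this is where thread B blocks
            a.join();
            return new boolean[] { gotManager, gotUpdate };
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Tries to enter the given monitor from a helper thread, giving up after 500 ms.
    static boolean tryLock(Object lock) throws InterruptedException {
        final boolean[] acquired = { false };
        Thread t = new Thread(() -> {
            synchronized (lock) { acquired[0] = true; }
        });
        t.start();
        t.join(500);
        return acquired[0];
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("managerLock acquirable while A waits: " + r[0]);
        System.out.println("updateLock acquirable while A waits: " + r[1]);
    }
}
```

In the reported dump, thread B plays the role of the second thread here: it enters the ConnectionManager monitor (which A's wait released) and then blocks on connectionUpdateLock, so A can never be notified.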
[jira] [Updated] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager
[ https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ricardo Merizalde updated SOLR-5215: Description: We are constantly seeing deadlocks in our production application servers. The problem seems to be that a thread A: - tries to process an event and acquires the ConnectionManager lock - the update callback acquires connectionUpdateLock and invokes waitForConnected - waitForConnected tries to acquire the ConnectionManager lock (which it already holds) - waitForConnected calls wait and releases the ConnectionManager lock (but still holds the connectionUpdateLock) Then thread B: - tries to process an event and acquires the ConnectionManager lock - the update callback tries to acquire connectionUpdateLock but gets blocked, holding the ConnectionManager lock and preventing thread A from getting out of the wait state. Here is part of the thread dump: "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 nid=0x3e81 waiting for monitor entry [0x57169000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71) - waiting to lock <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 nid=0x3e67 waiting for monitor entry [0x4dbd4000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98) - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91) - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 nid=0x3d9a waiting for monitor entry [0x42821000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165) - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98) - locked <0x2aab1b0e0f78> (a java.lang.Object) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91) - locked <0x2aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Found one Java-level deadlock: = "http-0.0.0.0-8080-82-EventThread": waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager), which is held by "http-0.0.0.0-8080-82-EventThread" "http-0.0.0.0-8080-82-EventThread": waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a java.lang.Object), which is held by "http-0.0.0.0-8080-82-EventThread" "http-0.0.0.0-8080-82-EventThread": waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager), which is held by "http-0.0.0.0-8080-82-EventThread" Java stack information for the threads listed above: === "http-0.0.0.0-8080-82-EventThread": at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71) - waiting to lock <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) "http-0.0.0.0-8080-82-EventThread": at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98) - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStra
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758164#comment-13758164 ] Yonik Seeley commented on SOLR-2548: bq. I used a simple wait/sleep loop here Ugh - please let's not do that for multi-threaded code. Also, I see some stuff like this in the patch: {code} - counts = getGroupedCounts(searcher, docs, field, multiToken, offset,limit, mincount, missing, sort, prefix); + counts = getGroupedCounts(searcher, base, field, multiToken, offset,limit, mincount, missing, sort, prefix); {code} Was there a bug that these changes fixed? > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
[ https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758153#comment-13758153 ] Shalin Shekhar Mangar commented on SOLR-5214: - Thanks Christine. Do you have the OutOfMemoryError stack trace? > collections?action=SPLITSHARD running out of heap space due to merge > > > Key: SOLR-5214 > URL: https://issues.apache.org/jira/browse/SOLR-5214 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.3 >Reporter: Christine Poerschke >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-5214.patch > > > The problem we saw was that splitting a shard with many segments and documents > failed by running out of heap space. > Increasing heap space so that all existing segments could be merged into one > overall segment does not seem practical. Running the split without segment > merging worked. > Could split always run without merging, or merge=true/false be an optional > parameter for the SPLITSHARD action? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs
[ https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-5213: --- Assignee: Shalin Shekhar Mangar > collections?action=SPLITSHARD parent vs. sub-shards numDocs > --- > > Key: SOLR-5213 > URL: https://issues.apache.org/jira/browse/SOLR-5213 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.4 >Reporter: Christine Poerschke >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-5213.patch > > > The problem we saw was that splitting a shard took a long time and at the end > of it the sub-shards contained fewer documents than the original shard. > The root cause was eventually tracked down to the disappearing documents not > falling into the hash ranges of the sub-shards. > Could SolrIndexSplitter's split report per-segment numDocs for parent and > sub-shards, with at least a warning logged for any discrepancies (documents > falling into none of the sub-shards or documents falling into several > sub-shards)? > Additionally, could a case be made for erroring out when discrepancies are > detected, i.e. not proceeding with the shard split? Either to always error or > to have a verifyNumDocs=false/true optional parameter for the SPLITSHARD > action. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
[ https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-5214: --- Assignee: Shalin Shekhar Mangar > collections?action=SPLITSHARD running out of heap space due to merge > > > Key: SOLR-5214 > URL: https://issues.apache.org/jira/browse/SOLR-5214 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.3 >Reporter: Christine Poerschke >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-5214.patch > > > The problem we saw was that splitting a shard with many segments and documents > failed by running out of heap space. > Increasing heap space so that all existing segments could be merged into one > overall segment does not seem practical. Running the split without segment > merging worked. > Could split always run without merging, or merge=true/false be an optional > parameter for the SPLITSHARD action? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2548: - Attachment: SOLR-2548.patch Hmm, the whole recording-thread-info is a little more ambitious than I want to be right now. For the nonce, I did some "by hand" debugging, added a couple of (temporary) print messages in the getUnInvertedField code, and ensured that when it's called it only executes once per field, so I think I'll call that good for now. I did play around with the directExecutor and now I get to add another bit of knowledge: it's really kind of cool that it allows one to have code like this. No matter how many times you submit a job, it all just executes in the current thread. Arcane, but kind of cool. As for the rest, I've added at least functional tests and one test of the caching code that's non-deterministic but might trip bad conditions at least some of the time. So unless people object I'll be committing this, probably tomorrow. It passes precommit and at least all the tests in TestFaceting; I'll be running the full suite in a minute. > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
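The direct-executor behavior described above can be sketched in a few lines. This is a hypothetical illustration, not the patch's actual code; it assumes only the standard java.util.concurrent.Executor interface, whose execute() contract allows running the task in the calling thread.

```java
import java.util.concurrent.Executor;

// A "direct" executor: submitted tasks run synchronously in the submitting
// thread, so threaded and single-threaded code paths can share one
// Executor-based API.
public class DirectExecutorDemo {
    static final Executor DIRECT = Runnable::run; // execute() just calls run() inline

    // Runs a task on DIRECT and reports which thread actually executed it.
    static String threadThatRanTask() {
        final String[] name = new String[1];
        DIRECT.execute(() -> name[0] = Thread.currentThread().getName());
        return name[0]; // same thread that called execute()
    }

    public static void main(String[] args) {
        System.out.println("submitted from: " + Thread.currentThread().getName()
                + ", ran on: " + threadThatRanTask());
    }
}
```

Because the task runs before execute() returns, code that submits N faceting jobs to such an executor degenerates gracefully to sequential execution in the caller, which is presumably what makes it handy for a threads=0 style configuration.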
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758082#comment-13758082 ] Areek Zillur commented on LUCENE-5197: -- [~mikemccand] I can add a ramBytesUsed method to the SimpleTextTerms class to account for it. But only under the assumption that SimpleTextTerms implementation will be used for the SimpleTextFieldsReader (it uses the abstract Terms class in the termsCache). comments? > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5201) Compression issue on highly compressible inputs with LZ4.compressHC
[ https://issues.apache.org/jira/browse/LUCENE-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-5201: - Attachment: LUCENE-5201.patch This bug needed two conditions to appear: - the input needs to be highly compressible so that there are collisions in the chain table used for finding references backwards in the stream, - the start offset needs to be > 0. CompressingStoredFieldFormat only calls LZ4.compress(HC) with positive start offsets since LUCENE-5188, so this shouldn't have any impact on people who were using CompressionMode.FAST_DECOMPRESSION (which seems to be confirmed by the fact that we never saw any test failure related to this until today, only a few minutes after I committed LUCENE-5188). I was able to write a test case that reproduces the bug and changed the existing tests so that they don't only test compression with a start offset of 0. > Compression issue on highly compressible inputs with LZ4.compressHC > --- > > Key: LUCENE-5201 > URL: https://issues.apache.org/jira/browse/LUCENE-5201 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5201.patch > > > LZ4.compressHC sometimes fails at compressing highly compressible inputs when > the start offset is > 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5200) HighFreqTerms has confusing behavior with -t option
[ https://issues.apache.org/jira/browse/LUCENE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5200: Attachment: LUCENE-5200.patch > HighFreqTerms has confusing behavior with -t option > --- > > Key: LUCENE-5200 > URL: https://issues.apache.org/jira/browse/LUCENE-5200 > Project: Lucene - Core > Issue Type: Bug > Components: modules/other >Reporter: Robert Muir > Attachments: LUCENE-5200.patch > > > {code} > * HighFreqTerms class extracts the top n most frequent terms > * (by document frequency) from an existing Lucene index and reports their > * document frequency. > * > * If the -t flag is given, both document frequency and total tf (total > * number of occurrences) are reported, ordered by descending total tf. > {code} > Problem #1: > It's tricky what happens with -t: if you ask for the top-100 terms, it > requests the top-100 terms (by docFreq), then resorts the top-N by > totalTermFreq. > So it's not really the top 100 most frequently occurring terms. > Problem #2: > Using the -t option can be confusing and slow: the reported docFreq includes > deletions, but totalTermFreq does not (it actually walks postings lists if > there is even one deletion). > I think this is a relic from 3.x days when Lucene did not support this > statistic. I think we should just always output both TermsEnum.docFreq() and > TermsEnum.totalTermFreq(), and -t just determines the comparator of the PQ. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
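Problem #1 can be shown with a toy example. The term statistics below are hypothetical, not from a real index; the point is that selecting the top-n by docFreq and then re-sorting those n by totalTermFreq can silently miss a term with a huge totalTermFreq but a low docFreq.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of the -t behavior described in LUCENE-5200.
public class TopNResortDemo {
    record Term(String name, int docFreq, long totalTermFreq) {}

    // Hypothetical statistics: "c" occurs in few docs but is repeated heavily.
    static List<Term> sample() {
        return List.of(
            new Term("a", 100, 120),  // appears in many docs, rarely repeated
            new Term("b",  90, 500),
            new Term("c",  10, 900)); // few docs, huge total occurrence count
    }

    // What -t effectively does: select top-n by docFreq, then re-sort by totalTermFreq.
    static List<String> topByDocFreqThenResort(List<Term> terms, int n) {
        List<Term> byDf = new ArrayList<>(terms);
        byDf.sort(Comparator.comparingInt(Term::docFreq).reversed());
        List<Term> top = new ArrayList<>(byDf.subList(0, n));
        top.sort(Comparator.comparingLong(Term::totalTermFreq).reversed());
        return top.stream().map(Term::name).toList();
    }

    // What a user would likely expect: the true top-n by totalTermFreq.
    static List<String> topByTotalTermFreq(List<Term> terms, int n) {
        List<Term> byTtf = new ArrayList<>(terms);
        byTtf.sort(Comparator.comparingLong(Term::totalTermFreq).reversed());
        return byTtf.subList(0, n).stream().map(Term::name).toList();
    }

    public static void main(String[] args) {
        System.out.println(topByDocFreqThenResort(sample(), 2)); // [b, a] -- "c" is missed
        System.out.println(topByTotalTermFreq(sample(), 2));     // [c, b]
    }
}
```

The two top-2 lists disagree: the docFreq-first selection never even considers "c", which is exactly why the proposal above is to let -t change the comparator of the PQ rather than re-sort an already-selected list.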
[jira] [Commented] (SOLR-5142) Block Indexing / Join Improvements
[ https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758043#comment-13758043 ] Yonik Seeley commented on SOLR-5142: bq. Don't you feel unique key should be optional for children documents? unique keys are for more than just implementing overwriting though - they are needed for things like distributed search. > Block Indexing / Join Improvements > -- > > Key: SOLR-5142 > URL: https://issues.apache.org/jira/browse/SOLR-5142 > Project: Solr > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 4.5, 5.0 > > > Follow-on main issue for general block indexing / join improvements -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support
[ https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5210. Resolution: Fixed > amend example's schema.xml and solrconfig.xml for blockjoin support > --- > > Key: SOLR-5210 > URL: https://issues.apache.org/jira/browse/SOLR-5210 > Project: Solr > Issue Type: Sub-task >Reporter: Mikhail Khludnev >Assignee: Yonik Seeley > Fix For: 4.5, 5.0 > > > I suppose it makes sense to apply > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290 > and > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290 > to the example's config too, to provide an out-of-the-box block join experience. > WDYT? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5197: - Attachment: LUCENE-5197.patch Changed getSizeInbytes to ramBytesUsed as Robert suggested > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support
[ https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758036#comment-13758036 ] ASF subversion and git services commented on SOLR-5210: --- Commit 1520082 from [~yo...@apache.org] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1520082 ] SOLR-5210: add block join support to example > amend example's schema.xml and solrconfig.xml for blockjoin support > --- > > Key: SOLR-5210 > URL: https://issues.apache.org/jira/browse/SOLR-5210 > Project: Solr > Issue Type: Sub-task >Reporter: Mikhail Khludnev >Assignee: Yonik Seeley > Fix For: 4.5, 5.0 > > > I suppose it makes sense to apply > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290 > and > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290 > to example's config to provide out-of-the-box block join experience. > WDYT? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support
[ https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758033#comment-13758033 ] ASF subversion and git services commented on SOLR-5210: --- Commit 1520081 from [~yo...@apache.org] in branch 'dev/trunk' [ https://svn.apache.org/r1520081 ] SOLR-5210: add block join support to example > amend example's schema.xml and solrconfig.xml for blockjoin support > --- > > Key: SOLR-5210 > URL: https://issues.apache.org/jira/browse/SOLR-5210 > Project: Solr > Issue Type: Sub-task >Reporter: Mikhail Khludnev >Assignee: Yonik Seeley > Fix For: 4.5, 5.0 > > > I suppose it makes sense to apply > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290 > and > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290 > to example's config to provide out-of-the-box block join experience. > WDYT? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757931#comment-13757931 ] Hoss Man commented on SOLR-2548: bq. Still checking on the implications of stacking up a bunch of directExecutors all through the CompletionService, not something I've used recently and the details are hazy. Unless I'm missing something, it should be a single directExecutor, and when a job is submitted to the CompletionService, nothing happens in the background at all -- the thread that submitted the job then immediately executes the job. Telling the CompletionService to use the directExecutor is essentially a way of saying "when someone asks you to execute X, make them do it themselves". bq. Is there a decent way to check whether more than one thread was actually spawned? I doubt it ... but it would be nice to at least know the functionality succeeds w/o failure. There might be a way to subclass & instrument the ThreadPoolExecutor (or the Queue it uses to manage jobs) so that you could make it keep track of the max number of live threads at any one time, or the max size of the queue at any one time, and then your test could reach in and inspect either of those values to know if the _wrong_ thing happened (ie: too many threads spun up, or too many things enqueued w/o being handed to threads) ... but I'm not sure how hard that would be. Actually -- maybe a better thing to do would be to have the Callables record the thread id of whatever thread executed them, and include that in the debug info ... then the test could just confirm that all of the ids match and don't start with "facetExecutor-" in the directExecutor case, and that the number of unique ids seen is not greater than N in the facet.threads=N case. 
(That debug info could theoretically be useful to end users as well, to see that multiple threads really are getting used) > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
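The direct-executor behavior Hoss describes can be sketched with plain JDK classes. This is a hypothetical standalone demo (not Solr's actual faceting code); it shows that when a CompletionService is backed by a direct executor, submit() runs the job synchronously on the submitting thread, with no background threads involved:

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;

public class DirectExecutorDemo {
    // A "direct" executor: execute() runs each task on the calling thread,
    // so nothing ever happens in the background.
    static final Executor DIRECT = Runnable::run;

    // Submits one job through a CompletionService backed by the direct
    // executor and reports whether it ran on the submitting thread itself.
    static boolean ranOnCallerThread() {
        try {
            CompletionService<Long> cs = new ExecutorCompletionService<>(DIRECT);
            // submit() executes the callable immediately, before returning
            cs.submit(() -> Thread.currentThread().getId());
            // the result is already queued; take() returns without blocking
            return cs.take().get() == Thread.currentThread().getId();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // the caller "does it themselves": same thread id on both sides
        System.out.println("ran on caller thread: " + ranOnCallerThread());
    }
}
```

Recording `Thread.currentThread().getId()` inside the callable, as above, is also exactly the instrumentation Hoss suggests for the test: with a real thread pool the recorded ids would differ from the submitter's.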
[jira] [Commented] (LUCENE-5201) Compression issue on highly compressible inputs with LZ4.compressHC
[ https://issues.apache.org/jira/browse/LUCENE-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757930#comment-13757930 ] Adrien Grand commented on LUCENE-5201: -- A fix is already committed but I opened this issue on the suggestion of Uwe so that it has an entry in the changelog. > Compression issue on highly compressible inputs with LZ4.compressHC > --- > > Key: LUCENE-5201 > URL: https://issues.apache.org/jira/browse/LUCENE-5201 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 5.0, 4.5 > > > LZ4.compressHC sometimes fails at compressing highly compressible inputs when > the start offset is > 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5201) Compression issue on highly compressible inputs with LZ4.compressHC
Adrien Grand created LUCENE-5201: Summary: Compression issue on highly compressible inputs with LZ4.compressHC Key: LUCENE-5201 URL: https://issues.apache.org/jira/browse/LUCENE-5201 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 5.0, 4.5 LZ4.compressHC sometimes fails at compressing highly compressible inputs when the start offset is > 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!
Good ideas. Uwe also suggested to open an issue so that this bug fix is in the changelog. I will do it soon... -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757921#comment-13757921 ] Shai Erera commented on LUCENE-5189: bq. It does not solve problem #2 (SegmentInfos.attributes) Correct. So this API is broken today for LiveDocsFormat (since it's the only updateable thing), but field updates only broaden the broken-ness into other formats (now only DVF, but in the future others too). Correct? I think that moving this API into the commit is not an overkill. I remember Mike and I once discussed if we can use that API to save per-segment facets "schema details". I don't remember how this ended, but maybe we shouldn't remove it? Alternatively, we could gen SIFormat too ... that may be an overkill though. Recording per-segment StringStringMap in SIS seems simple enough. Regarding FIS.gen, I honestly thought to keep it simple by writing all FIS entirely in each gen and not complicate the code by writing parts of an FI in different gens and merging them by SR. This is what I plan to do in this issue. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. 
we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave
[ https://issues.apache.org/jira/browse/SOLR-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757924#comment-13757924 ] Robert Muir commented on SOLR-4909: --- Thanks Michael: at a glance the patch looks good to me. I wonder if we can improve the test: I'm a bit concerned with random merge policies that it might sporadically fail. Maybe we can change the test to use LogDocMergePolicy in its configuration and explicitly assert the segment structure. I'll take a closer look as soon as I have a chance: it's not your fault, the code around here is just a bit scary. > Solr and IndexReader Re-opening on Replication Slave > > > Key: SOLR-4909 > URL: https://issues.apache.org/jira/browse/SOLR-4909 > Project: Solr > Issue Type: Improvement > Components: replication (java), search >Affects Versions: 4.3 >Reporter: Michael Garski > Fix For: 4.5, 5.0 > > Attachments: SOLR-4909_confirm_keys.patch, SOLR-4909-demo.patch, > SOLR-4909_fix.patch, SOLR-4909.patch, SOLR-4909_v2.patch, SOLR-4909_v3.patch > > > I've been experimenting with caching filter data per segment in Solr using a > CachingWrapperFilter & FilteredQuery within a custom query parser (as > suggested by [~yo...@apache.org] in SOLR-3763) and encountered situations > where the value of getCoreCacheKey() on the AtomicReader for each segment can > change for a given segment on disk when the searcher is reopened. As > CachingWrapperFilter uses the value of the segment's getCoreCacheKey() as the > key in the cache, there are situations where the data cached on that segment > is not reused when the segment on disk is still part of the index. This > affects the Lucene field cache and field value caches as well as they are > cached per segment. 
> When Solr first starts it opens the searcher's underlying DirectoryReader in > StandardIndexReaderFactory.newReader by calling > DirectoryReader.open(indexDir, termInfosIndexDivisor), and the reader is > subsequently reopened in SolrCore.openNewSearcher by calling > DirectoryReader.openIfChanged(currentReader, writer.get(), true). The act of > reopening the reader with the writer when it was first opened without a > writer results in the value of getCoreCacheKey() changing on each of the > segments even though some of the segments have not changed. Depending on the > role of the Solr server, this has different effects: > * On a SolrCloud node or free-standing index and search server the segment > cache is invalidated during the first DirectoryReader reopen - subsequent > reopens use the same IndexWriter instance and as such the value of > getCoreCacheKey() on each segment does not change so the cache is retained. > * For a master-slave replication set up the segment cache invalidation occurs > on the slave during every replication as the index is reopened using a new > IndexWriter instance which results in the value of getCoreCacheKey() changing > on each segment when the DirectoryReader is reopened using a different > IndexWriter instance. > I can think of a few approaches to alter the re-opening behavior to allow > reuse of segment level caches in both cases, and I'd like to get some input > on other ideas before digging in: > * To change the cloud node/standalone first commit issue it might be possible > to create the UpdateHandler and IndexWriter before the DirectoryReader, and > use the writer to open the reader. There is a comment in the SolrCore > constructor by [~yo...@apache.org] that the searcher should be opened before > the update handler so that may not be an acceptable approach. 
> * To change the behavior of a slave in a replication set up, one solution > would be to not open a writer from the SnapPuller when the new index is > retrieved if the core is enabled as a slave only. The writer is needed on a > server configured as a master & slave that is functioning as a replication > repeater so downstream slaves can see the changes in the index and retrieve > them. > I'll attach a unit test that demonstrates the behavior of reopening the > DirectoryReader and it's effects on the value of getCoreCacheKey. My > assumption is that the behavior of Lucene during the various reader reopen > operations is correct and that the changes are necessary on the Solr side of > things. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
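The cache-invalidation mechanics described in this issue can be illustrated with a minimal, hypothetical sketch (the names below are invented for illustration; the real cache is Lucene's CachingWrapperFilter, keyed by the identity of each segment reader's getCoreCacheKey() object):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class CoreCacheKeySketch {
    // Hypothetical per-segment filter cache, keyed by object identity of the
    // segment's core cache key, the way CachingWrapperFilter keys entries.
    static final Map<Object, BitSet> cache = new HashMap<>();

    // computeIfAbsent stands in for the expensive filter evaluation that
    // the cache exists to avoid.
    static BitSet getOrCompute(Object coreCacheKey) {
        return cache.computeIfAbsent(coreCacheKey, k -> new BitSet());
    }

    public static void main(String[] args) {
        Object keyBeforeReopen = new Object(); // stands in for getCoreCacheKey()
        BitSet cached = getOrCompute(keyBeforeReopen);

        // Reopening with the same IndexWriter keeps the same key: cache hit.
        System.out.println("same key hit: " + (getOrCompute(keyBeforeReopen) == cached));

        // Reopening with a *different* IndexWriter gives the unchanged segment
        // a brand-new key object, so the entry is recomputed from scratch
        // (and the stale one lingers until evicted).
        Object keyAfterReopen = new Object();
        System.out.println("new key hit: " + (getOrCompute(keyAfterReopen) == cached));
    }
}
```

Because the key is compared by identity rather than by what the segment contains, any reopen path that produces a new key object for an unchanged segment (as the slave-replication case does on every replication) silently discards all per-segment cached data.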
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!
Thanks Adrien: Would TestCompressingStoredFieldsFormat eventually catch it? This one seems to randomize its parameters, but perhaps it would be good to explicitly add Test*StoredFieldsFormat's for the different test codec modes we have: CompressionMode.HIGH_COMPRESSION, CompressionMode.FAST_DECOMPRESSION, CompressionMode.FAST? On Wed, Sep 4, 2013 at 12:06 PM, Adrien Grand wrote: > Actually this is a side-effect of LUCENE-5188. There is a bug in > LZ4.compressHC (which I committed to test various trade-offs between > compression speed and ratio but is not used in any official codec) on > very compressible inputs which seems to be more easily triggered now > that the inputs can be sliced. I have a fix that I'm testing and > should be able to commit soon. > > -- > Adrien > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757889#comment-13757889 ] Robert Muir commented on LUCENE-5197: - Can we rename FixedGapTermsIndexReader.getSizeInBytes to FixedGapTermsIndexReader.ramBytesUsed? Otherwise, the patch consistently uses the same name (ramBytesUsed) throughout; it's just this one that is inconsistent. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!
I tried to look into this failure, thinking it was related to LUCENE-5188 (since it happened just after that was committed and involves stored fields compression). doesnt reproduce for me though: maybe because of how the test uses threads? On Wed, Sep 4, 2013 at 11:08 AM, wrote: > Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/57577/ > > 1 tests failed. > REGRESSION: > org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testFlushDocCount > > Error Message: > Captured an uncaught exception in thread: Thread[id=238, name=Thread-169, > state=RUNNABLE, group=TGRP-TestFlushByRamOrCountsPolicy] > > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=238, name=Thread-169, state=RUNNABLE, > group=TGRP-TestFlushByRamOrCountsPolicy] > Caused by: java.lang.RuntimeException: > java.lang.ArrayIndexOutOfBoundsException: 591472 > at __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0) > at > org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472 > at > org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333) > at org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401) > at > org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160) > at > org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128) > at > org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65) > at > org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278) > at > 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170) > at > org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:314) > > > > > Build Log: > [...truncated 752 lines...] >[junit4] Suite: org.apache.lucene.index.TestFlushByRamOrCountsPolicy >[junit4] 2> Set 04, 2013 5:07:51 QN > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException >[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-169,5,TGRP-TestFlushByRamOrCountsPolicy] >[junit4] 2> java.lang.RuntimeException: > java.lang.ArrayIndexOutOfBoundsException: 591472 >[junit4] 2>at > __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0) >[junit4] 2>at > org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329) >[junit4] 2> Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472 >[junit4] 2>at > org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333) >[junit4] 2>at > org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401) >[junit4] 2>at > org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177) >[junit4] 2>at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227) >[junit4] 2>at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160) >[junit4] 2>at > org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128) >[junit4] 2>at > 
org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65) >[junit4] 2>at > org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278) >[junit4] 2>at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272) >[junit4] 2>at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446) >[junit4] 2>at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519) >[junit4] 2>at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189) >[junit4] 2>at >
[jira] [Assigned] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support
[ https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-5210: -- Assignee: Yonik Seeley > amend example's schema.xml and solrconfig.xml for blockjoin support > --- > > Key: SOLR-5210 > URL: https://issues.apache.org/jira/browse/SOLR-5210 > Project: Solr > Issue Type: Sub-task >Reporter: Mikhail Khludnev >Assignee: Yonik Seeley > Fix For: 4.5, 5.0 > > > I suppose it makes sense to apply > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290 > and > https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290 > to example's config to provide out-of-the-box block join experience. > WDYT? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5168) BJQParserTest reproducible failures
[ https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5168. Resolution: Fixed Fix Version/s: 5.0 4.5 > BJQParserTest reproducible failures > --- > > Key: SOLR-5168 > URL: https://issues.apache.org/jira/browse/SOLR-5168 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man >Assignee: Yonik Seeley > Fix For: 4.5, 5.0 > > Attachments: BJQTest.patch > > > two recent Jenkins builds have uncovered some test seeds that cause failures > in multiple test methods in BJQParserTest. These seeds reproduce reliably > (as of trunk r1514815) ... > {noformat} > ant test -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B > -Dtests.multiplier=3 -Dtests.slow=true > ant test -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures
[ https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757888#comment-13757888 ] Yonik Seeley commented on SOLR-5168: Right - I was just running the test in a loop locally first to ensure everything was actually fixed. > BJQParserTest reproducible failures > --- > > Key: SOLR-5168 > URL: https://issues.apache.org/jira/browse/SOLR-5168 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man >Assignee: Yonik Seeley > Attachments: BJQTest.patch > > > two recent Jenkins builds have uncovered some test seeds that cause failures > in multiple test methods in BJQParserTest. These seeds reproduce reliably > (as of trunk r1514815) ... > {noformat} > ant test -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B > -Dtests.multiplier=3 -Dtests.slow=true > ant test -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!
Actually this is a side-effect of LUCENE-5188. There is a bug in LZ4.compressHC (which I committed to test various trade-offs between compression speed and ratio but is not used in any official codec) on very compressible inputs which seems to be more easily triggered now that the inputs can be sliced. I have a fix that I'm testing and should be able to commit soon. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5175) Don't reorder children document
[ https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5175. Resolution: Fixed committed - I agree it's nicer to not reorder (esp for the single-level case), but I don't think we should guarantee the document order - it's an implementation detail. > Don't reorder children document > --- > > Key: SOLR-5175 > URL: https://issues.apache.org/jira/browse/SOLR-5175 > Project: Solr > Issue Type: Sub-task > Components: update >Reporter: Mikhail Khludnev > Labels: patch, test > Fix For: 4.5, 5.0 > > Attachments: SOLR-5175.patch > > > AddUpdateCommand reverses children documents that causes failure of > BJQParserTest.testGrandChildren() discussed in SOLR-5168 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757885#comment-13757885 ] Robert Muir commented on LUCENE-5189: - {quote} Just so I understand, if we gen FieldInfos, does that solve the brokenness of the Codec APIs (in addition to the other things that it solves)? If not, in what way are they broken, and is this break a new thing that NDV updates cause/expose, or it's a break that exists in general? Can you list the breaks here (because I think that FIS.gen solves all the points you raised above). {quote} It does not solve problem #2 (SegmentInfos.attributes). This API should be removed, deprecated, made internal-only, or something like that. Another option is to move this stuff into the commit, but that might be overkill: today this stuff is only used as a backwards-compatibility crutch (I think) to read 3.x indexes, so it can possibly be just removed in trunk right now. Gen'ing FieldInfos brings about its own set of questions as far as when/how/if any new fieldinfo information is merged and when/how it's visible to the codec API. It's very scary but I don't see any alternative at the moment. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. 
we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures
[ https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757884#comment-13757884 ] Mikhail Khludnev commented on SOLR-5168: Ignore is removed at SOLR-5175. see https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/search/join/BJQParserTest.java?r1=1520042&r2=1520041&pathrev=1520042 feel free to close this one. > BJQParserTest reproducible failures > --- > > Key: SOLR-5168 > URL: https://issues.apache.org/jira/browse/SOLR-5168 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man >Assignee: Yonik Seeley > Attachments: BJQTest.patch > > > two recent Jenkins builds have uncovered some test seeds that cause failures > in multiple test methods in BJQParserTest. These seeds reproduce reliably > (as of trunk r1514815) ... > {noformat} > ant test -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B > -Dtests.multiplier=3 -Dtests.slow=true > ant test -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757875#comment-13757875 ] Robert Muir commented on LUCENE-5189: - {quote} This would let us proceed (progress not perfection) and then later, we address it. Ie, I think the added boolean is a fair compromise. {quote} It's not a fair compromise at all. To me, as a search engine library, this is not progress. It's going backwards. Yes: I'm looking at it solely from an API perspective. Yes: others look at things only from a features/performance perspective and do not seem to care about APIs. But as a library, the API is all that matters. So I just want to make it clear: saying "progress not perfection" is not a good excuse for leaving broken APIs around the codebase and shoving in features as fast as possible: it's not progress to me, so I simply do not see it that way. Frankly I am tired of hearing this phrase used in this way, and when I see it in the future, it will encourage me to take a closer look at APIs and do pickier reviews. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g.
we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes.
[jira] [Commented] (SOLR-5175) Don't reorder children document
[ https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757834#comment-13757834 ] ASF subversion and git services commented on SOLR-5175: --- Commit 1520042 from [~yo...@apache.org] in branch 'dev/trunk' [ https://svn.apache.org/r1520042 ] SOLR-5175: keep child order in block index > Don't reorder children document > --- > > Key: SOLR-5175 > URL: https://issues.apache.org/jira/browse/SOLR-5175 > Project: Solr > Issue Type: Sub-task > Components: update >Reporter: Mikhail Khludnev > Labels: patch, test > Fix For: 4.5, 5.0 > > Attachments: SOLR-5175.patch > > > AddUpdateCommand reverses children documents that causes failure of > BJQParserTest.testGrandChildren() discussed in SOLR-5168
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757864#comment-13757864 ] Shai Erera commented on LUCENE-5189: Just so I understand, if we gen FieldInfos, does that solve the brokenness of the Codec APIs (in addition to the other things that it solves)? If not, in what way are they broken, and is this break a new thing that NDV updates cause/expose, or is it a break that exists in general? Can you list the breaks here (because I think that FIS.gen solves all the points you raised above). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes.
[jira] [Commented] (SOLR-5175) Don't reorder children document
[ https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757838#comment-13757838 ] ASF subversion and git services commented on SOLR-5175: --- Commit 1520045 from [~yo...@apache.org] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1520045 ] SOLR-5175: keep child order in block index > Don't reorder children document > --- > > Key: SOLR-5175 > URL: https://issues.apache.org/jira/browse/SOLR-5175 > Project: Solr > Issue Type: Sub-task > Components: update >Reporter: Mikhail Khludnev > Labels: patch, test > Fix For: 4.5, 5.0 > > Attachments: SOLR-5175.patch > > > AddUpdateCommand reverses children documents that causes failure of > BJQParserTest.testGrandChildren() discussed in SOLR-5168
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/57577/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testFlushDocCount Error Message: Captured an uncaught exception in thread: Thread[id=238, name=Thread-169, state=RUNNABLE, group=TGRP-TestFlushByRamOrCountsPolicy] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=238, name=Thread-169, state=RUNNABLE, group=TGRP-TestFlushByRamOrCountsPolicy] Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 591472 at __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0) at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329) Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472 at org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333) at org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401) at org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160) at org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128) at org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65) at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189) at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170) at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:314) Build Log: [...truncated 752 lines...] [junit4] Suite: org.apache.lucene.index.TestFlushByRamOrCountsPolicy [junit4] 2> Set 04, 2013 5:07:51 QN com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-169,5,TGRP-TestFlushByRamOrCountsPolicy] [junit4] 2> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 591472 [junit4] 2>at __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0) [junit4] 2>at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329) [junit4] 2> Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472 [junit4] 2>at org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333) [junit4] 2>at org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401) [junit4] 2>at org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177) [junit4] 2>at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227) [junit4] 2>at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160) [junit4] 2>at org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128) [junit4] 2>at org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65) [junit4] 2>at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278) [junit4] 2>at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272) [junit4] 2>at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446) [junit4] 2>at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519) [junit4] 2>at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189) [junit4] 2>at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170) [junit4] 2>at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:314) [junit4] 2> [junit4] 1> FAILED exc: [junit4] 1> java.lang.ArrayIndexOutOfBoundsException: 591472 [junit4] 1>at org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333) [junit4] 1>
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch Patch from last commit, and summary: Previously our term dictionaries were both block-based: * BlockTerms dict breaks the terms list into several blocks, as a linear structure with skip points. * BlockTreeTerms dict uses a trie-like structure to decide how terms are assigned to different blocks, and uses an FST index to optimize seeking performance. However, those two kinds of term dictionary don't hold all the term data in memory. In the worst case there would be at least two seeks: one from the index in memory, another from the file on disk. And we already have many complicated optimizations for this... If by design a term dictionary can be memory resident, the data structure will be simpler (after all, we don't need to maintain extra file pointers for a second-time seek, and we don't have to decide heuristics for how terms are clustered). And this is why those two FST-based implementations are introduced. Another big change in the code is: since our term dictionaries were both block-based, the previous API was also limited. It was the postings writer who collected term metadata, and the term dictionary who told the postings writer the range of terms it should flush to a block. However, the encoding of term data should be decided by the term dictionary, since the postings writer doesn't always know how terms are structured in the term dictionary... The previous API had some tricky code for this, e.g. PulsingPostingsWriter had to use a term's ordinal in the block to decide how to write metadata, which is unnecessary. To make the API between term dict and postings list more 'pluggable' and 'general', I refactored the PostingsReader/WriterBase.
For example, the postings writer should provide some information to the term dictionary, like how many metadata values are strictly monotonic, so that the term dictionary can optimize the delta-encoding itself. And since the term dictionary now fully decides how metadata are written, it gets the ability to utilize intblock-based metadata encoding. Now the two implementations of term dictionary can easily be plugged into current postings formats, like: * FST41 = FSTTermdict + Lucene41PostingsBaseFormat, * FSTOrd41 = FSTOrdTermdict + Lucene41PostingsBaseFormat, * FSTOrdPulsing41 = FSTOrdTermsdict + PulsingPostingsWrapper + Lucene41PostingsFormat. About performance, as shown before, those two term dicts improve primary-key lookup, but still have overhead on wildcard queries (both term dicts have only prefix information, and the term dictionary cannot work well with this...). I'll try to hack this later. > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
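Han Jiang's point about the postings writer declaring how many metadata values are strictly monotonic can be illustrated with a toy delta-encoder. This is a standalone sketch with invented names, not Lucene's actual classes: strictly monotonic metadata (e.g. per-term file pointers) can be stored as non-negative deltas from the previous value, which a vInt or packed-ints layer would then encode cheaply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MonotonicDeltaSketch {

    // Encode a strictly increasing sequence as deltas from the previous
    // value; monotonicity guarantees each delta is non-negative and
    // typically small, so it compresses well.
    static List<Long> encode(long[] pointers) {
        List<Long> deltas = new ArrayList<>();
        long prev = 0;
        for (long p : pointers) {
            deltas.add(p - prev);
            prev = p;
        }
        return deltas;
    }

    // Decode by accumulating the deltas back into absolute values.
    static long[] decode(List<Long> deltas) {
        long[] pointers = new long[deltas.size()];
        long prev = 0;
        for (int i = 0; i < deltas.size(); i++) {
            prev += deltas.get(i);
            pointers[i] = prev;
        }
        return pointers;
    }

    public static void main(String[] args) {
        long[] filePointers = {0, 113, 250, 261, 1024};
        List<Long> deltas = encode(filePointers);
        System.out.println(deltas); // small deltas instead of absolute pointers
        System.out.println(Arrays.equals(filePointers, decode(deltas))); // true
    }
}
```

The point of the refactoring is that only the term dictionary needs to know this trick: the postings writer merely reports which metadata columns are monotonic, and the dictionary picks the encoding.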
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757814#comment-13757814 ] ASF subversion and git services commented on LUCENE-3069: - Commit 1520034 from [~billy] in branch 'dev/branches/lucene3069' [ https://svn.apache.org/r1520034 ] LUCENE-3069: move TermDict impls to package 'memory', nuke all 'Temp' symbols > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757802#comment-13757802 ] Robert Muir commented on LUCENE-5189: - {quote} But I also don't mind moving forward with SWS.isFieldUpdate and remove it in a follow on issue ... as long as it's done before 4.5. {quote} I don't think that will be an issue at all. if we want to iterate and leave the codec APIs broken, I won't object: but simple rule. Trunk only. We can't do this kind of stuff on the stable branch at all: Things that get backported there need to be "ready to ship". > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes.
[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757789#comment-13757789 ] ASF subversion and git services commented on LUCENE-5188: - Commit 1520025 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1520025 ] LUCENE-5188: Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors. > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest so > maybe we could decompress and consult the StoredFieldVisitor in a more > streaming fashion.
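The idea behind LUCENE-5188 can be sketched abstractly as follows. This is a hypothetical illustration, not Lucene's real reader code: the names (visitDocument, Field) are invented, though the YES/NO/STOP answers mirror Lucene's StoredFieldVisitor.Status. Instead of inflating the whole compressed document before consulting the visitor, the reader consults the visitor per field and stops inflating as soon as the visitor answers STOP.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingFieldsSketch {
    enum Status { YES, NO, STOP }

    interface FieldVisitor {
        Status needsField(String name);
        void stringField(String name, String value);
    }

    // Each Field stands in for a lazily decompressed slice of the stored
    // document; in the real format these would be inflated on demand.
    record Field(String name, String value) {}

    // Returns how many fields were actually "inflated"; a STOP answer
    // means the remaining slices are never decompressed at all.
    static int visitDocument(List<Field> doc, FieldVisitor visitor) {
        int inflated = 0;
        for (Field f : doc) {
            Status s = visitor.needsField(f.name());
            if (s == Status.STOP) {
                return inflated; // rest of the document skipped
            }
            inflated++;
            if (s == Status.YES) {
                visitor.stringField(f.name(), f.value());
            }
        }
        return inflated;
    }

    public static void main(String[] args) {
        List<Field> doc = List.of(new Field("id", "42"),
                                  new Field("body", "a very large stored body"));
        var values = new ArrayList<String>();
        // A visitor interested only in the first field: it answers STOP
        // once it has collected one value.
        FieldVisitor idOnly = new FieldVisitor() {
            public Status needsField(String name) {
                return values.isEmpty() ? Status.YES : Status.STOP;
            }
            public void stringField(String name, String value) {
                values.add(value);
            }
        };
        int inflated = visitDocument(doc, idOnly);
        System.out.println(values + " after inflating " + inflated + " field(s)");
    }
}
```

The win is largest when documents are big and the visitor wants only an early field, exactly the wasteful case the issue description calls out.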
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757771#comment-13757771 ] Han Jiang commented on LUCENE-3069: --- Yes, with slight changes, it can support seek by ord. (With FST.getByOutput). > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
[jira] [Updated] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
[ https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-5214: -- Attachment: (was: SOLR-5214.patch) > collections?action=SPLITSHARD running out of heap space due to merge > > > Key: SOLR-5214 > URL: https://issues.apache.org/jira/browse/SOLR-5214 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.3 >Reporter: Christine Poerschke > Attachments: SOLR-5214.patch > > > The problem we saw was that splitting a shard with many segments and documents > failed by running out of heap space. > Increasing heap space so that all existing segments could be merged into one > overall segment does not seem practical. Running the split without segment > merging worked. > Could split always run without merging, or merge=true/false be an optional > parameter for the SPLITSHARD action?
[jira] [Updated] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
[ https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-5214: -- Attachment: SOLR-5214.patch Correcting subReaders.length vs. leaves.size() typo in my original patch. > collections?action=SPLITSHARD running out of heap space due to merge > > > Key: SOLR-5214 > URL: https://issues.apache.org/jira/browse/SOLR-5214 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.3 >Reporter: Christine Poerschke > Attachments: SOLR-5214.patch, SOLR-5214.patch > > > The problem we saw was that splitting a shard with many segments and documents > failed by running out of heap space. > Increasing heap space so that all existing segments could be merged into one > overall segment does not seem practical. Running the split without segment > merging worked. > Could split always run without merging, or merge=true/false be an optional > parameter for the SPLITSHARD action?
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757741#comment-13757741 ] David Smiley commented on LUCENE-3069: -- I like FSTOrd as well. Presumably this one also exposes it via TermsEnum.ord()? > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
[jira] [Updated] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
[ https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-5214: -- Attachment: SOLR-5214.patch Attaching patch against trunk, to not merge when splitting (i.e. no merge=true/false parameter as yet). > collections?action=SPLITSHARD running out of heap space due to merge > > > Key: SOLR-5214 > URL: https://issues.apache.org/jira/browse/SOLR-5214 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.3 >Reporter: Christine Poerschke > Attachments: SOLR-5214.patch > > > The problem we saw was that splitting a shard with many segments and documents > failed by running out of heap space. > Increasing heap space so that all existing segments could be merged into one > overall segment does not seem practical. Running the split without segment > merging worked. > Could split always run without merging, or merge=true/false be an optional > parameter for the SPLITSHARD action?
[jira] [Created] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge
Christine Poerschke created SOLR-5214: - Summary: collections?action=SPLITSHARD running out of heap space due to merge Key: SOLR-5214 URL: https://issues.apache.org/jira/browse/SOLR-5214 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.3 Reporter: Christine Poerschke The problem we saw was that splitting a shard with many segments and documents failed by running out of heap space. Increasing heap space so that all existing segments could be merged into one overall segment does not seem practical. Running the split without segment merging worked. Could split always run without merging, or merge=true/false be an optional parameter for the SPLITSHARD action?
[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 105 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/105/ No tests ran. Build Log: [...truncated 34200 lines...] prepare-release-no-sign: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease [copy] Copying 416 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene [copy] Copying 194 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr [exec] JAVA6_HOME is /home/hudson/tools/java/latest1.6 [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7 [exec] NOTE: output encoding is US-ASCII [exec] [exec] Load release URL "file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/"... [exec] [exec] Test Lucene... [exec] test basics... [exec] get KEYS [exec] 0.1 MB in 0.01 sec (11.0 MB/sec) [exec] check changes HTML... [exec] download lucene-4.5.0-src.tgz... [exec] 27.1 MB in 0.04 sec (605.5 MB/sec) [exec] verify md5/sha1 digests [exec] download lucene-4.5.0.tgz... [exec] 49.0 MB in 0.07 sec (660.3 MB/sec) [exec] verify md5/sha1 digests [exec] download lucene-4.5.0.zip... [exec] 58.8 MB in 0.12 sec (509.9 MB/sec) [exec] verify md5/sha1 digests [exec] unpack lucene-4.5.0.tgz... [exec] verify JAR/WAR metadata... [exec] test demo with 1.6... [exec] got 5717 hits for query "lucene" [exec] test demo with 1.7... 
[exec] got 5717 hits for query "lucene" [exec] check Lucene's javadoc JAR [exec] [exec] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeReleaseTmp/unpack/lucene-4.5.0/docs/core/org/apache/lucene/util/AttributeSource.html [exec] broken details HTML: Method Detail: addAttributeImpl: closing "" does not match opening "" [exec] broken details HTML: Method Detail: getAttribute: closing "" does not match opening "" [exec] Traceback (most recent call last): [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1450, in [exec] main() [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1394, in main [exec] smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, testArgs) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1431, in smokeTest [exec] unpackAndVerify('lucene', tmpDir, artifact, svnRevision, version, testArgs) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 607, in unpackAndVerify [exec] verifyUnpacked(project, artifact, unpackPath, svnRevision, version, testArgs) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 786, in verifyUnpacked [exec] checkJavadocpath('%s/docs' % unpackPath) [exec] File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 904, in checkJavadocpath [exec] raise RuntimeError('missing javadocs package summaries!') [exec] RuntimeError: missing javadocs package summaries! 
BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:321: exec returned: 1 Total time: 19 minutes 36 seconds Build step 'Invoke Ant' marked build as failure Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757688#comment-13757688 ] Shai Erera commented on LUCENE-5189: I think it's important to solve FIS.gen, either on this issue or a separate one, but before 4.5 is out. Because now SegmentInfos records per-field dvGen and if we gen FIS, this will be recorded by a new Lucene45FieldInfosFormat, and SIS will need to record fieldInfosGen. I actually don't mind to do it in this issue. It's work that's needed and affects NDV-updates (e.g. sparse fields which now hit a too late cryptic exception). But I also don't mind moving forward with SWS.isFieldUpdate and remove it in a follow on issue ... as long as it's done before 4.5. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... 
we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes.
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757681#comment-13757681 ] Erick Erickson commented on SOLR-2548: -- [~hossman_luc...@fucit.org] Thanks. Your comments made me look more carefully at directExecutor, it took me a bit to wrap my head around that one. 1> Still checking on the implications of stacking up a bunch of directExecutors all through the CompletionService, not something I've used recently and the details are hazy. As far as tests are concerned, I haven't gotten there yet, the original patch didn't have any... It should be easy to create tests with multiple facet.field clauses, TestFaceting does this so there are templates. Is there a decent way to check whether more than one thread was actually spawned? If so, can you point me at some code that actually does that? Otherwise I'll create tests that just get the right response for single and multiple facet.field specifications and a bit of walk-through with the debugger to ensure we actually go through that code path. 2> done. Thanks again. > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting.
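[Editor's note: one way to answer the question above — checking whether more than one thread was actually spawned — is to record each executing thread's name while tasks run through a CompletionService. The sketch below is illustrative only; class and method names (ThreadObserver, runTasks) are not from the SOLR-2548 patch.]

```java
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch (not from the patch): record the name of every thread
// that executes a task, so a test can assert how many threads were used.
public class ThreadObserver {
    public static Set<String> runTasks(Executor executor, int nTasks) throws Exception {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CompletionService<Void> cs = new ExecutorCompletionService<>(executor);
        for (int i = 0; i < nTasks; i++) {
            cs.submit(() -> {
                threadNames.add(Thread.currentThread().getName());
                Thread.sleep(50); // hold the thread briefly so tasks overlap
                return null;
            });
        }
        for (int i = 0; i < nTasks; i++) {
            cs.take().get(); // drain results, propagating any task failure
        }
        return threadNames;
    }

    public static void main(String[] args) throws Exception {
        // A direct executor runs every task on the caller's thread:
        System.out.println("direct: " + runTasks(Runnable::run, 4).size());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println("pooled: " + runTasks(pool, 8).size());
        pool.shutdown();
    }
}
```

With a direct executor (`Runnable::run`) the set always has exactly one entry, which also illustrates the "stacked directExecutors" concern: everything collapses onto the calling thread.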
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757674#comment-13757674 ] Michael McCandless commented on LUCENE-5189: We could simply document this as a limitation, today? Ie, that if it's an update, the DVFormat cannot use the attributes APIs. This would let us proceed (progress not perfection) and then later, we address it. Ie, I think the added boolean is a fair compromise. Or, we can pursue gen'ing FIS on this patch, but this is going to add a lot of trickiness/complexity; I think it'd be better to explore it separately. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. 
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757676#comment-13757676 ] Han Jiang commented on LUCENE-3069: --- OK! These two term dicts are both FST-based: * FST term dict directly uses FST to map term to its metadata & stats (FST) * FSTOrd term dict uses FST to map term to its ordinal number (FST), and the ordinal is then used to seek metadata from another big chunk. I prefer the second impl since it puts much less stress on FST. I have updated the detailed format explanation in last commit. Hmm, I'll create another patch for this... > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
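[Editor's note: the ordinal indirection in the second impl above (FSTOrd) can be illustrated without Lucene's FST classes. In the sketch below a sorted map stands in for the FST, mapping each term to an ord that indexes a separate metadata block; class and field names are illustrative, not from the patch.]

```java
import java.util.TreeMap;

// Illustrative sketch of the FSTOrd indirection: the term index (here a
// TreeMap standing in for the FST) yields only an ordinal, and the ordinal
// is then used to seek metadata stored in a separate chunk.
public class OrdTermsDict {
    private final TreeMap<String, Integer> termToOrd = new TreeMap<>();
    private final long[] metadataOffset; // ord -> offset into a metadata chunk

    public OrdTermsDict(String[] sortedTerms, long[] offsets) {
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            termToOrd.put(sortedTerms[ord], ord);
        }
        this.metadataOffset = offsets;
    }

    /** Returns the metadata offset for a term, or -1 if the term is absent. */
    public long lookup(String term) {
        Integer ord = termToOrd.get(term);
        return ord == null ? -1 : metadataOffset[ord];
    }
}
```

The benefit Han describes is that the FST only has to encode small integer ordinals rather than full per-term metadata, putting much less stress on the FST outputs.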
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757668#comment-13757668 ] Michael McCandless commented on LUCENE-3069: Thanks for uploading the diffs against trunk, Han; I'll review this. Can you explain the two new terms dict impls? And maybe write up a brief summary of all the changes (to help others understand the patch)? Maybe we can put the new "all in memory" terms dict impls under oal.codecs.memory? FSTTerms* seems like a good name? (Just because in the future maybe we have other impls of "all in memory" terms dicts)... > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757666#comment-13757666 ] Simon Willnauer commented on LUCENE-5188: - thanks adrien for elaborating... progress over perfection so lets move on here. +1 to commit > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest so > maybe we could decompress and consult the StoredFieldVicitor in a more > streaming fashion.
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757665#comment-13757665 ] Michael McCandless commented on LUCENE-5197: +1 to current patch, except SimpleTextFieldsReader does in fact use RAM (it has a termsCache, and it sneakily pre-loads all terms for each field into an FST!). I think we should go with this current approach, and then later, if/when we improve RUE to easily restrict where it crawls / speed it up / etc., then we can cutover. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level.
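[Editor's note: the approach being +1'd — components reporting their own heap estimates rather than reflective crawling with RamUsageEstimator (the "RUE" above) — can be sketched abstractly as below. The Accountable name mirrors the interface shape Lucene later standardized; this sketch is not the patch itself.]

```java
// Illustrative sketch: each index component estimates its own heap usage and
// the owner sums the estimates, instead of crawling object graphs reflectively.
public class RamAccounting {
    public interface Accountable {
        long ramBytesUsed();
    }

    public static long total(Accountable... components) {
        long sum = 0;
        for (Accountable c : components) {
            sum += c.ramBytesUsed();
        }
        return sum;
    }
}
```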
[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757641#comment-13757641 ] Adrien Grand commented on LUCENE-5188: -- These bytes can be shared because they are write-only, kind of like /dev/null. Having this on DataInput to be able to skip an entire decompression would be nice but unfortunately with the current design, the field numbers are stored in the compressed stream, so you need to decompress anyway to know whether you should skip (StoredFieldVisitor allows to skip based on the FieldInfo, that my StoredFieldReader computes from the field number). But your idea is something I would like to explore for the next StoredFieldsFormat, along with preset dictionaries. > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest so > maybe we could decompress and consult the StoredFieldVicitor in a more > streaming fashion.
[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757635#comment-13757635 ] Simon Willnauer commented on LUCENE-5188: - cool stuff adrien! One thing I wonder is if we should use a specialized DataInput maybe SkippableDataInput in that class to prevent the static method. That shared byte array worries me. Aside of this, I wonder if we had this method in DataInput or however we gonna do this would it be possible to skip an entire decompression step if we know that the amount of bytes we skip is larger than one or more decompression blocks. I have to admit I don't exactly know how this works and if what I propose is possible but that would help me to better understand why we need to read all the data and decompress if we trash it anyway. > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest so > maybe we could decompress and consult the StoredFieldVicitor in a more > streaming fashion.
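[Editor's note: the shared-array technique discussed in this thread — skipping bytes from a read-only stream by reading them into a scratch buffer whose contents are never examined, so one static array can serve all threads, "kind of like /dev/null" — can be sketched as below. SkipHelper is an illustrative stand-in written against java.io.InputStream, not the actual patch code.]

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch: skip n bytes of a stream that only supports reading,
// by reading into a throwaway scratch array. The buffer is write-only (its
// contents are never read back), so concurrent use of the single shared
// array is harmless even though the writes race.
public class SkipHelper {
    private static final byte[] SKIP_BUFFER = new byte[1024];

    public static void skipBytes(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            int read = in.read(SKIP_BUFFER, 0, (int) Math.min(remaining, SKIP_BUFFER.length));
            if (read < 0) {
                throw new IOException("unexpected end of stream");
            }
            remaining -= read;
        }
    }
}
```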
[jira] [Updated] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs
[ https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-5213: -- Attachment: SOLR-5213.patch Attaching patch for reporting per-segment numDocs for parent and sub-shards. > collections?action=SPLITSHARD parent vs. sub-shards numDocs > --- > > Key: SOLR-5213 > URL: https://issues.apache.org/jira/browse/SOLR-5213 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 4.4 >Reporter: Christine Poerschke > Attachments: SOLR-5213.patch > > > The problem we saw was that splitting a shard took a long time and at the end > of it the sub-shards contained fewer documents than the original shard. > The root cause was eventually tracked down to the disappearing documents not > falling into the hash ranges of the sub-shards. > Could SolrIndexSplitter split report per-segment numDocs for parent and > sub-shards, with at least a warning logged for any discrepancies (documents > falling into none of the sub-shards or documents falling into several > sub-shards)? > Additionally, could a case be made for erroring out when discrepancies are > detected i.e. not proceeding with the shard split? Either to always error or > to have a verifyNumDocs=false/true optional parameter for the SPLITSHARD > action.
[jira] [Created] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs
Christine Poerschke created SOLR-5213: - Summary: collections?action=SPLITSHARD parent vs. sub-shards numDocs Key: SOLR-5213 URL: https://issues.apache.org/jira/browse/SOLR-5213 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.4 Reporter: Christine Poerschke The problem we saw was that splitting a shard took a long time and at the end of it the sub-shards contained fewer documents than the original shard. The root cause was eventually tracked down to the disappearing documents not falling into the hash ranges of the sub-shards. Could SolrIndexSplitter split report per-segment numDocs for parent and sub-shards, with at least a warning logged for any discrepancies (documents falling into none of the sub-shards or documents falling into several sub-shards)? Additionally, could a case be made for erroring out when discrepancies are detected i.e. not proceeding with the shard split? Either to always error or to have a verifyNumDocs=false/true optional parameter for the SPLITSHARD action.
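[Editor's note: the check SOLR-5213 asks for amounts to hashing each document's id and counting how many sub-shard ranges claim it; zero or more than one is a discrepancy. The sketch below is illustrative only — Range and the hash function are simplified stand-ins for Solr's hash-based routing, not code from SolrIndexSplitter.]

```java
import java.util.List;

// Illustrative sketch of split verification: a document should fall into
// exactly one sub-shard hash range. A count of 0 means a lost document;
// a count > 1 means a duplicated one.
public class SplitVerifier {
    public static class Range {
        final int min, max; // inclusive bounds
        public Range(int min, int max) { this.min = min; this.max = max; }
        boolean includes(int hash) { return hash >= min && hash <= max; }
    }

    /** Returns how many ranges contain the doc's hash; anything but 1 is a discrepancy. */
    public static int matches(String docId, List<Range> subShardRanges) {
        int hash = docId.hashCode(); // stand-in for Solr's router hash
        int count = 0;
        for (Range r : subShardRanges) {
            if (r.includes(hash)) count++;
        }
        return count;
    }
}
```

A split verifier would run this over every document in the parent shard and log a warning (or abort the split) whenever matches() != 1.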
[jira] [Resolved] (LUCENE-4818) Create a boolean perceptron classifier
[ https://issues.apache.org/jira/browse/LUCENE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili resolved LUCENE-4818. - Resolution: Fixed marking it as resolved, future improvements would come in separate issues. > Create a boolean perceptron classifier > -- > > Key: LUCENE-4818 > URL: https://issues.apache.org/jira/browse/LUCENE-4818 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/classification >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-4818.patch > > > Create a Lucene based classifier using the perceptron algorithm (see > http://en.wikipedia.org/wiki/Perceptron)
[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757570#comment-13757570 ] Adrien Grand commented on LUCENE-5188: -- I will commit later today if there is no objection. > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest so > maybe we could decompress and consult the StoredFieldVicitor in a more > streaming fashion.
[jira] [Issue Comment Deleted] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Egense updated SOLR-2649: Comment: was deleted (was: Thank you! We have been waiting a long time for this fix.) > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Bug > Components: query parsers >Reporter: Magnus Bergmark >Priority: Minor > Fix For: 4.5, 5.0 > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms where AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as an replacement to dismax; mm is one of the > primary features of dismax.
[jira] [Issue Comment Deleted] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Egense updated SOLR-2649: Comment: was deleted (was: Thanks for the clarification. ) > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Bug > Components: query parsers >Reporter: Magnus Bergmark >Priority: Minor > Fix For: 4.5, 5.0 > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms where AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as an replacement to dismax; mm is one of the > primary features of dismax.
[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757536#comment-13757536 ] Tommaso Teofili commented on SOLR-5201: --- here's a draft patch: https://github.com/tteofili/lucene-solr/compare/apache:trunk...solr-5201.patch _AnalysisEngines_ are initialized inside _UIMAUpdateRequestProcessorFactories_ together with a _JCasPool_ to better handle multiple concurrent requests. My benchmarks (ran 'ant clean test -Dtests.multiplier=100' with and without the above patch) show execution of _UIMAUpdateRequestProcessorTest#testMultiplierProcessing_ is ~10 times faster and less memory consumptive (~240MB saved over ~650MB heap) > UIMAUpdateRequestProcessor should reuse the AnalysisEngine > -- > > Key: SOLR-5201 > URL: https://issues.apache.org/jira/browse/SOLR-5201 > Project: Solr > Issue Type: Improvement > Components: contrib - UIMA >Affects Versions: 4.4 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.5, 5.0 > > Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, > SOLR-5201-ae-cache-only-single-request_branch_4x.patch > > > As reported in http://markmail.org/thread/2psiyl4ukaejl4fx > UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request > which is bad for performance therefore it'd be nice if such AEs could be > reused whenever that's possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
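[Editor's note: the reuse pattern described above (pooling akin to UIMA's JCasPool) can be sketched generically: expensive objects are built once up front, borrowed per request, and returned afterwards. EnginePool below is an illustrative stand-in, not code from the SOLR-5201 patch.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Illustrative sketch of object-pool reuse: construction cost is paid once
// at startup instead of once per request, and the bounded queue naturally
// limits how many requests use an engine concurrently.
public class EnginePool<T> {
    private final BlockingQueue<T> pool;

    public EnginePool(int size, Supplier<T> factory) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(factory.get()); // expensive construction happens here only
        }
    }

    /** Blocks until an engine is free; callers must release() when done. */
    public T borrow() throws InterruptedException {
        return pool.take();
    }

    public void release(T engine) {
        pool.add(engine);
    }
}
```

The ~10x speedup and heap savings reported above come precisely from moving construction out of the per-request path.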
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 795 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/795/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 10286 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=79AF0BD36D5B1D8 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=ISO-8859-1 -classpath /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.10.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-
Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/concurrentlinkedhashmap-lru-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/dom4j-1.6.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/guava-14.0.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-annotations-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-auth-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-common-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-hdfs-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/joda-tim