[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection
[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527356#comment-13527356 ] Shai Erera commented on LUCENE-4600: Changing the title, which got me thinking -- Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible for you to hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference? Since your first results show that during-collection aggregation is not that much faster than post-collection, I am wondering whether that would still hold if we cache the bytes outside the collector entirely. If so, I think it should push us to do this caching outside, because we've already identified cases (e.g. sampling) where post-collection is needed too. > Explore facets aggregation during documents collection > -- > > Key: LUCENE-4600 > URL: https://issues.apache.org/jira/browse/LUCENE-4600 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Attachments: LUCENE-4600.patch, LUCENE-4600.patch > > > Today the facet module simply gathers all hits (as a bitset, optionally with > a float[] to hold scores as well, if you will aggregate them) during > collection, and then at the end when you call getFacetsResults(), it makes a > 2nd pass over all those hits doing the actual aggregation. > We should investigate just aggregating as we collect instead, so we don't > have to tie up transient RAM (fairly small for the bit set but possibly big > for the float[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
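The tradeoff under discussion can be sketched in plain Java (a toy model, not Lucene's actual classes -- the `DOC_ORDS` array stands in for the per-document category ordinals that really live in the payload): post-collection remembers the hits in a bitset and aggregates in a second pass, while during-collection folds counting into collect() and keeps no per-hit state.

```java
import java.util.BitSet;

// Toy contrast of the two strategies the issue describes. Category
// ordinals per doc are held in a plain int[][], standing in for the
// payload / DocValues data a real facet collector would read.
public class AggregationStrategies {
  static final int[][] DOC_ORDS = { {0, 2}, {1}, {0, 1, 2}, {2} };

  // Strategy 1: remember hits, aggregate in a 2nd pass (getFacetsResults).
  static int[] postCollection(int[] hits, int numCategories) {
    BitSet bits = new BitSet();
    for (int doc : hits) bits.set(doc);          // "collect": just mark the hit
    int[] counts = new int[numCategories];       // second pass over all hits
    for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1))
      for (int ord : DOC_ORDS[doc]) counts[ord]++;
    return counts;
  }

  // Strategy 2: aggregate as we collect; no transient bitset is tied up.
  static int[] duringCollection(int[] hits, int numCategories) {
    int[] counts = new int[numCategories];
    for (int doc : hits)                          // inside collect(doc)
      for (int ord : DOC_ORDS[doc]) counts[ord]++;
    return counts;
  }

  public static void main(String[] args) {
    int[] hits = {0, 2, 3};
    System.out.println(java.util.Arrays.equals(
        postCollection(hits, 3), duringCollection(hits, 3))); // true
  }
}
```

Both strategies produce identical counts; the difference is only the transient RAM (bitset, and optionally a scores float[]) held between collection and aggregation.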
[jira] [Updated] (LUCENE-4600) Explore facets aggregation during documents collection
[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4600: --- Summary: Explore facets aggregation during documents collection (was: Facets should aggregate during collection, not at the end) > Explore facets aggregation during documents collection > -- > > Key: LUCENE-4600 > URL: https://issues.apache.org/jira/browse/LUCENE-4600 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Attachments: LUCENE-4600.patch, LUCENE-4600.patch > > > Today the facet module simply gathers all hits (as a bitset, optionally with > a float[] to hold scores as well, if you will aggregate them) during > collection, and then at the end when you call getFacetsResults(), it makes a > 2nd pass over all those hits doing the actual aggregation. > We should investigate just aggregating as we collect instead, so we don't > have to tie up transient RAM (fairly small for the bit set but possibly big > for the float[]).
[jira] [Updated] (SOLR-788) MoreLikeThis should support distributed search
[ https://issues.apache.org/jira/browse/SOLR-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-788: - Fix Version/s: 5.0 > MoreLikeThis should support distributed search > -- > > Key: SOLR-788 > URL: https://issues.apache.org/jira/browse/SOLR-788 > Project: Solr > Issue Type: Improvement > Components: MoreLikeThis >Reporter: Grant Ingersoll >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1, 5.0 > > Attachments: AlternateDistributedMLT.patch, MLT.patch, MLT.patch, > MoreLikeThisComponentTest.patch, SOLR-788.patch, SolrMoreLikeThisPatch.txt > > > The MoreLikeThis component should support distributed processing. > See SOLR-303.
[jira] [Updated] (SOLR-788) MoreLikeThis should support distributed search
[ https://issues.apache.org/jira/browse/SOLR-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-788: - Attachment: SOLR-788.patch This patch fixes some formatting in the latest patch and adds the base for some tests as well as one test - it's currently failing. > MoreLikeThis should support distributed search > -- > > Key: SOLR-788 > URL: https://issues.apache.org/jira/browse/SOLR-788 > Project: Solr > Issue Type: Improvement > Components: MoreLikeThis >Reporter: Grant Ingersoll >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1 > > Attachments: AlternateDistributedMLT.patch, MLT.patch, MLT.patch, > MoreLikeThisComponentTest.patch, SOLR-788.patch, SolrMoreLikeThisPatch.txt > > > The MoreLikeThis component should support distributed processing. > See SOLR-303.
[jira] [Commented] (LUCENE-4600) Facets should aggregate during collection, not at the end
[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527353#comment-13527353 ] Shai Erera commented on LUCENE-4600: bq. Net/net I think we should offer an easy-to-use DV-backed facets impl... If only DV could handle multi-values. Can they handle a single byte[]? Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[]. Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ... The patch looks very good. A few comments/questions: * Do I understand correctly that the caching Collector is reusable? Otherwise I don't see how the CachedBytes help. ** Preferably, if we had an AtomicReader which caches these bytes, then you wouldn't need to reuse the Collector? ** Hmmm, what if you used the in-mem Codec, for loading just this term's posting list into RAM? Do you think you would gain the same? If you want to make this a class that can be reused by other scenarios, then a few tips that can enable that: * Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm(). * Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. using a PackedInts decoder. * Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multiple such Collectors. * I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too. * Maybe you should take the IntArrayAllocator from the outside? 
That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array. * In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right? * Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only. ** For the non-caching one that's true, because we can only advance on the fulltree posting. But if the posting is entirely in RAM, we can random-access it? I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such a pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it ... On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps. About the results, just to clarify -- in both runs the 'QPS base' refers to current facet counting and 'QPS comp' refers to the two new collectors respectively? I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out from CPU cache ... Also, note that you wrote specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how that would compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching. 
Overall though, great work Mike ! We must get this code in. It's clear that it can potentially gain a lot for some scenarios ... > Facets should aggregate during collection, not at the end > - > > Key: LUCENE-4600 > URL: https://issues.apache.org/jira/browse/LUCENE-4600 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Attachments: LUCENE-4600.patch, LUCENE-4600.patch > > > Today the facet module simply gathers all hits (as a bitset, optionally with > a float[] to hold scores as well, if you will aggregate them) during > collection, and then at the end when you call getFacetsResults(), it makes a > 2nd pass over all those hits doing the actual aggregation. > We should investigate just aggregating as we collect instead, so we don't
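Since the VInt+gap payload decoding comes up repeatedly in this thread, here is a self-contained sketch of that step (helper names are illustrative, not Lucene's real IntDecoder/CategoryListIterator API): a document's category ordinals are stored ascending and delta-coded, each delta written as a VInt, and an aggregating collector decodes them and bumps a counts array directly.

```java
// Sketch (hypothetical class, not Lucene's): decode gap-coded VInt
// ordinals from a payload byte[] and increment per-category counts,
// i.e. the per-hit work an aggregate-during-collect facet collector does.
public class OrdinalDecoder {
  /** Decodes gap-coded VInts from payload and increments counts[ord]. */
  public static void countOrdinals(byte[] payload, int[] counts) {
    int pos = 0;
    int ord = 0; // ordinals are stored as deltas from the previous one
    while (pos < payload.length) {
      int value = 0;
      int shift = 0;
      byte b;
      do { // standard VInt: 7 data bits per byte, high bit = "more follows"
        b = payload[pos++];
        value |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      ord += value; // undo the gap encoding
      counts[ord]++;
    }
  }

  /** Encodes ascending ordinals as gap-coded VInts, for the demo below. */
  public static byte[] encodeOrdinals(int[] ords) {
    java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
    int prev = 0;
    for (int o : ords) {
      int gap = o - prev;
      prev = o;
      while ((gap & ~0x7F) != 0) { // more than 7 bits left: set "more" bit
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    int[] counts = new int[10];
    byte[] payload = encodeOrdinals(new int[] {1, 3, 7});
    countOrdinals(payload, counts);
    System.out.println(counts[1] + " " + counts[3] + " " + counts[7]); // 1 1 1
  }
}
```

Swapping this hand-rolled loop for a pluggable decoder (the IntDecoder/PackedInts point above) changes only the inner do/while; the gap accumulation and counting stay the same.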
[jira] [Resolved] (SOLR-4158) When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas.
[ https://issues.apache.org/jira/browse/SOLR-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4158. --- Resolution: Fixed > When a core is registering in ZooKeeper it may not wait long enough to find > the leader due to how long the potential leader waits to see replicas. > -- > > Key: SOLR-4158 > URL: https://issues.apache.org/jira/browse/SOLR-4158 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-4158.patch > > > Rather than waiting just 30 seconds, we must wait at least as long as the > current wait a potential leader does on cluster startup.
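The timing relation this fix enforces can be written down as a toy model (the constants below are hypothetical stand-ins, not Solr's real values): the time a registering core waits to find a leader must dominate the time a potential leader may itself spend waiting to see replicas, otherwise registration can give up too early.

```java
// Toy model of the SOLR-4158 timing fix. LEADER_VOTE_WAIT_MS is a
// hypothetical value for how long a potential leader waits on cluster
// startup; the registering core must wait at least that long.
public class LeaderWaitMath {
  static final long OLD_FLAT_WAIT_MS = 30_000;     // the old fixed 30 seconds
  static final long LEADER_VOTE_WAIT_MS = 180_000; // hypothetical startup wait

  /** How long a registering core should wait to see the leader. */
  static long registerWaitMs() {
    // at least the potential leader's own wait, plus a little slack
    return Math.max(OLD_FLAT_WAIT_MS, LEADER_VOTE_WAIT_MS + 5_000);
  }

  public static void main(String[] args) {
    System.out.println(registerWaitMs() > OLD_FLAT_WAIT_MS); // true
  }
}
```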
[jira] [Commented] (SOLR-4158) When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas.
[ https://issues.apache.org/jira/browse/SOLR-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527351#comment-13527351 ] Commit Tag Bot commented on SOLR-4158: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418819 SOLR-4158: When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas. > When a core is registering in ZooKeeper it may not wait long enough to find > the leader due to how long the potential leader waits to see replicas. > -- > > Key: SOLR-4158 > URL: https://issues.apache.org/jira/browse/SOLR-4158 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-4158.patch > > > Rather than waiting just 30 seconds, we must wait at least as long as the > current wait a potential leader does on cluster startup.
[jira] [Commented] (SOLR-4158) When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas.
[ https://issues.apache.org/jira/browse/SOLR-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527348#comment-13527348 ] Commit Tag Bot commented on SOLR-4158: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418818 SOLR-4158: When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas. > When a core is registering in ZooKeeper it may not wait long enough to find > the leader due to how long the potential leader waits to see replicas. > -- > > Key: SOLR-4158 > URL: https://issues.apache.org/jira/browse/SOLR-4158 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-4158.patch > > > Rather than waiting just 30 seconds, we must wait at least as long as the > current wait a potential leader does on cluster startup.
[jira] [Assigned] (SOLR-3948) Calculate/display deleted documents in admin interface
[ https://issues.apache.org/jira/browse/SOLR-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-3948: - Assignee: Mark Miller > Calculate/display deleted documents in admin interface > -- > > Key: SOLR-3948 > URL: https://issues.apache.org/jira/browse/SOLR-3948 > Project: Solr > Issue Type: Improvement > Components: web gui >Affects Versions: 4.0 >Reporter: Shawn Heisey >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1 > > Attachments: SOLR-3948.patch > > > The admin interface shows you two totals that let you infer how many deleted > documents exist in the index by subtracting Num Docs from Max Doc. It would > make things much easier for novice users and for automated statistics > gathering if the number of deleted documents were calculated for you and > displayed. > Last Modified: 3 minutes ago > Num Docs: 12924551 > Max Doc: 13011778 > Version: 862 > Segment Count: 23
[jira] [Assigned] (SOLR-2986) Warning missing for features that require stored uniqueKey - MoreLikeThis
[ https://issues.apache.org/jira/browse/SOLR-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-2986: - Assignee: Mark Miller > Warning missing for features that require stored uniqueKey - MoreLikeThis > - > > Key: SOLR-2986 > URL: https://issues.apache.org/jira/browse/SOLR-2986 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Affects Versions: 3.5, 4.0-ALPHA >Reporter: Shawn Heisey >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1 > > Attachments: SOLR-2986.patch > > > If your uniqueKey is not stored, you get this warning: > uniqueKey is not stored - distributed search will not work > There is at least one other feature that relies on a stored uniqueKey - > MoreLikeThis. Attaching a patch that updates the warning message. It may > actually require a more generic message. It's possible there are other > features that will not work without storing the uniqueKey.
[jira] [Commented] (SOLR-2986) Warning missing for features that require stored uniqueKey - MoreLikeThis
[ https://issues.apache.org/jira/browse/SOLR-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527347#comment-13527347 ] Mark Miller commented on SOLR-2986: --- I know you didn't change the log level, but shouldn't this actually be at the warning level? > Warning missing for features that require stored uniqueKey - MoreLikeThis > - > > Key: SOLR-2986 > URL: https://issues.apache.org/jira/browse/SOLR-2986 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Affects Versions: 3.5, 4.0-ALPHA >Reporter: Shawn Heisey >Priority: Minor > Fix For: 4.1 > > Attachments: SOLR-2986.patch > > > If your uniqueKey is not stored, you get this warning: > uniqueKey is not stored - distributed search will not work > There is at least one other feature that relies on a stored uniqueKey - > MoreLikeThis. Attaching a patch that updates the warning message. It may > actually require a more generic message. It's possible there are other > features that will not work without storing the uniqueKey.
Re: Solr patches for 4x - how to get committer attention?
On 12/8/2012 9:08 PM, Mark Miller wrote: Up to you, but I find pings in the JIRA issue best myself for the general case. But since you have started this thread, what is your list? If you have a pile, you might as well list them out and see if you can catch some eyes.

I do have a pile. When I started this list, I didn't realize it would get quite this large. I have been a lot busier on Jira than I realized! I am curious about whether my Jetty8 deployment should remain on the BIO connector or if it should be changed to NIO. There has been a lot of discussion about NIO in recent issues. Here are some issues and their descriptions that either I have filed myself or feel strongly that we need to include, broken down into categories:

Simple patch, fills a need, please consider immediate committing:

SOLR-2986  Warning missing for features that require stored uniqueKey - MoreLikeThis
SOLR-3918  Change the way -excl-slf4j targets work
SOLR-3948  Calculate/display deleted documents in admin interface
SOLR-4048  Add a "getRecursive" method to NamedList (I actually came up with this! Others have made it better. Is it good enough yet?)
SOLR-3393  Implement an optimized LFUCache (Patch submitted, but I'm not 100% certain that the way I solved it is acceptable.)
SOLR-3284  StreamingUpdateSolrServer swallows exceptions
SOLR-4143  setRequestHandler - option to not set qt parameter

Probably a real bug that needs attention:

SOLR-3923  eDismax: complex fielded query with parens is not recognized

I've always had a workaround, but I see the occasional mailing list help request:

SOLR-1920  Need generic placemarker for DIH delta-import

Would be nice to add/fix, but not a showstopper:

SOLR-1919  Time-based autowarm limits
SOLR-2182  Distributed PingRequestHandler
SOLR-2728  DIH status: "Total Documents Processed" field disappears
SOLR-2729  DIH status: successful zero-document delta-import missing "" field
SOLR-3319  Improve DataImportHandler status response
SOLR-3458  Allow multiple Items to stay open on Plugins-Page
SOLR-3950  Attempting postings="BloomFilter" results in UnsupportedOperationException
SOLR-3953  postingsFormat doesn't work on field, only on fieldType
SOLR-3954  Option to have updateHandler and DIH skip updateLog
SOLR-3969  Admin dashboard -- include cache/buffer memory utilization
SOLR-3982  No way to get current dataimport status from admin GUI
SOLR-3990  index size unavailable in gui/mbeans unless replication handler configured
SOLR-4053  metrics - add statistics on searcher/cache warming
SOLR-4132  Special log category for announcements, startup messages

Same as above, but probably not a well-isolated or trivial change:

SOLR-  Create an option that allows a query to be cached, but not used for warming

Not important for me, but might greatly help someone else:

SOLR-3958  Solr should log a warning when old healthcheck method configured
SOLR-3972  Missing admin-extra files result in SEVERE log entries with giant stacktrace

Becoming increasingly less important as time goes on:

SOLR-2204  Cross-version replication broken by new javabin format

Things that I have mostly forgotten about:

SOLR-2889  Implement Adaptive Replacement Cache

Some quick notes about issues not mentioned in the list above:

SOLR-788   SolrCloud has put new focus on this. Management really wants this.
SOLR-3979  will likely be fixed if SOLR-4129 gets committed.
SOLR-4135  is getting attention via SOLR-4117.
SOLR-4148  is known.
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527342#comment-13527342 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1418814 SOLR-2592: deleteByQuery routing support > Custom Hashing > -- > > Key: SOLR-2592 > URL: https://issues.apache.org/jira/browse/SOLR-2592 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Noble Paul > Fix For: 4.1 > > Attachments: dbq_fix.patch, pluggable_sharding.patch, > pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, > SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, > SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, > SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch > > > If the data in a cloud can be partitioned on some criteria (say range, hash, > attribute value etc) It will be easy to narrow down the search to a smaller > subset of shards and in effect can achieve more efficient search.
Re: Solr patches for 4x - how to get committer attention?
Up to you, but I find pings in the JIRA issue best myself for the general case. But since you have started this thread, what is your list? If you have a pile, you might as well list them out and see if you can catch some eyes. - Mark On Dec 8, 2012, at 1:18 PM, Shawn Heisey wrote: > I'm curious: What is the best way to ask for committer opinion on the > SOLR- patches that I have submitted and other issues that I care about? > One big email to dev@l.o where I talk about each issue? A series of emails > each asking about one issue? Jira comments asking for attention? > > I did try asking in #lucene, but it was very late in the evening (US > timezones) and this is a time of year when lots of people take holidays. > > Thanks, > Shawn
[jira] [Commented] (SOLR-4159) When we are starting a shard from rest, a leader should not consider its last published state when deciding if it can be the new leader.
[ https://issues.apache.org/jira/browse/SOLR-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527312#comment-13527312 ] Commit Tag Bot commented on SOLR-4159: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418795 SOLR-4159: CHANGES entry > When we are starting a shard from rest, a leader should not consider its > last published state when deciding if it can be the new leader. > - > > Key: SOLR-4159 > URL: https://issues.apache.org/jira/browse/SOLR-4159 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1, 5.0 > > > This makes it so that if a leader goes down before any replicas sync from it, > one of the replicas won't take over. But because we wait a while for known > replicas to come up and sync + pick the best leader, we should not need to be > so strict in this protection and let a replica take a stab at being the > leader - it may have been up to date and was only publishing through recovery > phases to find that out.
[jira] [Commented] (SOLR-4159) When we are starting a shard from rest, a leader should not consider its last published state when deciding if it can be the new leader.
[ https://issues.apache.org/jira/browse/SOLR-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527308#comment-13527308 ] Commit Tag Bot commented on SOLR-4159: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418793 SOLR-4159: When we are starting a shard from rest, a potential leader should not consider its last published state when deciding if it can be the new leader. > When we are starting a shard from rest, a leader should not consider its > last published state when deciding if it can be the new leader. > - > > Key: SOLR-4159 > URL: https://issues.apache.org/jira/browse/SOLR-4159 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1, 5.0 > > > This makes it so that if a leader goes down before any replicas sync from it, > one of the replicas won't take over. But because we wait a while for known > replicas to come up and sync + pick the best leader, we should not need to be > so strict in this protection and let a replica take a stab at being the > leader - it may have been up to date and was only publishing through recovery > phases to find that out.
[jira] [Commented] (SOLR-4159) When we are starting a shard from rest, a leader should not consider its last published state when deciding if it can be the new leader.
[ https://issues.apache.org/jira/browse/SOLR-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527303#comment-13527303 ] Commit Tag Bot commented on SOLR-4159: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418791 SOLR-4159: CHANGES entry > When we are starting a shard from rest, a leader should not consider its > last published state when deciding if it can be the new leader. > - > > Key: SOLR-4159 > URL: https://issues.apache.org/jira/browse/SOLR-4159 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1, 5.0 > > > This makes it so that if a leader goes down before any replicas sync from it, > one of the replicas won't take over. But because we wait a while for known > replicas to come up and sync + pick the best leader, we should not need to be > so strict in this protection and let a replica take a stab at being the > leader - it may have been up to date and was only publishing through recovery > phases to find that out.
[jira] [Commented] (SOLR-4159) When we are starting a shard from rest, a leader should not consider its last published state when deciding if it can be the new leader.
[ https://issues.apache.org/jira/browse/SOLR-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527301#comment-13527301 ] Commit Tag Bot commented on SOLR-4159: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418790 SOLR-4159: When we are starting a shard from rest, a potential leader should not consider its last published state when deciding if it can be the new leader. > When we are starting a shard from rest, a leader should not consider its > last published state when deciding if it can be the new leader. > - > > Key: SOLR-4159 > URL: https://issues.apache.org/jira/browse/SOLR-4159 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1, 5.0 > > > This makes it so that if a leader goes down before any replicas sync from it, > one of the replicas won't take over. But because we wait a while for known > replicas to come up and sync + pick the best leader, we should not need to be > so strict in this protection and let a replica take a stab at being the > leader - it may have been up to date and was only publishing through recovery > phases to find that out.
[jira] [Commented] (SOLR-3911) Make Directory and DirectoryFactory first class so that the majority of Solr's features work with any custom implementations.
[ https://issues.apache.org/jira/browse/SOLR-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527299#comment-13527299 ] Commit Tag Bot commented on SOLR-3911: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418789 SOLR-3911: sync properties files after write so that they are written out before the directory is closed. > Make Directory and DirectoryFactory first class so that the majority of > Solr's features work with any custom implementations. > - > > Key: SOLR-3911 > URL: https://issues.apache.org/jira/browse/SOLR-3911 > Project: Solr > Issue Type: Improvement >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-3911.patch, SOLR-3911.patch, SOLR-3911.patch > > > The biggest issue is that many parts of Solr rely on a local file system > based Directory implementation - most notably, replication. This should all > be changed to use the Directory and DirectoryFactory abstractions. > Other parts of the code that count on the local file system for making paths > and getting file sizes should also be changed to use Directory and/or > DirectoryFactory. > Original title: Replication should work with any Directory impl, not just > local filesystem based Directories. > I've wanted to do this for a long time - there is no reason replication > should not support any directory impl. This will let us use the mockdir for > replication tests rather than having to force an FSDir and lose all the extra > test checks and simulations. This will improve our testing around replication > a lot, and allow custom Directory impls to be used on multi node Solr. > Expanded scope - full first class support for DirectoryFactory and Directory. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
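The commit above syncs a properties file after writing it, so the bytes are durable before the directory is closed. A minimal sketch of that shape, assuming a made-up `Dir` abstraction (a simplified stand-in for Lucene's `Directory` API, not Solr's actual code):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Stand-in for Lucene's Directory API (hypothetical, simplified).
interface Dir {
    OutputStream createOutput(String name) throws IOException;
    void sync(String name) throws IOException; // make the named file durable
}

// In-memory Dir, just for illustration.
class RamDir implements Dir {
    final Map<String, ByteArrayOutputStream> files = new HashMap<>();
    final Set<String> synced = new HashSet<>();
    public OutputStream createOutput(String name) {
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        files.put(name, b);
        return b;
    }
    public void sync(String name) { synced.add(name); }
}

class PropertiesWriter {
    /** Write the properties, then sync so the bytes reach stable storage
     *  before the directory can be closed. */
    static void writeAndSync(Dir dir, String name, Properties props) {
        try (OutputStream out = dir.createOutput(name)) {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            props.store(w, null);
            w.flush(); // push writer-buffered bytes into the output before close
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            dir.sync(name); // without this, close() may race the flush to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```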
[jira] [Updated] (SOLR-4158) When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas.
[ https://issues.apache.org/jira/browse/SOLR-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4158: -- Attachment: SOLR-4158.patch Patch that has registering cores wait an appropriate time to see the leader pop up. > When a core is registering in ZooKeeper it may not wait long enough to find > the leader due to how long the potential leader waits to see replicas. > -- > > Key: SOLR-4158 > URL: https://issues.apache.org/jira/browse/SOLR-4158 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-4158.patch > > > Rather than waiting just 30 seconds, we must wait at least as long as the > current wait a potential leader does on cluster startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4159) When we are starting a shard from rest, a leader should not consider it's last published state when deciding if it can be the new leader.
Mark Miller created SOLR-4159: - Summary: When we are starting a shard from rest, a leader should not consider it's last published state when deciding if it can be the new leader. Key: SOLR-4159 URL: https://issues.apache.org/jira/browse/SOLR-4159 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.1, 5.0 This makes it so that if a leader goes down before any replicas sync from it, one of the replicas won't take over. But because we wait a while for known replicas to come up and sync + pick the best leader, we should not need to be so strict in this protection and let a replica take a stab at being the leader - it may have been up to date and was only publishing through recovery phases to find that out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4158) When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas.
[ https://issues.apache.org/jira/browse/SOLR-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527295#comment-13527295 ] Mark Miller commented on SOLR-4158: --- As reported by Alain Rogister on the mailing list. > When a core is registering in ZooKeeper it may not wait long enough to find > the leader due to how long the potential leader waits to see replicas. > -- > > Key: SOLR-4158 > URL: https://issues.apache.org/jira/browse/SOLR-4158 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > > Rather than waiting just 30 seconds, we must wait at least as long as the > current wait a potential leader does on cluster startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4158) When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas.
Mark Miller created SOLR-4158: - Summary: When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas. Key: SOLR-4158 URL: https://issues.apache.org/jira/browse/SOLR-4158 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.1, 5.0 Rather than waiting just 30 seconds, we must wait at least as long as the current wait a potential leader does on cluster startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
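The fix described in SOLR-4158 amounts to widening a timeout: a registering core should poll for the leader at least as long as a potential leader's own startup wait, not a flat 30 seconds. A hedged plain-Java sketch (names and numbers are illustrative, not Solr's code):

```java
import java.util.function.Supplier;

class LeaderWait {
    /** Poll until a leader appears or the deadline passes. The deadline is
     *  at least the potential leader's startup wait, never less than 30s. */
    static String waitForLeader(Supplier<String> lookupLeader,
                                long leaderStartupWaitMs, long pollIntervalMs) {
        long deadline = System.currentTimeMillis() + Math.max(30_000L, leaderStartupWaitMs);
        while (System.currentTimeMillis() <= deadline) {
            String leader = lookupLeader.get();
            if (leader != null) {
                return leader; // leader popped up in time
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new IllegalStateException("no leader found before deadline");
    }
}
```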
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527288#comment-13527288 ] Erick Erickson commented on SOLR-4030: -- Radim: At some point, you might reflect upon the fact that the common element in your conflicts with various Apache projects is...you. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
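The proposal above would surface Lucene's merge write throttling as a directory-factory argument. A sketch of how that might look in solrconfig.xml -- only the `maxMergeWriteMBPerSec` name comes from the issue description; the element layout is an assumption, not a committed Solr config:

```xml
<!-- Hypothetical: cap merge writes at 20 MB/sec. -->
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory">
  <double name="maxMergeWriteMBPerSec">20.0</double>
</directoryFactory>
```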
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527275#comment-13527275 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1418762 SOLR-2592: realtime-get support > Custom Hashing > -- > > Key: SOLR-2592 > URL: https://issues.apache.org/jira/browse/SOLR-2592 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Noble Paul > Fix For: 4.1 > > Attachments: dbq_fix.patch, pluggable_sharding.patch, > pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, > SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, > SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, > SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch > > > If the data in a cloud can be partitioned on some criteria (say range, hash, > attribute value etc) It will be easy to narrow down the search to a smaller > subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3911) Make Directory and DirectoryFactory first class so that the majority of Solr's features work with any custom implementations.
[ https://issues.apache.org/jira/browse/SOLR-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527267#comment-13527267 ] Commit Tag Bot commented on SOLR-3911: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418756 SOLR-3911: write out replication stats through Directory > Make Directory and DirectoryFactory first class so that the majority of > Solr's features work with any custom implementations. > - > > Key: SOLR-3911 > URL: https://issues.apache.org/jira/browse/SOLR-3911 > Project: Solr > Issue Type: Improvement >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-3911.patch, SOLR-3911.patch, SOLR-3911.patch > > > The biggest issue is that many parts of Solr rely on a local file system > based Directory implementation - most notably, replication. This should all > be changed to use the Directory and DirectoryFactory abstractions. > Other parts of the code that count on the local file system for making paths > and getting file sizes should also be changed to use Directory and/or > DirectoryFactory. > Original title: Replication should work with any Directory impl, not just > local filesystem based Directories. > I've wanted to do this for a long time - there is no reason replication > should not support any directory impl. This will let us use the mockdir for > replication tests rather than having to force an FSDir and lose all the extra > test checks and simulations. This will improve our testing around replication > a lot, and allow custom Directory impls to be used on multi node Solr. > Expanded scope - full first class support for DirectoryFactory and Directory. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3911) Make Directory and DirectoryFactory first class so that the majority of Solr's features work with any custom implementations.
[ https://issues.apache.org/jira/browse/SOLR-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527266#comment-13527266 ] Mark Miller commented on SOLR-3911: --- 'replication stats' I meant - committed. > Make Directory and DirectoryFactory first class so that the majority of > Solr's features work with any custom implementations. > - > > Key: SOLR-3911 > URL: https://issues.apache.org/jira/browse/SOLR-3911 > Project: Solr > Issue Type: Improvement >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-3911.patch, SOLR-3911.patch, SOLR-3911.patch > > > The biggest issue is that many parts of Solr rely on a local file system > based Directory implementation - most notably, replication. This should all > be changed to use the Directory and DirectoryFactory abstractions. > Other parts of the code that count on the local file system for making paths > and getting file sizes should also be changed to use Directory and/or > DirectoryFactory. > Original title: Replication should work with any Directory impl, not just > local filesystem based Directories. > I've wanted to do this for a long time - there is no reason replication > should not support any directory impl. This will let us use the mockdir for > replication tests rather than having to force an FSDir and lose all the extra > test checks and simulations. This will improve our testing around replication > a lot, and allow custom Directory impls to be used on multi node Solr. > Expanded scope - full first class support for DirectoryFactory and Directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4600) Facets should aggregate during collection, not at the end
[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4600: --- Attachment: LUCENE-4600.patch New patch, adding a hacked up CachedCountingFacetsCollector. All it does is first pre-load all payloads into a PackedBytes (just like DocValues), and then during aggregation, instead of pulling the byte[] from payloads it pulls it from this RAM cache. This results in an unexpectedly big speedup: {noformat}
Task       QPS base  StdDev    QPS comp  StdDev    Pct diff
HighTerm   0.53      (0.9%)    1.00      (2.5%)    87.3% ( 83% -  91%)
LowTerm    7.59      (0.6%)    26.75     (12.9%)   252.6% (237% - 267%)
MedTerm    3.35      (0.7%)    12.71     (9.0%)    279.8% (268% - 291%)
{noformat} The only "real" difference is that I'm pulling the byte[] from RAM instead of from payloads, ie I still pay the vInt+dgap decode cost per hit ... so it's surprising payloads add THAT MUCH overhead? (The test was "hot" so payloads were coming from OS's IO cache via MMapDir). I think the reason why HighTerm sees the least gains is because .advance is much less costly for it, since often the target is in the already-loaded block. I had separately previously tested the existing int[][][] cache (CategoryListCache) but it had smaller gains than this (73% for MedTerm), and it required more RAM (1.9 GB vs 377 MB RAM for this patch). Net/net I think we should offer an easy-to-use DV-backed facets impl... > Facets should aggregate during collection, not at the end > - > > Key: LUCENE-4600 > URL: https://issues.apache.org/jira/browse/LUCENE-4600 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Attachments: LUCENE-4600.patch, LUCENE-4600.patch > > > Today the facet module simply gathers all hits (as a bitset, optionally with > a float[] to hold scores as well, if you will aggregate them) during > collection, and then at the end when you call getFacetsResults(), it makes a > 2nd pass over all those hits doing the actual aggregation. 
> We should investigate just aggregating as we collect instead, so we don't > have to tie up transient RAM (fairly small for the bit set but possibly big > for the float[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
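The per-hit cost Mike mentions is the vInt+dgap decode of category ordinals from the cached byte[]. A self-contained sketch of that decode step (hypothetical, not the patch's code): counts are bumped per hit straight from a RAM-resident byte[], the way a DocValues-backed cache would serve it:

```java
class CachedFacetDecode {
    /** Decode delta-gapped vInts from the cached bytes and bump a count
     *  per category ordinal -- the aggregate-as-you-collect step. */
    static void countOrdinals(byte[] cached, int offset, int length, int[] counts) {
        int pos = offset;
        int end = offset + length;
        int ord = 0;
        while (pos < end) {
            int value = 0;
            int shift = 0;
            byte b;
            do { // variable-length int: 7 payload bits per byte, high bit = continue
                b = cached[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ord += value; // dgap: values are stored as deltas between sorted ordinals
            counts[ord]++;
        }
    }
}
```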
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527263#comment-13527263 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1418755 SOLR-2592: integration tests for routing > Custom Hashing > -- > > Key: SOLR-2592 > URL: https://issues.apache.org/jira/browse/SOLR-2592 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Noble Paul > Fix For: 4.1 > > Attachments: dbq_fix.patch, pluggable_sharding.patch, > pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, > SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, > SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, > SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch > > > If the data in a cloud can be partitioned on some criteria (say range, hash, > attribute value etc) It will be easy to narrow down the search to a smaller > subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
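The idea in SOLR-2592 -- partition on a hash so a request can be narrowed to a subset of shards -- can be illustrated with plain hash-range routing. `String.hashCode()` and the shard names below are stand-ins for whatever the real implementation uses:

```java
class HashRangeRouter {
    private final String[] shards;

    HashRangeRouter(String... shards) {
        this.shards = shards;
    }

    /** Each shard owns an equal slice of the unsigned 32-bit hash space,
     *  so a routing key always lands on exactly one shard. */
    String shardFor(String routeKey) {
        long unsigned = routeKey.hashCode() & 0xFFFFFFFFL;
        long sliceSize = (0xFFFFFFFFL / shards.length) + 1;
        return shards[(int) (unsigned / sliceSize)];
    }
}
```

A query that carries the routing key can then be sent to that one shard instead of being fanned out to all of them.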
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache
[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527262#comment-13527262 ] Shawn Heisey commented on SOLR-3393: Followup while going through all my old issues: I like Adrien's changes to my patch in general, but I still think a slow default decay (subtract 1 from each frequency) on autowarm is a good idea. In the interests of making sure it doesn't affect performance much, require a minimum time period to elapse before decaying again. IMHO my previous LFU implementation (the one that actually got committed) is total crap and this should just completely replace it. > Implement an optimized LFUCache > --- > > Key: SOLR-3393 > URL: https://issues.apache.org/jira/browse/SOLR-3393 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Shawn Heisey >Assignee: Hoss Man >Priority: Minor > Fix For: 4.1 > > Attachments: SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, > SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch > > > SOLR-2906 gave us an inefficient LFU cache modeled on > FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The > following project includes an Apache 2.0 licensed O(1) implementation. The > second link is the paper (PDF warning) it was based on: > https://github.com/chirino/hawtdb > http://dhruvbird.com/lfu.pdf > Using this project and paper, I will attempt to make a new O(1) cache called > FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the > existing LFUCache/ConcurrentLFUCache implementation in place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
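For reference, the O(1) scheme from the linked paper keeps keys in frequency buckets with a min-frequency pointer; the decay step Shawn proposes then becomes a bucket rebuild. A hedged sketch in that spirit (illustrative, not Solr's LFUCache API):

```java
import java.util.*;

class TinyLfu<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> freq = new HashMap<>();
    private final Map<Integer, LinkedHashSet<K>> buckets = new HashMap<>();
    private int minFreq = 0;

    TinyLfu(int capacity) { this.capacity = capacity; }

    V get(K key) {
        if (!values.containsKey(key)) return null;
        touch(key);
        return values.get(key);
    }

    void put(K key, V value) {
        if (capacity == 0) return;
        if (values.containsKey(key)) { values.put(key, value); touch(key); return; }
        if (values.size() >= capacity) {       // evict one least-frequent key, O(1)
            K victim = buckets.get(minFreq).iterator().next();
            buckets.get(minFreq).remove(victim);
            values.remove(victim);
            freq.remove(victim);
        }
        values.put(key, value);
        freq.put(key, 1);
        buckets.computeIfAbsent(1, f -> new LinkedHashSet<>()).add(key);
        minFreq = 1;
    }

    // Move the key up one frequency bucket.
    private void touch(K key) {
        int f = freq.get(key);
        buckets.get(f).remove(key);
        if (f == minFreq && buckets.get(f).isEmpty()) minFreq++;
        freq.put(key, f + 1);
        buckets.computeIfAbsent(f + 1, x -> new LinkedHashSet<>()).add(key);
    }

    /** Decay on autowarm: subtract 1 from every frequency (floor of 1). */
    void decay() {
        Map<Integer, LinkedHashSet<K>> rebuilt = new HashMap<>();
        for (Map.Entry<K, Integer> e : freq.entrySet()) {
            int f = Math.max(1, e.getValue() - 1);
            e.setValue(f);
            rebuilt.computeIfAbsent(f, x -> new LinkedHashSet<>()).add(e.getKey());
        }
        buckets.clear();
        buckets.putAll(rebuilt);
        minFreq = freq.isEmpty() ? 0 : Collections.min(freq.values());
    }
}
```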
[jira] [Updated] (LUCENE-4591) Make StoredFieldsFormat more configurable
[ https://issues.apache.org/jira/browse/LUCENE-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Delbru updated LUCENE-4591: -- Attachment: PerFieldStoredFieldsWriter.java PerFieldStoredFieldsReader.java PerFieldStoredFieldsFormat.java > Make StoredFieldsFormat more configurable > - > > Key: LUCENE-4591 > URL: https://issues.apache.org/jira/browse/LUCENE-4591 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 4.1 >Reporter: Renaud Delbru > Fix For: 4.1 > > Attachments: LUCENE-4591.patch, PerFieldStoredFieldsFormat.java, > PerFieldStoredFieldsReader.java, PerFieldStoredFieldsWriter.java > > > The current StoredFieldsFormat are implemented with the assumption that only > one type of StoredfieldsFormat is used by the index. > We would like to be able to configure a StoredFieldsFormat per field, > similarly to the PostingsFormat. > There is a few issues that need to be solved for allowing that: > 1) allowing to configure a segment suffix to the StoredFieldsFormat > 2) implement SPI interface in StoredFieldsFormat > 3) create a PerFieldStoredFieldsFormat > We are proposing to start first with 1) by modifying the signature of > StoredFieldsFormat#fieldsReader and StoredFieldsFormat#fieldsWriter so that > they use SegmentReadState and SegmentWriteState instead of the current set of > parameters. > Let us know what you think about this idea. If this is of interest, we can > contribute with a first path for 1). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4591) Make StoredFieldsFormat more configurable
[ https://issues.apache.org/jira/browse/LUCENE-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527261#comment-13527261 ] Renaud Delbru commented on LUCENE-4591: --- It is a similar approach that we followed (see attached files: PerFieldStoredFieldsFormat, PerFieldStoredFieldsWriter and PerFieldStoredFieldsReader). The issue is that our secondary StoredFieldsReader/Writer we are using is, for the moment, a wrapper around an instance of the CompressingStoredFieldsReader/Writer (using a wrapper approach was another way to extend CompressingStoredFieldsReader/Writer). The wrapper implements our encoding logic, and uses the underlying CompressingStoredFieldsWriter to write our data as a binary block. The problem with this approach is that since we can not configure the segment suffix of the CompressingStoredFieldsWriter, then the two StoredFieldsFormat try to write to files that have identical names. Since we are using a CompressingStoredFieldsReader/Writer as underlying mechanism to write the stored fields, why are we not using just one instance to store default lucene fields and our specific fields ? The reasons are: that it was more simple for our first implementation to leverage CompressingStoredFieldsReader/Writer (as a temporary solution); and that we would like to keep things (code and segment files) more isolated from each other. As said previously, we could simply copy-paste the compressing codec on our side to solve the problem, but I thought that maybe by raising the issue, we could have found a more appropriate solution. 
> Make StoredFieldsFormat more configurable > - > > Key: LUCENE-4591 > URL: https://issues.apache.org/jira/browse/LUCENE-4591 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 4.1 >Reporter: Renaud Delbru > Fix For: 4.1 > > Attachments: LUCENE-4591.patch > > > The current StoredFieldsFormat are implemented with the assumption that only > one type of StoredfieldsFormat is used by the index. > We would like to be able to configure a StoredFieldsFormat per field, > similarly to the PostingsFormat. > There is a few issues that need to be solved for allowing that: > 1) allowing to configure a segment suffix to the StoredFieldsFormat > 2) implement SPI interface in StoredFieldsFormat > 3) create a PerFieldStoredFieldsFormat > We are proposing to start first with 1) by modifying the signature of > StoredFieldsFormat#fieldsReader and StoredFieldsFormat#fieldsWriter so that > they use SegmentReadState and SegmentWriteState instead of the current set of > parameters. > Let us know what you think about this idea. If this is of interest, we can > contribute with a first path for 1). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
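The collision Renaud describes is concrete in the file-naming convention: without a per-format segment suffix, two stored-fields formats writing into the same segment compose identical file names. A toy sketch of that naming, mirroring the usual segment + suffix + extension pattern (the helper itself is made up):

```java
class SegmentFileNames {
    /** Compose a segment file name; an empty suffix means the format
     *  writes at the segment's default names. */
    static String fileName(String segment, String segmentSuffix, String ext) {
        if (segmentSuffix.isEmpty()) {
            return segment + "." + ext;
        }
        return segment + "_" + segmentSuffix + "." + ext;
    }
}
```

With no distinguishing suffix, both formats produce e.g. `_0.fdt`; giving each format its own suffix is what keeps their files apart.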
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527260#comment-13527260 ] Radim Kolar commented on SOLR-4030: --- Yes, I had a similar issue with the Nutch project. Markus Jelsma re-uploaded my patch to JIRA and refused to delete it. It is still in JIRA, violating my copyright. I decided not to escalate the conflict and go after the ASF and harm other projects just because of the misbehavior of one person. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: What happened to Solr test coverage
Not sure... I verified that a local "ant test" runs TestGroupingSearch, and that certainly exercises something like GroupingSpecification (which has 0 coverage in the link you show). I don't know much about the clover stuff... Is this because of failed tests? https://builds.apache.org/job/Lucene-Solr-Clover-4.x/72/#showFailuresLink -Yonik http://lucidworks.com On Sat, Dec 8, 2012 at 3:39 PM, Uwe Schindler wrote: > Sorry, wrong link: > https://builds.apache.org/job/Lucene-Solr-Clover-4.x/72/clover-report/pkg-summary.html > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Uwe Schindler [mailto:u...@thetaphi.de] >> Sent: Saturday, December 08, 2012 9:38 PM >> To: dev@lucene.apache.org >> Subject: What happened to Solr test coverage >> >> Hi all, >> >> what happened to the Solr test coverage, the overall Lucene/Solr test >> coverage was at 80% not long ago (before Robert disabled solr tests >> completely)? See https://builds.apache.org/job/Lucene-Solr-Clover- >> 4.x/72/clover-report/ >> Now, after reenabling both trunk and 4.x are down to 60%. Looking in the >> above link, most packages listed with no coverage are from Solr. E.g. >> solr/grouping has no coverage at all. What happened? >> >> Uwe >> >> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional >> commands, e-mail: dev-h...@lucene.apache.org > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
>> Thanks Uwe. Thanks Uwe. I actually planned on reverting that... then went on to fix something else in my code... then forgot about it and then fell asleep... I guess too many things at once. D. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: What happened to Solr test coverage
Sorry, wrong link: https://builds.apache.org/job/Lucene-Solr-Clover-4.x/72/clover-report/pkg-summary.html - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Saturday, December 08, 2012 9:38 PM > To: dev@lucene.apache.org > Subject: What happened to Solr test coverage > > Hi all, > > what happened to the Solr test coverage, the overall Lucene/Solr test > coverage was at 80% not long ago (before Robert disabled solr tests > completely)? See https://builds.apache.org/job/Lucene-Solr-Clover- > 4.x/72/clover-report/ > Now, after reenabling both trunk and 4.x are down to 60%. Looking in the > above link, most packages listed with no coverage are from Solr. E.g. > solr/grouping has no coverage at all. What happened? > > Uwe > > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
What happened to Solr test coverage
Hi all, what happened to the Solr test coverage, the overall Lucene/Solr test coverage was at 80% not long ago (before Robert disabled solr tests completely)? See https://builds.apache.org/job/Lucene-Solr-Clover-4.x/72/clover-report/ Now, after reenabling both trunk and 4.x are down to 60%. Looking in the above link, most packages listed with no coverage are from Solr. E.g. solr/grouping has no coverage at all. What happened? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-4041: Reporter: Lucene Developers (was: Radim Kolar) > Allow segment merge monitoring in Solr Admin gui > > > Key: SOLR-4041 > URL: https://issues.apache.org/jira/browse/SOLR-4041 > Project: Solr > Issue Type: Improvement > Components: web gui >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened SOLR-4041: - I am not sure why this was resolved? Reopening it, the issue is relevant. > Allow segment merge monitoring in Solr Admin gui > > > Key: SOLR-4041 > URL: https://issues.apache.org/jira/browse/SOLR-4041 > Project: Solr > Issue Type: Improvement > Components: web gui >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3911) Make Directory and DirectoryFactory first class so that the majority of Solr's features work with any custom implementations.
[ https://issues.apache.org/jira/browse/SOLR-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527249#comment-13527249 ] Mark Miller commented on SOLR-3911: --- I'll commit code to write the replication state through the directory in a moment. > Make Directory and DirectoryFactory first class so that the majority of > Solr's features work with any custom implementations. > - > > Key: SOLR-3911 > URL: https://issues.apache.org/jira/browse/SOLR-3911 > Project: Solr > Issue Type: Improvement >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > Attachments: SOLR-3911.patch, SOLR-3911.patch, SOLR-3911.patch > > > The biggest issue is that many parts of Solr rely on a local file system > based Directory implementation - most notably, replication. This should all > be changed to use the Directory and DirectoryFactory abstractions. > Other parts of the code that count on the local file system for making paths > and getting file sizes should also be changed to use Directory and/or > DirectoryFactory. > Original title: Replication should work with any Directory impl, not just > local filesystem based Directories. > I've wanted to do this for a long time - there is no reason replication > should not support any directory impl. This will let us use the mockdir for > replication tests rather than having to force an FSDir and lose all the extra > test checks and simulations. This will improve our testing around replication > a lot, and allow custom Directory impls to be used on multi node Solr. > Expanded scope - full first class support for DirectoryFactory and Directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527247#comment-13527247 ] Uwe Schindler commented on SOLR-4030: - bq. Contributions to our software are under the ASL by default But does any user who opens an issue know this? The Apache issue tracker is missing an extra page, linked from the create-issue / upload-patch pages, stating that all work done here is under the ASL. My own issue tracker presents the terms and conditions to the user when using the issue tracker. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527244#comment-13527244 ] Yonik Seeley commented on SOLR-4030: The checkbox was never necessary - it's just that people became used to seeing it and started assuming it was. Contributions to our software are under the ASL by default - one would need to explicitly state when adding something that looks like a contribution to our JIRA that it was in fact *not* a contribution (and that is what the old checkbox facilitated). > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527242#comment-13527242 ] Uwe Schindler commented on SOLR-4030: - @Yonik: I agree here, we should not overcomplicate it. Just ignore Radim (at least I will). My only problem is the missing checkbox in JIRA, where users had to state up front that they agree with the Apache License. With that in place, Radim could not go to court, or even consider it, because he would have agreed to publish the patch under open-source terms. The problem, as Jack now explains, is the case of external people using this issue tracker as a source for their own work. Some company may have used Radim's patch in their own product, and nobody informed them that he removed his patch here. In my opinion, you should not be able to remove patches in JIRA at all; instead, offer the option to say at a later stage: "I submitted the patch, but I need to undo that." This should be noted in the submit form, so anybody who is not sure whether a patch can be added here would not do it. If a patch really has to be deleted, only PMC members should be able to remove it completely (like in SVN, where you can revert but cannot completely remove the occurrence - only SVN admins can truly remove a commit). To stop Radim editing these issues or reopening/closing them, I set a new reporter. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4135) java.lang.IllegalArgumentException when getting index size in replication handler
[ https://issues.apache.org/jira/browse/SOLR-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527239#comment-13527239 ] Mark Miller commented on SOLR-4135: --- Hey Shawn, I think SOLR-4117 does indeed solve this for 5.X - however, doing this on 4.X is a bit more difficult. Because we are using a library to get the size, we would have to replace that call with our code. That's fine with me, but I'm hoping to merge back the 5.X directory work soon, now that it has hardened a fair amount. > java.lang.IllegalArgumentException when getting index size in replication > handler > - > > Key: SOLR-4135 > URL: https://issues.apache.org/jira/browse/SOLR-4135 > Project: Solr > Issue Type: Bug > Components: replication (java) >Affects Versions: 4.1 > Environment: Linux bigindy5 2.6.32-279.14.1.el6.centos.plus.x86_64 #1 > SMP Wed Nov 7 00:40:45 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux > java version "1.7.0_09" > Java(TM) SE Runtime Environment (build 1.7.0_09-b05) > Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode) > solr-impl4.1-SNAPSHOT 1415856M - ncindex - 2012-11-30 14:42:27 >Reporter: Shawn Heisey > Fix For: 4.1, 5.0 > > > Very similar to SOLR-4117, but happens even with branch_4x checkouts after > that patch was committed. I am not actually doing replication. This > exception happens on some calls to /admin/mbeans?stats=true that are made by > a SolrJ application. > ERROR - 2012-11-30 17:31:00.592; org.apache.solr.common.SolrException; > java.lang.IllegalArgumentException: > /index/solr4/cores/s5_0/../../data/s5_0/index/segments_9k does not exist > at org.apache.commons.io.FileUtils.sizeOf(FileUtils.java:2053) > at > org.apache.commons.io.FileUtils.sizeOfDirectory(FileUtils.java:2089) > at > org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:477) -- This message is automatically generated by JIRA. 
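The underlying race in SOLR-4135 is that a file (here segments_9k) can be deleted by a concurrent commit between the directory listing and the size call, and commons-io's FileUtils.sizeOf throws IllegalArgumentException on a missing file. A minimal sketch of the "replace that call with our code" idea Mark mentions, i.e. a size computation that tolerates files vanishing mid-scan (this is not the actual Solr fix; class and method names are made up):

```java
import java.io.File;

// Sketch: sum file sizes, tolerating files deleted concurrently
// (e.g. segments_N removed by a commit between listing and stat).
public class IndexSize {
    public static long sizeOf(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;      // directory itself vanished
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();      // File.length() returns 0 for a missing file
            } else if (f.isDirectory()) {
                total += sizeOf(f);
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        File tmp = java.nio.file.Files.createTempDirectory("idx").toFile();
        java.nio.file.Files.write(new File(tmp, "segments_1").toPath(), new byte[10]);
        long size = sizeOf(tmp);
        if (size != 10) throw new AssertionError("got " + size);
        System.out.println("size=" + size);
    }
}
```

Unlike FileUtils.sizeOfDirectory, nothing here throws when a listed file no longer exists by the time it is measured; the tally is simply slightly stale, which is acceptable for an admin stats display.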
[jira] [Updated] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-4030: Reporter: Lucene Developers (was: Unassigned Developer) > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Lucene Developers >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527238#comment-13527238 ] Yonik Seeley commented on SOLR-4030: Lets not over-think this or make it too complicated guys... No, you can't take back a contribution once it's been contributed. If/when it's been committed or not is just a detail. We can normally *choose* not to commit it (the distinction is pretty important), and I think that's what we should do here. If a contribution wasn't valid in the first place (i.e. someone saying... "hey, this person didn't have permission to contribute X") then we can figure that out on a case-by-case basis. Hasn't happened yet here AFAIK. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Unassigned Developer >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-4030: Reporter: Unassigned Developer (was: Radim Kolar) > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Unassigned Developer >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened SOLR-4030: - Keep issue open > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
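For reference, the feature under discussion is a single knob: exposing Lucene's merge-write rate limiting through the Solr directory factories. If implemented, the solrconfig.xml plumbing would presumably look something like the following (hypothetical sketch only: the parameter was never committed in this form, and the element layout is merely illustrative):

```xml
<!-- Hypothetical: the proposed maxMergeWriteMBPerSec argument on a
     directory factory. Not a released Solr configuration option. -->
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory">
  <double name="maxMergeWriteMBPerSec">20.0</double>
</directoryFactory>
```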
[jira] [Updated] (LUCENE-4600) Facets should aggregate during collection, not at the end
[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4600: --- Attachment: LUCENE-4600.patch Initial prototype patch ... I created a CountingFacetsCollector that aggregates per-segment, and it "hardwires" a dgap/vint decoding. I tested using luceneutil's date faceting and it gives decent speedups for TermQuery:
{noformat}
HighTerm    0.54 (2.7%)    0.63 (1.4%)    17.6% (13% - 22%)
LowTerm     7.69 (1.6%)    9.15 (2.1%)    18.9% (14% - 23%)
MedTerm     3.39 (1.2%)    4.48 (1.3%)    32.2% (29% - 35%)
{noformat}
> Facets should aggregate during collection, not at the end > - > > Key: LUCENE-4600 > URL: https://issues.apache.org/jira/browse/LUCENE-4600 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Attachments: LUCENE-4600.patch > > > Today the facet module simply gathers all hits (as a bitset, optionally with > a float[] to hold scores as well, if you will aggregate them) during > collection, and then at the end when you call getFacetsResults(), it makes a > 2nd pass over all those hits doing the actual aggregation. > We should investigate just aggregating as we collect instead, so we don't > have to tie up transient RAM (fairly small for the bit set but possibly big > for the float[]).
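The "dgap/vint" decoding that the prototype hardwires refers to the facet module's default category-list encoding: a document's category ordinals are sorted, delta-encoded as gaps, and each gap is written as a variable-length int. A self-contained sketch of that scheme (an illustration of the encoding, not the patch's actual code):

```java
import java.io.ByteArrayOutputStream;

// Sketch of dgap/vint coding: sorted ordinals -> gaps -> vInts, and back.
public class DGapVInt {
    // Write v as a vInt: low 7 bits first, high bit set while more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) { out.write((v & 0x7F) | 0x80); v >>>= 7; }
        out.write(v);
    }

    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) { writeVInt(out, ord - prev); prev = ord; }
        return out.toByteArray();
    }

    static int[] decode(byte[] bytes, int count) {
        int[] ords = new int[count];
        int pos = 0, prev = 0;
        for (int i = 0; i < count; i++) {
            int v = 0, shift = 0;
            byte b;
            do { b = bytes[pos++]; v |= (b & 0x7F) << shift; shift += 7; } while ((b & 0x80) != 0);
            prev += v;                // undo the delta-gap
            ords[i] = prev;
        }
        return ords;
    }

    public static void main(String[] args) {
        int[] ords = {3, 7, 4000, 4001};
        int[] back = decode(encode(ords), ords.length);
        if (!java.util.Arrays.equals(ords, back)) throw new AssertionError();
        System.out.println("roundtrip ok");
    }
}
```

A counting collector along these lines just walks the decoded ordinals per hit and increments a per-segment counts array, which is why skipping the intermediate bitset/float[] pass can pay off.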
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527232#comment-13527232 ] Jack Krupansky commented on SOLR-4030: -- bq. if a contributor wishes to withdraw his own work - especially if it's not yet committed to the codebase - we should simply allow them to do so. That makes sense, at least for a short interval - except for anyone who may have included the patch in their own fork, may even be running it in production, and is now in limbo or worse. In this specific instance the triviality and brief tenure of the patch make it moot, but for future instances we now have to think twice when recommending a non-committed patch. At a minimum, there needs to be a notice somewhere warning that use and ownership of uncommitted patches is a potentially questionable and risky activity - and that permission can be revoked at any moment. And if the "donate" checkbox can't be restored, then there needs to be some mechanism for a donor to explicitly cede ownership, to at least confirm the donation even if its legal status may vary from jurisdiction to jurisdiction. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527230#comment-13527230 ] Uwe Schindler commented on SOLR-4030: - Thanks Markus. So restoring patches in JIRA actually works with the help of infra, but that is useless here, as we would not use the patch in our source tree anyway. And he said it is trivial, so anybody with an interest in this functionality could write the code easily. So I see no problem; we just leave the issue open until somebody has the time to resolve it with a good patch. I think we should ignore Radim for any patches concerning Lucene and Solr; his social competence seems to be zero. I added a filter to my mail inbox. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527228#comment-13527228 ] Mark Miller commented on SOLR-4030: --- Apache has a pretty clear record in these cases, I think - if a contributor wishes to withdraw his own work, especially if it's not yet committed to the codebase, we should simply allow them to do so. I'm sure situations around this can get complicated, but this case does not seem complicated at all. These patches have only one author and they have not been committed yet. We are in the business of accepting patches from *willing* contributors. If someone wants to see these features implemented, I'd suggest writing new patches. Neither issue is very large. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527227#comment-13527227 ] Markus Jelsma commented on SOLR-4030: - We've had the same issues with him at Apache Nutch. After contacting the board it was decided to restore the original patch but not include it in the source tree, and ignore it further. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527224#comment-13527224 ] Uwe Schindler commented on SOLR-4030: - I will contact the board about what's going on here. Unfortunately, the custom JIRA plugin where you had to sign the ASF contribution by clicking the checkbox is no longer working with JIRA 5.2, now used by this server. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527223#comment-13527223 ] Radim Kolar commented on SOLR-4030: --- Unless you have written permission from us to distribute our work, you will lose a court case against us after we prove to the court that we are the authors of the code in question. We have never lost such a case. > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527222#comment-13527222 ] Shawn Heisey commented on SOLR-4117: Mark, it seems like this is also likely to resolve SOLR-4135. Can you look into that? I was seeing another issue in my log that Yonik thought might be tied to the disappearing file issue. If I continue to get that after this patch, I will open another issue. > IO error while trying to get the size of the Directory > -- > > Key: SOLR-4117 > URL: https://issues.apache.org/jira/browse/SOLR-4117 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 5.0 > Environment: 5.0.0.2012.11.28.10.42.06 > Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. >Reporter: Markus Jelsma >Assignee: Mark Miller >Priority: Minor > Fix For: 5.0 > > Attachments: SOLR-4117.patch > > > With SOLR-4032 fixed we see other issues when randomly taking down nodes > (nicely via tomcat restart) while indexing a few million web pages from > Hadoop. We do make sure that at least one node is up for a shard but due to > recovery issues it may not be live. > One node seems to work but generates IO errors in the log and > ZookeeperExeption in the GUI. 
In the GUI we only see: > {code} > SolrCore Initialization Failures > openindex_f: > org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: > > Please check your logs for more information > {code} > and in the log we only see the following exception: > {code} > 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - > [http-8080-exec-28] - : IO error while trying to get the size of the > Directory:org.apache.lucene.store.NoSuchDirectoryException: directory > '/opt/solr/cores/shard_f/data/index' does not exist > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217) > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240) > at > org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132) > at > org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146) > at > org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472) > at > org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568) > at > org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) > at > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) > at > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) > at > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744) > at > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code}
Re: [jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
Jack, yes re ownership. Mind opening an issue and reattaching? I saw this was done to 2 issues. Thanks! Otis -- SOLR Performance Monitoring - http://sematext.com/spm On Dec 8, 2012 11:31 AM, "Jack Krupansky (JIRA)" wrote: > > [ > https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527187#comment-13527187] > > Jack Krupansky commented on SOLR-4030: > -- > > bq. Now we fork > > Forking is understandable when you have time pressure and other interests > to satisfy. No problem there. Hopefully you can contribute back some of the > work from your fork. > > bq. taking our patches with us. > > How is that related to forking? I mean, sure, you can apply the patch on > your own fork, but why does forking imply that you think you need to delete > the posted patch? > > Besides, didn't you cede ownership of the patch to the community/ASF when > you posted it? So, technically, it is no longer yours, right? > > It sounds like you need a refresher course in "Community 101"! > > > > Use Lucene segment merge throttling > > --- > > > > Key: SOLR-4030 > > URL: https://issues.apache.org/jira/browse/SOLR-4030 > > Project: Solr > > Issue Type: Improvement > >Reporter: Radim Kolar > >Priority: Minor > > Labels: patch > > Fix For: 4.1, 5.0 > > > > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories. > > -- > This message is automatically generated by JIRA. > If you think it was sent incorrectly, please contact your JIRA > administrators > For more information on JIRA, see: http://www.atlassian.com/software/jira > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527221#comment-13527221 ] Commit Tag Bot commented on SOLR-4117: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418725 SOLR-4117: harden size of directory code > IO error while trying to get the size of the Directory > -- > > Key: SOLR-4117 > URL: https://issues.apache.org/jira/browse/SOLR-4117 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 5.0 > Environment: 5.0.0.2012.11.28.10.42.06 > Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. >Reporter: Markus Jelsma >Assignee: Mark Miller >Priority: Minor > Fix For: 5.0 > > Attachments: SOLR-4117.patch > > > With SOLR-4032 fixed we see other issues when randomly taking down nodes > (nicely via tomcat restart) while indexing a few million web pages from > Hadoop. We do make sure that at least one node is up for a shard but due to > recovery issues it may not be live. > One node seems to work but generates IO errors in the log and > ZookeeperExeption in the GUI. 
In the GUI we only see: > {code} > SolrCore Initialization Failures > openindex_f: > org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: > > Please check your logs for more information > {code} > and in the log we only see the following exception: > {code} > 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - > [http-8080-exec-28] - : IO error while trying to get the size of the > Directory:org.apache.lucene.store.NoSuchDirectoryException: directory > '/opt/solr/cores/shard_f/data/index' does not exist > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217) > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240) > at > org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132) > at > org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146) > at > org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472) > at > org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568) > at > org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) > at > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) > at > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) > at > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744) > at > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Solr patches for 4x - how to get committer attention?
I'm curious: What is the best way to ask for committer opinion on the SOLR- patches that I have submitted and other issues that I care about? One big email to dev@l.o where I talk about each issue? A series of emails each asking about one issue? Jira comments asking for attention? I did try asking in #lucene, but it was very late in the evening (US timezones) and this is a time of year when lots of people take holidays. Thanks, Shawn
[jira] [Commented] (LUCENE-4599) Compressed term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527215#comment-13527215 ] Robert Muir commented on LUCENE-4599: - I'm not sure how much more compact ords would really be. Start thinking about average word length, shared prefixes and so on, and long references (even though they could be delta-encoded since they are in order, I still imagine 3 or 4 bytes on average if you assume a large terms dict) don't seem to save a lot. I think it's way more important to bulk encode the prefix/suffix lengths. > Compressed term vectors > --- > > Key: LUCENE-4599 > URL: https://issues.apache.org/jira/browse/LUCENE-4599 > Project: Lucene - Core > Issue Type: Task > Components: core/codecs, core/termvectors >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Fix For: 4.1 > > Attachments: LUCENE-4599.patch > > > We should have codec-compressed term vectors similarly to what we have with > stored fields.
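Robert's back-of-the-envelope estimate can be illustrated with a toy sketch (hypothetical code, not Lucene's actual term-vector encoding): sort a document's term ords, delta-encode them, and write each gap as a variable-length int, so the cost per ord depends on the gap size rather than the absolute ord.

```java
import java.io.ByteArrayOutputStream;

// Toy delta + VInt encoding of sorted term ords (illustration only).
public class OrdDeltaEncoding {

    // Write v as a VInt: 7 payload bits per byte, high bit marks continuation.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            writeVInt(out, ord - prev); // encode gaps, not absolute ords
            prev = ord;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // With a large terms dict the gaps between one document's terms are
        // still large, so the average lands near 3 bytes per ord here.
        int[] ords = {1_250_000, 2_500_000, 2_500_017, 7_000_000};
        System.out.println(encode(ords).length + " bytes for " + ords.length + " ords");
    }
}
```

Whether this beats storing the term bytes at all depends on average term length and how well the shared prefixes compress, which is exactly the trade-off weighed above.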
[jira] [Updated] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4117: -- Attachment: SOLR-4117.patch Here is a patch that does 2 things: If we find the directory and see a file listed, but then get a file not found trying to access it (it was removed out from under us), just return a 0 size. Also, if we can't find the directory at all, try using the newIndexDir.
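The defensive behavior the patch describes can be sketched as follows (a minimal illustration under assumed names, not the actual DirectoryFactory.sizeOfDirectory code): return 0 when the directory is gone, and tolerate individual files being removed between listing and sizing.

```java
import java.io.File;

// Sketch of a directory-size computation that survives concurrent deletes.
public class SafeDirSize {

    static long sizeOf(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // directory doesn't exist (e.g. replaced by a new index dir)
        }
        long total = 0;
        for (File f : files) {
            // File.length() returns 0 for a file that vanished after listing,
            // so a concurrent deletion cannot throw here.
            total += f.isFile() ? f.length() : 0;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        File tmp = java.nio.file.Files.createTempDirectory("sizedemo").toFile();
        java.nio.file.Files.write(new File(tmp, "segment.bin").toPath(), new byte[128]);
        System.out.println(sizeOf(tmp));                   // 128
        System.out.println(sizeOf(new File(tmp, "gone"))); // 0
    }
}
```

The second part of the patch (falling back to the new index directory) would sit one level up, in the caller that decides which path to size.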
[jira] [Commented] (LUCENE-4599) Compressed term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527203#comment-13527203 ] David Smiley commented on LUCENE-4599: -- The ord reference approach seems most interesting to me, even if it's not workable at the moment (based on Mike's comment). If things were changed to make ords possible then there wouldn't even need to be any term information in term-vectors whatsoever; right? Not even the ord (integer) itself, because the array of each term vector is intrinsically in ord-order and aligned exactly to each ord; right? Does anyone know roughly what % of term-vector storage is currently for the term?
[jira] [Updated] (SOLR-4033) No lockType configured for NRTCachingDirectory
[ https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4033: -- Affects Version/s: (was: 4.0) 5.0 > No lockType configured for NRTCachingDirectory > -- > > Key: SOLR-4033 > URL: https://issues.apache.org/jira/browse/SOLR-4033 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 5.0 > Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 > 12:37:38 > Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. >Reporter: Markus Jelsma >Assignee: Mark Miller > Fix For: 5.0 > > > Please see: > http://lucene.472066.n3.nabble.com/No-lockType-configured-for-NRTCachingDirectory-td4017235.html
[jira] [Resolved] (SOLR-4033) No lockType configured for NRTCachingDirectory
[ https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4033. --- Resolution: Fixed Fix Version/s: (was: 4.1)
[jira] [Commented] (SOLR-4033) No lockType configured for NRTCachingDirectory
[ https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527201#comment-13527201 ] Mark Miller commented on SOLR-4033: --- Just committed a fix that uses the configured lock type everywhere, no matter what. Previously, when not dealing with an index there was a case or two that didn't use it (writing properties files), and some places did not pass it when it was known that a Directory was already created, so not passing it had no effect. Now we just consistently pass it everywhere - no warning messages, no worries about whether it is safe to not pass it here, etc.
[jira] [Commented] (SOLR-4033) No lockType configured for NRTCachingDirectory
[ https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527199#comment-13527199 ] Commit Tag Bot commented on SOLR-4033: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1418712 SOLR-4033: Consistently use the solrconfig.xml lockType everywhere.
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527194#comment-13527194 ] Mark Miller commented on SOLR-4117: --- Thanks Markus - another race around updating to the new index and looking for the size of the index. I'll fix this.
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527187#comment-13527187 ] Jack Krupansky commented on SOLR-4030: -- bq. Now we fork Forking is understandable when you have time pressure and other interests to satisfy. No problem there. Hopefully you can contribute back some of the work from your fork. bq. taking our patches with us. How is that related to forking? I mean, sure, you can apply the patch on your own fork, but why does forking imply that you think you need to delete the posted patch? Besides, didn't you cede ownership of the patch to the community/ASF when you posted it? So, technically, it is no longer yours, right? It sounds like you need a refresher course in "Community 101"! > Use Lucene segment merge throttling > --- > > Key: SOLR-4030 > URL: https://issues.apache.org/jira/browse/SOLR-4030 > Project: Solr > Issue Type: Improvement >Reporter: Radim Kolar >Priority: Minor > Labels: patch > Fix For: 4.1, 5.0 > > > add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527181#comment-13527181 ] Radim Kolar commented on SOLR-4030: --- You guys had 1 month of time and wasted it. Now we fork and take our patches with us. Case closed.
[jira] [Commented] (LUCENE-4599) Compressed term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527180#comment-13527180 ] Michael McCandless commented on LUCENE-4599: bq. Does it make sense to put this in an FST where the key is the term bytes and the value is what you're doing now for the positions, offsets, and payloads in a byte array? That's a neat idea :) We should [almost] just be able to use MemoryPostingsFormat, since it already stores all postings in an FST. bq. I think a FST would not compress as much as what LZ4 or Deflate can do? But maybe it could speed up TermsEnum.seekCeil on large documents so it might be an interesting idea regarding random access speed? Likely it would not compress as well, since LZ4/Deflate are able to share common infix fragments too, but FST only shares prefix/suffix. It'd be interesting to test ... but we should explore this (FST-backed TermVectorsFormat) in a new issue I think ... this issue seems awesome enough already :) bq. Or... can we simply reference the terms by ord (an int) instead of writing each term bytes? Using ords matching the main terms dict is a neat idea too! It would be much more compact ... but, when reading the term vectors we'd need to resolve-by-ord against the main terms dictionary (not all postings formats support that: it's optional, and eg our default PF doesn't), which would likely be slower than today. bq. Is that information available somewhere when writing/merging term vectors? Unfortunately, no. We only assign ords when it's time to flush the segment ... but we write term vectors "live" as we index each document. If we changed that, eg buffered up term vectors, then we could get the ords when we wrote them. 
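The prefix sharing discussed above (what an FST or the terms dictionary exploits, as opposed to the infix sharing LZ4/Deflate get) can be illustrated with simple front coding of a sorted term list; this is a toy sketch, not Lucene's actual term dictionary or term-vector format.

```java
import java.util.ArrayList;
import java.util.List;

// Front coding: store each sorted term as (shared-prefix length, suffix).
public class FrontCoding {

    static List<String> encode(List<String> sortedTerms) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int p = 0;
            int max = Math.min(prev.length(), term.length());
            while (p < max && prev.charAt(p) == term.charAt(p)) {
                p++;
            }
            out.add(p + "|" + term.substring(p)); // e.g. "5|ment" after "apart"
            prev = term;
        }
        return out;
    }

    public static void main(String[] args) {
        // Shared prefixes shrink; unrelated entries don't benefit at all.
        System.out.println(encode(List.of("apache", "apart", "apartment", "lucene")));
        // prints [0|apache, 3|rt, 5|ment, 0|lucene]
    }
}
```

A general-purpose compressor run over the raw term bytes could additionally reuse fragments that are neither prefixes nor suffixes, which is why an FST-style encoding would likely not compress as well, as noted above.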
[jira] [Commented] (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed
[ https://issues.apache.org/jira/browse/SOLR-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527178#comment-13527178 ] Fredrik Rodland commented on SOLR-1304: --- +1 - really need this. > Make it possible to force replication of at least some of the config files > even if the index hasn't changed > --- > > Key: SOLR-1304 > URL: https://issues.apache.org/jira/browse/SOLR-1304 > Project: Solr > Issue Type: Improvement > Components: replication (java) >Reporter: Otis Gospodnetic >Priority: Minor > Fix For: 4.1 > > > From http://markmail.org/thread/vpk2fsjns7u2uopd > Here is a use case: > * Index is mostly static (nightly updates) > * elevate.xml needs to be changed throughout the day > * elevate.xml needs to be pushed to slaves and solr needs to reload it > This is currently not possible because replication will happen only if the > index > changed in some way. You can't force a commit to fake index change. So one has > to either: > * add/delete dummy docs on master to force index change > * write an external script that copies the config file to slaves
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527176#comment-13527176 ] Michael McCandless commented on SOLR-4030: -- Radim, maybe you can put your patch back? Other users could apply it / improve it, someone else may add a test case, etc., and it will eventually be committed. The process takes time ... Until Solr catches up, users can also look at ElasticSearch ... it's had IO throttling for a while now ( https://github.com/elasticsearch/elasticsearch/issues/2041 )
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
On Sat, Dec 8, 2012 at 10:30 AM, Mark Miller wrote: > Just as an update/fyi - I went to do this - then realized it would be easier > to like change this check to an assert and put a todo for yonik to look > closer at it or something - but saw that Uwe simply removed the nocommit. > > So yonik can just take a look when he gets a chance. > > Thanks Uwe. Oops! The code block Dawid referenced is no longer needed - I'll finish cleaning up. -Yonik http://lucidworks.com
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
Everyone donates their time as they can. I'm glad you're donating yours to help look into some test fails! I've helped look into / fix a lot myself, and I'm as unfamiliar with many Solr areas as you are. I guess this is what happens when you take a bunch of tests for a larger test system that ran pretty well once a day serially and start running them in parallel and all the time on many machines over and over. And then add a bunch of distributed tests running lots of jetty instances, etc. A lot of the other committers didn't really sign up for that change - we really increased the challenge of writing Solr tests that never fail by like a factor of 20! Since I was part of making this change, I've been trying to help out where I can! I can see where some Lucene centric people would be like, what the hell, Lucene tests fail less - but I've been in both camps, and the situation is that it's easy to make Lucene tests for this heavy environment, and it's been a lot of work getting Solr tests solid. That's one of the joys of Lucene - no jetty, no dependencies, no heavy application infrastructure. But nonetheless, the Solr tests have been improving all the time. Special thanks to Dawid and others in helping make that happen! If you look at some other similar Apache projects that are not libraries and their test code, they would end up being just as difficult to harden to this level. IMO, it's been worth it though! I can run tests in 4 minutes instead of 40, and all the extra test runs find more bugs. So instead of complaining, I'll just keep pitching in to help when I find time. - Mark On Dec 8, 2012, at 7:30 AM, Robert Muir wrote: > My concern is actually that people are just filtering and not looking at test > failures. > > If thats the case, lets not waste valuable computing resources then. we can > do something else with them: such as additional coverage of lucene's test > suite. > > Otherwise, lets unfilter the fails and instead just fix them. 
The current > situation is very sad: > I fixed a RAM leak in a solr test earlier this week. The exception in the > failure basically told you exactly how to fix it. It was clear to me nobody > really even looked at this fail, it was so easy to fix. > I looked at the J9 bug, honestly i just did basic stuff (not knowing really > anything about solr distributed search). like download the JVM, run tests > with -Xint, and so on. Its like nobody really cares. > > Anyway I guess I'm just frustrated. I'd really like these tests to become > stable, and I'd like to help too, but I can barely even help because I don't > really know solr. Its going to take more people. > > > On Sat, Dec 8, 2012 at 9:41 AM, Mark Miller wrote: > > On Dec 8, 2012, at 5:40 AM, Robert Muir wrote: > > > makes it easier to navigate my gmail by far > > If that's your concern, ever hear of a filter :) > > - Mark > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
My concern is actually that people are just filtering and not looking at test failures. If that's the case, let's not waste valuable computing resources then. We can do something else with them, such as additional coverage of Lucene's test suite. Otherwise, let's unfilter the fails and instead just fix them. The current situation is very sad: I fixed a RAM leak in a solr test earlier this week. The exception in the failure basically told you exactly how to fix it. It was clear to me nobody really even looked at this fail, it was so easy to fix. I looked at the J9 bug, honestly I just did basic stuff (not knowing really anything about solr distributed search), like download the JVM, run tests with -Xint, and so on. It's like nobody really cares. Anyway I guess I'm just frustrated. I'd really like these tests to become stable, and I'd like to help too, but I can barely even help because I don't really know solr. It's going to take more people. On Sat, Dec 8, 2012 at 9:41 AM, Mark Miller wrote: > > On Dec 8, 2012, at 5:40 AM, Robert Muir wrote: > > > makes it easier to navigate my gmail by far > > If that's your concern, ever hear of a filter :) > > - Mark > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
Just as an update/FYI: I went to do this, then realized it would be easier to change this check to an assert and put a TODO for Yonik to look closer at it, but saw that Uwe had simply removed the nocommit. So Yonik can just take a look when he gets a chance. Thanks Uwe.

- Mark

On Dec 8, 2012, at 6:45 AM, Mark Miller wrote:
> On Dec 7, 2012, at 3:37 PM, Dawid Weiss wrote:
>> I think we should revert to make Jenkins happy; Yonik can re-apply later?
>
> +1 - makes sense to me.
>
> - Mark

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4599) Compressed term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527167#comment-13527167 ] Adrien Grand commented on LUCENE-4599: -- I think an FST would not compress as much as what LZ4 or Deflate can do? But maybe it could speed up TermsEnum.seekCeil on large documents, so it might be an interesting idea regarding random-access speed. bq. can we simply reference the terms by ord (an int) instead of writing each term's bytes? Do you mean their ords in the terms dictionary? Is that information available somewhere when writing/merging term vectors?

> Compressed term vectors
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
> Issue Type: Task
> Components: core/codecs, core/termvectors
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 4.1
> Attachments: LUCENE-4599.patch
>
> We should have codec-compressed term vectors, similarly to what we have with stored fields.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
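[Editor's note] The "reference terms by ord" idea discussed above can be sketched in a few lines. This is an illustrative toy, not Lucene's actual term vector format: the class and method names are hypothetical, and a real implementation would work on bytes and write to an IndexOutput. The point is only that once a shared sorted dictionary exists, each document's term vector shrinks to a list of small integers.

```java
import java.util.*;

// Illustrative sketch (hypothetical names, not Lucene's API): store term
// vector entries as ordinals into a shared sorted term dictionary instead
// of repeating each term's bytes per document.
public class OrdTermVectors {

    // Assign each unique term an ordinal in sorted (lexicographic) order,
    // mimicking ords in a terms dictionary.
    static Map<String, Integer> buildDict(List<List<String>> docs) {
        TreeSet<String> sorted = new TreeSet<>();
        for (List<String> doc : docs) {
            sorted.addAll(doc);
        }
        Map<String, Integer> ords = new LinkedHashMap<>();
        int ord = 0;
        for (String term : sorted) {
            ords.put(term, ord++);
        }
        return ords;
    }

    // A document's term vector becomes a small int[] instead of term bytes.
    static int[] encode(List<String> doc, Map<String, Integer> ords) {
        int[] out = new int[doc.size()];
        for (int i = 0; i < doc.size(); i++) {
            out[i] = ords.get(doc.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "term", "vector"),
                Arrays.asList("compressed", "term", "vector"));
        Map<String, Integer> dict = buildDict(docs);
        for (List<String> doc : docs) {
            // prints [1, 2, 3] then [0, 2, 3]: the bytes of the shared
            // terms "term" and "vector" are stored only once.
            System.out.println(Arrays.toString(encode(doc, dict)));
        }
    }
}
```

As the comment thread notes, the open question is whether the dictionary ords are actually available at term-vector writing/merging time; this sketch just assumes they are.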
[jira] [Commented] (LUCENE-4591) Make StoredFieldsFormat more configurable
[ https://issues.apache.org/jira/browse/LUCENE-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527160#comment-13527160 ] Adrien Grand commented on LUCENE-4591: -- I had a look at CompressingStoredFieldsWriter and I think that having a different encoding/compression strategy per field would deserve a different StoredFieldsFormat impl (this is a discussion we had in LUCENE-4226, but in that case I think we could open up CompressingStoredFieldsIndexWriter/Reader). However, I was thinking that if you don't mind adding one or two extra random seeks, maybe you could reuse it without extending it, like

{code}
class MyCustomStoredFieldsWriter extends StoredFieldsWriter {
  StoredFieldsWriter defaultSfw; // the default Lucene 4.1 stored fields writer

  @Override
  public void writeField(FieldInfo info, StorableField field) throws IOException {
    if (isStandard(field)) {
      defaultSfw.writeField(info, field);
    } else {
      // TODO: custom logic writing non-standard fields to another IndexOutput
    }
  }
}
{code}

and similarly for the reader

{code}
class MyCustomStoredFieldsReader extends StoredFieldsReader {
  StoredFieldsReader defaultSfr; // the default Lucene 4.1 stored fields reader

  @Override
  public void visitDocument(int n, StoredFieldVisitor visitor) throws IOException {
    // visit standard fields
    defaultSfr.visitDocument(n, visitor);
    // TODO: then visit specific fields
  }
}
{code}

Would it work for your use case?

> Make StoredFieldsFormat more configurable
> Key: LUCENE-4591
> URL: https://issues.apache.org/jira/browse/LUCENE-4591
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 4.1
> Reporter: Renaud Delbru
> Fix For: 4.1
> Attachments: LUCENE-4591.patch
>
> The current StoredFieldsFormat is implemented with the assumption that only one type of StoredFieldsFormat is used by the index. We would like to be able to configure a StoredFieldsFormat per field, similarly to the PostingsFormat.
> There are a few issues that need to be solved to allow that:
> 1) allowing to configure a segment suffix for the StoredFieldsFormat
> 2) implementing the SPI interface in StoredFieldsFormat
> 3) creating a PerFieldStoredFieldsFormat
> We are proposing to start with 1) by modifying the signatures of StoredFieldsFormat#fieldsReader and StoredFieldsFormat#fieldsWriter so that they use SegmentReadState and SegmentWriteState instead of the current set of parameters.
> Let us know what you think about this idea. If this is of interest, we can contribute a first patch for 1).
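[Editor's note] The delegation pattern proposed in the comment above, routing "standard" fields to the default writer and everything else to custom logic, can be demonstrated with a self-contained toy. All names here are hypothetical (plain interfaces standing in for Lucene's StoredFieldsWriter API), and the "writers" just collect strings, but the dispatch structure is the same.

```java
import java.util.*;

// Self-contained sketch of per-field delegation: a wrapper writer sends
// standard fields to a default implementation and non-standard fields to
// a custom path. Hypothetical names; not Lucene's actual API.
public class PerFieldDispatch {

    interface FieldsWriter {
        void writeField(String name, String value);
    }

    // Stand-in for the default stored fields writer.
    static class DefaultWriter implements FieldsWriter {
        final List<String> written = new ArrayList<>();
        public void writeField(String name, String value) {
            written.add(name + "=" + value);
        }
    }

    // Wrapper that reuses the default writer without extending it.
    static class CustomWriter implements FieldsWriter {
        final FieldsWriter fallback;
        final Set<String> customFields;
        final List<String> custom = new ArrayList<>();

        CustomWriter(FieldsWriter fallback, Set<String> customFields) {
            this.fallback = fallback;
            this.customFields = customFields;
        }

        public void writeField(String name, String value) {
            if (customFields.contains(name)) {
                custom.add(name + "=" + value);   // custom encoding path
            } else {
                fallback.writeField(name, value); // default path
            }
        }
    }

    public static void main(String[] args) {
        DefaultWriter def = new DefaultWriter();
        CustomWriter w = new CustomWriter(def, new HashSet<>(Arrays.asList("geo")));
        w.writeField("title", "hello");
        w.writeField("geo", "48.8,2.3");
        System.out.println(def.written); // prints [title=hello]
        System.out.println(w.custom);    // prints [geo=48.8,2.3]
    }
}
```

The trade-off Adrien mentions still applies: because standard and custom fields live in separate outputs, reading one document may cost one or two extra random seeks.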
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
On Dec 7, 2012, at 3:37 PM, Dawid Weiss wrote:
> I think we should revert to make Jenkins happy; Yonik can re-apply later?

+1 - makes sense to me.

- Mark
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
On Dec 8, 2012, at 5:40 AM, Robert Muir wrote:
> makes it easier to navigate my gmail by far

If that's your concern, ever hear of a filter :)

- Mark
Re: [JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15670 - Still Failing
Are people even looking at build failures? If not, let's just disable the Solr tests and reduce the noise again. Makes it easier to navigate my gmail by far.

On Fri, Dec 7, 2012 at 6:37 PM, Dawid Weiss wrote:
> I think Yonik added a nocommit --
>
> Author: Yonik Seeley 2012-12-07 15:47:34
> Committer: Yonik Seeley 2012-12-07 15:47:34
>
> ...
> +if (lowerBound > upperBound) {
> +  // nocommit
> +  throw new RuntimeException("WHAAAT?");
> +}
>
> I think we should revert to make Jenkins happy; Yonik can re-apply later?
>
> Dawid
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527135#comment-13527135 ] Radim Kolar commented on SOLR-4030: --- It was resolved for me. If you need this, open your own ticket.

> Use Lucene segment merge throttling
> Key: SOLR-4030
> URL: https://issues.apache.org/jira/browse/SOLR-4030
> Project: Solr
> Issue Type: Improvement
> Reporter: Radim Kolar
> Priority: Minor
> Labels: patch
> Fix For: 4.1, 5.0
>
> Add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[jira] [Commented] (LUCENE-4592) Fix Formula in Javadocs of NumericRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527120#comment-13527120 ] Commit Tag Bot commented on LUCENE-4592: [branch_4x commit] Uwe Schindler http://svn.apache.org/viewvc?view=revision&revision=1418653 LUCENE-4592: Improve Javadocs of NumericRangeQuery

> Fix Formula in Javadocs of NumericRangeQuery
> Key: LUCENE-4592
> URL: https://issues.apache.org/jira/browse/LUCENE-4592
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/javadocs
> Affects Versions: 4.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 4.1, 5.0
> Attachments: LUCENE-4592.patch, nrq-formula-1.png, nrq-formula-2.png
>
> The formula in the Javadocs of NumericRangeQuery that returns the maximum number of terms in the NRQ's TermsEnum is confusing the user. I will fix the documentation and also add some more numbers to the docs of NumericField (e.g. the average number of terms in the index).
[jira] [Resolved] (LUCENE-4592) Fix Formula in Javadocs of NumericRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-4592. --- Resolution: Fixed

Committed trunk revision: 1418652, 4.x revision: 1418653

> Fix Formula in Javadocs of NumericRangeQuery
> Key: LUCENE-4592
> URL: https://issues.apache.org/jira/browse/LUCENE-4592
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/javadocs
> Affects Versions: 4.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 4.1, 5.0
> Attachments: LUCENE-4592.patch, nrq-formula-1.png, nrq-formula-2.png
>
> The formula in the Javadocs of NumericRangeQuery that returns the maximum number of terms in the NRQ's TermsEnum is confusing the user. I will fix the documentation and also add some more numbers to the docs of NumericField (e.g. the average number of terms in the index).
[jira] [Commented] (LUCENE-4592) Fix Formula in Javadocs of NumericRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527118#comment-13527118 ] Commit Tag Bot commented on LUCENE-4592: [trunk commit] Uwe Schindler http://svn.apache.org/viewvc?view=revision&revision=1418652 LUCENE-4592: Improve Javadocs of NumericRangeQuery

> Fix Formula in Javadocs of NumericRangeQuery
> Key: LUCENE-4592
> URL: https://issues.apache.org/jira/browse/LUCENE-4592
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/javadocs
> Affects Versions: 4.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 4.1, 5.0
> Attachments: LUCENE-4592.patch, nrq-formula-1.png, nrq-formula-2.png
>
> The formula in the Javadocs of NumericRangeQuery that returns the maximum number of terms in the NRQ's TermsEnum is confusing the user. I will fix the documentation and also add some more numbers to the docs of NumericField (e.g. the average number of terms in the index).
[jira] [Updated] (LUCENE-4592) Fix Formula in Javadocs of NumericRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4592: -- Attachment: nrq-formula-2.png, nrq-formula-1.png, LUCENE-4592.patch

Here are the improvements. The (rather complex) formulas are now typeset with LaTeX, using the page http://1.618034.com/latex.php to produce PNGs out of them. The LaTeX source code is in the img tag's alt attribute, rendered at 110 dpi. I will commit this soon.

> Fix Formula in Javadocs of NumericRangeQuery
> Key: LUCENE-4592
> URL: https://issues.apache.org/jira/browse/LUCENE-4592
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/javadocs
> Affects Versions: 4.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 4.1, 5.0
> Attachments: LUCENE-4592.patch, nrq-formula-1.png, nrq-formula-2.png
>
> The formula in the Javadocs of NumericRangeQuery that returns the maximum number of terms in the NRQ's TermsEnum is confusing the user. I will fix the documentation and also add some more numbers to the docs of NumericField (e.g. the average number of terms in the index).
[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527113#comment-13527113 ] Markus Jelsma commented on SOLR-4030: - Why did you mark this as `not a problem`? This issue is not resolved.

> Use Lucene segment merge throttling
> Key: SOLR-4030
> URL: https://issues.apache.org/jira/browse/SOLR-4030
> Project: Solr
> Issue Type: Improvement
> Reporter: Radim Kolar
> Priority: Minor
> Labels: patch
> Fix For: 4.1, 5.0
>
> Add argument "maxMergeWriteMBPerSec" to Solr directory factories.
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_09) - Build # 2096 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/2096/
Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 13636 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:88: The following files contain @author tags, tabs or nocommits:
* solr/solrj/src/java/org/apache/solr/common/cloud/CompositeIdRouter.java

Total time: 53 minutes 12 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC
Email was triggered for: Failure
Sending email for trigger: Failure
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_37) - Build # 3116 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3116/
Java: 32bit/jdk1.6.0_37 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 12953 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:88: The following files contain @author tags, tabs or nocommits:
* solr/solrj/src/java/org/apache/solr/common/cloud/CompositeIdRouter.java

Total time: 30 minutes 0 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_37 -server -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure