[jira] Updated: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1644:
    Attachment: SOLR-1644.patch

Implemented as Uri said.

Provide a clean way to keep flags and helper objects in ResponseBuilder
    Key: SOLR-1644
    URL: https://issues.apache.org/jira/browse/SOLR-1644
    Project: Solr
    Issue Type: Improvement
    Components: search
    Reporter: Shalin Shekhar Mangar
    Assignee: Shalin Shekhar Mangar
    Fix For: 1.5
    Attachments: SOLR-1644.patch, SOLR-1644.patch

Many components such as StatsComponent, FacetComponent, etc. keep flags and helper objects in ResponseBuilder. Having to modify ResponseBuilder for such things is a very kludgy solution. Let us provide a clean way for components to keep arbitrary objects for the duration of a (distributed) search request.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
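The request-scoped storage the issue asks for can be sketched as a map keyed by component. This is a hypothetical illustration of the idea only, not the attached SOLR-1644 patch; the class and method names below are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a request-scoped map on a ResponseBuilder-like object
// lets each component stash its own flags and helpers for the duration of one
// (distributed) request, without adding new fields to the shared class.
class RequestContext {
    private final Map<Object, Object> context = new HashMap<>();

    // Store a helper object for the duration of one request.
    public void put(Object key, Object value) {
        context.put(key, value);
    }

    @SuppressWarnings("unchecked")
    public <T> T get(Object key) {
        return (T) context.get(key);
    }
}

public class RequestContextDemo {
    public static void main(String[] args) {
        RequestContext rb = new RequestContext();
        // A component can key its state by its own class, avoiding collisions.
        rb.put(RequestContextDemo.class, "facet-in-progress");
        String flag = rb.get(RequestContextDemo.class);
        System.out.println(flag);
    }
}
```

Keying by the component's own class is one cheap way to keep component state private without a shared naming convention.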
[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1630:
    Attachment: SOLR-1630.patch

I'm not able to reproduce this issue. I used Robin's document, schema and solrconfig.xml in the form of a unit test, and it gives an empty spell check response but no exceptions.

StringIndexOutOfBoundsException in SpellCheckComponent
    Key: SOLR-1630
    URL: https://issues.apache.org/jira/browse/SOLR-1630
    Project: Solr
    Issue Type: Bug
    Components: Schema and Analysis, spellchecker
    Affects Versions: 1.4
    Environment: Solr 1.4, Lucene 2.9.1, Win XP, java version 1.6.0_14
    Reporter: Robin Wojciki
    Assignee: Shalin Shekhar Mangar
    Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml

For some documents/search strings, the SpellCheckComponent throws StringIndexOutOfBoundsException. See: http://www.lucidimagination.com/search/document/3be6555227e031fc/

h2. Replication
* Save attached schema.xml and solrconfig.xml in apache-solr-1.4.0/example/solr/conf
* Start Solr
* Index attached bug.xml
* Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]

It throws a StringIndexOutOfBoundsException:
{noformat}
String index out of range: -7
java.lang.StringIndexOutOfBoundsException: String index out of range: -7
    at java.lang.AbstractStringBuilder.replace(Unknown Source)
    at java.lang.StringBuilder.replace(Unknown Source)
    at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
{noformat}
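The top frame of the trace can be reproduced in isolation: StringBuilder.replace throws StringIndexOutOfBoundsException whenever the start offset is negative, which is the "-7" in the report. The offsets below are made up for illustration; the real component derives them from token positions in the query string:

```java
// Illustrative only: a start offset that has gone negative, e.g. -7, triggers
// the same exception seen in the SpellCheckComponent stack trace above.
public class ReplaceBoundsDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("awehjse-wjkekw");
        try {
            sb.replace(-7, 0, "correction"); // negative start index
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```

This suggests the bug is in the offset arithmetic feeding toNamedList(), not in StringBuilder itself.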
[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791325#action_12791325 ]

Guillaume Lebourgeois commented on SOLR-1630:

OK, I'm going to try to upload my own config in case it can help.
[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guillaume Lebourgeois updated SOLR-1630:
    Attachment: spellcheckconfig.xml

This file provides a spellcheck configuration and a request handler which may raise an exception when making queries.

Examples of queries which work fine:
* ?q=test
* ?q=my+name+is+henry
* ?q=éléphant

Examples of queries which throw an exception:
* ?q=sous-marin
* ?q=sous-marin+russe
* ?q=sous_marin
* ?q=éléphant-blanc

It may be linked to the content of the index and/or the spellcheck index.
[jira] Issue Comment Edited: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791334#action_12791334 ]

Guillaume Lebourgeois edited comment on SOLR-1630 at 12/16/09 11:38 AM:

This file provides a spellcheck configuration and a request handler which may raise an exception when making queries.

Examples of queries which work fine:
* ?q=test
* ?q=my+name+is+henry
* ?q=éléphant

Examples of queries which throw an exception:
* ?q=sous-marin
* ?q=sous-marin+russe
* ?q=sous_marin
* ?q=éléphant-blanc

It may be linked to the content of the index and/or the spellcheck index. Here is the stack:
{noformat}
at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
{noformat}
[jira] Created: (SOLR-1661) Remove adminCore from CoreContainer
Remove adminCore from CoreContainer
    Key: SOLR-1661
    URL: https://issues.apache.org/jira/browse/SOLR-1661
    Project: Solr
    Issue Type: Task
    Reporter: Noble Paul
    Assignee: Noble Paul
    Priority: Minor
    Fix For: 1.5

We have deprecated the admin core concept as part of SOLR-1121. It can be removed completely now.
[jira] Updated: (SOLR-1647) Remove the option of setting solrconfig from web.xml
[ https://issues.apache.org/jira/browse/SOLR-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1647:
    Attachment: SOLR-1647.patch

Remove the option of setting solrconfig from web.xml
    Key: SOLR-1647
    URL: https://issues.apache.org/jira/browse/SOLR-1647
    Project: Solr
    Issue Type: Improvement
    Reporter: Noble Paul
    Assignee: Noble Paul
    Fix For: 1.5
    Attachments: SOLR-1647.patch

With SOLR-1621, it is no longer necessary to have an option to set solrconfig from web.xml. Moreover, editing web.xml means hacking Solr itself.
[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791342#action_12791342 ]

Shalin Shekhar Mangar commented on SOLR-1630:

Thanks Guillaume. Can you give me an example document too?
[jira] Updated: (SOLR-1661) Remove adminCore from CoreContainer
[ https://issues.apache.org/jira/browse/SOLR-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1661:
    Attachment: SOLR-1661.patch
[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791360#action_12791360 ]

Guillaume Lebourgeois commented on SOLR-1630:

I've been trying to reproduce the bug with a one-document index, but I fail. On the other hand, on an index of 500k+ documents this issue is automatic. Maybe it's linked with some kinds of documents? I don't know; I'm going to test some other possibilities in case it can help.
[jira] Created: (SOLR-1662) BufferedTokenStream incorrect cloning
BufferedTokenStream incorrect cloning
    Key: SOLR-1662
    URL: https://issues.apache.org/jira/browse/SOLR-1662
    Project: Solr
    Issue Type: Bug
    Components: Schema and Analysis
    Affects Versions: 1.4
    Reporter: Robert Muir

As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase) to use the new TokenStream API, but also with some additional safety.

{code}
public static String tsToString(TokenStream in) throws IOException {
  StringBuilder out = new StringBuilder();
  TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
  // extra safety to enforce that the state is not preserved and also
  // assign bogus values
  in.clearAttributes();
  termAtt.setTermBuffer(bogusTerm);
  while (in.incrementToken()) {
    if (out.length() > 0)
      out.append(' ');
    out.append(termAtt.term());
    in.clearAttributes();
    termAtt.setTermBuffer(bogusTerm);
  }
  in.close();
  return out.toString();
}
{code}

Setting the term text to bogus values helps find bugs in token streams that do not clear or clone properly. In this case there is a problem with a token stream, AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not clone, so the values get overwritten. This can be fixed in two ways:
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?
[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning
[ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791370#action_12791370 ]

Uwe Schindler commented on SOLR-1662:

Just the short description from the API side in Lucene. Lucene's documentation of TokenStream.next() says: "The returned Token is a full private copy (not re-used across calls to next())." AB_AAB_Stream.process() duplicates the token by just putting it uncloned into the outQueue. As the consumer of the BufferedTokenStream assumes that the Token is private, it is allowed to change it - and by that it also changes the token in the outQueue. If you e.g. put another TokenFilter in front of this AB_AAB_Stream and modify the token there, it would break.

In my opinion, the responsibility to clone is in AB_AAB_Stream; BufferedTokenStream will never return the same token twice by itself. So it's a bug in the test. But Robert told me that e.g. RemoveDuplicates has a similar problem. The general contract for writing such streams is: whenever you return a Token from next(), never put it somewhere else uncloned, because the caller can change it. The fix is to do: write((Token) t.clone());
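The aliasing Uwe describes can be shown in miniature with a mutable, re-used token buffer. The classes below are made-up stand-ins, not the real Lucene Token API, but the failure mode is the same: queueing a token without cloning means later mutations rewrite the queued copy.

```java
// Hypothetical miniature of the bug: "MiniToken" is a mutable buffer the
// producer re-uses. Queueing it uncloned (the BUG line) lets a later set()
// silently rewrite the queued value; queueing a clone (the FIX line) does not.
class MiniToken implements Cloneable {
    StringBuilder term = new StringBuilder();

    MiniToken set(String s) {
        term.setLength(0);
        term.append(s);
        return this;
    }

    @Override
    public MiniToken clone() {
        MiniToken t = new MiniToken();
        t.term.append(term);
        return t;
    }
}

public class CloneContractDemo {
    public static void main(String[] args) {
        java.util.Deque<MiniToken> outQueue = new java.util.ArrayDeque<>();
        MiniToken shared = new MiniToken().set("A");

        outQueue.add(shared);          // BUG: queued without cloning
        outQueue.add(shared.clone());  // FIX: queue a private copy

        shared.set("B");               // the producer re-uses the buffer

        System.out.println(outQueue.poll().term); // prints B (overwritten!)
        System.out.println(outQueue.poll().term); // prints A (safe copy)
    }
}
```

This is why write((Token) t.clone()) fixes the stream: the queued copy becomes private, so the caller is free to mutate the returned token.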
[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning
[ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791374#action_12791374 ]

Robert Muir commented on SOLR-1662:

bq. But Robert told me that e.g. RemoveDuplicates has a similar problem.

Right, there is no cloning in RemoveDuplicates. CommonGrams creates a new Token() when it grams, but it's not clear that this one is correct either. So if we decide it's the responsibility of the subclass, these implementations need thorough tests to see whether they are OK or not. If we add the cloning to BufferedTokenStream itself, then we know they are OK.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791414#action_12791414 ]

Yonik Seeley commented on SOLR-1131:

I'm spot-checking multiple different patches at this point... but in general, we should strive not to expose the complexity further up the type hierarchy, and we should not limit what subclasses can do.
* isPolyField() returns true if more than one Fieldable *can* be returned from createFields()
* createFields() is free to return whatever the heck it likes.

And from SchemaField and FieldType's perspective, that's it. Implementation details are up to subclasses, and we shouldn't add assumptions in base classes. There should be *no* concept of subFieldTypes or whatever baked into anything. So, from Noble's patch: we shouldn't try caching subfields in SchemaField... and especially not via if (type instanceof DelegatingFieldType)... it really doesn't belong there.

Allow a single field type to index multiple fields
    Key: SOLR-1131
    URL: https://issues.apache.org/jira/browse/SOLR-1131
    Project: Solr
    Issue Type: New Feature
    Components: Schema and Analysis
    Reporter: Ryan McKinley
    Assignee: Grant Ingersoll
    Fix For: 1.5
    Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch

In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (Lucene Field). Consider SOLR-773. The concept "point" may be best indexed in a variety of ways:
* geohash (single Lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
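The minimal contract Yonik describes can be sketched with stand-in types. Everything below is simplified and hypothetical (these are not the real Solr/Lucene classes): the base type only promises that createFields() *can* return more than one field when isPolyField() is true; how a subclass produces them is its own business.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FieldType: the base class exposes only the two
// things callers may rely on, with no baked-in notion of sub-field types.
abstract class SketchFieldType {
    // true if createFields() *can* return more than one field
    boolean isPolyField() {
        return false;
    }

    // subclasses return whatever fields they like
    abstract List<String> createFields(String name, String value);
}

// A point-like subclass splits "lat,lon" into two fields; the naming scheme
// here ("_0"/"_1") is an invented implementation detail of this subclass.
class PointType extends SketchFieldType {
    @Override
    boolean isPolyField() {
        return true;
    }

    @Override
    List<String> createFields(String name, String value) {
        String[] latLon = value.split(",");
        return Arrays.asList(name + "_0=" + latLon[0], name + "_1=" + latLon[1]);
    }
}

public class PolyFieldDemo {
    public static void main(String[] args) {
        SketchFieldType t = new PointType();
        System.out.println(t.isPolyField());
        System.out.println(t.createFields("point", "37.7,-122.4"));
    }
}
```

Note that callers never see how PointType splits the value; that opacity is exactly the point of keeping the base-class contract minimal.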
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791426#action_12791426 ]

Mark Miller commented on SOLR-1277:

I wonder how we might track load. Currently, wouldn't we have to grab every request handler, add up the requests, and track the change over a given period of time? Would it make sense to add tracking of total requests received (across handlers), so we don't have to keep polling each and every request handler?

Implement a Solr specific naming service (using Zookeeper)
    Key: SOLR-1277
    URL: https://issues.apache.org/jira/browse/SOLR-1277
    Project: Solr
    Issue Type: New Feature
    Affects Versions: 1.4
    Reporter: Jason Rutherglen
    Assignee: Grant Ingersoll
    Priority: Minor
    Fix For: 1.5
    Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
    Original Estimate: 672h
    Remaining Estimate: 672h

The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and go from there.

Features:
* Automatic failover (i.e. when a server fails, clients stop trying to index to or search it)
* Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster)
* Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult, but not impossible.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791425#action_12791425 ]

Mark Miller commented on SOLR-1277:

So based on what we know, it sounds like we are going to have to use a very high timeout for the ZooKeeper client. Then each node will run a thread that periodically updates its availability. When a node chooses its shards for a distributed search, it can look at how long it's been since each shard updated itself, and choose or drop based on that. In the event that a *very* long timeout period has passed, the client will time out and the znode will actually be removed. This seems like it will be easier than trying to reconnect after timeouts and managing Solr during the disconnected period.

Sounds like the update itself might be the current load on that node - then nodes choosing other nodes for a distributed search can use both how recently nodes were updated and their reported loads to choose which nodes to select for a search. Does this sound right?
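The choose-or-drop rule in the comment above can be sketched as a staleness check. This is a hypothetical local illustration only; in the real design the heartbeats would live in ZooKeeper znodes, not an in-process map, and all names here are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the liveness scheme: each node periodically records a
// heartbeat timestamp, and a searcher drops any shard whose last update is
// older than a staleness threshold.
class LivenessTable {
    private final Map<String, Long> lastUpdate = new HashMap<>();

    void heartbeat(String node, long nowMillis) {
        lastUpdate.put(node, nowMillis);
    }

    // A node is usable if it has reported within the staleness window.
    boolean isLive(String node, long nowMillis, long staleAfterMillis) {
        Long t = lastUpdate.get(node);
        return t != null && nowMillis - t <= staleAfterMillis;
    }
}

public class LivenessDemo {
    public static void main(String[] args) {
        LivenessTable table = new LivenessTable();
        table.heartbeat("shard1", 1_000);
        table.heartbeat("shard2", 9_000);
        long now = 10_000, staleAfter = 5_000;
        System.out.println(table.isLive("shard1", now, staleAfter)); // false: too old
        System.out.println(table.isLive("shard2", now, staleAfter)); // true
    }
}
```

The staleness window plays the role of the soft timeout in the comment, with the (much longer) ZooKeeper session timeout as the hard backstop that finally removes the znode.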
[jira] Created: (SOLR-1663) Add numRequests to SolrCore statistics to make it easier to track load
Add numRequests to SolrCore statistics to make it easier to track load
    Key: SOLR-1663
    URL: https://issues.apache.org/jira/browse/SOLR-1663
    Project: Solr
    Issue Type: New Feature
    Reporter: Mark Miller
    Priority: Minor
    Attachments: SOLR-1663.patch

As we get SolrCloud up and running, it's going to be helpful to track the load on each server. We might add request tracking to SolrCore to make this easier than looking at each request handler in each core; the number of requests is also only an optional stat at the RequestHandler level. Then you can just cycle through each core, grab how many requests it has received, and track that over a given interval.
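The per-core counter the issue proposes can be sketched with an atomic counter plus a delta over a polling interval. This is a hedged illustration of the idea, not the attached SOLR-1663 patch; the class and method names are made up:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one monotonically increasing request counter per core,
// incremented once per request regardless of which handler serves it. A
// monitor computes load as the delta between two polls over a known interval.
class CoreStats {
    private final AtomicLong numRequests = new AtomicLong();

    void incrementRequests() {
        numRequests.incrementAndGet();
    }

    long getNumRequests() {
        return numRequests.get();
    }
}

public class LoadTrackingDemo {
    public static void main(String[] args) {
        CoreStats stats = new CoreStats();
        long before = stats.getNumRequests();
        for (int i = 0; i < 5; i++) {
            stats.incrementRequests(); // simulate five handled requests
        }
        // Load over the polling interval is simply the counter delta.
        long delta = stats.getNumRequests() - before;
        System.out.println("requests in interval: " + delta);
    }
}
```

A single monotonic counter is cheap to read and avoids polling every handler's optional per-handler stat.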
[jira] Updated: (SOLR-1663) Add numRequests to SolrCore statistics to make it easier to track load
[ https://issues.apache.org/jira/browse/SOLR-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1663: -- Attachment: SOLR-1663.patch Add numRequests to SolrCore statistics to make it easier to track load -- Key: SOLR-1663 URL: https://issues.apache.org/jira/browse/SOLR-1663 Project: Solr Issue Type: New Feature Reporter: Mark Miller Priority: Minor Attachments: SOLR-1663.patch As we get SolrCloud up and running, it's going to be helpful to track the load on each server. We might add request tracking to SolrCore to help make this easier than looking at each request handler in each core. Number of requests is also only an optional stat at the RequestHandler level. Then you can just cycle through each core and grab how many requests it has received, and track that over a given interval. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791438#action_12791438 ] Yonik Seeley commented on SOLR-1277: While our designs shouldn't preclude load based node selection, I don't think we should tackle it now - it's fraught with peril. We should allow the configuration of capacity for a node (or host?) and eventually implement a load balancing mechanism that takes such capacity into account. If one node has half the capacity of another, it will be sent half the number of requests. This type of static balancing is easier to predict and test. The other issue with updating statistics is the write cost on zookeeper - we may not want to do it by default, and if we do, we wouldn't want to do it with a high frequency. Some other considerations when choosing nodes for distributed search: - the same node should be used for a particular shard for the multiple phases of a distributed search, both for better consistency between phases, and better caching. - zookeeper could be used to take a node out of service (and other nodes should immediately stop making requests to that node), but each node also needs to be able to determine failure of another node and retry a different node independent of zookeeper. Everything (search traffic) should work when disconnected from zookeeper, based on the last cluster configuration seen. 
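The static capacity-weighted balancing Yonik describes (a node with half the configured capacity of another is sent half the requests) could be realized, for example, with weighted random selection. A hedged sketch with invented names; the comment leaves the actual mechanism (random vs. weighted round-robin) open.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of static capacity-based balancing: each node carries a
// configured relative capacity, and a node is picked with probability
// proportional to that capacity.
public class CapacityBalancer {
    public static class Node {
        public final String name;
        public final int capacity; // relative weight, e.g. 1 or 2

        public Node(String name, int capacity) {
            this.name = name;
            this.capacity = capacity;
        }
    }

    private final Random random;

    public CapacityBalancer(Random random) {
        this.random = random;
    }

    /** Pick a node with probability proportional to its configured capacity. */
    public Node pick(List<Node> nodes) {
        int total = 0;
        for (Node n : nodes) total += n.capacity;
        int r = random.nextInt(total);
        for (Node n : nodes) {
            r -= n.capacity;
            if (r < 0) return n;
        }
        throw new AssertionError("unreachable");
    }
}
```

Because the weights are static configuration rather than live statistics, this avoids the ZooKeeper write cost the comment warns about, and behavior is easy to predict and test.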
Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791442#action_12791442 ] Mark Miller edited comment on SOLR-1277 at 12/16/09 4:38 PM: - Yeah, I'm not trying to tackle node selection yet - just client timeouts. But if a client is going to be periodically updating a node to state it's still in good shape, it seems like it might as well make the update include its current load. Not that that's something that couldn't easily be added later - I mostly threw that in because it was part of the previous recommendation on how to handle client timeouts. I don't necessarily like the idea of all of the nodes updating all the time to note their existence, but it seems like our best option from what I gather now. Otherwise, nodes will be timing out all the time - and handling the reconnection seems like a pain - if Solr needs something from ZooKeeper after a GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on every ZooKeeper request, build in a timed retry? My main concern at the moment is coming up with a plan for these timeouts though. If we raise the timeout limits, we need another method for determining nodes are down. I suppose another option might be, it's up to a node that can't reach another node to tag it as unresponsive? was (Author: markrmil...@gmail.com): Yeah, I'm not trying to tackle node selection yet - just client timeouts. But if a client is going to be periodically updating a node to state it's still in good shape, it seems like it might as well make the update include its current load. Not that that's something that couldn't easily be added later - I mostly threw that in because it was part of the previous recommendation on how to handle client timeouts. I don't necessarily like the idea of all of the nodes updating all the time to note their existence, but it seems like our best option from what I gather now. 
Otherwise, nodes will be timing out all the time - and handling the reconnection seems like a pain - if Solr needs something from ZooKeeper after a GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on every ZooKeeper request, build in a timed retry? My main concern at the moment is coming up with a plan for these timeouts though. If we raise the timeout limits, we need another method for determining nodes are down. I suppose another option might be, it's up to a node that can't reach another node to tag it as unresponsive? Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791442#action_12791442 ] Mark Miller commented on SOLR-1277: --- Yeah, I'm not trying to tackle node selection yet - just client timeouts. But if a client is going to be periodically updating a node to state it's still in good shape, it seems like it might as well make the update include its current load. Not that that's something that couldn't easily be added later - I mostly threw that in because it was part of the previous recommendation on how to handle client timeouts. I don't necessarily like the idea of all of the nodes updating all the time to note their existence, but it seems like our best option from what I gather now. Otherwise, nodes will be timing out all the time - and handling the reconnection seems like a pain - if Solr needs something from ZooKeeper after a GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on every ZooKeeper request, build in a timed retry? My main concern at the moment is coming up with a plan for these timeouts though. If we raise the timeout limits, we need another method for determining nodes are down. I suppose another option might be, it's up to a node that can't reach another node to tag it as unresponsive? Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. 
For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791459#action_12791459 ] Yonik Seeley commented on SOLR-1277: bq. I don't necessarily like the idea of all of the nodes updating all the time to note their existence, but it seems like our best option from what I gather now. Not sure I understand... for group membership, I had assumed there would be an ephemeral znode per node. Zookeeper does pings, and deletes the znode when the session expires, but those aren't updates per se. bq. My main concern at the moment is coming up with a plan for these timeouts though. Zookeeper client-server timeouts? Or Solr node-node request timeouts? Zookeeper timeouts need to be handled on a per-case basis - we should design such that most of the time we can continue operating even if we can't talk to zookeeper. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. 
if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
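Yonik's point above that the same node should serve a particular shard across the multiple phases of one distributed search (for consistency and caching) can be achieved by choosing the replica deterministically from the request, e.g. by hashing a request id. A hypothetical sketch, not Solr code:

```java
import java.util.List;

// Hypothetical sketch: pick a replica for a shard deterministically from a
// request id, so every phase of the same distributed request lands on the
// same node.
public class StickyReplicaChooser {
    public static String choose(List<String> replicas, String requestId) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int idx = Math.floorMod(requestId.hashCode(), replicas.size());
        return replicas.get(idx);
    }
}
```

Different requests still spread across replicas, but within one request the choice is stable.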
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791474#action_12791474 ] Mark Miller commented on SOLR-1277: --- bq. Not sure I understand... for group membership, I had assumed there would be an ephemeral znode per node. Zookeeper does pings, and deletes the znode when the session expires, but those aren't updates per se. Right - that's the problem I want to address. Ephemeral nodes go away when the client times out - with a low timeout, you can learn relatively fast that a node is down. But because we may have long gc pauses, a low timeout will cause false down reports. And we have to handle reconnections. But if we raise the timeout to get around these gc pauses, if there really is a problem, it will take a long time to learn about it. One of the recommendations above was to use a lease system instead, where each node does these updates. I'm trying to determine which strategy we actually want to use. Another option given was to let the gc cause a timeout, and then reconnect - but Solr has to wait for the reconnection to occur before it can access ZooKeeper again. {quote} Zookeeper client-server timeouts? Or Solr node-node request timeouts? Zookeeper timeouts need to be handled on a per-case basis - we should design such that most of the time we can continue operating even if we can't talk to zookeeper. {quote} Zookeeper client-server timeouts But as you say above, if a client times out, its ephemeral node goes down, and that shard will no longer be participating in distrib requests hitting other servers (presumably). How can we continue operating? We won't know which shards to hit (I guess we could use the old shards list?) and we won't be part of distributed requests from other shards, because our ephemeral node will be removed ... I'm referring to Patrick Hunt's comments above. 
Perhaps, because recovery won't be expensive, that's what we want to do - but Solr won't be able to access ZooKeeper until it's recovered - so I guess for that brief period, we drop out of other distrib requests, and if we get hit, we just use the old shards list for requests that hit the dropped server? Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791475#action_12791475 ] Mark Miller commented on SOLR-1277: --- {quote} From our experience with hbase (which is the only place we've seen this issue so far, at least to this extent) you need to think about: 1) client timeout value tradeoffs 2) effects of session expiration due to gc pause, potential ways to mitigate for 1) there is a tradeoff (the good thing is that not all clients need to use the same timeout, so you can tune based on the client type, you can even have multiple sessions for a single client, each with its own timeout) You can set the timeout higher, so if your zk client pauses you don't get expired, however this also means that if your client crashes the session won't be expired until the timeout expires. This means that the rest of your system will not be notified of the change (say you are doing leader election) for longer than you might like. for 2) you need to think about the potential failure cases and their effects. a) Say your ZK client (solr component X) fails (the host crashes), do you need to know about this in 5 seconds, or 30sec? b) Say the host is network partitioned due to a burp in the network that lasts 5 seconds, is this ok, or does the rest of the solr system need to know about this? c) Say component X gc pauses for 4 minutes, do you want the rest of the system to react immediately, or consider this ok and just wait around for a while for X to come back but keep in mind that from the perspective of the rest of your system you don't know the difference between a), b), or c) (etc...), from their viewpoint X is gone and they don't know why (unless it eventually comes back) In the hbase case session expiration is expensive as the region server master will reallocate the table (or some such). In your case the effects of X going down may not be very expensive. 
If this is the case then having a low(er) session timeout for X may not be a problem. (just deal with the session timeout when it does happen, X will eventually come back) If X recovery is expensive you may want to set the timeout very high. But as I said this makes the system less responsive if X has a real problem. Another option we explored with hbase is to use a lease recipe instead. Set a very high timeout, but have X update the znode (still ephemeral) every N seconds. If the rest of the system (whoever is interested in X status) doesn't see an update from X in T seconds, then perhaps you log a warning (where is X?). Say you don't see an update from X in T*2 seconds, then page the operator warning, maybe problems with X. Say you don't see an update in T*3 seconds (perhaps this is the timeout you use, in which case the znode is removed), consider X down, cleanup and enact recovery. These are made-up actions/times, but you can see what I'm getting at. With lease it's not all or nothing. You (solr) have the option to take actions based on the lease time, rather than just the znode being deleted in the typical case (all or nothing). The tradeoff here is that it's a bit more complicated for you - you need to implement the lease rather than just relying on the znode being deleted - you would of course set a watch on the znode to get notified when the znode is removed (etc...) 
{quote} Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a
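Patrick's lease recipe above (log a warning when no update has been seen for T seconds, page the operator at 2T, consider the node down at 3T) maps to a simple escalation check. A sketch following his made-up times, with invented names; not ZooKeeper API:

```java
// Hypothetical sketch of the lease escalation Patrick describes: the
// observer compares the node's last lease update against thresholds
// T, 2T, and 3T and escalates accordingly.
public class LeaseMonitor {
    public enum Status { OK, WARN, PAGE_OPERATOR, DOWN }

    private final long leaseMs; // T in Patrick's description

    public LeaseMonitor(long leaseMs) {
        this.leaseMs = leaseMs;
    }

    public Status status(long lastUpdateMs, long nowMs) {
        long silent = nowMs - lastUpdateMs;
        if (silent < leaseMs) return Status.OK;            // fresh lease
        if (silent < 2 * leaseMs) return Status.WARN;      // log "where is X?"
        if (silent < 3 * leaseMs) return Status.PAGE_OPERATOR; // maybe problems with X
        return Status.DOWN;                                // clean up, enact recovery
    }
}
```

The actual lease update would still be a znode write every N seconds, with a watch on the znode for the hard-removal case; only the graded reaction shown here is new relative to plain ephemeral-node expiry.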
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791480#action_12791480 ] Mark Miller commented on SOLR-1277: --- bq. so I guess for that brief period, we drop out of other distrib requests, and if we get hit, we just use the old shards list for requests that hit the dropped server? I suppose what I am worried about is when you don't have duplicate shards - or when two shards with the same data have a long gc pause together - if they just drop out, you get results back that are not from the full index. Many would prefer the search just take a bit longer (as it normally would with a gc) than losing results. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791486#action_12791486 ] Yonik Seeley commented on SOLR-1277: bq. I suppose what I am worried about is when you don't have duplicate shards - or when two shards with the same data have a long gc pause together - if they just drop out, you get results back that are not from the full index. Ahhh, good point. We can't let that happen. But if NodeA said it had ShardX, and then its ephemeral node went away, it's not dropping out of the cluster... it's just that it's currently unavailable (and we need to return partial results or fail the request). Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
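The partial-results concern in the exchange above suggests a coverage check before answering a distributed request: if some logical shard has no live replica at all, the cluster should fail the request or flag the response as partial rather than silently dropping documents. A hypothetical sketch, not Solr code:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: verify that every logical shard has at least one live
// replica before treating a distributed result as complete.
public class CoverageCheck {
    /** True only if every logical shard has at least one live replica. */
    public static boolean fullCoverage(Map<String, List<String>> replicasByShard,
                                       Set<String> liveNodes) {
        for (List<String> replicas : replicasByShard.values()) {
            boolean anyLive = false;
            for (String r : replicas) {
                if (liveNodes.contains(r)) { anyLive = true; break; }
            }
            if (!anyLive) return false; // that shard's results would be missing
        }
        return true;
    }
}
```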
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791487#action_12791487 ] Patrick Hunt commented on SOLR-1277: You guys are asking the right questions. In particular the issue about how expensive it is to lose a solr node is a good one to think about. Unfortunately I don't know enough about solr to advise you, but if it's not very expensive to lose/regain a node then just let it time out. The rest of the system will see this quickly (via ephemeral node/watch) and when the solr node is active again (comes out of the gc pause) it will talk to the zk server, see that its session has expired, and re-bootstrap into the solr cloud. Another thing to ask yourself is this: if a Solr node pauses for 4 minutes due to GC pause, how different is that from a network partition or crash/reboot of that node? What I'm saying here is, the node is _gone_ for 4 minutes -- what effect does that have on the rest of your system? Say you are expecting some very low SLA from that node, then upping the timeout is not useful here. Loss of the solr node due to gc is no different from network partition or crash/reboot of the host. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? 
Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791505#action_12791505 ] Noble Paul commented on SOLR-1131: -- bq. we shouldn't try caching subfields in SchemaField I believe the SchemaField is an ideal place to cache the 'synthetic' field info. bq. and esp not via if (type instanceof DelegatingFieldType)... it really doesn't belong there. True. It was a quick and dirty way to demo the idea. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791508#action_12791508 ] Yonik Seeley commented on SOLR-1277: bq. Right - thats the problem I want to address. Ephemeral nodes go away when the client times out - with a low timeout, you can learn relatively fast that a node is down. My assumption was to use a longer timeout on zookeeper (the default seems fine) to define who was active. When a node makes a request to a node that is down, it will fail relatively quickly, and can use a local policy to avoid that node for a certain amount of time. Seems like we need to handle these types of failures anyway, regardless of how low we set the zookeeper timeout. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. 
With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
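Yonik's suggestion above - let a request fail fast against a down node and then locally avoid that node for a while, rather than relying on a short ZooKeeper session timeout - can be sketched as follows. The class and method names are illustrative only, not part of any SOLR-1277 patch; the avoidance window and clock are passed in explicitly to keep the sketch testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "local policy to avoid that node for a
// certain amount of time": after a failed request, skip the node for a
// fixed window instead of waiting for ZooKeeper to notice the failure.
class NodeAvoidancePolicy {
    private final long avoidMillis;
    private final Map<String, Long> avoidUntil = new ConcurrentHashMap<>();

    NodeAvoidancePolicy(long avoidMillis) {
        this.avoidMillis = avoidMillis;
    }

    /** Record a failed request to the given node at the given time. */
    void reportFailure(String node, long nowMillis) {
        avoidUntil.put(node, nowMillis + avoidMillis);
    }

    /** True if the node should still be skipped at the given time. */
    boolean shouldAvoid(String node, long nowMillis) {
        Long until = avoidUntil.get(node);
        return until != null && nowMillis < until;
    }
}
```

A request router would call reportFailure whenever a connection to a replica fails, and filter candidate replicas through shouldAvoid before dispatching, retrying elsewhere in the meantime.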
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791529#action_12791529 ] Brian Pinkerton commented on SOLR-1277: --- I think the timeouts are going to have to be different depending on the role of the particular node. In a really distributed setup, indexing nodes are generally more likely to have long GC pauses than searcher nodes, and a lengthy GC pause on an indexer is usually not a problem. However, if a searcher node goes out on a long GC pause then you need to find out fast and bypass the box before too many queries back up and need to be retried (though even this depends on throughput, response time, and number of other available nodes.)
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791557#action_12791557 ] Mark Miller commented on SOLR-1277: --- bq. maybe this is already in the spec Nothing is completely nailed down in the spec - Yonik has done a bunch of work on the SolrCloud page, but a lot of that is: we could do this, or we could do that, or we might do this. We haven't really nailed much down firmly. Still pretty high level at the moment. bq. How are we addressing a failed connection to a slave server, and instead of failing the request, re-making the request to an adjacent slave? We haven't really gotten there. But we want to cover that. What do you propose? The more we get these discussions going, the faster things will start getting nailed down ... bq. A failure is a failure and whether it's the GC or something else, it's really the same thing. It's a kind of arbitrary distinction. You're saying we would say a GC pause of 4 seconds (under the ZK client timeout) is not a failure, and a GC pause of 6 seconds (over the ZK client timeout) is a failure. I'm not claiming any distinction is better than another though - just trying to work out the directions we want to go so I can start paddling. I can code till the cows come home with no input, but you might not like the results :)
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791559#action_12791559 ] Mark Miller commented on SOLR-1277: --- {quote} I think the timeouts are going to have to be different depending on the role of the particular node. In a really distributed setup, indexing nodes are generally more likely to have long GC pauses than searcher nodes, and a lengthy GC pause on an indexer is usually not a problem. However, if a searcher node goes out on a long GC pause then you need to find out fast and bypass the box before too many queries back up and need to be retried (though even this depends on throughput, response time, and number of other available nodes.){quote} Currently, I've got a default timeout, with the ability to override it at any node in solr.xml. Do you think that's enough? I can imagine putting the timeout for different roles in ZooKeeper, and then a node gets its timeout there based on its role - but then it would have to make multiple connections - one with a default timeout to get its timeout, and then another with the correct timeout.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791588#action_12791588 ] Yonik Seeley commented on SOLR-1277: bq. How are we addressing a failed connection to a slave server, and instead of failing the request, re-making the request to an adjacent slave? Yes, I didn't spell it out, but that's the HA part of why you have multiple copies of a shard (in addition to increasing capacity). bq. The way things work now, if someone searched during the GC, they'd get all the results back, the search would just take longer. They'd see the hour glass spinning, know the results were slow for this search, but still coming. I was/am not sure if we wanted to replicate that. I think we always need to support that. Whether and when a Solr request should time out should be decided on a per-request basis, and the default should probably be to not time out at all (or at least have a very high timeout). This really doesn't have anything to do with ZooKeeper. ZooKeeper gives us the layout of the cluster. It doesn't seem like we need (yet) fast failure detection from ZooKeeper - other nodes can do this synchronously themselves (and would need to anyway) on things like connection failures. App-level timeouts should not mark the node as failed, since we don't know how long the request was supposed to take.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791633#action_12791633 ] Mahadev konar commented on SOLR-1277: - Hi all, this is Mahadev from the ZooKeeper team. One of our users does similar things to what you have been talking about in the above comments. I am not sure how close I am to your scenario, but I'll give it a shot. Feel free to ignore my comments if they sound stupid. One of the things that they do is - let's say you have a machine A that is running a process P and is part of your cluster. The way they track the status of this machine is by having 2 znodes (ZNODE1, ZNODE2) in ZooKeeper. ZNODE1 is an ephemeral node (created by P) and the other one (ZNODE2) is a normal node which contains process P specific data that is updated from time to time by process P (like last time of update, status of process P - good/bad/ok). If an application/user wants to access P on machine A, they look at the ephemeral node and the data in ZNODE2 to see if process P has any problems (not related to ZooKeeper), and then the application can decide if process P actually needs to be marked dead or not. Say the ephemeral node ZNODE1 is alive but ZNODE2 shows that process P is in a really bad state; then the application will go ahead and mark process P as dead. Hope this information is of some help! 
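The two-znode liveness pattern described above can be sketched as follows. A plain in-memory map stands in for ZooKeeper here (real code would use the org.apache.zookeeper client and watches), and the class, method, and status names are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two-znode pattern: an ephemeral node (ZNODE1) signals
// session liveness, while a durable node (ZNODE2) carries the process's
// self-reported health. A consumer combines both to decide "dead or not".
class TwoZnodeStatus {
    enum Status { GOOD, OK, BAD }

    // ephemeral nodes present = ZooKeeper sessions currently alive
    private final Map<String, Boolean> ephemeral = new HashMap<>();
    // durable status nodes, rewritten from time to time by the process itself
    private final Map<String, Status> statusNode = new HashMap<>();

    void sessionUp(String proc) { ephemeral.put(proc, true); }

    // ZooKeeper deletes ephemeral nodes when the session times out
    void sessionDown(String proc) { ephemeral.remove(proc); }

    void reportStatus(String proc, Status s) { statusNode.put(proc, s); }

    /** A process is treated as dead if its ephemeral node is gone
     *  or it reports itself in a bad state. */
    boolean isDead(String proc) {
        if (!ephemeral.containsKey(proc)) return true;
        return statusNode.getOrDefault(proc, Status.OK) == Status.BAD;
    }
}
```

A process P would create its ephemeral node on connect and periodically rewrite its status node; consumers consult both signals instead of trusting the session timeout alone, which is exactly the flexibility the comment describes.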
[jira] Commented: (SOLR-1631) NPE's reported from QueryComponent.mergeIds
[ https://issues.apache.org/jira/browse/SOLR-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791638#action_12791638 ] Harish Agarwal commented on SOLR-1631: -- I'm following up on the original thread as well - just to clarify, the error is being thrown FROM a search, DURING an update. NPE's reported from QueryComponent.mergeIds --- Key: SOLR-1631 URL: https://issues.apache.org/jira/browse/SOLR-1631 Project: Solr Issue Type: Bug Components: search Reporter: Hoss Man Multiple reports of QueryComponent.mergeIds occasionally throwing NPE... http://markmail.org/message/aqzaaphbuow4sa5o http://old.nabble.com/NullPointerException-thrown-during-updates-to-index-to26613309.html#a26613309 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791665#action_12791665 ] Grant Ingersoll commented on SOLR-1131: --- I have a new patch in the works that makes creating the SchemaField lighter weight. I agree w/ Yonik, I don't think this can be cached in general. Also, I've done away with the Delegating Field Type. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791789#action_12791789 ] Noble Paul commented on SOLR-1131: -- I guess we need to revamp the API. The FieldType should act as a factory for SchemaField, and SchemaField does not have to be a final class. Solr should do all of its operations through that SchemaField.
[jira] Created: (SOLR-1664) Some Methods in FieldType actually should be in SchemaField
Some Methods in FieldType actually should be in SchemaField --- Key: SOLR-1664 URL: https://issues.apache.org/jira/browse/SOLR-1664 Project: Solr Issue Type: Improvement Reporter: Noble Paul Fix For: 1.5 A lot of methods in FieldType actually should be in SchemaField. As we can see, all of the following methods require a SchemaField as an argument; the point is that most of the information they need is only available w/ the SchemaField:
{code:java}
public Field createField(SchemaField field, String externalVal, float boost);
protected Field.TermVector getFieldTermVec(SchemaField field, String internalVal);
protected Field.Store getFieldStore(SchemaField field, String internalVal);
protected Field.Index getFieldIndex(SchemaField field, String internalVal);
public ValueSource getValueSource(SchemaField field, QParser parser);
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2, boolean minInclusive, boolean maxInclusive);
{code}
As an enhancement, we should treat FieldType as a factory for SchemaField and make SchemaField non-final.
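The proposed refactoring - FieldType acting as a factory for (no longer final) SchemaField instances, with per-field behavior living on the SchemaField it creates - might look roughly like the sketch below. These are simplified stand-in types, not the actual org.apache.solr.schema classes, and the trimming field type is a made-up example.

```java
// Stand-in for the proposed SchemaField: per-field behavior such as
// building the indexed value lives here instead of on FieldType.
abstract class SchemaFieldSketch {
    final String name;
    SchemaFieldSketch(String name) { this.name = name; }
    // was FieldType.createField(SchemaField, String, float) in the current API
    abstract String createFieldValue(String externalVal);
}

// Stand-in for the proposed FieldType: a factory for SchemaField flavors.
abstract class FieldTypeSketch {
    abstract SchemaFieldSketch createSchemaField(String name);
}

// Made-up concrete type: its SchemaFields trim external values on the way in.
class TrimmingFieldType extends FieldTypeSketch {
    @Override
    SchemaFieldSketch createSchemaField(String name) {
        return new SchemaFieldSketch(name) {
            @Override
            String createFieldValue(String externalVal) {
                return externalVal.trim();
            }
        };
    }
}
```

The point of the factory is that a FieldType can hand back different SchemaField subclasses per field, which is what SOLR-1131 needs to index one conceptual field as several Lucene fields.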
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791831#action_12791831 ] Noble Paul commented on SOLR-1131: -- I have opened an issue for the same SOLR-1664
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791836#action_12791836 ] Shalin Shekhar Mangar commented on SOLR-236: Does anybody have a reason why this should not be committed to trunk as it stands right now? Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing, used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results; collapse.type, normal (default value) or adjacent; and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (in source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-)
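To illustrate the collapse.type=adjacent / collapse.max combination described above, here is a minimal sketch that truncates each run of adjacent results sharing the same collapse-field value to at most max entries. This is an illustration of the parameter semantics only, not code from any of the attached patches; the real implementation operates on Lucene document sets rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative "adjacent" collapse: walk results in order and keep at
// most `max` consecutive entries per run of an identical field value.
class AdjacentCollapser {
    static List<String> collapse(List<String> fieldValues, int max) {
        List<String> out = new ArrayList<>();
        String prev = null;
        int run = 0;
        for (String v : fieldValues) {
            run = v.equals(prev) ? run + 1 : 1;   // length of the current run
            if (run <= max) out.add(v);           // collapse the overflow
            prev = v;
        }
        return out;
    }
}
```

With collapse.max=1 this reduces each run to a single representative entry, which matches the site-collapsing use case of one or two entries per web site.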