[jira] Updated: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1644:
-

Attachment: SOLR-1644.patch

Implemented as Uri said.

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch, SOLR-1644.patch


 Many components, such as StatsComponent and FacetComponent, keep flags and 
 helper objects in ResponseBuilder. Having to modify ResponseBuilder for 
 such things is a kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.
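
 A minimal sketch of what such a mechanism could look like, assuming a plain map scoped to a single request. The class name RequestContextSketch and the keys used below are hypothetical and are not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-request holder; this is not the real ResponseBuilder API.
class RequestContextSketch {
    // Arbitrary per-request state, keyed by whatever object a component chooses.
    private final Map<Object, Object> context = new HashMap<>();

    void put(Object key, Object value) {
        context.put(key, value);
    }

    // Returns the stored value cast to the requested type, or null if the key is absent.
    <T> T get(Object key, Class<T> type) {
        return type.cast(context.get(key));
    }

    public static void main(String[] args) {
        RequestContextSketch rb = new RequestContextSketch();
        // Illustrative keys; a real component would likely use a private constant.
        rb.put("stats.needsMerge", Boolean.TRUE);
        rb.put("facet.helper", new StringBuilder("partial counts"));
        System.out.println(rb.get("stats.needsMerge", Boolean.class));
        System.out.println(rb.get("facet.helper", StringBuilder.class));
    }
}
```

 With something along these lines, a component would not need a dedicated field on the shared class for every flag it wants to carry between request stages.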

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1630:


Attachment: SOLR-1630.patch

I'm not able to reproduce this issue. I used Robin's document, schema and 
solrconfig.xml in the form of a unit test and it gives an empty spell check 
response but no exceptions.

 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat} String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {noformat} 
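
 The trace shows StringBuilder.replace being called with a negative index. The sketch below illustrates one way that can happen, assuming the collation is built by replacing token spans in the query while tracking a running shift; that assumption, the class CollationOffsetSketch and its parameters are hypothetical and are not a diagnosis of the actual Solr code.

```java
// Hypothetical sketch; names are illustrative and this is not Solr's implementation.
class CollationOffsetSketch {
    // Replaces each [start, end) span of the query with its suggestion, tracking how far
    // earlier replacements have shifted the positions of later spans.
    static String collate(String query, int[][] spans, String[] suggestions) {
        StringBuilder collation = new StringBuilder(query);
        int shift = 0;
        for (int i = 0; i < spans.length; i++) {
            int start = spans[i][0];
            int end = spans[i][1];
            // Throws StringIndexOutOfBoundsException if start + shift is negative.
            collation.replace(start + shift, end + shift, suggestions[i]);
            shift += suggestions[i].length() - (end - start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        // Non-overlapping spans in increasing order are handled correctly.
        System.out.println(collate("helo wrld",
                new int[][] {{0, 4}, {5, 9}}, new String[] {"hello", "world"}));

        // Overlapping spans (for example, a whole hyphenated word plus one of its parts)
        // drive the running shift negative, so the second replace starts below zero.
        try {
            collate("awehjse-wjkekw",
                    new int[][] {{0, 14}, {0, 7}}, new String[] {"a", "b"});
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

 If the real cause is along these lines, it would be consistent with the failure showing up only for certain query strings, but that remains a guess.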




[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Guillaume Lebourgeois (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791325#action_12791325
 ] 

Guillaume Lebourgeois commented on SOLR-1630:
-

OK, I'm going to try to upload my own config in case it helps.

 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat} String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {noformat} 




[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Guillaume Lebourgeois (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guillaume Lebourgeois updated SOLR-1630:


Attachment: spellcheckconfig.xml

This file provides a spellcheck configuration and a request handler which may 
raise an exception when making queries.

Examples of queries which work fine:
  * ?q=test
  * ?q=my+name+is+henry
  * ?q=éléphant

Examples of queries which throw an exception:
  * ?q=sous-marin
  * ?q=sous-marin+russe
  * ?q=sous_marin
  * ?q=éléphant-blanc

It may be linked to the content of the index, and/or the spellcheck index.

 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, 
 spellcheckconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat} String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {noformat} 




[jira] Issue Comment Edited: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Guillaume Lebourgeois (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791334#action_12791334
 ] 

Guillaume Lebourgeois edited comment on SOLR-1630 at 12/16/09 11:38 AM:


This file provides a spellcheck configuration and a request handler which may 
raise an exception when making queries.

Examples of queries which work fine:
  * ?q=test
  * ?q=my+name+is+henry
  * ?q=éléphant

Examples of queries which throw an exception:
  * ?q=sous-marin
  * ?q=sous-marin+russe
  * ?q=sous_marin
  * ?q=éléphant-blanc

It may be linked to the content of the index, and/or the spellcheck index.

Here is the stack trace:

at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

  was (Author: glebourg):
This file provide a spellcheck configuration and a requesthandler which may 
raise an exception when making queries

Example of queries which work fine :
  * ?q=test
  * ?q=my+name+is+henry
  * ?q=éléphant

Example of queries which throw an exception :
  * ?q=sous-marin
  * ?q=sous-marin+russe
  * ?q=sous_marin
  * ?q=éléphant-blanc
  
  
   It may be linked to the content of the index, and/or the spellcheck index.
  
 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, 
 spellcheckconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat} String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at 
 

[jira] Created: (SOLR-1661) Remove adminCore from CoreContainer

2009-12-16 Thread Noble Paul (JIRA)
Remove adminCore from CoreContainer
---

 Key: SOLR-1661
 URL: https://issues.apache.org/jira/browse/SOLR-1661
 Project: Solr
  Issue Type: Task
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


We have deprecated the admin core concept as part of SOLR-1121. It can be 
removed completely now.




[jira] Updated: (SOLR-1647) Remove the option of setting solrconfig from web.xml

2009-12-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1647:
-

Attachment: SOLR-1647.patch

 Remove the option of setting solrconfig from web.xml
 

 Key: SOLR-1647
 URL: https://issues.apache.org/jira/browse/SOLR-1647
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1647.patch


 With SOLR-1621, it is no longer necessary to have an option to set solrconfig from 
 web.xml. Moreover, editing web.xml means hacking Solr itself. 




[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791342#action_12791342
 ] 

Shalin Shekhar Mangar commented on SOLR-1630:
-

Thanks Guillaume, can you give me an example document too?

 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, 
 spellcheckconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat} String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {noformat} 




[jira] Updated: (SOLR-1661) Remove adminCore from CoreContainer

2009-12-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1661:
-

Attachment: SOLR-1661.patch

 Remove adminCore from CoreContainer
 ---

 Key: SOLR-1661
 URL: https://issues.apache.org/jira/browse/SOLR-1661
 Project: Solr
  Issue Type: Task
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1661.patch


 We have deprecated the admin core concept as part of SOLR-1121. It can be 
 removed completely now.




[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Guillaume Lebourgeois (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791360#action_12791360
 ] 

Guillaume Lebourgeois commented on SOLR-1630:
-

I've been trying to reproduce the bug with a one-document index, but without 
success. On the other hand, on an index of 500k+ documents the issue occurs 
every time. Maybe it's linked to certain kinds of documents? I don't know; I'm 
going to test some other possibilities in case it helps.

 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, 
 spellcheckconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat} String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {noformat} 




[jira] Created: (SOLR-1662) BufferedTokenStream incorrect cloning

2009-12-16 Thread Robert Muir (JIRA)
BufferedTokenStream incorrect cloning
-

 Key: SOLR-1662
 URL: https://issues.apache.org/jira/browse/SOLR-1662
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir


As part of writing tests for SOLR-1657, I rewrote one of the base classes 
(BaseTokenTestCase) to use the new TokenStream API, but also with some 
additional safety.
{code}
  public static String tsToString(TokenStream in) throws IOException {
    StringBuilder out = new StringBuilder();
    TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
    // extra safety to enforce that the state is not preserved, and also
    // assign bogus values
    in.clearAttributes();
    termAtt.setTermBuffer(bogusTerm);
    while (in.incrementToken()) {
      if (out.length() > 0)
        out.append(' ');
      out.append(termAtt.term());
      in.clearAttributes();
      termAtt.setTermBuffer(bogusTerm);
    }

    in.close();
    return out.toString();
  }
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do 
not clear or clone properly. In this case there is a problem with a tokenstream, 
AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not 
clone, so the values get overwritten.

This can be fixed in two ways: 
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?





[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning

2009-12-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791370#action_12791370
 ] 

Uwe Schindler commented on SOLR-1662:
-

Just a short description from the API side in Lucene:
Lucene's documentation of TokenStream.next() says: The returned Token is a 
full private copy (not re-used across calls to next()). 
AB_AAB_Stream.process() duplicates the token by just putting it uncloned into 
the outQueue. As the consumer of the BufferedTokenStream assumes that the Token 
is private, it is allowed to change it, and by doing so it also changes the token 
in the outQueue. If you e.g. put another TokenFilter in front of this 
AB_AAB_Stream and modify the token there, it would break.
In my opinion, the responsibility to clone lies with AB_AAB_Stream; 
BufferedTokenStream will never return the same token twice by itself. So it's a 
bug in the test. But Robert told me that e.g. RemoveDuplicates has a similar 
problem.
The general contract for writing such streams is: whenever you return a Token 
from next(), never put it somewhere else uncloned, because the caller can 
change it.

The fix is to do: write((Token) t.clone());
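
To make the aliasing problem concrete, here is a small self-contained sketch. It uses a hypothetical mutable class, TokSketch, as a stand-in for Lucene's Token, and a plain queue in place of the outQueue; none of these names come from Solr or Lucene.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-in for a mutable token; not the real Lucene Token class.
class TokSketch implements Cloneable {
    String term;

    TokSketch(String term) {
        this.term = term;
    }

    @Override
    public TokSketch clone() {
        try {
            return (TokSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

class DuplicateTokenSketch {
    // Queues the token twice. When clone is false, the same instance is queued both times.
    static String emitTwice(TokSketch t, boolean clone) {
        Queue<TokSketch> outQueue = new ArrayDeque<>();
        outQueue.add(t);
        outQueue.add(clone ? t.clone() : t);

        StringBuilder seen = new StringBuilder();
        while (!outQueue.isEmpty()) {
            TokSketch next = outQueue.remove();
            seen.append(next.term).append(' ');
            // The consumer assumes the token is private and overwrites it.
            next.term = "bogus";
        }
        return seen.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(emitTwice(new TokSketch("A"), false)); // second read sees the overwrite
        System.out.println(emitTwice(new TokSketch("A"), true));  // clone keeps the original term
    }
}
```

Without the clone, the second copy in the queue is the very object the consumer has already modified, which is the failure the bogus-term check in the test exposes.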

 BufferedTokenStream incorrect cloning
 -

 Key: SOLR-1662
 URL: https://issues.apache.org/jira/browse/SOLR-1662
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

 As part of writing tests for SOLR-1657, I rewrote one of the base classes 
 (BaseTokenTestCase) to use the new TokenStream API, but also with some 
 additional safety.
 {code}
  public static String tsToString(TokenStream in) throws IOException {
 StringBuilder out = new StringBuilder();
 TermAttribute termAtt = (TermAttribute) 
 in.addAttribute(TermAttribute.class);
 // extra safety to enforce, that the state is not preserved and also
 // assign bogus values
 in.clearAttributes();
 termAtt.setTermBuffer(bogusTerm);
 while (in.incrementToken()) {
    if (out.length() > 0)
 out.append(' ');
   out.append(termAtt.term());
   in.clearAttributes();
   termAtt.setTermBuffer(bogusTerm);
 }
 in.close();
 return out.toString();
   }
 {code}
 Setting the term text to bogus values helps find bugs in tokenstreams that do 
 not clear or clone properly. In this case there is a problem with a 
 tokenstream AB_AAB_Stream in TestBufferedTokenStream, it converts A B -> A A 
 B but does not clone, so the values get overwritten.
 This can be fixed in two ways: 
 * BufferedTokenStream does the cloning
 * subclasses are responsible for the cloning
 The question is which one should it be?




[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning

2009-12-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791374#action_12791374
 ] 

Robert Muir commented on SOLR-1662:
---

bq. but Robert told me that e.g. RemoveDuplicates has a similar problem.

Right, there is no cloning in RemoveDuplicates. CommonGrams creates a new 
Token() when it grams, but it's not clear that this one is correct either.

So if we decide it's the responsibility of the subclass, these implementations 
need thorough tests to see if they are ok or not.
If we add the cloning to BufferedTokenStream itself, then we know they are 
ok... 
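
For comparison, a sketch of the second option, where the buffering base class copies defensively so subclasses cannot get it wrong. The classes MutableTokenSketch, BufferingStreamSketch and RepeatSketch are hypothetical and only mirror the shape of the real classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-in for a mutable token; not the real Lucene Token class.
class MutableTokenSketch implements Cloneable {
    String term;

    MutableTokenSketch(String term) {
        this.term = term;
    }

    @Override
    public MutableTokenSketch clone() {
        try {
            return (MutableTokenSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

// Hypothetical base class that owns the output queue.
abstract class BufferingStreamSketch {
    private final Queue<MutableTokenSketch> outQueue = new ArrayDeque<>();

    // Defensive copy: a subclass may hand in the same instance more than once.
    protected final void write(MutableTokenSketch t) {
        outQueue.add(t.clone());
    }

    // Returns the next buffered token, or null when the queue is empty.
    MutableTokenSketch next() {
        return outQueue.poll();
    }

    abstract void process(MutableTokenSketch t);
}

// A subclass that duplicates each token without cloning it itself.
class RepeatSketch extends BufferingStreamSketch {
    @Override
    void process(MutableTokenSketch t) {
        write(t);
        write(t);
    }

    public static void main(String[] args) {
        RepeatSketch stream = new RepeatSketch();
        stream.process(new MutableTokenSketch("A"));
        MutableTokenSketch first = stream.next();
        first.term = "bogus"; // the consumer overwrites its private copy
        System.out.println(stream.next().term); // still the original term
    }
}
```

The trade-off in this variant is one extra clone per written token, even for subclasses that never reuse an instance.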


 BufferedTokenStream incorrect cloning
 -

 Key: SOLR-1662
 URL: https://issues.apache.org/jira/browse/SOLR-1662
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

 As part of writing tests for SOLR-1657, I rewrote one of the base classes 
 (BaseTokenTestCase) to use the new TokenStream API, but also with some 
 additional safety.
 {code}
  public static String tsToString(TokenStream in) throws IOException {
 StringBuilder out = new StringBuilder();
 TermAttribute termAtt = (TermAttribute) 
 in.addAttribute(TermAttribute.class);
 // extra safety to enforce, that the state is not preserved and also
 // assign bogus values
 in.clearAttributes();
 termAtt.setTermBuffer(bogusTerm);
 while (in.incrementToken()) {
    if (out.length() > 0)
 out.append(' ');
   out.append(termAtt.term());
   in.clearAttributes();
   termAtt.setTermBuffer(bogusTerm);
 }
 in.close();
 return out.toString();
   }
 {code}
 Setting the term text to bogus values helps find bugs in tokenstreams that do 
 not clear or clone properly. In this case there is a problem with a 
 tokenstream AB_AAB_Stream in TestBufferedTokenStream, it converts A B -> A A 
 B but does not clone, so the values get overwritten.
 This can be fixed in two ways: 
 * BufferedTokenStream does the cloning
 * subclasses are responsible for the cloning
 The question is which one should it be?




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791414#action_12791414
 ] 

Yonik Seeley commented on SOLR-1131:


I'm spot-checking multiple different patches at this point... but in general, we 
should strive to not expose the complexity further up the type hierarchy, and 
we should not limit what subclasses can do.

isPolyField() returns true if more than one Fieldable *can* be returned from 
createFields().
createFields() is free to return whatever the heck it likes.
And from SchemaField and FieldType's perspective, that's it. Implementation 
details are up to subclasses and we shouldn't add assumptions in base classes.  
There should be *no* concept of subFieldTypes or whatever baked into anything.

So, from Noble's patch: we shouldn't try caching subfields in SchemaField... 
and especially not via if (type instanceof DelegatingFieldType)... it really doesn't 
belong there.
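
A rough sketch of the contract described above, using simplified stand-in classes. FieldTypeSketch and PointTypeSketch are hypothetical, the method signatures are reduced for illustration, and fields are represented as plain strings rather than Fieldable instances.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified base type: it knows nothing about sub-fields.
abstract class FieldTypeSketch {
    // True if createFields() *can* return more than one field.
    boolean isPolyField() {
        return false;
    }

    // Subclasses are free to return whatever fields they like.
    abstract List<String> createFields(String name, String externalValue);
}

// A poly field type that indexes one "lat,lon" value as two fields.
class PointTypeSketch extends FieldTypeSketch {
    @Override
    boolean isPolyField() {
        return true;
    }

    @Override
    List<String> createFields(String name, String externalValue) {
        String[] parts = externalValue.split(",");
        return Arrays.asList(
                name + "_lat=" + parts[0].trim(),
                name + "_lon=" + parts[1].trim());
    }

    public static void main(String[] args) {
        FieldTypeSketch type = new PointTypeSketch();
        System.out.println(type.isPolyField());
        System.out.println(type.createFields("store", "45.5, -93.2"));
    }
}
```

How the two fields are derived stays entirely inside the subclass; the base type exposes only the two methods.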



 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791426#action_12791426
 ] 

Mark Miller commented on SOLR-1277:
---

I wonder how we might track load -

Currently, wouldn't we have to grab every request handler and add up the 
requests and track the change in a given period of time?

Would it make sense to add total requests received tracking (across handlers), 
so we don't have to keep polling each/every request handler?
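
A minimal sketch of what a single shared counter could look like, assuming every handler calls one method per request. The class RequestCounterSketch and its method names are hypothetical and do not reflect an existing Solr API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical core-wide request counter; not an existing Solr class.
class RequestCounterSketch {
    private final AtomicLong numRequests = new AtomicLong();
    private long lastCount;
    private long lastNanos;

    RequestCounterSketch(long nowNanos) {
        this.lastNanos = nowNanos;
    }

    // Called once per request, regardless of which handler served it.
    void onRequest() {
        numRequests.incrementAndGet();
    }

    long total() {
        return numRequests.get();
    }

    // Requests per second since the previous sample.
    synchronized double sampleRate(long nowNanos) {
        long count = numRequests.get();
        double seconds = (nowNanos - lastNanos) / 1e9;
        double rate = seconds > 0 ? (count - lastCount) / seconds : 0.0;
        lastCount = count;
        lastNanos = nowNanos;
        return rate;
    }

    public static void main(String[] args) {
        RequestCounterSketch counter = new RequestCounterSketch(System.nanoTime());
        for (int i = 0; i < 10; i++) {
            counter.onRequest();
        }
        System.out.println("total requests: " + counter.total());
        System.out.println("rate: " + counter.sampleRate(System.nanoTime()) + " req/s");
    }
}
```

A load reporter would then read one number periodically instead of walking every registered handler.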

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and start from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791425#action_12791425
 ] 

Mark Miller commented on SOLR-1277:
---

So based on what we know, it sounds like we are going to have to use a very 
high timeout for the ZooKeeper client?

Then each node will run a thread that periodically updates its availability? 
When a node chooses its shards for a distributed search, it can look at how 
long its been since each shard updated itself, and choose or drop based on 
that? In the event that a *very* long time out period has passed, the client 
will timeout and the znode will actually be removed?

This seems like it will be easier than trying to reconnect after timeouts and 
managing Solr during the disconnected period?

Sounds like the update itself might be the current load on that node - then 
nodes choosing other nodes for a distrib search can use both how recently nodes 
were updated as well as their reported loads to choose which nodes to select 
for a search?

Does this sound right?
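
A minimal sketch of the freshness check described above, assuming each node's last availability update is available as a timestamp. The class name FreshNodeSelector, the map layout, and the staleness cutoff are hypothetical and not part of any patch on this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: keeps only nodes whose last availability update is recent. */
final class FreshNodeSelector {
    private final long maxAgeMillis; // assumed staleness cutoff, not from the issue

    FreshNodeSelector(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * @param lastUpdateMillis node name -> time of its last availability update
     * @param nowMillis        current time, passed in so the logic is easy to test
     * @return nodes updated within maxAgeMillis, in the map's iteration order
     */
    List<String> select(Map<String, Long> lastUpdateMillis, long nowMillis) {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastUpdateMillis.entrySet()) {
            if (nowMillis - e.getValue() <= maxAgeMillis) {
                fresh.add(e.getKey());
            }
        }
        return fresh;
    }
}
```

Nodes that fall outside the cutoff are simply skipped for this search; whether they should also be reported somewhere is left open here.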

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and build from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?




[jira] Created: (SOLR-1663) Add numRequests to SolrCore statistics to make it easier to track load

2009-12-16 Thread Mark Miller (JIRA)
Add numRequests to SolrCore statistics to make it easier to track load
--

 Key: SOLR-1663
 URL: https://issues.apache.org/jira/browse/SOLR-1663
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Priority: Minor
 Attachments: SOLR-1663.patch

As we get SolrCloud up and running, it's going to be helpful to track the load 
on each server.

We might add request tracking to SolrCore to help make this easier than looking 
at each request handler in each core. Number of requests is also only an 
optional stat at the RequestHandler level.

Then you can just cycle through each core and grab how many requests it has 
received, and track that over a given interval.
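
A rough sketch of the idea, assuming a cumulative per-core counter that is sampled at two points in time. CoreRequestCounter is a hypothetical stand-in and is not the code in SOLR-1663.patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a per-core request counter; not the SOLR-1663 patch. */
final class CoreRequestCounter {
    private final AtomicLong numRequests = new AtomicLong();

    /** Called once for every request the core handles. */
    void recordRequest() {
        numRequests.incrementAndGet();
    }

    /** Cumulative number of requests seen so far. */
    long getNumRequests() {
        return numRequests.get();
    }

    /** Requests per second between two samples of the cumulative count. */
    static double requestsPerSecond(long earlierCount, long laterCount, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return (laterCount - earlierCount) * 1000.0 / intervalMillis;
    }
}
```

A monitor would read getNumRequests() for each core at the start and end of the interval and pass the two samples to requestsPerSecond.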




[jira] Updated: (SOLR-1663) Add numRequests to SolrCore statistics to make it easier to track load

2009-12-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1663:
--

Attachment: SOLR-1663.patch




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791438#action_12791438
 ] 

Yonik Seeley commented on SOLR-1277:


While our designs shouldn't preclude load based node selection, I don't think 
we should tackle it now - it's fraught with peril.

We should allow the configuration of capacity for a node (or host?) and 
eventually implement a load balancing mechanism that takes such capacity into 
account.  If one node has half the capacity of another, it will be sent half 
the number of requests.   This type of static balancing is easier to predict 
and test.
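
One way to picture this kind of static balancing is a fixed schedule in which each node appears once per unit of configured capacity. This is only a sketch: CapacityBalancer and the integer capacity units are assumptions for illustration, not an existing Solr API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical static balancer: request share is proportional to configured capacity. */
final class CapacityBalancer {
    private final List<String> schedule = new ArrayList<>();
    private final AtomicLong next = new AtomicLong();

    /** @param capacityByNode node name -> positive integer capacity (assumed unit) */
    CapacityBalancer(Map<String, Integer> capacityByNode) {
        for (Map.Entry<String, Integer> e : capacityByNode.entrySet()) {
            if (e.getValue() <= 0) {
                throw new IllegalArgumentException("capacity must be positive: " + e.getKey());
            }
            // A node with capacity N occupies N slots in the repeating schedule.
            for (int i = 0; i < e.getValue(); i++) {
                schedule.add(e.getKey());
            }
        }
        if (schedule.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
    }

    /** Returns the node for the next request, cycling through the schedule. */
    String nextNode() {
        int slot = (int) (next.getAndIncrement() % schedule.size());
        return schedule.get(slot);
    }
}
```

With capacities 2 and 1, the second node receives exactly one request in every three, which is the predictable behavior the comment asks for.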

The other issue with updating statistics is the write cost on zookeeper - we 
may not want to do it by default, and if we do, we wouldn't want to do it with 
a high frequency.

Some other considerations when choosing nodes for distributed search:
 - the same node should be used for a particular shard for the multiple phases 
of a distributed search, both for better consistency between phases, and better 
caching.
 - zookeeper could be used to take a node out of service (and other nodes 
should immediately stop making requests to that node), but each node also needs 
to be able to determine failure of another node and retry a different node 
independent of zookeeper.

Everything (search traffic) should work when disconnected from zookeeper, based 
on the last cluster configuration seen.
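
As a sketch of that last point, a node could keep a local copy of the most recent shard list it has seen and serve searches from that copy. CachedShardList and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the last cluster configuration seen by this node. */
final class CachedShardList {
    private volatile List<String> lastSeen = Collections.emptyList();

    /** Called whenever a fresh shard list has been read from zookeeper. */
    void onClusterUpdate(List<String> shards) {
        // Defensive copy so later changes by the caller do not leak in.
        lastSeen = Collections.unmodifiableList(new ArrayList<>(shards));
    }

    /** Returns the most recent list seen; never waits on zookeeper. */
    List<String> shardsForSearch() {
        return lastSeen;
    }
}
```

While disconnected, onClusterUpdate is simply never called, so searches keep using the previous list.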





[jira] Issue Comment Edited: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791442#action_12791442
 ] 

Mark Miller edited comment on SOLR-1277 at 12/16/09 4:38 PM:
-

Yeah, I'm not trying to tackle node selection yet - just client timeouts. But 
if a client is going to be periodically updating a node to state it's still in 
good shape, it seems like it might as well make the update include its current 
load. Not that that's something that can't be easily added later - I mostly 
threw that in because it was part of the previous recommendation on how to 
handle client timeouts.

I don't necessarily like the idea of all of the nodes updating all the time to 
note their existence, but it seems like our best option from what I gather now. 
Otherwise, nodes will be timing out all the time - and handling the 
reconnection seems like a pain - if Solr needs something from ZooKeeper after a 
GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on 
every ZooKeeper request, build in a timed retry?
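
A timed retry could look roughly like the following. This is a generic sketch that does not use the ZooKeeper client API; TimedRetry, the Callable wrapper, and the millisecond parameters are assumptions for illustration.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation until it succeeds or time runs out. */
final class TimedRetry {
    /**
     * @param op            the request to attempt, e.g. a read from zookeeper
     * @param maxWaitMillis total time budget for all attempts
     * @param pauseMillis   pause between attempts
     */
    static <T> T call(Callable<T> op, long maxWaitMillis, long pauseMillis) throws Exception {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (true) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up if another pause would take us past the deadline.
                if (System.currentTimeMillis() + pauseMillis > deadline) {
                    throw e;
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

A real version would probably retry only on connection-loss errors rather than on every exception.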

My main concern at the moment is coming up with a plan for these timeouts 
though. If we raise the timeout limits, we need another method for determining 
nodes are down.

I suppose another option might be, it's up to a node that can't reach another 
node to tag it as unresponsive?

  was (Author: markrmil...@gmail.com):
Yeah, I'm not trying to tackle node selection yet - just client timeouts. 
But if a client is going to be periodically updating a node to state it's still 
in good shape, it seems like it might as well make the update include its 
current load. Not that that's not something that can be easily added later - I 
mostly threw that in because it was part of the previous recommendation on 
how to handle client timeouts.

I don't necessarily like the idea of all of the nodes updating all the time to 
note their existence, but it seems like our best option from what I gather now. 
Otherwise, nodes will be timing out all the time - and handling the 
reconnection seems like a pain - if Solr needs something from ZooKeeper after a 
GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on 
every ZooKeeper request, build in a timed retry?

My main concern at the moment is coming up with a plan for these timeouts 
though. If we raise the timeout limits, we need another method for determining 
nodes are down.

I suppose another option might be, it's up to a node that can't reach another 
node to tag it as unresponsive?
  



[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791442#action_12791442
 ] 

Mark Miller commented on SOLR-1277:
---

Yeah, I'm not trying to tackle node selection yet - just client timeouts. But 
if a client is going to be periodically updating a node to state it's still in 
good shape, it seems like it might as well make the update include its current 
load. Not that that's not something that can be easily added later - I mostly 
threw that in because it was part of the previous recommendation on how to 
handle client timeouts.

I don't necessarily like the idea of all of the nodes updating all the time to 
note their existence, but it seems like our best option from what I gather now. 
Otherwise, nodes will be timing out all the time - and handling the 
reconnection seems like a pain - if Solr needs something from ZooKeeper after a 
GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on 
every ZooKeeper request, build in a timed retry?

My main concern at the moment is coming up with a plan for these timeouts 
though. If we raise the timeout limits, we need another method for determining 
nodes are down.

I suppose another option might be, it's up to a node that can't reach another 
node to tag it as unresponsive?




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791459#action_12791459
 ] 

Yonik Seeley commented on SOLR-1277:


bq. I don't necessarily like the idea of all of the nodes updating all the time 
to note their existence, but it seems like our best option from what I gather 
now.

Not sure I understand... for group membership, I had assumed there would be an 
ephemeral znode per node.  Zookeeper does pings, and deletes the znode when the 
session expires, but those aren't updates per se.

bq. My main concern at the moment is coming up with a plan for these timeouts 
though.

Zookeeper client-server timeouts?  Or Solr node-node request timeouts?
Zookeeper timeouts need to be handled on a per-case basis - we should design 
such that most of the time we can continue operating even if we can't talk to 
zookeeper.




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791474#action_12791474
 ] 

Mark Miller commented on SOLR-1277:
---

bq. Not sure I understand... for group membership, I had assumed there would be 
an ephemeral znode per node. Zookeeper does pings, and deletes the znode when 
the session expires, but those aren't updates per se.

Right - that's the problem I want to address. Ephemeral nodes go away when the 
client times out - with a low timeout, you can learn relatively fast that a 
node is down. But because we may have long gc pauses, a low timeout will cause 
false down reports. And we have to handle reconnections. But if we raise the 
timeout to get around these gc pauses, if there really is a problem, it will 
take a long time to learn about it. One of the recommendations above was to use 
a lease system instead, where each node does these updates. I'm trying to 
determine which strategy we actually want to use. Another option given was to 
let the gc cause a timeout, and then reconnect - but Solr has to wait for the 
reconnection to occur before it can access ZooKeeper again.

{quote}
Zookeeper client-server timeouts? Or Solr node-node request timeouts?
Zookeeper timeouts need to be handled on a per-case basis - we should design 
such that most of the time we can continue operating even if we can't talk to 
zookeeper.
{quote}

Zookeeper client-server timeouts

But as you say above, if a client times out, its ephemeral node goes down, and 
that shard will no longer be participating in distrib requests hitting other 
servers (presumably). How can we continue operating? We won't know which shards 
to hit (I guess we could use the old shards list?) and we won't be part of 
distributed requests from other shards, because our ephemeral node will be 
removed ...

I'm referring to Patrick Hunt's comments above. Perhaps, because recovery won't 
be expensive, that's what we want to do - but Solr won't be able to access 
ZooKeeper until it's recovered - so I guess for that brief period, we drop out 
of other distrib requests, and if we get hit, we just use the old shards list 
for requests that hit the dropped server?




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791475#action_12791475
 ] 

Mark Miller commented on SOLR-1277:
---

{quote}
From our experience with hbase (which is the only place we've seen this issue 
so far, at least to this extent) you need to think about:

1) client timeout value tradeoffs
2) effects of session expiration due to gc pause, potential ways to mitigate

for 1) there is a tradeoff (the good thing is that not all clients need to use 
the same timeout, so you can tune based on the client type, you can even have 
multiple sessions for a single client, each with its own timeout). You can set 
the timeout higher, so if your zk client pauses you don't get expired, however 
this also means that if your client crashes the session won't be expired until 
the timeout expires. This means that the rest of your system will not be 
notified of the change (say you are doing leader election) for longer than you 
might like.

for 2) you need to think about the potential failure cases and their effects. 
a) Say your ZK client (solr component X) fails (the host crashes), do you need 
to know about this in 5 seconds, or 30 seconds? b) Say the host is network 
partitioned due to a burp in the network that lasts 5 seconds, is this ok, or 
does the rest of the solr system need to know about this? c) Say component X gc 
pauses for 4 minutes, do you want the rest of the system to react immediately, 
or consider this ok and just wait around for a while for X to come back? But 
keep in mind that from the perspective of the rest of your system you don't 
know the difference between a), b), or c) (etc...), from their viewpoint X is 
gone and they don't know why (unless it eventually comes back).

In the hbase case session expiration is expensive as the region server master 
will reallocate the table (or some such). In your case the effects of X going 
down may not be very expensive. If this is the case then having a low(er) 
session timeout for X may not be a problem. (Just deal with the session timeout 
when it does happen, X will eventually come back.)

If X recovery is expensive you may want to set the timeout very high. But as I 
said this makes the system less responsive if X has a real problem. Another 
option we explored with hbase is to use a lease recipe instead. Set a very 
high timeout, but have X update the znode (still ephemeral) every N seconds. If 
the rest of the system (whoever is interested in X status) doesn't see an 
update from X in T seconds, then perhaps you log a warning (where is X?). Say 
you don't see an update from X in T*2 seconds, then page the operator warning, 
maybe problems with X. Say you don't see one in T*3 seconds (perhaps this is 
the timeout you use, in which case the znode is removed), consider X down, 
clean up and enact recovery. These are made-up actions/times, but you can see 
what I'm getting at. With a lease it's not all or nothing. You (solr) have the 
option to take actions based on the lease time, rather than just the znode 
being deleted in the typical case (all or nothing). The tradeoff here is that 
it's a bit more complicated for you - you need to implement the lease rather 
than just relying on the znode being deleted - you would of course set a watch 
on the znode to get notified when the znode is removed (etc...)
{quote}
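
The escalation in the lease recipe quoted above can be sketched as a simple classification of how long ago X last updated its znode. The thresholds T, T*2 and T*3 come from the quote (which calls them made-up); LeaseMonitor, LeaseStatus and the use of milliseconds are hypothetical.

```java
/** Possible reactions to a stale lease, following the T, T*2, T*3 steps in the quote. */
enum LeaseStatus { OK, WARN, PAGE_OPERATOR, DOWN }

/** Hypothetical lease check; the caller supplies the last update time read from the znode. */
final class LeaseMonitor {
    private final long leaseMillis; // "T" in the quoted recipe

    LeaseMonitor(long leaseMillis) {
        if (leaseMillis <= 0) {
            throw new IllegalArgumentException("leaseMillis must be positive");
        }
        this.leaseMillis = leaseMillis;
    }

    /** Classifies node X by the age of its last update. */
    LeaseStatus classify(long lastUpdateMillis, long nowMillis) {
        long age = nowMillis - lastUpdateMillis;
        if (age >= 3 * leaseMillis) {
            return LeaseStatus.DOWN;          // consider X down, clean up and recover
        }
        if (age >= 2 * leaseMillis) {
            return LeaseStatus.PAGE_OPERATOR; // maybe problems with X
        }
        if (age >= leaseMillis) {
            return LeaseStatus.WARN;          // log a warning: where is X?
        }
        return LeaseStatus.OK;
    }
}
```

The point of the recipe is visible in the return type: callers get a graded answer instead of the all-or-nothing signal of a deleted znode.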


[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791480#action_12791480
 ] 

Mark Miller commented on SOLR-1277:
---

bq. so I guess for that brief period, we drop out of other distrib requests, 
and if we get hit, we just use the old shards list for requests that hit the 
dropped server?

I suppose what I am worried about is when you don't have duplicate shards - or 
when two shards with the same data have a long gc pause together - if they just 
drop out, you get results back that are not from the full index. Many would 
prefer the search just take a bit longer (as it normally would with a gc) than 
losing results.




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791486#action_12791486
 ] 

Yonik Seeley commented on SOLR-1277:


bq. I suppose what I am worried about is when you don't have duplicate shards - 
or when two shards with the same data have a long gc pause together - if they 
just drop out, you get results back that are not from the full index.

Ahhh, good point.  We can't let that happen.  But if NodeA said it had ShardX, 
and then its ephemeral node went away, it's not dropping out of the cluster... 
it's just that it's currently unavailable (and we need to return partial 
results or fail the request).
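
A small sketch of that distinction, assuming the set of known shards is tracked separately from the set of nodes that are currently live. ShardCoverage, its method names and the allowPartial flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical check: a known shard with no live node is unavailable, not gone. */
final class ShardCoverage {
    /** Returns the known shards that currently have no live node. */
    static List<String> unavailable(List<String> knownShards,
                                    Map<String, List<String>> liveNodesByShard) {
        List<String> missing = new ArrayList<>();
        for (String shard : knownShards) {
            List<String> nodes = liveNodesByShard.get(shard);
            if (nodes == null || nodes.isEmpty()) {
                missing.add(shard);
            }
        }
        return missing;
    }

    /**
     * Returns true if the response would be partial; throws if a shard is
     * unavailable and partial results are not allowed.
     */
    static boolean checkPartial(List<String> knownShards,
                                Map<String, List<String>> liveNodesByShard,
                                boolean allowPartial) {
        List<String> missing = unavailable(knownShards, liveNodesByShard);
        if (!missing.isEmpty() && !allowPartial) {
            throw new IllegalStateException("No live node for shards " + missing);
        }
        return !missing.isEmpty();
    }
}
```

Because the known shard list is not derived from the ephemeral nodes, a shard whose only node is paused still counts against the result instead of silently disappearing.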





[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791487#action_12791487
 ] 

Patrick Hunt commented on SOLR-1277:


You guys are asking the right questions. In particular the issue about how 
expensive it is to lose a solr node is a good one to think about. Unfortunately 
I don't know enough about solr to advise you, but if it's not very expensive to 
lose/regain a node then just let it time out. The rest of the system will see 
this quickly (via ephemeral node/watch) and when the solr node is active again 
(comes out of the gc pause) it will talk to the zk server, see that its 
session has been expired, and re-bootstrap into the solr cloud.

Another thing to ask yourself is this: if a Solr node pauses for 4 minutes due 
to a GC pause, how different is that from a network partition or crash/reboot 
of that node? What I'm saying here is, the node is _gone_ for 4 minutes -- what 
effect does that have on the rest of your system? Say you are expecting some 
very low SLA from that node, then upping the timeout is not useful here. Loss 
of the solr node due to gc is no different than a network partition or 
crash/reboot of the host.

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and start from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791505#action_12791505
 ] 

Noble Paul commented on SOLR-1131:
--

bq. we shouldn't try caching subfields in SchemaField

I believe the SchemaField is an ideal place to cache the 'synthetic' field 
info. 

bq. and esp. not via if (type instanceof DelegatingFieldType)... it really 
doesn't belong there.

True. It was a quick and dirty way to demo the idea. 

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791508#action_12791508
 ] 

Yonik Seeley commented on SOLR-1277:


bq. Right - that's the problem I want to address. Ephemeral nodes go away when 
the client times out - with a low timeout, you can learn relatively fast that a 
node is down.

My assumption was to use a longer timeout on zookeeper (the default seems fine) 
to define who was active.

When a node makes a request to a node that is down, it will fail relatively 
quickly, and can use a local policy to avoid that node for a certain amount of 
time.  Seems like we need to handle these types of failures anyway, regardless 
of how low we set the zookeeper timeout.
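
The local policy mentioned above could look roughly like the following. This is only a sketch: the class name `NodeAvoidanceList`, its methods, and the fixed cool-down period are assumptions made for illustration and are not taken from any SOLR-1277 patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers nodes that recently failed a request. */
class NodeAvoidanceList {
    // How long (ms) a node is skipped after a failure; the value is an assumption.
    private final long avoidMillis;
    // Node URL -> timestamp (ms) until which the node should be skipped.
    private final Map<String, Long> avoidUntil = new ConcurrentHashMap<>();

    NodeAvoidanceList(long avoidMillis) {
        this.avoidMillis = avoidMillis;
    }

    /** Record that a request to the node failed at the given time. */
    void markFailed(String node, long nowMillis) {
        avoidUntil.put(node, nowMillis + avoidMillis);
    }

    /** True if the node may be tried again at the given time. */
    boolean isUsable(String node, long nowMillis) {
        Long until = avoidUntil.get(node);
        if (until == null) {
            return true; // never failed, or already cleared
        }
        if (nowMillis >= until) {
            avoidUntil.remove(node); // cool-down is over, forget the failure
            return true;
        }
        return false;
    }
}
```

A caller would check `isUsable` before picking a node and call `markFailed` when a request fails quickly, independently of whatever session timeout ZooKeeper uses.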





[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Brian Pinkerton (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791529#action_12791529
 ] 

Brian Pinkerton commented on SOLR-1277:
---

I think the timeouts are going to have to be different depending on the role of 
the particular node.  In a really distributed setup, indexing nodes are 
generally more likely to have long GC pauses than searcher nodes, and a lengthy 
GC pause on an indexer is usually not a problem.  However, if a searcher node 
goes out on a long GC pause then you need to find out fast and bypass the box 
before too many queries back up and need to be retried (though even this 
depends on throughput, response time, and number of other available nodes.)





[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791557#action_12791557
 ] 

Mark Miller commented on SOLR-1277:
---

bq. maybe this is already in the spec

Nothing is completely nailed down in the spec - Yonik has done a bunch of work 
on the SolrCloud page, but a lot of that is: we could do this, or we could do 
that, or we might do this. We haven't really nailed much down firmly. Still 
pretty high level at the moment.

bq. How are we addressing a failed connection to a slave server, and instead of 
failing the request, re-making the request to an adjacent slave?

We haven't really gotten there. But we want to cover that. What do you propose?

The more we get these discussions going, the faster things will start getting 
nailed down ...

bq. A failure is a failure and whether it's the GC or something else, it's 
really the same thing.

It's kind of an arbitrary distinction. You're saying we would say a GC pause of 
4 seconds (under the ZK client timeout) is not a failure, and a GC pause of 6 
seconds (over the ZK client timeout) is a failure. I'm not claiming any 
distinction is better than another though - just trying to work out the 
directions we want to go so I can start paddling.

I can code till the cows come home with no input, but you might not like the 
results :)




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791559#action_12791559
 ] 

Mark Miller commented on SOLR-1277:
---

{quote}
I think the timeouts are going to have to be different depending on the role of 
the particular node. In a really distributed setup, indexing nodes are 
generally more likely to have long GC pauses than searcher nodes, and a lengthy 
GC pause on an indexer is usually not a problem. However, if a searcher node 
goes out on a long GC pause then you need to find out fast and bypass the box 
before too many queries back up and need to be retried (though even this 
depends on throughput, response time, and number of other available 
nodes.){quote}

Currently, I've got a default timeout, with the ability to override it at any 
node in solr.xml. Do you think that's enough?

I can imagine putting the timeout for different roles in ZooKeeper, and then a 
node gets its timeout there based on its role - but then it would have to make 
multiple connections - one with a default timeout to get its timeout, and then 
another with the correct timeout.
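
To make the options concrete, here is a rough sketch of how a node might pick its ZooKeeper session timeout. The precedence (per-node override, then per-role value, then global default), the role names, and the class `ZkTimeoutResolver` are all assumptions for illustration; nothing here reflects how the current patch reads solr.xml.

```java
import java.util.Map;

/** Hypothetical resolver for a node's ZooKeeper session timeout. */
class ZkTimeoutResolver {
    /**
     * @param nodeOverrideMs  timeout set explicitly for this node, or null if none
     * @param role            role of this node (e.g. "searcher"), or null if unknown
     * @param roleTimeoutsMs  per-role timeouts, e.g. as they might be stored centrally
     * @param defaultMs       global default timeout
     */
    static int resolveTimeoutMs(Integer nodeOverrideMs, String role,
                                Map<String, Integer> roleTimeoutsMs, int defaultMs) {
        if (nodeOverrideMs != null) {
            return nodeOverrideMs; // an explicit per-node setting wins
        }
        Integer byRole = (role == null) ? null : roleTimeoutsMs.get(role);
        return (byRole != null) ? byRole : defaultMs;
    }
}
```

If the per-role values lived in ZooKeeper itself, the map would only be available after a first connection made with the default timeout, which is the two-connection issue described above.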




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791588#action_12791588
 ] 

Yonik Seeley commented on SOLR-1277:


bq. How are we addressing a failed connection to a slave server, and instead of 
failing the request, re-making the request to an adjacent slave?

Yes, I didn't spell it out, but that's the HA part of why you have multiple 
copies of a shard (in addition to increasing capacity).

bq. The way things work now, if someone searched during the GC, they'd get all 
the results back, the search would just take longer. They'd see the hour glass 
spinning, know the results were slow for this search, but still coming. I 
was/am not sure if we wanted to replicate that.

I think we always need to support that.  If/when a solr request should time out 
should be on a per-request basis, and the default should probably be to not 
time out at all (or at least have a very high timeout).  This really doesn't 
have anything to do with zookeeper.

Zookeeper gives us the layout of the cluster.  It doesn't seem like we need 
(yet) fast failure detection from zookeeper - other nodes can do this 
synchronously themselves (and would need to anyway) on things like connection 
failures.  App-level timeouts should not mark the node as failed since we don't 
know how long the request was supposed to take.
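
As a sketch of the distinction drawn above: a connection failure moves the request on to the next copy of the shard and marks the node locally, while any other I/O error (including a read timeout) is passed back to the caller without marking anything. `ShardFailover` and `ReplicaClient` are hypothetical names, and the real request path in Solr is not modeled here.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical failover over the copies (replicas) of one shard. */
class ShardFailover {
    /** Hypothetical abstraction over "send this query to that replica". */
    interface ReplicaClient {
        String query(String replicaUrl, String q) throws IOException;
    }

    /** Replicas that refused a connection; kept only for illustration. */
    final List<String> markedFailed = new ArrayList<>();

    String queryShard(List<String> replicas, String q, ReplicaClient client)
            throws IOException {
        IOException last = null;
        for (String replica : replicas) {
            try {
                return client.query(replica, q);
            } catch (ConnectException e) {
                // Connection-level failure: remember it and try the next copy.
                markedFailed.add(replica);
                last = e;
            }
            // Other IOExceptions (e.g. a read timeout) propagate unchanged:
            // we cannot know how long the request was supposed to take.
        }
        throw (last != null) ? last : new IOException("no replicas configured");
    }
}
```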





[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791633#action_12791633
 ] 

Mahadev konar commented on SOLR-1277:
-

Hi all,
 this is Mahadev from the ZooKeeper team. One of our users does something 
similar to what you have been discussing in the comments above. I am not sure 
how close I am to your scenario, but I'll give it a shot. Feel free to ignore 
my comments if they sound stupid. Let's say you have a machine A that is 
running a process P and is part of your cluster. The way they track the status 
of this machine is by having two znodes (ZNODE1, ZNODE2) in ZooKeeper. ZNODE1 
is an ephemeral node (created by P), and ZNODE2 is a normal node that contains 
data specific to process P, which P updates from time to time (such as the time 
of the last update and the status of process P: good/bad/ok). If an 
application/user wants to access P on machine A, it looks at the ephemeral node 
and the data in ZNODE2 to see whether process P has any problems (not related 
to ZooKeeper), and the application can then decide whether process P actually 
needs to be marked dead. Say the ephemeral node ZNODE1 is alive but ZNODE2 
shows that process P is in a really bad state; then the application will go 
ahead and mark process P as dead. Hope this information is of some help!
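
The decision an application makes from the two znodes might be sketched as below. The class `ProcessLiveness`, the `Status` enum, and the rule that a missing ephemeral node always means "dead" are assumptions added for illustration. In a real client the two inputs would presumably come from the ZooKeeper client's `exists` and `getData` calls, which are left out so that the sketch stays self-contained.

```java
/** Hypothetical decision logic for the two-znode pattern. */
class ProcessLiveness {
    /** Status that process P periodically writes into ZNODE2. */
    enum Status { GOOD, OK, BAD }

    /**
     * @param ephemeralExists whether ZNODE1 (ephemeral, created by P) is present
     * @param reported        status last written by P into ZNODE2, or null if none
     * @return true if the application should treat P as dead
     */
    static boolean shouldMarkDead(boolean ephemeralExists, Status reported) {
        if (!ephemeralExists) {
            // Assumption: P's session is gone, so P is treated as dead.
            return true;
        }
        // Session is alive, but P itself reports a really bad state.
        return reported == Status.BAD;
    }
}
```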






[jira] Commented: (SOLR-1631) NPE's reported from QueryComponent.mergeIds

2009-12-16 Thread Harish Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791638#action_12791638
 ] 

Harish Agarwal commented on SOLR-1631:
--

I'm following up on the original thread as well - just to clarify, the error is 
being thrown FROM a search, DURING an update.

 NPE's reported from QueryComponent.mergeIds
 ---

 Key: SOLR-1631
 URL: https://issues.apache.org/jira/browse/SOLR-1631
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Hoss Man

 Multiple reports of QueryComponent.mergeIds occasionally throwing NPE...
 http://markmail.org/message/aqzaaphbuow4sa5o
 http://old.nabble.com/NullPointerException-thrown-during-updates-to-index-to26613309.html#a26613309




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791665#action_12791665
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I have a new patch in the works that makes creating the SchemaField lighter 
weight.  I agree w/ Yonik, I don't think this can be cached in general.  Also, 
I've done away with the Delegating Field Type.




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791789#action_12791789
 ] 

Noble Paul commented on SOLR-1131:
--

I guess we need to revamp the API.

The FieldType should act as a factory for SchemaField, and SchemaField does not 
have to be a final class. Solr should do all the operations through that 
SchemaField.




[jira] Created: (SOLR-1664) Some Methods in FieldType actually should be in SchemaField

2009-12-16 Thread Noble Paul (JIRA)
Some Methods in FieldType actually should be in SchemaField
---

 Key: SOLR-1664
 URL: https://issues.apache.org/jira/browse/SOLR-1664
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
 Fix For: 1.5


A lot of methods in FieldType actually should be in SchemaField. As we can 
see, all of the following methods require a SchemaField as an argument. The 
point is that most of the information is only available with the SchemaField.
{code:java}
public Field createField(SchemaField field, String externalVal, float boost);
protected Field.TermVector getFieldTermVec(SchemaField field, String internalVal);
protected Field.Store getFieldStore(SchemaField field, String internalVal);
protected Field.Index getFieldIndex(SchemaField field, String internalVal);
public ValueSource getValueSource(SchemaField field, QParser parser);
public Query getRangeQuery(QParser parser, SchemaField field, String part1,
    String part2, boolean minInclusive, boolean maxInclusive);
{code}

As an enhancement, we should treat FieldType as a factory for SchemaField and 
make SchemaField non-final.
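
A rough sketch of what the proposed shape could look like follows. The classes are simplified stand-ins with made-up names (`SketchFieldType`, `SketchSchemaField`, `SketchPointType`), not the real org.apache.solr.schema classes, and the lat/lon splitting only illustrates why a type might want its own SchemaField subclass.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Stand-in for FieldType: acts as a factory for schema fields. */
abstract class SketchFieldType {
    /** Each type decides which (possibly specialized) field object to build. */
    SketchSchemaField createSchemaField(String fieldName) {
        return new SketchSchemaField(fieldName, this);
    }
}

/** Stand-in for a non-final SchemaField that owns the per-field operations. */
class SketchSchemaField {
    final String name;
    final SketchFieldType type;

    SketchSchemaField(String name, SketchFieldType type) {
        this.name = name;
        this.type = type;
    }

    /** Replaces a FieldType method that took the field as an argument. */
    List<String> createFields(String externalVal) {
        return Collections.singletonList(name + "=" + externalVal);
    }
}

/** A plain type that uses the default field behavior. */
class SketchStringType extends SketchFieldType {
}

/** A type whose fields expand one value into several index fields. */
class SketchPointType extends SketchFieldType {
    @Override
    SketchSchemaField createSchemaField(String fieldName) {
        return new SketchSchemaField(fieldName, this) {
            @Override
            List<String> createFields(String externalVal) {
                // Expects "lat,lon"; no input validation in this sketch.
                String[] latLon = externalVal.split(",");
                return Arrays.asList(
                        name + "_lat=" + latLon[0].trim(),
                        name + "_lon=" + latLon[1].trim());
            }
        };
    }
}
```

With this shape, callers never pass a SchemaField back into its FieldType; they ask the field itself, and a specialized type can return a subclass that knows about its own sub-fields.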




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791831#action_12791831
 ] 

Noble Paul commented on SOLR-1131:
--

I have opened an issue for the same: SOLR-1664.




[jira] Commented: (SOLR-236) Field collapsing

2009-12-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791836#action_12791836
 ] 

Shalin Shekhar Mangar commented on SOLR-236:


Does anybody have a reason for why this should not be committed to trunk as it 
stands right now?

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with similar values for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)
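
 For readers unfamiliar with the idea, here is a small sketch of one possible reading of the adjacent collapse type: at most collapse.max consecutive results sharing the same collapse.field value are kept, and the rest of each run is dropped. This is an interpretation of the description above, not the algorithm used in any of the attached patches; `AdjacentCollapseSketch` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical illustration of "adjacent" collapsing. */
class AdjacentCollapseSketch {
    /**
     * @param fieldValues value of the collapse field for each result, in rank order
     * @param max         how many consecutive results with the same value to keep
     * @return positions (0-based) of the results that survive collapsing
     */
    static List<Integer> keptPositions(List<String> fieldValues, int max) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0; // length of the current run of equal values
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            run = (i > 0 && Objects.equals(value, previous)) ? run + 1 : 1;
            if (run <= max) {
                kept.add(i);
            }
            previous = value;
        }
        return kept;
    }
}
```

 Under this reading, a value that reappears later in the list after a different value starts a new run, which is what would distinguish adjacent from normal collapsing.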
