date:20091216


 [ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1644:
-

Attachment: SOLR-1644.patch

implemented as Uri said

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch, SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.
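
 A minimal sketch of the idea, assuming a hypothetical RequestContext holder 
 that lives for one request. The class and method names below are illustrative 
 only; they are not taken from the attached patch or from Solr's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical request-scoped holder; not Solr's actual ResponseBuilder API.
class RequestContext {
    private final Map<Object, Object> attributes = new HashMap<Object, Object>();

    // Store an arbitrary flag or helper object under a component-chosen key.
    void put(Object key, Object value) {
        attributes.put(key, value);
    }

    // Retrieve a previously stored object, or null if the key is absent.
    Object get(Object key) {
        return attributes.get(key);
    }
}

// Hypothetical component that keeps its own state in the context
// instead of requiring a dedicated field on a shared builder class.
class StatsLikeComponent {
    private static final String DO_STATS = "StatsLikeComponent.doStats";

    void prepare(RequestContext ctx) {
        ctx.put(DO_STATS, Boolean.TRUE);
    }

    boolean isEnabled(RequestContext ctx) {
        return Boolean.TRUE.equals(ctx.get(DO_STATS));
    }
}
```

 With a holder like this, a new component would not need a change to the 
 shared class; it would only need a key that does not collide with others.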

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Shalin Shekhar Mangar (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1630:


Attachment: SOLR-1630.patch

I'm not able to reproduce this issue. I used Robin's document, schema and 
solrconfig.xml in the form of a unit test and it gives an empty spell check 
response but no exceptions.

 StringIndexOutOfBoundsException in SpellCheckComponent
 --

 Key: SOLR-1630
 URL: https://issues.apache.org/jira/browse/SOLR-1630
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, spellchecker
Affects Versions: 1.4
 Environment: Solr 1.4
 Lucene 2.9.1
 Win XP
 java version 1.6.0_14
Reporter: Robin Wojciki
Assignee: Shalin Shekhar Mangar
 Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml


 For some documents/search strings, the SpellCheckComponent throws 
 StringIndexOutOfBoundsException
 See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
 h2. Replication
  * Save attached schema.xml and solrconfig.xml in 
 apache-solr-1.4.0/example/solr/conf
  * Start Solr
  * Index attached bug.xml
  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
 It throws a StringIndexOutOfBoundsException
 {noformat}
 String index out of range: -7
 java.lang.StringIndexOutOfBoundsException: String index out of range: -7
   at java.lang.AbstractStringBuilder.replace(Unknown Source)
   at java.lang.StringBuilder.replace(Unknown Source)
   at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
   at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {noformat}
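
 The trace shows StringBuilder.replace being called with a negative index. 
 The sketch below is not the SpellCheckComponent code. It only assumes, for 
 illustration, that a replacement position is computed as a token offset plus 
 a running adjustment. It shows how a mismatched adjustment can make the 
 position negative, and a hypothetical bounds check that reports the problem 
 instead of throwing.

```java
// Illustrative only: not the code from SpellCheckComponent.
class OffsetReplaceDemo {

    // Replaces [start, end) in the buffer, shifting by a running offset.
    // Returns false instead of throwing when the shifted range is invalid.
    static boolean safeReplace(StringBuilder sb, int start, int end,
                               int offset, String replacement) {
        int s = start + offset;
        int e = end + offset;
        if (s < 0 || s > sb.length() || s > e) {
            return false; // range does not fit the current buffer
        }
        sb.replace(s, e, replacement);
        return true;
    }

    // Calls StringBuilder.replace directly and reports whether it threw.
    static boolean throwsOnReplace(StringBuilder sb, int start, int end,
                                   String replacement) {
        try {
            sb.replace(start, end, replacement);
            return false;
        } catch (StringIndexOutOfBoundsException ex) {
            return true;
        }
    }
}
```

 Whether the actual bug comes from such an adjustment is not established by 
 the report; the check only mirrors the documented preconditions of 
 StringBuilder.replace.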


[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent


[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791325#action_12791325
 ] 

Guillaume Lebourgeois commented on SOLR-1630:
-

OK, I'm going to try to upload my own config in case it helps.


[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent


 [ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guillaume Lebourgeois updated SOLR-1630:


Attachment: spellcheckconfig.xml

This file provides a spellcheck configuration and a request handler which may 
raise an exception when making queries.

Examples of queries which work fine:
  * ?q=test
  * ?q=my+name+is+henry
  * ?q=éléphant

Examples of queries which throw an exception:
  * ?q=sous-marin
  * ?q=sous-marin+russe
  * ?q=sous_marin
  * ?q=éléphant-blanc

It may be linked to the content of the index and/or the spellcheck index.


[jira] Issue Comment Edited: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent


[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791334#action_12791334
 ] 

Guillaume Lebourgeois edited comment on SOLR-1630 at 12/16/09 11:38 AM:


This file provides a spellcheck configuration and a request handler which may 
raise an exception when making queries.

Examples of queries which work fine:
  * ?q=test
  * ?q=my+name+is+henry
  * ?q=éléphant

Examples of queries which throw an exception:
  * ?q=sous-marin
  * ?q=sous-marin+russe
  * ?q=sous_marin
  * ?q=éléphant-blanc

It may be linked to the content of the index and/or the spellcheck index.

Here is the stack trace:

at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


[jira] Created: (SOLR-1661) Remove adminCore from CoreContainer

Remove adminCore from CoreContainer
---

 Key: SOLR-1661
 URL: https://issues.apache.org/jira/browse/SOLR-1661
 Project: Solr
  Issue Type: Task
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


We have deprecated the admin core concept as part of SOLR-1121. It can now be 
removed completely.


[jira] Updated: (SOLR-1647) Remove the option of setting solrconfig from web.xml


 [ 
https://issues.apache.org/jira/browse/SOLR-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1647:
-

Attachment: SOLR-1647.patch

 Remove the option of setting solrconfig from web.xml
 

 Key: SOLR-1647
 URL: https://issues.apache.org/jira/browse/SOLR-1647
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1647.patch


 With SOLR-1621, an option to set solrconfig from web.xml is no longer 
 required. Moreover, editing web.xml means hacking Solr itself.


[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2009-12-16 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791342#action_12791342
 ] 

Shalin Shekhar Mangar commented on SOLR-1630:
-

Thanks Guillaume, can you give me an example document too?


[jira] Updated: (SOLR-1661) Remove adminCore from CoreContainer


 [ 
https://issues.apache.org/jira/browse/SOLR-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1661:
-

Attachment: SOLR-1661.patch


[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent


[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791360#action_12791360
 ] 

Guillaume Lebourgeois commented on SOLR-1630:
-

I've been trying to reproduce the bug with a one-document index, but without 
success. On the other hand, on an index of 500k+ documents the issue occurs 
every time. Maybe it's linked to certain kinds of documents? I don't know; I'm 
going to test some other possibilities in case it helps.


[jira] Created: (SOLR-1662) BufferedTokenStream incorrect cloning

2009-12-16 Thread Robert Muir (JIRA)

BufferedTokenStream incorrect cloning
-

 Key: SOLR-1662
 URL: https://issues.apache.org/jira/browse/SOLR-1662
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir


As part of writing tests for SOLR-1657, I rewrote one of the base classes 
(BaseTokenTestCase) to use the new TokenStream API, but also with some 
additional safety.
{code}
  public static String tsToString(TokenStream in) throws IOException {
    StringBuilder out = new StringBuilder();
    TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
    // extra safety to enforce that the state is not preserved, and also
    // assign bogus values
    in.clearAttributes();
    termAtt.setTermBuffer(bogusTerm);
    while (in.incrementToken()) {
      if (out.length() > 0)
        out.append(' ');
      out.append(termAtt.term());
      in.clearAttributes();
      termAtt.setTermBuffer(bogusTerm);
    }

    in.close();
    return out.toString();
  }
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do 
not clear or clone properly. In this case there is a problem with a tokenstream, 
AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not 
clone, so the values get overwritten.

This can be fixed in two ways: 
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?



[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning

2009-12-16 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791370#action_12791370
 ] 

Uwe Schindler commented on SOLR-1662:
-

Just a short description from the API side in Lucene:
Lucene's documentation of TokenStream.next() says: The returned Token is a 
full private copy (not re-used across calls to next()). 
AB_AAB_Stream.process() duplicates the token by just putting it uncloned into 
the outQueue. As the consumer of the BufferedTokenStream assumes that the Token 
is private, it is allowed to change it, and by doing so it also changes the 
token in the outQueue. If you, e.g., put another TokenFilter in front of this 
AB_AAB_Stream and modify the token there, it would break.
In my opinion, the responsibility to clone lies with AB_AAB_Stream; 
BufferedTokenStream will never return the same token twice by itself. So it's a 
bug in the test. But Robert told me that e.g. RemoveDuplicates has a similar 
problem.
The general contract for writing such streams is: whenever you return a Token 
from next(), never put it somewhere else uncloned, because the caller can 
change it.

The fix is to do: write((Token) t.clone());
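
A small self-contained sketch of the aliasing problem and of the clone fix. 
SimpleToken and CloneDemo are hypothetical stand-ins written for this 
illustration; they are not Lucene's Token or Solr's BufferedTokenStream.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for a mutable token; not Lucene's Token class.
class SimpleToken implements Cloneable {
    String term;

    SimpleToken(String term) {
        this.term = term;
    }

    @Override
    public SimpleToken clone() {
        try {
            return (SimpleToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class is Cloneable
        }
    }
}

class CloneDemo {
    // Queues the token for later output and also hands it to the caller.
    // When cloneBeforeQueue is false, both references share one object.
    static String queuedTermAfterCallerMutation(boolean cloneBeforeQueue) {
        Deque<SimpleToken> outQueue = new ArrayDeque<SimpleToken>();
        SimpleToken t = new SimpleToken("A");
        outQueue.add(cloneBeforeQueue ? t.clone() : t);

        // The consumer assumes the returned token is private and reuses it.
        t.term = "bogus";

        return outQueue.remove().term;
    }
}
```

Without the clone, the queued entry and the returned token are the same 
object, so the consumer's change shows up in the queue as well.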


[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning

2009-12-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791374#action_12791374
 ] 

Robert Muir commented on SOLR-1662:
---

bq. but Robert told me that e.g. RemoveDuplicates has a similar problem.

Right, there is no cloning in RemoveDuplicates. CommonGrams creates a new 
Token() when it grams, but it's not clear that this one is correct either.

So if we decide it's the responsibility of the subclass, these implementations 
need thorough tests to see if they are OK or not.
If we add the cloning to BufferedTokenStream itself, then we know they are 
OK...



[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

[
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791414#action_12791414
]

Yonik Seeley commented on SOLR-1131:

I'm spot-checking multiple different patches at this point... but in general, we
should strive not to expose the complexity further up the type hierarchy, and
we should not limit what subclasses can do.

isPolyField() returns true if more than one Fieldable *can* be returned from
createFields().
createFields() is free to return whatever the heck it likes.
And from SchemaField's and FieldType's perspective, that's it. Implementation
details are up to subclasses, and we shouldn't add assumptions in base classes.
There should be *no* concept of subFieldTypes or whatever baked into anything.

So, from Noble's patch: we shouldn't try caching subfields in SchemaField...
and especially not via if (type instanceof DelegatingFieldType)... it really doesn't
belong there.
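
A rough sketch of that contract, using hypothetical stand-in classes. 
SketchField, SketchFieldType and the "_lat"/"_lon" suffixes are assumptions 
made for illustration; they are not Solr's FieldType, Lucene's Fieldable, or 
anything from the attached patches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; these are not Solr's FieldType or Lucene's Fieldable.
class SketchField {
    final String name;
    final String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

abstract class SketchFieldType {
    // True if createFields() *can* return more than one field.
    boolean isPolyField() {
        return false;
    }

    // Subclasses decide what to return; the base class assumes nothing more.
    abstract List<SketchField> createFields(String fieldName, String externalValue);
}

class SketchStringType extends SketchFieldType {
    @Override
    List<SketchField> createFields(String fieldName, String externalValue) {
        return Collections.singletonList(new SketchField(fieldName, externalValue));
    }
}

// Indexes "lat,lon" as two fields; the suffixes are an arbitrary choice here.
class SketchPointType extends SketchFieldType {
    @Override
    boolean isPolyField() {
        return true;
    }

    @Override
    List<SketchField> createFields(String fieldName, String externalValue) {
        String[] parts = externalValue.split(",");
        List<SketchField> fields = new ArrayList<SketchField>();
        fields.add(new SketchField(fieldName + "_lat", parts[0].trim()));
        fields.add(new SketchField(fieldName + "_lon", parts[1].trim()));
        return fields;
    }
}
```

The base class here knows only the two methods; how the point type splits its 
value stays entirely inside the subclass.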

Allow a single field type to index multiple fields
--

Key: SOLR-1131
URL: https://issues.apache.org/jira/browse/SOLR-1131
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
Fix For: 1.5

Attachments: SOLR-1131-IndexMultipleFields.patch,
SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt,
SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch,
SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch,
SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch

In a few special cases, it makes sense for a single field (the concept) to
be indexed as a set of Fields (Lucene Field). Consider SOLR-773. The
concept "point" may be best indexed in a variety of ways:
* geohash (single Lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within
that region)


[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)


[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791426#action_12791426
 ] 

Mark Miller commented on SOLR-1277:
---

I wonder how we might track load -

Currently, wouldn't we have to grab every request handler and add up the 
requests and track the change in a given period of time?

Would it make sense to add total requests received tracking (across handlers), 
so we don't have to keep polling each/every request handler?

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and start from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?


[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)


[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791425#action_12791425
 ] 

Mark Miller commented on SOLR-1277:
---

So based on what we know, it sounds like we are going to have to use a very 
high timeout for the ZooKeeper client?

Then each node will run a thread that periodically updates its availability? 
When a node chooses its shards for a distributed search, it can look at how 
long it's been since each shard updated itself, and choose or drop based on 
that? In the event that a *very* long timeout period has passed, the client 
will time out and the znode will actually be removed?

This seems like it will be easier than trying to reconnect after timeouts and 
managing Solr during the disconnected period?

Sounds like the update itself might be the current load on that node - then 
nodes choosing other nodes for a distrib search can use both how recently nodes 
were updated as well as their reported loads to choose which nodes to select 
for a search?

Does this sound right?
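
A sketch of the selection step only, assuming each node publishes a 
last-update timestamp and a load figure. NodeStatus, NodeSelector and the 
maxAgeMillis cutoff are hypothetical names for this illustration; no ZooKeeper 
or Solr API is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of what a node might publish; not a ZooKeeper or Solr API.
class NodeStatus {
    final String name;
    final long lastUpdateMillis; // when the node last refreshed its entry
    final double load;           // load reported with that refresh

    NodeStatus(String name, long lastUpdateMillis, double load) {
        this.name = name;
        this.lastUpdateMillis = lastUpdateMillis;
        this.load = load;
    }
}

class NodeSelector {
    // Drops nodes whose last update is older than maxAgeMillis, then
    // orders the remaining nodes by reported load, lowest first.
    static List<NodeStatus> choose(List<NodeStatus> nodes, long nowMillis,
                                   long maxAgeMillis) {
        List<NodeStatus> live = new ArrayList<NodeStatus>();
        for (NodeStatus n : nodes) {
            if (nowMillis - n.lastUpdateMillis <= maxAgeMillis) {
                live.add(n);
            }
        }
        Collections.sort(live, new Comparator<NodeStatus>() {
            public int compare(NodeStatus a, NodeStatus b) {
                return Double.compare(a.load, b.load);
            }
        });
        return live;
    }
}
```

The staleness cutoff here is separate from the ZooKeeper session timeout: a 
node can be skipped for searches long before its znode would be removed.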

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and start from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?

[jira] Created: (SOLR-1663) Add numRequests to SolrCore statistics to make it easier to track load

Add numRequests to SolrCore statistics to make it easier to track load
--

 Key: SOLR-1663
 URL: https://issues.apache.org/jira/browse/SOLR-1663
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Priority: Minor
 Attachments: SOLR-1663.patch

As we get SolrCloud up and running, it's going to be helpful to track the load 
on each server.

We might add request tracking to SolrCore to help make this easier than looking 
at each request handler in each core. Number of requests is also only an 
optional stat at the RequestHandler level.

Then you can just cycle through each core and grab how many requests it has 
received, and track that over a given interval.
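
A minimal sketch of the idea, assuming a single cumulative counter per core that a monitor samples at intervals. `CoreRequestStats` and its methods are hypothetical names for illustration and are not the contents of the attached patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a per-core request counter; not the attached SOLR-1663 patch.
final class CoreRequestStats {
    private final AtomicLong numRequests = new AtomicLong();

    // Called once for every request the core handles.
    void recordRequest() {
        numRequests.incrementAndGet();
    }

    // Cumulative count, suitable for exposing as a statistic.
    long getNumRequests() {
        return numRequests.get();
    }

    // Requests per second between two samples of the cumulative counter.
    static double requestsPerSecond(long earlierCount, long laterCount, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return (laterCount - earlierCount) * 1000.0 / intervalMillis;
    }
}
```

A monitor would read `getNumRequests()` from each core at the start and end of an interval and pass both samples to `requestsPerSecond`.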

[jira] Updated: (SOLR-1663) Add numRequests to SolrCore statistics to make it easier to track load


 [ 
https://issues.apache.org/jira/browse/SOLR-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1663:
--

Attachment: SOLR-1663.patch

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791438#action_12791438
]

Yonik Seeley commented on SOLR-1277:

While our designs shouldn't preclude load based node selection, I don't think
we should tackle it now - it's fraught with peril.

We should allow the configuration of capacity for a node (or host?) and
eventually implement a load balancing mechanism that takes such capacity into
account. If one node has half the capacity of another, it will be sent half
the number of requests. This type of static balancing is easier to predict
and test.

The other issue with updating statistics is the write cost on zookeeper - we
may not want to do it by default, and if we do, we wouldn't want to do it with
a high frequency.

Some other considerations when choosing nodes for distributed search:
- the same node should be used for a particular shard for the multiple phases
of a distributed search, both for better consistency between phases, and better
caching.
- zookeeper could be used to take a node out of service (and other nodes
should immediately stop making requests to that node), but each node also needs
to be able to determine failure of another node and retry a different node
independent of zookeeper.

Everything (search traffic) should work when disconnected from zookeeper, based
on the last cluster configuration seen.
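
A minimal sketch of the static, capacity-based balancing described above, assuming capacities are small positive integers set in configuration. `CapacityBalancer` is a hypothetical name, and the slot-walking scheme is just one simple way to get the proportions right.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of static, capacity-weighted node selection.
final class CapacityBalancer {
    private final Map<String, Integer> capacities = new LinkedHashMap<>();
    private int totalCapacity = 0;
    private long counter = 0;

    // Registers (or re-registers) a node with a configured relative capacity.
    synchronized void addNode(String node, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        Integer old = capacities.put(node, capacity);
        totalCapacity += capacity - (old == null ? 0 : old);
    }

    // Returns the next node; over totalCapacity consecutive calls,
    // each node is chosen exactly 'capacity' times.
    synchronized String next() {
        if (totalCapacity == 0) {
            throw new IllegalStateException("no nodes configured");
        }
        long slot = counter++ % totalCapacity;
        for (Map.Entry<String, Integer> e : capacities.entrySet()) {
            if (slot < e.getValue()) {
                return e.getKey();
            }
            slot -= e.getValue();
        }
        throw new AssertionError("unreachable: slot is always below totalCapacity");
    }
}
```

Because nothing here depends on measured load, the request split is fully predictable from configuration, which is what makes it easy to test.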

[jira] Issue Comment Edited: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791442#action_12791442
]

Mark Miller edited comment on SOLR-1277 at 12/16/09 4:38 PM:
-

Yeah, I'm not trying to tackle node selection yet - just client timeouts. But
if a client is going to be periodically updating a node to state it's still in
good shape, it seems like it might as well make the update include its current
load. Not that that's something that can't be easily added later - I mostly
threw that in because it was part of the previous recommendation on how to
handle client timeouts.

I don't necessarily like the idea of all of the nodes updating all the time to
note their existence, but it seems like our best option from what I gather now.
Otherwise, nodes will be timing out all the time - and handling the
reconnection seems like a pain - if Solr needs something from ZooKeeper after a
GC ends, it's going to have to pause and wait for the reconnect. Or I guess, on
every ZooKeeper request, build in a timed retry?

My main concern at the moment is coming up with a plan for these timeouts
though. If we raise the timeout limits, we need another method for determining
nodes are down.

I suppose another option might be, it's up to a node that can't reach another
node to tag it as unresponsive?

was (Author: markrmil...@gmail.com):
Yeah, I'm not trying to tackle node selection yet - just client timeouts.
But if a client is going to be periodically updating a node to state it's still
in good shape, it seems like it might as well make the update include its
current load. Not that that's not something that can be easily added later - I
mostly threw that in because it was part of the previous recommendation on
how to handle client timeouts.

My main concern at the moment is coming up with a plan for these timeouts
though. If we raise the timeout limits, we need another method for determining
nodes are down.

I suppose another option might be, it's up to a node that can't reach another
node to tag it as unresponsive?

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791442#action_12791442
]

Mark Miller commented on SOLR-1277:
---

Yeah, I'm not trying to tackle node selection yet - just client timeouts. But
if a client is going to be periodically updating a node to state it's still in
good shape, it seems like it might as well make the update include its current
load. Not that that's not something that can be easily added later - I mostly
threw that in because it was part of the previous recommendation on how to
handle client timeouts.

My main concern at the moment is coming up with a plan for these timeouts
though. If we raise the timeout limits, we need another method for determining
nodes are down.

I suppose another option might be, it's up to a node that can't reach another
node to tag it as unresponsive?

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791459#action_12791459
]

Yonik Seeley commented on SOLR-1277:

bq. I don't necessarily like the idea of all of the nodes updating all the time
to note their existence, but it seems like our best option from what I gather
now.

Not sure I understand... for group membership, I had assumed there would be an
ephemeral znode per node. Zookeeper does pings, and deletes the znode when the
session expires, but those aren't updates per se.

bq. My main concern at the moment is coming up with a plan for these timeouts
though.

Zookeeper client-server timeouts? Or Solr node-node request timeouts?
Zookeeper timeouts need to be handled on a per-case basis - we should design
such that most of the time we can continue operating even if we can't talk to
zookeeper.

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791474#action_12791474
]

Mark Miller commented on SOLR-1277:
---

bq. Not sure I understand... for group membership, I had assumed there would be
an ephemeral znode per node. Zookeeper does pings, and deletes the znode when
the session expires, but those aren't updates per se.

Right - that's the problem I want to address. Ephemeral nodes go away when the
client times out - with a low timeout, you can learn relatively fast that a
node is down. But because we may have long gc pauses, a low timeout will cause
false down reports. And we have to handle reconnections. But if we raise the
timeout to get around these gc pauses, if there really is a problem, it will
take a long time to learn about it. One of the recommendations above was to use
a lease system instead, where each node does these updates. I'm trying to
determine which strategy we actually want to use. Another option given was to
let the gc cause a timeout, and then reconnect - but Solr has to wait for the
reconnection to occur before it can access ZooKeeper again.

{quote}
Zookeeper client-server timeouts? Or Solr node-node request timeouts?
Zookeeper timeouts need to be handled on a per-case basis - we should design
such that most of the time we can continue operating even if we can't talk to
zookeeper.
{quote}

Zookeeper client-server timeouts

But as you say above, if a client times out, its ephemeral node goes down, and
that shard will no longer be participating in distrib requests hitting other
servers (presumably). How can we continue operating? We won't know which shards
to hit (I guess we could use the old shards list?) and we won't be part of
distributed requests from other shards, because our ephemeral node will be
removed ...

I'm ref'ing to Patrick Hunt's comments above. Perhaps, because recovery won't
be expensive, that's what we want to do - but Solr won't be able to access
ZooKeeper until it's recovered - so I guess for that brief period, we drop out
of other distrib requests, and if we get hit, we just use the old shards list
for requests that hit the dropped server?

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791475#action_12791475
]

Mark Miller commented on SOLR-1277:
---

{quote}
From our experience with hbase (which is the only place we've seen this issue
so far, at least to this extent) you need to think about:

1) client timeout value tradeoffs
2) effects of session expiration due to gc pause, potential ways to mitigate

for 1) there is a tradeoff (the good thing is that not all clients need to use
the same timeout, so you can tune based on the client type, you can even have
multiple sessions for a single client, each with its own timeout). You can set
the timeout higher, so if your zk client pauses you don't get expired, however
this also means that if your client crashes the session won't be expired until
the timeout expires. This means that the rest of your system will not be
notified of the change (say you are doing leader election) for longer than you
might like.

for 2) you need to think about the potential failure cases and their effects.
a) Say your ZK client (solr component X) fails (the host crashes), do you need
to know about this in 5 seconds, or 30sec? b) Say the host is network
partitioned due to a burp in the network that lasts 5 seconds, is this ok, or
does the rest of the solr system need to know about this? c) Say component X gc
pauses for 4 minutes, do you want the rest of the system to react immediately,
or consider this ok and just wait around for a while for X to come back? But
keep in mind that from the perspective of the rest of your system you don't
know the difference between a), b), or c) (etc...); from their viewpoint X is
gone and they don't know why (unless it eventually comes back).

In hbase case session expiration is expensive as the region server master will
reallocate the table (or some such). In your case the effects of X going down
may not be very expensive. If this is the case then having a low(er) session
timeout for X may not be a problem. (just deal with the session timeout when it
does happen, X will eventually come back)

If X recovery is expensive you may want to set the timeout very high, but as I
said this makes the system less responsive if X has a real problem. Another
option we explored with hbase is to use a lease recipe instead. Set a very
high timeout, but have X update the znode (still ephemeral) every N seconds. If
the rest of the system (whoever is interested in X status) doesn't see an
update from X in T seconds, then perhaps you log a warning (where is X?). Say
you don't see an update from X in T*2 seconds, then page the operator warning,
maybe problems with X. Say you don't see in T*3 seconds (perhaps this is the
timeout you use, in which case the znode is removed), consider X down, cleanup
and enact recovery. These are made-up actions/times, but you can see what I'm
getting at. With lease it's not all or nothing. You (solr) have the option to
take actions based on the lease time, rather than just the znode being deleted
in the typical case (all or nothing). The tradeoff here is that it's a bit more
complicated for you - you need to implement the lease rather than just relying
on the znode being deleted - you would of course set a watch on the znode to
get notified when the znode is removed (etc...)
{quote}
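
A minimal sketch of the lease thresholds in the quote above, assuming the observer already knows how long ago X last updated its znode. `LeaseMonitor`, the `Action` names and the exact T / 2T / 3T cut-offs follow the made-up actions in the quote and are hypothetical; reading the znode and setting the watch are left out.

```java
// Hypothetical sketch of the graded lease reaction described in the quote above.
final class LeaseMonitor {
    enum Action { OK, LOG_WARNING, PAGE_OPERATOR, CONSIDER_DOWN }

    // Maps the time since X's last update onto an action, using T, 2T and 3T as thresholds.
    static Action evaluate(long millisSinceLastUpdate, long tMillis) {
        if (tMillis <= 0) {
            throw new IllegalArgumentException("tMillis must be positive");
        }
        if (millisSinceLastUpdate >= 3 * tMillis) {
            return Action.CONSIDER_DOWN;   // treat X as gone: clean up and recover
        }
        if (millisSinceLastUpdate >= 2 * tMillis) {
            return Action.PAGE_OPERATOR;   // maybe problems with X
        }
        if (millisSinceLastUpdate >= tMillis) {
            return Action.LOG_WARNING;     // "where is X?"
        }
        return Action.OK;
    }
}
```

The point of the graded result is that observers can react before the session timeout fires, instead of seeing only "znode present" or "znode deleted".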

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791480#action_12791480
]

Mark Miller commented on SOLR-1277:
---

bq. so I guess for that brief period, we drop out of other distrib requests,
and if we get hit, we just use the old shards list for requests that hit the
dropped server?

I suppose what I am worried about is when you don't have duplicate shards - or
when two shards with the same data have a long gc pause together - if they just
drop out, you get results back that are not from the full index. Many would
prefer the search just take a bit longer (as it normally would with a gc) than
losing results.

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791486#action_12791486
]

Yonik Seeley commented on SOLR-1277:

bq. I suppose what I am worried about is when you don't have duplicate shards -
or when two shards with the same data have a long gc pause together - if they
just drop out, you get results back that are not from the full index.

Ahhh, good point. We can't let that happen. But if NodeA said it had ShardX,
and then its ephemeral node went away, it's not dropping out of the cluster...
it's just that it's currently unavailable (and we need to return partial
results or fail the request).

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Patrick Hunt (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791487#action_12791487
]

Patrick Hunt commented on SOLR-1277:

You guys are asking the right questions. In particular the issue of how
expensive it is to lose a solr node is a good one to think about. Unfortunately
I don't know enough about solr to advise you, but if it's not very expensive to
lose/regain a node then just let it time out. The rest of the system will see
this quickly (via ephemeral node/watch) and when the solr node is active again
(comes out of the gc pause) it will talk to the zk server, see that its
session has expired, and re-bootstrap into the solr cloud.

Another thing to ask yourself is this: if a Solr node pauses for 4 minutes due
to a GC pause, how different is that from a network partition or crash/reboot of
that node? What I'm saying here is, the node is _gone_ for 4 minutes -- what
effect does that have on the rest of your system? Say you are expecting some
very low SLA from that node, then upping the timeout is not useful here. Loss
of the solr node due to gc is no different from a network partition or
crash/reboot of the host.

[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields


[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791505#action_12791505
 ] 

Noble Paul commented on SOLR-1131:
--

bq. we shouldn't try caching subfields in SchemaField

I believe the SchemaField is an ideal place to cache the 'synthetic' field
info.

bq. and especially not via if (type instanceof DelegatingFieldType)... it really
doesn't belong there.

True. It was a quick and dirty way to demo the idea.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791508#action_12791508
]

Yonik Seeley commented on SOLR-1277:

bq. Right - that's the problem I want to address. Ephemeral nodes go away when
the client times out - with a low timeout, you can learn relatively fast that a
node is down.

My assumption was to use a longer timeout on zookeeper (the default seems fine)
to define who was active.

When a node makes a request to a node that is down, it will fail relatively
quickly, and can use a local policy to avoid that node for a certain amount of
time. Seems like we need to handle these types of failures anyway, regardless
of how low we set the zookeeper timeout.
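
A minimal sketch of such a local policy, assuming each node keeps its own in-memory record of peers that recently failed and skips them for a fixed window. `FailedNodeTracker` and the window length are hypothetical; nothing is written to ZooKeeper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a node-local policy for skipping peers that recently failed.
final class FailedNodeTracker {
    private final Map<String, Long> avoidUntil = new HashMap<>();
    private final long avoidMillis;

    FailedNodeTracker(long avoidMillis) {
        this.avoidMillis = avoidMillis;
    }

    // Called when a request to the node fails at the connection level.
    synchronized void recordFailure(String node, long nowMillis) {
        avoidUntil.put(node, nowMillis + avoidMillis);
    }

    // True if the node has no recorded failure or its avoidance window has passed.
    synchronized boolean isUsable(String node, long nowMillis) {
        Long until = avoidUntil.get(node);
        if (until == null) {
            return true;
        }
        if (nowMillis >= until) {
            avoidUntil.remove(node); // window expired, give the node another chance
            return true;
        }
        return false;
    }
}
```

Since the record is purely local, two nodes can disagree about a peer for a while; the ZooKeeper view of who is active stays the long-timeout one.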

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Brian Pinkerton (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791529#action_12791529
]

Brian Pinkerton commented on SOLR-1277:
---

I think the timeouts are going to have to be different depending on the role of
the particular node. In a really distributed setup, indexing nodes are
generally more likely to have long GC pauses than searcher nodes, and a lengthy
GC pause on an indexer is usually not a problem. However, if a searcher node
goes out on a long GC pause then you need to find out fast and bypass the box
before too many queries back up and need to be retried (though even this
depends on throughput, response time, and number of other available nodes.)

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791557#action_12791557
]

Mark Miller commented on SOLR-1277:
---

bq. maybe this is already in the spec

Nothing is completely nailed down in the spec - Yonik has done a bunch of work
on the SolrCloud page, but a lot of that is: we could do this, or we could do
that, or we might do this. We haven't really nailed much down firmly. Still
pretty high level at the moment.

bq. How are we addressing a failed connection to a slave server, and instead of
failing the request, re-making the request to an adjacent slave?

We haven't really gotten there. But we want to cover that. What do you propose?

The more we get these discussions going, the faster things will start getting
nailed down ...

bq. A failure is a failure and whether it's the GC or something else, it's
really the same thing.

It's kind of an arbitrary distinction. You're saying we would say a GC pause of 4
seconds (under the ZK client timeout) is not a failure, and a GC pause of 6
seconds (over the ZK client timeout) is a failure. I'm not claiming any
distinction is better than another though - just trying to work out the
directions we want to go so I can start paddling.

I can code till the cows come home with no input, but you might not like the
results :)

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791559#action_12791559
]

Mark Miller commented on SOLR-1277:
---

{quote}
I think the timeouts are going to have to be different depending on the role of
the particular node. In a really distributed setup, indexing nodes are
generally more likely to have long GC pauses than searcher nodes, and a lengthy
GC pause on an indexer is usually not a problem. However, if a searcher node
goes out on a long GC pause then you need to find out fast and bypass the box
before too many queries back up and need to be retried (though even this
depends on throughput, response time, and number of other available
nodes.){quote}

Currently, I've got a default timeout, with the ability to override it at any
node in solr.xml. Do you think that's enough?

I can imagine putting the timeout for different roles in ZooKeeper, and then a
node gets its timeout there based on its role - but then it would have to make
multiple connections - one with a default timeout to get its timeout, and then
another with the correct timeout.
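
A minimal sketch of how the effective timeout could be resolved, assuming a precedence of per-node override, then role default, then global default. `ZkTimeoutResolver`, the role names and the precedence order are all hypothetical; where the values come from (solr.xml or ZooKeeper) is left open.

```java
import java.util.Map;

// Hypothetical sketch: resolve a node's ZooKeeper client timeout from layered settings.
final class ZkTimeoutResolver {

    // nodeOverride: value configured for this node, or null if none.
    // role: this node's role (for example "indexer" or "searcher"), or null if unknown.
    // roleTimeouts: per-role defaults; defaultMillis: global fallback.
    static int resolveTimeoutMillis(Integer nodeOverride, String role,
                                    Map<String, Integer> roleTimeouts, int defaultMillis) {
        if (nodeOverride != null) {
            return nodeOverride;
        }
        Integer byRole = (role == null) ? null : roleTimeouts.get(role);
        return (byRole != null) ? byRole : defaultMillis;
    }
}
```

If the role defaults lived in local configuration rather than in ZooKeeper, the second connection mentioned above would not be needed.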

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791588#action_12791588
]

Yonik Seeley commented on SOLR-1277:

bq. How are we addressing a failed connection to a slave server, and instead of
failing the request, re-making the request to an adjacent slave?

Yes, I didn't spell it out, but that's the HA part of why you have multiple
copies of a shard (in addition to increasing capacity).

bq. The way things work now, if someone searched during the GC, they'd get all
the results back, the search would just take longer. They'd see the hour glass
spinning, know the results were slow for this search, but still coming. I
was/am not sure if we wanted to replicate that.

I think we always need to support that. If/when a solr request should time out
should be on a per-request basis, and the default should probably be to not
time out at all (or at least have a very high timeout). This really doesn't
have anything to do with zookeeper.

Zookeeper gives us the layout of the cluster. It doesn't seem like we need
(yet) fast failure detection from zookeeper - other nodes can do this
synchronously themselves (and would need to anyway) on things like connection
failures. App-level timeouts should not mark the node as failed since we don't
know how long the request was supposed to take.
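
A minimal sketch of retrying a shard request against another copy of the shard, assuming connection-level failures surface as `IOException` and that slow responses are simply waited for. `ReplicaFailover` and `ShardRequest` are hypothetical stand-ins, not Solr interfaces.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: try each copy of a shard in turn, moving on only on connection failure.
final class ReplicaFailover {

    // Stand-in for whatever actually sends a shard request; not a Solr interface.
    interface ShardRequest<T> {
        T send(String replicaUrl) throws IOException;
    }

    // Returns the first successful response, or rethrows the last connection failure.
    static <T> T sendWithFailover(List<String> replicas, ShardRequest<T> request) throws IOException {
        IOException last = null;
        for (String replica : replicas) {
            try {
                return request.send(replica);
            } catch (IOException e) {
                last = e; // connection-level failure: fall through to the next copy
            }
        }
        throw (last != null) ? last : new IOException("no replicas configured for shard");
    }
}
```

No timeout is applied inside the loop, so a replica that is merely slow (for example in a GC pause) is waited for rather than treated as failed.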

[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-16 Thread Mahadev konar (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791633#action_12791633
]

Mahadev konar commented on SOLR-1277:
-

hi all,
this is mahadev from the zookeeper team. One of our users does similar things
that you guys have been talking about in the above comments. I am not sure how
close I am to your scenario but I'll give it a shot. Feel free to ignore my
comments if they sound stupid. One of the things that they do is - let's say
you have a machine A that is running a process P and is part of your cluster.
The way they track the status of this machine is by having 2 znodes (ZNODE1,
ZNODE2) in zookeeper. ZNODE1 is an ephemeral node (created by P) and the other
one (ZNODE2) is a normal node which contains process P specific data which is
updated from time to time by process P (like last time of update, status of
process P - good/bad/ok). If an application/user wants to access P on machine
A, they look at the ephemeral node and the data in ZNODE2 to see if process P
has any problems (not related to zookeeper) and then the application can decide
if process P actually needs to be marked dead or not. Say the ephemeral node
ZNODE1 is alive but ZNODE2 shows that process P is in a really bad state, then
application will go ahead and mark process P as dead. hope this information is
of some help!
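
A minimal sketch of the decision described in this comment, written as plain Java with no ZooKeeper client calls. The class, enum, and method names are hypothetical. Treating a missing ephemeral node as "dead" is an assumption of the sketch; the comment only spells out the case where ZNODE1 is alive and ZNODE2 reports a bad state.

```java
// Hypothetical model of the two-znode check; it does not talk to ZooKeeper.
class ProcessHealthCheck {

    // Status values process P is said to write into ZNODE2.
    enum Status { GOOD, OK, BAD }

    /**
     * @param ephemeralExists whether ZNODE1 (ephemeral, created by P) is present
     * @param reported        status last written by P into ZNODE2, or null if none
     * @return true if the application should mark process P as dead
     */
    static boolean shouldMarkDead(boolean ephemeralExists, Status reported) {
        // Assumption: if the ephemeral node is gone, P's session has ended.
        if (!ephemeralExists) {
            return true;
        }
        // ZNODE1 is alive, but P itself reports a really bad state.
        return reported == Status.BAD;
    }
}
```

In a real deployment the two inputs would come from reading the znodes; here they are passed in directly so the decision logic stands alone.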

Implement a Solr specific naming service (using Zookeeper)
--

Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch,
SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

Original Estimate: 672h
Remaining Estimate: 672h


[jira] Commented: (SOLR-1631) NPE's reported from QueryComponent.mergeIds

2009-12-16 Thread Harish Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791638#action_12791638
 ] 

Harish Agarwal commented on SOLR-1631:
--

I'm following up on the original thread as well - just to clarify, the error is 
being thrown FROM a search, DURING an update.

 NPE's reported from QueryComponent.mergeIds
 ---

 Key: SOLR-1631
 URL: https://issues.apache.org/jira/browse/SOLR-1631
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Hoss Man

 Multiple reports of QueryComponent.mergeIds occasionally throwing NPE...
 http://markmail.org/message/aqzaaphbuow4sa5o
 http://old.nabble.com/NullPointerException-thrown-during-updates-to-index-to26613309.html#a26613309


[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-16 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791665#action_12791665
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I have a new patch in the works that makes creating the SchemaField lighter 
weight.  I agree w/ Yonik, I don't think this can be cached in general.  Also, 
I've done away with the Delegating Field Type.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)


[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields


[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791789#action_12791789
 ] 

Noble Paul commented on SOLR-1131:
--

I guess we need to revamp the API.

The FieldType should act as a factory for SchemaField, and SchemaField does not 
have to be a final class. Solr should do all the operations through that 
SchemaField.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)


[jira] Created: (SOLR-1664) Some Methods in FieldType actually should be in SchemaField

Some Methods in FieldType actually should be in SchemaField
---

 Key: SOLR-1664
 URL: https://issues.apache.org/jira/browse/SOLR-1664
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
 Fix For: 1.5


A lot of methods in FieldType actually should be in SchemaField. As we can see, 
all the following methods require SchemaField as an argument. The point is 
that most of the information is only available w/ SchemaField.
{code:java}
public Field createField(SchemaField field, String externalVal, float boost);
protected Field.TermVector getFieldTermVec(SchemaField field, String internalVal);
protected Field.Store getFieldStore(SchemaField field, String internalVal);
protected Field.Index getFieldIndex(SchemaField field, String internalVal);
public ValueSource getValueSource(SchemaField field, QParser parser);
public Query getRangeQuery(QParser parser, SchemaField field, String part1,
    String part2, boolean minInclusive, boolean maxInclusive);
{code}

As an enhancement, we should treat FieldType as a factory for SchemaField and 
make SchemaField non-final.
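
One way the proposed shape could look, as a rough sketch only. The `Sketch*` classes below are simplified, hypothetical stand-ins and are NOT the real Solr `FieldType` or `SchemaField`; plain strings stand in for Lucene `Field` objects, and the `_lat`/`_lon` naming is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a non-final SchemaField that owns field creation.
class SketchSchemaField {
    protected final String name;

    SketchSchemaField(String name) {
        this.name = name;
    }

    /** Returns "fieldName=value" strings standing in for Lucene Field objects. */
    List<String> createFields(String externalVal) {
        List<String> out = new ArrayList<>();
        out.add(name + "=" + externalVal);
        return out;
    }
}

// A subclass is possible only because the base class is not final.
class SketchLatLonField extends SketchSchemaField {
    SketchLatLonField(String name) {
        super(name);
    }

    @Override
    List<String> createFields(String externalVal) {
        // Expects "lat,lon"; one logical value becomes two indexed fields.
        String[] parts = externalVal.split(",");
        List<String> out = new ArrayList<>();
        out.add(name + "_lat=" + parts[0].trim());
        out.add(name + "_lon=" + parts[1].trim());
        return out;
    }
}

// Stand-in for FieldType acting purely as a factory.
abstract class SketchFieldType {
    abstract SketchSchemaField createSchemaField(String name);
}

class SketchLatLonType extends SketchFieldType {
    @Override
    SketchSchemaField createSchemaField(String name) {
        return new SketchLatLonField(name);
    }
}
```

With this arrangement the caller asks the type for a field once and then works only with that field, e.g. `new SketchLatLonType().createSchemaField("point").createFields("12.5, 77.6")`, rather than passing a SchemaField back into the type on every call.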


[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields