[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761648#comment-13761648
 ] 

Mark Miller commented on SOLR-4816:
---

Also, FYI, there are a few remaining issues to smooth out, so a handful of 
non-SolrCloud tests in the solrj package are failing. I'll have a second pass 
up that resolves these remaining issues before long.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader, 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> // To get a more detailed response, downcast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;
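
For context, a distilled sketch of the client-side routing idea described 
above - not the actual CloudSolrServer internals. Slice, leaderFor, and the 
use of String.hashCode() are simplified stand-ins (Solr hashes the unique key 
with MurmurHash3 and reads each slice's hash range and leader URL from the 
cluster state in ZooKeeper):

import java.util.List;

final class RoutingSketch {
  static final class Slice {
    final int rangeMin, rangeMax;
    final String leaderUrl;
    Slice(int rangeMin, int rangeMax, String leaderUrl) {
      this.rangeMin = rangeMin;
      this.rangeMax = rangeMax;
      this.leaderUrl = leaderUrl;
    }
  }

  // Pick the leader whose hash range owns this document's unique key.
  static String leaderFor(String uniqueKey, List<Slice> slices) {
    int hash = uniqueKey.hashCode(); // stand-in for Solr's MurmurHash3
    for (Slice s : slices) {
      if (hash >= s.rangeMin && hash <= s.rangeMax) {
        return s.leaderUrl; // send the update straight to this leader
      }
    }
    throw new IllegalStateException("no slice owns hash " + hash);
  }
}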

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-08 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4816:
--

Attachment: SOLR-4816.patch

Here is my first pass on top of Joel's work. Comments to come.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader, 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> // To get a more detailed response, downcast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5006) CREATESHARD command for 'implicit' shards

2013-09-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761625#comment-13761625
 ] 

Noble Paul commented on SOLR-5006:
--

Let's open a separate issue for the ref guide

> CREATESHARD command for 'implicit' shards
> -
>
> Key: SOLR-5006
> URL: https://issues.apache.org/jira/browse/SOLR-5006
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.5, 5.0
>
>
> Custom sharding requires a CREATESHARD/DELETESHARD commands
> It may not be applicable to hash based sharding 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5006) CREATESHARD command for 'implicit' shards

2013-09-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761623#comment-13761623
 ] 

Noble Paul commented on SOLR-5006:
--

Yes, it's an omission. Thanks for pointing it out.

> CREATESHARD command for 'implicit' shards
> -
>
> Key: SOLR-5006
> URL: https://issues.apache.org/jira/browse/SOLR-5006
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.5, 5.0
>
>
> Custom sharding requires a CREATESHARD/DELETESHARD commands
> It may not be applicable to hash based sharding 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5006) CREATESHARD command for 'implicit' shards

2013-09-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761620#comment-13761620
 ] 

Jack Krupansky commented on SOLR-5006:
--

The OverseerCollectionProcessor#createShard method supports the createNodeSet 
parameter, but the CollectionsHandler#handleCreateShard method does not copy 
that parameter from the request. Is this an oversight in an intended feature 
for 4.5, dead code, or just a future enhancement?

Also, action=CREATESHARD and action=DELETESHARD need to be added to the Solr 
refGuide.
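
For reference, the shape of the call in question (a sketch; the host, 
collection, and shard names are made up, and whether createNodeSet is honored 
here is exactly the question above):

public class CreateShardCall {
  public static void main(String[] args) throws Exception {
    String url = "http://localhost:8983/solr/admin/collections"
        + "?action=CREATESHARD&collection=mycollection&shard=shardX"
        + "&createNodeSet=node1:8983_solr,node2:8983_solr";
    // Issue the request and print the raw response.
    try (java.io.InputStream in = new java.net.URL(url).openStream()) {
      System.out.println(new String(in.readAllBytes(),
          java.nio.charset.StandardCharsets.UTF_8));
    }
  }
}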


> CREATESHARD command for 'implicit' shards
> -
>
> Key: SOLR-5006
> URL: https://issues.apache.org/jira/browse/SOLR-5006
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.5, 5.0
>
>
> Custom sharding requires a CREATESHARD/DELETESHARD commands
> It may not be applicable to hash based sharding 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4465) Configurable Collectors

2013-09-08 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761604#comment-13761604
 ] 

Kranti Parisa edited comment on SOLR-4465 at 9/9/13 4:30 AM:
-

Do any of those tickets support configurable collectors and allow choosing 
them dynamically through request params? Is SOLR-5045 the one to use? If so, 
how does it work if I don't want to aggregate by any field but want to do 
custom collecting/mixing?


  was (Author: krantiparisa):
Do any of those tickets support configurable collectors and allow choosing 
them dynamically through request params? Is SOLR-5045 the one to use?

  
> Configurable Collectors
> ---
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.1
>Reporter: Joel Bernstein
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch 
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new 
> collectorFactory element. For example:
> <collectorFactory name="default" class="org.apache.solr.handler.component.CollectorFactory"/>
> <collectorFactory name="sum" class="org.apache.solr.handler.component.SumCollectorFactory"/>
> The elements above define two collector factories. The first one is the 
> "default" collectorFactory. The class attribute points to 
> org.apache.solr.handler.component.CollectorFactory, which implements logic 
> that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default 
> CollectorFactory and at a minimum override the getCollector method to return 
> your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default 
> collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs 
> collector. For example:
> cl.docs=
> The above param points to a named collectorFactory in the solrconfig.xml to 
> construct the collector. The docs collectorFactories must return a collector 
> that extends the TopDocsCollector base class. Docs collectors are responsible 
> for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For 
> example:
> cl.docs={! sort=mycustomsort}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default 
> collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the 
> "cl.analytic" parameter. Analytic collectors are designed to collect 
> something else besides the doclist. Typically this would be some type of 
> custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs 
> collectors, "sum" points to a named collectorFactory in the solrconfig.xml. 
> You can specify any number of analytic collectors by adding additional 
> cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend 
> DelegatingCollector. 
> A sample analytic collector is provided in the patch through the 
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups 
> by a field and sums a column of floats. The sum collector is not designed to 
> be a fully functional sum function but to be a proof of concept for pluggable 
> analytics through delegating collectors.
> You can send parameters to analytic collectors with solr local param syntax.
> For example:
> cl.analytic={! id=1 groupby=field1 column=field2}sum
> The "id" parameter is mandatory for analytic collectors and is used to 
> identify the output from the collector. In this example the "groupby" and 
> "column" params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can 
> place maps with analytic output directly into the SolrQueryResponse with the 
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the 
> outgoing response. The response will include a list named cl.analytic.<id>, 
> where id is specified in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(), which aggregates the 
> results from each of the shards during distributed search.
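
To make the subclassing step described above concrete, here is a schematic 
sketch. The patch's actual getCollector signature is not shown in this 
thread, so the zero-argument form and the CollectorFactory stub below are 
illustrative stand-ins, not the patch's real API:

import org.apache.lucene.search.Collector;
import org.apache.lucene.search.TotalHitCountCollector;

// Stand-in for the patch's o.a.s.handler.component.CollectorFactory.
abstract class CollectorFactory {
  public abstract Collector getCollector();
}

// A custom factory overrides getCollector to return its own collector;
// under the patch, docs collectors would extend TopDocsCollector and
// analytic collectors would extend DelegatingCollector.
class MyCollectorFactory extends CollectorFactory {
  @Override
  public Collector getCollector() {
    return new TotalHitCountCollector(); // any custom Collector goes here
  }
}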

[jira] [Comment Edited] (SOLR-4465) Configurable Collectors

2013-09-08 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761604#comment-13761604
 ] 

Kranti Parisa edited comment on SOLR-4465 at 9/9/13 4:27 AM:
-

Do any of those tickets support configurable collectors and allow choosing 
them dynamically through request params? Is SOLR-5045 the one to use?


  was (Author: krantiparisa):
Do any of those tickets support configurable collectors and allow choosing 
them dynamically through request params?
  
> Configurable Collectors
> ---
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.1
>Reporter: Joel Bernstein
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch 
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new 
> collectorFactory element. For example:
> <collectorFactory name="default" class="org.apache.solr.handler.component.CollectorFactory"/>
> <collectorFactory name="sum" class="org.apache.solr.handler.component.SumCollectorFactory"/>
> The elements above define two collector factories. The first one is the 
> "default" collectorFactory. The class attribute points to 
> org.apache.solr.handler.component.CollectorFactory, which implements logic 
> that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default 
> CollectorFactory and at a minimum override the getCollector method to return 
> your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default 
> collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs 
> collector. For example:
> cl.docs=
> The above param points to a named collectorFactory in the solrconfig.xml to 
> construct the collector. The docs collectorFactories must return a collector 
> that extends the TopDocsCollector base class. Docs collectors are responsible 
> for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For 
> example:
> cl.docs={! sort=mycustomsort}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default 
> collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the 
> "cl.analytic" parameter. Analytic collectors are designed to collect 
> something else besides the doclist. Typically this would be some type of 
> custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs 
> collectors, "sum" points to a named collectorFactory in the solrconfig.xml. 
> You can specify any number of analytic collectors by adding additional 
> cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend 
> DelegatingCollector. 
> A sample analytic collector is provided in the patch through the 
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups 
> by a field and sums a column of floats. The sum collector is not designed to 
> be a fully functional sum function but to be a proof of concept for pluggable 
> analytics through delegating collectors.
> You can send parameters to analytic collectors with solr local param syntax.
> For example:
> cl.analytic={! id=1 groupby=field1 column=field2}sum
> The "id" parameter is mandatory for analytic collectors and is used to 
> identify the output from the collector. In this example the "groupby" and 
> "column" params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can 
> place maps with analytic output directly into the SolrQueryResponse with the 
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the 
> outgoing response. The response will include a list named cl.analytic.<id>, 
> where id is specified in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates 
> the results from each of the shards during distributed search. The "default" 
> CollectorFactory implements the default merge logic for merging documents 
> from each shard.

[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-09-08 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761604#comment-13761604
 ] 

Kranti Parisa commented on SOLR-4465:
-

Do any of those tickets support configurable collectors and allow choosing 
them dynamically through request params?

> Configurable Collectors
> ---
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.1
>Reporter: Joel Bernstein
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch 
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new 
> collectorFactory element. For example:
> <collectorFactory name="default" class="org.apache.solr.handler.component.CollectorFactory"/>
> <collectorFactory name="sum" class="org.apache.solr.handler.component.SumCollectorFactory"/>
> The elements above define two collector factories. The first one is the 
> "default" collectorFactory. The class attribute points to 
> org.apache.solr.handler.component.CollectorFactory, which implements logic 
> that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default 
> CollectorFactory and at a minimum override the getCollector method to return 
> your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default 
> collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs 
> collector. For example:
> cl.docs=
> The above param points to a named collectorFactory in the solrconfig.xml to 
> construct the collector. The docs collectorFactories must return a collector 
> that extends the TopDocsCollector base class. Docs collectors are responsible 
> for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For 
> example:
> cl.docs={! sort=mycustomsort}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default 
> collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the 
> "cl.analytic" parameter. Analytic collectors are designed to collect 
> something else besides the doclist. Typically this would be some type of 
> custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs 
> collectors, "sum" points to a named collectorFactory in the solrconfig.xml. 
> You can specify any number of analytic collectors by adding additional 
> cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend 
> DelegatingCollector. 
> A sample analytic collector is provided in the patch through the 
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups 
> by a field and sums a column of floats. The sum collector is not designed to 
> be a fully functional sum function but to be a proof of concept for pluggable 
> analytics through delegating collectors.
> You can send parameters to analytic collectors with solr local param syntax.
> For example:
> cl.analytic={! id=1 groupby=field1 column=field2}sum
> The "id" parameter is mandatory for analytic collectors and is used to 
> identify the output from the collector. In this example the "groupby" and 
> "column" params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can 
> place maps with analytic output directly into the SolrQueryResponse with the 
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the 
> outgoing response. The response will include a list named cl.analytic.<id>, 
> where id is specified in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates 
> the results from each of the shards during distributed search. The "default" 
> CollectorFactory implements the default merge logic for merging documents 
> from each shard. If you define a different docs collector you can override 
> the default merge method to merge documents in accordance with how they are 
> collected at the shard level.

[jira] [Assigned] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-08 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5215:
-

Assignee: Mark Miller

> Deadlock in Solr Cloud ConnectionManager
> 
>
> Key: SOLR-5215
> URL: https://issues.apache.org/jira/browse/SOLR-5215
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.2.1
> Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>Reporter: Ricardo Merizalde
>Assignee: Mark Miller
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes 
> waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it 
> already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but 
> still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked 
> while holding the ConnectionManager lock, preventing thread A from getting 
> out of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
> nid=0x3e81 waiting for monitor entry [0x57169000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
> - waiting to lock <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
> nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
> nid=0x3d9a waiting for monitor entry [0x42821000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - locked <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a 
> java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0
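
The lock interplay reported above boils down to a few lines. A distilled 
sketch (stand-in names, not the ConnectionManager source): manager plays the 
ConnectionManager monitor and updateLock the connectionUpdateLock. The key 
detail is that wait() releases only the monitor it is called on; updateLock 
stays held:

final class DeadlockSketch {
  final Object manager = new Object();
  final Object updateLock = new Object();

  // Thread A: event -> manager lock -> updateLock -> wait() on manager.
  void threadA() throws InterruptedException {
    synchronized (manager) {
      synchronized (updateLock) {
        synchronized (manager) {        // reentrant, like waitForConnected
          manager.wait();               // releases manager, keeps updateLock
        }
      }
    }
  }

  // Thread B: event -> manager lock -> blocks on updateLock, which A holds,
  // while itself holding manager, so it can never notify A out of wait().
  void threadB() {
    synchronized (manager) {
      synchronized (updateLock) {
        manager.notifyAll();            // never reached
      }
    }
  }
}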

[jira] [Updated] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-08 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5215:
--

Fix Version/s: 5.0
   4.5

> Deadlock in Solr Cloud ConnectionManager
> 
>
> Key: SOLR-5215
> URL: https://issues.apache.org/jira/browse/SOLR-5215
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.2.1
> Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>Reporter: Ricardo Merizalde
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes 
> waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it 
> already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but 
> still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked 
> while holding the ConnectionManager lock, preventing thread A from getting 
> out of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
> nid=0x3e81 waiting for monitor entry [0x57169000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
> - waiting to lock <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
> nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
> nid=0x3d9a waiting for monitor entry [0x42821000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - locked <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a 
> java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.

[jira] [Created] (SOLR-5221) CloudSolrServer should default to 15 seconds for the zk client timeout, just like Solr core does.

2013-09-08 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5221:
-

 Summary: CloudSolrServer should default to 15 seconds for the zk 
client timeout, just like Solr core does.
 Key: SOLR-5221
 URL: https://issues.apache.org/jira/browse/SOLR-5221
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.5, 5.0




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5221) CloudSolrServer should default to 15 seconds for the zk client timeout, just like Solr core does.

2013-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761592#comment-13761592
 ] 

Mark Miller commented on SOLR-5221:
---

It currently defaults to 10 seconds - the old core default.
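
For reference, the corresponding client-side setting (a minimal sketch, 
assuming SolrJ's setZkClientTimeout setter; 15000 is the proposed new default 
made explicit):

CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
cloudClient.setZkClientTimeout(15000); // milliseconds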

> CloudSolrServer should default to 15 seconds for the zk client timeout, just 
> like Solr core does.
> -
>
> Key: SOLR-5221
> URL: https://issues.apache.org/jira/browse/SOLR-5221
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: A reference to a commercial algorithm in comments - is this all right?

2013-09-08 Thread Mark Miller
I see no problem with it.

- Mark


On Sun, Sep 8, 2013 at 2:50 PM, Dawid Weiss wrote:

> As part of a recent commit I cleaned up the comments surrounding the
> clustering extension in the Solr example. As part of this I added
> comments concerning configuration of clustering algorithms in the
> Carrot2 framework, but also helpers that refer to our commercial
> clustering algorithm Lingo3G. They seem harmless to me, as in:
>
> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>
> I admit the reason I included these was not to promote the algorithm,
> but to limit the number of support requests we get where users are not
> sure how to modify Solr configuration to use Lingo3G out of the box...
>
> Is this something that is ok or does it bother anybody? If so, let me
> know and I will remove those two references from comments.
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
- Mark


[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761550#comment-13761550
 ] 

Mark Miller commented on SOLR-4816:
---

bq. this "high priority" ... Jira ... is still listed as "Minor".

My personal priority list has nothing to do with the severity in JIRA for this 
issue. I'm assigned and working on this - surprising or not.

I have stated that this is an important issue that is on the road map and that 
it is high priority for me to get into 4.5. Nothing has changed.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader, 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> // To get a more detailed response, downcast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761545#comment-13761545
 ] 

Benson Margulies commented on LUCENE-5202:
--

Well, it only took me about 10 minutes to code a class that did what I needed 
once you goosed me into coding it. I suspect that there's something that LTF 
does that I _don't_ need that explains why it is so complex. The rolling buffer 
suggests to me that it's supporting some much more flexible idea about 
lookahead than just 'grab a batch, process them, regurgitate the results 
(including extra tokens), grab the next batch.'

Or in other words, since there are analyzers in Lucene that are still using 
pre-AttributeSource methods to handle creating additional tokens, one would 
think that there would be a use for a base class that could support them easily.

In any case, you're welcome.
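
A minimal sketch of the kind of dedicated class described above: buffer a 
batch of tokens with captureState(), then replay them one by one, tweaking 
the live attributes after each restoreState(). BatchReplayFilter and 
adjustCurrentToken() are hypothetical names, not Lucene classes:

import java.io.IOException;
import java.util.ArrayDeque;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

public class BatchReplayFilter extends TokenFilter {
  private final ArrayDeque<AttributeSource.State> buffered = new ArrayDeque<>();
  private boolean primed = false;

  protected BatchReplayFilter(TokenStream input) {
    super(input);
  }

  @Override
  public final boolean incrementToken() throws IOException {
    if (!primed) {
      while (input.incrementToken()) {
        buffered.add(captureState()); // snapshot every attribute of this token
      }
      primed = true; // the whole batch is buffered; input is exhausted
    }
    if (buffered.isEmpty()) {
      return false;
    }
    restoreState(buffered.poll()); // buffered token becomes the live token
    adjustCurrentToken();          // hook: edit the live attributes here
    return true;
  }

  // Hypothetical hook; a subclass would tweak CharTermAttribute etc. here.
  protected void adjustCurrentToken() {}

  @Override
  public void reset() throws IOException {
    super.reset();
    buffered.clear();
    primed = false;
  }
}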

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761544#comment-13761544
 ] 

Michael McCandless commented on LUCENE-5202:


OK I'll commit this fix ... thanks for iterating here :)  If you have any ideas 
on how to make LookaheadTF more useful please keep raising them!

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761540#comment-13761540
 ] 

Michael McCandless commented on LUCENE-5123:


{quote}
1. move write() from PostingsFormat to FieldsConsumer
2. make the "push" api a subclass of FieldsConsumer that has a final 
implementation of write() and exposes the abstract api it has today (e.g. 
addField)
{quote}

I started down this path (moved the write method to FieldsConsumer, and created 
a PushFieldsConsumer subclass that impls final write, exposing the current API) 
but ... this causes problems for wrapping/delegating PostingsConsumers (e.g. 
AssertingPF, BloomPF, PulsingPF) since suddenly they must be strongly typed to 
accept only PushFieldsConsumer.  Either that or I guess we could cut each of 
these over to write().

I mean, it exposes a real issue w/ the current patch: you cannot wrap 
SimpleTextPF (or any future PF that uses the pull API) inside these PFs that 
use the push API.  Not sure what to do ...
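
A schematic sketch of the shape being discussed (simplified stand-ins, not 
Lucene's actual signatures). The typing problem follows directly: a 
delegating consumer declared against PushFieldsConsumer cannot wrap a 
pull-style FieldsConsumer that only implements write():

import java.io.IOException;

interface Fields extends Iterable<String> {}

// Pull API: the codec walks the fields itself.
abstract class FieldsConsumer {
  public abstract void write(Fields fields) throws IOException;
}

// Push API preserved for existing codecs: write() is final and drives the
// per-field callback, so subclasses only ever see addField(...).
abstract class PushFieldsConsumer extends FieldsConsumer {
  @Override
  public final void write(Fields fields) throws IOException {
    for (String field : fields) {
      addField(field);
    }
  }
  protected abstract void addField(String field) throws IOException;
}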



> invert the codec postings API
> -
>
> Key: LUCENE-5123
> URL: https://issues.apache.org/jira/browse/LUCENE-5123
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch
>
>
> Currently FieldsConsumer/PostingsConsumer/etc is a "push" oriented api, e.g. 
> FreqProxTermsWriter streams the postings at flush, and the default merge() 
> takes the incoming codec api and filters out deleted docs and "pushes" via 
> same api (but that can be overridden).
> It could be cleaner if we allowed for a "pull" model instead (like 
> DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
> itself and just passed this to the codec consumer.
> This would give the codec more flexibility to e.g. do multiple passes if it 
> wanted to do things like encode high-frequency terms more efficiently with a 
> bitset-like encoding or other things...
> A codec can try to do things like this to some extent today, but its very 
> difficult (look at buffering in Pulsing). We made this change with DV and it 
> made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: A reference to a commercial algorithm in comments - is this all right?

2013-09-08 Thread Dawid Weiss
I think less configuration there is an improvement :)

The problem seems to be that "carrot.algorithm" can be any clustering
component that plugs into the Carrot2 framework -- including our
commercial algorithm (I don't think there's anything else besides
that). We suck at brand differentiation and many of our customer have
found it difficult to tell the difference between Carrot2, Lingo,
Lingo3G and where to put the configuration bits and pieces in Solr
code. So while it doesn't harm any users of the open source algorithms
it helps those already using (or willing to try) the commercial
algorithm to locate the relevant bits.

Dawid

On Sun, Sep 8, 2013 at 8:57 PM, Simon Willnauer wrote:
> I don't think it's an issue - if it helps users figure out how to get
> it, I think it's actually an improvement!
>
> simon
>
> On Sun, Sep 8, 2013 at 8:50 PM, Dawid Weiss wrote:
>> As part of a recent commit I cleaned up the comments surrounding the
>> clustering extension in the Solr example. As part of this I added
>> comments concerning configuration of clustering algorithms in the
>> Carrot2 framework, but also helpers that refer to our commercial
>> clustering algorithm Lingo3G. They seem harmless to me, as in:
>>
>> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>>
>> I admit the reason I included these was not to promote the algorithm,
>> but to limit the number of support requests we get where users are not
>> sure how to modify Solr configuration to use Lingo3G out of the box...
>>
>> Is this something that is ok or does it bother anybody? If so, let me
>> know and I will remove those two references from comments.
>>
>> Dawid
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: A reference to a commercial algorithm in comments - is this all right?

2013-09-08 Thread Simon Willnauer
I don't think it's an issue - if it helps users figure out how to get
it, I think it's actually an improvement!

simon

On Sun, Sep 8, 2013 at 8:50 PM, Dawid Weiss wrote:
> As part of a recent commit I cleaned up the comments surrounding the
> clustering extension in the Solr example. As part of this I added
> comments concerning configuration of clustering algorithms in the
> Carrot2 framework, but also helpers that refer to our commercial
> clustering algorithm Lingo3G. They seem harmless to me, as in:
>
> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>
> I admit the reason I included these was not to promote the algorithm,
> but to limit the number of support requests we get where users are not
> sure how to modify Solr configuration to use Lingo3G out of the box...
>
> Is this something that is ok or does it bother anybody? If so, let me
> know and I will remove those two references from comments.
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



A reference to a commercial algorithm in comments - is this all right?

2013-09-08 Thread Dawid Weiss
As part of a recent commit I cleaned up the comments surrounding the
clustering extension in the Solr example. As part of this I added
comments concerning configuration of clustering algorithms in the
Carrot2 framework, but also helpers that refer to our commercial
clustering algorithm Lingo3G. They seem harmless to me, as in:

  
  <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

I admit the reason I included these was not to promote the algorithm,
but to limit the number of support requests we get where users are not
sure how to modify Solr configuration to use Lingo3G out of the box...

Is this something that is ok or does it bother anybody? If so, let me
know and I will remove those two references from comments.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5217) CachedSqlEntity fails with stored procedure

2013-09-08 Thread Hardik Upadhyay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761505#comment-13761505
 ] 

Hardik Upadhyay commented on SOLR-5217:
---

CachedSqlEntityProcessor should take into consideration WHERE clauses in the 
case of a SQL query, and the parameters passed in the case of a stored 
procedure.

> CachedSqlEntity fails with stored procedure
> ---
>
> Key: SOLR-5217
> URL: https://issues.apache.org/jira/browse/SOLR-5217
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Hardik Upadhyay
> Attachments: db-data-config.xml
>
>
> When using DIH with CachedSqlEntityProcessor and importing data from MS-sql 
> using stored procedures, it imports data for nested entities only once and 
> then every call with different arguments for nested entities is served only 
> from the cache. My db-data-config is attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5217) CachedSqlEntity fails with stored procedure

2013-09-08 Thread Hardik Upadhyay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761503#comment-13761503
 ] 

Hardik Upadhyay commented on SOLR-5217:
---

Yes, over the iteration on the parent entity, the child entity's parameterized 
stored procedure params are changing, but CachedSqlEntityProcessor returns the 
same result. Moreover, tracing DB calls reveals that those SPs are called only 
once during the DIH run.

> CachedSqlEntity fails with stored procedure
> ---
>
> Key: SOLR-5217
> URL: https://issues.apache.org/jira/browse/SOLR-5217
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Hardik Upadhyay
> Attachments: db-data-config.xml
>
>
> When using DIH with CachedSqlEntityProcessor and importing data from MS-sql 
> using stored procedures, it imports data for nested entities only once and 
> then every call with different arguments for nested entities is served only 
> from the cache. My db-data-config is attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761489#comment-13761489
 ] 

Jack Krupansky commented on SOLR-4816:
--

I was surprised to see that this "high priority" issue is still not committed 
for 4.5, although the actual Jira priority is still listed as "Minor".

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader, 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> // To get a more detailed response, downcast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



copy/paste typo in solr.cloud.Overseer.getShardNames exception

2013-09-08 Thread Jack Krupansky
In org.apache.solr.cloud.Overseer.getShardNames of branch_4x, the second 
exception message is an exact copy of the first, but probably should be 
something like "shards param must specify at least one shard":

static void getShardNames(List<String> shardNames, String shards) {
  if (shards == null)
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "shards" +
        " is a required param");
  for (String s : shards.split(",")) {
    if (s == null || s.trim().isEmpty()) continue;
    shardNames.add(s.trim());
  }
  if (shardNames.isEmpty())
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "shards" +
        " is a required param");
}
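
A corrected version along the lines suggested above (a sketch of the proposed 
wording, not committed code):

static void getShardNames(List<String> shardNames, String shards) {
  if (shards == null)
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "shards is a required param");
  for (String s : shards.split(",")) {
    if (s == null || s.trim().isEmpty()) continue;
    shardNames.add(s.trim());
  }
  if (shardNames.isEmpty())
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "shards param must specify at least one shard");
}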

-- Jack Krupansky

[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761475#comment-13761475
 ] 

Benson Margulies commented on LUCENE-5202:
--

OK, I see.

So I'll leave it to you to apply this patch to pick up the fix you made.

Thanks.

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761264#comment-13761264
 ] 

Michael McCandless commented on LUCENE-5202:


bq. There's a call to peekToken in nextToken used to detect the end of the 
input. When that gets called, a token 'moves' from the input to the positions, 
so the calls to peekToken in my code never see it.

OK I think I see.

So, your peekSentence has peek'd N tokens, up until it saw a '.' token.  Then, 
your incrementToken does nextToken() to get through those buffered tokens, 
tweaking atts before returning, but then on the first nextToken() after the 
lookahead buffer is exhausted, peekToken() is called directly from nextToken() 
and you have no chance to intercept that.

But note that this token doesn't actually move to positions (get buffered); it 
just "passes through", i.e. when nextToken returns the atts of that new token 
are "live" in the attributes and you could examine it "live".

Or, maybe, you could use a counter, incremented as you peek tokens in 
peekSentence, and then decremented as you nextToken() off the lookahead, and 
once that reaches 0 you peekSentence() again?  Or, maybe LookaheadTF should do 
this for you, e.g. provide a lookaheadCount saying how many tokens are in the 
lookahead buffer.
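
For example, a rough sketch of that counter idea in a subclass (sketch only; 
peekSentence() is the hypothetical method from this discussion, here assumed 
to return how many tokens it peek'd, and is not a Lucene API):

  private int lookaheadCount;

  @Override
  public boolean incrementToken() throws IOException {
    if (lookaheadCount == 0) {
      lookaheadCount = peekSentence(); // peekToken() up to the next '.', counting as it goes
    }
    if (!nextToken()) {
      return false;
    }
    lookaheadCount--; // one buffered token handed back to the consumer
    // the live token's attributes can be tweaked here before returning
    return true;
  }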

Net/net, it may be a lot easier to just make your own dedicated class :)  It 
would have direct control over the buffer, so you wouldn't have to deal with 
the confusing flow of LookaheadTF.


> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 374 - Failure

2013-09-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/374/

1 tests failed.
REGRESSION:  org.apache.lucene.index.Test2BPostings.test

Error Message:
GC overhead limit exceeded

Stack Trace:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
__randomizedtesting.SeedInfo.seed([3E09E7626DB890C6:B65DD8B8C344FD3E]:0)
at 
org.apache.lucene.document.Document.storedFieldsIterator(Document.java:306)
at org.apache.lucene.document.Document.access$100(Document.java:45)
at org.apache.lucene.document.Document$2.iterator(Document.java:300)
at 
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:194)
at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:254)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170)
at org.apache.lucene.index.Test2BPostings.test(Test2BPostings.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)




Build Log:
[...truncated 1108 lines...]
   [junit4] Suite: org.apache.lucene.index.Test2BPostings
   [junit4]   2> NOTE: download the large Jenkins line-docs file by running 
'ant get-jenkins-line-docs' in the lucene directory.
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=Test2BPostings 
-Dtests.method=test -Dtests.seed=3E09E7626DB890C6 -Dtests.multiplier=2 
-Dtests.nightly=true -Dtests.slow=true 
-Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=sk -Dtests.timezone=Europe/Vatican 
-Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   171s J0 | Test2BPostings.test <<<
   [junit4]> Throwable #1: java.lang.OutOfMemoryError: GC overhead limit 
exceeded
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([3E09E7626DB890C6:B65DD8B8C344FD3E]:0)
   [junit4]>at 
org.apache.lucene.document.Document.storedFieldsIterator(Document.java:306)
   [junit4]>at 
org.apache.lucene.document.Document.access$100(Document.java:45)
   [junit4]>at 
org.apache.lucene.document.Document$2.iterator(Document.java:300)
   [junit4]>at 
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:194)
   [junit4]>at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:254)
   [junit4]>at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
   [junit4]>at 
org.apache.lucene.index.IndexWriter.updateDocument(Inde

[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-09-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761260#comment-13761260
 ] 

ASF subversion and git services commented on SOLR-5201:
---

Commit 1520859 from [~teofili] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1520859 ]

SOLR-5201 - patch backported to branch_4x

> UIMAUpdateRequestProcessor should reuse the AnalysisEngine
> --
>
> Key: SOLR-5201
> URL: https://issues.apache.org/jira/browse/SOLR-5201
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - UIMA
>Affects Versions: 4.4
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, 
> SOLR-5201-ae-cache-only-single-request_branch_4x.patch
>
>
> As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
> UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
> which is bad for performance; therefore it'd be nice if such AEs could be 
> reused whenever possible.
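
A minimal sketch of that reuse idea (hypothetical; this is not the attached 
patch, and keying the cache by descriptor path is an assumption — uses 
org.apache.uima.UIMAFramework, AnalysisEngine, ResourceSpecifier, and 
XMLInputSource from the UIMA SDK, plus java.util.HashMap/Map):

  private static final Map<String, AnalysisEngine> CACHE =
      new HashMap<String, AnalysisEngine>();

  static synchronized AnalysisEngine getEngine(String descriptorPath) throws Exception {
    AnalysisEngine ae = CACHE.get(descriptorPath);
    if (ae == null) {
      // build the engine once per descriptor instead of once per request
      ResourceSpecifier spec = UIMAFramework.getXMLParser()
          .parseResourceSpecifier(new XMLInputSource(descriptorPath));
      ae = UIMAFramework.produceAnalysisEngine(spec);
      CACHE.put(descriptorPath, ae);
    }
    return ae; // an AE instance is not thread-safe, so callers must synchronize process() calls
  }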

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-09-08 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761259#comment-13761259
 ] 

Tommaso Teofili commented on SOLR-5201:
---

OK, good, thanks. I'll merge it to branch_4x too.

> UIMAUpdateRequestProcessor should reuse the AnalysisEngine
> --
>
> Key: SOLR-5201
> URL: https://issues.apache.org/jira/browse/SOLR-5201
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - UIMA
>Affects Versions: 4.4
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, 
> SOLR-5201-ae-cache-only-single-request_branch_4x.patch
>
>
> As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
> UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request 
> which is bad for performance therefore it'd be nice if such AEs could be 
> reused whenever that's possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761254#comment-13761254
 ] 

Benson Margulies commented on LUCENE-5202:
--

Yes, that's what I have and it works, except for the problem I wrote this test 
case to demonstrate. There's a call to peekToken in nextToken used to detect 
the end of the input. When that gets called, a token 'moves' from the input to 
the positions, so the calls to peekToken in my code never see it.

Either I'm supposed to call restoreState to examine it, or there's a problem 
here. If I'm supposed to call restoreState, I need to figure out how to notice 
(by looking at positions?) that I'm in that situation. Or there's some problem 
in my logic for deciding when to do my next load of peeks, so that nextToken is 
never supposed to reach that call to peek, but I can't figure out what it is.
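
(For reference, if restoreState does turn out to be the route, the generic 
AttributeSource shape is roughly this sketch:

  AttributeSource.State saved = captureState(); // snapshot the live token's attributes
  // ... later, when ready to examine or emit that token ...
  restoreState(saved);

though whether it applies here is exactly the open question.)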


> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761246#comment-13761246
 ] 

Michael McCandless commented on LUCENE-5202:


Oh, sorry, I see; I indeed thought you were trying to create new tokens (and 
changed the test to do so).

OK, so for your first case (just changing attrs based on looked-ahead tokens), 
afterPosition is not the right place to do that: this method is effectively 
called after the last token leaving the current position has been emitted, and 
before setting attrs to the state for the next token.  It's basically "between" 
tokens.

If you just want to change the att values, I think you should do that in your 
incrementToken, i.e. it would first call nextToken(), and if that returned 
true, it would then futz w/ the attrs and return true.  Would that work?
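
Roughly this shape (sketch; typeAtt is just a placeholder for whatever 
attribute your filter changes, e.g. obtained via addAttribute(TypeAttribute.class)):

  @Override
  public boolean incrementToken() throws IOException {
    if (!nextToken()) { // drains the lookahead buffer, then pulls from the input
      return false;
    }
    // the returned token's attributes are now live; adjust them in place
    typeAtt.setType("my-type"); // placeholder tweak
    return true;
  }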

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org