[jira] [Commented] (LUCENE-5021) NextDoc NPE safety when bulk collecting

2013-09-09 Thread Simon Endele (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761664#comment-13761664
 ] 

Simon Endele commented on LUCENE-5021:
--

I think what you originally searched for is this: SOLR-5020

> NextDoc NPE safety when bulk collecting
> ---
>
> Key: LUCENE-5021
> URL: https://issues.apache.org/jira/browse/LUCENE-5021
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/other
>Affects Versions: 3.6.2
> Environment: Any with custom filters
>Reporter: Alexis Torres Paderewski
>  Labels: NPE, Null-Safety, Scorer
>
> Hello,
> I would like to apply ACLs once as a PostFilter, so I need to batch this 
> call, since per-document round trips would severely hurt performance.
> I tried to stack the documents in the DelegatingCollector using this collect:
> @Override
> public void collect(int doc) throws IOException {
>   while ((doc = scorer.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>     docs.put(getDocumentId(doc), doc);
>   }
>   batchCollect();
> }
> Whether this works depends on the Scorer. It works when the Scorer is 
> "safe", that is, when it handles being called again after it has been 
> exhausted.
> This is the case for, e.g., DisjunctionMaxScorer and ConstantScorer:
> if (numScorers == 0) return doc = NO_MORE_DOCS; 
> DisjunctionSumScorer, on the other hand, either trips an assertion on 
> "NO_MORE_DOCS" or throws an NPE.
> Shouldn't we copy the DisjunctionMaxScorer mechanism to protect nextDoc of an 
> exhausted iterator, using either the current doc or the number of subScorers?
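
The guard asked for above could look roughly like the following. This is a minimal sketch in plain Java, not Lucene code: DocIterator, ArrayDocIterator and GuardedDocIterator are hypothetical names invented for illustration, and NO_MORE_DOCS is simply assumed to be Integer.MAX_VALUE. The wrapper remembers the last doc it returned and stops delegating once the wrapped iterator has reported exhaustion.

```java
import java.util.ArrayList;
import java.util.List;

final class ExhaustionGuardDemo {
    // Assumed sentinel for this sketch.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical minimal iterator contract (not Lucene's DocIdSetIterator). */
    interface DocIterator {
        int nextDoc();
    }

    /** An "unsafe" iterator: a second call after exhaustion throws an NPE. */
    static final class ArrayDocIterator implements DocIterator {
        private int[] docs;
        private int pos = 0;

        ArrayDocIterator(int... docs) {
            this.docs = docs;
        }

        @Override
        public int nextDoc() {
            if (pos == docs.length) { // NPE here once docs has been released
                docs = null;          // simulate freeing internal state
                return NO_MORE_DOCS;
            }
            return docs[pos++];
        }
    }

    /** Guard: once exhausted, keep returning NO_MORE_DOCS without delegating. */
    static final class GuardedDocIterator implements DocIterator {
        private final DocIterator in;
        private int doc = -1;

        GuardedDocIterator(DocIterator in) {
            this.in = in;
        }

        @Override
        public int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return NO_MORE_DOCS;
            }
            return doc = in.nextDoc();
        }
    }

    /** Collects every remaining doc ID from the iterator. */
    static List<Integer> drain(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        int doc;
        while ((doc = it.nextDoc()) != NO_MORE_DOCS) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        DocIterator it = new GuardedDocIterator(new ArrayDocIterator(3, 7, 9));
        System.out.println(drain(it));
        // A further call is harmless because the guard short-circuits.
        System.out.println(it.nextDoc() == NO_MORE_DOCS);
    }
}
```

The guard costs one integer comparison per call, which is the trade-off the question above is really about.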

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-09-09 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili resolved SOLR-5201.
---

Resolution: Fixed

> UIMAUpdateRequestProcessor should reuse the AnalysisEngine
> --
>
> Key: SOLR-5201
> URL: https://issues.apache.org/jira/browse/SOLR-5201
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - UIMA
>Affects Versions: 4.4
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, 
> SOLR-5201-ae-cache-only-single-request_branch_4x.patch
>
>
> As reported in http://markmail.org/thread/2psiyl4ukaejl4fx, 
> UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
> which is bad for performance. It would be nice if such AEs could be 
> reused whenever possible.
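
One way to picture the reuse described above is a small cache keyed by configuration. The sketch below is plain Java and does not use the UIMA or Solr APIs: Engine and EngineCache are hypothetical stand-ins, and the descriptor path is an invented example. Whether a shared engine may safely be used by concurrent requests is a separate question this sketch does not address.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class EngineCacheDemo {
    /** Hypothetical stand-in for an engine that is expensive to build. */
    static final class Engine {
        final String descriptorPath;

        Engine(String descriptorPath) {
            this.descriptorPath = descriptorPath;
        }

        String process(String text) {
            return descriptorPath + ":" + text.length();
        }
    }

    /** Builds at most one engine per descriptor path and reuses it afterwards. */
    static final class EngineCache {
        private final ConcurrentHashMap<String, Engine> engines = new ConcurrentHashMap<>();
        private final AtomicInteger created = new AtomicInteger();

        Engine get(String descriptorPath) {
            return engines.computeIfAbsent(descriptorPath, path -> {
                created.incrementAndGet(); // counts actual constructions
                return new Engine(path);
            });
        }

        int createdCount() {
            return created.get();
        }
    }

    public static void main(String[] args) {
        EngineCache cache = new EngineCache();
        for (int i = 0; i < 3; i++) {
            // Hypothetical descriptor path; every request reuses the same engine.
            cache.get("/path/to/descriptor.xml").process("request " + i);
        }
        System.out.println("engines created: " + cache.createdCount());
    }
}
```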




[jira] [Created] (SOLR-5222) Sorting on dynamic fields using DocValues sorts empty values always first

2013-09-09 Thread Pascal Chollet (JIRA)
Pascal Chollet created SOLR-5222:


 Summary: Sorting on dynamic fields using DocValues sorts empty 
values always first
 Key: SOLR-5222
 URL: https://issues.apache.org/jira/browse/SOLR-5222
 Project: Solr
  Issue Type: Bug
Reporter: Pascal Chollet
Priority: Minor


When using DocValues for sort fields, "sortMissingLast=true" does not seem to 
work, which makes sense as DocValues require a value for every document. The 
workaround is to use a default value that sorts alphanumerically last. But 
when the sort field is specified as a dynamic field, the default value is not 
applied when a document does not contain that field.
To make it work, I had to define every single sort field explicitly.
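
The difference between the default-value workaround and a true "missing last" sort can be shown in plain Java. This is only an illustration of the ordering, not Solr code: the sentinel "\uffff" is an assumed example of a default that sorts after ordinary strings, and null stands in for a document without the field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MissingLastDemo {
    // Assumed sentinel default: compares greater than ordinary strings.
    static final String SENTINEL = "\uffff";

    /** Workaround: substitute a default for missing values, then sort normally. */
    static List<String> sortWithDefault(List<String> values, String defaultValue) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.add(v == null ? defaultValue : v);
        }
        out.sort(Comparator.naturalOrder());
        return out;
    }

    /** What "missing last" means: absent values are ordered after present ones. */
    static List<String> sortMissingLast(List<String> values) {
        List<String> out = new ArrayList<>(values);
        out.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("b", null, "a");
        // An empty default sorts first, which is the behaviour reported above.
        System.out.println(sortWithDefault(values, ""));
        // A sentinel default pushes the missing value to the end.
        System.out.println(sortWithDefault(values, SENTINEL).indexOf(SENTINEL));
        System.out.println(sortMissingLast(values));
    }
}
```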





[jira] [Updated] (SOLR-5149) Query facet to respect mincount

2013-09-09 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-5149:


Attachment: SOLR-5149-trunk.patch

Updated patch for trunk.

> Query facet to respect mincount
> ---
>
> Key: SOLR-5149
> URL: https://issues.apache.org/jira/browse/SOLR-5149
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.4
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5149-trunk.patch, SOLR-5149-trunk.patch, 
> SOLR-5149-trunk.patch, SOLR-5149-trunk.patch
>
>





[jira] [Updated] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode

2013-09-09 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4478:
-

Assignee: (was: Erick Erickson)

> Allow cores to specify a named config set in non-SolrCloud mode
> ---
>
> Key: SOLR-4478
> URL: https://issues.apache.org/jira/browse/SOLR-4478
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.2, 5.0
>Reporter: Erick Erickson
> Attachments: SOLR-4478.patch, SOLR-4478.patch
>
>
> Part of moving forward to "the new way", after SOLR-4196 etc... I propose an 
> additional parameter specified on the  node in solr.xml or as a 
> parameter in the "discovery" mode core.properties file, call it configSet, 
> where the value provided is a path to a directory, either absolute or 
> relative. Really, this is as though you copied the conf directory somewhere 
> to be used by more than one core.
> Straw-man: There will be a directory /configsets which will be the 
> default. If the configSet parameter is, say, "myconf", then I'd expect a 
> directory named "myconf" to exist in /configsets, which would look 
> something like
> /configsets/myconf/schema.xml
>   solrconfig.xml
>   stopwords.txt
>   velocity
>   velocity/query.vm
> etc.
> If multiple cores used the same configSet, schema, solrconfig etc. would all 
> be shared (i.e. shareSchema="true" would be assumed). I don't see a good 
> use-case for _not_ sharing schemas, so I don't propose to allow this to be 
> turned off. Hmmm, what if shareSchema is explicitly set to false in the 
> solr.xml or properties file? I'd guess it should be honored but maybe log a 
> warning?
> Mostly I'm putting this up for comments. I know that there are already 
> thoughts about how this all should work floating around, so before I start 
> any work on this I thought I'd at least get an idea of whether this is the 
> way people are thinking about going.
> configSet can be either a relative or an absolute path; if relative, it's 
> assumed to be relative to .
> Thoughts?
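
The relative-versus-absolute rule proposed above can be sketched with java.nio.file. This is an illustration only, not Solr code: resolveConfigSet is a hypothetical helper, and because the default base directory is not spelled out in the text above, the base used here is an invented placeholder passed in by the caller.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ConfigSetPathDemo {
    /**
     * Returns configSet unchanged (normalized) if it is absolute,
     * otherwise resolves it against baseDir.
     */
    static Path resolveConfigSet(Path baseDir, String configSet) {
        Path p = Paths.get(configSet);
        Path resolved = p.isAbsolute() ? p : baseDir.resolve(p);
        return resolved.normalize();
    }

    public static void main(String[] args) {
        // Hypothetical base directory, chosen only for the example.
        Path base = Paths.get("/var/solr/configsets");
        System.out.println(resolveConfigSet(base, "myconf"));
        System.out.println(resolveConfigSet(base, "/etc/solr/shared-conf"));
    }
}
```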




[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761797#comment-13761797
 ] 

Joel Bernstein commented on SOLR-4816:
--

Awesome! Looks like javabin transport is part of this as well. My earlier tests 
showed this provided a large performance increase.

Also looks like you cleaned up the UpdateRequestExt, which is good. 

Hope to have a chance today to apply the patch and test things out.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField(id, "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField(id, "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> // Returns a backwards-compatible condensed response.
> NamedList response = cloudClient.request(request);
> // To get a more detailed response, downcast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;
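
The two enhancements described above, client-side routing and per-shard parallel execution, can be sketched in plain Java. This is not the patch's implementation: shardFor uses String.hashCode modulo the shard count purely as a stand-in for whatever routing the cluster actually uses, and sendInParallel "sends" a batch by counting its documents instead of issuing a request to a shard leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ShardRoutingDemo {
    /** Stand-in router (assumption): hash of the id modulo the shard count. */
    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    /** Groups document ids into one batch per target shard. */
    static Map<Integer, List<String>> groupByShard(List<String> ids, int numShards) {
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String id : ids) {
            batches.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return batches;
    }

    /** Submits each shard's batch as its own task and returns the total sent. */
    static int sendInParallel(Map<Integer, List<String>> batches) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : batches.values()) {
                // Placeholder for an update request to that shard's leader.
                Callable<Integer> task = batch::size;
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, List<String>> batches = groupByShard(List.of("0", "2", "7", "42"), 2);
        System.out.println(batches);
        System.out.println(sendInParallel(batches));
    }
}
```

Because each batch goes straight to the shard that owns it, no server has to forward documents, and the batches are independent enough to run concurrently.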




[jira] [Comment Edited] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761797#comment-13761797
 ] 

Joel Bernstein edited comment on SOLR-4816 at 9/9/13 12:32 PM:
---

Awesome! Looks like javabin transport is part of this as well. My earlier tests 
showed this provided a large performance increase.

Also looks like you cleaned up the UpdateRequestExt, which is good. 

Hope to have a chance today to apply the patch and test things out.

  was (Author: joel.bernstein):
Awesome! Looks like javabin transport is part of this well. My earlier 
tests showed this provided a large performance increase.

Also looks like you cleaned up the UpdateRequestExt, which is good. 

Hope to have a chance today to apply the patch and test things out.
  




[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-09-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761802#comment-13761802
 ] 

Joel Bernstein commented on SOLR-4465:
--

That collecting/mixing ticket is still to come. It is going to be similar to 
SOLR-5045, except you'll be able to plug in Rankers using the PostFilter 
mechanism. Still need to work out some of the details of this though.

> Configurable Collectors
> ---
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.1
>Reporter: Joel Bernstein
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch 
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new 
> collectorFactory element. For example:
> 
> 
> The elements above define two collector factories. The first one is the 
> "default" collectorFactory. The class attribute points to 
> org.apache.solr.handler.component.CollectorFactory, which implements logic 
> that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default 
> CollectorFactory and at a minimum override the getCollector method to return 
> your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default 
> collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs 
> collector. For example:
> cl.docs=
> The above param points to a named collectorFactory in the solrconfig.xml to 
> construct the collector. The docs collector factories must return a collector 
> that extends the TopDocsCollector base class. Docs collectors are responsible 
> for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For 
> example:
> cl.docs={! sort=mycustomesort}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default 
> collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the 
> "cl.analytic" parameter. Analytic collectors are designed to collect 
> something else besides the doclist. Typically this would be some type of 
> custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs 
> collectors, "sum" points to a named collectorFactory in the solrconfig.xml. 
> You can specify any number of analytic collectors by adding additional 
> cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend 
> DelegatingCollector. 
> A sample analytic collector is provided in the patch through the 
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups 
> by a field and sums a column of floats. The sum collector is not designed to 
> be a fully functional sum function but to be a proof of concept for pluggable 
> analytics through delegating collectors.
> You can send parameters to analytic collectors with Solr local params syntax.
> For example:
> cl.analytic={! id=1 groupby=field1 column=field2}sum
> The "id" parameter is mandatory for analytic collectors and is used to 
> identify the output from the collector. In this example the "groupby" and 
> "column" params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can 
> place maps with analytic output directly into the SolrQueryResponse with the 
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the 
> outgoing response. The response will include a list named cl.analytic., 
> where id is specified in the local param.
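
The group-and-sum behaviour described for the sample analytic collector can be sketched in plain Java. This is not the code from the patch and does not use Lucene's Collector API: SumCollector and its collect(groupValue, columnValue) signature are hypothetical, standing in for a delegating collector that reads the "groupby" and "column" field values of each matching document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SumByGroupDemo {
    /** Hypothetical collector: collect() is called once per matching document. */
    static final class SumCollector {
        private final Map<String, Double> sums = new LinkedHashMap<>();

        void collect(String groupValue, float columnValue) {
            // Add this document's value to the running sum of its group.
            sums.merge(groupValue, (double) columnValue, Double::sum);
        }

        /** The map an analytic collector would place into the response. */
        Map<String, Double> result() {
            return sums;
        }
    }

    public static void main(String[] args) {
        SumCollector sum = new SumCollector();
        sum.collect("a", 1.5f);
        sum.collect("b", 2.0f);
        sum.collect("a", 0.5f);
        System.out.println(sum.result());
    }
}
```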
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates 
> the results from each of the shards during distributed search. The "default" 
> CollectorFactory implements the default merge logic for merging documents 
> from each shard. If you define a different docs collector you can over

[jira] [Comment Edited] (SOLR-4465) Configurable Collectors

2013-09-09 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761802#comment-13761802
 ] 

Joel Bernstein edited comment on SOLR-4465 at 9/9/13 12:39 PM:
---

The collecting/mixing ticket is still to come. It is going to be similar to 
SOLR-5045, except you'll be able to plug in Rankers using the PostFilter 
mechanism. Still need to work out some of the details of this though.

  was (Author: joel.bernstein):
That collecting/mixing ticket is still to come. It is going to be similar 
to SOLR-5045, accept you'll be able to plugin Rankers using the PostFilter 
mechanism. Still need to work out some of the details of this though.
  

[jira] [Created] (SOLR-5223) SolrCloud should use JavaBin communication by default.

2013-09-09 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5223:
-

 Summary: SolrCloud should use JavaBin communication by default.
 Key: SOLR-5223
 URL: https://issues.apache.org/jira/browse/SOLR-5223
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.5, 5.0







[jira] [Updated] (SOLR-3249) Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3249:
--

Summary: Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.  
(was: Look into making CloudSolrServer and SolrCmdDistributor talk fully in 
JavaBin)

> Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.
> 
>
> Key: SOLR-3249
> URL: https://issues.apache.org/jira/browse/SOLR-3249
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
>





[jira] [Updated] (SOLR-3249) Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3249:
--

Fix Version/s: 5.0
   4.5

> Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.
> 
>
> Key: SOLR-3249
> URL: https://issues.apache.org/jira/browse/SOLR-3249
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
>





[jira] [Commented] (SOLR-3249) Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.

2013-09-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761838#comment-13761838
 ] 

Mark Miller commented on SOLR-3249:
---

Because it made sense to tackle destroying UpdateRequestExt in SOLR-4816, I did 
some work on this as part of the patch in SOLR-4816.

> Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.
> 
>
> Key: SOLR-3249
> URL: https://issues.apache.org/jira/browse/SOLR-3249
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
>





[jira] [Updated] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4816:
--

Attachment: SOLR-4816.patch

Here is a cleaned up patch. All tests are passing for me.

I've made some mostly minor changes as well as:

* Removed UpdateRequestExt and the Router workaround for it.
* Randomly enabled/disabled parallel updates in tests.
* Added SolrCloud javabin support, since it was in line with merging 
UpdateRequestExt into UpdateRequest.
* Enabled parallel updates by default. The more I have thought about this, the 
more I feel we should change this default. The minor back compat 
issue around it is not worth the slow default.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>




[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761891#comment-13761891
 ] 

Mark Miller commented on SOLR-4816:
---

The patch also has the work for SOLR-3249: "Allow CloudSolrServer and 
SolrCmdDistributor to use JavaBin", but it does not yet make it the default for 
CloudSolrServer or switch to it in the SolrCmdDistributor - I have made 
SOLR-5223 to track that change after this goes in.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField(id, "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField(id, "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> // Returns a backwards compatible condensed response.
> NamedList response = cloudClient.request(request);
> // To get a more detailed response, downcast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;




[jira] [Assigned] (SOLR-5224) SolrCmdDistributor flush functions should combine original request params

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5224:
-

Assignee: Mark Miller

> SolrCmdDistributor flush functions should combine original request params
> -
>
> Key: SOLR-5224
> URL: https://issues.apache.org/jira/browse/SOLR-5224
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3.1, 4.4
>Reporter: ludovic Boutros
>Assignee: Mark Miller
>
> The flush commands in the class SolrCmdDistributor do not combine original 
> request params into external update requests.
> The current code is:
> {code:title=SolrCmdDistributor.java|borderStyle=solid}
>   UpdateRequestExt ureq = new UpdateRequestExt();
>   
>   ModifiableSolrParams combinedParams = new ModifiableSolrParams();
>   
>   for (AddRequest aReq : alist) {
>     AddUpdateCommand cmd = aReq.cmd;
>     combinedParams.add(aReq.params);
>
>     ureq.add(cmd.solrDoc, cmd.commitWithin, cmd.overwrite);
>   }
>   
>   if (ureq.getParams() == null) ureq.setParams(new ModifiableSolrParams());
>   ureq.getParams().add(combinedParams);
> {code} 
> However, the params from the original request, cmd.getReq().getParams(), 
> should be combined as well so that they are available in, for instance, 
> custom update processors.




[jira] [Updated] (SOLR-5224) SolrCmdDistributor flush functions should combine original request params

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5224:
--

Fix Version/s: 5.0
   4.5

> SolrCmdDistributor flush functions should combine original request params
> -
>
> Key: SOLR-5224
> URL: https://issues.apache.org/jira/browse/SOLR-5224
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3.1, 4.4
>Reporter: ludovic Boutros
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
>
> The flush commands in the class SolrCmdDistributor do not combine original 
> request params into external update requests.
> The current code is:
> {code:title=SolrCmdDistributor.java|borderStyle=solid}
>   UpdateRequestExt ureq = new UpdateRequestExt();
>   
>   ModifiableSolrParams combinedParams = new ModifiableSolrParams();
>   
>   for (AddRequest aReq : alist) {
>     AddUpdateCommand cmd = aReq.cmd;
>     combinedParams.add(aReq.params);
>
>     ureq.add(cmd.solrDoc, cmd.commitWithin, cmd.overwrite);
>   }
>   
>   if (ureq.getParams() == null) ureq.setParams(new ModifiableSolrParams());
>   ureq.getParams().add(combinedParams);
> {code} 
> However, the params from the original request, cmd.getReq().getParams(), 
> should be combined as well so that they are available in, for instance, 
> custom update processors.




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761960#comment-13761960
 ] 

ASF subversion and git services commented on LUCENE-3069:
-

Commit 1521173 from [~billy] in branch 'dev/trunk'
[ https://svn.apache.org/r1521173 ]

LUCENE-3069: Lucene should have an entirely memory resident term dictionary

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.




[jira] [Created] (SOLR-5224) SolrCmdDistributor flush functions should combine original request params

2013-09-09 Thread ludovic Boutros (JIRA)
ludovic Boutros created SOLR-5224:
-

 Summary: SolrCmdDistributor flush functions should combine 
original request params
 Key: SOLR-5224
 URL: https://issues.apache.org/jira/browse/SOLR-5224
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.3.1
Reporter: ludovic Boutros


The flush commands in the class SolrCmdDistributor do not combine original 
request params into external update requests.

The current code is:

{code:title=SolrCmdDistributor.java|borderStyle=solid}
  UpdateRequestExt ureq = new UpdateRequestExt();
  
  ModifiableSolrParams combinedParams = new ModifiableSolrParams();
  
  for (AddRequest aReq : alist) {
    AddUpdateCommand cmd = aReq.cmd;
    combinedParams.add(aReq.params);

    ureq.add(cmd.solrDoc, cmd.commitWithin, cmd.overwrite);
  }
  
  if (ureq.getParams() == null) ureq.setParams(new ModifiableSolrParams());
  ureq.getParams().add(combinedParams);
{code} 

However, the params from the original request, cmd.getReq().getParams(), 
should be combined as well so that they are available in, for instance, 
custom update processors.
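
To make the request concrete, here is a minimal sketch of the merge I have in mind. It is not a patch: a plain multi-valued map stands in for ModifiableSolrParams, and ParamMergeSketch, addAll and combine are hypothetical names used only for illustration. The idea is simply that the original request's params are added next to the per-add params.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a multi-valued map plays the role of ModifiableSolrParams.
class ParamMergeSketch {

    // Appends every value in source to target, keeping values already present.
    static void addAll(Map<String, List<String>> target, Map<String, List<String>> source) {
        for (Map.Entry<String, List<String>> e : source.entrySet()) {
            target.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
    }

    // Combines the per-add params with the params of the original request.
    // Neither input map is modified.
    static Map<String, List<String>> combine(Map<String, List<String>> addParams,
                                             Map<String, List<String>> originalRequestParams) {
        Map<String, List<String>> combined = new LinkedHashMap<>();
        addAll(combined, addParams);
        addAll(combined, originalRequestParams);
        return combined;
    }

    public static void main(String[] args) {
        // Parameter names below are made up for the example.
        Map<String, List<String>> addParams = new LinkedHashMap<>();
        addParams.put("internal.flag", new ArrayList<>(Arrays.asList("true")));
        Map<String, List<String>> original = new LinkedHashMap<>();
        original.put("custom.processor.option", new ArrayList<>(Arrays.asList("x")));
        System.out.println(combine(addParams, original));
    }
}
```

With something along these lines, a custom update processor on the receiving node would still see the options the client originally sent.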







[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762005#comment-13762005
 ] 

ASF subversion and git services commented on LUCENE-5202:
-

Commit 1521183 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1521183 ]

LUCENE-5202: allow afterPosition() to insert a token at the end as well

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762011#comment-13762011
 ] 

Michael McCandless commented on LUCENE-5202:


bq. I suspect that there's something that LTF does that I don't need that 
explains why it is so complex.

I think it's trying to support arbitrary lookahead, and insertion of new 
tokens.  Sort of what a SynonymFilter would need.

But it's obviously not easy to use yet :)
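
As a rough illustration of what "arbitrary lookahead plus insertion" means, here is a small sketch. It is not LookaheadTokenFilter and shares no code with it; PeekBufferSketch and its methods are hypothetical names, and tokens are plain strings. The property it tries to show is that peeking only buffers tokens, so a later next() still returns them in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of a lookahead buffer; not Lucene code.
class PeekBufferSketch {
    private final Iterator<String> input;
    private final List<String> buffered = new ArrayList<>();

    PeekBufferSketch(Iterator<String> input) {
        this.input = input;
    }

    // Returns the token `ahead` positions past the next one (0 means the next
    // token) without consuming it, or null if the input ends first.
    String peek(int ahead) {
        while (buffered.size() <= ahead && input.hasNext()) {
            buffered.add(input.next());
        }
        return ahead < buffered.size() ? buffered.get(ahead) : null;
    }

    // Consumes and returns the next token, draining buffered tokens first.
    String next() {
        if (!buffered.isEmpty()) {
            return buffered.remove(0);
        }
        return input.hasNext() ? input.next() : null;
    }

    // Inserts a new token that will be returned before any remaining input.
    void insertNext(String token) {
        buffered.add(0, token);
    }

    public static void main(String[] args) {
        PeekBufferSketch tokens = new PeekBufferSketch(Arrays.asList("a", "b", "c").iterator());
        System.out.println(tokens.peek(1));
        System.out.println(tokens.next());
    }
}
```

The bug reported in the issue is the kind of thing that happens when the consuming path pulls from the input instead of the buffer, which this sketch avoids by always draining the buffer first.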

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.




[jira] [Resolved] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5202.


   Resolution: Fixed
Fix Version/s: 4.5
   5.0

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.




[jira] [Updated] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5215:
--

Attachment: SOLR-5215.patch

I don't think we actually really need that separate update lock at all. This 
patch removes it.
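
For anyone following along, the core of the problem in the report quoted below is that Object.wait() releases only the monitor it is called on, so any other lock taken inside stays held for the whole wait. A minimal sketch of that behaviour follows. It is not the ConnectionManager code: NestedLockWaitSketch is a hypothetical class, and a ReentrantLock stands in for connectionUpdateLock purely so the second thread can probe it with tryLock() instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; not the actual ConnectionManager code.
class NestedLockWaitSketch {
    private final Object manager = new Object();                  // plays the ConnectionManager monitor
    private final ReentrantLock updateLock = new ReentrantLock(); // plays connectionUpdateLock
    private boolean connected = false;                            // guarded by the manager monitor

    // Returns whether a second thread can take the update lock while the
    // first thread is parked in wait() on the manager monitor.
    boolean secondThreadCanTakeUpdateLock() {
        CountDownLatch aHoldsBothLocks = new CountDownLatch(1);
        Thread a = new Thread(() -> {
            synchronized (manager) {
                updateLock.lock();
                try {
                    aHoldsBothLocks.countDown();
                    while (!connected) {
                        manager.wait(); // releases the manager monitor only
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    updateLock.unlock();
                }
            }
        });
        a.start();
        try {
            aHoldsBothLocks.await();
            boolean acquired;
            // The calling thread plays thread B. It can enter this block only
            // after thread A has called wait() and given up the monitor.
            synchronized (manager) {
                acquired = updateLock.tryLock(); // a blocking acquire here would hang
                if (acquired) {
                    updateLock.unlock();
                }
                connected = true;   // let thread A leave its wait loop
                manager.notifyAll();
            }
            a.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("thread B could take the update lock: "
                + new NestedLockWaitSketch().secondThreadCanTakeUpdateLock());
    }
}
```

In the real code thread B blocks on the second lock instead of probing it, so it never gets to notify thread A. With only one lock involved, that cycle cannot form.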

> Deadlock in Solr Cloud ConnectionManager
> 
>
> Key: SOLR-5215
> URL: https://issues.apache.org/jira/browse/SOLR-5215
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.2.1
> Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>Reporter: Ricardo Merizalde
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5215.patch
>
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes 
> waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it 
> already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but 
> still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked 
> while holding the ConnectionManager lock, preventing thread A from getting 
> out of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
> nid=0x3e81 waiting for monitor entry [0x57169000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
> - waiting to lock <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
> nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
> nid=0x3d9a waiting for monitor entry [0x42821000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - locked <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a 
> java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0

Re: A reference to a commercial algorithm in comments - is this all right?

2013-09-09 Thread Dawid Weiss
Good point, Hoss. I'll think about it. There's actually something that
is a good candidate for such a reference --
http://carrot2.github.io/solr-integration-strategies/

Perhaps it'd be better to just point people there, where there are
code samples and appropriate instructions. Thanks,

Dawid


On Mon, Sep 9, 2013 at 7:26 PM, Chris Hostetter
 wrote:
>
> : Subject: A reference to a commercial algorithm in comments - is this all
> : right?
>
> I have no objections to the root of your concern: mentioning the
> commercial plugin, and how to activate it, in the solr config comments.
>
>
> In general though, I wonder if it would be simpler / more straightforward
> to not include either the "Currently available open source algorithms..." or
> "A commercial algorithm..." sections in the example solr config at all
> and instead just have a shorter comment with a pointer to a URL (either on
> the solr wiki, or carrot2.org) that lists the Algos and the FQN people
> should use to configure them.
>
> That way there's less risk that the comment gets stale because we forget
> to update it, or confuses someone when carrot2 adds a new algo that works
> fine with Solr X.Y, but Solr X.Y was released before that Algo was
> added, so it's not mentioned in the comment, etc...  All of that can be
> updated on whatever page the URL points to independent of the release.
>
>
>  
>  <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>
> ?
>
> -Hoss
>
>




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761994#comment-13761994
 ] 

ASF subversion and git services commented on LUCENE-5202:
-

Commit 1521182 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1521182 ]

LUCENE-5202: allow afterPosition() to insert a token at the end as well

> LookaheadTokenFilter consumes an extra token in nextToken
> -
>
> Key: LUCENE-5202
> URL: https://issues.apache.org/jira/browse/LUCENE-5202
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.3.1
>Reporter: Benson Margulies
> Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.




Re: A reference to a commercial algorithm in comments - is this all right?

2013-09-09 Thread Chris Hostetter

: Subject: A reference to a commercial algorithm in comments - is this all
: right?

I have no objections to the root of your concern: mentioning the
commercial plugin, and how to activate it, in the solr config comments.


In general though, I wonder if it would be simpler / more straightforward 
to not include either the "Currently available open source algorithms..." or 
"A commercial algorithm..." sections in the example solr config at all 
and instead just have a shorter comment with a pointer to a URL (either on 
the solr wiki, or carrot2.org) that lists the Algos and the FQN people 
should use to configure them.

That way there's less risk that the comment gets stale because we forget 
to update it, or confuses someone when carrot2 adds a new algo that works 
fine with Solr X.Y, but Solr X.Y was released before that Algo was 
added, so it's not mentioned in the comment, etc...  All of that can be 
updated on whatever page the URL points to independent of the release.


 
 <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

?

-Hoss




[jira] [Commented] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762138#comment-13762138
 ] 

ASF subversion and git services commented on SOLR-5215:
---

Commit 1521236 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1521236 ]

SOLR-5215: Fix possibility of deadlock in ZooKeeper ConnectionManager.

> Deadlock in Solr Cloud ConnectionManager
> 
>
> Key: SOLR-5215
> URL: https://issues.apache.org/jira/browse/SOLR-5215
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.2.1
> Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>Reporter: Ricardo Merizalde
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5215.patch
>
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes 
> waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it 
> already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but 
> still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked 
> while holding the ConnectionManager lock, preventing thread A from getting 
> out of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
> nid=0x3e81 waiting for monitor entry [0x57169000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
> - waiting to lock <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
> nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
> nid=0x3d9a waiting for monitor entry [0x42821000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - locked <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock mon

Re: A reference to a commercial algorithm in comments - is this all right?

2013-09-09 Thread Yonik Seeley
The title of this thread caught my eye...
"commercial" vs open source should not matter.
References / configs for any 3rd party projects (including other ASF
projects) should be judged solely on  their usefulness to Solr users.

-Yonik
http://lucidworks.com


On Mon, Sep 9, 2013 at 1:31 PM, Dawid Weiss
 wrote:
> Good point, Hoss. I'll think about it. There's actually something that
> is a good candidate for such a reference --
> http://carrot2.github.io/solr-integration-strategies/
>
> Perhaps it'd be better to just point people there, where there are
> code samples and appropriate instructions. Thanks,
>
> Dawid
>
>
> On Mon, Sep 9, 2013 at 7:26 PM, Chris Hostetter
>  wrote:
>>
>> : Subject: A reference to a commercial algorithm in comments - is this all
>> : right?
>>
>> I have no objections to the root of your concern: mentioning the
>> commercial plugin, and how to activate it, in the solr config comments.
>>
>>
>> In general though, I wonder if it would be simpler / more straightforward
>> to not include either the "Currently available open source algorithms..." or
>> "A commercial algorithm..." sections in the example solr config at all
>> and instead just have a shorter comment with a pointer to a URL (either on
>> the solr wiki, or carrot2.org) that lists the Algos and the FQN people
>> should use to configure them.
>>
>> That way there's less risk that the comment gets stale because we forget
>> to update it, or confuses someone when carrot2 adds a new algo that works
>> fine with Solr X.Y, but Solr X.Y was released before that Algo was
>> added, so it's not mentioned in the comment, etc...  All of that can be
>> updated on whatever page the URL points to independent of the release.
>>
>>
>>  
>>  <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>>
>> ?
>>
>> -Hoss
>>
>




[jira] [Commented] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762147#comment-13762147
 ] 

ASF subversion and git services commented on SOLR-5215:
---

Commit 1521239 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1521239 ]

SOLR-5215: Fix possibility of deadlock in ZooKeeper ConnectionManager.

> Deadlock in Solr Cloud ConnectionManager
> 
>
> Key: SOLR-5215
> URL: https://issues.apache.org/jira/browse/SOLR-5215
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.2.1
> Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>Reporter: Ricardo Merizalde
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5215.patch
>
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes 
> waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it 
> already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but 
> still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked 
> while holding the ConnectionManager lock, preventing thread A from getting 
> out of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
> nid=0x3e81 waiting for monitor entry [0x57169000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
> - waiting to lock <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
> nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
> nid=0x3d9a waiting for monitor entry [0x42821000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - locked <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waitin
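
The lock ordering described in the report above can be reproduced in miniature. The following is only a sketch, not the ConnectionManager code: the class name, the two plain lock objects and the "connected" flag are stand-ins invented for illustration. Thread A takes the manager monitor, then the update lock, then waits on the manager monitor; wait() gives up only the manager monitor, so thread B can enter it and then block on the update lock that A still holds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class NestedMonitorDeadlockSketch {

    /** Returns true once both threads are stuck in the BLOCKED state. */
    public static boolean reproduce() {
        final Object managerLock = new Object();  // stand-in for the ConnectionManager monitor
        final Object updateLock = new Object();   // stand-in for connectionUpdateLock
        final CountDownLatch aHoldsBoth = new CountDownLatch(1);
        final AtomicBoolean connected = new AtomicBoolean(false);

        Thread a = new Thread(() -> {
            synchronized (managerLock) {              // process(): take the manager monitor
                synchronized (updateLock) {           // update callback: take the update lock
                    synchronized (managerLock) {      // waitForConnected(): re-entrant acquire
                        aHoldsBoth.countDown();
                        while (!connected.get()) {
                            try {
                                // wait() releases managerLock only; updateLock stays held.
                                managerLock.wait(50);
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                return;
                            }
                        }
                    }
                }
            }
        }, "thread-A");

        Thread b = new Thread(() -> {
            try {
                aHoldsBoth.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (managerLock) {              // succeeds only while A is inside wait()
                synchronized (updateLock) {           // blocks forever: A still owns updateLock
                    connected.set(true);              // never reached
                    managerLock.notifyAll();
                }
            }
        }, "thread-B");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            // A is BLOCKED re-entering managerLock after its timed wait,
            // B is BLOCKED entering updateLock.
            if (a.getState() == Thread.State.BLOCKED && b.getState() == Thread.State.BLOCKED) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + reproduce());
    }
}
```

The sketch only shows why the two threads can never make progress; it says nothing about how the committed patch resolves it.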

[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762213#comment-13762213
 ] 

ASF subversion and git services commented on LUCENE-5197:
-

Commit 1521267 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1521267 ]

LUCENE-5197: Added SegmentReader.ramBytesUsed

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch, 
> LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (SOLR-5222) Sorting on dynamic fields using DocValues sorts empty values always first

2013-09-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-5222:
--

Assignee: Hoss Man

> Sorting on dynamic fields using DocValues sorts empty values always first
> -
>
> Key: SOLR-5222
> URL: https://issues.apache.org/jira/browse/SOLR-5222
> Project: Solr
>  Issue Type: Bug
>Reporter: Pascal Chollet
>Assignee: Hoss Man
>Priority: Minor
>
> When using DocValues for sort fields, "sortMissingLast=true" seems not to 
> work - which makes sense as DocValues require a value for every document. The 
> workaround is to use a default value which is alphanumerically sorted last. But 
> when specifying the sort field as a dynamic field, the default value is not 
> applied when a document does not contain that field.
> To make it work, I had to define every single sort field explicitly.




[jira] [Created] (SOLR-5225) Support the setting of key/values on Collections API RELOAD

2013-09-09 Thread Tim Vaillancourt (JIRA)
Tim Vaillancourt created SOLR-5225:
--

 Summary: Support the setting of key/values on Collections API 
RELOAD
 Key: SOLR-5225
 URL: https://issues.apache.org/jira/browse/SOLR-5225
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Tim Vaillancourt


I'd like to propose the support of being able to set "collection.=" 
on Collections API 'RELOAD' as well as CREATE (which is currently supported).

A user without this ability needs to edit their key/values through a different 
method (which feels inconsistent), if they wanted to change them post 
Collections-API-CREATE. There are some dangers introduced, however.

Here is the current description of this functionality on CREATE:

"collection.= - causes a property of = to be set if 
a new collection is being created."

@http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin




[jira] [Resolved] (SOLR-3247) LBHttpSolrServer constructor ignores passed in ResponseParser

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-3247.
---

Resolution: Duplicate

> LBHttpSolrServer constructor ignores passed in ResponseParser
> -
>
> Key: SOLR-3247
> URL: https://issues.apache.org/jira/browse/SOLR-3247
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Priority: Minor
>
> The constructor on line 191 accepts a ResponseParser object, but it ignores 
> it.  We should either drop that constructor or honor setting it.




[jira] [Commented] (SOLR-5222) Sorting on dynamic fields using DocValues sorts empty values always first

2013-09-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762205#comment-13762205
 ] 

Robert Muir commented on SOLR-5222:
---

Seems like the real bug is unrelated to docvalues: if you try to supply a 
default value for a dynamic field you should get an exception.

> Sorting on dynamic fields using DocValues sorts empty values always first
> -
>
> Key: SOLR-5222
> URL: https://issues.apache.org/jira/browse/SOLR-5222
> Project: Solr
>  Issue Type: Bug
>Reporter: Pascal Chollet
>Assignee: Hoss Man
>Priority: Minor
>
> When using DocValues for sort fields, "sortMissingLast=true" seems not to 
> work - which makes sense as DocValues require a value for every document. The 
> workaround is to use a default value which is alphanumerically sorted last. But 
> when specifying the sort field as a dynamic field, the default value is not 
> applied when a document does not contain that field.
> To make it work, I had to define every single sort field explicitly.




[jira] [Resolved] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5197.
-

   Resolution: Fixed
Fix Version/s: 4.5
   5.0

Thanks Areek!

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch, 
> LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.




[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762249#comment-13762249
 ] 

ASF subversion and git services commented on LUCENE-5197:
-

Commit 1521284 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1521284 ]

LUCENE-5197: Added SegmentReader.ramBytesUsed

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch, 
> LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.




[jira] [Created] (SOLR-5226) Add Lucene index heap size usage in the Solr admin UI

2013-09-09 Thread Areek Zillur (JIRA)
Areek Zillur created SOLR-5226:
--

 Summary: Add Lucene index heap size usage in the Solr admin UI
 Key: SOLR-5226
 URL: https://issues.apache.org/jira/browse/SOLR-5226
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Areek Zillur


Recently a method was implemented in Lucene to estimate the index heap usage
(https://issues.apache.org/jira/browse/LUCENE-5197). It would be very helpful 
to display this information in the admin UI.




[jira] [Updated] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2548:
---

Attachment: SOLR-2548_multithreaded_faceting,_dsmiley.patch

This issue just got on my radar; I like working on threading problems.

I commend the progress made but I think it can be improved:
# I think it's counter-intuitive that if a user supplies facet.threads=2 then 3 
cpu cores will actually be used (assuming >2 fields to facet on)
# Only the first facet.threads worth of facets are actually done concurrently; 
the rest are done serially.
# Even if the previous problem was solved, the use of the main calling thread 
to compute facets (beyond facet.threads) means that if by bad luck the main 
thread is computing the most intensive facets, the other threads 
will sit idle once they are done when it would be better to have remaining work 
queued up.
# In the event of an exception in one worker, the rest should be cancelled.
# ExecutionException is a wrapping exception; you should unwrap it and wrap 
its cause in a SolrException, not the ExecutionException itself.

The attached patch fixes all these problems, keeps it no more complex and 
perhaps simpler (IMO), and without increasing the lines-of-code count.
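
For readers without the patch at hand, the general shape of points 1-3 and 5 can be sketched with plain java.util.concurrent. This is not the attached patch; the class and method names are hypothetical, and a RuntimeException stands in for SolrException. A fixed pool of maxThreads workers does all the computing, surplus tasks wait in the pool's queue, and the cause of an ExecutionException is rethrown rather than the wrapper. Cancelling the remaining workers (point 4) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedFacetSketch {

    /** Runs every task on at most maxThreads workers; extra tasks queue up in the pool. */
    public static List<String> runAll(List<Callable<String>> tasks, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Callable<String> task : tasks) {
                // The calling thread only submits; it computes nothing itself.
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // Unwrap: report the underlying failure, not the wrapper.
                    Throwable cause = e.getCause();
                    throw new RuntimeException("facet task failed: " + cause.getMessage(), cause);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while waiting for facet tasks", e);
                }
            }
            return results;
        } finally {
            // shutdown() lets already submitted work finish and does not interrupt workers.
            pool.shutdown();
        }
    }
}
```

With maxThreads=2 and three tasks, two run at once and the third starts as soon as either worker is free, so exactly two worker threads are ever busy.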

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, 
> SOLR-2548.patch
>
>
> Add multithreading support for faceting.




[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762321#comment-13762321
 ] 

Robert Muir commented on SOLR-2548:
---

It's a bad idea to call Future.cancel here.

If any of the faceting methods are blocked on IO (e.g. docvalues faceting), 
this will close file descriptors with NIO/MMAP directory implementations: see 
the documentation in org.apache.lucene.store for more information.
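
The underlying JDK behaviour can be seen without Lucene. The sketch below is an illustration only, with a hypothetical class name and a temp file standing in for an index file. Future.cancel(true) interrupts the worker thread, and an interrupted thread that performs a blocking read on a FileChannel gets a ClosedByInterruptException and leaves the channel closed for everyone else who shares it. Here the interrupt is simulated on the current thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptClosesChannelSketch {

    /** Returns true if a read on an interrupted thread left the channel closed. */
    public static boolean channelClosedByInterrupt() {
        Path file = null;
        try {
            file = Files.createTempFile("interrupt-sketch", ".bin");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Simulate what cancel(true) does to a worker: set the interrupt flag.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(4));
                    return false; // the read is not expected to succeed
                } catch (ClosedByInterruptException expected) {
                    // The JDK closed the channel on our behalf.
                } finally {
                    Thread.interrupted(); // clear the flag so cleanup below is unaffected
                }
                return !channel.isOpen();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("channel closed by interrupt: " + channelClosedByInterrupt());
    }
}
```

A non-interrupting cancel, Future.cancel(false), only prevents tasks that have not started yet and does not touch running threads.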

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, 
> SOLR-2548.patch
>
>
> Add multithreading support for faceting.




TermsEnum and Trie Int

2013-09-09 Thread Joel Bernstein
Hi,

I wrote some code that iterates through the terms of an int field using a
TermsEnum, and it works great.

I tried the same code on a Trie Int field with precision step 8, and it no
longer works properly because of the extra precision step terms in the index.

How can I distinguish between the original terms and the precision step
terms while iterating through the TermsEnum?

Thanks,
Joel


Re: TermsEnum and Trie Int

2013-09-09 Thread Yonik Seeley
UnInvertedField does this...
final String prefix =
TrieField.getMainValuePrefix(searcher.getSchema().getFieldType(field));
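
As a rough illustration of what such a prefix check amounts to, here is a sketch that does not use the Lucene API at all. It assumes, as in Lucene 4.x NumericUtils, that every prefix-coded int term starts with a marker byte of 0x60 plus the shift, so full-precision terms (shift 0) start with 0x60. The class name, the constant name and the sample term bytes are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MainValueTermFilterSketch {

    // Assumption: marker byte for int terms is 0x60 + shift, so shift 0 gives 0x60.
    static final int ASSUMED_SHIFT_START_INT = 0x60;

    /** True if the term looks like a full-precision (shift 0) int term. */
    public static boolean isMainValueTerm(byte[] term) {
        return term.length > 0 && (term[0] & 0xFF) == ASSUMED_SHIFT_START_INT;
    }

    /** Keeps only the full-precision terms, dropping the precision step terms. */
    public static List<byte[]> keepMainValueTerms(List<byte[]> terms) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] term : terms) {
            if (isMainValueTerm(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<byte[]> terms = Arrays.asList(
            new byte[] {0x60, 0x08, 0x00, 0x00, 0x00, 0x01},  // shift 0: an original value
            new byte[] {0x68, 0x08, 0x00, 0x00, 0x00});       // shift 8: a precision step term
        System.out.println(keepMainValueTerms(terms).size() + " main value term(s)");
    }
}
```

Because terms are sorted by their bytes, the full-precision terms form one contiguous run at the start of a field's terms, so an iteration can also stop at the first term that fails the check.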

-Yonik
http://lucidworks.com


On Mon, Sep 9, 2013 at 5:35 PM, Joel Bernstein  wrote:
> Hi,
>
> I wrote some code that iterates through an int field terms using a TermsEnum
> and it works great.
>
> Tried the same code on a Trie Int, precision step 8 field and the code no
> longer works properly, because of the precision step terms in the index.
>
> How can I distinguish between the original term and the precision step terms
> while iterating through the TermsEnum?
>
> Thanks,
> Joel
>




[jira] [Updated] (SOLR-2646) Integrate Solr benchmarking support into the Benchmark module

2013-09-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2646:
--

Attachment: SOLR-2646.patch

I've attached a patch that is updated to trunk.

> Integrate Solr benchmarking support into the Benchmark module
> -
>
> Key: SOLR-2646
> URL: https://issues.apache.org/jira/browse/SOLR-2646
> Project: Solr
>  Issue Type: New Feature
>Reporter: Mark Miller
> Attachments: chart.jpg, Dev-SolrBenchmarkModule.pdf, SOLR-2646.patch, 
> SOLR-2646.patch, SOLR-2646.patch, SOLR-2646.patch, SOLR-2646.patch, 
> SOLR-2646.patch, SolrIndexingPerfHistory.pdf
>
>
> As part of my buzzwords Solr perf talk, I did some work to allow some Solr 
> benchmarking with the benchmark module.
> I'll attach a patch with the current work I've done soon - there is still a 
> fair amount to clean up and fix - a couple hacks or three - but it's already 
> fairly useful.




Re: TermsEnum and Trie Int

2013-09-09 Thread Joel Bernstein
thanks!


On Mon, Sep 9, 2013 at 6:17 PM, Yonik Seeley  wrote:

> UnInvertedField does this...
> final String prefix =
> TrieField.getMainValuePrefix(searcher.getSchema().getFieldType(field));
>
> -Yonik
> http://lucidworks.com
>
>
> On Mon, Sep 9, 2013 at 5:35 PM, Joel Bernstein  wrote:
> > Hi,
> >
> > I wrote some code that iterates through an int field terms using a
> TermsEnum
> > and it works great.
> >
> > Tried the same code on a Trie Int, precision step 8 field and the code no
> > longer works properly, because of the precision step terms in the index.
> >
> > How can I distinguish between the original term and the precision step
> terms
> > while iterating through the TermsEnum?
> >
> > Thanks,
> > Joel
> >
>
>


-- 
Joel Bernstein
Professional Services LucidWorks


[jira] [Commented] (SOLR-5222) Sorting on dynamic fields using DocValues sorts empty values always first

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762395#comment-13762395
 ] 

ASF subversion and git services commented on SOLR-5222:
---

Commit 1521304 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1521304 ]

SOLR-5222: test proving that dynamicField's using docValues work as expected 
with missing values

> Sorting on dynamic fields using DocValues sorts empty values always first
> -
>
> Key: SOLR-5222
> URL: https://issues.apache.org/jira/browse/SOLR-5222
> Project: Solr
>  Issue Type: Bug
>Reporter: Pascal Chollet
>Assignee: Hoss Man
>Priority: Minor
>
> When using DocValues for sort fields, "sortMissingLast=true" seems not to 
> work - which makes sense as DocValues require a value for every document. The 
> workaround is to use a default value which is alphanumerically sorted last. But 
> when specifying the sort field as a dynamic field, the default value is not 
> applied when a document does not contain that field.
> To make it work, I had to define every single sort field explicitly.




[jira] [Created] (SOLR-5227) attempting to configure a defaultValue on a dynamicField should fail.

2013-09-09 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5227:
--

 Summary: attempting to configure a defaultValue on a dynamicField 
should fail.
 Key: SOLR-5227
 URL: https://issues.apache.org/jira/browse/SOLR-5227
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man


In SOLR-5222 Pascal noted that he did not get the behavior expected when using 
sortMissingLast with a dynamicField using docValues in Solr < 4.5 -- but up to 
Solr 4.4, docValues required a default value, so he should have gotten a hard 
error as soon as he tried specifying a default value on a dynamicField.





[jira] [Resolved] (SOLR-5222) Sorting on dynamic fields using DocValues sorts empty values always first

2013-09-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5222.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5

I've committed my test refactoring/additions for DocValuesMissingTest to also 
cover the dynamic field variations...

Committed revision 1521304.
Committed revision 1521307.

bq. Seems like the real bug is unrelated to docvalues

Agreed -- my point was just that up to Solr 4.4, it should have been impossible 
to configure dynamicFields with docValues, because of the mutually exclusive 
requirements.  I've opened SOLR-5227 to fix the dynamicField+defaultValue error 
checking problem.



> Sorting on dynamic fields using DocValues sorts empty values always first
> -
>
> Key: SOLR-5222
> URL: https://issues.apache.org/jira/browse/SOLR-5222
> Project: Solr
>  Issue Type: Bug
>Reporter: Pascal Chollet
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 4.5, 5.0
>
>
> When using DocValues for sort fields, "sortMissingLast=true" seems not to 
> work - which makes sense as DocValues require a value for every document. The 
> workaround is to use a default value which is alphanumerically sorted last. But 
> when specifying the sort field as a dynamic field, the default value is not 
> applied when a document does not contain that field.
> To make it work, I had to define every single sort field explicitly.




[jira] [Commented] (SOLR-5222) Sorting on dynamic fields using DocValues sorts empty values always first

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762400#comment-13762400
 ] 

ASF subversion and git services commented on SOLR-5222:
---

Commit 1521307 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1521307 ]

SOLR-5222: test proving that dynamicField's using docValues work as expected 
with missing values (merge r1521304)

> Sorting on dynamic fields using DocValues sorts empty values always first
> -
>
> Key: SOLR-5222
> URL: https://issues.apache.org/jira/browse/SOLR-5222
> Project: Solr
>  Issue Type: Bug
>Reporter: Pascal Chollet
>Assignee: Hoss Man
>Priority: Minor
>
> When using DocValues for sort fields, "sortMissingLast=true" seems not to 
> work - which makes sense as DocValues require a value for every document. The 
> workaround is to use a default value which is alphanumerically sorted last. But 
> when specifying the sort field as a dynamic field, the default value is not 
> applied when a document does not contain that field.
> To make it work, I had to define every single sort field explicitly.




[jira] [Created] (SOLR-5228) Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>

2013-09-09 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5228:
--

 Summary: Don't require <field> or <dynamicField> be inside of 
<fields> -- or that <fieldType> be inside of <types>
 Key: SOLR-5228
 URL: https://issues.apache.org/jira/browse/SOLR-5228
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Hoss Man
Assignee: Hoss Man


On the solr-user mailing list, Nutan recently mentioned spending days trying to 
track down a problem that turned out to be because he had attempted to add a 
{{}} that was outside of the {{}} block in his 
schema.xml -- Solr was just silently ignoring it.

We have made improvements in other areas of config validation by generating 
startup errors when tags/attributes are found that are not expected -- but in 
this case I think we should just stop expecting/requiring that the {{<fields>}} 
and {{<types>}} tags will be used to group these sorts of things.  I think 
schema.xml parsing should just start ignoring them and only care about finding 
the {{<field>}}, {{<dynamicField>}}, and {{<fieldType>}} tags wherever they may 
be.

If people want to keep using them, fine.  If people want to mix fieldTypes and 
fields side by side (perhaps specify a fieldType, then list all the fields 
using it) fine.  I don't see any value in forcing people to use them, but we 
definitely shouldn't leave things the way they are with otherwise perfectly 
valid field/type declarations being silently ignored.
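
To make the "find the declarations wherever they are" idea concrete, here is a small sketch using only the JDK's DOM parser. It is not Solr's schema loading code; the class name and the sample XML in the usage note are hypothetical. Document.getElementsByTagName matches descendants at any depth, so a field or fieldType declaration is found whether or not it sits inside a grouping element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SchemaTagScanSketch {

    /** Counts elements with the given tag name anywhere in the document. */
    public static int countTags(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Matches every descendant with this exact tag name, at any nesting level.
            return doc.getElementsByTagName(tagName).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<schema>"
            + "<fields><field name=\"id\"/></fields>"
            + "<field name=\"title\"/>"
            + "<fieldType name=\"string\"/>"
            + "</schema>";
        System.out.println("field declarations found: " + countTags(xml, "field"));
    }
}
```

In the sample, the "title" field sits outside the grouping element and is still counted, which is the behaviour being asked for.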

---

I'll take this on unless I see any objections.






[jira] [Updated] (SOLR-5227) attempting to configure a dynamicField as required, or using a default value, should fail.

2013-09-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5227:
---

Summary: attempting to configure a dynamicField as required, or using a 
default value, should fail.  (was: attempting to configure a defaultValue on a 
dynamicField should fail.)

Looking at the code, I realize we have the same potential problem if someone 
tries to make a dynamicField "required"
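
The kind of check being proposed might look roughly like the sketch below. It is not the Solr implementation; the class, the method and the use of a plain attribute map are hypothetical, and IllegalArgumentException stands in for whatever error Solr would raise at startup. The only things taken from schema.xml are the attribute names "default" and "required".

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicFieldCheckSketch {

    /** Rejects dynamicField declarations that carry a default value or are marked required. */
    public static void validate(String pattern, Map<String, String> attrs) {
        if (attrs.containsKey("default")) {
            throw new IllegalArgumentException(
                "dynamicField '" + pattern + "' can not have a default value");
        }
        if ("true".equals(attrs.get("required"))) {
            throw new IllegalArgumentException(
                "dynamicField '" + pattern + "' can not be required");
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("type", "string");
        attrs.put("default", "zzz");
        try {
            validate("*_sort", attrs);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing hard at load time means a schema like the one in SOLR-5222 is rejected up front instead of silently not applying the default.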

> attempting to configured a dynamicField as required, or using a default 
> value, should fail.
> ---
>
> Key: SOLR-5227
> URL: https://issues.apache.org/jira/browse/SOLR-5227
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> In SOLR-5222 Pascal noted that he did not get the behavior expected when 
> using sortMissingLast with a dynamicField using docValues in Solr < 4.5 -- 
> but up to Solr 4.4, docValues required a default value, so he should have 
> gotten a hard error as soon as he tried specifying a default value on a 
> dynamicField.




[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762425#comment-13762425
 ] 

Erick Erickson commented on SOLR-2548:
--

[~dsmiley]

See below...

I'm not seeing points 1-3. I think you might be missing the distinction between 
adding fields to the pending queue and actually doing the faceting:

(1) I don't think so. If facet.threads == 2, the third time around the counter 
is -1 so the field gets added to the pending structure, it's not executed on at 
all until one of the other threads completes.

(2) I'm not seeing it. Every time a task completes, another is started from the 
pending list. The main thread is just sitting around waiting for the child 
threads to complete. Mostly this is for my edification, I have no objection to 
the semaphore approach. In fact it's a little cleaner, the second "for (String 
f : facetFs) {" loop is somewhat loosely coupled.

(3) Not quite sure about this either. I don't see where the main thread is used 
to compute any facets. Well, except in the intentionally serial case when the 
directExecutor is used and the old behavior is desired. Items are just added to 
the pending queue once you exceed facet.threads. That queue is consumed to 
submit other tasks to new threads via 
"completionService.submit(pending.removeFirst());" in the second loop. The main 
thread never computes facets. Or I'm just blind to it.

(4) That makes sense, although I'll defer to Robert.

(5) OK. I did have some trouble in the tests though, some of them were 
expecting 400 response code and the SERVER_ERROR is 500 as I remember so don't 
be surprised if there's an issue there when you run the full test suite if you 
haven't already. I made some effort to give back the same errors as the tests 
expected which may account for some of the weirdness you saw in the exception 
handling.

You'll notice I punted on Adrien's comment "Is there any reason why you didn't 
make facet queries and facet ranges multi-threaded"... feel free ;).
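
For readers following along, the pending-queue pattern described in points (1)-(3) can be sketched roughly as below. This is only an illustration, not the code from the patch: the class name PendingQueueSketch, the facetOn stand-in and the maxThreads argument are made up for the example, and a value below 1 is bumped to 1 so the waiting loop cannot hang.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PendingQueueSketch {

    /** Stand-in for the per-field faceting work; hypothetical, not Solr code. */
    static String facetOn(String field) {
        return field + ":counted";
    }

    /** Runs one task per field, keeping at most maxThreads tasks in flight. */
    public static List<String> facetAll(List<String> fields, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newCachedThreadPool();
        CompletionService<String> completionService =
                new ExecutorCompletionService<>(executor);
        Deque<Callable<String>> pending = new ArrayDeque<>();
        List<String> results = new ArrayList<>();
        try {
            // First loop: submit up to maxThreads tasks, park the rest as pending.
            int slots = Math.max(1, maxThreads); // assumption: always allow one task
            for (String f : fields) {
                Callable<String> task = () -> facetOn(f);
                if (slots-- > 0) {
                    completionService.submit(task);
                } else {
                    pending.addLast(task);
                }
            }
            // Second loop: the caller only waits. Each completed task frees a
            // slot, which is immediately given to the next pending task.
            for (int i = 0; i < fields.size(); i++) {
                results.add(completionService.take().get());
                if (!pending.isEmpty()) {
                    completionService.submit(pending.removeFirst());
                }
            }
        } finally {
            executor.shutdown();
        }
        return results;
    }
}
```

With five fields and maxThreads == 2, two tasks start right away and the other three are submitted one by one as earlier tasks finish, so the calling thread never computes a facet itself. Results come back in completion order, not in field order.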

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, 
> SOLR-2548.patch
>
>
> Add multithreading support for faceting.




[jira] [Commented] (SOLR-5228) Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>

2013-09-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762455#comment-13762455
 ] 

Robert Muir commented on SOLR-5228:
---

I think it's annoying that fields and fieldTypes have to be in separate sections 
too. This makes it hard to logically arrange things in a way that 
is readable without lots of scrolling up and down and getting lost.

Can we just go the simple route of deprecating 'fields' and 'types' in 4.x 
(throw an error in 5.x), and in 4.x also allow field/fieldType to be "top-level" 
in the schema?

I think this is ultimately simpler than just willy-nilly allowing shit to be 
nested underneath anywhere: that's hard to maintain. It still allows people 
who want to group types/fields together to do that, and those who want to put 
them side-by-side to do that too.

> Don't require <field> or <dynamicField> be inside of <fields> -- or that 
> <fieldType> be inside of <types>
> -
>
> Key: SOLR-5228
> URL: https://issues.apache.org/jira/browse/SOLR-5228
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> On the solr-user mailing list, Nutan recently mentioned spending days trying 
> to track down a problem that turned out to be because he had attempted to add 
> a {{}} that was outside of the {{}} block in his 
> schema.xml -- Solr was just silently ignoring it.
> We have made improvements in other areas of config validation by generating 
> startup errors when tags/attributes are found that are not expected -- but in 
> this case i think we should just stop expecting/requiring that the 
> {{<fields>}} and {{<types>}} tags will be used to group these sorts of 
> things.  I think schema.xml parsing should just start ignoring them and only 
> care about finding the {{<field>}}, {{<dynamicField>}}, and {{<fieldType>}} 
> tags wherever they may be.
> If people want to keep using them, fine.  If people want to mix fieldTypes 
> and fields side by side (perhaps specify a fieldType, then list all the 
> fields using it) fine.  I don't see any value in forcing people to use them, 
> but we definitely shouldn't leave things the way they are with otherwise 
> perfectly valid field/type declarations being silently ignored.
> ---
> I'll take this on unless i see any objections.




[jira] [Commented] (SOLR-5228) Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>

2013-09-09 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762497#comment-13762497
 ] 

Erick Erickson commented on SOLR-5228:
--

Not very far below where fieldType and fields are parsed out with a path that 
includes <types> or <fields>, there's this bit for copyField:

  expression = "//" + COPY_FIELD;
  nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);

  for (int i=0; i tags inside the 
 tag and it worked which surprised me at the time 

Seems like the model we could use; we wouldn't even need to formally deprecate 
the <types> or <fields> tags, just comment that they were no longer necessary.

FWIW
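
To make the "//" idea concrete, here is a small standalone sketch using only the JDK's XPath API. It is not the schema-parsing code itself; the class name AnywhereXPathSketch, the namesOf helper and the sample XML are invented for illustration. The only point is that an expression starting with "//" matches an element at any depth, whether or not it sits inside a grouping tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnywhereXPathSketch {

    /** Returns the "name" attribute of every element with the given tag, at any depth. */
    public static List<String> namesOf(String xml, String tag) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//tag" selects matching elements wherever they appear in the document.
        String expression = "//" + tag;
        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```

For a made-up document such as `<schema><fields><field name="id"/></fields><field name="title"/></schema>`, calling namesOf(xml, "field") finds both "id" and "title", even though only one of them is nested inside the grouping element.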





[jira] [Commented] (SOLR-5228) Don't require <field> or <dynamicField> be inside of <fields> -- or that <fieldType> be inside of <types>

2013-09-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762508#comment-13762508
 ] 

Robert Muir commented on SOLR-5228:
---

That's the willy-nilly approach I mentioned: I don't like it.

If we are gonna do that, there's no point in using xml at all; we get no value 
from it, only horrors.

The problem here is not field/dynamicField elements and "where they can be", 
the problem is the fieldType/types elements: they are useless and bring no 
value. Let's get rid of them.





[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762537#comment-13762537
 ] 

Yonik Seeley commented on SOLR-2548:


bq. Only the first facet.threads worth of facets are actually done 
concurrently; the rest are done serially.

I remember that being my initial reaction too - but then when you think a 
little about it, you realize that it's not the case.





[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762543#comment-13762543
 ] 

Yonik Seeley commented on SOLR-2548:


bq. in the event of an exception in one worker; the rest should be cancelled

In addition to Robert's comment that points out why we never want to use cancel 
on anything that does IO, we shouldn't add complexity trying to optimize an 
error case.
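
As a side note on why cancelling IO-bound work is risky: a cancel that interrupts the worker thread can cause the JDK to close an interruptible channel that thread is reading from. The sketch below shows that JDK behaviour in isolation, under the assumption that the worker reads through a FileChannel; it sets the interrupt flag on the current thread to stand in for a cancel, and the class name and temp file are made up for the example. It does not use any Lucene directory classes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptClosesChannelSketch {

    /** Returns true if reading with the interrupt flag set left the channel closed. */
    public static boolean channelClosedByInterrupt() throws IOException {
        Path file = Files.createTempFile("interrupt-sketch", ".bin");
        Files.write(file, new byte[] {1, 2, 3, 4});
        FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
        try {
            // Simulate what an interrupting cancel does to a worker thread.
            Thread.currentThread().interrupt();
            try {
                channel.read(ByteBuffer.allocate(4));
            } catch (ClosedByInterruptException expected) {
                // The channel was closed as a side effect of the interrupt.
            }
            return !channel.isOpen();
        } finally {
            Thread.interrupted(); // clear the flag so later code is unaffected
            channel.close();      // no-op if the channel is already closed
            Files.deleteIfExists(file);
        }
    }
}
```

The closed channel stays closed for every other thread that shares it, which is why one interrupted task can break unrelated readers.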

bq. ExecutionException is a wrapping exception; you should unwrap it and wrap 
SolrException on its contents, not the ExecutionException itself.

We should definitely strive to make the multi-threading as transparent as 
possible (i.e. exceptions should be as close as possible to the non-threaded 
case).
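
A minimal sketch of the unwrapping being discussed, assuming a task run through a plain ExecutorService: the caller catches ExecutionException and rethrows its cause, so it sees the same exception type it would have seen without threads. The class and method names are hypothetical, and the step of wrapping the cause in a SolrException is left out so the example needs nothing beyond the JDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UnwrapSketch {

    /** Runs the task on another thread but surfaces failures as if it had run inline. */
    public static <T> T callTransparently(Callable<T> task) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            return future.get();
        } catch (ExecutionException e) {
            // ExecutionException is only a wrapper; the real failure is its cause.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw e; // unusual Throwable subtype: keep the wrapper
        } finally {
            executor.shutdown();
        }
    }
}
```

A task that throws IllegalStateException("bad facet field") then reaches the caller as that same IllegalStateException rather than as an ExecutionException.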





[jira] [Updated] (SOLR-5227) attempting to configured a dynamicField as required, or using a default value, should fail.

2013-09-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5227:
---

Attachment: SOLR-5227.patch

patch with tests.

I'll include some details in the upgrading section of CHANGES.txt when 
committing.

> attempting to configured a dynamicField as required, or using a default 
> value, should fail.
> ---
>
> Key: SOLR-5227
> URL: https://issues.apache.org/jira/browse/SOLR-5227
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: SOLR-5227.patch
>
>
> In SOLR-5222 Pascal noted that he did not get the behavior expected when 
> using sortMissingLast with a dynamicField using docValues in Solr < 4.5 -- 
> but up to Solr 4.4, docValues required a default value, so he should have 
> gotten a hard error as soon as he tried specifying a default value on a 
> dynamicField.




[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762630#comment-13762630
 ] 

Robert Muir commented on SOLR-2548:
---

Just as a (likely controversial) suggestion in general here, it's hard to "see" 
the single-threaded case (which is the most common case).

I think it's a little too sneaky here, and it would actually be a lot easier 
long-term if the single-threaded case was explicitly separate from the 
multi-threaded one.
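
One way to picture the separation is sketched below. It is only an illustration of the shape, not a proposal for the actual patch: the names are invented, facetOn stands in for the real per-field work, and treating a `threads` value below 1 as "no extra threads" is an assumption of the sketch, not a statement about how facet.threads is defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExplicitSerialSketch {

    /** Stand-in for the per-field faceting work; hypothetical, not Solr code. */
    static String facetOn(String field) {
        return field + ":counted";
    }

    /** Entry point: the serial and threaded paths are visibly separate. */
    public static List<String> facetAll(List<String> fields, int threads)
            throws InterruptedException, ExecutionException {
        return threads < 1 ? facetSerially(fields) : facetConcurrently(fields, threads);
    }

    /** The common case: a plain loop on the calling thread, no executor involved. */
    static List<String> facetSerially(List<String> fields) {
        List<String> results = new ArrayList<>();
        for (String f : fields) {
            results.add(facetOn(f));
        }
        return results;
    }

    /** The threaded case: a fixed pool bounds how many fields run at once. */
    static List<String> facetConcurrently(List<String> fields, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String f : fields) {
                tasks.add(() -> facetOn(f));
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

The cost is a little duplication; the gain is that someone reading the serial path never has to reason about executors at all.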





[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762654#comment-13762654
 ] 

Erick Erickson commented on SOLR-2548:
--

bq: Just as a (likely controversial) suggestion in general here, its hard to 
"see" the single-threaded case (which is the most common case).

No, not controversial at all. I had to look at that pretty hard to see that it 
was a single-threaded case. I tried to add a comment, mostly so I wouldn't have 
to figure it out again next time I was in that code ;)

I'm all in favor of a little more verbosity here, I just didn't do it...





Can we use TREC data set in open source?

2013-09-09 Thread Han Jiang
Back in 2007 Grant contacted NIST about making the TREC collections
available to our community:

http://mail-archives.apache.org/mod_mbox/lucene-dev/200708.mbox/browser

I think trying this is really important for our project and for people who
use Lucene. All these years, speed has mainly been tuned on Wikipedia, but
that is not a very 'standard' collection:

* it doesn't represent how real-world search works;
* it cannot be used to evaluate the relevance of our scoring models;
* researchers tend to run experiments on other data sets, so it is usually
  hard to know whether Lucene is performing at its best.

And personally I agree with this line:

> I think it would encourage Lucene users/developers to think about
> relevance as much as we think about speed.

There's been much work to make Lucene's scoring models pluggable in 4.0,
and it would be great if we could explore that further. It is very appealing
to see a high-performance library work with state-of-the-art ranking
methods.


As for the TREC data sets, the problems we face are:

1. NIST/TREC does not own the original collections, so it might be
   necessary to contact the organizations that actually do, such as:

   http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html
   http://lemurproject.org/clueweb12/

2. Currently, there is no open-source license for any of the data sets, so
   they won't be as 'open' as Wikipedia is.

   As Grant proposed, one possibility is to make the data set accessible
   only to committers instead of all users. That is not very open-source,
   but TREC data sets are public and usually available to researchers, so
   people can still reproduce performance tests.

I'm quite curious: has anyone explored getting an open-source license for
one of those data sets? And is our community still interested in this
issue after all these years?



-- 
Han Jiang

Team of Search Engine and Web Mining,
School of Electronic Engineering and Computer Science,
Peking University, China


Building a codec with terms of custom Comparator

2013-09-09 Thread John Wang
Hi guys:

   In the codec api, it seems you can set term order via an arbitrary
Comparator.

   I tried to use this to create a term dictionary of an order dictated by
my own Comparator.

   The problem arises when building the FST. Specifically, in
BlockTreeTermsWriter.finishTerm(), the ordering decided earlier by
the Comparator (from the codec) is lost, and it fails because the terms
arrive out of order.

   Any ideas on how to fix this?

Thanks

-John


Re: Building a codec with terms of custom Comparator

2013-09-09 Thread Robert Muir
You can implement your own term dictionary with a different order, but
BlockTreeTermsReader doesn't support this (its terms must be in binary
order).
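
To illustrate the mismatch, here is a plain-Java sketch with no Lucene classes involved. It takes "binary order" to mean comparing the terms' UTF-8 bytes as unsigned values, and it uses a case-insensitive comparator as an example of a custom order; the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TermOrderSketch {

    /** Compares the UTF-8 bytes of two terms as unsigned values ("binary order"). */
    public static int compareBinary(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    /** True if no term sorts before its predecessor in binary order. */
    public static boolean isInBinaryOrder(List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (compareBinary(terms.get(i - 1), terms.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns a sorted copy of the terms under the given comparator. */
    public static List<String> sortedBy(List<String> terms, Comparator<String> order) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(order);
        return copy;
    }
}
```

Sorting "cherry", "Banana", "apple" with String.CASE_INSENSITIVE_ORDER gives apple, Banana, cherry, which fails isInBinaryOrder because the byte for 'B' is smaller than the byte for 'a'. A writer that assumes binary order would see "Banana" arrive after "apple" and treat it as out of order.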

On Mon, Sep 9, 2013 at 8:27 PM, John Wang wrote:
> Hi guys:
>
>In the codec api, it seems you can set term order via an arbitrary
> Comparator.
>
>I tried to use this to create a term dictionary of an order dictated by
> my own Comparator.
>
>The problem arises when building the FST. Specifically
> BlockTreeTermsWriter.finishTerm() <- here the ordering decided earlier by
> the Comparator (from codec) is lost, and errors due to out of
> order.
>
>Any ideas on how to fix this?
>
> Thanks
>
> -John




Re: Can we use TREC data set in open source?

2013-09-09 Thread Shai Erera
I read here http://lemurproject.org/clueweb09/ that there is a hosted
version of ClueWeb09 (the latest is ClueWeb12, for which I can't find a
hosted version). To get access to it, someone from the ASF will need to
sign an Organizational Agreement with them, and each individual in
the project will need to sign an Individual Agreement (retained by the
ASF). Perhaps this can be available only to committers.

Though, we need to get access to ClueWeb12 if we want to publish Lucene
results on the latest data set. TREC papers are already based on that
version.

But if we just want to measure performance, relevancy etc., ClueWeb09 could
be a good start.

Shai

On Tue, Sep 10, 2013 at 5:53 AM, Han Jiang wrote:

> Back in 2007 Grant contacted with NIST about making TREC collection
> available to our community:
>
> http://mail-archives.apache.org/mod_mbox/lucene-dev/200708.mbox/browser
>
> I think a try for this is really important to our project and people who
> use Lucene. All these years the speed performance is mainly tuned on
> Wikipedia, however it's not very 'standard':
>
> * it doesn't represent how real-world search works;
> * it cannot be used to evaluate the relevance of our scoring models;
> * researchers tend to do experiments on other data sets, and usually it is
>   hard to know whether Lucene performs its best performance;
>
> And personally I agree with this line:
>
> > I think it would encourage Lucene users/developers to think about
> > relevance as much as we think about speed.
>
> There's been much work to make Lucene's scoring models pluggable in 4.0,
> and it'll be great if we can explore more about it. It is very appealing
> to
> see a high-performance library work along with state-of-the-art ranking
> methods.
>
>
> And about TREC data set, the problems we met are:
>
> 1. NIST/TREC does not own the original collections, therefore it might be
>necessary to have direct contact with those organizations who really
> did,
>such as:
>
>http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html
>http://lemurproject.org/clueweb12/
>
> 2. Currently, there is no open-source license for any of the data sets, so
>it won't be as 'open' as Wikipedia is.
>
>As is proposed by Grant, a possibility is to make the data set
> accessible
>only to committers instead of all users. It is not very open-source
> then,
>but TREC data sets is public and usually available to researchers, so
>people can still reproduce performance test.
>
> I'm quite curious, has anyone explored getting an open-source license for
> one of those data sets? And is our community still interested about this
> issue after all these years?
>
>
>
> --
> Han Jiang
>
> Team of Search Engine and Web Mining,
> School of Electronic Engineering and Computer Science,
> Peking University, China
>


[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762750#comment-13762750
 ] 

David Smiley commented on SOLR-2548:


bq. If any of the faceting methods are blocked on IO (e.g. docvalues faceting), 
this will close file descriptors with NIO/MMAP directory implementations: see 
the documentation in org.apache.lucene.store for more information.

Ok; I'll look into that later.

{quote}
> Only the first facet.threads worth of facets are actually done concurrently; 
> the rest are done serially.

I remember that being my initial reaction too - but then when you think a 
little about it, you realize that it's not the case.
{quote}

Aha; now I see it!  This is confusing code -- adding to the 
completionService/executor in two different loops; and the 2nd loop is 
particularly un-obvious to me.

bq. Just as a (likely controversial) suggestion in general here, its hard to 
"see" the single-threaded case (which is the most common case).

+0 not controversial to me





[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762775#comment-13762775
 ] 

David Smiley commented on SOLR-2548:


BTW sorry for raising all these supposed shortfalls when the more serious ones 
have turned out to be invalid.  I guess it just underscores what we all know -- 
multithreaded code is confusing.  All the more reason to try to document it 
better and/or to try to code it clearly.

