[jira] [Commented] (LUCENE-6045) Refactor classifier APIs to work better with multi threading
[ https://issues.apache.org/jira/browse/LUCENE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526489#comment-14526489 ] ASF subversion and git services commented on LUCENE-6045: - Commit 1677573 from [~teofili] in branch 'dev/trunk' [ https://svn.apache.org/r1677573 ] LUCENE-6045 - immutable ClassificationResult, minor fixes Refactor classifier APIs to work better with multi threading --- Key: LUCENE-6045 URL: https://issues.apache.org/jira/browse/LUCENE-6045 Project: Lucene - Core Issue Type: Improvement Components: modules/classification Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: Trunk In https://issues.apache.org/jira/browse/LUCENE-4345?focusedCommentId=13454729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13454729 [~simonw] pointed out that the current Classifier API doesn't work well in multi-threading environments: bq. The interface you defined has some problems with respect to Multi-Threading IMO. The interface itself suggests that this class is stateful and you have to call methods in a certain order, and at the same time you need to make sure that it is not published for read access before training is done. I think it would be wise to pass in all needed objects as constructor arguments and make the references final so it can be shared across threads, and add an interface that represents the trained model computed offline? In this case it doesn't really matter but in the future it might make sense. We can also skip the model interface entirely and remove the training method until we have some impls that really need to be trained. I missed that at that point but I think for 6.0 (?) it would be wise to rearrange the API to address that properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
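For illustration, a minimal sketch of the immutability idea behind the commit (hypothetical field and method bodies, not the committed Lucene code): final fields make instances safe to publish across threads without synchronization.
{code}
// Sketch only: an immutable result holder in the spirit of the commit above.
public final class ClassificationResult<T> {

  private final T assignedClass; // the predicted class label
  private final double score;    // the classifier's confidence for that label

  public ClassificationResult(T assignedClass, double score) {
    this.assignedClass = assignedClass;
    this.score = score;
  }

  public T getAssignedClass() {
    return assignedClass;
  }

  public double getScore() {
    return score;
  }
}
{code}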
[jira] [Commented] (SOLR-7435) NPE in FieldCollapsingQParser
[ https://issues.apache.org/jira/browse/SOLR-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526501#comment-14526501 ] Markus Jelsma commented on SOLR-7435: - Hi [~joel.bernstein], can you try the following unit test?
{code}
@Test
public void testSOLR7435() throws Exception {
  for (int i = 0; i < 15000; i++) {
    String[] doc = {"id", String.valueOf(i),
                    "a_i", String.valueOf(random().nextInt(1)),
                    "b_i", String.valueOf(random().nextInt(1))};
    assertU(adoc(doc));
  }
  assertU(commit());
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("q", "*:*");
  params.add("fq", "{!collapse field=a_i}");
  params.add("fq", "{!collapse field=b_i}");
  assertQ(req(params, "indent", "on"), "*[count(//doc)=0]");
}
{code}
It fails on my machine using: ant test -Dtestcase=TestCollapseQParserPlugin -Dtests.method=testSOLR7435 -Dtests.seed=2B7D48BE88DE05E7 -Dtests.slow=true -Dtests.locale=en_ZA -Dtests.timezone=America/Araguaina -Dtests.asserts=true -Dtests.file.encoding=US-ASCII NPE in FieldCollapsingQParser - Key: SOLR-7435 URL: https://issues.apache.org/jira/browse/SOLR-7435 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Markus Jelsma Priority: Minor Fix For: 5.2 Not even sure it would work anyway, I tried to collapse on two distinct fields, ending up with this: select?q=*:*&fq={!collapse field=qst}&fq={!collapse field=rdst}
{code}
584550 [qtp1121454968-20] ERROR org.apache.solr.servlet.SolrDispatchFilter [ suggests] – null:java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743) at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:780) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:203) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1660) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1479) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:556) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
[jira] [Updated] (SOLR-7436) Solr stops printing stacktraces in log and output
[ https://issues.apache.org/jira/browse/SOLR-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-7436: Attachment: solr-8983-console.log Solr stops printing stacktraces in log and output - Key: SOLR-7436 URL: https://issues.apache.org/jira/browse/SOLR-7436 Project: Solr Issue Type: Bug Affects Versions: 5.1 Environment: Local 5.1 Reporter: Markus Jelsma Attachments: solr-8983-console.log After a short while, Solr suddenly stops printing stacktraces in the log and output. {code} 251043 [qtp1121454968-17] INFO org.apache.solr.core.SolrCore.Request [ suggests] - [suggests] webapp=/solr path=/select params={q=*:*&fq={!collapse+field%3Dquery_digest}&fq={!collapse+field%3Dresult_digest}} status=500 QTime=3 251043 [qtp1121454968-17] ERROR org.apache.solr.servlet.SolrDispatchFilter [ suggests] - null:java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743) at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:780) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:203) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1660) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1479) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:556) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) 251184 [qtp1121454968-17] ERROR org.apache.solr.core.SolrCore [ suggests] - java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743)
[jira] [Commented] (SOLR-7436) Solr stops printing stacktraces in log and output
[ https://issues.apache.org/jira/browse/SOLR-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526504#comment-14526504 ] Markus Jelsma commented on SOLR-7436: - Hello, this is a local Solr 5.1 running on: java version 1.7.0_79 OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.10.2) OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode) See attached log. I fire a query that produces an NPE (see SOLR-7435). I repeat it a couple of times and then the stack trace is gone. Solr stops printing stacktraces in log and output - Key: SOLR-7436 URL: https://issues.apache.org/jira/browse/SOLR-7436 Project: Solr Issue Type: Bug Affects Versions: 5.1 Environment: Local 5.1 Reporter: Markus Jelsma Attachments: solr-8983-console.log After a short while, Solr suddenly stops printing stacktraces in the log and output.
Re: Running 5.1.0 test-suite via maven
[junit4] ERROR 3.81s J2 | TestDirectoryTaxonomyWriter.testConcurrency [junit4] Throwable #1: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView; Sorry about the delay. This indicates your code was compiled with JDK 1.8 but is executed with Java 1.7. This method's return type used to be an interface (Set), but in 1.8 it is a covariant return pointing at a specialized subclass (KeySetView), so a call site compiled against 1.8 records a descriptor that does not exist on a 1.7 runtime. You need to compile the code with the version of Java you intend to run with. Things will in general work if you compile with an older version and try to run with a newer version, but not the other way around. You can cross-compile with javac from a newer version of the JDK to an older version, but you'd have to specify bootclasspath to the older version anyway (the bytecode/source flag in javac is not enough), so there's really no sensible reason to do it in the first place. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
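For illustration, a minimal class (not from the build, hypothetical names) that reproduces this failure mode when compiled on JDK 1.8 and run on Java 1.7:
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeySetDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("k", 1);
        // Compiled on JDK 1.8, this call site is recorded with the covariant
        // return type ConcurrentHashMap$KeySetView, which a 1.7 runtime lacks,
        // so running this class on Java 1.7 throws NoSuchMethodError here.
        Set<String> keys = map.keySet();
        // Declaring the receiver as the Map interface sidesteps the problem,
        // because the call site then resolves against Map.keySet() returning Set:
        java.util.Map<String, Integer> asMap = map;
        Set<String> safeKeys = asMap.keySet();
        System.out.println(keys + " " + safeKeys);
    }
}
{code}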
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526613#comment-14526613 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677595 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677595 ] LUCENE-6196: Reformat code. Removed System.err legacy comments in test. Fixed test compile warning. Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to those within the exact shape in highly performant ways. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
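To make the "only multiplications and additions" point concrete, here is a toy sketch (illustrative only, not the geo3d API): testing a point against one bounding plane reduces to a dot product and a sign check.
{code}
// Illustration only (hypothetical names): membership against a single bounding
// plane needs just multiplications, additions, and a comparison.
public class PlaneDemo {
  static final class Plane {
    final double a, b, c, d; // plane: a*x + b*y + c*z + d = 0
    Plane(double a, double b, double c, double d) { this.a = a; this.b = b; this.c = c; this.d = d; }
    boolean isWithin(double x, double y, double z) {
      return a * x + b * y + c * z + d >= 0.0; // on or above the plane
    }
  }

  public static void main(String[] args) {
    Plane equator = new Plane(0, 0, 1, 0); // z >= 0: northern half of the unit sphere
    System.out.println(equator.isWithin(0.0, 0.6, 0.8));  // true
    System.out.println(equator.isWithin(0.0, 0.6, -0.8)); // false
  }
}
{code}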
Re: Where Search Meets Machine Learning
Awesome, I think I could learn a lot from you. Do you have a decent amount of user data? Sounds like you have a ton. I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid, the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say "fix it!"). The best they can do with analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data! So at this layer, the goal is to construct inverted indices that reflect features likely to be important to users. In a sense this becomes more of a programming task than a large-scale optimization task. You have content experts that tell you either precisely or vaguely what the search solution ought to do (presumably they represent users). If you're lucky, this will be informed by some ad-hoc usability testing. So you end up doing a mix of data modeling and using queries intelligently. And perhaps some specific kinds of programming to develop specific scoring functions, etc. http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/ One advantage of this approach is that for many search applications you might be able to explain how the ranking function works in terms of a set of specific rules. This also might provide points where domain experts can tweak an overall ranking strategy. It becomes somewhat predictable and controllable to them. Anyway, I'm forever curious about the boundary line between this sort of work and "search truly is a machine learning problem" work. I have seen a fair amount of gray area where user data might be decent or possibly misleading, and you have to do a lot of data janitor work to sort it out. Good stuff! -Doug On Fri, May 1, 2015 at 6:16 PM, J. Delgado joaquin.delg...@gmail.com wrote: Doug, Thanks for your insights. We actually started by trying to build off of features and boosting weights combined with built-in relevance scoring http://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html. We also played around with replacing and/or combining the default score with other computations using the function_score http://www.elastic.co/guide/en/elasticsearch/guide/current/function-score-query.html query, but as you mentioned in your article, the crux of the problem is *how to figure out the weights that control each feature's influence*: *Once important features are placed in the search engine the final problem becomes balancing and regulating their influence. Should text-based factors matter more than sales based factors? Should exact text matches matter more than synonym-based matches? What about metadata we glean from machine learning – how much weight should this play*? Furthermore, this only covers cases where the scoring can be represented as a function of such weights! We felt that this approach was short sighted as some of the problems we are dealing with (e.g.
product recommendations, response prediction, real-time bidding for advertising, etc.) have a very large feature space, sometimes requiring *dimensionality reduction* (e.g. Matrix Factorization techniques) or learning from past actions/feedback (e.g. clickthrough data, bidding win rates, remaining budget, etc.). All this seemed well suited for Machine (supervised) Learning tasks such as prediction based on past training data (classification or regression). These algorithms usually have an offline model building phase and an online evaluator phase that uses the created model to perform the prediction/scoring during query evaluation. Additionally, some of the best algorithms in machine learning (Random Forest, Support Vector Machines, Deep Learning/Neural Networks, etc.) are not linear combinations of feature weights and require additional data structures (e.g. trees, support vectors) to support the computation. Since there is no one-size-fits-all predictive algorithm, we architected the solution so any algorithm that implements our interface can be used. We tried this out with algorithms available in Weka http://www.cs.waikato.ac.nz/ml/weka/ and Spark MLlib https://spark.apache.org/docs/1.2.1/mllib-guide.html (only linear models for now) and it worked! In any case, nothing prevents us from leveraging the text based analysis of features and the default scoring available within the plugin, which can be combined with the results of the prediction. To demonstrate its general utility
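A minimal sketch of the kind of pluggable boundary described above (hypothetical names; the actual plugin interface is not shown in this thread): the model is built offline, then only evaluated at query time.
{code}
import java.util.Map;

// Sketch: separate offline model loading from online scoring.
public interface Predictor {

  // Load a model produced offline (e.g. by Weka or Spark MLlib).
  void load(byte[] serializedModel) throws Exception;

  // Score one document's feature vector during query evaluation.
  double score(Map<String, Double> features);
}
{code}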
Solr website - problem with anchor links
When I try to use a URL with an anchor link on the Solr website, it doesn't work right: https://lucene.apache.org/solr/resources.html#mailing-lists On both Firefox and Chrome, this URL doesn't quite go to the right spot. It would be the right spot if the floating header at the top of the page wasn't there. I'm guessing some CSS trickery is required to get it to anchor below that floating header. I did find the following, and when I have time to digest it, I may be able to try and fix the problem, but finding that time is the hard part. http://stackoverflow.com/questions/10732690/offsetting-an-html-anchor-to-adjust-for-fixed-header If somebody knows exactly how to fix it and has the time, feel free to take this problem! Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
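The StackOverflow technique linked above boils down to giving each anchor target an invisible block that compensates for the fixed header; a sketch (the 60px value is a guess at the header height, not measured from the Solr site):
{code}
/* Sketch: offset in-page anchors below a fixed header (height assumed 60px). */
:target::before {
  content: "";
  display: block;
  height: 60px;      /* same as the fixed header's height */
  margin-top: -60px; /* cancel the extra space so layout is unchanged */
}
{code}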
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526684#comment-14526684 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677607 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1677607 ] SOLR-6220: Rule Based Replica Assignment during collection creation Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
h1.Objective
Most cloud-based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system. All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection, during:
* collection creation
* shard splitting
* add replica
* createshard
There are two aspects to how replicas are placed: snitch and placement.
h2.snitch
How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch
h2.ImplicitSnitch
This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is automatically used. Tags provided by ImplicitSnitch:
# cores : no. of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name
# D.* : values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}
h2.Rules
This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters will be passed to the collection CREATE api as a multivalued parameter rule. The values will be saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection":{
    "snitch": {
      "class":"ImplicitSnitch"
    },
    "rules":[{"cores":"4-"},
             {"replica":"1", "shard":"*", "node":"*"},
             {"disk":">100"}]
  }
}
{code}
A rule is specified as a pseudo JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK. Or else an error is thrown.
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than)
** and the value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} meaning ANY. The default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
h3.How are nodes picked up?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference. And if the rule is {{disk:100-}}, nodes with less disk space will be given priority. If everything else is equal, nodes with fewer cores are given higher priority.
h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness. Example rules:
{noformat}
#Example requirement: use only one replica of a shard in a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible.
#This ensures that if no node exists with a 100GB disk, nodes are picked up in order of size, say an 85GB node would be picked up over an 80GB disk node.
disk:>100~
{noformat}
Examples:
{noformat}
#in each rack there can be max two replicas of A given shard
rack:*,shard:*,replica:<3
//in each rack there can be max two replicas of ANY replica
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be a max one replica of EACH shard
{noformat}
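For illustration, a CREATE call combining rules and a snitch might look like the following (illustrative values; host, collection name, and counts are made up, and {{rule}} is multivalued):
{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection
    &numShards=2&replicationFactor=2
    &rule=shard:*,replica:<2,node:*
    &rule=disk:>100
    &snitch=class:ImplicitSnitch
{noformat}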
[jira] [Comment Edited] (SOLR-7435) NPE in FieldCollapsingQParser
[ https://issues.apache.org/jira/browse/SOLR-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526501#comment-14526501 ] Markus Jelsma edited comment on SOLR-7435 at 5/4/15 2:58 PM: - Hi [~joel.bernstein], can you try the following unit test?
{code}
@Test
public void testSOLR7435() throws Exception {
  for (int i = 0; i < 15000; i++) {
    String[] doc = {"id", String.valueOf(i),
                    "a_i", String.valueOf(random().nextInt(1)),
                    "b_i", String.valueOf(random().nextInt(1))};
    assertU(adoc(doc));
  }
  assertU(commit());
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("q", "*:*");
  params.add("fq", "{!collapse field=a_i}");
  params.add("fq", "{!collapse field=b_i}");
  assertQ(req(params, "indent", "on"), "*[count(//doc)=0]");
}
{code}
It fails on my machine using: ant test -Dtestcase=TestCollapseQParserPlugin -Dtests.method=testSOLR7435 -Dtests.seed=2B7D48BE88DE05E7 -Dtests.slow=true -Dtests.locale=en_ZA -Dtests.timezone=America/Araguaina -Dtests.asserts=true -Dtests.file.encoding=US-ASCII edit: hmm, it sometimes fails. NPE in FieldCollapsingQParser - Key: SOLR-7435 URL: https://issues.apache.org/jira/browse/SOLR-7435 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Markus Jelsma Priority: Minor Fix For: 5.2
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7484.patch Updated patch. This doesn't incorporate the context bit, as that depends on SOLR-7484 being committed, which I plan to do in a bit. It changes the public SolrAuthorizationResponse and also how the statusCode set on the SolrAuthorizationResponse impacts the processing in SDF. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7484.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value, but in the future it could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.
{code}
public interface SolrAuthorizationPlugin {
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}
{code}
{code}
public class SolrRequestContext {
  UserInfo; // Will contain user context from the authentication layer.
  HTTPRequest request;
  Enum OperationType; // Correlated with user roles.
  String[] CollectionsAccessed;
  String[] FieldsAccessed;
  String Resource;
}
{code}
{code}
public class SolrAuthorizationResponse {
  boolean authorized;
  public boolean isAuthorized();
}
{code}
User Roles:
* Admin
* Collection Level:
** Query
** Update
** Admin
Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
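As a usage illustration, a hypothetical no-op implementation of the proposed interface (it assumes the response flag is settable, which the sketch above leaves open):
{code}
// Hypothetical: authorize every request. A real implementation would
// consult an external system such as Apache Ranger or Sentry.
public class AllowAllAuthorizationPlugin implements SolrAuthorizationPlugin {

  @Override
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context) {
    SolrAuthorizationResponse response = new SolrAuthorizationResponse();
    response.authorized = true; // assumes the field (or a setter) is accessible in the final API
    return response;
  }
}
{code}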
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526734#comment-14526734 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677614 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1677614 ] SOLR-6220: setting eol style Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2215 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2215/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.TestDistributedSearch.test Error Message: Error from server at http://127.0.0.1:61465/aq_vcz/jo/collection1: java.lang.NullPointerException at org.apache.solr.search.grouping.distributed.responseprocessor.TopGroupsShardResponseProcessor.process(TopGroupsShardResponseProcessor.java:102) at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:744) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:727) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2047) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:841) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:453) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:105) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:61465/aq_vcz/jo/collection1: java.lang.NullPointerException at org.apache.solr.search.grouping.distributed.responseprocessor.TopGroupsShardResponseProcessor.process(TopGroupsShardResponseProcessor.java:102) at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:744) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:727) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:2047) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:841) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:453) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:105) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at
[jira] [Updated] (SOLR-6878) solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand)
[ https://issues.apache.org/jira/browse/SOLR-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-6878: - Attachment: SOLR-6878.patch Here is an updated patch that implements the idea Hossman laid out in his comment. Basically, if the client sends in a list instead of a map, the expand=true logic is applied at the time of update, i.e. this is syntactic sugar for building up the mappings from a list of symmetric synonyms. There's no need to support a list for expand=false because that is simply a mapping of all the terms to the last term in the list, which is already supported by the API. Thus, expand=true is implied when the update request contains a list and not a map. solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand) Key: SOLR-6878 URL: https://issues.apache.org/jira/browse/SOLR-6878 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.10.2 Reporter: Tomasz Sulkowski Assignee: Timothy Potter Labels: ManagedSynonymFilterFactory, REST, SOLR Attachments: SOLR-6878.patch, SOLR-6878.patch Hi, after switching from SynonymFilterFactory to ManagedSynonymFilterFactory I have found out that there is no way to set an all-to-all synonyms relation. Basically (judging from a google search) there is a need for an expand functionality switch (known from SynonymFilterFactory) which will treat all synonyms with their keyword as equal. For example: if we define a car:[wagen,ride] relation, it would translate a query that includes one of the synonyms or the keyword to "car or wagen or ride", independently of which of those three words was used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
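With the patch applied, sending a JSON list (rather than a map) to the managed synonyms REST endpoint registers all the terms as mutual synonyms; a sketch against a hypothetical collection and resource name:
{code}
curl -X PUT -H 'Content-type:application/json' \
  --data-binary '["car","wagen","ride"]' \
  "http://localhost:8983/solr/collection1/schema/analysis/synonyms/english"
{code}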
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526770#comment-14526770 ] Tomás Fernández Löbbe commented on SOLR-6220: - Ideally, most warnings should be fixed :) , but at least the one in {{SnitchContext}}:
{code:java}
public SimpleSolrResponse invoke(UpdateShardHandler shardHandler, final String url, String path, SolrParams params)
    throws IOException, SolrServerException {
  GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.GET, path, params);
  NamedList<Object> rsp = new HttpSolrClient(url, shardHandler.getHttpClient(), new BinaryResponseParser()).request(request);
  request.response.nl = rsp;
  return request.response;
}
{code}
Resource leak: 'unassigned Closeable value' is never closed Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
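A possible shape for the fix (a sketch, not the committed patch): since SolrClient implements Closeable in 5.x, try-with-resources closes the client after the request.
{code:java}
public SimpleSolrResponse invoke(UpdateShardHandler shardHandler, final String url, String path, SolrParams params)
    throws IOException, SolrServerException {
  GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.GET, path, params);
  // try-with-resources closes the client, addressing the 'unassigned Closeable' warning
  try (HttpSolrClient client = new HttpSolrClient(url, shardHandler.getHttpClient(), new BinaryResponseParser())) {
    NamedList<Object> rsp = client.request(request);
    request.response.nl = rsp;
    return request.response;
  }
}
{code}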
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526772#comment-14526772 ] Anshum Gupta commented on SOLR-6220: This seems to have broken {{ant precommit}}.
{code}
[forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
[forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63)
[forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
[forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108)
[forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
[forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185)
{code}
Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
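The conventional fix for this particular forbidden-apis error is to pass an explicit charset instead of relying on the platform default; a sketch (hypothetical variable and class names):
{code}
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
  // Explicit charset instead of s.getBytes(), which uses the platform default.
  static byte[] toBytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
{code}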
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526783#comment-14526783 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677622 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1677622 ] SOLR-6220: Fixes forbidden method invocation String#getBytes() in RuleEngineTest

Replica placement strategy for solrcloud
Key: SOLR-6220
URL: https://issues.apache.org/jira/browse/SOLR-6220
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch

h1. Objective
Most cloud-based systems allow users to specify rules for how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or change it later to suit the needs of the system. All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:
* collection creation
* shard splitting
* add replica
* createshard

There are two aspects to how replicas are placed: snitch and placement.

h2. Snitch
A snitch identifies the tags of nodes. Snitches are configured through the collection create command with the {{snitch}} param, e.g. {{snitch=EC2Snitch}} or {{snitch=class:EC2Snitch}}.

h2. ImplicitSnitch
This is shipped by default with Solr; the user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is used automatically. Tags provided by ImplicitSnitch:
# cores: number of cores on the node
# disk: disk space available on the node
# host: host name of the node
# node: node name
# D.*: values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}

h2. Rules
A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter {{rule}}. The values are saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection": {
    "snitch": { "class": "ImplicitSnitch" },
    "rules": [{cores:4-}, {replica:1, shard:*, node:*}, {disk:100}]
  }
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY; or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than)
** the value of shard can be a shard name, or {{\*}} meaning EACH, or {{\*\*}} meaning ANY; the default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and tags are simply values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly

h3. How are nodes picked up?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference; and if the rule is {{disk:100-}}, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.

h3. Fuzzy match
Fuzzy matching can be applied when strict matches fail. Values can be suffixed with {{~}} to specify fuzziness, for example:
{noformat}
#Example requirement: use only one replica of a shard on a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if that is not possible.
#This ensures that if no node with a 100GB disk exists, nodes are picked in order of size; e.g. an 85GB node is picked over an 80GB node.
disk:100~
{noformat}
Examples:
{noformat}
#in each rack there can be max two replicas of A given shard
rack:*,shard:*,replica:<3
//in each rack there can be max two replicas of ANY replica
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be a max one replica of EACH shard
node:*,shard:*,replica:1-
#in each node there should be a max one replica of ANY shard
node:*,shard:**,replica:1-
{noformat}
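Since the rules ride on the collection CREATE call, here is a minimal SolrJ sketch of what passing them could look like. This is an illustration, not code from the patch: the ZooKeeper address, collection name, and rule values are assumptions, and the request is built generically rather than through any rule-specific SolrJ helper.

{code:Java}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateWithRules {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZK address; adjust for your cluster.
    try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("action", "CREATE");
      params.set("name", "mycollection");
      params.set("numShards", 2);
      params.set("replicationFactor", 2);
      // "rule" is multivalued: each add() contributes one rule.
      params.add("rule", "shard:*,replica:<2,node:*"); // max one replica of each shard per node
      params.add("rule", "cores:4-");                  // only nodes with fewer than 4 cores
      params.set("snitch", "class:ImplicitSnitch");    // optional; ImplicitSnitch is the default
      QueryRequest request = new QueryRequest(params);
      request.setPath("/admin/collections");
      client.request(request);
    }
  }
}
{code}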
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_60-ea-b12) - Build # 12560 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12560/ Java: 64bit/jdk1.8.0_60-ea-b12 -XX:-UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 31493 lines...] -check-forbidden-all: [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.8 [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.8 [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.4 [forbidden-apis] Reading API signatures: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/base.txt [forbidden-apis] Reading API signatures: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/servlet-api.txt [forbidden-apis] Reading API signatures: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/solr.txt [forbidden-apis] Loading classes to check... [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185) [forbidden-apis] Scanned 2654 (and 1668 related) class file(s) for forbidden API invocations (in 0.93s), 3 error(s). BUILD FAILED /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:526: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:97: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:329: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/common-build.xml:494: Check for forbidden API calls failed, see log. Total time: 47 minutes 34 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
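The forbidden-apis check above trips on {{String#getBytes()}} because the no-argument overload silently uses the platform default charset, so the bytes produced depend on the machine running the test. The standard remedy (the follow-up commit isn't quoted here, so treat this as a generic sketch) is to pass an explicit charset:

{code:Java}
import java.nio.charset.StandardCharsets;

public class CharsetFix {
  public static void main(String[] args) {
    String rule = "shard:*,replica:<2,node:*"; // sample input, not from the test

    // Forbidden: rule.getBytes() -- encodes with the JVM default charset.
    byte[] bytes = rule.getBytes(StandardCharsets.UTF_8); // explicit and portable

    System.out.println(bytes.length + " bytes");
  }
}
{code}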
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_45) - Build # 4766 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4766/ Java: 64bit/jdk1.8.0_45 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 31469 lines...] -check-forbidden-all: [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.8 [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.8 [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.4 [forbidden-apis] Reading API signatures: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\tools\forbiddenApis\base.txt [forbidden-apis] Reading API signatures: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\tools\forbiddenApis\servlet-api.txt [forbidden-apis] Reading API signatures: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\tools\forbiddenApis\solr.txt [forbidden-apis] Loading classes to check... [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185) [forbidden-apis] Scanned 2654 (and 1668 related) class file(s) for forbidden API invocations (in 1.70s), 3 error(s). BUILD FAILED C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:526: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:97: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build.xml:329: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:494: Check for forbidden API calls failed, see log. Total time: 68 minutes 32 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526826#comment-14526826 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677635 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1677635 ] SOLR-6220: use closeable in try block Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
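The commit message above refers to the try-with-resources idiom; the actual diff isn't quoted, so the following is just a generic illustration of the pattern. Declaring a {{Closeable}} in the try header guarantees it is closed when the block exits, whether normally or via an exception:

{code:Java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CloseableInTry {
  public static String firstLine(String path) throws IOException {
    // The reader is closed automatically on every exit path.
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
      return reader.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(firstLine("rules.txt")); // hypothetical input file
  }
}
{code}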
[jira] [Updated] (LUCENE-6372) hashCode/equals for SpanPositionCheckQuery and subclasses
[ https://issues.apache.org/jira/browse/LUCENE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-6372: - Attachment: LUCENE-6372.patch Patch of 4 May 2015. Simplifies hashCode/equals for all subclasses of SpanQuery. Removes {{this == other}} checks in equals(); this might affect performance. Adds a few Objects.requireNonNull calls in constructors. Leaves the various getBoost calls in hashCode implementations to super. Removes hashCode/equals from SpanFirstQuery, as they are not needed anymore. Uses the new collectPayloads attribute in SpanNearQuery hashCode/equals. hashCode/equals for SpanPositionCheckQuery and subclasses - Key: LUCENE-6372 URL: https://issues.apache.org/jira/browse/LUCENE-6372 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Attachments: LUCENE-6372.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
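For readers following along without the patch, the delegating pattern it applies looks roughly like this self-contained sketch. The class and field names are invented for illustration; only the shape (no {{this == other}} fast path, {{Objects.requireNonNull}} in the constructor, subclass equals/hashCode deferring to super) mirrors the patch notes:

{code:Java}
import java.util.Objects;

abstract class BaseQuery {
  private final float boost;

  BaseQuery(float boost) {
    this.boost = boost;
  }

  @Override
  public boolean equals(Object other) {
    // Exact-class check plus the state owned at this level;
    // note there is deliberately no this == other shortcut.
    return other != null
        && getClass() == other.getClass()
        && boost == ((BaseQuery) other).boost;
  }

  @Override
  public int hashCode() {
    return Objects.hash(getClass(), boost);
  }
}

class FieldQuery extends BaseQuery {
  private final String field;

  FieldQuery(String field, float boost) {
    super(boost);
    this.field = Objects.requireNonNull(field, "field must not be null");
  }

  @Override
  public boolean equals(Object other) {
    // super.equals guarantees other is a FieldQuery, so the cast is safe.
    return super.equals(other) && field.equals(((FieldQuery) other).field);
  }

  @Override
  public int hashCode() {
    return 31 * super.hashCode() + field.hashCode();
  }
}
{code}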
[jira] [Updated] (LUCENE-6372) Simplify hashCode/equals for SpanQuery subclasses
[ https://issues.apache.org/jira/browse/LUCENE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-6372: - Summary: Simplify hashCode/equals for SpanQuery subclasses (was: hashCode/equals for SpanPositionCheckQuery and subclasses) Simplify hashCode/equals for SpanQuery subclasses - Key: LUCENE-6372 URL: https://issues.apache.org/jira/browse/LUCENE-6372 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Attachments: LUCENE-6372.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6372) Simplify hashCode/equals for SpanQuery subclasses
[ https://issues.apache.org/jira/browse/LUCENE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526840#comment-14526840 ] Paul Elschot commented on LUCENE-6372: -- See also LUCENE-6333 Simplify hashCode/equals for SpanQuery subclasses - Key: LUCENE-6372 URL: https://issues.apache.org/jira/browse/LUCENE-6372 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Attachments: LUCENE-6372.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 837 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/837/ 1 tests failed. REGRESSION: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test Error Message: Invalid content type: Stack Trace: org.apache.http.ParseException: Invalid content type: at __randomizedtesting.SeedInfo.seed([491AB9BB25277433:C14E86618BDB19CB]:0) at org.apache.http.entity.ContentType.parse(ContentType.java:273) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:513) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at org.apache.solr.cloud.CloudInspectUtil.compareResults(CloudInspectUtil.java:224) at org.apache.solr.cloud.CloudInspectUtil.compareResults(CloudInspectUtil.java:166) at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.testIndexingBatchPerRequestWithHttpSolrClient(FullSolrCloudDistribCmdsTest.java:676) at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test(FullSolrCloudDistribCmdsTest.java:152) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526889#comment-14526889 ] Jessica Cheng Mallet commented on SOLR-6220: It'll also be nice to have a new collection API to modify the rule for a collection so that we can add rules for an existing collection or modify a bad rule set. Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.8.0_45) - Build # 4646 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4646/ Java: 64bit/jdk1.8.0_45 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test Error Message: Error from server at http://127.0.0.1:53267: Could not find collection : awholynewstresscollection_collection1_0 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:53267: Could not find collection : awholynewstresscollection_collection1_0 at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1074) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:846) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:789) at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.addReplicaTest(CollectionsAPIDistributedZkTest.java:1120) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test(CollectionsAPIDistributedZkTest.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526928#comment-14526928 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677642 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1677642 ] SOLR-6220: Fix javadocs for precommit to pass Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526930#comment-14526930 ] Anshum Gupta commented on SOLR-6220: That would be a good thing to have. Can you create a new JIRA for that if one doesn't already exist? Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[jira] [Commented] (SOLR-7458) Expose HDFS Block Locality Metrics
[ https://issues.apache.org/jira/browse/SOLR-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526933#comment-14526933 ] Mike Drob commented on SOLR-7458: - Did some digging with the HDFS folks, and it looks like BlockLocation::host is generally a hostname (not an IP, with the caveat that your cluster is configured reasonably). The general problem of determining a hostname for a machine is very difficult, since any given server could have multiple interfaces, with multiple names for each alias, etc. We probably just have to rely on some well-known one that we can get, and not spend too much effort worrying about whether localhost is good enough. We'll look at SolrXmlConfig. Will add in a ConcurrentHashMap. Expose HDFS Block Locality Metrics -- Key: SOLR-7458 URL: https://issues.apache.org/jira/browse/SOLR-7458 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mike Drob Assignee: Mark Miller Labels: metrics Attachments: SOLR-7458.patch, SOLR-7458.patch We should publish block locality metrics when using HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
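The comment above mentions adding a {{ConcurrentHashMap}}. As a generic illustration of the reason (this is not the SOLR-7458 code, and the names are made up), a concurrent map lets multiple threads tally per-host locality counts without external locking:

{code:Java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class LocalityTally {
  private final Map<String, LongAdder> localBlocksByHost = new ConcurrentHashMap<>();

  public void recordLocalBlock(String host) {
    // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent
    // callers never lose an update or create duplicate counters.
    localBlocksByHost.computeIfAbsent(host, h -> new LongAdder()).increment();
  }

  public long localBlocks(String host) {
    LongAdder adder = localBlocksByHost.get(host);
    return adder == null ? 0L : adder.sum();
  }
}
{code}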
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526949#comment-14526949 ] Noble Paul commented on SOLR-6220: -- It's planned and I would like to piggy back on the modify collection API SOLR-5132 Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[jira] [Closed] (SOLR-6288) Create a parser and rule engine for the rules syntax
[ https://issues.apache.org/jira/browse/SOLR-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul closed SOLR-6288. Resolution: Won't Fix makes no sense anymore Create a parser and rule engine for the rules syntax Key: SOLR-6288 URL: https://issues.apache.org/jira/browse/SOLR-6288 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch The proposed syntax needs to be parsed, and given the tags for a bunch of nodes it should be able to assign replicas to nodes, or just bail out if that is not possible -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5132) Implement a modifyCollection API
[ https://issues.apache.org/jira/browse/SOLR-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5132: - Description: A new “modifyCollection” API will be introduced to: # Turn on/off collectionApiMode (see SOLR-5096) # Modify values of maxShardsPerNode for the collection # Modify value of replicationFactor for entire collection (apply to each and every slice) # Modify values of replicationFactor on a per-slice basis # Modify rules # Modify snitch was: A new “modifyCollection” API will be introduced to: # Turn on/off collectionApiMode (see SOLR-5096) # Modify values of maxShardsPerNode for the collection # Modify value of replicationFactor for entire collection (apply to each and every slice) # Modify values of replicationFactor on a per-slice basis Implement a modifyCollection API Key: SOLR-5132 URL: https://issues.apache.org/jira/browse/SOLR-5132 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: SOLR-5132.patch A new “modifyCollection” API will be introduced to: # Turn on/off collectionApiMode (see SOLR-5096) # Modify values of maxShardsPerNode for the collection # Modify value of replicationFactor for entire collection (apply to each and every slice) # Modify values of replicationFactor on a per-slice basis # Modify rules # Modify snitch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_60-ea-b12) - Build # 12561 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12561/ Java: 64bit/jdk1.8.0_60-ea-b12 -XX:+UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 53539 lines...] BUILD FAILED /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:526: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:90: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:641: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1963: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2002: Compile failed; see the compiler error output for details. Total time: 48 minutes 32 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7484) Refactor SolrDispatchFilter.doFilter(...) method
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526966#comment-14526966 ] ASF subversion and git services commented on SOLR-7484: --- Commit 1677644 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1677644 ] SOLR-7484: Refactor SolrDispatchFilter to extract all Solr-specific implementation detail to HttpSolrCall, and also extract methods from within the current SDF.doFilter(..) logic, making things easier to manage. HttpSolrCall converts the processing to a 3-step process, i.e. Construct, Init, and Call, so the context of the request is available after Init and before the actual call operation. Refactor SolrDispatchFilter.doFilter(...) method Key: SOLR-7484 URL: https://issues.apache.org/jira/browse/SOLR-7484 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch Currently almost everything that's done in SDF.doFilter() is sequential. We should refactor it to clean up the code and make things easier to manage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
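The construct/init/call split described in the commit message can be pictured with a small sketch. The class below is invented purely to show the lifecycle; the real HttpSolrCall resolves cores, collections, and handlers rather than a toy string:

{code:Java}
public class ThreePhaseCall {
  private final String path;
  private String context; // filled in by init()

  // 1. Construct: cheap, only captures the raw request.
  public ThreePhaseCall(String path) {
    this.path = path;
  }

  // 2. Init: resolve routing so the request context can be inspected
  //    before any work is committed.
  public void init() {
    context = path.startsWith("/admin") ? "admin-request" : "core-request";
  }

  // 3. Call: execute using the prepared context.
  public String call() {
    if (context == null) {
      throw new IllegalStateException("init() must run before call()");
    }
    return "handled " + path + " as " + context;
  }

  public static void main(String[] args) {
    ThreePhaseCall call = new ThreePhaseCall("/admin/collections");
    call.init();
    System.out.println(call.call());
  }
}
{code}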
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526994#comment-14526994 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677648 from [~noble.paul] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1677648 ] SOLR-6220: Rule Based Replica Assignment during collection creation Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 2261 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/2261/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls Error Message: Shard split did not complete. Last recorded state: running expected:[completed] but was:[running] Stack Trace: org.junit.ComparisonFailure: Shard split did not complete. Last recorded state: running expected:[completed] but was:[running] at __randomizedtesting.SeedInfo.seed([F3C731DE89271288:ABA3BDBF8F4DBA5C]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls(CollectionsAPIAsyncDistributedZkTest.java:101) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Updated] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6968: --- Attachment: SOLR-6968.patch Updated patch now includes an HllOptions class w/tests for parsing various knobs for tuning... * {{cardinality=true}} and {{cardinality=false}} still supported for basic defaults * can also specify heuristic-based {{cardinality=N}} where N is a number between 0.0 and 1.0 inclusive indicating how much accuracy you care about ** 0 == minimum accuracy, conserve as much ram as possible ** 1.0 == maximum accuracy, spend as much ram as possible ** {{cardinality=true}} roughly the same as {{cardinality=0.33}} * additional advanced local params for overriding the heuristic based on knowledge of HLL: ** {{hllLog2m=N}} (raw int passed to HLL API) ** {{hllRegwidth=N}} (raw int passed to HLL API) ** hll param prefix chosen based on implementation details similar to how {{percentiles}} supports {{tdigestCompression}} *** if/when we change the implementation details of how we compute cardinality, these can be ignored and new tuning options can be introduced. * {{hllPreHashed=BOOL}} ** only works with Long based fields (by design) add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distrib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
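To make the knobs above concrete, here is a hedged sketch of how the described local params might be combined on a stats request, in the same SolrJ style as the test snippets elsewhere in this digest; the field names and numeric values are illustrative assumptions, not taken from the patch:
{code}
import org.apache.solr.common.params.ModifiableSolrParams;

// Illustrative only: field names (author_s, title_s, prehashed_l) are made up.
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("q", "*:*");
params.add("stats", "true");
// basic default accuracy/RAM trade-off, roughly equivalent to cardinality=0.33
params.add("stats.field", "{!cardinality=true}author_s");
// heuristic knob: 0.0 = minimum accuracy / least RAM, 1.0 = maximum accuracy / most RAM
params.add("stats.field", "{!cardinality=0.75}title_s");
// advanced overrides passed straight through to the HLL implementation
params.add("stats.field", "{!cardinality=true hllLog2m=13 hllRegwidth=5}author_s");
// hllPreHashed only makes sense for Long based fields
params.add("stats.field", "{!cardinality=true hllPreHashed=true}prehashed_l");
{code}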
[jira] [Commented] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527025#comment-14527025 ] Sachin Goyal commented on SOLR-7121: Thanks for the patch file [~mark.mil...@oblivion.ch]! In the future I will attach a patch file along with each pull-request update. Please see my comments below: \\ {quote}I think we want to look at making these new tests much faster.{quote} Please let me know how long the newly added tests take to run for you. The new tests use the actual SolrCloud infrastructure and will need a little time to set up and shut down ZK, the cloud, etc., unless we are happy with unit tests instead of functional ones. But if you have any ideas for the particular tests added in this ticket, I will be happy to improve them. \\ \\ {quote}The test suite with this patch doesn't yet fully pass for me either.{quote} Can you please run those failing tests without the patch and let me know if they are still failing? The build seems to be passing on my end. \\ \\ {quote}What is the motivation behind the core regex matching and multiple config entries? Do you really need to configure different healthcheck thresholds per core in a collection?{quote} At a very minimum, we may want to configure the cores differently for different collections. The regular-expression approach allows us to have a single configuration file both for collections serving millions of documents on more powerful machines and for collections serving a couple thousand small documents on less powerful machines. Without the regular expressions, one would need separate configuration files for separate collections, which is somewhat of a pain to manage. So basically, the regular expressions help define different thresholds for Solr running on heterogeneous hardware. \\ \\ {quote}We also want to make it clear this functionality only works with SolrCloud and think about how that should best be expressed in the code - this bleeds a bit of SolrCloud specific code out of ZkController and into SolrCore in a way we have not really done yet I think.{quote} I agree to some extent. However, please note that all the new code is protected by *cc.isZooKeeperAware()* and it should not affect non-cloud-aware code. If you have more specific thoughts on improving this, I would be happy to refactor the current patch. \\ \\ {quote}What if we are the leader and publish a down state due to overload? Shouldn't we also give up our leader position?{quote} I am a little confused by this one. Wouldn't a down state trigger re-election? If not, it should probably be fixed elsewhere by asking non-leaders to start the election process. In any case, note that this code will be reached only when the leader is near exhaustion. Without this code, it would have tipped over completely and would have needed a restart. So, this code helps the leader node survive a crash and become available again in the future. Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion -- Key: SOLR-7121 URL: https://issues.apache.org/jira/browse/SOLR-7121 Project: Solr Issue Type: New Feature Reporter: Sachin Goyal Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch Currently, there is no way to control when a Solr node goes down. 
If the server is experiencing high GC pauses, too many threads, or just too many queries due to some bad load-balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything comes to a stall. Such a slow-dying core can affect other cores as well by taking a huge amount of time to serve their distributed queries. There should be a way to specify some threshold values beyond which the targeted core can detect its ill-health and proactively go down to recover. When the load improves, the core should come up automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
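To make the proposal concrete, the following is a minimal sketch of the kind of threshold check being discussed. Every name and number below is a hypothetical illustration, not code from the SOLR-7121 patches:
{code}
// Hypothetical health-check sketch; the class, fields, and thresholds are
// invented for illustration and do not come from the attached patches.
class NodeHealthSketch {
  static final double MAX_HEAP_USED_RATIO = 0.95; // configurable threshold
  static final int MAX_ACTIVE_THREADS = 2000;     // configurable threshold

  boolean isOverloaded() {
    Runtime rt = Runtime.getRuntime();
    double heapUsed = (rt.totalMemory() - rt.freeMemory()) / (double) rt.maxMemory();
    return heapUsed > MAX_HEAP_USED_RATIO || Thread.activeCount() > MAX_ACTIVE_THREADS;
  }
  // Conceptually: when isOverloaded() and cc.isZooKeeperAware(), publish the
  // core as DOWN so it stops taking traffic, and republish it as ACTIVE once
  // the load recovers -- matching the behavior described in the comments above.
}
{code}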
[jira] [Assigned] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-7121: - Assignee: Mark Miller Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion -- Key: SOLR-7121 URL: https://issues.apache.org/jira/browse/SOLR-7121 Project: Solr Issue Type: New Feature Reporter: Sachin Goyal Assignee: Mark Miller Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch Currently, there is no way to control when a Solr node goes down. If the server is experiencing high GC pauses, too many threads, or just too many queries due to some bad load-balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything comes to a stall. Such a slow-dying core can affect other cores as well by taking a huge amount of time to serve their distributed queries. There should be a way to specify some threshold values beyond which the targeted core can detect its ill-health and proactively go down to recover. When the load improves, the core should come up automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7499) Remove/Deprecate the name parameter from the ADDREPLICA Collection API call
Varun Thacker created SOLR-7499: --- Summary: Remove/Deprecate the name parameter from the ADDREPLICA Collection API call Key: SOLR-7499 URL: https://issues.apache.org/jira/browse/SOLR-7499 Project: Solr Issue Type: Bug Reporter: Varun Thacker Priority: Minor Right now we take a name parameter in the ADDREPLICA call. We use that as the core name for the replica. Are there any use cases where specifying the name of the core for the replica is useful? Here are the disadvantages of doing so - 1. We don't verify whether the name is unique in the collection, so if a conflicting name ends up on the same node then the call will fail. 2. If the core is created on some other node, it will fail with legacyCloud=false, as that checks for uniqueness in core names. https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica - The ref guide has never documented the name parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
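For illustration, a hedged sketch of an ADDREPLICA request that omits the name parameter entirely, letting Solr pick a unique core name; the collection and shard values are made up:
{code}
import org.apache.solr.common.params.ModifiableSolrParams;

// Illustrative Collections API request (would be sent to /admin/collections).
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("action", "ADDREPLICA");
params.set("collection", "mycollection"); // illustrative
params.set("shard", "shard1");            // illustrative
// No "name" param: Solr chooses a core name that is unique in the collection,
// which avoids both failure modes listed above.
{code}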
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2216 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2216/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 9411 lines...] [javac] Compiling 532 source files to /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build/solr-core/classes/test [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527093#comment-14527093 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677656 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677656 ] LUCENE-6196: Fix javadoc issues; ant precommit is happy. Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to the ones that fall within the exact shape, in a highly performant way. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
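For readers unfamiliar with the trick, here is a minimal sketch of a planar sidedness test, which is the general idea behind "only multiplications and additions"; this illustrates the technique only and is not code from the geo3d package:
{code}
// A boundary on the unit sphere can be cut by a plane ax + by + cz + d = 0.
// Once the plane coefficients are precomputed at shape-initialization time,
// deciding which side a point (x, y, z) falls on is a dot product plus an
// offset -- no trigonometry at query time.
static boolean isAbovePlane(double a, double b, double c, double d,
                            double x, double y, double z) {
  return a * x + b * y + c * z + d >= 0.0; // multiplications and additions only
}
{code}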
[jira] [Updated] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6450: --- Attachment: LUCENE-6450.patch Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
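For intuition, a simplified sketch of morton (bit-interleaving) encoding of quantized lat/lon into a single long; the quantization and bit layout here are illustrative assumptions, not necessarily the patch's exact encoding:
{code}
// Quantize lat/lon to 32 bits each, then interleave the bits so that nearby
// points tend to share long prefixes (useful for term-range pruning).
static long mortonEncode(double lat, double lon) {
  long latBits = (long) ((lat + 90.0) / 180.0 * 0xFFFFFFFFL);
  long lonBits = (long) ((lon + 180.0) / 360.0 * 0xFFFFFFFFL);
  long result = 0L;
  for (int i = 0; i < 32; i++) { // interleave one bit of lon, one bit of lat
    result |= (lonBits >>> i & 1L) << (2 * i);
    result |= (latBits >>> i & 1L) << (2 * i + 1);
  }
  return result;
}
{code}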
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527103#comment-14527103 ] Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:20 PM: Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec was (Author: nknize): Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527103#comment-14527103 ] Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:23 PM: Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil thanks to [~mikemccand] for adding geo benchmarking) are below: Data Set: 60M points of Planet OSM GPS data *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec was (Author: nknize): Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527118#comment-14527118 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677658 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677658 ] LUCENE-6196: Mark @lucene.experimental or @lucene.internal Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to the ones that fall within the exact shape, in a highly performant way. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Solr-Artifacts-5.x - Build # 819 - Failure
Build: https://builds.apache.org/job/Solr-Artifacts-5.x/819/ No tests ran. Build Log: [...truncated 27323 lines...] [javac] Compiling 532 source files to /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/build/solr-core/classes/test [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given
[jira] [Updated] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6450: --- Attachment: LUCENE-6450.patch Updated patch to remove some superfluous code in GeoUtils. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.7.0_80) - Build # 4648 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4648/ Java: 64bit/jdk1.7.0_80 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9538 lines...] [javac] Compiling 532 source files to C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\classes\test [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:175: error: constructor ReplicaAssigner in class
[jira] [Created] (LUCENE-6462) Latin Stemmer for lucene
Niki created LUCENE-6462: Summary: Latin Stemmer for lucene Key: LUCENE-6462 URL: https://issues.apache.org/jira/browse/LUCENE-6462 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Niki In the latest Lucene package there is no stemmer for the Latin language. I have a stemmer for Latin, which is a rule-based program built on the grammar and rules of Latin. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
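For context, rule-based stemmers of this kind typically strip inflectional suffixes in priority order. Below is a toy sketch of the technique; the suffix list is abridged and illustrative, not the contributor's actual rules:
{code}
// Toy Latin suffix stripper: longer suffixes are tried first, and a stem of
// at least three characters is kept. A real stemmer would encode far more of
// Latin's declension and conjugation rules than this illustrative list.
static final String[] SUFFIXES = {
    "ibus", "orum", "arum", "ium", "us", "um", "ae", "is", "es", "a", "e", "i", "o"
};

static String stemLatin(String token) {
  for (String suffix : SUFFIXES) {
    if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
      return token.substring(0, token.length() - suffix.length());
    }
  }
  return token;
}
{code}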
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2217 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2217/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 54189 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:536: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:90: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build.xml:641: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:1990: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 97 minutes 51 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Where Search Meets Machine Learning
Sorry, as I was saying, the machine learning approach is NOT limited to having lots of user action data. In fact, having little or no user action data is commonly referred to as the cold start problem in recommender systems. In that case, it is useful to exploit content based similarities as well as context (such as location, time-of-day, day-of-week, site-section, device type, etc.) to make predictions/scoring. This can still be combined with the usual IR based scoring to keep semantics as the driving force. -J On Monday, May 4, 2015, J. Delgado joaquin.delg...@gmail.com wrote: BTW, as I mentioned, the machine learning On Monday, May 4, 2015, J. Delgado joaquin.delg...@gmail.com wrote: I totally agree that it depends on the task at hand and the amount/quality of the data that you can get hold of. The problem of relevancy in the traditional document/semantic information retrieval (IR) task is such a hard thing because in most cases there is little or no source of truth you could use as training data (unless you use something like TREC for a limited set of documents to evaluate). Additionally, the feedback data you get from users, if it exists, is very noisy. In this case prior knowledge, encoded as attribute weights, crafted functions, and heuristics, is your best bet. You can however mine the content itself by leveraging clustering/topic modeling via LDA, which is an unsupervised learning algorithm, and use that as input. Or perhaps Labeled-LDA and Multi-Grain LDA, another topic model for classification and sentiment analysis, which are supervised algorithms, in which case you can still use the approach I suggested. However, for search tasks that involve e-commerce, advertisements, recommendations, etc., there seems to be more data that can be captured from users' interactions with the system/site, which can be used as signals, and users' actions (adding things to wish lists, clicks for more info, conversions, etc.) are much more telling about the intention/value the user gives to what is presented to them. Then viewing search as a machine learning/multi-objective optimization problem makes sense. My point is that search engines nowadays are used for all these use cases, thus it is worth exploring all the avenues exposed in this thread. Cheers, -- Joaquin On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes.
Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
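One concrete reading of "combined with the usual IR based scoring to keep semantics as the driving force" is a simple linear blend of an IR score with a learned or contextual score. The sketch below is purely illustrative; the weighting and the score sources are assumptions, not anything proposed in this thread:
{code}
// Illustrative blend of an IR relevance score with a model-derived score.
// An alpha close to 1.0 keeps semantics (the IR score) as the driving force.
static float blendedScore(float irScore, float mlScore, float alpha) {
  return alpha * irScore + (1.0f - alpha) * mlScore; // e.g. alpha = 0.7
}
{code}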
[jira] [Created] (SOLR-7501) map-reduce index tool has timing bugs
Shenghua Wan created SOLR-7501: -- Summary: map-reduce index tool has timing bugs Key: SOLR-7501 URL: https://issues.apache.org/jira/browse/SOLR-7501 Project: Solr Issue Type: Bug Components: contrib - MapReduce Reporter: Shenghua Wan Priority: Minor map-reduce index tool has timing bugs in several classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_45) - Build # 12568 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12568/ Java: 32bit/jdk1.8.0_45 -server -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls Error Message: Shard split did not complete. Last recorded state: running expected:<[completed]> but was:<[running]> Stack Trace: org.junit.ComparisonFailure: Shard split did not complete. Last recorded state: running expected:<[completed]> but was:<[running]> at __randomizedtesting.SeedInfo.seed([A5A68D59067CF8A2:FDC2013800165076]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls(CollectionsAPIAsyncDistributedZkTest.java:101) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
Re: Where Search Meets Machine Learning
I totally agree that it depends on the task at hand and the amount/quality of the data that you can get hold of. The problem of relevancy in the traditional document/semantic information retrieval (IR) task is such a hard thing because in most cases there is little or no source of truth you could use as training data (unless you use something like TREC for a limited set of documents to evaluate). Additionally, the feedback data you get from users, if it exists, is very noisy. In this case prior knowledge, encoded as attribute weights, crafted functions, and heuristics, is your best bet. You can however mine the content itself by leveraging clustering/topic modeling via LDA, which is an unsupervised learning algorithm, and use that as input. Or perhaps Labeled-LDA and Multi-Grain LDA, another topic model for classification and sentiment analysis, which are supervised algorithms, in which case you can still use the approach I suggested. However, for search tasks that involve e-commerce, advertisements, recommendations, etc., there seems to be more data that can be captured from users' interactions with the system/site, which can be used as signals, and users' actions (adding things to wish lists, clicks for more info, conversions, etc.) are much more telling about the intention/value the user gives to what is presented to them. Then viewing search as a machine learning/multi-objective optimization problem makes sense. My point is that search engines nowadays are used for all these use cases, thus it is worth exploring all the avenues exposed in this thread. Cheers, -- Joaquin On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
Re: Where Search Meets Machine Learning
BTW, as I mentioned, the machine learning On Monday, May 4, 2015, J. Delgado joaquin.delg...@gmail.com wrote: I totally agree that it depends on the task at hand and the amount/quality of the data that you can get hold of. The problem of relevancy in the traditional document/semantic information retrieval (IR) task is such a hard thing because in most cases there is little or no source of truth you could use as training data (unless you use something like TREC for a limited set of documents to evaluate). Additionally, the feedback data you get from users, if it exists, is very noisy. In this case prior knowledge, encoded as attribute weights, crafted functions, and heuristics, is your best bet. You can however mine the content itself by leveraging clustering/topic modeling via LDA, which is an unsupervised learning algorithm, and use that as input. Or perhaps Labeled-LDA and Multi-Grain LDA, another topic model for classification and sentiment analysis, which are supervised algorithms, in which case you can still use the approach I suggested. However, for search tasks that involve e-commerce, advertisements, recommendations, etc., there seems to be more data that can be captured from users' interactions with the system/site, which can be used as signals, and users' actions (adding things to wish lists, clicks for more info, conversions, etc.) are much more telling about the intention/value the user gives to what is presented to them. Then viewing search as a machine learning/multi-objective optimization problem makes sense. My point is that search engines nowadays are used for all these use cases, thus it is worth exploring all the avenues exposed in this thread. Cheers, -- Joaquin On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
[GitHub] lucene-solr pull request: fix timing bugs in map-reduce contrib fo...
GitHub user wanshenghua opened a pull request: https://github.com/apache/lucene-solr/pull/146 fix timing bugs in map-reduce contrib for indexing https://issues.apache.org/jira/browse/SOLR-7501 You can merge this pull request into a Git repository by running: $ git pull https://github.com/wanshenghua/lucene-solr SOLR_7501 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #146 commit 06a81f2658f29a83db692c46232b4569c0321352 Author: Shenghua Wan s...@walmartlabs.com Date: 2015-05-05T03:23:17Z fix timing bugs in map-reduce contrib for indexing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7501) map-reduce index tool has timing bugs
[ https://issues.apache.org/jira/browse/SOLR-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527841#comment-14527841 ] ASF GitHub Bot commented on SOLR-7501: -- GitHub user wanshenghua opened a pull request: https://github.com/apache/lucene-solr/pull/146 fix timing bugs in map-reduce contrib for indexing https://issues.apache.org/jira/browse/SOLR-7501 You can merge this pull request into a Git repository by running: $ git pull https://github.com/wanshenghua/lucene-solr SOLR_7501 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #146 commit 06a81f2658f29a83db692c46232b4569c0321352 Author: Shenghua Wan s...@walmartlabs.com Date: 2015-05-05T03:23:17Z fix timing bugs in map-reduce contrib for indexing map-reduce index tool has timing bugs - Key: SOLR-7501 URL: https://issues.apache.org/jira/browse/SOLR-7501 Project: Solr Issue Type: Bug Components: contrib - MapReduce Reporter: Shenghua Wan Priority: Minor map-reduce index tool has timing bugs in several classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7501) map-reduce index tool has timing bugs
[ https://issues.apache.org/jira/browse/SOLR-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shenghua Wan updated SOLR-7501: --- Description: map-reduce index tool has timing bugs in several classes. bug fix is provided in https://github.com/apache/lucene-solr/pull/146 was:map-reduce index tool has timing bugs in several classes. map-reduce index tool has timing bugs - Key: SOLR-7501 URL: https://issues.apache.org/jira/browse/SOLR-7501 Project: Solr Issue Type: Bug Components: contrib - MapReduce Reporter: Shenghua Wan Priority: Minor map-reduce index tool has timing bugs in several classes. bug fix is provided in https://github.com/apache/lucene-solr/pull/146 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.7.0_80) - Build # 12393 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12393/ Java: 32bit/jdk1.7.0_80 -server -XX:+UseG1GC All tests passed Build Log: [...truncated 9437 lines...] [javac] Compiling 532 source files to /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/classes/test [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^
[jira] [Commented] (LUCENE-6462) Latin Stemmer for lucene
[ https://issues.apache.org/jira/browse/LUCENE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527828#comment-14527828 ] Niki commented on LUCENE-6462: -- When searching for a LatinStemmer, I found this link from Lucene/Solr: https://github.com/scherziglu/solr/blob/master/solr-analysis/src/main/java/org/apache/lucene/analysis/la/LatinStemmer.java. This program does not stem most words properly and also unnecessarily adds an 'i', amongst other things. I modified the above code to accommodate the rules of stemming in Latin. Latin Stemmer for lucene Key: LUCENE-6462 URL: https://issues.apache.org/jira/browse/LUCENE-6462 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Niki In the latest lucene package there is no stemmer for the Latin language. I have a stemmer for the Latin language, which is a rule-based program based on the grammar and rules of Latin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
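For readers unfamiliar with rule-based stemming, below is a minimal sketch of the general idea (longest-suffix stripping over a hand-maintained list of endings). This is purely illustrative: it is neither the LatinStemmer linked above nor the modified code described in the comment, and the suffix list and minimum-stem rule are assumptions.
{code}
// Illustrative only: a minimal rule-based Latin suffix stripper.
// The suffix list below is a small assumed sample, not a complete grammar.
public class SimpleLatinStemmer {
  // Common endings, longest first so the longest match wins.
  private static final String[] SUFFIXES = {
      "ibus", "orum", "arum", "amus", "atis",
      "ius", "ae", "am", "as", "em", "es", "is", "os", "um", "us",
      "a", "e", "i", "o", "u"
  };

  public static String stem(String word) {
    for (String suffix : SUFFIXES) {
      // Keep at least two characters of stem to avoid over-stemming.
      if (word.endsWith(suffix) && word.length() - suffix.length() >= 2) {
        return word.substring(0, word.length() - suffix.length());
      }
    }
    return word;
  }

  public static void main(String[] args) {
    System.out.println(stem("portarum")); // prints "port"
  }
}
{code}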
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3063 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3063/ All tests passed Build Log: [...truncated 9349 lines...] [javac] Compiling 532 source files to /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/build/solr-core/classes/test [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac]
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527905#comment-14527905 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677741 from sha...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1677741 ] SOLR-6220: Fix compile error on Java7 Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch h1.Objective Most cloud based systems allow you to specify rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we should be able to control allocation of replicas, or later change it to suit the needs of the system. All configurations are on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during * collection creation * shard splitting * add replica * createshard There are two aspects to how replicas are placed: snitch and placement. h2.snitch How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g.: snitch=EC2Snitch or snitch=class:EC2Snitch h2.ImplicitSnitch This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is automatically used. Tags provided by ImplicitSnitch: # cores: no. of cores in the node # disk: disk space available in the node # host: host name of the node # node: node name # D.*: these are values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during the node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}} h2.Rules This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters will be passed on to the collection CREATE api as a multivalued parameter rule. The values will be saved in the state of the collection as follows {code:Javascript}
{ "mycollection": {
    "snitch": { "class": "ImplicitSnitch" },
    "rules": [{cores:4-}, {replica:1, shard:*, node:*}, {disk:100}]
  }
}
{code} A rule is specified in a pseudo-JSON syntax, which is a map of keys and values. * Each collection can have any number of rules. As long as the rules do not conflict with each other it is OK; otherwise an error is thrown. * In each rule, shard and replica can be omitted ** default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than) ** and the value of shard can be a shard name, or {{\*}} meaning EACH, or {{\*\*}} meaning ANY. Default value is {{\*\*}} (ANY) * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}. * All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node * By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly h3.How are nodes picked up? Nodes are not picked up at random. The rules are used to first sort the nodes according to affinity.
For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference. And if the rule is {{disk:100-}}, nodes with less disk space will be given priority. If everything else is equal, nodes with fewer cores are given higher priority. h3.Fuzzy match Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness. Example rules: {noformat}
#Example requirement: use only one replica of a shard in a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible. This will ensure that if a node does not exist with a 100GB disk, nodes are picked up in the order of size, say an 85GB node would be picked up over an 80GB node.
disk:100~
{noformat} Examples: {noformat}
#in each rack there can be max two replicas of A given shard
rack:*,shard:*,replica:<3
//in each rack there can be max two replicas of ANY replica
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be a max one replica of EACH shard
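To make the wiring concrete, here is a hypothetical collection CREATE call combining a snitch with two of the rule shapes described above. The exact parameter syntax here is an assumption extrapolated from this description (rule as a multivalued parameter), not the syntax of the committed feature:
{noformat}
#Hypothetical: create a collection whose replicas are spread across nodes
#(at most one replica of a shard per node) and placed on nodes with >100GB disk
/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&rule=shard:*,replica:<2,node:*&rule=disk:>100&snitch=class:EC2Snitch
{noformat}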
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.8.0_45) - Build # 12395 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12395/ Java: 32bit/jdk1.8.0_45 -client -XX:+UseSerialGC All tests passed Build Log: [...truncated 52931 lines...] BUILD FAILED /home/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:536: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:90: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build.xml:641: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:1990: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 57 minutes 24 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2218 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2218/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 54321 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:536: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:90: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build.xml:641: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:1990: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 98 minutes 6 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2218 - Still Failing!
I committed a fix. There was a compile error with Java7 in one of the tests added in SOLR-6220. On Tue, May 5, 2015 at 10:43 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2218/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 54321 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:536: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:90: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build.xml:641: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:1990: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 98 minutes 6 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Regards, Shalin Shekhar Mangar.
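The failure mode in these build logs is the difference between Java 7 and Java 8 diamond-operator inference: Java 8's target typing can infer a constructor's type argument from the method or constructor parameter it is passed to, while Java 7 infers ArrayList<Object> for a diamond in an argument position, which is exactly the "found: ArrayList<Object>" the compiler prints above. A minimal sketch reproducing the effect (the class and method names are made up for illustration):
{code}
import java.util.ArrayList;
import java.util.List;

public class DiamondInference {
  static void takeStrings(List<String> strings) {}

  public static void main(String[] args) {
    // Compiles on Java 8 (target typing infers ArrayList<String>), but on
    // Java 7 the diamond in an argument position infers ArrayList<Object>,
    // giving the same "actual argument ArrayList<Object> cannot be converted
    // to List<String>" error seen in the Jenkins logs.
    takeStrings(new ArrayList<>());

    // Portable fix: name the type argument explicitly.
    takeStrings(new ArrayList<String>());
  }
}
{code}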
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527191#comment-14527191 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677670 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677670 ] LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33811/ (diff #3) Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to those within the exact shape in highly performant ways. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
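A sketch of the membership-test idea the description alludes to: once a shape is represented by a set of bounding planes, point-in-shape checks reduce to evaluating plane equations, i.e. only multiplications and additions. The class and method names below are illustrative assumptions, not the actual geo3d API:
{code}
// Sketch of the core idea: a plane a*x + b*y + c*z + d = 0, with membership
// decided by the sign of the evaluated plane equation.
class Plane {
  final double a, b, c, d;

  Plane(double a, double b, double c, double d) {
    this.a = a; this.b = b; this.c = c; this.d = d;
  }

  // True if the point is on or above the plane (the "inside" side).
  boolean isWithin(double x, double y, double z) {
    return a * x + b * y + c * z + d >= 0.0;
  }
}

class PlanarShape {
  final Plane[] boundingPlanes;

  PlanarShape(Plane... boundingPlanes) { this.boundingPlanes = boundingPlanes; }

  // A point is inside the shape iff it is inside every bounding plane:
  // per point, this costs only a few multiplications and additions per plane.
  boolean isWithin(double x, double y, double z) {
    for (Plane p : boundingPlanes) {
      if (!p.isWithin(x, y, z)) return false;
    }
    return true;
  }
}
{code}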
[JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_80) - Build # 12389 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12389/ Java: 64bit/jdk1.7.0_80 -XX:+UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 9617 lines...] [javac] Compiling 532 source files to /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/classes/test [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner(
[jira] [Commented] (SOLR-7436) Solr stops printing stacktraces in log and output
[ https://issues.apache.org/jira/browse/SOLR-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527279#comment-14527279 ] Hoss Man commented on SOLR-7436: Best guess, based on random googling since there's nothing in solr that i could think of to explain this, is that you are running into this HotSpot gotcha... http://jawspeak.com/2010/05/26/hotspot-caused-exceptions-to-lose-their-stack-traces-in-production-and-the-fix/ https://stackoverflow.com/questions/2295015/log4j-not-printing-the-stacktrace-for-exceptions bq. The compiler in the server VM now provides correct stack backtraces for all cold built-in exceptions. For performance purposes, when such an exception is thrown a few times, the method may be recompiled. After recompilation, the compiler may choose a faster tactic using preallocated exceptions that do not provide a stack trace. To disable completely the use of preallocated exceptions, use this new flag: -XX:-OmitStackTraceInFastThrow. Solr stops printing stacktraces in log and output - Key: SOLR-7436 URL: https://issues.apache.org/jira/browse/SOLR-7436 Project: Solr Issue Type: Bug Affects Versions: 5.1 Environment: Local 5.1 Reporter: Markus Jelsma Attachments: solr-8983-console.log After a short while, Solr suddenly stops printing stacktraces in the log and output. {code} 251043 [qtp1121454968-17] INFO org.apache.solr.core.SolrCore.Request [ suggests] - [suggests] webapp=/solr path=/select params={q=*:*fq={!collapse+field%3Dquery_digest}fq={!collapse+field%3Dresult_digest}} status=500 QTime=3 251043 [qtp1121454968-17] ERROR org.apache.solr.servlet.SolrDispatchFilter [ suggests] - null:java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743) at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:780) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:203) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1660) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1479) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:556) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at
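A minimal, self-contained reproduction of the HotSpot behavior Hoss quotes above: throw the same implicit exception "hot" enough times and the server compiler may recompile the method to throw a preallocated exception with no stack trace, unless -XX:-OmitStackTraceInFastThrow is set. The iteration count is an arbitrary choice for illustration:
{code}
public class FastThrowDemo {
  static String s = null;

  public static void main(String[] args) {
    for (int i = 0; i < 200_000; i++) {
      try {
        s.length(); // always throws NullPointerException
      } catch (NullPointerException e) {
        // On a default server VM, getStackTrace() may eventually come back
        // empty once the method is recompiled to use a preallocated NPE.
        if (e.getStackTrace().length == 0) {
          System.out.println("stack trace disappeared at iteration " + i);
          return;
        }
      }
    }
    System.out.println("stack traces survived (flag disabled or not yet compiled)");
  }
}
{code}
Running it with java -XX:-OmitStackTraceInFastThrow FastThrowDemo should keep full traces for every throw, which is the workaround suggested in the linked articles.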
Re: Where Search Meets Machine Learning
Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument: that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where there is so much high-quality user behavior data that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know, because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527392#comment-14527392 ] Michael McCandless commented on LUCENE-6450: This new approach is nice! I don't fully understand all the geo math, but I think I get the gist: you recursively approximate the target shape using smaller and smaller ranges from the morton encoding, and then record when that z-shape is fully within the query and avoid the post-filtering for those ranges. This visits fewer terms than the original patch, which did just a single range that can (w/ the right 'adversary') visit a great many false terms. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later; we should commit this approach now... Maybe add a test case w/ more data, e.g. a randomized test? It could index a bunch of random points, then run random rects/shapes, do the dumb, slow check-every-single-doc verification, and confirm the query hits agree. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
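For reference, a sketch of the morton (z-order) encoding the description refers to: quantize lat/lon to 32 bits each and interleave the bits into a single long. The quantization scheme and names here are assumptions for illustration, not the patch's actual code:
{code}
public class MortonSketch {
  // Spread the low 32 bits of v so they occupy the even bit positions
  // (the standard "part1by1" bit-twiddling sequence).
  static long spread(long v) {
    v &= 0xFFFFFFFFL;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v << 2))  & 0x3333333333333333L;
    v = (v | (v << 1))  & 0x5555555555555555L;
    return v;
  }

  // Quantize lat/lon to 32 bits each, then interleave: lat on even bits,
  // lon on odd bits, producing one long term per point.
  static long encode(double lat, double lon) {
    long latBits = (long) ((lat + 90.0)  / 180.0 * 0xFFFFFFFFL);
    long lonBits = (long) ((lon + 180.0) / 360.0 * 0xFFFFFFFFL);
    return (spread(lonBits) << 1) | spread(latBits);
  }

  public static void main(String[] args) {
    System.out.printf("0x%016x%n", encode(37.7749, -122.4194));
  }
}
{code}
Nearby points share long common bit prefixes under this encoding, which is what makes the recursive range decomposition discussed above possible.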
[jira] [Comment Edited] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527530#comment-14527530 ] Anshum Gupta edited comment on SOLR-7275 at 5/4/15 11:13 PM: - Patch updated to trunk. Working on integrating the context object. was (Author: anshumg): Patch updated to trunk working on integrating the context. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
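To illustrate how a plugin might be written against the interface proposed in the description above, here is a hypothetical role-based implementation. The interface and response shapes are taken from the sketch in the issue; SolrRequestContext is reduced to the two fields this example needs, and all field and accessor details are assumptions, not the final API:
{code}
import java.util.Collections;
import java.util.Set;

interface SolrAuthorizationPlugin {
  SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}

class SolrAuthorizationResponse {
  boolean authorized;
  public boolean isAuthorized() { return authorized; }
}

enum OperationType { QUERY, UPDATE, ADMIN }

class SolrRequestContext {
  OperationType operationType;     // correlated with user roles, per the sketch
  Set<String> userRoles;           // assumed to come from the UserInfo field
}

// Hypothetical plugin: the operation type maps directly to a required role.
class RoleBasedAuthorizationPlugin implements SolrAuthorizationPlugin {
  @Override
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context) {
    SolrAuthorizationResponse response = new SolrAuthorizationResponse();
    String required = context.operationType.name().toLowerCase(); // "query", "update", "admin"
    response.authorized = context.userRoles.contains(required);
    return response;
  }
}

public class AuthSketch {
  public static void main(String[] args) {
    SolrRequestContext ctx = new SolrRequestContext();
    ctx.operationType = OperationType.UPDATE;
    ctx.userRoles = Collections.singleton("update");
    System.out.println(new RoleBasedAuthorizationPlugin().isAuthorized(ctx).isAuthorized()); // true
  }
}
{code}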
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: (was: SOLR-7484.patch) Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7500) Remove pathPrefix from SolrDispatchFilter as Solr no longer runs as a part of a bigger webapp
[ https://issues.apache.org/jira/browse/SOLR-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7500: --- Attachment: SOLR-7500.patch Remove pathPrefix from SolrDispatchFilter as Solr no longer runs as a part of a bigger webapp - Key: SOLR-7500 URL: https://issues.apache.org/jira/browse/SOLR-7500 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-7500.patch SolrDispatchFilter has support for Solr running as part of a bigger webapp but as we've moved away from that concept, it makes sense to clean up the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.9.0-ea-b60) - Build # 12564 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12564/ Java: 64bit/jdk1.9.0-ea-b60 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.test Error Message: commitWithin did not work on node: http://127.0.0.1:36402/collection1 expected:<68> but was:<67> Stack Trace: java.lang.AssertionError: commitWithin did not work on node: http://127.0.0.1:36402/collection1 expected:<68> but was:<67> at __randomizedtesting.SeedInfo.seed([A9AAF956597BE1F8:21FEC68CF7878C00]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.solr.cloud.BasicDistributedZkTest.test(BasicDistributedZkTest.java:344) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
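For context on what this assertion exercises: a document added with commitWithin must become searchable within (roughly) the requested window, without an explicit commit. A minimal SolrJ sketch of that contract, where the URL, core name, and field are placeholders:
{code}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient("http://127.0.0.1:8983/solr/collection1")) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");

      UpdateRequest req = new UpdateRequest();
      req.add(doc);
      req.setCommitWithin(500); // ask Solr to commit within 500 ms, no explicit commit
      req.process(client);

      Thread.sleep(1000); // after the window elapses, the doc should be searchable
    }
  }
}
{code}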
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7275.patch Patch updated to trunk working on integrating the context. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527592#comment-14527592 ] Anshum Gupta commented on SOLR-7275: Right now, this also lacks a mechanism to Reload / Reinit without restarting the node. Perhaps it'd be a good idea to have an API to do that. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527410#comment-14527410 ] Nicholas Knize commented on LUCENE-6450: That's right. The old patch was a naive "scan the world" approach, really unusable at scale. As said, this one approximates the bounding box as a set of ranges on the space-filling curve. I think [~dsmiley] had also suggested random testing, which is definitely necessary. I'll add some randomized testing and post a new patch. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Solr website - problem with anchor links
Hi Shawn, The books on the same page (h3 headings as well) had the same problem - these are linked from the front page, so it was really noticeable. I fixed the problem for the books by directly specifying <h3 class="offset">title</h3> (instead of markdown syntax ### Title ###) - you can see that class in base.css: http://lucene.apache.org/solr/assets/styles/base.css - it shifts content down far enough that you can see it below the floating header. I didn’t apply the fix everywhere in that file because the markdown flavor used by the ASF CMS doesn’t have the ability to specify HTML tag attributes, and many parts of resources.mdtext are still just markdown, so I didn’t want to make it messy (well, messier really) by including more HTML. But since markdown auto-creates anchors for all h3 headings, it makes sense to make them not look terrible when people link directly to them, so I’ve just converted all the H3 headings from: ### Heading ### to: <h3 class="offset">Heading</h3> Steve On May 4, 2015, at 9:59 AM, Shawn Heisey apa...@elyograg.org wrote: When I try to use a URL with an anchor link on the Solr website, it doesn't work right: https://lucene.apache.org/solr/resources.html#mailing-lists On both Firefox and Chrome, this URL doesn't quite go to the right spot. It would be the right spot if the floating header at the top of of the page wasn't there. I'm guessing some CSS trickery is required to get it to anchor below that floating header. I did find the following, and when I have time to digest it, I may be able to try and fix the problem, but finding that time is the hard part. http://stackoverflow.com/questions/10732690/offsetting-an-html-anchor-to-adjust-for-fixed-header If somebody knows exactly how to fix it and has the time, feel free to take this problem! Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
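For reference, the usual shape of the offset trick Steve describes (and the Stack Overflow link suggests) is to pad the heading down and pull it back up with a negative margin, so the anchor target lands below the fixed header without shifting the visible layout. The 60px value here is a placeholder, not the actual value in base.css:
{code}
/* Illustrative only; see base.css for the real rule. */
h3.offset {
  padding-top: 60px;  /* room for the floating header */
  margin-top: -60px;  /* cancel the visual shift */
}
{code}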
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.7.0_80) - Build # 12391 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12391/
Java: 32bit/jdk1.7.0_80 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9578 lines...]
[javac] Compiling 532 source files to /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/classes/test
[javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types;
[javac]     Map<Position, String> mapping = new ReplicaAssigner(
[javac]                                     ^
[javac]   required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState
[javac]   found:    List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null
[javac]   reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion
(The identical error is reported for the ReplicaAssigner constructor calls at RuleEngineTest.java lines 77, 117, 129, 141, 153, 164, and 175; the log is truncated partway through the last report.)
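The repeated javac message boils down to one generics rule: generic types are invariant, so an ArrayList<Object> argument cannot satisfy a List<String> parameter even when the list is empty. A minimal, hypothetical reproduction (not the actual RuleEngineTest code):
{code}
import java.util.ArrayList;
import java.util.List;

public class GenericsMismatch {
  static void takesStrings(List<String> names) { /* consumes strings */ }

  public static void main(String[] args) {
    ArrayList<Object> mixed = new ArrayList<>();
    // takesStrings(mixed);                  // error: ArrayList<Object> is not a List<String>
    takesStrings(new ArrayList<String>());   // fine: the type argument matches exactly
  }
}
{code}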
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 2262 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/2262/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED: org.apache.solr.cloud.MultiThreadedOCPTest.test

Error Message:
Captured an uncaught exception in thread: Thread[id=4003, name=parallelCoreAdminExecutor-1947-thread-8, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=4003, name=parallelCoreAdminExecutor-1947-thread-8, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest]
   at __randomizedtesting.SeedInfo.seed([83BF6554D449E287:BEB5A8E7AB58F7F]:0)
Caused by: java.lang.AssertionError: Too many closes on SolrCore
   at __randomizedtesting.SeedInfo.seed([83BF6554D449E287]:0)
   at org.apache.solr.core.SolrCore.close(SolrCore.java:1138)
   at org.apache.solr.common.util.IOUtils.closeQuietly(IOUtils.java:31)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:535)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:494)
   at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:628)
   at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:213)
   at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1249)
   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)

Build Log:
[...truncated 9522 lines...]
[junit4] Suite: org.apache.solr.cloud.MultiThreadedOCPTest
[junit4]   2> Creating dataDir: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/temp/solr.cloud.MultiThreadedOCPTest 83BF6554D449E287-001/init-core-data-001
[junit4]   2> 540661 T3688 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (false)
[junit4]   2> 540661 T3688 oas.BaseDistributedSearchTestCase.initHostContext Setting hostContext system property: /hfdh/s
[junit4]   2> 540663 T3688 oasc.ZkTestServer.run STARTING ZK TEST SERVER
[junit4]   2> 540663 T3689 oasc.ZkTestServer$2$1.setClientPort client port:0.0.0.0/0.0.0.0:0
[junit4]   2> 540663 T3689 oasc.ZkTestServer$ZKServerMain.runFromConfig Starting server
[junit4]   2> 540765 T3688 oasc.ZkTestServer.run start zk server on port:55932
[junit4]   2> 540766 T3688 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
[junit4]   2> 540770 T3688 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
[junit4]   2> 540782 T3696 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@72ae8ad5 name:ZooKeeperConnection Watcher:127.0.0.1:55932 got event WatchedEvent state:SyncConnected type:None path:null
[junit4]   2> 540782 T3688 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
[junit4]   2> 540783 T3688 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
[junit4]   2> 540783 T3688 oascc.SolrZkClient.makePath makePath: /solr
[junit4]   2> 540793 T3688 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
[junit4]   2> 540796 T3688 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
[junit4]   2> 540799 T3699 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@52b95ac3 name:ZooKeeperConnection Watcher:127.0.0.1:55932/solr got event WatchedEvent state:SyncConnected type:None path:null
[junit4]   2> 540799 T3688 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
[junit4]   2> 540800 T3688 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
[junit4]   2> 540800 T3688 oascc.SolrZkClient.makePath makePath: /collections/collection1
[junit4]   2> 540805 T3688 oascc.SolrZkClient.makePath makePath: /collections/collection1/shards
[junit4]   2> 540810 T3688 oascc.SolrZkClient.makePath makePath: /collections/control_collection
[junit4]   2> 540815 T3688 oascc.SolrZkClient.makePath makePath: /collections/control_collection/shards
[junit4]   2> 540820 T3688 oasc.AbstractZkTestCase.putConfig put /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml to /configs/conf1/solrconfig.xml
[junit4]   2> 540820 T3688 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.xml
[junit4]   2> 540828 T3688 oasc.AbstractZkTestCase.putConfig put [...truncated]
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527489#comment-14527489 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------
Hi, I will look into this tomorrow (it is too late now)... This looks like it has a completely separate TermsEnum and query impl. Why not extend MultiTermQuery directly and let NRQ live on its own?
Uwe

Add simple encoded GeoPointField type to core
---------------------------------------------
Key: LUCENE-6450
URL: https://issues.apache.org/jira/browse/LUCENE-6450
Project: Lucene - Core
Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch

At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward, lightweight type for the most basic geo point use-cases. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by using NumericRangeQuery to reduce candidate terms, deferring the more expensive mathematics to the smaller candidate sets.
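The description's "simple bit twiddling operations (currently morton hashing)" refers to the classic Z-order interleave. As a hedged sketch only (this is not the patch's code, and the degree-to-integer quantization shown is an assumption), encoding lat/lon into a single long might look like:
{code}
public final class MortonSketch {

  // Spread the lower 32 bits of v so that a zero bit separates each original bit.
  private static long spread(long v) {
    v &= 0x00000000FFFFFFFFL;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v << 2))  & 0x3333333333333333L;
    v = (v | (v << 1))  & 0x5555555555555555L;
    return v;
  }

  // Quantize degrees onto the unsigned 32-bit range (assumed scheme), then
  // interleave: lon bits land in even positions, lat bits in odd positions.
  public static long encode(double lat, double lon) {
    long latQ = (long) (((lat + 90.0)  / 180.0) * 0xFFFFFFFFL);
    long lonQ = (long) (((lon + 180.0) / 360.0) * 0xFFFFFFFFL);
    return spread(lonQ) | (spread(latQ) << 1);
  }
}
{code}
Nearby points then share long term prefixes, which is what lets the range machinery prune candidates before any exact math runs.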
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527496#comment-14527496 ]

Uwe Schindler edited comment on LUCENE-6450 at 5/4/15 10:42 PM:
----------------------------------------------------------------
bq. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later and we should commit this approach now ..

We should first compare how this behaves on *large* bboxes, so a random test / perf test spanning large parts of the world and large indexes with maaany points would be good (whole atlantic, whole africa, ...). It is also mentioned that it does not allow boxes to cross the date line, which is easy to handle by splitting into 2 queries, one left of the date line, one right. I can help with that. Then we should also test perf with queries spanning the whole pacific :-)

was (Author: thetaphi): [the previous revision is identical in the archived text; the edit evidently changed only formatting]
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527496#comment-14527496 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------
bq. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later and we should commit this approach now ..

We should first compare how this behaves on *large* bboxes, so a random test / perf test spanning large parts of the world and large indexes with maaany points would be good (whole atlantic, whole africa, ...). It is also mentioned that it does not allow boxes to cross the date line, which is easy to handle by splitting into 2 queries, one left of the date line, one right. I can help with that. Then we should also test perf with queries spanning the whole pacific :-)
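The two-query split Uwe describes is mechanical: a box whose minLon is greater than its maxLon wraps the date line and can be rewritten as two non-wrapping boxes, either of which may match. A hedged sketch follows; the GeoBoundingBoxQuery argument order shown is an assumption for illustration, only BooleanQuery is Lucene's actual API here:
{code}
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public final class DatelineSplit {
  // Sketch only: argument order (field, minLon, minLat, maxLon, maxLat) is assumed.
  public static Query bbox(String field, double minLon, double minLat,
                           double maxLon, double maxLat) {
    if (minLon <= maxLon) {
      return new GeoBoundingBoxQuery(field, minLon, minLat, maxLon, maxLat);
    }
    // The box wraps the date line: query the part left of it and the part right of it.
    BooleanQuery q = new BooleanQuery();
    q.add(new GeoBoundingBoxQuery(field, minLon, minLat, 180.0, maxLat), Occur.SHOULD);
    q.add(new GeoBoundingBoxQuery(field, -180.0, minLat, maxLon, maxLat), Occur.SHOULD);
    return q;
  }
}
{code}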
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_45) - Build # 4767 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4767/
Java: 64bit/jdk1.8.0_45 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

3 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
Error Message: ERROR: SolrIndexSearcher opens=51 closes=50
Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=51 closes=50
   at __randomizedtesting.SeedInfo.seed([366927323FB5382C]:0)
   at org.junit.Assert.fail(Assert.java:93)
   at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:496)
   at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:232)
   at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:497)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
   at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
   at java.lang.Thread.run(Thread.java:745)

FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
Error Message: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores:
   1) Thread[id=9505, name=searcherExecutor-4428-thread-1, state=WAITING, group=TGRP-TestLazyCores]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: [same thread and frames as above]
   at __randomizedtesting.SeedInfo.seed([366927323FB5382C]:0)

FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
Error Message: There are still zombie threads that couldn't be terminated:
   1) Thread[id=9505, [...truncated]
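Both TestLazyCores failures above and the earlier "Too many closes on SolrCore" assertion police the same invariant: every tracked open must be matched by exactly one close. A hedged model of that reference-counting discipline (illustrative only, not Solr's actual SolrCore code):
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model of a ref-counted resource: the creator holds one reference,
// open() adds one, close() releases one. Going below zero means a double
// close ("too many closes"); never reaching zero means a leak (opens > closes).
final class RefCountedResource {
  private final AtomicInteger refCount = new AtomicInteger(1);

  void open() {
    refCount.incrementAndGet();
  }

  void close() {
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      throw new AssertionError("Too many closes");
    }
    if (remaining == 0) {
      // last reference released: actually free the underlying resources here
    }
  }
}
{code}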
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527300#comment-14527300 ]

David Smiley commented on LUCENE-6450:
--------------------------------------
Nice code Nick! LGTM.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527318#comment-14527318 ]

David Smiley commented on LUCENE-6450:
--------------------------------------
Just curious; how did that Python RTree benchmark compare? https://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/src/python/SearchOSM.py?spec=svn188e330ea8c34a9720cbf0414d2ed19f6a843a3d&r=188e330ea8c34a9720cbf0414d2ed19f6a843a3d#1
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:37 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

was (Author: nknize): [same text, without the link to the OSM data-set image]
[jira] [Commented] (SOLR-7484) Refactor SolrDispatchFilter.doFilter(...) method
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527132#comment-14527132 ]

ASF subversion and git services commented on SOLR-7484:
--------------------------------------------------------
Commit 1677660 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1677660 ]
SOLR-7484: Refactor SolrDispatchFilter to extract all Solr specific implementation detail to HttpSolrCall, and extract methods from the current SDF.doFilter(..) logic, making things easier to manage. HttpSolrCall converts the processing to a 3-step process, i.e. Construct, Init, and Call, so the context of the request is available after Init and before the actual call operation. (merge from trunk)

Refactor SolrDispatchFilter.doFilter(...) method
------------------------------------------------
Key: SOLR-7484
URL: https://issues.apache.org/jira/browse/SOLR-7484
Project: Solr
Issue Type: Improvement
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Attachments: SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch

Currently almost everything that's done in SDF.doFilter() is sequential. We should refactor it to clean up the code and make things easier to manage.
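To make the 3-step shape concrete, here is a caller-side sketch of the lifecycle the commit message describes. This is illustrative only: the constructor arguments and method names are assumptions for exposition, not HttpSolrCall's actual signatures:
{code}
// Illustrative only; not the real HttpSolrCall API.
HttpSolrCall call = new HttpSolrCall(request, response); // 1. construct: no work done yet
call.init();  // 2. init: resolve core, handler, and path; request context now available
try {
  call.call(); // 3. call: execute the request against the context init() established
} finally {
  call.destroy(); // cleanup (method name assumed)
}
{code}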
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:55 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoHashPrefixTree*
Index Time: 695.079 sec
Index Size: 4.2G
Mean Query Time: 0.071 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

was (Author: nknize): [same text, without the *GeoHashPrefixTree* results]
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527230#comment-14527230 ]

David Smiley commented on LUCENE-6196:
--------------------------------------
I think the Geo3d branch, technically {{lucene6196}}, is now ready to merge into trunk, and then the 5x branch. I could generate a patch, but unless there are process reasons (e.g. I have to?) or technical reasons I am unaware of, I'll simply merge in the branch. The CHANGES.txt entry I plan to add is as follows:
{noformat}
* LUCENE-6196: New Spatial Geo3d API with partial Spatial4j integration. It is
  a set of shapes implemented using 3D planar geometry for calculating spatial
  relations on the surface of a sphere. Shapes include Point, BBox, Circle,
  Path (buffered line string), and Polygon. (Karl Wright via David Smiley)
{noformat}
Karl, if you suggest any changes then just let me know. If I don't get another +1 then I'll commit in two days.

Include geo3d package, along with Lucene integration to make it useful
-----------------------------------------------------------------------
Key: LUCENE-6196
URL: https://issues.apache.org/jira/browse/LUCENE-6196
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: Karl Wright
Assignee: David Smiley
Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip

I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for restricting the results of such queries to those that fall within the exact shape, in highly performant ways. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job.
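The "only multiplications and additions" claim is easy to see with a sketch: represent each shape boundary as a plane a*x + b*y + c*z + d = 0 and test which side a unit-sphere point falls on. This is illustrative code under that assumption, not geo3d's actual classes:
{code}
// Sketch only: a convex region on the unit sphere bounded by planes.
final class PlaneSketch {
  final double a, b, c, d; // plane equation: a*x + b*y + c*z + d = 0

  PlaneSketch(double a, double b, double c, double d) {
    this.a = a; this.b = b; this.c = c; this.d = d;
  }

  // One dot product and one add per plane: multiplications and additions only.
  boolean isWithin(double x, double y, double z) {
    return a * x + b * y + c * z + d >= 0.0;
  }

  // A point is inside a convex region iff it is within every bounding plane.
  static boolean isWithin(PlaneSketch[] planes, double x, double y, double z) {
    for (PlaneSketch p : planes) {
      if (!p.isWithin(x, y, z)) {
        return false;
      }
    }
    return true;
  }
}
{code}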
[jira] [Updated] (SOLR-7484) Refactor SolrDispatchFilter to move all Solr specific implementation to another class
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7484:
-------------------------------
Summary: Refactor SolrDispatchFilter to move all Solr specific implementation to another class (was: Refactor SolrDispatchFilter)
[jira] [Updated] (SOLR-7484) Refactor SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7484:
-------------------------------
Summary: Refactor SolrDispatchFilter (was: Refactor SolrDispatchFilter.doFilter(...) method)
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 8:00 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoHashPrefixTree*
Index Time: 695.079 sec
Index Size: 4.2G
Mean Query Time: 0.071 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

Hardware: 8 core System76 Ubuntu 14.10 laptop w/ 16GB memory

was (Author: nknize): [same text, without the Hardware line]
[jira] [Commented] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527212#comment-14527212 ]

Mark Miller commented on SOLR-7121:
-----------------------------------
bq. Without the regular expression, one would need separate configuration files for separate collections which is somewhat of a pain to manage.

Couldn't you make the same argument for all of the config in solrconfig.xml? It seems that all SolrCores in the same collection will want the same config, and you usually would want to use different config for other collections if you want any of it to vary.

Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
-----------------------------------------------------------------------------------------------
Key: SOLR-7121
URL: https://issues.apache.org/jira/browse/SOLR-7121
Project: Solr
Issue Type: New Feature
Reporter: Sachin Goyal
Assignee: Mark Miller
Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch

Currently, there is no way to control when a Solr node goes down. If the server is having high GC pauses, or too many threads, or is just getting too many queries due to some bad load-balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything comes to a stall. Such a slow-dying core can affect other cores as well by taking a long time to serve their distributed queries. There should be a way to specify threshold values beyond which the targeted core can detect its ill health and proactively go down to recover. When the load improves, the core should come back up automatically.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527338#comment-14527338 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------
Here's the OSM subset I'm using for the benchmarks: http://people.apache.org/~mikemccand/latlon.subsetPlusAllLondon.txt.lzma

It's a random 1/50th of the latest OSM export (as of last week), but includes all points within London, UK. The search benchmark then runs a fixed set (225 total) of axis-aligned rectangle intersects queries around London. Look for Index/SearchOSM/GeoPoint.java/py in luceneutil...

I ran the same benchmarks (except for Packed/QuadPrefixTree):

*Geopoint*
Index time: 157.3 sec (incl. forceMerge)
Index size: 1.8 GB
Mean query time: .077 sec
221,119,062 total hits

*GeoHashPrefixTree*
Index time: 628.5 sec (incl. forceMerge)
Index size: 4.2 GB
Mean query time: .039 sec
221,120,027 total hits

*libspatialindex* (using Python Rtree wrapper)
Index time: 469.6 sec
Index size: 2.6 GB
Mean query time: .158 sec
221,118,844 total hits

The first geopoint patch here got exactly the same total hit count as libspatialindex, but now it's different, I think because of the precision control governing how deep the ranges recurse. I think it's also expected that geohash won't get the same hit count, since it does a bit of quantizing (level 11 ... not sure what that equates to in meters). I'm surprised the Rtree impl is so slow ...
[jira] [Updated] (SOLR-7484) Refactor SolrDispatchFilter.doFilter(...) method
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7484:
-------------------------------
Fix Version/s: 5.2
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527148#comment-14527148 ]

David Smiley commented on LUCENE-6450:
--------------------------------------
Can you please direct me to the luceneutil geo benchmark? I'm curious what that's about. The numbers look nice. Small indexes and fast index time :-) It'd be interesting to try GeoHashPrefixTree, which will have smaller indexes than Quad. I'll check out your code shortly.
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 8:01 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoHashPrefixTree*
Parameters: level: 11
Index Time: 695.079 sec
Index Size: 4.2G
Mean Query Time: 0.071 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

Hardware: 8 core System76 Ubuntu 14.10 laptop w/ 16GB memory

was (Author: nknize): [same text, without the "Parameters: level: 11" line under *GeoHashPrefixTree*]
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527185#comment-14527185 ]

ASF subversion and git services commented on LUCENE-6196:
----------------------------------------------------------
Commit 1677669 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677669 ]
LUCENE-6196: Mark @lucene.experimental or @lucene.internal