[jira] [Commented] (LUCENE-6045) Refactor classifier APIs to work better with multi threading
[ https://issues.apache.org/jira/browse/LUCENE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526489#comment-14526489 ] ASF subversion and git services commented on LUCENE-6045: - Commit 1677573 from [~teofili] in branch 'dev/trunk' [ https://svn.apache.org/r1677573 ] LUCENE-6045 - immutable ClassificationResult, minor fixes Refactor classifier APIs to work better with multi threading --- Key: LUCENE-6045 URL: https://issues.apache.org/jira/browse/LUCENE-6045 Project: Lucene - Core Issue Type: Improvement Components: modules/classification Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: Trunk In https://issues.apache.org/jira/browse/LUCENE-4345?focusedCommentId=13454729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13454729 [~simonw] pointed out that the current Classifier API doesn't work well in multi-threading environments: bq. The interface you defined has some problems with respect to Multi-Threading IMO. The interface itself suggests that this class is stateful and you have to call methods in a certain order, and at the same time you need to make sure that it is not published for read access before training is done. I think it would be wise to pass in all needed objects as constructor arguments and make the references final so it can be shared across threads, and add an interface that represents the trained model computed offline? In this case it doesn't really matter but in the future it might make sense. We can also skip the model interface entirely and remove the training method until we have some impls that really need to be trained. I missed that at that point but I think for 6.0 (?) it would be wise to rearrange the API to address that properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
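For illustration, a minimal sketch of the immutability idea behind the commit (hypothetical field and method bodies, not the committed Lucene code): final fields make instances safe to publish across threads without synchronization.
{code}
// Sketch only: an immutable result holder in the spirit of the commit above.
public final class ClassificationResult<T> {

  private final T assignedClass; // the predicted class label
  private final double score;    // the classifier's confidence for that label

  public ClassificationResult(T assignedClass, double score) {
    this.assignedClass = assignedClass;
    this.score = score;
  }

  public T getAssignedClass() {
    return assignedClass;
  }

  public double getScore() {
    return score;
  }
}
{code}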
[jira] [Commented] (SOLR-7435) NPE in FieldCollapsingQParser
[ https://issues.apache.org/jira/browse/SOLR-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526501#comment-14526501 ] Markus Jelsma commented on SOLR-7435: - Hi [~joel.bernstein], can you try the following unit test?
{code}
@Test
public void testSOLR7435() throws Exception {
  for (int i = 0; i < 15000; i++) {
    String[] doc = {"id", String.valueOf(i),
                    "a_i", String.valueOf(random().nextInt(1)),
                    "b_i", String.valueOf(random().nextInt(1))};
    assertU(adoc(doc));
  }
  assertU(commit());
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("q", "*:*");
  params.add("fq", "{!collapse field=a_i}");
  params.add("fq", "{!collapse field=b_i}");
  assertQ(req(params, "indent", "on"), "*[count(//doc)=0]");
}
{code}
It fails on my machine using: ant test -Dtestcase=TestCollapseQParserPlugin -Dtests.method=testSOLR7435 -Dtests.seed=2B7D48BE88DE05E7 -Dtests.slow=true -Dtests.locale=en_ZA -Dtests.timezone=America/Araguaina -Dtests.asserts=true -Dtests.file.encoding=US-ASCII NPE in FieldCollapsingQParser - Key: SOLR-7435 URL: https://issues.apache.org/jira/browse/SOLR-7435 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Markus Jelsma Priority: Minor Fix For: 5.2 Not even sure it would work anyway, I tried to collapse on two distinct fields, ending up with this: select?q=*:*&fq={!collapse field=qst}&fq={!collapse field=rdst}
{code}
584550 [qtp1121454968-20] ERROR org.apache.solr.servlet.SolrDispatchFilter [ suggests] – null:java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743) at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:780) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:203) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1660) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1479) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:556) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
[jira] [Updated] (SOLR-7436) Solr stops printing stacktraces in log and output
[ https://issues.apache.org/jira/browse/SOLR-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-7436: Attachment: solr-8983-console.log Solr stops printing stacktraces in log and output - Key: SOLR-7436 URL: https://issues.apache.org/jira/browse/SOLR-7436 Project: Solr Issue Type: Bug Affects Versions: 5.1 Environment: Local 5.1 Reporter: Markus Jelsma Attachments: solr-8983-console.log After a short while, Solr suddenly stops printing stacktraces in the log and output. {code} 251043 [qtp1121454968-17] INFO org.apache.solr.core.SolrCore.Request [ suggests] - [suggests] webapp=/solr path=/select params={q=*:*&fq={!collapse+field%3Dquery_digest}&fq={!collapse+field%3Dresult_digest}} status=500 QTime=3 251043 [qtp1121454968-17] ERROR org.apache.solr.servlet.SolrDispatchFilter [ suggests] - null:java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743) at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:780) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:203) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1660) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1479) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:556) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) 251184 [qtp1121454968-17] ERROR org.apache.solr.core.SolrCore [ suggests] - java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743)
[jira] [Commented] (SOLR-7436) Solr stops printing stacktraces in log and output
[ https://issues.apache.org/jira/browse/SOLR-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526504#comment-14526504 ] Markus Jelsma commented on SOLR-7436: - Hello, this is a local Solr 5.1 running on: java version 1.7.0_79 OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.10.2) OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode) See attached log. I fire a query that produces an NPE (see SOLR-7435). I repeat it a couple of times and then the stack trace is gone. Solr stops printing stacktraces in log and output - Key: SOLR-7436 URL: https://issues.apache.org/jira/browse/SOLR-7436 Project: Solr Issue Type: Bug Affects Versions: 5.1 Environment: Local 5.1 Reporter: Markus Jelsma Attachments: solr-8983-console.log After a short while, Solr suddenly stops printing stacktraces in the log and output.
Re: Running 5.1.0 test-suite via maven
[junit4] ERROR 3.81s J2 | TestDirectoryTaxonomyWriter.testConcurrency [junit4] Throwable #1: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView; Sorry about the delay. This indicates your code was compiled with JDK 1.8 but is executed with Java 1.7. This method's return type used to be an interface (Set), but in 1.8 it is a covariant return pointing at a specialized subclass (KeySetView), so a call site compiled against 1.8 records a descriptor that does not exist on a 1.7 runtime. You need to compile the code with the version of Java you intend to run with. Things will in general work if you compile with an older version and try to run with a newer version, but not the other way around. You can cross-compile with javac from a newer version of the JDK to an older version, but you'd have to specify bootclasspath to the older version anyway (the bytecode/source flag in javac is not enough), so there's really no sensible reason to do it in the first place. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
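For illustration, a minimal class (not from the build, hypothetical names) that reproduces this failure mode when compiled on JDK 1.8 and run on Java 1.7:
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeySetDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("k", 1);
        // Compiled on JDK 1.8, this call site is recorded with the covariant
        // return type ConcurrentHashMap$KeySetView, which a 1.7 runtime lacks,
        // so running this class on Java 1.7 throws NoSuchMethodError here.
        Set<String> keys = map.keySet();
        // Declaring the receiver as the Map interface sidesteps the problem,
        // because the call site then resolves against Map.keySet() returning Set:
        java.util.Map<String, Integer> asMap = map;
        Set<String> safeKeys = asMap.keySet();
        System.out.println(keys + " " + safeKeys);
    }
}
{code}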
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526613#comment-14526613 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677595 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677595 ] LUCENE-6196: Reformat code. Removed System.err legacy comments in test. Fixed test compile warning. Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to those within the exact shape in highly performant ways. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
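To make the "only multiplications and additions" point concrete, here is a toy sketch (illustrative only, not the geo3d API): testing a point against one bounding plane reduces to a dot product and a sign check.
{code}
// Illustration only (hypothetical names): membership against a single bounding
// plane needs just multiplications, additions, and a comparison.
public class PlaneDemo {
  static final class Plane {
    final double a, b, c, d; // plane: a*x + b*y + c*z + d = 0
    Plane(double a, double b, double c, double d) { this.a = a; this.b = b; this.c = c; this.d = d; }
    boolean isWithin(double x, double y, double z) {
      return a * x + b * y + c * z + d >= 0.0; // on or above the plane
    }
  }

  public static void main(String[] args) {
    Plane equator = new Plane(0, 0, 1, 0); // z >= 0: northern half of the unit sphere
    System.out.println(equator.isWithin(0.0, 0.6, 0.8));  // true
    System.out.println(equator.isWithin(0.0, 0.6, -0.8)); // false
  }
}
{code}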
Re: Where Search Meets Machine Learning
Awesome, I think I could learn a lot from you. Do you have a decent amount of user data? Sounds like you have a ton. I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid, the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say "fix it!"). The best they can do with analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data! So at this layer, the goal is to construct inverted indices that reflect features likely to be important to users. In a sense this becomes more of a programming task than a large-scale optimization task. You have content experts that tell you either precisely or vaguely what the search solution ought to do (presumably they represent users). If you're lucky, this will be informed by some ad-hoc usability testing. So you end up doing a mix of data modeling and using queries intelligently. And perhaps some specific kinds of programming to develop specific scoring functions, etc. http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/ One advantage of this approach is that for many search applications you might be able to explain how the ranking function works in terms of a set of specific rules. This also might provide points where domain experts can tweak an overall ranking strategy. It becomes somewhat predictable and controllable to them. Anyway, I'm forever curious about the boundary line between this sort of work and "search truly is a machine learning problem" work. I have seen a fair amount of gray area where user data might be decent or possibly misleading, and you have to do a lot of data janitor work to sort it out. Good stuff! -Doug On Fri, May 1, 2015 at 6:16 PM, J. Delgado joaquin.delg...@gmail.com wrote: Doug, Thanks for your insights. We actually started by trying to build off of features and boosting weights combined with built-in relevance scoring http://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html. We also played around with replacing and/or combining the default score with other computations using the function_score http://www.elastic.co/guide/en/elasticsearch/guide/current/function-score-query.html query, but as you mentioned in your article, the crux of the problem is *how to figure out the weights that control each feature's influence*: *Once important features are placed in the search engine the final problem becomes balancing and regulating their influence. Should text-based factors matter more than sales based factors? Should exact text matches matter more than synonym-based matches? What about metadata we glean from machine learning – how much weight should this play*? Furthermore, this only covers cases where the scoring can be represented as a function of such weights! We felt that this approach was short sighted as some of the problems we are dealing with (e.g.
product recommendations, response prediction, real-time bidding for advertising, etc.) have a very large feature space, sometimes requiring *dimensionality reduction* (e.g. Matrix Factorization techniques) or learning from past actions/feedback (e.g. clickthrough data, bidding win rates, remaining budget, etc.). All this seemed well suited for Machine (supervised) Learning tasks such as prediction based on past training data (classification or regression). These algorithms usually have an offline model building phase and an online evaluator phase that uses the created model to perform the prediction/scoring during query evaluation. Additionally, some of the best algorithms in machine learning (Random Forest, Support Vector Machines, Deep Learning/Neural Networks, etc.) are not linear combinations of feature weights and require additional data structures (e.g. trees, support vectors) to support the computation. Since there is no one-size-fits-all predictive algorithm, we architected the solution so any algorithm that implements our interface can be used. We tried this out with algorithms available in Weka http://www.cs.waikato.ac.nz/ml/weka/ and Spark MLlib https://spark.apache.org/docs/1.2.1/mllib-guide.html (only linear models for now) and it worked! In any case, nothing prevents us from leveraging the text based analysis of features and the default scoring available within the plugin, which can be combined with the results of the prediction. To demonstrate its general utility
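A minimal sketch of the kind of pluggable boundary described above (hypothetical names; the actual plugin interface is not shown in this thread): the model is built offline, then only evaluated at query time.
{code}
import java.util.Map;

// Sketch: separate offline model loading from online scoring.
public interface Predictor {

  // Load a model produced offline (e.g. by Weka or Spark MLlib).
  void load(byte[] serializedModel) throws Exception;

  // Score one document's feature vector during query evaluation.
  double score(Map<String, Double> features);
}
{code}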
Solr website - problem with anchor links
When I try to use a URL with an anchor link on the Solr website, it doesn't work right: https://lucene.apache.org/solr/resources.html#mailing-lists On both Firefox and Chrome, this URL doesn't quite go to the right spot. It would be the right spot if the floating header at the top of the page wasn't there. I'm guessing some CSS trickery is required to get it to anchor below that floating header. I did find the following, and when I have time to digest it, I may be able to try and fix the problem, but finding that time is the hard part. http://stackoverflow.com/questions/10732690/offsetting-an-html-anchor-to-adjust-for-fixed-header If somebody knows exactly how to fix it and has the time, feel free to take this problem! Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
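The StackOverflow technique linked above boils down to giving each anchor target an invisible block that compensates for the fixed header; a sketch (the 60px value is a guess at the header height, not measured from the Solr site):
{code}
/* Sketch: offset in-page anchors below a fixed header (height assumed 60px). */
:target::before {
  content: "";
  display: block;
  height: 60px;      /* same as the fixed header's height */
  margin-top: -60px; /* cancel the extra space so layout is unchanged */
}
{code}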
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526684#comment-14526684 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677607 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1677607 ] SOLR-6220: Rule Based Replica Assignment during collection creation Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
h1.Objective
Most cloud-based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system. All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection, during:
* collection creation
* shard splitting
* add replica
* createshard
There are two aspects to how replicas are placed: snitch and placement.
h2.snitch
How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch
h2.ImplicitSnitch
This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is automatically used. Tags provided by ImplicitSnitch:
# cores : no. of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name
# D.* : values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}
h2.Rules
This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters will be passed to the collection CREATE api as a multivalued parameter rule. The values will be saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection":{
    "snitch": {
      "class":"ImplicitSnitch"
    },
    "rules":[{"cores":"4-"},
             {"replica":"1", "shard":"*", "node":"*"},
             {"disk":">100"}]
  }
}
{code}
A rule is specified as a pseudo JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK. Or else an error is thrown.
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than)
** and the value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} meaning ANY. The default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
h3.How are nodes picked up?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference. And if the rule is {{disk:100-}}, nodes with less disk space will be given priority. If everything else is equal, nodes with fewer cores are given higher priority.
h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness. Example rules:
{noformat}
#Example requirement: use only one replica of a shard in a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible.
#This ensures that if no node exists with a 100GB disk, nodes are picked up in order of size, say an 85GB node would be picked up over an 80GB disk node.
disk:>100~
{noformat}
Examples:
{noformat}
#in each rack there can be max two replicas of A given shard
rack:*,shard:*,replica:<3
//in each rack there can be max two replicas of ANY replica
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be a max one replica of EACH shard
{noformat}
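For illustration, a CREATE call combining rules and a snitch might look like the following (illustrative values; host, collection name, and counts are made up, and {{rule}} is multivalued):
{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection
    &numShards=2&replicationFactor=2
    &rule=shard:*,replica:<2,node:*
    &rule=disk:>100
    &snitch=class:ImplicitSnitch
{noformat}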
[jira] [Comment Edited] (SOLR-7435) NPE in FieldCollapsingQParser
[ https://issues.apache.org/jira/browse/SOLR-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526501#comment-14526501 ] Markus Jelsma edited comment on SOLR-7435 at 5/4/15 2:58 PM: - Hi [~joel.bernstein], can you try the following unit test?
{code}
@Test
public void testSOLR7435() throws Exception {
  for (int i = 0; i < 15000; i++) {
    String[] doc = {"id", String.valueOf(i),
                    "a_i", String.valueOf(random().nextInt(1)),
                    "b_i", String.valueOf(random().nextInt(1))};
    assertU(adoc(doc));
  }
  assertU(commit());
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("q", "*:*");
  params.add("fq", "{!collapse field=a_i}");
  params.add("fq", "{!collapse field=b_i}");
  assertQ(req(params, "indent", "on"), "*[count(//doc)=0]");
}
{code}
It fails on my machine using: ant test -Dtestcase=TestCollapseQParserPlugin -Dtests.method=testSOLR7435 -Dtests.seed=2B7D48BE88DE05E7 -Dtests.slow=true -Dtests.locale=en_ZA -Dtests.timezone=America/Araguaina -Dtests.asserts=true -Dtests.file.encoding=US-ASCII edit: hmm, it sometimes fails. NPE in FieldCollapsingQParser - Key: SOLR-7435 URL: https://issues.apache.org/jira/browse/SOLR-7435 Project: Solr Issue Type: Bug Affects Versions: 5.1 Reporter: Markus Jelsma Priority: Minor Fix For: 5.2
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7484.patch Updated patch. This doesn't incorporate the context bit, as that depends on SOLR-7484 being committed, which I plan to do in a bit. It changes the public SolrAuthorizationResponse and also how the statusCode set on the SolrAuthorizationResponse impacts the processing in SDF. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7484.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value, but in the future it could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.
{code}
public interface SolrAuthorizationPlugin {
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}
{code}
{code}
public class SolrRequestContext {
  UserInfo; // Will contain user context from the authentication layer.
  HTTPRequest request;
  Enum OperationType; // Correlated with user roles.
  String[] CollectionsAccessed;
  String[] FieldsAccessed;
  String Resource;
}
{code}
{code}
public class SolrAuthorizationResponse {
  boolean authorized;
  public boolean isAuthorized();
}
{code}
User Roles:
* Admin
* Collection Level:
** Query
** Update
** Admin
Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
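As a usage illustration, a hypothetical no-op implementation of the proposed interface (it assumes the response flag is settable, which the sketch above leaves open):
{code}
// Hypothetical: authorize every request. A real implementation would
// consult an external system such as Apache Ranger or Sentry.
public class AllowAllAuthorizationPlugin implements SolrAuthorizationPlugin {

  @Override
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context) {
    SolrAuthorizationResponse response = new SolrAuthorizationResponse();
    response.authorized = true; // assumes the field (or a setter) is accessible in the final API
    return response;
  }
}
{code}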
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526734#comment-14526734 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677614 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1677614 ] SOLR-6220: setting eol style Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2215 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2215/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.TestDistributedSearch.test Error Message: Error from server at http://127.0.0.1:61465/aq_vcz/jo/collection1: java.lang.NullPointerException at org.apache.solr.search.grouping.distributed.responseprocessor.TopGroupsShardResponseProcessor.process(TopGroupsShardResponseProcessor.java:102) at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:744) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:727) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2047) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:841) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:453) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:105) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:61465/aq_vcz/jo/collection1: java.lang.NullPointerException at org.apache.solr.search.grouping.distributed.responseprocessor.TopGroupsShardResponseProcessor.process(TopGroupsShardResponseProcessor.java:102) at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:744) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:727) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:2047) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:841) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:453) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:105) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at
[jira] [Updated] (SOLR-6878) solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand)
[ https://issues.apache.org/jira/browse/SOLR-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-6878: - Attachment: SOLR-6878.patch Here is an updated patch that implements the idea Hossman laid out in his comment. Basically, if the client sends in a list instead of a map, the expand=true logic is applied at the time of update, i.e. this is syntactic sugar for building up the mappings from a list of symmetric synonyms. There's no need to support a list for expand=false because that is simply a mapping of all the terms to the last term in the list, which is already supported by the API. Thus, expand=true is implied when the update request contains a list and not a map. solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand) Key: SOLR-6878 URL: https://issues.apache.org/jira/browse/SOLR-6878 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.10.2 Reporter: Tomasz Sulkowski Assignee: Timothy Potter Labels: ManagedSynonymFilterFactory, REST, SOLR Attachments: SOLR-6878.patch, SOLR-6878.patch Hi, after switching from SynonymFilterFactory to ManagedSynonymFilterFactory I have found out that there is no way to set an all-to-all synonyms relation. Basically (judging from a google search) there is a need for an expand functionality switch (known from SynonymFilterFactory) which will treat all synonyms with their keyword as equal. For example: if we define a car:[wagen,ride] relation, it would translate a query that includes one of the synonyms or the keyword to "car or wagen or ride", independently of which of those three words was used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
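With the patch applied, sending a JSON list (rather than a map) to the managed synonyms REST endpoint registers all the terms as mutual synonyms; a sketch against a hypothetical collection and resource name:
{code}
curl -X PUT -H 'Content-type:application/json' \
  --data-binary '["car","wagen","ride"]' \
  "http://localhost:8983/solr/collection1/schema/analysis/synonyms/english"
{code}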
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526770#comment-14526770 ] Tomás Fernández Löbbe commented on SOLR-6220: - Ideally, most warnings should be fixed :) , but at least the one in {{SnitchContext}}:
{code:java}
public SimpleSolrResponse invoke(UpdateShardHandler shardHandler, final String url, String path, SolrParams params)
    throws IOException, SolrServerException {
  GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.GET, path, params);
  NamedList<Object> rsp = new HttpSolrClient(url, shardHandler.getHttpClient(), new BinaryResponseParser()).request(request);
  request.response.nl = rsp;
  return request.response;
}
{code}
Resource leak: 'unassigned Closeable value' is never closed Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
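A possible shape for the fix (a sketch, not the committed patch): since SolrClient implements Closeable in 5.x, try-with-resources closes the client after the request.
{code:java}
public SimpleSolrResponse invoke(UpdateShardHandler shardHandler, final String url, String path, SolrParams params)
    throws IOException, SolrServerException {
  GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.GET, path, params);
  // try-with-resources closes the client, addressing the 'unassigned Closeable' warning
  try (HttpSolrClient client = new HttpSolrClient(url, shardHandler.getHttpClient(), new BinaryResponseParser())) {
    NamedList<Object> rsp = client.request(request);
    request.response.nl = rsp;
    return request.response;
  }
}
{code}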
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526772#comment-14526772 ] Anshum Gupta commented on SOLR-6220: This seems to have broken {{ant precommit}}.
{code}
[forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
[forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63)
[forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
[forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108)
[forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
[forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185)
{code}
Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
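The conventional fix for this particular forbidden-apis error is to pass an explicit charset instead of relying on the platform default; a sketch (hypothetical variable and class names):
{code}
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
  // Explicit charset instead of s.getBytes(), which uses the platform default.
  static byte[] toBytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
{code}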
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526783#comment-14526783 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677622 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1677622 ] SOLR-6220: Fixes forbidden method invocation String#getBytes() in RuleEngineTest

Replica placement strategy for solrcloud
Key: SOLR-6220
URL: https://issues.apache.org/jira/browse/SOLR-6220
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch

h1. Objective
Most cloud-based systems allow users to specify rules for how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or change it later to suit the needs of the system. All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:
* collection creation
* shard splitting
* add replica
* createshard

There are two aspects to how replicas are placed: snitch and placement.

h2. Snitch
A snitch identifies the tags of nodes. Snitches are configured through the collection create command with the {{snitch}} param, e.g. {{snitch=EC2Snitch}} or {{snitch=class:EC2Snitch}}.

h2. ImplicitSnitch
This is shipped by default with Solr; the user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is used automatically. Tags provided by ImplicitSnitch:
# cores: number of cores on the node
# disk: disk space available on the node
# host: host name of the node
# node: node name
# D.*: values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}

h2. Rules
A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter {{rule}}. The values are saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection": {
    "snitch": { "class": "ImplicitSnitch" },
    "rules": [{cores:4-}, {replica:1, shard:*, node:*}, {disk:100}]
  }
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY; or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than)
** the value of shard can be a shard name, or {{\*}} meaning EACH, or {{\*\*}} meaning ANY; the default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and tags are simply values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly

h3. How are nodes picked up?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference; and if the rule is {{disk:100-}}, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.

h3. Fuzzy match
Fuzzy matching can be applied when strict matches fail. Values can be suffixed with {{~}} to specify fuzziness, for example:
{noformat}
#Example requirement: use only one replica of a shard on a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if that is not possible.
#This ensures that if no node with a 100GB disk exists, nodes are picked in order of size; e.g. an 85GB node is picked over an 80GB node.
disk:100~
{noformat}
Examples:
{noformat}
#in each rack there can be max two replicas of A given shard
rack:*,shard:*,replica:<3
//in each rack there can be max two replicas of ANY replica
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be a max one replica of EACH shard
node:*,shard:*,replica:1-
#in each node there should be a max one replica of ANY shard
node:*,shard:**,replica:1-
{noformat}
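Since the rules ride on the collection CREATE call, here is a minimal SolrJ sketch of what passing them could look like. This is an illustration, not code from the patch: the ZooKeeper address, collection name, and rule values are assumptions, and the request is built generically rather than through any rule-specific SolrJ helper.

{code:Java}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateWithRules {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZK address; adjust for your cluster.
    try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("action", "CREATE");
      params.set("name", "mycollection");
      params.set("numShards", 2);
      params.set("replicationFactor", 2);
      // "rule" is multivalued: each add() contributes one rule.
      params.add("rule", "shard:*,replica:<2,node:*"); // max one replica of each shard per node
      params.add("rule", "cores:4-");                  // only nodes with fewer than 4 cores
      params.set("snitch", "class:ImplicitSnitch");    // optional; ImplicitSnitch is the default
      QueryRequest request = new QueryRequest(params);
      request.setPath("/admin/collections");
      client.request(request);
    }
  }
}
{code}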
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_60-ea-b12) - Build # 12560 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12560/ Java: 64bit/jdk1.8.0_60-ea-b12 -XX:-UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 31493 lines...] -check-forbidden-all: [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.8 [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.8 [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.4 [forbidden-apis] Reading API signatures: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/base.txt [forbidden-apis] Reading API signatures: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/servlet-api.txt [forbidden-apis] Reading API signatures: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/solr.txt [forbidden-apis] Loading classes to check... [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185) [forbidden-apis] Scanned 2654 (and 1668 related) class file(s) for forbidden API invocations (in 0.93s), 3 error(s). BUILD FAILED /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:526: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:97: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:329: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/common-build.xml:494: Check for forbidden API calls failed, see log. Total time: 47 minutes 34 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
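The forbidden-apis check above trips on {{String#getBytes()}} because the no-argument overload silently uses the platform default charset, so the bytes produced depend on the machine running the test. The standard remedy (the follow-up commit isn't quoted here, so treat this as a generic sketch) is to pass an explicit charset:

{code:Java}
import java.nio.charset.StandardCharsets;

public class CharsetFix {
  public static void main(String[] args) {
    String rule = "shard:*,replica:<2,node:*"; // sample input, not from the test

    // Forbidden: rule.getBytes() -- encodes with the JVM default charset.
    byte[] bytes = rule.getBytes(StandardCharsets.UTF_8); // explicit and portable

    System.out.println(bytes.length + " bytes");
  }
}
{code}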
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_45) - Build # 4766 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4766/ Java: 64bit/jdk1.8.0_45 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 31469 lines...] -check-forbidden-all: [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.8 [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.8 [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.4 [forbidden-apis] Reading API signatures: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\tools\forbiddenApis\base.txt [forbidden-apis] Reading API signatures: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\tools\forbiddenApis\servlet-api.txt [forbidden-apis] Reading API signatures: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\tools\forbiddenApis\solr.txt [forbidden-apis] Loading classes to check... [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108) [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset] [forbidden-apis] in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185) [forbidden-apis] Scanned 2654 (and 1668 related) class file(s) for forbidden API invocations (in 1.70s), 3 error(s). BUILD FAILED C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:526: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:97: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build.xml:329: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:494: Check for forbidden API calls failed, see log. Total time: 68 minutes 32 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526826#comment-14526826 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677635 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1677635 ] SOLR-6220: use closeable in try block Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
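The commit message above refers to the try-with-resources idiom; the actual diff isn't quoted, so the following is just a generic illustration of the pattern. Declaring a {{Closeable}} in the try header guarantees it is closed when the block exits, whether normally or via an exception:

{code:Java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CloseableInTry {
  public static String firstLine(String path) throws IOException {
    // The reader is closed automatically on every exit path.
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
      return reader.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(firstLine("rules.txt")); // hypothetical input file
  }
}
{code}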
[jira] [Updated] (LUCENE-6372) hashCode/equals for SpanPositionCheckQuery and subclasses
[ https://issues.apache.org/jira/browse/LUCENE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-6372: - Attachment: LUCENE-6372.patch Patch of 4 May 2015. Simplifies hashCode/equals for all subclasses of SpanQuery. Removes {{this == other}} checks in equals(); this might affect performance. Adds a few Objects.requireNonNull calls in constructors. Leaves the various getBoost calls in hashCode implementations to super. Removes hashCode/equals from SpanFirstQuery, as they are not needed anymore. Uses the new collectPayloads attribute in SpanNearQuery hashCode/equals. hashCode/equals for SpanPositionCheckQuery and subclasses - Key: LUCENE-6372 URL: https://issues.apache.org/jira/browse/LUCENE-6372 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Attachments: LUCENE-6372.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
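For readers following along without the patch, the delegating pattern it applies looks roughly like this self-contained sketch. The class and field names are invented for illustration; only the shape (no {{this == other}} fast path, {{Objects.requireNonNull}} in the constructor, subclass equals/hashCode deferring to super) mirrors the patch notes:

{code:Java}
import java.util.Objects;

abstract class BaseQuery {
  private final float boost;

  BaseQuery(float boost) {
    this.boost = boost;
  }

  @Override
  public boolean equals(Object other) {
    // Exact-class check plus the state owned at this level;
    // note there is deliberately no this == other shortcut.
    return other != null
        && getClass() == other.getClass()
        && boost == ((BaseQuery) other).boost;
  }

  @Override
  public int hashCode() {
    return Objects.hash(getClass(), boost);
  }
}

class FieldQuery extends BaseQuery {
  private final String field;

  FieldQuery(String field, float boost) {
    super(boost);
    this.field = Objects.requireNonNull(field, "field must not be null");
  }

  @Override
  public boolean equals(Object other) {
    // super.equals guarantees other is a FieldQuery, so the cast is safe.
    return super.equals(other) && field.equals(((FieldQuery) other).field);
  }

  @Override
  public int hashCode() {
    return 31 * super.hashCode() + field.hashCode();
  }
}
{code}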
[jira] [Updated] (LUCENE-6372) Simplify hashCode/equals for SpanQuery subclasses
[ https://issues.apache.org/jira/browse/LUCENE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-6372: - Summary: Simplify hashCode/equals for SpanQuery subclasses (was: hashCode/equals for SpanPositionCheckQuery and subclasses) Simplify hashCode/equals for SpanQuery subclasses - Key: LUCENE-6372 URL: https://issues.apache.org/jira/browse/LUCENE-6372 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Attachments: LUCENE-6372.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6372) Simplify hashCode/equals for SpanQuery subclasses
[ https://issues.apache.org/jira/browse/LUCENE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526840#comment-14526840 ] Paul Elschot commented on LUCENE-6372: -- See also LUCENE-6333 Simplify hashCode/equals for SpanQuery subclasses - Key: LUCENE-6372 URL: https://issues.apache.org/jira/browse/LUCENE-6372 Project: Lucene - Core Issue Type: Improvement Reporter: Paul Elschot Attachments: LUCENE-6372.patch Spin off from LUCENE-6308, see the comments there from around 23 March 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 837 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/837/ 1 tests failed. REGRESSION: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test Error Message: Invalid content type: Stack Trace: org.apache.http.ParseException: Invalid content type: at __randomizedtesting.SeedInfo.seed([491AB9BB25277433:C14E86618BDB19CB]:0) at org.apache.http.entity.ContentType.parse(ContentType.java:273) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:513) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at org.apache.solr.cloud.CloudInspectUtil.compareResults(CloudInspectUtil.java:224) at org.apache.solr.cloud.CloudInspectUtil.compareResults(CloudInspectUtil.java:166) at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.testIndexingBatchPerRequestWithHttpSolrClient(FullSolrCloudDistribCmdsTest.java:676) at org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test(FullSolrCloudDistribCmdsTest.java:152) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526889#comment-14526889 ] Jessica Cheng Mallet commented on SOLR-6220: It'll also be nice to have a new collection API to modify the rule for a collection so that we can add rules for an existing collection or modify a bad rule set. Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.8.0_45) - Build # 4646 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4646/ Java: 64bit/jdk1.8.0_45 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test Error Message: Error from server at http://127.0.0.1:53267: Could not find collection : awholynewstresscollection_collection1_0 Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:53267: Could not find collection : awholynewstresscollection_collection1_0 at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1074) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:846) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:789) at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.addReplicaTest(CollectionsAPIDistributedZkTest.java:1120) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test(CollectionsAPIDistributedZkTest.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526928#comment-14526928 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677642 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1677642 ] SOLR-6220: Fix javadocs for precommit to pass Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526930#comment-14526930 ] Anshum Gupta commented on SOLR-6220: That would be a good thing to have. Can you create a new JIRA for that if one doesn't already exist? Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[jira] [Commented] (SOLR-7458) Expose HDFS Block Locality Metrics
[ https://issues.apache.org/jira/browse/SOLR-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526933#comment-14526933 ] Mike Drob commented on SOLR-7458: - Did some digging with the HDFS folks, and it looks like BlockLocation::host is generally a hostname (not an IP, with the caveat that your cluster is configured reasonably). The general problem of determining a hostname for a machine is very difficult, since any given server could have multiple interfaces, with multiple names for each alias, etc. We probably just have to rely on some well-known one that we can get, and not spend too much effort worrying about whether localhost is good enough. We'll look at SolrXmlConfig. Will add in a ConcurrentHashMap. Expose HDFS Block Locality Metrics -- Key: SOLR-7458 URL: https://issues.apache.org/jira/browse/SOLR-7458 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mike Drob Assignee: Mark Miller Labels: metrics Attachments: SOLR-7458.patch, SOLR-7458.patch We should publish block locality metrics when using HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
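The comment above mentions adding a {{ConcurrentHashMap}}. As a generic illustration of the reason (this is not the SOLR-7458 code, and the names are made up), a concurrent map lets multiple threads tally per-host locality counts without external locking:

{code:Java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class LocalityTally {
  private final Map<String, LongAdder> localBlocksByHost = new ConcurrentHashMap<>();

  public void recordLocalBlock(String host) {
    // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent
    // callers never lose an update or create duplicate counters.
    localBlocksByHost.computeIfAbsent(host, h -> new LongAdder()).increment();
  }

  public long localBlocks(String host) {
    LongAdder adder = localBlocksByHost.get(host);
    return adder == null ? 0L : adder.sum();
  }
}
{code}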
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526949#comment-14526949 ] Noble Paul commented on SOLR-6220: -- It's planned and I would like to piggy back on the modify collection API SOLR-5132 Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[jira] [Closed] (SOLR-6288) Create a parser and rule engine for the rules syntax
[ https://issues.apache.org/jira/browse/SOLR-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul closed SOLR-6288. Resolution: Won't Fix makes no sense anymore Create a parser and rule engine for the rules syntax Key: SOLR-6288 URL: https://issues.apache.org/jira/browse/SOLR-6288 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch The proposed syntax needs to be parsed, and given the tags for a bunch of nodes it should be able to assign replicas to nodes, or just bail out if that is not possible -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5132) Implement a modifyCollection API
[ https://issues.apache.org/jira/browse/SOLR-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5132: - Description: A new “modifyCollection” API will be introduced to: # Turn on/off collectionApiMode (see SOLR-5096) # Modify values of maxShardsPerNode for the collection # Modify value of replicationFactor for entire collection (apply to each and every slice) # Modify values of replicationFactor on a per-slice basis # Modify rules # Modify snitch was: A new “modifyCollection” API will be introduced to: # Turn on/off collectionApiMode (see SOLR-5096) # Modify values of maxShardsPerNode for the collection # Modify value of replicationFactor for entire collection (apply to each and every slice) # Modify values of replicationFactor on a per-slice basis Implement a modifyCollection API Key: SOLR-5132 URL: https://issues.apache.org/jira/browse/SOLR-5132 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, Trunk Attachments: SOLR-5132.patch A new “modifyCollection” API will be introduced to: # Turn on/off collectionApiMode (see SOLR-5096) # Modify values of maxShardsPerNode for the collection # Modify value of replicationFactor for entire collection (apply to each and every slice) # Modify values of replicationFactor on a per-slice basis # Modify rules # Modify snitch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_60-ea-b12) - Build # 12561 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12561/ Java: 64bit/jdk1.8.0_60-ea-b12 -XX:+UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 53539 lines...] BUILD FAILED /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:526: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:90: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:641: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1963: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2002: Compile failed; see the compiler error output for details. Total time: 48 minutes 32 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7484) Refactor SolrDispatchFilter.doFilter(...) method
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526966#comment-14526966 ] ASF subversion and git services commented on SOLR-7484: --- Commit 1677644 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1677644 ] SOLR-7484: Refactor SolrDispatchFilter to extract all Solr-specific implementation detail to HttpSolrCall, and also extract methods from within the current SDF.doFilter(..) logic, making things easier to manage. HttpSolrCall converts the processing to a 3-step process, i.e. Construct, Init, and Call, so the context of the request is available after Init and before the actual call operation. Refactor SolrDispatchFilter.doFilter(...) method Key: SOLR-7484 URL: https://issues.apache.org/jira/browse/SOLR-7484 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch Currently almost everything that's done in SDF.doFilter() is sequential. We should refactor it to clean up the code and make things easier to manage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
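The construct/init/call split described in the commit message can be pictured with a small sketch. The class below is invented purely to show the lifecycle; the real HttpSolrCall resolves cores, collections, and handlers rather than a toy string:

{code:Java}
public class ThreePhaseCall {
  private final String path;
  private String context; // filled in by init()

  // 1. Construct: cheap, only captures the raw request.
  public ThreePhaseCall(String path) {
    this.path = path;
  }

  // 2. Init: resolve routing so the request context can be inspected
  //    before any work is committed.
  public void init() {
    context = path.startsWith("/admin") ? "admin-request" : "core-request";
  }

  // 3. Call: execute using the prepared context.
  public String call() {
    if (context == null) {
      throw new IllegalStateException("init() must run before call()");
    }
    return "handled " + path + " as " + context;
  }

  public static void main(String[] args) {
    ThreePhaseCall call = new ThreePhaseCall("/admin/collections");
    call.init();
    System.out.println(call.call());
  }
}
{code}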
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526994#comment-14526994 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677648 from [~noble.paul] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1677648 ] SOLR-6220: Rule Based Replica Assignment during collection creation Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 2261 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/2261/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls Error Message: Shard split did not complete. Last recorded state: running expected:[completed] but was:[running] Stack Trace: org.junit.ComparisonFailure: Shard split did not complete. Last recorded state: running expected:[completed] but was:[running] at __randomizedtesting.SeedInfo.seed([F3C731DE89271288:ABA3BDBF8F4DBA5C]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls(CollectionsAPIAsyncDistributedZkTest.java:101) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Updated] (SOLR-6968) add hyperloglog in statscomponent as an approximate count
[ https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6968: --- Attachment: SOLR-6968.patch Updated patch now includes an HllOptions class w/tests for parsing various knobs for tuning... * {{cardinality=true}} and {{cardinality=false}} still supported for basic defaults * can also specify heuristic-based {{cardinality=N}} where N is a number between 0.0 and 1.0 inclusive indicating how much accuracy you care about ** 0 == minimum accuracy, conserve as much ram as possible ** 1.0 == maximum accuracy, spend as much ram as possible ** {{cardinality=true}} roughly the same as {{cardinality=0.33}} * additional advanced local params for overriding the heuristic based on knowledge of HLL: ** {{hllLog2m=N}} (raw int passed to HLL API) ** {{hllRegwidth=N}} (raw int passed to HLL API) ** hll param prefix chosen based on implementation details similar to how {{percentiles}} supports {{tdigestCompression}} *** if/when we change the implementation details of how we compute cardinality, these can be ignored and new tuning options can be introduced. * {{hllPreHashed=BOOL}} ** only works with Long based fields (by design) add hyperloglog in statscomponent as an approximate count - Key: SOLR-6968 URL: https://issues.apache.org/jira/browse/SOLR-6968 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch stats component currently supports calcDistinct but it's terribly inefficient -- especially in distrib mode. we should add support for using hyperloglog to compute an approximate count of distinct values (using localparams via SOLR-6349 to control the precision of the approximation) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
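To make the knobs above concrete, here is a hedged sketch of how the described local params might be combined on a stats request, in the same SolrJ style as the test snippets elsewhere in this digest; the field names and numeric values are illustrative assumptions, not taken from the patch:
{code}
import org.apache.solr.common.params.ModifiableSolrParams;

// Illustrative only: field names (author_s, title_s, prehashed_l) are made up.
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("q", "*:*");
params.add("stats", "true");
// basic default accuracy/RAM trade-off, roughly equivalent to cardinality=0.33
params.add("stats.field", "{!cardinality=true}author_s");
// heuristic knob: 0.0 = minimum accuracy / least RAM, 1.0 = maximum accuracy / most RAM
params.add("stats.field", "{!cardinality=0.75}title_s");
// advanced overrides passed straight through to the HLL implementation
params.add("stats.field", "{!cardinality=true hllLog2m=13 hllRegwidth=5}author_s");
// hllPreHashed only makes sense for Long based fields
params.add("stats.field", "{!cardinality=true hllPreHashed=true}prehashed_l");
{code}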
[jira] [Commented] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527025#comment-14527025 ] Sachin Goyal commented on SOLR-7121: Thanks for the patch file [~mark.mil...@oblivion.ch]! In the future I will attach a patch file along with each pull-request update. Please see my comments below: \\ {quote}I think we want to look at making these new tests much faster.{quote} Please let me know how long the newly added tests take to run for you. The new tests use the actual SolrCloud infrastructure and will need a little time to set up and shut down ZK, the cloud, etc., unless we are happy with unit tests instead of functional ones. But if you have any ideas for the particular tests added in this ticket, I will be happy to improve them. \\ \\ {quote}The test suite with this patch doesn't yet fully pass for me either.{quote} Can you please run those failing tests without the patch and let me know if they are still failing? The build seems to be passing on my end. \\ \\ {quote}What is the motivation behind the core regex matching and multiple config entries? Do you really need to configure different healthcheck thresholds per core in a collection?{quote} At a very minimum, we may want to configure the cores differently for different collections. The regular-expression approach allows us to have a single configuration file both for collections serving millions of documents on more powerful machines and for collections serving a couple thousand small documents on less powerful machines. Without the regular expressions, one would need separate configuration files for separate collections, which is somewhat of a pain to manage. So basically, the regular expressions help define different thresholds for Solr running on heterogeneous hardware. \\ \\ {quote}We also want to make it clear this functionality only works with SolrCloud and think about how that should best be expressed in the code - this bleeds a bit of SolrCloud specific code out of ZkController and into SolrCore in a way we have not really done yet I think.{quote} I agree to some extent. However, please note that all the new code is protected by *cc.isZooKeeperAware()* and it should not affect non-cloud-aware code. If you have more specific thoughts on improving this, I would be happy to refactor the current patch. \\ \\ {quote}What if we are the leader and publish a down state due to overload? Shouldn't we also give up our leader position?{quote} I am a little confused by this one. Wouldn't a down state trigger re-election? If not, it should probably be fixed elsewhere by asking non-leaders to start the election process. In any case, note that this code will be reached only when the leader is near exhaustion. Without this code, it would have tipped over completely and would have needed a restart. So, this code helps the leader node survive a crash and become available again in the future. Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion -- Key: SOLR-7121 URL: https://issues.apache.org/jira/browse/SOLR-7121 Project: Solr Issue Type: New Feature Reporter: Sachin Goyal Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch Currently, there is no way to control when a Solr node goes down. 
If the server is experiencing high GC pauses, too many threads, or just too many queries due to some bad load-balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything comes to a stall. Such a slow-dying core can affect other cores as well by taking a huge amount of time to serve their distributed queries. There should be a way to specify some threshold values beyond which the targeted core can detect its ill-health and proactively go down to recover. When the load improves, the core should come up automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
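To make the proposal concrete, the following is a minimal sketch of the kind of threshold check being discussed. Every name and number below is a hypothetical illustration, not code from the SOLR-7121 patches:
{code}
// Hypothetical health-check sketch; the class, fields, and thresholds are
// invented for illustration and do not come from the attached patches.
class NodeHealthSketch {
  static final double MAX_HEAP_USED_RATIO = 0.95; // configurable threshold
  static final int MAX_ACTIVE_THREADS = 2000;     // configurable threshold

  boolean isOverloaded() {
    Runtime rt = Runtime.getRuntime();
    double heapUsed = (rt.totalMemory() - rt.freeMemory()) / (double) rt.maxMemory();
    return heapUsed > MAX_HEAP_USED_RATIO || Thread.activeCount() > MAX_ACTIVE_THREADS;
  }
  // Conceptually: when isOverloaded() and cc.isZooKeeperAware(), publish the
  // core as DOWN so it stops taking traffic, and republish it as ACTIVE once
  // the load recovers -- matching the behavior described in the comments above.
}
{code}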
[jira] [Assigned] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-7121: - Assignee: Mark Miller Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion -- Key: SOLR-7121 URL: https://issues.apache.org/jira/browse/SOLR-7121 Project: Solr Issue Type: New Feature Reporter: Sachin Goyal Assignee: Mark Miller Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch Currently, there is no way to control when a Solr node goes down. If the server is experiencing high GC pauses, too many threads, or just too many queries due to some bad load-balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything comes to a stall. Such a slow-dying core can affect other cores as well by taking a huge amount of time to serve their distributed queries. There should be a way to specify some threshold values beyond which the targeted core can detect its ill-health and proactively go down to recover. When the load improves, the core should come up automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7499) Remove/Deprecate the name parameter from the ADDREPLICA Collection API call
Varun Thacker created SOLR-7499: --- Summary: Remove/Deprecate the name parameter from the ADDREPLICA Collection API call Key: SOLR-7499 URL: https://issues.apache.org/jira/browse/SOLR-7499 Project: Solr Issue Type: Bug Reporter: Varun Thacker Priority: Minor Right now we take a name parameter in the ADDREPLICA call. We use that as the core name for the replica. Are there any use cases where specifying the name of the core for the replica is useful? Here are the disadvantages of doing so - 1. We don't verify whether the name is unique in the collection, so if a conflicting name ends up on the same node then the call will fail. 2. If the core is created on some other node, it will fail with legacyCloud=false, as that checks for uniqueness in core names. https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica - The ref guide has never documented the name parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
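For illustration, a hedged sketch of an ADDREPLICA request that omits the name parameter entirely, letting Solr pick a unique core name; the collection and shard values are made up:
{code}
import org.apache.solr.common.params.ModifiableSolrParams;

// Illustrative Collections API request (would be sent to /admin/collections).
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("action", "ADDREPLICA");
params.set("collection", "mycollection"); // illustrative
params.set("shard", "shard1");            // illustrative
// No "name" param: Solr chooses a core name that is unique in the collection,
// which avoids both failure modes listed above.
{code}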
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2216 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2216/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 9411 lines...] [javac] Compiling 532 source files to /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build/solr-core/classes/test [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527093#comment-14527093 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677656 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677656 ] LUCENE-6196: Fix javadoc issues; ant precommit is happy. Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to the ones that fall within the exact shape, in a highly performant way. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
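For readers unfamiliar with the trick, here is a minimal sketch of a planar sidedness test, which is the general idea behind "only multiplications and additions"; this illustrates the technique only and is not code from the geo3d package:
{code}
// A boundary on the unit sphere can be cut by a plane ax + by + cz + d = 0.
// Once the plane coefficients are precomputed at shape-initialization time,
// deciding which side a point (x, y, z) falls on is a dot product plus an
// offset -- no trigonometry at query time.
static boolean isAbovePlane(double a, double b, double c, double d,
                            double x, double y, double z) {
  return a * x + b * y + c * z + d >= 0.0; // multiplications and additions only
}
{code}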
[jira] [Updated] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6450: --- Attachment: LUCENE-6450.patch Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
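For intuition, a simplified sketch of morton (bit-interleaving) encoding of quantized lat/lon into a single long; the quantization and bit layout here are illustrative assumptions, not necessarily the patch's exact encoding:
{code}
// Quantize lat/lon to 32 bits each, then interleave the bits so that nearby
// points tend to share long prefixes (useful for term-range pruning).
static long mortonEncode(double lat, double lon) {
  long latBits = (long) ((lat + 90.0) / 180.0 * 0xFFFFFFFFL);
  long lonBits = (long) ((lon + 180.0) / 360.0 * 0xFFFFFFFFL);
  long result = 0L;
  for (int i = 0; i < 32; i++) { // interleave one bit of lon, one bit of lat
    result |= (lonBits >>> i & 1L) << (2 * i);
    result |= (latBits >>> i & 1L) << (2 * i + 1);
  }
  return result;
}
{code}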
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527103#comment-14527103 ] Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:20 PM: Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec was (Author: nknize): Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527103#comment-14527103 ] Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:23 PM: Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil thanks to [~mikemccand] for adding geo benchmarking) are below: Data Set: 60M points of Planet OSM GPS data *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec was (Author: nknize): Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks are below: *QuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 2449.08 sec Index Size: 13G Mean Query Time: 0.066 sec *PackedQuadPrefixTree* Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29 Index Time: 1945.288 sec Index Size: 11G Mean Query Time: 0.058 sec *GeoPointField* Index Time: 180.872 sec Index Size: 1.8G Mean Query Time: 0.107 sec Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527118#comment-14527118 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677658 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677658 ] LUCENE-6196: Mark @lucene.experimental or @lucene.internal Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to the ones that fall within the exact shape, in a highly performant way. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Solr-Artifacts-5.x - Build # 819 - Failure
Build: https://builds.apache.org/job/Solr-Artifacts-5.x/819/ No tests ran. Build Log: [...truncated 27323 lines...] [javac] Compiling 532 source files to /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/build/solr-core/classes/test [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Solr-Artifacts-5.x/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given
[jira] [Updated] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6450: --- Attachment: LUCENE-6450.patch Updated patch to remove some superfluous code in GeoUtils. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.7.0_80) - Build # 4648 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4648/ Java: 64bit/jdk1.7.0_80 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9538 lines...] [javac] Compiling 532 source files to C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\classes\test [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\core\src\test\org\apache\solr\cloud\rule\RuleEngineTest.java:175: error: constructor ReplicaAssigner in class
[jira] [Created] (LUCENE-6462) Latin Stemmer for lucene
Niki created LUCENE-6462: Summary: Latin Stemmer for lucene Key: LUCENE-6462 URL: https://issues.apache.org/jira/browse/LUCENE-6462 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Niki In the latest Lucene package there is no stemmer for the Latin language. I have a stemmer for Latin, which is a rule-based program built on the grammar and rules of Latin. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
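For context, rule-based stemmers of this kind typically strip inflectional suffixes in priority order. Below is a toy sketch of the technique; the suffix list is abridged and illustrative, not the contributor's actual rules:
{code}
// Toy Latin suffix stripper: longer suffixes are tried first, and a stem of
// at least three characters is kept. A real stemmer would encode far more of
// Latin's declension and conjugation rules than this illustrative list.
static final String[] SUFFIXES = {
    "ibus", "orum", "arum", "ium", "us", "um", "ae", "is", "es", "a", "e", "i", "o"
};

static String stemLatin(String token) {
  for (String suffix : SUFFIXES) {
    if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
      return token.substring(0, token.length() - suffix.length());
    }
  }
  return token;
}
{code}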
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2217 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2217/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 54189 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:536: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:90: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build.xml:641: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:1990: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 97 minutes 51 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Where Search Meets Machine Learning
Sorry, as I was saying, the machine learning approach is NOT limited to having lots of user action data. In fact, having little or no user action data is commonly referred to as the cold start problem in recommender systems. In that case, it is useful to exploit content based similarities as well as context (such as location, time-of-day, day-of-week, site-section, device type, etc.) to make predictions/scoring. This can still be combined with the usual IR based scoring to keep semantics as the driving force. -J On Monday, May 4, 2015, J. Delgado joaquin.delg...@gmail.com wrote: BTW, as I mentioned, the machine learning On Monday, May 4, 2015, J. Delgado joaquin.delg...@gmail.com wrote: I totally agree that it depends on the task at hand and the amount/quality of the data that you can get hold of. The problem of relevancy in the traditional document/semantic information retrieval (IR) task is such a hard thing because in most cases there is little or no source of truth you could use as training data (unless you use something like TREC for a limited set of documents to evaluate). Additionally, the feedback data you get from users, if it exists, is very noisy. In this case prior knowledge, encoded as attribute weights, crafted functions, and heuristics, is your best bet. You can however mine the content itself by leveraging clustering/topic modeling via LDA, which is an unsupervised learning algorithm, and use that as input. Or perhaps Labeled-LDA and Multi-Grain LDA, another topic model for classification and sentiment analysis, which are supervised algorithms, in which case you can still use the approach I suggested. However, for search tasks that involve e-commerce, advertisements, recommendations, etc., there seems to be more data that can be captured from users' interactions with the system/site, which can be used as signals, and users' actions (adding things to wish lists, clicks for more info, conversions, etc.) are much more telling about the intention/value the user gives to what is presented to them. Then viewing search as a machine learning/multi-objective optimization problem makes sense. My point is that search engines nowadays are used for all these use cases, thus it is worth exploring all the avenues exposed in this thread. Cheers, -- Joaquin On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes.
Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
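One concrete reading of "combined with the usual IR based scoring to keep semantics as the driving force" is a simple linear blend of an IR score with a learned or contextual score. The sketch below is purely illustrative; the weighting and the score sources are assumptions, not anything proposed in this thread:
{code}
// Illustrative blend of an IR relevance score with a model-derived score.
// An alpha close to 1.0 keeps semantics (the IR score) as the driving force.
static float blendedScore(float irScore, float mlScore, float alpha) {
  return alpha * irScore + (1.0f - alpha) * mlScore; // e.g. alpha = 0.7
}
{code}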
[jira] [Created] (SOLR-7501) map-reduce index tool has timing bugs
Shenghua Wan created SOLR-7501: -- Summary: map-reduce index tool has timing bugs Key: SOLR-7501 URL: https://issues.apache.org/jira/browse/SOLR-7501 Project: Solr Issue Type: Bug Components: contrib - MapReduce Reporter: Shenghua Wan Priority: Minor map-reduce index tool has timing bugs in several classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_45) - Build # 12568 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12568/ Java: 32bit/jdk1.8.0_45 -server -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls Error Message: Shard split did not complete. Last recorded state: running expected:<[completed]> but was:<[running]> Stack Trace: org.junit.ComparisonFailure: Shard split did not complete. Last recorded state: running expected:<[completed]> but was:<[running]> at __randomizedtesting.SeedInfo.seed([A5A68D59067CF8A2:FDC2013800165076]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls(CollectionsAPIAsyncDistributedZkTest.java:101) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
Re: Where Search Meets Machine Learning
I totally agree that it depends on the task at hand and the amount/quality of the data that you can get hold of. The problem of relevancy in the traditional document/semantic information retrieval (IR) task is such a hard thing because in most cases there is little or no source of truth you could use as training data (unless you use something like TREC for a limited set of documents to evaluate). Additionally, the feedback data you get from users, if it exists, is very noisy. In this case prior knowledge, encoded as attribute weights, crafted functions, and heuristics, is your best bet. You can however mine the content itself by leveraging clustering/topic modeling via LDA, which is an unsupervised learning algorithm, and use that as input. Or perhaps Labeled-LDA and Multi-Grain LDA, another topic model for classification and sentiment analysis, which are supervised algorithms, in which case you can still use the approach I suggested. However, for search tasks that involve e-commerce, advertisements, recommendations, etc., there seems to be more data that can be captured from users' interactions with the system/site, which can be used as signals, and users' actions (adding things to wish lists, clicks for more info, conversions, etc.) are much more telling about the intention/value the user gives to what is presented to them. Then viewing search as a machine learning/multi-objective optimization problem makes sense. My point is that search engines nowadays are used for all these use cases, thus it is worth exploring all the avenues exposed in this thread. Cheers, -- Joaquin On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
Re: Where Search Meets Machine Learning
BTW, as I mentioned, the machine learning On Monday, May 4, 2015, J. Delgado joaquin.delg...@gmail.com wrote: I totally agree that it depends on the task at hand and the amount/quality of the data that you can get hold of. The problem of relevancy in the traditional document/semantic information retrieval (IR) task is such a hard thing because in most cases there is little or no source of truth you could use as training data (unless you use something like TREC for a limited set of documents to evaluate). Additionally, the feedback data you get from users, if it exists, is very noisy. In this case prior knowledge, encoded as attribute weights, crafted functions, and heuristics, is your best bet. You can however mine the content itself by leveraging clustering/topic modeling via LDA, which is an unsupervised learning algorithm, and use that as input. Or perhaps Labeled-LDA and Multi-Grain LDA, another topic model for classification and sentiment analysis, which are supervised algorithms, in which case you can still use the approach I suggested. However, for search tasks that involve e-commerce, advertisements, recommendations, etc., there seems to be more data that can be captured from users' interactions with the system/site, which can be used as signals, and users' actions (adding things to wish lists, clicks for more info, conversions, etc.) are much more telling about the intention/value the user gives to what is presented to them. Then viewing search as a machine learning/multi-objective optimization problem makes sense. My point is that search engines nowadays are used for all these use cases, thus it is worth exploring all the avenues exposed in this thread. Cheers, -- Joaquin On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where the sheer amount of high quality user behavior data means that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
[GitHub] lucene-solr pull request: fix timing bugs in map-reduce contrib fo...
GitHub user wanshenghua opened a pull request: https://github.com/apache/lucene-solr/pull/146 fix timing bugs in map-reduce contrib for indexing https://issues.apache.org/jira/browse/SOLR-7501 You can merge this pull request into a Git repository by running: $ git pull https://github.com/wanshenghua/lucene-solr SOLR_7501 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #146 commit 06a81f2658f29a83db692c46232b4569c0321352 Author: Shenghua Wan s...@walmartlabs.com Date: 2015-05-05T03:23:17Z fix timing bugs in map-reduce contrib for indexing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7501) map-reduce index tool has timing bugs
[ https://issues.apache.org/jira/browse/SOLR-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527841#comment-14527841 ] ASF GitHub Bot commented on SOLR-7501: -- GitHub user wanshenghua opened a pull request: https://github.com/apache/lucene-solr/pull/146 fix timing bugs in map-reduce contrib for indexing https://issues.apache.org/jira/browse/SOLR-7501 You can merge this pull request into a Git repository by running: $ git pull https://github.com/wanshenghua/lucene-solr SOLR_7501 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #146 commit 06a81f2658f29a83db692c46232b4569c0321352 Author: Shenghua Wan s...@walmartlabs.com Date: 2015-05-05T03:23:17Z fix timing bugs in map-reduce contrib for indexing map-reduce index tool has timing bugs - Key: SOLR-7501 URL: https://issues.apache.org/jira/browse/SOLR-7501 Project: Solr Issue Type: Bug Components: contrib - MapReduce Reporter: Shenghua Wan Priority: Minor map-reduce index tool has timing bugs in several classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7501) map-reduce index tool has timing bugs
[ https://issues.apache.org/jira/browse/SOLR-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shenghua Wan updated SOLR-7501: --- Description: map-reduce index tool has timing bugs in several classes. bug fix is provided in https://github.com/apache/lucene-solr/pull/146 was:map-reduce index tool has timing bugs in several classes. map-reduce index tool has timing bugs - Key: SOLR-7501 URL: https://issues.apache.org/jira/browse/SOLR-7501 Project: Solr Issue Type: Bug Components: contrib - MapReduce Reporter: Shenghua Wan Priority: Minor map-reduce index tool has timing bugs in several classes. bug fix is provided in https://github.com/apache/lucene-solr/pull/146 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.7.0_80) - Build # 12393 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12393/ Java: 32bit/jdk1.7.0_80 -server -XX:+UseG1GC All tests passed Build Log: [...truncated 9437 lines...] [javac] Compiling 532 source files to /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/classes/test [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^
[jira] [Commented] (LUCENE-6462) Latin Stemmer for lucene
[ https://issues.apache.org/jira/browse/LUCENE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527828#comment-14527828 ] Niki commented on LUCENE-6462: -- When searching for a LatinStemmer, I found this link from Lucene/Solr: https://github.com/scherziglu/solr/blob/master/solr-analysis/src/main/java/org/apache/lucene/analysis/la/LatinStemmer.java. This program does not stem most words properly and also unnecessarily adds an 'i', amongst other things. I modified the above code to accommodate the rules of stemming in Latin. Latin Stemmer for lucene Key: LUCENE-6462 URL: https://issues.apache.org/jira/browse/LUCENE-6462 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Niki In the latest lucene package there is no stemmer for the Latin language. I have a stemmer for the Latin language, which is a rule-based program based on the grammar and rules of Latin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
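For readers unfamiliar with rule-based stemming, below is a minimal sketch of the general idea (longest-suffix stripping over a hand-maintained list of endings). This is purely illustrative: it is neither the LatinStemmer linked above nor the modified code described in the comment, and the suffix list and minimum-stem rule are assumptions.
{code}
// Illustrative only: a minimal rule-based Latin suffix stripper.
// The suffix list below is a small assumed sample, not a complete grammar.
public class SimpleLatinStemmer {
  // Common endings, longest first so the longest match wins.
  private static final String[] SUFFIXES = {
      "ibus", "orum", "arum", "amus", "atis",
      "ius", "ae", "am", "as", "em", "es", "is", "os", "um", "us",
      "a", "e", "i", "o", "u"
  };

  public static String stem(String word) {
    for (String suffix : SUFFIXES) {
      // Keep at least two characters of stem to avoid over-stemming.
      if (word.endsWith(suffix) && word.length() - suffix.length() >= 2) {
        return word.substring(0, word.length() - suffix.length());
      }
    }
    return word;
  }

  public static void main(String[] args) {
    System.out.println(stem("portarum")); // prints "port"
  }
}
{code}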
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3063 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3063/ All tests passed Build Log: [...truncated 9349 lines...] [javac] Compiling 532 source files to /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/build/solr-core/classes/test [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac]
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527905#comment-14527905 ] ASF subversion and git services commented on SOLR-6220: --- Commit 1677741 from sha...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1677741 ] SOLR-6220: Fix compile error on Java7 Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch h1.Objective Most cloud based systems allow you to specify rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we should be able to control allocation of replicas, or later change it to suit the needs of the system. All configurations are on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during * collection creation * shard splitting * add replica * createshard There are two aspects to how replicas are placed: snitch and placement. h2.snitch How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g.: snitch=EC2Snitch or snitch=class:EC2Snitch h2.ImplicitSnitch This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is automatically used. Tags provided by ImplicitSnitch: # cores: no. of cores in the node # disk: disk space available in the node # host: host name of the node # node: node name # D.*: these are values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during the node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}} h2.Rules This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters will be passed on to the collection CREATE api as a multivalued parameter rule. The values will be saved in the state of the collection as follows {code:Javascript}
{ "mycollection": {
    "snitch": { "class": "ImplicitSnitch" },
    "rules": [{cores:4-}, {replica:1, shard:*, node:*}, {disk:100}]
  }
}
{code} A rule is specified in a pseudo-JSON syntax, which is a map of keys and values. * Each collection can have any number of rules. As long as the rules do not conflict with each other it is OK; otherwise an error is thrown. * In each rule, shard and replica can be omitted ** default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than) ** and the value of shard can be a shard name, or {{\*}} meaning EACH, or {{\*\*}} meaning ANY. Default value is {{\*\*}} (ANY) * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}. * All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node * By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly h3.How are nodes picked up? Nodes are not picked up at random. The rules are used to first sort the nodes according to affinity.
For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference. And if the rule is {{disk:100-}}, nodes with less disk space will be given priority. If everything else is equal, nodes with fewer cores are given higher priority. h3.Fuzzy match Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness. Example rules: {noformat}
#Example requirement: use only one replica of a shard in a host if possible; if no matches are found, relax that rule.
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible. This will ensure that if a node does not exist with a 100GB disk, nodes are picked up in the order of size, say an 85GB node would be picked up over an 80GB node.
disk:100~
{noformat} Examples: {noformat}
#in each rack there can be max two replicas of A given shard
rack:*,shard:*,replica:<3
//in each rack there can be max two replicas of ANY replica
rack:*,shard:**,replica:<2
rack:*,replica:<3
#in each node there should be a max one replica of EACH shard
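To make the wiring concrete, here is a hypothetical collection CREATE call combining a snitch with two of the rule shapes described above. The exact parameter syntax here is an assumption extrapolated from this description (rule as a multivalued parameter), not the syntax of the committed feature:
{noformat}
#Hypothetical: create a collection whose replicas are spread across nodes
#(at most one replica of a shard per node) and placed on nodes with >100GB disk
/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&rule=shard:*,replica:<2,node:*&rule=disk:>100&snitch=class:EC2Snitch
{noformat}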
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.8.0_45) - Build # 12395 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12395/ Java: 32bit/jdk1.8.0_45 -client -XX:+UseSerialGC All tests passed Build Log: [...truncated 52931 lines...] BUILD FAILED /home/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:536: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:90: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build.xml:641: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:1990: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 57 minutes 24 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2218 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2218/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 54321 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:536: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:90: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build.xml:641: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:1990: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 98 minutes 6 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 2218 - Still Failing!
I committed a fix. There was a compile error with Java7 in one of the tests added in SOLR-6220. On Tue, May 5, 2015 at 10:43 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2218/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC All tests passed Build Log: [...truncated 54321 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:536: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/build.xml:90: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/solr/build.xml:641: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:1990: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-5.x-MacOSX/lucene/common-build.xml:2023: Compile failed; see the compiler error output for details. Total time: 98 minutes 6 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Regards, Shalin Shekhar Mangar.
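The failure mode in these build logs is the difference between Java 7 and Java 8 diamond-operator inference: Java 8's target typing can infer a constructor's type argument from the method or constructor parameter it is passed to, while Java 7 infers ArrayList<Object> for a diamond in an argument position, which is exactly the "found: ArrayList<Object>" the compiler prints above. A minimal sketch reproducing the effect (the class and method names are made up for illustration):
{code}
import java.util.ArrayList;
import java.util.List;

public class DiamondInference {
  static void takeStrings(List<String> strings) {}

  public static void main(String[] args) {
    // Compiles on Java 8 (target typing infers ArrayList<String>), but on
    // Java 7 the diamond in an argument position infers ArrayList<Object>,
    // giving the same "actual argument ArrayList<Object> cannot be converted
    // to List<String>" error seen in the Jenkins logs.
    takeStrings(new ArrayList<>());

    // Portable fix: name the type argument explicitly.
    takeStrings(new ArrayList<String>());
  }
}
{code}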
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527191#comment-14527191 ] ASF subversion and git services commented on LUCENE-6196: - Commit 1677670 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677670 ] LUCENE-6196: committing Karl's latest patch https://reviews.apache.org/r/33811/ (diff #3) Include geo3d package, along with Lucene integration to make it useful -- Key: LUCENE-6196 URL: https://issues.apache.org/jira/browse/LUCENE-6196 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: Karl Wright Assignee: David Smiley Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for limiting the results of those queries to those within the exact shape in highly performant ways. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
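A sketch of the membership-test idea the description alludes to: once a shape is represented by a set of bounding planes, point-in-shape checks reduce to evaluating plane equations, i.e. only multiplications and additions. The class and method names below are illustrative assumptions, not the actual geo3d API:
{code}
// Sketch of the core idea: a plane a*x + b*y + c*z + d = 0, with membership
// decided by the sign of the evaluated plane equation.
class Plane {
  final double a, b, c, d;

  Plane(double a, double b, double c, double d) {
    this.a = a; this.b = b; this.c = c; this.d = d;
  }

  // True if the point is on or above the plane (the "inside" side).
  boolean isWithin(double x, double y, double z) {
    return a * x + b * y + c * z + d >= 0.0;
  }
}

class PlanarShape {
  final Plane[] boundingPlanes;

  PlanarShape(Plane... boundingPlanes) { this.boundingPlanes = boundingPlanes; }

  // A point is inside the shape iff it is inside every bounding plane:
  // per point, this costs only a few multiplications and additions per plane.
  boolean isWithin(double x, double y, double z) {
    for (Plane p : boundingPlanes) {
      if (!p.isWithin(x, y, z)) return false;
    }
    return true;
  }
}
{code}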
[JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_80) - Build # 12389 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12389/ Java: 64bit/jdk1.7.0_80 -XX:+UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 9617 lines...] [javac] Compiling 532 source files to /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/classes/test [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:77: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:117: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] Map<Position, String> mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:129: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:141: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:153: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:164: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner( [javac] ^ [javac] required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState [javac] found: List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null [javac] reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion [javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:175: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types; [javac] mapping = new ReplicaAssigner(
[jira] [Commented] (SOLR-7436) Solr stops printing stacktraces in log and output
[ https://issues.apache.org/jira/browse/SOLR-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527279#comment-14527279 ] Hoss Man commented on SOLR-7436: Best guess, based on random googling since there's nothing in solr that i could think of to explain this, is that you are running into this HotSpot gotcha... http://jawspeak.com/2010/05/26/hotspot-caused-exceptions-to-lose-their-stack-traces-in-production-and-the-fix/ https://stackoverflow.com/questions/2295015/log4j-not-printing-the-stacktrace-for-exceptions bq. The compiler in the server VM now provides correct stack backtraces for all cold built-in exceptions. For performance purposes, when such an exception is thrown a few times, the method may be recompiled. After recompilation, the compiler may choose a faster tactic using preallocated exceptions that do not provide a stack trace. To disable completely the use of preallocated exceptions, use this new flag: -XX:-OmitStackTraceInFastThrow. Solr stops printing stacktraces in log and output - Key: SOLR-7436 URL: https://issues.apache.org/jira/browse/SOLR-7436 Project: Solr Issue Type: Bug Affects Versions: 5.1 Environment: Local 5.1 Reporter: Markus Jelsma Attachments: solr-8983-console.log After a short while, Solr suddenly stops printing stacktraces in the log and output. {code} 251043 [qtp1121454968-17] INFO org.apache.solr.core.SolrCore.Request [ suggests] - [suggests] webapp=/solr path=/select params={q=*:*fq={!collapse+field%3Dquery_digest}fq={!collapse+field%3Dresult_digest}} status=500 QTime=3 251043 [qtp1121454968-17] ERROR org.apache.solr.servlet.SolrDispatchFilter [ suggests] - null:java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:743) at org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:780) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:203) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1660) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1479) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:556) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at
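A minimal, self-contained reproduction of the HotSpot behavior Hoss quotes above: throw the same implicit exception "hot" enough times and the server compiler may recompile the method to throw a preallocated exception with no stack trace, unless -XX:-OmitStackTraceInFastThrow is set. The iteration count is an arbitrary choice for illustration:
{code}
public class FastThrowDemo {
  static String s = null;

  public static void main(String[] args) {
    for (int i = 0; i < 200_000; i++) {
      try {
        s.length(); // always throws NullPointerException
      } catch (NullPointerException e) {
        // On a default server VM, getStackTrace() may eventually come back
        // empty once the method is recompiled to use a preallocated NPE.
        if (e.getStackTrace().length == 0) {
          System.out.println("stack trace disappeared at iteration " + i);
          return;
        }
      }
    }
    System.out.println("stack traces survived (flag disabled or not yet compiled)");
  }
}
{code}
Running it with java -XX:-OmitStackTraceInFastThrow FastThrowDemo should keep full traces for every throw, which is the workaround suggested in the linked articles.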
Re: Where Search Meets Machine Learning
Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search, which has some of the problems of unique queries/information needs and of data sparsity you mention in your blog post. This article makes a similar argument: that massive amounts of user data are so important for modern search engines that they are essentially a barrier to entry for new web search engines. Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and Yoelle Maarek. In Proceedings of SSDBM'2012, Chania, Crete, June 2012. http://www.springerlink.com/index/58255K40151U036N.pdf Tom I noticed that information retrieval problems fall into a sort-of layered pyramid. At the topmost point is someone like Google, where there is so much high-quality user behavior data that search truly is a machine learning problem, much as you propose. As you move down the pyramid the quality of user data diminishes. Eventually you get to a very thick layer of middle-class search applications that value relevance, but have very modest amounts of user data or none at all. For most of them, even if they tracked their searches over a year, they *might* get good data on their top 50 searches. (I know, because they send me the spreadsheet and say fix it!) The best use they can make of analytics data is after-action troubleshooting. Actual user emails complaining about the search can be more useful than behavior data!
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527392#comment-14527392 ] Michael McCandless commented on LUCENE-6450: This new approach is nice! I don't fully understand all the geo math, but I think I get the gist: you recursively approximate the target shape using smaller and smaller ranges from the morton encoding, and then record when that z-shape is fully within the query and avoid the post-filtering for those ranges. This visits fewer terms than the original patch, which did just a single range that can (w/ the right 'adversary') visit a great many false terms. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later; we should commit this approach now... Maybe add a test case w/ more data, e.g. a randomized test? It could index a bunch of random points, then run random rects/shapes, do the dumb, slow check-every-single-doc verification, and confirm the query hits agree. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
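For reference, a sketch of the morton (z-order) encoding the description refers to: quantize lat/lon to 32 bits each and interleave the bits into a single long. The quantization scheme and names here are assumptions for illustration, not the patch's actual code:
{code}
public class MortonSketch {
  // Spread the low 32 bits of v so they occupy the even bit positions
  // (the standard "part1by1" bit-twiddling sequence).
  static long spread(long v) {
    v &= 0xFFFFFFFFL;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v << 2))  & 0x3333333333333333L;
    v = (v | (v << 1))  & 0x5555555555555555L;
    return v;
  }

  // Quantize lat/lon to 32 bits each, then interleave: lat on even bits,
  // lon on odd bits, producing one long term per point.
  static long encode(double lat, double lon) {
    long latBits = (long) ((lat + 90.0)  / 180.0 * 0xFFFFFFFFL);
    long lonBits = (long) ((lon + 180.0) / 360.0 * 0xFFFFFFFFL);
    return (spread(lonBits) << 1) | spread(latBits);
  }

  public static void main(String[] args) {
    System.out.printf("0x%016x%n", encode(37.7749, -122.4194));
  }
}
{code}
Nearby points share long common bit prefixes under this encoding, which is what makes the recursive range decomposition discussed above possible.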
[jira] [Comment Edited] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527530#comment-14527530 ] Anshum Gupta edited comment on SOLR-7275 at 5/4/15 11:13 PM: - Patch updated to trunk. Working on integrating the context object. was (Author: anshumg): Patch updated to trunk working on integrating the context. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
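To illustrate how a plugin might be written against the interface proposed in the description above, here is a hypothetical role-based implementation. The interface and response shapes are taken from the sketch in the issue; SolrRequestContext is reduced to the two fields this example needs, and all field and accessor details are assumptions, not the final API:
{code}
import java.util.Collections;
import java.util.Set;

interface SolrAuthorizationPlugin {
  SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}

class SolrAuthorizationResponse {
  boolean authorized;
  public boolean isAuthorized() { return authorized; }
}

enum OperationType { QUERY, UPDATE, ADMIN }

class SolrRequestContext {
  OperationType operationType;     // correlated with user roles, per the sketch
  Set<String> userRoles;           // assumed to come from the UserInfo field
}

// Hypothetical plugin: the operation type maps directly to a required role.
class RoleBasedAuthorizationPlugin implements SolrAuthorizationPlugin {
  @Override
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context) {
    SolrAuthorizationResponse response = new SolrAuthorizationResponse();
    String required = context.operationType.name().toLowerCase(); // "query", "update", "admin"
    response.authorized = context.userRoles.contains(required);
    return response;
  }
}

public class AuthSketch {
  public static void main(String[] args) {
    SolrRequestContext ctx = new SolrRequestContext();
    ctx.operationType = OperationType.UPDATE;
    ctx.userRoles = Collections.singleton("update");
    System.out.println(new RoleBasedAuthorizationPlugin().isAuthorized(ctx).isAuthorized()); // true
  }
}
{code}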
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: (was: SOLR-7484.patch) Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7500) Remove pathPrefix from SolrDispatchFilter as Solr no longer runs as a part of a bigger webapp
[ https://issues.apache.org/jira/browse/SOLR-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7500: --- Attachment: SOLR-7500.patch Remove pathPrefix from SolrDispatchFilter as Solr no longer runs as a part of a bigger webapp - Key: SOLR-7500 URL: https://issues.apache.org/jira/browse/SOLR-7500 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-7500.patch SolrDispatchFilter has support for Solr running as part of a bigger webapp but as we've moved away from that concept, it makes sense to clean up the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.9.0-ea-b60) - Build # 12564 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12564/ Java: 64bit/jdk1.9.0-ea-b60 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.test Error Message: commitWithin did not work on node: http://127.0.0.1:36402/collection1 expected:<68> but was:<67> Stack Trace: java.lang.AssertionError: commitWithin did not work on node: http://127.0.0.1:36402/collection1 expected:<68> but was:<67> at __randomizedtesting.SeedInfo.seed([A9AAF956597BE1F8:21FEC68CF7878C00]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.solr.cloud.BasicDistributedZkTest.test(BasicDistributedZkTest.java:344) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
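For context on what this assertion exercises: a document added with commitWithin must become searchable within (roughly) the requested window, without an explicit commit. A minimal SolrJ sketch of that contract, where the URL, core name, and field are placeholders:
{code}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient("http://127.0.0.1:8983/solr/collection1")) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");

      UpdateRequest req = new UpdateRequest();
      req.add(doc);
      req.setCommitWithin(500); // ask Solr to commit within 500 ms, no explicit commit
      req.process(client);

      Thread.sleep(1000); // after the window elapses, the doc should be searchable
    }
  }
}
{code}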
[jira] [Updated] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-7275: --- Attachment: SOLR-7275.patch Patch updated to trunk working on integrating the context. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527592#comment-14527592 ] Anshum Gupta commented on SOLR-7275: Right now, this also lacks a mechanism to Reload / Reinit without restarting the node. Perhaps it'd be a good idea to have an API to do that. Pluggable authorization module in Solr -- Key: SOLR-7275 URL: https://issues.apache.org/jira/browse/SOLR-7275 Project: Solr Issue Type: Sub-task Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing: Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information, e.g. ACL for document filtering etc. The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities, e.g. how to extract the name of the collection or other information from the incoming request, as there are multiple ways to specify the target collection for a request. Similarly, request type can be specified by {{qt}} or {{/handler_name}}. Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return. {code} public interface SolrAuthorizationPlugin { public SolrAuthorizationResponse isAuthorized(SolrRequestContext context); } {code} {code} public class SolrRequestContext { UserInfo; // Will contain user context from the authentication layer. HTTPRequest request; Enum OperationType; // Correlated with user roles. String[] CollectionsAccessed; String[] FieldsAccessed; String Resource; } {code} {code} public class SolrAuthorizationResponse { boolean authorized; public boolean isAuthorized(); } {code} User Roles: * Admin * Collection Level: * Query * Update * Admin Using this framework, an implementation could be written for specific security systems, e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527410#comment-14527410 ] Nicholas Knize commented on LUCENE-6450: That's right. The old patch was a naive "scan the world" approach, really unusable at scale. As said, this one approximates the bounding box as a set of ranges on the space-filling curve. I think [~dsmiley] had also suggested random testing, which is definitely necessary. I'll add some randomized testing and post a new patch. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: Trunk, 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Solr website - problem with anchor links
Hi Shawn, The books on the same page (h3 headings as well) had the same problem - these are linked from the front page, so it was really noticeable. I fixed the problem for the books by directly specifying <h3 class="offset">title</h3> (instead of markdown syntax ### Title ###) - you can see that class in base.css: http://lucene.apache.org/solr/assets/styles/base.css - it shifts content down far enough that you can see it below the floating header. I didn’t apply the fix everywhere in that file because the markdown flavor used by the ASF CMS doesn’t have the ability to specify HTML tag attributes, and many parts of resources.mdtext are still just markdown, so I didn’t want to make it messy (well, messier really) by including more HTML. But since markdown auto-creates anchors for all h3 headings, it makes sense to make them not look terrible when people link directly to them, so I’ve just converted all the H3 headings from: ### Heading ### to: <h3 class="offset">Heading</h3> Steve On May 4, 2015, at 9:59 AM, Shawn Heisey apa...@elyograg.org wrote: When I try to use a URL with an anchor link on the Solr website, it doesn't work right: https://lucene.apache.org/solr/resources.html#mailing-lists On both Firefox and Chrome, this URL doesn't quite go to the right spot. It would be the right spot if the floating header at the top of of the page wasn't there. I'm guessing some CSS trickery is required to get it to anchor below that floating header. I did find the following, and when I have time to digest it, I may be able to try and fix the problem, but finding that time is the hard part. http://stackoverflow.com/questions/10732690/offsetting-an-html-anchor-to-adjust-for-fixed-header If somebody knows exactly how to fix it and has the time, feel free to take this problem! Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
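For reference, the usual shape of the offset trick Steve describes (and the Stack Overflow link suggests) is to pad the heading down and pull it back up with a negative margin, so the anchor target lands below the fixed header without shifting the visible layout. The 60px value here is a placeholder, not the actual value in base.css:
{code}
/* Illustrative only; see base.css for the real rule. */
h3.offset {
  padding-top: 60px;  /* room for the floating header */
  margin-top: -60px;  /* cancel the visual shift */
}
{code}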
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.7.0_80) - Build # 12391 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12391/
Java: 32bit/jdk1.7.0_80 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9578 lines...]
[javac] Compiling 532 source files to /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/classes/test
[javac] /home/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/core/src/test/org/apache/solr/cloud/rule/RuleEngineTest.java:71: error: constructor ReplicaAssigner in class ReplicaAssigner cannot be applied to given types;
[javac]     Map<Position, String> mapping = new ReplicaAssigner(
[javac]                                     ^
[javac]   required: List<Rule>,Map<String,Integer>,List,Map<String,Set<String>>,List<String>,CoreContainer,ClusterState
[javac]   found:    List<Rule>,Map,List<String>,HashMap,ArrayList<Object>,null,null
[javac]   reason: actual argument ArrayList<Object> cannot be converted to List<String> by method invocation conversion
(The identical error is reported for the ReplicaAssigner constructor calls at RuleEngineTest.java lines 77, 117, 129, 141, 153, 164, and 175; the log is truncated partway through the last report.)
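The repeated javac message boils down to one generics rule: generic types are invariant, so an ArrayList<Object> argument cannot satisfy a List<String> parameter even when the list is empty. A minimal, hypothetical reproduction (not the actual RuleEngineTest code):
{code}
import java.util.ArrayList;
import java.util.List;

public class GenericsMismatch {
  static void takesStrings(List<String> names) { /* consumes strings */ }

  public static void main(String[] args) {
    ArrayList<Object> mixed = new ArrayList<>();
    // takesStrings(mixed);                  // error: ArrayList<Object> is not a List<String>
    takesStrings(new ArrayList<String>());   // fine: the type argument matches exactly
  }
}
{code}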
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 2262 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/2262/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED: org.apache.solr.cloud.MultiThreadedOCPTest.test

Error Message:
Captured an uncaught exception in thread: Thread[id=4003, name=parallelCoreAdminExecutor-1947-thread-8, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=4003, name=parallelCoreAdminExecutor-1947-thread-8, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest]
   at __randomizedtesting.SeedInfo.seed([83BF6554D449E287:BEB5A8E7AB58F7F]:0)
Caused by: java.lang.AssertionError: Too many closes on SolrCore
   at __randomizedtesting.SeedInfo.seed([83BF6554D449E287]:0)
   at org.apache.solr.core.SolrCore.close(SolrCore.java:1138)
   at org.apache.solr.common.util.IOUtils.closeQuietly(IOUtils.java:31)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:535)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:494)
   at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:628)
   at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:213)
   at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1249)
   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)

Build Log:
[...truncated 9522 lines...]
[junit4] Suite: org.apache.solr.cloud.MultiThreadedOCPTest
[junit4]   2> Creating dataDir: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/temp/solr.cloud.MultiThreadedOCPTest 83BF6554D449E287-001/init-core-data-001
[junit4]   2> 540661 T3688 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (false)
[junit4]   2> 540661 T3688 oas.BaseDistributedSearchTestCase.initHostContext Setting hostContext system property: /hfdh/s
[junit4]   2> 540663 T3688 oasc.ZkTestServer.run STARTING ZK TEST SERVER
[junit4]   2> 540663 T3689 oasc.ZkTestServer$2$1.setClientPort client port:0.0.0.0/0.0.0.0:0
[junit4]   2> 540663 T3689 oasc.ZkTestServer$ZKServerMain.runFromConfig Starting server
[junit4]   2> 540765 T3688 oasc.ZkTestServer.run start zk server on port:55932
[junit4]   2> 540766 T3688 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
[junit4]   2> 540770 T3688 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
[junit4]   2> 540782 T3696 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@72ae8ad5 name:ZooKeeperConnection Watcher:127.0.0.1:55932 got event WatchedEvent state:SyncConnected type:None path:null
[junit4]   2> 540782 T3688 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
[junit4]   2> 540783 T3688 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
[junit4]   2> 540783 T3688 oascc.SolrZkClient.makePath makePath: /solr
[junit4]   2> 540793 T3688 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
[junit4]   2> 540796 T3688 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
[junit4]   2> 540799 T3699 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@52b95ac3 name:ZooKeeperConnection Watcher:127.0.0.1:55932/solr got event WatchedEvent state:SyncConnected type:None path:null
[junit4]   2> 540799 T3688 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
[junit4]   2> 540800 T3688 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
[junit4]   2> 540800 T3688 oascc.SolrZkClient.makePath makePath: /collections/collection1
[junit4]   2> 540805 T3688 oascc.SolrZkClient.makePath makePath: /collections/collection1/shards
[junit4]   2> 540810 T3688 oascc.SolrZkClient.makePath makePath: /collections/control_collection
[junit4]   2> 540815 T3688 oascc.SolrZkClient.makePath makePath: /collections/control_collection/shards
[junit4]   2> 540820 T3688 oasc.AbstractZkTestCase.putConfig put /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml to /configs/conf1/solrconfig.xml
[junit4]   2> 540820 T3688 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.xml
[junit4]   2> 540828 T3688 oasc.AbstractZkTestCase.putConfig put [...truncated]
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527489#comment-14527489 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------
Hi, I will look into this tomorrow (it is too late now)... This looks like it has a completely separate TermsEnum and query impl. Why not extend MultiTermQuery directly and let NRQ live on its own?
Uwe

Add simple encoded GeoPointField type to core
---------------------------------------------
Key: LUCENE-6450
URL: https://issues.apache.org/jira/browse/LUCENE-6450
Project: Lucene - Core
Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch

At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward, lightweight type for the most basic geo point use-cases. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by using NumericRangeQuery to reduce candidate terms, deferring the more expensive mathematics to the smaller candidate sets.
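The description's "simple bit twiddling operations (currently morton hashing)" refers to the classic Z-order interleave. As a hedged sketch only (this is not the patch's code, and the degree-to-integer quantization shown is an assumption), encoding lat/lon into a single long might look like:
{code}
public final class MortonSketch {

  // Spread the lower 32 bits of v so that a zero bit separates each original bit.
  private static long spread(long v) {
    v &= 0x00000000FFFFFFFFL;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v << 2))  & 0x3333333333333333L;
    v = (v | (v << 1))  & 0x5555555555555555L;
    return v;
  }

  // Quantize degrees onto the unsigned 32-bit range (assumed scheme), then
  // interleave: lon bits land in even positions, lat bits in odd positions.
  public static long encode(double lat, double lon) {
    long latQ = (long) (((lat + 90.0)  / 180.0) * 0xFFFFFFFFL);
    long lonQ = (long) (((lon + 180.0) / 360.0) * 0xFFFFFFFFL);
    return spread(lonQ) | (spread(latQ) << 1);
  }
}
{code}
Nearby points then share long term prefixes, which is what lets the range machinery prune candidates before any exact math runs.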
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527496#comment-14527496 ]

Uwe Schindler edited comment on LUCENE-6450 at 5/4/15 10:42 PM:
----------------------------------------------------------------
bq. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later and we should commit this approach now ..

We should first compare how this behaves on *large* bboxes, so a random test / perf test spanning large parts of the world and large indexes with maaany points would be good (whole atlantic, whole africa, ...). It is also mentioned that it does not allow boxes to cross the date line, which is easy to handle by splitting into 2 queries, one left of the date line, one right. I can help with that. Then we should also test perf with queries spanning the whole pacific :-)

was (Author: thetaphi): [the previous revision is identical in the archived text; the edit evidently changed only formatting]
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527496#comment-14527496 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------
bq. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later and we should commit this approach now ..

We should first compare how this behaves on *large* bboxes, so a random test / perf test spanning large parts of the world and large indexes with maaany points would be good (whole atlantic, whole africa, ...). It is also mentioned that it does not allow boxes to cross the date line, which is easy to handle by splitting into 2 queries, one left of the date line, one right. I can help with that. Then we should also test perf with queries spanning the whole pacific :-)
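The two-query split Uwe describes is mechanical: a box whose minLon is greater than its maxLon wraps the date line and can be rewritten as two non-wrapping boxes, either of which may match. A hedged sketch follows; the GeoBoundingBoxQuery argument order shown is an assumption for illustration, only BooleanQuery is Lucene's actual API here:
{code}
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public final class DatelineSplit {
  // Sketch only: argument order (field, minLon, minLat, maxLon, maxLat) is assumed.
  public static Query bbox(String field, double minLon, double minLat,
                           double maxLon, double maxLat) {
    if (minLon <= maxLon) {
      return new GeoBoundingBoxQuery(field, minLon, minLat, maxLon, maxLat);
    }
    // The box wraps the date line: query the part left of it and the part right of it.
    BooleanQuery q = new BooleanQuery();
    q.add(new GeoBoundingBoxQuery(field, minLon, minLat, 180.0, maxLat), Occur.SHOULD);
    q.add(new GeoBoundingBoxQuery(field, -180.0, minLat, maxLon, maxLat), Occur.SHOULD);
    return q;
  }
}
{code}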
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_45) - Build # 4767 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4767/
Java: 64bit/jdk1.8.0_45 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

3 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
Error Message: ERROR: SolrIndexSearcher opens=51 closes=50
Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=51 closes=50
   at __randomizedtesting.SeedInfo.seed([366927323FB5382C]:0)
   at org.junit.Assert.fail(Assert.java:93)
   at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:496)
   at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:232)
   at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:497)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
   at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
   at java.lang.Thread.run(Thread.java:745)

FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
Error Message: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores:
   1) Thread[id=9505, name=searcherExecutor-4428-thread-1, state=WAITING, group=TGRP-TestLazyCores]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: [same thread and frames as above]
   at __randomizedtesting.SeedInfo.seed([366927323FB5382C]:0)

FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
Error Message: There are still zombie threads that couldn't be terminated:
   1) Thread[id=9505, [...truncated]
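Both TestLazyCores failures above and the earlier "Too many closes on SolrCore" assertion police the same invariant: every tracked open must be matched by exactly one close. A hedged model of that reference-counting discipline (illustrative only, not Solr's actual SolrCore code):
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model of a ref-counted resource: the creator holds one reference,
// open() adds one, close() releases one. Going below zero means a double
// close ("too many closes"); never reaching zero means a leak (opens > closes).
final class RefCountedResource {
  private final AtomicInteger refCount = new AtomicInteger(1);

  void open() {
    refCount.incrementAndGet();
  }

  void close() {
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      throw new AssertionError("Too many closes");
    }
    if (remaining == 0) {
      // last reference released: actually free the underlying resources here
    }
  }
}
{code}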
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527300#comment-14527300 ]

David Smiley commented on LUCENE-6450:
--------------------------------------
Nice code Nick! LGTM.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527318#comment-14527318 ]

David Smiley commented on LUCENE-6450:
--------------------------------------
Just curious; how did that Python RTree benchmark compare? https://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/src/python/SearchOSM.py?spec=svn188e330ea8c34a9720cbf0414d2ed19f6a843a3d&r=188e330ea8c34a9720cbf0414d2ed19f6a843a3d#1
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:37 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

was (Author: nknize): [same text, without the link to the OSM data-set image]
[jira] [Commented] (SOLR-7484) Refactor SolrDispatchFilter.doFilter(...) method
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527132#comment-14527132 ]

ASF subversion and git services commented on SOLR-7484:
--------------------------------------------------------
Commit 1677660 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1677660 ]
SOLR-7484: Refactor SolrDispatchFilter to extract all Solr specific implementation detail to HttpSolrCall, and extract methods from the current SDF.doFilter(..) logic, making things easier to manage. HttpSolrCall converts the processing to a 3-step process, i.e. Construct, Init, and Call, so the context of the request is available after Init and before the actual call operation. (merge from trunk)

Refactor SolrDispatchFilter.doFilter(...) method
------------------------------------------------
Key: SOLR-7484
URL: https://issues.apache.org/jira/browse/SOLR-7484
Project: Solr
Issue Type: Improvement
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Attachments: SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch, SOLR-7484.patch

Currently almost everything that's done in SDF.doFilter() is sequential. We should refactor it to clean up the code and make things easier to manage.
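To make the 3-step shape concrete, here is a caller-side sketch of the lifecycle the commit message describes. This is illustrative only: the constructor arguments and method names are assumptions for exposition, not HttpSolrCall's actual signatures:
{code}
// Illustrative only; not the real HttpSolrCall API.
HttpSolrCall call = new HttpSolrCall(request, response); // 1. construct: no work done yet
call.init();  // 2. init: resolve core, handler, and path; request context now available
try {
  call.call(); // 3. call: execute the request against the context init() established
} finally {
  call.destroy(); // cleanup (method name assumed)
}
{code}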
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 7:55 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoHashPrefixTree*
Index Time: 695.079 sec
Index Size: 4.2G
Mean Query Time: 0.071 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

was (Author: nknize): [same text, without the *GeoHashPrefixTree* results]
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527230#comment-14527230 ]

David Smiley commented on LUCENE-6196:
--------------------------------------
I think the Geo3d branch, technically {{lucene6196}}, is now ready to merge into trunk, and then the 5x branch. I could generate a patch, but unless there are process reasons (e.g. I have to?) or technical reasons I am unaware of, I'll simply merge in the branch. The CHANGES.txt entry I plan to add is as follows:
{noformat}
* LUCENE-6196: New Spatial Geo3d API with partial Spatial4j integration. It is
  a set of shapes implemented using 3D planar geometry for calculating spatial
  relations on the surface of a sphere. Shapes include Point, BBox, Circle,
  Path (buffered line string), and Polygon. (Karl Wright via David Smiley)
{noformat}
Karl, if you suggest any changes then just let me know. If I don't get another +1 then I'll commit in two days.

Include geo3d package, along with Lucene integration to make it useful
-----------------------------------------------------------------------
Key: LUCENE-6196
URL: https://issues.apache.org/jira/browse/LUCENE-6196
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: Karl Wright
Assignee: David Smiley
Attachments: LUCENE-6196-additions.patch, LUCENE-6196-fixes.patch, LUCENE-6196_Geo3d.patch, ShapeImpl.java, geo3d-tests.zip, geo3d.zip

I would like to explore contributing a geo3d package to Lucene. This can be used in conjunction with Lucene search, both for generating geohashes (via spatial4j) for complex geographic shapes, and for restricting the results of such queries to those that fall within the exact shape, in highly performant ways. The package uses 3d planar geometry to do its magic, which basically limits the computation necessary to determine membership (once a shape has been initialized, of course) to only multiplications and additions, which makes it feasible to construct a performant BoostSource-based filter for geographic shapes. The math is somewhat more involved when generating geohashes, but is still more than fast enough to do a good job.
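The "only multiplications and additions" claim is easy to see with a sketch: represent each shape boundary as a plane a*x + b*y + c*z + d = 0 and test which side a unit-sphere point falls on. This is illustrative code under that assumption, not geo3d's actual classes:
{code}
// Sketch only: a convex region on the unit sphere bounded by planes.
final class PlaneSketch {
  final double a, b, c, d; // plane equation: a*x + b*y + c*z + d = 0

  PlaneSketch(double a, double b, double c, double d) {
    this.a = a; this.b = b; this.c = c; this.d = d;
  }

  // One dot product and one add per plane: multiplications and additions only.
  boolean isWithin(double x, double y, double z) {
    return a * x + b * y + c * z + d >= 0.0;
  }

  // A point is inside a convex region iff it is within every bounding plane.
  static boolean isWithin(PlaneSketch[] planes, double x, double y, double z) {
    for (PlaneSketch p : planes) {
      if (!p.isWithin(x, y, z)) {
        return false;
      }
    }
    return true;
  }
}
{code}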
[jira] [Updated] (SOLR-7484) Refactor SolrDispatchFilter to move all Solr specific implementation to another class
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7484:
-------------------------------
Summary: Refactor SolrDispatchFilter to move all Solr specific implementation to another class (was: Refactor SolrDispatchFilter)
[jira] [Updated] (SOLR-7484) Refactor SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7484:
-------------------------------
Summary: Refactor SolrDispatchFilter (was: Refactor SolrDispatchFilter.doFilter(...) method)
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 8:00 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoHashPrefixTree*
Index Time: 695.079 sec
Index Size: 4.2G
Mean Query Time: 0.071 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

Hardware: 8 core System76 Ubuntu 14.10 laptop w/ 16GB memory

was (Author: nknize): [same text, without the Hardware line]
[jira] [Commented] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527212#comment-14527212 ]

Mark Miller commented on SOLR-7121:
-----------------------------------
bq. Without the regular expression, one would need separate configuration files for separate collections which is somewhat of a pain to manage.

Couldn't you make the same argument for all of the config in solrconfig.xml? It seems that all SolrCores in the same collection will want the same config, and you usually would want to use different config for other collections if you want any of it to vary.

Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
-----------------------------------------------------------------------------------------------
Key: SOLR-7121
URL: https://issues.apache.org/jira/browse/SOLR-7121
Project: Solr
Issue Type: New Feature
Reporter: Sachin Goyal
Assignee: Mark Miller
Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch

Currently, there is no way to control when a Solr node goes down. If the server is having high GC pauses, or too many threads, or is just getting too many queries due to some bad load-balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything comes to a stall. Such a slow-dying core can affect other cores as well by taking a long time to serve their distributed queries. There should be a way to specify threshold values beyond which the targeted core can detect its ill health and proactively go down to recover. When the load improves, the core should come back up automatically.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527338#comment-14527338 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------
Here's the OSM subset I'm using for the benchmarks: http://people.apache.org/~mikemccand/latlon.subsetPlusAllLondon.txt.lzma

It's a random 1/50th of the latest OSM export (as of last week), but includes all points within London, UK. The search benchmark then runs a fixed set (225 total) of axis-aligned rectangle intersects queries around London. Look for Index/SearchOSM/GeoPoint.java/py in luceneutil...

I ran the same benchmarks (except for Packed/QuadPrefixTree):

*Geopoint*
Index time: 157.3 sec (incl. forceMerge)
Index size: 1.8 GB
Mean query time: .077 sec
221,119,062 total hits

*GeoHashPrefixTree*
Index time: 628.5 sec (incl. forceMerge)
Index size: 4.2 GB
Mean query time: .039 sec
221,120,027 total hits

*libspatialindex* (using Python Rtree wrapper)
Index time: 469.6 sec
Index size: 2.6 GB
Mean query time: .158 sec
221,118,844 total hits

The first geopoint patch here got exactly the same total hit count as libspatialindex, but now it's different, I think because of the precision control governing how deep the ranges recurse. I think it's also expected that geohash won't get the same hit count, since it does a bit of quantizing (level 11 ... not sure what that equates to in meters). I'm surprised the Rtree impl is so slow ...
[jira] [Updated] (SOLR-7484) Refactor SolrDispatchFilter.doFilter(...) method
[ https://issues.apache.org/jira/browse/SOLR-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-7484:
-------------------------------
Fix Version/s: 5.2
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527148#comment-14527148 ]

David Smiley commented on LUCENE-6450:
--------------------------------------
Can you please direct me to the luceneutil geo benchmark? I'm curious what that's about. The numbers look nice. Small indexes and fast index time :-) It'd be interesting to try GeoHashPrefixTree, which will have smaller indexes than Quad. I'll check out your code shortly.
[jira] [Comment Edited] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527103#comment-14527103 ]

Nicholas Knize edited comment on LUCENE-6450 at 5/4/15 8:01 PM:
----------------------------------------------------------------
Was out last week but had some time this weekend to add TermsEnum logic to visit only those ranges along the SFC that represent the bounding box. Updated patch attached - this code currently exists in sandbox. Benchmarks (using luceneutil; thanks to [~mikemccand] for adding geo benchmarking) are below:

Data Set: 60M points of Planet OSM GPS data (http://wiki.openstreetmap.org/wiki/File:World-gps-points-120604-2048.png)

*QuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 2449.08 sec
Index Size: 13G
Mean Query Time: 0.066 sec

*PackedQuadPrefixTree*
Parameters: distErrPct: 0, pruneLeafyBranches: true, pointsOnly: true, level: 29
Index Time: 1945.288 sec
Index Size: 11G
Mean Query Time: 0.058 sec

*GeoHashPrefixTree*
Parameters: level: 11
Index Time: 695.079 sec
Index Size: 4.2G
Mean Query Time: 0.071 sec

*GeoPointField*
Index Time: 180.872 sec
Index Size: 1.8G
Mean Query Time: 0.107 sec

Hardware: 8 core System76 Ubuntu 14.10 laptop w/ 16GB memory

was (Author: nknize): [same text, without the "Parameters: level: 11" line under *GeoHashPrefixTree*]
[jira] [Commented] (LUCENE-6196) Include geo3d package, along with Lucene integration to make it useful
[ https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527185#comment-14527185 ]

ASF subversion and git services commented on LUCENE-6196:
----------------------------------------------------------
Commit 1677669 from [~dsmiley] in branch 'dev/branches/lucene6196' [ https://svn.apache.org/r1677669 ]
LUCENE-6196: Mark @lucene.experimental or @lucene.internal