Solr 7.4.0 - bug in JMX cache stats?
Hi, it seems the format of cache mbeans changed with 7.4.0. From what I can see, a similar change wasn't made for other mbeans, which may mean it was accidental and may be a bug.

In Solr 7.3.* the format was (each attribute on its own, with a numeric type):

mbean: solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache
attributes:
  lookups java.lang.Long = 0
  hits java.lang.Long = 0
  cumulative_evictions java.lang.Long = 0
  size java.lang.Long = 0
  hitratio java.lang.Float = 0.0
  evictions java.lang.Long = 0
  cumulative_lookups java.lang.Long = 0
  cumulative_hitratio java.lang.Float = 0.0
  warmupTime java.lang.Long = 0
  inserts java.lang.Long = 0
  cumulative_inserts java.lang.Long = 0
  cumulative_hits java.lang.Long = 0

With 7.4.0 there is a single attribute "Value" (java.lang.Object):

mbean: solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache
attributes:
  Value java.lang.Object = {lookups=0, evictions=0, cumulative_inserts=0, cumulative_hits=0, hits=0, cumulative_evictions=0, size=0, hitratio=0.0, cumulative_lookups=0, cumulative_hitratio=0.0, warmupTime=0, inserts=0}

So the question is - was this an intentional change or a bug?

Thanks,

Bojan
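For anyone whose JMX monitoring broke on this upgrade, here is a minimal sketch of how a metrics reader might adapt. The stat names come from the mbean dump above; the helper class and method names are hypothetical, not Solr API. Under the 7.3 layout a client could cast each attribute directly (e.g. `(Long) conn.getAttribute(name, "hits")`); under 7.4 it has to unpack the Map behind the single "Value" attribute:

```java
import java.util.Map;

// Hypothetical helper: unpack individual stats from the 7.4.0-style
// "Value" attribute, which arrives as a java.lang.Object holding a Map.
class CacheStatsReader {

    // value: the object returned for the "Value" attribute of a cache mbean
    static long statAsLong(Object value, String key) {
        Map<?, ?> stats = (Map<?, ?>) value;
        return ((Number) stats.get(key)).longValue();
    }

    // Float-typed stats like hitratio come through as Numbers too
    static double statAsDouble(Object value, String key) {
        Map<?, ?> stats = (Map<?, ?>) value;
        return ((Number) stats.get(key)).doubleValue();
    }
}
```

Going through `Number` rather than casting to `Long`/`Float` directly keeps the reader tolerant of the exact boxed type the mbean uses.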
Re: Geospatial clustering + zoom in/out help
Hi David,

I was hoping to get an answer on the geospatial topic from you :). These links basically confirm that the approach I wanted to take should work ok with a similar (or even bigger) amount of data than I plan to have. Instead of my custom NxM division of the world, I'll try the existing GeoHash encoding - it may be good enough (and will be quicker to implement).

Thanks!

Bojan

On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W. wrote:
> Hi Bojan.
>
> You've got some good ideas here along the lines of some that others have
> tried. I threw together a page on the wiki about this subject some
> time ago that I'm sure you will find interesting. It references a relevant
> stack-overflow post, and also a presentation at DrupalCon which had a
> segment from a guy using the same approach you suggest here involving
> field-collapsing and/or stats components. The video shows it in action.
>
> http://wiki.apache.org/solr/SpatialClustering
>
> It would be helpful for everyone if you share your experience with
> whatever you choose, once you give an approach a try.
>
> ~ David
>
> From: Bojan Šmid [bos...@gmail.com]
> Sent: Thursday, January 30, 2014 1:15 PM
> To: solr-user@lucene.apache.org
> Subject: Geospatial clustering + zoom in/out help
>
> Hi,
>
> I have an index with 300K docs with lat,lon. I need to cluster the docs
> based on lat,lon for display in the UI. The user then needs to be able to
> click on any cluster and zoom in (up to 11 levels deep).
>
> I'm using Solr 4.6 and I'm wondering how best to implement this
> efficiently?
>
> A bit more specific questions below.
>
> I need to:
>
> 1) cluster data points at different zoom levels
>
> 2) click on a specific cluster and zoom in
>
> 3) be able to select a region (bounding box or polygon) and show clusters
> in the selected area
>
> What's the best way to implement this so that queries are fast?
>
> What I thought I would try, but maybe there are better ways:
>
> * divide the world in NxM large squares and then each of these squares into
> 4 more squares, and so on - 11 levels deep
>
> * at index time figure out all squares (at all 11 levels) each data point
> belongs to and index that info into 11 different fields: e.g.
> zoom3=square1_62_47_33
>
> * at search time, use field collapsing on zoomX field to get which docs
> belong to which square on particular level
>
> * calculate center point of each square (by calculating mean value of
> positions for all points in that square) using StatsComponent (facet on
> zoomX field, avg on lat and lon fields) - I would consider those squares as
> separate clusters (one square is one cluster) and center points of those
> squares as center points of clusters derived from them
>
> I *think* the problem with this approach is that:
>
> * there will be many unique fields for bigger zoom levels, which means
> field collapsing / StatsComponent maaay not work fast enough
>
> * clusters will not look very natural because I would have many clusters on
> each zoom level and what are "real" geographical clusters would be
> displayed as multiple clusters since their points would in some cases be
> dispersed into multiple squares. But that may be OK
>
> * a lot will depend on how the squares are calculated - linearly dividing
> 360 degrees by N to get "equal" size squares in degrees would produce
> issues with "real" square sizes and counts of points in each of them
>
> So I'm wondering if there is a better way?
>
> Thanks,
>
> Bojan
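Since the follow-up at the top of this thread mentions switching to GeoHash encoding, here is a minimal sketch of the standard geohash algorithm, just to illustrate what the encoding does: bits alternate between longitude and latitude (longitude first) and are emitted five at a time as base32 characters. In practice a library, or Solr's own geohash support, would be the better choice:

```java
// Minimal sketch of standard geohash encoding (not production code).
class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true;  // even bit positions refine longitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) {    // 5 bits = one base32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

The useful property for clustering is that prefixes nest: every point whose geohash starts with "u4" lies inside the "u4" cell, so truncating the hash to N characters gives the cluster cell at zoom level N.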
Geospatial clustering + zoom in/out help
Hi,

I have an index with 300K docs with lat,lon. I need to cluster the docs based on lat,lon for display in the UI. The user then needs to be able to click on any cluster and zoom in (up to 11 levels deep).

I'm using Solr 4.6 and I'm wondering how best to implement this efficiently? Some more specific questions below. I need to:

1) cluster data points at different zoom levels
2) click on a specific cluster and zoom in
3) be able to select a region (bounding box or polygon) and show clusters in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:

* divide the world into NxM large squares and then each of these squares into 4 more squares, and so on - 11 levels deep
* at index time, figure out all squares (at all 11 levels) each data point belongs to and index that info into 11 different fields, e.g. zoom3=square1_62_47_33
* at search time, use field collapsing on the zoomX field to get which docs belong to which square at a particular level
* calculate the center point of each square (the mean position of all points in that square) using StatsComponent (facet on the zoomX field, avg on the lat and lon fields) - I would consider those squares as separate clusters (one square is one cluster) and their center points as the cluster centers

I *think* the problems with this approach are:

* there will be many unique values in the zoomX fields at bigger zoom levels, which means field collapsing / StatsComponent may not work fast enough
* clusters will not look very natural, because I would have many clusters at each zoom level, and "real" geographical clusters would be displayed as multiple clusters since their points would in some cases be dispersed across multiple squares. But that may be OK
* a lot will depend on how the squares are calculated - linearly dividing 360 degrees by N to get "equal" size squares in degrees would produce issues with "real" square sizes and counts of points in each of them

So I'm wondering if there is a better way?

Thanks,

Bojan
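The index-time step above (assigning each point to a square per zoom level) can be sketched as follows. The 8x4 base grid and the "zoomL_x_y" id format are illustrative assumptions; the post's own example uses ids like zoom3=square1_62_47_33. Note this uses exactly the linear degree division the last bullet warns about, so real cell sizes vary with latitude:

```java
// Sketch: map a (lat, lon) point to a grid-square id at a given zoom level.
class GridCells {
    static String cellId(double lat, double lon, int level) {
        // assumed 8x4 base grid; each level splits every square into 4
        // (doubling the cell count along each axis)
        int cols = 8 << level;
        int rows = 4 << level;
        // linear division of degrees; clamp so lon=180 / lat=90 stay in range
        int x = Math.min(cols - 1, (int) ((lon + 180.0) / 360.0 * cols));
        int y = Math.min(rows - 1, (int) ((lat + 90.0) / 180.0 * rows));
        return "zoom" + level + "_" + x + "_" + y;
    }
}
```

At index time a document would get one such value per level (11 fields); at query time the zoomX field for the current level is what field collapsing / StatsComponent would facet on.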
SolrCloud - "KeeperErrorCode = NoNode" - after restart
Hi,

I have a cluster with 5 Solr nodes (4.6 release) and 5 ZKs, with around 2000 collections (each with a single shard, each shard having 1 or 2 replicas), running on Tomcat. Each Solr node hosts around 1000 physical cores. When starting any node, I almost always see errors like:

2013-12-19 18:45:42,454 [coreLoadExecutor-4-thread-721] ERROR org.apache.solr.cloud.ZkController - Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:945)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:909)
        at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:873)
        at org.apache.solr.cloud.ZkController.register(ZkController.java:807)
        at org.apache.solr.cloud.ZkController.register(ZkController.java:757)
        at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:272)
        at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:489)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/core6_20131120/leaders/shard1
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:264)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:261)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)

It happens only for some cores, usually about 10-20 of them out of the 1000 on a node (each time different cores fail). These 10-20 cores are then marked as "down" and are never recovered, while the other cores work ok.

I did check ZK - there really is no node "/collections/core6_20131120/leaders/shard1", but "/collections/core6_20131120/leaders" exists, so it looks like "shard1" was removed (maybe during a previous shutdown?).

Also, when I stop all nodes and clear the ZK state, and after that start Solr (rolling-starting the nodes one by one), all nodes start properly and all cores are properly loaded ("active"). But after that, the first restart of any Solr node causes issues on that node.

Any ideas about the possible cause? And shouldn't Solr maybe try to recover from such a situation?

Thanks,

Bojan
Aggregating data with Solr, getting group stats
Hi,

I see there are a few ways in Solr which can "almost" be used for my use case, but all of them appear to fall short eventually. Here is what I am trying to do: consider the following document structure (there are many more fields in play, but this is enough for the example):

Manufacturer
ProductType
Color
Size
Price
CountAvailableItems

Based on user parameters (search string, some filters), I would fetch a set of documents. What I need is to group the resulting documents by different attribute combinations (say "Manufacturer + Color" or "ProductType + Color + Size" or ...) and get stats (max price, avg price, number of available items) for those groups.

Possible solutions in Solr:

1) StatsComponent - provides all the stats I would need, but its grouping functionality is basic - it can group on a single field (stats.field + stats.facet) while I need field combinations. There is an issue, https://issues.apache.org/jira/browse/SOLR-2472, which tried to deal with that, but it looks like it got stuck in the past.

2) Pivot Faceting - seems like it would provide all the grouping logic I need, and in combination with https://issues.apache.org/jira/browse/SOLR-3583 "Percentiles for facets, pivot facets, and distributed pivot facets" would bring percentiles and averages. However, I would still miss things like max/min/sum, and the issue is not committed yet anyway. I would also depend on another yet-to-be-committed issue, https://issues.apache.org/jira/browse/SOLR-2894, for distributed support.

3) Configurable Collectors - https://issues.apache.org/jira/browse/SOLR-4465 - seems promising, but it allows grouping by just one field and, probably a bigger problem, it seems it was just a POC and will need overhauling before it is anywhere near ready for commit.

Are there any other options I missed?

Thanks,

Bojan
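One workaround sometimes used for the single-field limitation of stats.facet: at index time, add one extra field per combination you care about, whose value concatenates the member attributes, and then run plain single-field stats against it. A sketch (the field name and "|" separator are illustrative assumptions):

```java
// Sketch: build the value for a combined facet field, e.g. a
// hypothetical mfr_color field holding Manufacturer + Color.
class CombinedFacetField {
    static String combine(String... attributeValues) {
        return String.join("|", attributeValues);
    }
}
```

A request would then look like q=...&stats=true&stats.field=Price&stats.facet=mfr_color, giving max/min/sum/mean per "Manufacturer|Color" group - at the cost of one indexed field (and some index size) per combination.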
DataImportHandler - "too many connections" MySQL error after upgrade to Solr 1.4 release
Hi all,

I had DataImportHandler working perfectly on a Solr 1.4 nightly build from June 2009. I upgraded Solr to the 1.4 release and started getting errors:

Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Server connection failure during transaction. Due to underlying exception: 'com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many connections'.

This is the same machine and the same setup (except the new Solr) that never had problems. The error doesn't pop up at the beginning - DIH runs for a few hours and then breaks (after a few million rows are processed). Solr is the only process using MySQL, and max_connections on MySQL is set to 100, so it seems like there might be a connection leak in DIH.

A few more details on the setup:

MySQL version: 5.0.67
driver: mysql-connector-java-5.0.8-bin.jar
Java: 1.6.0_14
connection URL parameters: autoReconnect=true, batchSize=-1
OS: CentOS 5.2

Did anyone else have similar problems with the 1.4 release?

Regards
Re: conditional sorting
I tried to simplify the problem, but the point is that I could have really complex requirements. For instance: "if none of the first 5 results are older than one year, sort by X, otherwise sort by Y". So the question is: is there a way to make Solr recognize complex situations and apply a different sorting criterion?

Bojan

On Fri, Oct 2, 2009 at 4:22 PM, Uri Boness wrote:
> If the threshold is only 10, why can't you always sort by popularity and if
> the result set is <10 then resort on the client side based on date_entered?
>
> Uri
>
> Bojan Šmid wrote:
>> Hi all,
>>
>> I need to sort my query hits by different criteria depending
>> on the number of hits. For instance, if there are < 10 hits, sort by
>> date_entered, otherwise sort by popularity.
>>
>> Does anyone know if there is a way to do that with a single query, or will
>> I have to send another query with the desired sort criterion after I
>> inspect the number of hits on my client?
>>
>> Thx
conditional sorting
Hi all,

I need to sort my query hits by different criteria depending on the number of hits. For instance, if there are < 10 hits, sort by date_entered; otherwise, sort by popularity.

Does anyone know if there is a way to do that with a single query, or will I have to send another query with the desired sort criterion after I inspect the number of hits on my client?

Thx
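If Solr can't decide this in one query, the usual fallback is client-side: issue the query once, inspect numFound, and pick the sort for the second (real) query. A trivial sketch of that decision; the threshold comes from the example above, while the sort directions are illustrative assumptions:

```java
// Sketch: choose the Solr sort spec based on the hit count of a
// first-pass query. Field names are from the post; directions assumed.
class ConditionalSort {
    static String chooseSort(long numFound) {
        return numFound < 10 ? "date_entered desc" : "popularity desc";
    }
}
```

The first-pass query can use rows=0, since only numFound is needed, which keeps the extra round-trip cheap.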
Re: SolrCoreAware analyzer
Thanks for your suggestions. I do need SolrCore, but I could probably live with just SolrResourceLoader, while also creating my own FieldType (which can be ResourceLoaderAware).

Bojan

On Thu, Feb 26, 2009 at 11:48 PM, Chris Hostetter wrote:
>
> : I am writing a custom analyzer for my field type. This analyzer would need
> : to use SolrResourceLoader and SolrConfig, so I want to make it
> : SolrCoreAware.
>
> 1) Solr's support for using Analyzer instances is mainly just to make it
> easy for people who already have existing Analyzer impls that they want to
> use -- if you're writing something new, I would suggest implementing the
> TokenizerFactory API.
>
> 2) Do you really need access to the SolrCore, or do you just need access
> to the SolrResourceLoader? Because there is also the ResourceLoaderAware
> API. If you take a look at StopFilterFactory you can see an example of
> how it's used.
>
> FWIW: The reasons Solr doesn't support SolrCoreAware analysis-related
> plugins (TokenizerFactory and TokenFilterFactory) are:
>
> a. it kept the initialization a lot simpler. Currently SolrCore knows
> about the IndexSchema, but the IndexSchema doesn't know anything about the
> SolrCore.
> b. it allows for more reuse of the schema-related code independent of the
> rest of Solr (there was talk at one point of promoting all of the
> IndexSchema/FieldType/Token*Factory code into a Lucene-Java contrib, but
> so far no one has stepped up to work out the refactoring)
>
> -Hoss
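Following the TokenizerFactory route suggested above, the custom factory would then be wired into a field type in schema.xml. A sketch of what that registration might look like - the class name, attribute, and resource file are hypothetical:

```xml
<!-- Sketch: register a custom TokenizerFactory (not a raw Analyzer).
     The factory may implement ResourceLoaderAware to load my-rules.txt
     via the SolrResourceLoader, the way StopFilterFactory loads its
     stopword files. -->
<fieldType name="text_custom" class="solr.TextField">
  <analyzer>
    <tokenizer class="com.example.MyTokenizerFactory" rules="my-rules.txt"/>
  </analyzer>
</fieldType>
```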
SolrCoreAware analyzer
Hello,

I am writing a custom analyzer for my field type. This analyzer would need to use SolrResourceLoader and SolrConfig, so I want to make it SolrCoreAware. However, it seems that Analyzer classes aren't supposed to be used in this way (as described in http://wiki.apache.org/solr/SolrPlugins).

Is there any way to provide my analyzer with SolrCore? The list of valid SolrCoreAware classes is in SolrResourceLoader (line 465 on current Solr trunk), so I could create a patch (which would enable Analyzers to get SolrCore) for my Solr instance, but I would rather avoid maintaining patches just for myself (it complicates maintenance).

Thanks in advance,

Bojan