Solr 7.4.0 - bug in JMX cache stats?

2018-09-06 Thread Bojan Šmid
Hi,

  it seems the format of the cache MBeans changed with 7.4.0.  From what I
see, a similar change wasn't made for other MBeans, which may mean it was
accidental, and may be a bug.

  In Solr 7.3.* the format was (each attribute on its own, with a numeric type):

mbean:
solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache

attributes:
  lookups java.lang.Long = 0
  hits java.lang.Long = 0
  cumulative_evictions java.lang.Long = 0
  size java.lang.Long = 0
  hitratio java.lang.Float = 0.0
  evictions java.lang.Long = 0
  cumulative_lookups java.lang.Long = 0
  cumulative_hitratio java.lang.Float = 0.0
  warmupTime java.lang.Long = 0
  inserts java.lang.Long = 0
  cumulative_inserts java.lang.Long = 0
  cumulative_hits java.lang.Long = 0


  With 7.4.0 there is a single attribute "Value" (java.lang.Object):

mbean:
solr:dom1=core,dom2=gettingstarted,dom3=shard1,dom4=replica_n1,category=CACHE,scope=searcher,name=filterCache

attributes:
  Value java.lang.Object = {lookups=0, evictions=0,
cumulative_inserts=0, cumulative_hits=0, hits=0, cumulative_evictions=0,
size=0, hitratio=0.0, cumulative_lookups=0, cumulative_hitratio=0.0,
warmupTime=0, inserts=0}
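For anyone consuming these stats programmatically, the difference matters: a 7.4-style client reads one Object-typed attribute and unpacks a map, instead of reading each metric as its own numeric attribute. Here is a minimal self-contained sketch of that consumption pattern, using a stand-in MBean registered on the local platform MBeanServer (not an actual Solr instance; the bean and its object name are invented for illustration):

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class CacheStatsDemo {

    // Management interface for the stand-in bean (a functional interface,
    // so the implementation below can be a lambda)
    public interface CacheStatsMBean {
        Object getValue();
    }

    // Register a bean mimicking the 7.4 shape (one Object-typed "Value"
    // attribute holding the whole stats map) and read it back as a client would.
    public static Map<String, Object> readValue() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
                "example:category=CACHE,scope=searcher,name=filterCache");
        if (!server.isRegistered(name)) {
            CacheStatsMBean bean = () -> {
                Map<String, Object> stats = new LinkedHashMap<>();
                stats.put("lookups", 0L);
                stats.put("hits", 0L);
                stats.put("hitratio", 0.0f);
                return stats;
            };
            // StandardMBean supplies the interface explicitly, so the lambda's
            // class name doesn't need to follow the <Impl>MBean convention
            server.registerMBean(new StandardMBean(bean, CacheStatsMBean.class), name);
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> value =
                (Map<String, Object>) server.getAttribute(name, "Value");
        return value;
    }

    public static void main(String[] args) throws Exception {
        // A 7.4-style consumer must unpack each metric from the map itself
        System.out.println("hits=" + readValue().get("hits"));
    }
}
```

Any monitoring tool that enumerated typed attributes against 7.3.x would need this kind of change to keep working against the single-attribute shape.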


  So the question is - was this an intentional change, or a bug?

  Thanks,

Bojan


Re: Geospatial clustering + zoom in/out help

2014-02-03 Thread Bojan Šmid
Hi David,

  I was hoping to get an answer on the Geospatial topic from you :). These
links basically confirm that the approach I wanted to take should work OK
with a similar (or even bigger) amount of data than I plan to have. Instead
of my custom NxM division of the world, I'll try the existing GeoHash
encoding; it may be good enough (and will be quicker to implement).
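The property that makes GeoHash attractive here is that truncating a hash yields the enclosing coarser cell, so prefix length maps naturally onto zoom level. A toy from-scratch encoder showing that prefix behaviour (this is not Solr's implementation, and the coordinates are just an arbitrary test point):

```java
public class GeoHashDemo {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode lat/lon into a geohash of the given length.
    // Even-numbered bits refine longitude, odd-numbered bits latitude.
    public static String encode(double lat, double lon, int precision) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true;
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; }
                else            { ch = ch << 1;       lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; }
                else            { ch = ch << 1;       latHi = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        String full = encode(57.64911, 10.40744, 11);
        // A shorter prefix is the coarser cell containing the same point
        System.out.println(full + " -> " + full.substring(0, 5));
    }
}
```

Indexing, say, the first 11 prefix lengths per document then plays the same role as the 11 zoomX fields in the original scheme.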

  Thanks!

  Bojan


On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W.  wrote:

> Hi Bojan.
>
> You've got some good ideas here along the lines of some that others have
> tried.  I threw together a page on the wiki about this subject some
> time ago that I'm sure you will find interesting.  It references a relevant
> stack-overflow post, and also a presentation at DrupalCon which had a
> segment from a guy using the same approach you suggest here involving
> field-collapsing and/or stats components.  The video shows it in action.
>
> http://wiki.apache.org/solr/SpatialClustering
>
> It would be helpful for everyone if you share your experience with
> whatever you choose, once you give an approach a try.
>
> ~ David
> 
> From: Bojan Šmid [bos...@gmail.com]
> Sent: Thursday, January 30, 2014 1:15 PM
> To: solr-user@lucene.apache.org
> Subject: Geospatial clustering + zoom in/out help
>
> Hi,
>
> I have an index with 300K docs with lat,lon. I need to cluster the docs
> based on lat,lon for display in the UI. The user then needs to be able to
> click on any cluster and zoom in (up to 11 levels deep).
>
> I'm using Solr 4.6 and I'm wondering how best to implement this
> efficiently?
>
> A bit more specific questions below.
>
> I need to:
>
> 1) cluster data points at different zoom levels
>
> 2) click on a specific cluster and zoom in
>
> 3) be able to select a region (bounding box or polygon) and show clusters
> in the selected area
>
> What's the best way to implement this so that queries are fast?
>
> What I thought I would try, but maybe there are better ways:
>
> * divide the world in NxM large squares and then each of these squares into
> 4 more squares, and so on - 11 levels deep
>
> * at index time figure out all squares (at all 11 levels) each data point
> belongs to and index that info into 11 different fields: e.g.
> zoom3=square1_62_47_33
>
> * at search time, use field collapsing on zoomX field to get which docs
> belong to which square on particular level
>
> * calculate center point of each square (by calculating mean value of
> positions for all points in that square) using StatsComponent (facet on
> zoomX field, avg on lat and lon fields) - I would consider those squares as
> separate clusters (one square is one cluster) and center points of those
> squares as center points of clusters derived from them
>
> I *think* the problem with this approach is that:
>
> * there will be many unique fields for bigger zoom levels, which means
> field collapsing / StatsComponent maaay not work fast enough
>
> * clusters will not look very natural because I would have many clusters on
> each zoom level and what are "real" geographical clusters would be
> displayed as multiple clusters since their points would in some cases be
> dispersed into multiple squares. But that may be OK
>
> * a lot will depend on how the squares are calculated - linearly dividing
> 360 degrees by N to get "equal" size squares in degrees would produce
> issues with "real" square sizes and counts of points in each of them
>
>
> So I'm wondering if there is a better way?
>
> Thanks,
>
>
>   Bojan
>


Geospatial clustering + zoom in/out help

2014-01-30 Thread Bojan Šmid
Hi,

I have an index with 300K docs with lat,lon. I need to cluster the docs
based on lat,lon for display in the UI. The user then needs to be able to
click on any cluster and zoom in (up to 11 levels deep).

I'm using Solr 4.6, and I'm wondering how best to implement this efficiently.

Some more specific questions below.

I need to:

1) cluster data points at different zoom levels

2) click on a specific cluster and zoom in

3) be able to select a region (bounding box or polygon) and show clusters
in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:

* divide the world in NxM large squares and then each of these squares into
4 more squares, and so on - 11 levels deep

* at index time figure out all squares (at all 11 levels) each data point
belongs to and index that info into 11 different fields: e.g.
zoom3=square1_62_47_33

* at search time, use field collapsing on zoomX field to get which docs
belong to which square on particular level

* calculate center point of each square (by calculating mean value of
positions for all points in that square) using StatsComponent (facet on
zoomX field, avg on lat and lon fields) - I would consider those squares as
separate clusters (one square is one cluster) and center points of those
squares as center points of clusters derived from them

I *think* the problem with this approach is that:

* there will be many unique fields for bigger zoom levels, which means
field collapsing / StatsComponent maaay not work fast enough

* clusters will not look very natural because I would have many clusters on
each zoom level and what are "real" geographical clusters would be
displayed as multiple clusters since their points would in some cases be
dispersed into multiple squares. But that may be OK

* a lot will depend on how the squares are calculated - linearly dividing
360 degrees by N to get "equal" size squares in degrees would produce
issues with "real" square sizes and counts of points in each of them
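The index-time square assignment described above could be sketched roughly like this. It is a toy illustration only: it uses the naive linear division of the lat/lon ranges (with exactly the caveats noted above), and the zoomX field names, the square-id format, and the base N=8, M=4 grid are all assumptions, not anything Solr defines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GridSquares {

    // Square id at one zoom level: level 1 splits the world into n x m cells,
    // and each deeper level quarters the cells (doubles both axes).
    public static String squareId(double lat, double lon, int level, int n, int m) {
        int cols = n << (level - 1); // longitude cells at this level
        int rows = m << (level - 1); // latitude cells at this level
        int col = (int) Math.min(cols - 1, Math.floor((lon + 180.0) / 360.0 * cols));
        int row = (int) Math.min(rows - 1, Math.floor((lat + 90.0) / 180.0 * rows));
        return "square" + level + "_" + col + "_" + row;
    }

    // All 11 zoomX fields for one document, as they would be indexed
    public static Map<String, String> zoomFields(double lat, double lon) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int level = 1; level <= 11; level++) {
            fields.put("zoom" + level, squareId(lat, lon, level, 8, 4));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(zoomFields(45.8, 16.0));
    }
}
```

At search time, field collapsing (or faceting) on the zoomX field for the current level then groups documents by cell, as described above.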


So I'm wondering if there is a better way?

Thanks,


  Bojan


SolrCloud - "KeeperErrorCode = NoNode" - after restart

2013-12-20 Thread Bojan Šmid
Hi,

  I have a cluster with 5 Solr nodes (4.6 release) and 5 ZKs, with around
2000 collections (each with a single shard, each shard having 1 or 2
replicas), running on Tomcat. Each Solr node hosts around 1000 physical
cores.

  When starting any node, I almost always see errors like:

2013-12-19 18:45:42,454 [coreLoadExecutor-4-thread-721] ERROR
org.apache.solr.cloud.ZkController- Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:945)
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:909)
at
org.apache.solr.cloud.ZkController.getLeader(ZkController.java:873)
at
org.apache.solr.cloud.ZkController.register(ZkController.java:807)
at
org.apache.solr.cloud.ZkController.register(ZkController.java:757)
at
org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:272)
at
org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:489)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /collections/core6_20131120/leaders/shard1
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:264)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:261)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)

  It happens for just some of the cores, usually about 10-20 out of the
1000 on a node (different cores fail each time). These 10-20 cores are
then marked as "down" and never recover, while the other cores
work OK.

  I did check ZK, and there really is no node
"/collections/core_20131120/leaders/shard1", but
"/collections/core_20131120/leaders" exists, so it looks like "shard1" was
removed (maybe during a previous shutdown?).

  Also, when I stop all nodes and clear the ZK state, and then start Solr
(starting the nodes one by one), all nodes start properly and all cores load
properly ("active"). But after that, the first restart of any Solr node
causes issues on that node.

  Any ideas about the possible cause? And shouldn't Solr maybe try to recover
from such a situation?

  Thanks,

  Bojan


Aggregating data with Solr, getting group stats

2013-07-15 Thread Bojan Šmid
Hi,

  I see there are a few ways in Solr that can "almost" be used for my use
case, but all of them appear to fall short eventually.

  Here is what I am trying to do: consider the following document structure
(there are many more fields in play, but this is enough for example):

Manufacturer
ProductType
Color
Size
Price
CountAvailableItems

  Based on user parameters (search string, some filters), I would fetch a
set of documents. What I need is to group resulting documents by different
attribute combinations (say "Manufacturer + Color" or "ProductType + Color
+ Size" or ...) and get stats (Max Price, Avg Price, Num of available
items) for those groups.

  Possible solutions in Solr:

1) StatsComponent - provides all the stats I would need, but its grouping
functionality is basic - it can group on a single field (stats.field +
stats.facet) while I need field combinations. There is an issue,
https://issues.apache.org/jira/browse/SOLR-2472, which tried to deal with
that, but it looks like it stalled a while ago.

2) Pivot Faceting - seems like it would provide all the grouping logic I
need, and in combination with https://issues.apache.org/jira/browse/SOLR-3583
("Percentiles for facets, pivot facets, and distributed pivot facets") it
would bring percentiles and averages. However, I would still miss things like
Max/Min/Sum, and the issue is not committed yet anyway. I would also depend
on another yet-to-be-committed issue,
https://issues.apache.org/jira/browse/SOLR-2894, for distributed support.

3) Configurable Collectors - https://issues.apache.org/jira/browse/SOLR-4465
- seems promising, but it allows grouping by just one field and, probably a
bigger problem, it seems it was just a POC and will need overhauling before
it is anywhere near ready for commit.
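One workaround worth noting: indexing a combined key (e.g. Manufacturer + "|" + Color) as its own field reduces the multi-field grouping to the single-field case that stats.field + stats.facet already handles. The sketch below shows the same computation done client-side, just to make the idea concrete (the document shape and field names are hypothetical; Java streams stand in for the server-side stats):

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupStats {

    // Hypothetical document shape; real docs would come from a Solr response
    static final class Product {
        final String manufacturer;
        final String color;
        final double price;
        Product(String manufacturer, String color, double price) {
            this.manufacturer = manufacturer;
            this.color = color;
            this.price = price;
        }
    }

    // Group on a combined "Manufacturer|Color" key and summarize Price.
    // Indexing the same concatenated key as a field would let the
    // single-field stats.facet do this grouping server-side.
    public static Map<String, DoubleSummaryStatistics> priceStats(List<Product> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                p -> p.manufacturer + "|" + p.color,
                Collectors.summarizingDouble(p -> p.price)));
    }

    public static void main(String[] args) {
        List<Product> docs = List.of(
                new Product("Acme", "red", 10.0),
                new Product("Acme", "red", 20.0),
                new Product("Acme", "blue", 5.0));
        DoubleSummaryStatistics red = priceStats(docs).get("Acme|red");
        System.out.println("max=" + red.getMax() + " avg=" + red.getAverage());
    }
}
```

The cost is one extra indexed field per attribute combination of interest, which only works if the set of combinations is known at index time.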


  Are there any other options I missed?

  Thanks,

  Bojan


DataImportHandler - "too many connections" MySQL error after upgrade to Solr 1.4 release

2010-02-10 Thread Bojan Šmid
Hi all,

  I had DataImportHandler working perfectly on a Solr 1.4 nightly build from
June 2009. I upgraded Solr to the 1.4 release and started getting errors:


Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
Server connection failure during transaction. Due to underlying exception:
'com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many
connections'.


  This is the same machine and the same setup (except the new Solr) that never
had problems. The error doesn't pop up at the beginning; DIH runs for a few
hours and then breaks (after a few million rows are processed).

  Solr is the only process using MySQL, and max_connections on MySQL is set to
100, so it seems like there might be a connection leak in DIH. A few more
details on the setup:
  MySQL version 5.0.67
  driver: mysql-connector-java-5.0.8-bin.jar
  Java: 1.6.0_14
  connection URL parameters : autoReconnect=true, batchSize=-1
  OS : CentOS 5.2

  Did anyone else have similar problems with the 1.4 release?


  Regards


Re: conditional sorting

2009-10-02 Thread Bojan Šmid
I tried to simplify the problem, but the point is that I could have really
complex requirements. For instance: "if none of the first 5 results are
older than one year, sort by X, otherwise sort by Y".

So the question is: is there a way to make Solr recognize complex
situations and apply a different sorting criterion?

Bojan


On Fri, Oct 2, 2009 at 4:22 PM, Uri Boness  wrote:

> If the threshold is only 10, why can't you always sort by popularity and if
> the result set is <10 then resort on the client side based on date_entered?
>
> Uri
>
>
> Bojan Šmid wrote:
>
>> Hi all,
>>
>> I need to perform sorting of my query hits by different criterion
>> depending
>> on the number of hits. For instance, if there are < 10 hits, sort by
>> date_entered, otherwise, sort by popularity.
>>
>> Does anyone know if there is a way to do that with a single query, or I'll
>> have to send another query with desired sort criterion after I inspect
>> number of hits on my client?
>>
>> Thx
>>
>>
>>
>


conditional sorting

2009-10-02 Thread Bojan Šmid
Hi all,

I need to sort my query hits by a different criterion depending
on the number of hits. For instance, if there are < 10 hits, sort by
date_entered; otherwise, sort by popularity.

Does anyone know if there is a way to do that with a single query, or will I
have to send another query with the desired sort criterion after I inspect
the number of hits on my client?

Thx
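Uri's suggestion from the reply above (always sort by popularity, and re-sort the small result set client-side) can be sketched like this. The Hit type and its fields are hypothetical; a real client would map them from a Solr response:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ConditionalSort {

    // Hypothetical search hit; stands in for a document from a Solr response
    static final class Hit {
        final long dateEntered;
        final int popularity;
        Hit(long dateEntered, int popularity) {
            this.dateEntered = dateEntered;
            this.popularity = popularity;
        }
    }

    // The single query is issued sorted by popularity; if fewer than
    // `threshold` hits came back, re-sort the small list client-side
    // by date_entered (newest first).
    public static List<Hit> conditionalSort(List<Hit> byPopularity, int threshold) {
        if (byPopularity.size() >= threshold) {
            return byPopularity;
        }
        List<Hit> resorted = new ArrayList<>(byPopularity);
        resorted.sort(Comparator.comparingLong((Hit h) -> h.dateEntered).reversed());
        return resorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(100, 5), new Hit(300, 9), new Hit(200, 7));
        // 3 hits < threshold 10, so the list is re-sorted by date
        System.out.println(conditionalSort(hits, 10).get(0).dateEntered);
    }
}
```

This only avoids a second query when the threshold is small enough that the re-sortable set fits in one page of results.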


Re: SolrCoreAware analyzer

2009-02-27 Thread Bojan Šmid
Thanks for your suggestions.

I do need SolrCore, but I could probably live with just SolrResourceLoader,
while also creating my own FieldType (which can be ResourceLoaderAware).

Bojan


On Thu, Feb 26, 2009 at 11:48 PM, Chris Hostetter
wrote:

>
> : I am writing a custom analyzer for my field type. This analyzer would
> need
> : to use SolrResourceLoader and SolrConfig, so I want to make it
> : SolrCoreAware.
>
> 1) Solr's support for using Analyzer instances is mainly just to make it
> easy for people who already have existing Analyzer impls that they want to
> use -- if you're writing something new, i would suggest implementing the
> TokenizerFactory API.
>
> 2) Do you really need access to the SolrCore, or do you just need access
> to the SolrResourceLoader?  Because there is also the ResourceLoaderAware
> API.  If you take a look at StopFilterFactory you can see an example of
> how it's used.
>
> FWIW: The reasons Solr doesn't support SolrCoreAware Analysis related
> plugins (TokenizerFactory and TokenFilterFactory) are:
>
> a. it kept the initialization a lot simpler.  currently SolrCore knows
> about the IndexSchema, but the IndexSchema doesn't know anything about the
> SolrCore.
> b. it allows for more reuse of the schema related code independent of the
> rest of Solr (there was talk at one point of promoting all of the
> IndexSchema/FieldType/Token*Factory code into a Lucene-Java contrib but
> so far no one has stepped up to work out the refactoring)
>
>
> -Hoss
>
>


SolrCoreAware analyzer

2009-02-26 Thread Bojan Šmid
Hello,

I am writing a custom analyzer for my field type. This analyzer would need
to use SolrResourceLoader and SolrConfig, so I want to make it
SolrCoreAware.

However, it seems that Analyzer classes aren't supposed to be used in this
way (as described in http://wiki.apache.org/solr/SolrPlugins). Is there any
way to provide my analyzer with SolrCore?

The list of valid SolrCoreAware classes is in SolrResourceLoader (line 465
on current Solr trunk), so I could create a patch (which would enable
Analyzers to get the SolrCore) for my Solr instance, but I would rather avoid
maintaining patches just for myself (it just complicates maintenance).

Thanks in advance,

Bojan