DIH
Please reconsider the removal of the DIH from future versions. The repo it's been moved to is a ghost town with zero engagement from Rohit (or anyone). Not sure how 'moving' it caused it to now only support MariaDB, but that appears to be the case. The current implementation is fast, easy to work with, and just works. Please, please, and thank you!
Streaming expressions: what is the effect of the collection name in the request URL?
Do collection names in the request URL affect how the query works in any way? A streaming expression is sent to http://mySolrHost/solr/col1,col2/stream (notice the multiple collections in the URL).

Col1 has 2 shards, each with 3 replicas:

* Shard1 has replicas on nodes A, B, C
* Shard2 has replicas on nodes D, E, F

Col2 also has 2 shards with 3 replicas each; its shards have the same configuration as Col1's.

Let's say we have a simple search expression:

```
search(
  "colA,colB",
  q="*:*",
  qt="/export",
  fl="fl1,fl2",
  sort="id asc"
)
```

The collection names in the search expression denote which collections should be searched, so we can't change them. But what would change if we sent the query to http://mySolrHost/solr/someOtherCollection/stream, where someOtherCollection has 1 shard and 6 replicas on nodes A, B, C, D, E, F? I have read a bit about worker collections, but as long as I don't explicitly use parallel streams, what is the difference?
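For reference, an expression like the one above can be sent to the `/stream` endpoint with curl. This is a hedged sketch reusing the host and collection names from the question; to my understanding, the collections named inside `search()` determine what data is queried, while the collection in the URL path mainly determines where the expression is compiled and run.

```shell
# Host and collection names are the ones from the question; adjust to your cluster.
SOLR="http://mySolrHost/solr"

# The collections to search are named inside the expression itself.
EXPR='search("col1,col2", q="*:*", qt="/export", fl="fl1,fl2", sort="id asc")'

# POST the expression so it does not need to be URL-escaped by hand.
curl --data-urlencode "expr=$EXPR" "$SOLR/col1,col2/stream" \
  || echo "no Solr reachable at $SOLR (sketch only)"
```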
Read/write on different nodes?
Hi, I have one data collection with 3 shards and 2 replicas that users search on. I also log all user queries and save them to another collection on the same SolrCloud cluster, but user queries become very slow when there are a lot of logs to be written to the log collection. Is there any solution for me? Please advise. Or does Solr support separating write operations onto some nodes and reads onto other nodes?
Solr Cloud freezes during scheduled backup
Hello everyone,

I have a nasty problem with the scheduled backup of Solr collections. From time to time, when a scheduled backup is triggered (the backup operation takes around 10 minutes), Solr freezes for 20-30 seconds. The freeze happens on one Solr instance at a time, but it affects the latency of all queries (because of distributed queries across 6 shards). I can reproduce the problem only when updates in the Solr cluster are enabled; when I disable updates, the problem is gone. The Lucene index is not big and fits into the OS cache.

I am wondering if taking a backup can be the culprit, and whether the process messes up operating system caches. Maybe all the files which are copied to NFS are eating up the OS cache, and when the OS reaches high memory usage it starts cleaning up memory, making Solr freeze. During the freeze, monitoring charts show higher IO wait times. In addition, the Solr nodes which seem to be affected reach 95-100% total memory usage (used + buffers + caches). I cannot see anything valuable in the GC logs apart from a message which suggests that the application was stopped for 20-30 seconds (Application time).

The cluster consists of 12 machines. Each Solr node runs on Ubuntu 16.04, inside Docker, on AWS EC2. The EC2 instances have local SSD disks (but the same problem appeared with EBS).

Does anyone have a similar problem and can share some thoughts? I'll appreciate all help.

-- Pawel Rog
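For context, the Collections API backup call under discussion looks roughly like this. A hedged sketch with placeholder host, backup, collection, and NFS path names; `async` makes the HTTP call return immediately while the backup runs in the background.

```shell
# Hedged sketch: host, backup name, collection name, and NFS path are placeholders.
SOLR="http://localhost:8983/solr"

# Kick off the backup asynchronously; the HTTP call returns at once.
curl "$SOLR/admin/collections?action=BACKUP&name=nightly&collection=mycollection&location=/mnt/nfs/solr-backups&async=backup-001" \
  || echo "no Solr reachable at $SOLR (sketch only)"

# Poll the async request for progress.
curl "$SOLR/admin/collections?action=REQUESTSTATUS&requestid=backup-001" \
  || echo "no Solr reachable at $SOLR (sketch only)"
```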
Incorrect distance returned for indexed polygon shape
I am using `geodist()` in a Solr query, like this:

```
select?fl=*,_dist_:geodist()&fq={!geofilt d=30444}&indent=on&pt=50.53,-9.5722616&q=*:*&sfield=geo&spatial=true&wt=json
```

However, it seems like the distance calculations aren't working. Here's an example query where the pt is several hundred kilometers away from the POLYGON; the problem is that the calculated geodist is always `20015.115`. This is my query response:

```
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "pt":"50.53,-9.5722616",
      "indent":"on",
      "fl":"*,_dist_:geodist()",
      "fq":"{!geofilt d=30444}",
      "sfield":"geo",
      "spatial":"true",
      "wt":"json"}},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"1",
        "document_type_id":"1",
        "geo":["POLYGON ((3.837490081787109 43.61234105514181, 3.843669891357422 43.57877424689641, 3.893280029296875 43.57205863840097, 3.9458084106445312 43.58872191986938, 3.921947479248047 43.62762639320158, 3.8663291931152344 43.63321761913266, 3.837490081787109 43.61234105514181))"],
        "_version_":1689241382273679360,
        "timestamp":"2021-01-18T16:08:40.484Z",
        "_dist_":20015.115},
      {
        "id":"4",
        "document_type_id":"4",
        "geo":["POLYGON ((-0.94482421875 45.10454630976873, -0.98876953125 44.6061127451739, 0.06591796875 44.134913443750726, 0.32958984375 45.1510532655634, -0.94482421875 45.10454630976873))"],
        "_version_":1689244486784253952,
        "timestamp":"2021-01-18T16:58:01.177Z",
        "_dist_":20015.115},
      {
        "id":"8",
        "document_type_id":"8",
        "geo":["POLYGON ((-2.373046875 48.29781249243716, -2.28515625 48.004625021133904, -1.5380859375 47.76886840424207, -0.32958984375 47.79839667295524, -0.5712890625 48.531157010976706, -2.373046875 48.29781249243716))"],
        "_version_":1689252312264998912,
        "timestamp":"2021-01-18T19:02:24.137Z",
        "_dist_":20015.115}]
  }}
```

This is my Solr field type definition:

```xml
```

This is how I index my polygon:

```json
{
  "id": 12,
  "document_type_id": 12,
  "geo": "POLYGON ((3.77105712890625 43.61171961774284, 3.80401611328125 43.57939602461448, 3.8610076904296875 43.59580863402625, 3.8603210449218746 43.61519958447072, 3.826675415039062 43.628123412124616, 3.7827301025390625 43.63110543935801, 3.77105712890625 43.61171961774284))"
}
```

By the way, I'm using Solr 6.6, and I found an existing issue about this: https://issues.apache.org/jira/browse/SOLR-12899 Is there an explanation? Any help would be appreciated!
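A note on the constant value: 20015.115 km is half the Earth's circumference, i.e. the maximum possible great-circle distance, which suggests `geodist()` is not actually reading the polygon field. To my understanding, `geodist()` does not compute distances for RPT (polygon-capable) fields in Solr 6.6; the spatial query parsers can return the distance as the document score instead. A hedged sketch reusing the question's field name and point (the host and collection name `mycollection` are placeholders):

```shell
# Hedged sketch: host and collection name are placeholders; sfield/pt/d
# are taken from the question. score=distance asks the geofilt parser to
# return each document's distance (in km) as its score.
SOLR="http://localhost:8983/solr"

curl -G "$SOLR/mycollection/select" \
  --data-urlencode 'q={!geofilt score=distance sfield=geo pt=50.53,-9.5722616 d=30444}' \
  --data-urlencode 'fl=id,score' \
  || echo "no Solr reachable at $SOLR (sketch only)"
```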