DIH

2021-01-20 Thread Joshua Wilder
Please reconsider the removal of the DIH (DataImportHandler) from future
versions. The repo it's been moved to is a ghost town with zero engagement
from Rohit (or anyone). Not sure how 'moving' it caused it to now only
support MariaDB, but that appears to be the case. The current implementation
is fast, easy to work with, and just works. Please, please and thank you!


Streaming expressions, what is the effect of collection name in the request url

2021-01-20 Thread ufuk yılmaz
Do the collection names in the request URL affect how the query works in any way?

A streaming expression is sent to http://mySolrHost/solr/col1,col2/stream
(note the multiple collections in the URL).

Col1 has 2 shards, each with 3 replicas:
* Shard1 has replicas on nodes A, B, C
* Shard2 has replicas on nodes D, E, F

Col2 also has 2 shards with 3 replicas each, laid out the same way as Col1.


Let's say we have a simple search expression:
search(
  "colA,colB",
  q="*:*",
  qt="/export",
  fl="fl1,fl2",
  sort="id asc"
)

The collection names in the search expression denote which collections should
be searched, so we can't change those. But what would change if we sent the
query to

http://mySolrHost/solr/someOtherCollection/stream

where someOtherCollection has 1 shard and 6 replicas on nodes A, B, C, D, E, F?

I read a bit about worker collections, but as long as I don't explicitly use
parallel streams, what difference does it make?
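For reference, the two request forms being compared can be sketched as curl calls (host and collection names are taken from the question above; sending the expression as a URL-encoded `expr` parameter is the standard way to call the `/stream` handler):

```shell
# Form 1: expression posted to the /stream handler of the collections being searched
curl --data-urlencode 'expr=search("colA,colB", q="*:*", qt="/export", fl="fl1,fl2", sort="id asc")' \
  'http://mySolrHost/solr/col1,col2/stream'

# Form 2: the same expression posted to an unrelated collection's /stream handler.
# The collection in the URL only determines which nodes parse and run the
# expression; the collections named inside search() determine where the
# data actually comes from.
curl --data-urlencode 'expr=search("colA,colB", q="*:*", qt="/export", fl="fl1,fl2", sort="id asc")' \
  'http://mySolrHost/solr/someOtherCollection/stream'
```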






read/write on different node?

2021-01-20 Thread Luke
Hi,

I have one data collection with 3 shards and 2 replicas that users search
against. I also log all user queries and save them to another collection on
the same Solr cloud, but user searches become very slow when there are a lot
of logs to be written to the log collection.

Any advice would be appreciated. For example, does Solr support directing
write operations to some nodes and reads to other nodes?
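One way to separate reads from writes, assuming Solr 7 or later (which introduced replica types), is to give the log collection TLOG replicas for indexing and PULL replicas for serving queries, then route searches to the PULL replicas. A minimal sketch; the host, collection name, and replica counts here are made-up examples:

```shell
# Create the log collection with one TLOG replica per shard (handles writes)
# and two PULL replicas per shard (serve reads, replicate the index passively)
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=querylog&numShards=1&tlogReplicas=1&pullReplicas=2'

# Prefer PULL replicas for searches so indexing load stays off the query path
curl 'http://localhost:8983/solr/querylog/select?q=*:*&shards.preference=replica.type:PULL'
```

Placing the PULL replicas on different nodes than the TLOG replicas (e.g. via the `createNodeSet` parameter at creation time) would complete the read/write separation.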


Solr Cloud freezes during scheduled backup

2021-01-20 Thread Paweł Róg
Hello everyone,
I have a nasty problem with scheduled Solr collection backups. From time to
time, when a scheduled backup is triggered (the backup operation takes around
10 minutes), Solr freezes for 20-30 seconds. The freeze happens on one Solr
instance at a time, but it affects the latency of all queries (because of
distributed queries across 6 shards). I can reproduce the problem only when
updates to the Solr cluster are enabled; when I disable updates, the problem
is gone.

The Lucene index is not big and fits into the OS cache. I am wondering if
taking a backup can be the culprit: the process may be messing up the
operating system caches. Maybe all the files being copied to NFS are eating
up the OS page cache, and when the OS reaches high memory usage it starts
reclaiming memory, causing Solr to freeze.

During the freeze, monitoring charts show higher IO wait times. In addition,
the Solr nodes that seem to be affected are reaching 95-100% total memory
usage (used + buffers + caches).

I cannot see anything valuable in the GC logs apart from a message suggesting
that the application was stopped for 20-30 seconds ("Application time").

The cluster consists of 12 machines. Each Solr node runs on Ubuntu 16.04, on
AWS EC2, inside Docker. The EC2 instances have local SSD disks (but the same
problem also appeared with EBS).

Has anyone seen a similar problem and can share some thoughts? I'd appreciate
any help.
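One thing worth checking, purely as a hypothesis consistent with the high IO wait and near-full memory described above: large sequential copies to NFS fill the page cache with dirty pages, and the default Linux writeback thresholds can then stall all writers at once. Lowering the dirty limits makes writeback start earlier and in smaller bursts. The values below are illustrative, not a recommendation:

```shell
# Start background writeback sooner, and cap dirty pages at a smaller share of RAM
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10

# Watch dirty/writeback page counts while a backup is running
grep -E 'Dirty|Writeback' /proc/meminfo
```

If the freezes correlate with a large spike in `Dirty` pages during the backup window, writeback stalls (rather than GC) would explain the 20-30 s "Application stopped" messages.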

--
Pawel Rog


Incorrect distance returned for indexed polygon shape

2021-01-20 Thread Famas
I am using `geodist()` in a Solr query:

`select?fl=*,_dist_:geodist()&fq={!geofilt
d=30444}&indent=on&pt=50.53,-9.5722616&q=*:*&sfield=geo&spatial=true&wt=json`

However, the distance calculations don't seem to work. Here is an example
where the pt is several hundred kilometers away from the POLYGON, yet the
calculated geodist is always `20015.115`.

This is my query response:

```
{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "q":"*:*",
  "pt":"50.53,-9.5722616",
  "indent":"on",
  "fl":"*,_dist_:geodist()",
  "fq":"{!geofilt d=30444}",
  "sfield":"geo",
  "spatial":"true",
  "wt":"json"}},
  "response":{"numFound":3,"start":0,"docs":[
  {
"id":"1",
"document_type_id":"1",
"geo":["POLYGON ((3.837490081787109 43.61234105514181,
3.843669891357422 43.57877424689641, 3.893280029296875 43.57205863840097,
3.9458084106445312 43.58872191986938, 3.921947479248047 43.62762639320158,
3.8663291931152344 43.63321761913266, 3.837490081787109
43.61234105514181))"],
"_version_":1689241382273679360,
"timestamp":"2021-01-18T16:08:40.484Z",
"_dist_":20015.115},
  {
"id":"4",
"document_type_id":"4",
"geo":["POLYGON ((-0.94482421875 45.10454630976873, -0.98876953125
44.6061127451739, 0.06591796875 44.134913443750726, 0.32958984375
45.1510532655634, -0.94482421875 45.10454630976873))"],
"_version_":1689244486784253952,
"timestamp":"2021-01-18T16:58:01.177Z",
"_dist_":20015.115},
  {
"id":"8",
"document_type_id":"8",
"geo":["POLYGON ((-2.373046875 48.29781249243716, -2.28515625
48.004625021133904, -1.5380859375 47.76886840424207, -0.32958984375
47.79839667295524, -0.5712890625 48.531157010976706, -2.373046875
48.29781249243716))"],
"_version_":1689252312264998912,
"timestamp":"2021-01-18T19:02:24.137Z",
"_dist_":20015.115}]
  }}
```
This is my Solr field type definition:
```xml
<!-- field type definition not shown: the XML tags were stripped from the archived message -->
```
This is how I index my polygon:

```json
{
  "id": 12,
  "document_type_id": 12,
  "geo": "POLYGON ((3.77105712890625 43.61171961774284, 3.80401611328125
43.57939602461448, 3.8610076904296875 43.59580863402625, 3.8603210449218746
43.61519958447072, 3.826675415039062 43.628123412124616, 3.7827301025390625
43.63110543935801, 3.77105712890625 43.61171961774284))"
}
```

By the way, I'm using Solr 6.6, and I found a related issue:

https://issues.apache.org/jira/browse/SOLR-12899

Is there an explanation for this? Any help would be appreciated!
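For what it's worth, 20015.115 km is half the Earth's circumference, i.e. the maximum possible great-circle distance, which suggests `geodist()` is returning a default rather than computing a real distance against the polygon field (the limitation SOLR-12899 discusses). A workaround worth trying, assuming `geo` is an RPT-based spatial field type, is to let the geofilt parser itself produce the distance as the document score via its `score` local parameter. A sketch with the parameter values copied from the question; the host and collection name are placeholders:

```shell
# score=kilometers makes the spatial filter return the distance to each
# document's shape as the score, instead of relying on geodist()
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q={!geofilt sfield=geo pt=50.53,-9.5722616 d=30444 score=kilometers}' \
  --data-urlencode 'fl=*,score' \
  --data-urlencode 'wt=json'
```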


