Re: Trouble Installing Solr 7.1.0 On Ubuntu 17
Hi, maybe you have the wrong path. Try the following:

$ sudo solr-7.1.0/bin/install_solr_service.sh

Thanks,
Yasufumi.

2017-10-24 12:11 GMT+09:00 Dane Terrell:
> Hi, I'm new to Apache Solr. I'm looking to install Solr 7.1.0 on my
> localhost computer. I downloaded and extracted the tar file in my tmp
> folder. But when I try to run the script I get:
> sudo: solr-7.1.0/solr/bin/install_solr_service.sh: command not found
> and with solr-7.1.0/solr/bin/install_solr_service.sh --strip-components=2
> I get the same error message. Can anyone help?
> Dane
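For reference, a typical install sequence looks like the following (this assumes the tarball was downloaded to /tmp; note the script lives under solr-7.1.0/bin, not solr-7.1.0/solr/bin):

```shell
cd /tmp

# Extract only the install script from the tarball; --strip-components=2
# drops the leading solr-7.1.0/bin/ path so the script lands in the
# current directory.
tar xzf solr-7.1.0.tgz solr-7.1.0/bin/install_solr_service.sh --strip-components=2

# Run the extracted script against the tarball to install Solr as a service.
sudo bash ./install_solr_service.sh solr-7.1.0.tgz
```

The "command not found" in the original post comes from the extra /solr/ path segment: the script is at solr-7.1.0/bin/install_solr_service.sh inside the archive.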
Re: Solr nodes going into recovery mode and eventually failing
Thanks Emir and Zisis. I added maxRamMB for the filterCache and reduced its size, and I could see the benefit immediately: the hit ratio went to 0.97. Here's the configuration:

It seemed to be stable for a few days; the cache hits and JVM pool utilization seemed to be well within the expected range. But the OOM issue occurred on one of the nodes as the heap size reached 30 GB. The hit ratios for the query result cache and document cache at that point were recorded as 0.18 and 0.65. I'm not sure the cache caused the memory spike at this point; with the filter cache restricted to 500 MB, it should be negligible. One thing I noticed is that the eviction rate now (with the addition of maxRamMB) is staying at 0. An index hard commit happens every 10 minutes; that's when the cache gets flushed. Based on the monitoring log, the spike happened on the indexing side, where almost 8k docs went into a pending state.

From a query performance standpoint, there have been occasional slow queries (1 sec+), but nothing alarming so far. The same goes for deep paging; I haven't seen any evidence pointing to that. Based on the hit ratios, I can further scale down the query result and document caches, and also change them to FastLRUCache and add maxRamMB. For the filter cache, I think this setting should be optimal enough to work within a 30 GB heap, unless I'm wrong about the maxRamMB concept. I'll have to get a heap dump somehow; unfortunately, the whole process (of the node going down) happens so quickly that I've hardly any time to run a profiler.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
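For readers following along, a solrconfig.xml cache section along these lines matches the approach described (the 500 MB filter cache is the number from this thread; all other sizes are illustrative, not recommendations — and as far as I understand, when maxRamMB is set on a FastLRUCache the RAM limit governs eviction rather than the entry count):

```xml
<filterCache class="solr.FastLRUCache"
             size="4096"
             initialSize="1024"
             autowarmCount="512"
             maxRamMB="500"/>

<queryResultCache class="solr.FastLRUCache"
                  size="1024"
                  initialSize="256"
                  autowarmCount="128"
                  maxRamMB="128"/>

<documentCache class="solr.LRUCache"
               size="1024"
               initialSize="256"
               autowarmCount="0"/>
```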
Re: Solr deep paging queries run very slow due to redundant q param
Pinging again. Anyone has ideas on this? Thanks On Sat, Oct 14, 2017 at 4:52 PM, Sundeep T wrote: > Hello, > > In our scale environment, we see that the deep paging queries using > cursormark are running really slow. When we traced out the calls, we see > that the second query which queries the individual id's of matched pages is > sending the q param that is already sent by the first query again. If we > remove the q param and directly query for ids, the query runs really fast. > > For example, the initial pagination query is like this with q param on > timestamp field - > > 2017-10-14 12:20:51.647 UTC INFO (qtp331844619-393343) > [core='x:c6e422fc3054c475-core-1'] org.apache.solr.core.SolrCore. > Request@2304 [c6e422fc3054c475-core-1] webapp=/solr path=/select > params={distrib=false&df=text&paginatedQuery=true&fl=id& > shards.purpose=4&start=0&fsv=true&sort=timestamp+desc+,id+asc&shard.url= > http://ops-data-solr-svc-1.rattle.svc.cluster.local:80/solr/ > c6e422fc3054c475-core-1&*rows=50*&version=2& > *q=(timestamp:["2017-10-13T18:42:36Z"+TO+"2017-10-13T21:09:00Z"])*& > shards.tolerant=true&*cursorMark=**&NOW=1507928978918&isShard= > true&timeAllowed=-1&wt=javabin&trackingId=d5eff5476247487555b7413214648} > hits=40294067 status=0 QTime=12727 > > This query results in a second query due to solr implementation of deep > paging like below. In this query, we already know the ids to be matched. > So, there is no reason to pass the q param again. We tried manually > executing the below query without the q param and just passing the ids > alone and that executes in 50ms. So, this looks like a bug that Solr is > passing in the q param again. Any ideas if there is workaround for this > problem we can use? > > 2017-10-14 12:21:09.193 UTC INFO (qtp331844619-742579) > [core='x:6d63f95961c46475-core-1'] org.apache.solr.core.SolrCore. 
> Request@2304 [6d63f95961c46475-core-1] webapp=/solr path=/select > params={distrib=false&df=text&paginatedQuery=true&fl=*,[ > removedocvaluesuffix]&shards.purpose=64&shard.url=http:// > ops-data-solr-svc-1.rattle.svc.cluster.local:80/solr/ > 6d63f95961c46475-core-1&rows=50&version=2& > *q=(timestamp:["2017-10-14T08:50:16.340Z"+TO+"2017-10-14T19:19:50Z"])*& > shards.tolerant=true&NOW=1507983581099&ids=00f037832e571941ed46ddd1959205 > 02,145c82e3eaa7678564b9e520822a3de1,09633cfabc6c830dfb44e04c313ba6b4, > 0032a76ed4ea01207c2891070348ea39,1b5179ee23fe3e17236da37d6b8d991f, > 04ee42e481b2a657bd3bb3c9f91b5ed5,2a910cf8a259925046a0c9fb5ee013c3, > 1d1d607b03c18ec59c14c2f9ca0ab47f,034e775c96633dae7e629a1d37da86e6, > 2759ca26d449d5df9f41689aa8ed3bac,16995a57699a7bb56d5018fe145028ce, > 0509d16399e679470ffc07a8af22a918,1797ab6e0174c65bf2f6b650b3538407, > 11c804ec4ae153a31929abe8613de336,11d20ed5dc0cf3d71f57aefc4e4b3ee2, > 0135baecd2d3ae819909a0c021bbd48b,224b0671196fd141196b15add2e49b91, > 271088227cf81e3641130d3bd5de8cc6,01f266b9c130239a06b00e45bda277a0, > 1438bed6ffd956f1c49d765b942f1988,2fc9fef6500124b1b48218169a7cf974, > 2d85d00593847398bf09e168bb3a190c,10e1c2803df1db3d47e525d3bd8a1868, > 28b6d72729e79da3ad65ac3879740133,14be34af9995721b358b3fdb0bcb18d7, > 1f2e0867bd495b8a332b8c8bd8ce2508,12cf1a1c07d9b9550ece4079b15f7583, > 022cd0b3eef93cd9a6389c0394cf3899,11aa3132e00a96df6a49540612b91c8f, > 0ff348e0433c9e751f1475db6dcab213,2b48279c9ff833f43a910edfa455a31d, > 241e002d744ff0215155f214166fdd49,0fee30860c82d9a24bb8389317cd772c, > 07f04d380832f514b0575866958eebaa,20b0efa5d88e2a9950fa4fd8ba930455, > 14a9cadb7c75274bfc028bb9ae31236b,1829730aa4ee4750eb242266830b576b, > 1ad5012e83bd271cf00b0c70ea86a856,0af4247d057bd833753e1f7bef959fc4, > 0a09767d81cb351ab1598987022b6955,2f166fae9ca809642b8e20cea3020c24, > 2c4d900575d8594a040c94751af59cb5,03f1c46a004a4e3b995295b512e1e324, > 2c2aae83afc7426424c7de5301f8c692,034baf21ac1db436a7f3f2cf2cc668b0, > 
1dda29d03fb8611f8de80b90685fd9ee,0632292ab704dcaa606440cb1fee017b, > 0fbd68f293c6964458a93f3034348625,2cdff46ab2e4d44b42f3381d5e3250b7, > 1b2c90dce4a51b5e5c344fc2f9ab431d&isShard=true&timeAllowed=- > 1&wt=javabin&trackingId=d5eff5476247487555b80c9ac7b82} status=0 > QTime=18136 > > Thanks > Sundeep >
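For reference, the cursorMark contract itself is simple: re-send the full original query (including q and sort) with the cursorMark returned by the previous page, and stop when the returned cursor equals the one you sent. A minimal sketch of that loop (fetch_page is a stand-in for an actual Solr request, not a real client API):

```python
def paginate(fetch_page, start_cursor="*"):
    """Iterate over all documents of a cursorMark-paginated query.

    fetch_page(cursor) must return (docs, next_cursor), mirroring a Solr
    response's documents and nextCursorMark. Iteration ends when Solr
    returns the same cursor that was sent, which signals the last page.
    """
    cursor = start_cursor
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # cursor unchanged: no more results
            break
        cursor = next_cursor


# Stand-in for a real Solr call: three pages, last page repeats its cursor.
pages = {"*": ([1, 2], "a"), "a": ([3, 4], "b"), "b": ([5], "b")}

all_docs = list(paginate(pages.__getitem__))  # [1, 2, 3, 4, 5]
```

The q param on follow-up shard requests that the thread describes is internal distributed-search behavior; the loop above only shows the client-side contract.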
Trouble Installing Solr 7.1.0 On Ubuntu 17
Hi, I'm new to Apache Solr. I'm looking to install Solr 7.1.0 on my localhost computer. I downloaded and extracted the tar file in my tmp folder. But when I try to run the script I get: sudo: solr-7.1.0/solr/bin/install_solr_service.sh: command not found, and with solr-7.1.0/solr/bin/install_solr_service.sh --strip-components=2 I get the same error message. Can anyone help? Dane
How to make use of some features from lucene in SOLR?
I need to implement a rather customized sort in Solr, and I would appreciate a high-level pointer on the following:
1/ could I make use of a customized Collector (Lucene level) in Solr?
2/ could I make use of a function query (Lucene level) for customized sorting in Solr?
3/ I would like to pass some parameters at query time for the Collector & function query. Those parameters are not related to Solr, solely to our implementation. Could the Solr API accommodate those parameters? I searched solrconfig.xml and schema.xml but did not see anything (I could have missed it; a pointer would be great).
Thanks very much for the help, Lisheng
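One avenue that may cover points 2/ and 3/ without custom plugin code: Solr can sort by a function query, and function queries can dereference request parameters with $param, so client-specific values can be passed per request. A sketch (the parameter names w1/w2 and field names popularity/recency are made up for illustration; I haven't verified this exact combination):

```
.../select?q=*:*&w1=2.5&w2=0.5
    &sort=sum(product($w1,popularity),product($w2,recency)) desc
```

A custom Collector (point 1/) is a different story: that generally requires writing a Solr plugin (e.g. a custom SearchComponent), since the stock request handlers don't expose Lucene Collectors directly.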
Re: Replacing legacyCloud Behaviour in Solr7
Thanks for the quick reply, Erick. To follow up: “ Well, first you can explicitly set legacyCloud=true by using the Collections API CLUSTERPROP command. I don't recommend this, mind you, as legacyCloud will not be supported forever. “ Yes, but like you say: we’ll have to deal with at some point, not much benefit in punting. “ I'm not following something here though. When you say: "The desired final state of a such a deployment is a fully configured cluster ready to accept updates." are there any documents already in the index or is this really a new collection? “ It’s a brand new collection with new configuration on fresh hardware which we’ll then fully index from a source document store (we do this when we have certain schema changes that require re-indexing or we want to experiment). “ Not sure what you mean here. Configuration of what? Just spinning up a Solr node pointing to the right ZooKeeper should be sufficient, or I'm not understanding at all. “ Apologies, the way I stated that was all wrong: by “requires configuration” I just meant to note the need to specify a shard and a node when adding a replica (and not even the node as you point out to me below ☺). “ I suspect you're really talking about the "node" parameter to ADDREPLCIA “ Ah, yes: that is what I meant, sorry. It sounds like I haven’t missed too much in the documentation then, I’ll look more into replica placement rules. Thank you so much again for your time and help. Marko On 10/23/17, 4:33 PM, "Erick Erickson" wrote: Well, first you can explicitly set legacyCloud=true by using the Collections API CLUSTERPROP command. I don't recommend this, mind you, as legacyCloud will not be supported forever. I'm not following something here though. When you say: "The desired final state of a such a deployment is a fully configured cluster ready to accept updates." are there any documents already in the index or is this really a new collection? 
and "adding new nodes requires explicit configuration" Not sure what you mean here. Configuration of what? Just spinning up a Solr node pointing to the right ZooKeeper should be sufficient, or I'm not understanding at all. If not, your proposed outline seems right with one difference: "if a node needs to be added: provision a machine, start up Solr, use ADDREPLICA from Collections API passing shard number and coreNodeName" coreNodeName isn't something you ordinarily need to bother with. I'm being specific here where coreNodeName is usually something like core_node7. I suspect you're really talking about the "node" parameter to ADDREPLCIA, something like: 192.168.1.32:8983_solr, the entry from live_nodes. Now, all that said you may be better off just letting Solr add the replica where it wants, it'll usually put a new replica on a node without replicas so specifying the collection and shard should be sufficient. Also, note that there are replica placement rules that can help enforce this kind of thing. Best, Erick On Mon, Oct 23, 2017 at 3:12 PM, Marko Babic wrote: > Hi everyone, > > I'm working on upgrading a set of clusters from Solr 4.10.4 to Solr 7.1.0. > > Our deployment tooling no longer works given that legacyCloud defaults to false (SOLR-8256) and I'm hoping to get some advice on what to do going forward. > > Our setup is as follows: > * we run in AWS with multiple independent Solr clusters, each with its own Zookeeper tier > * each cluster hosts only a single collection > * each machine/node in the cluster has a single core / is a replica for one shard in the collection > > We bring up new clusters as needed. 
This is entirely automated and basically works as follows:
> * we first provision and set up a fresh Zookeeper tier
> * then, we provision a Solr bootstrapper machine that uploads collection config, specifies numShards and starts up
> * it's then easy to provision the rest of the machines and have them automatically join a shard in the collection by hooking them to the right Zookeeper cluster and specifying numShards
> * if a node needs to be added to the cluster we just need to spin a machine up and start up Solr
>
> The desired final state of such a deployment is a fully configured cluster ready to accept updates.
>
> Now that legacyCloud is false I'm not sure how to preserve this pretty nice, hands-off deployment style as the bootstrapping performed by the first node provisioned doesn't create a collection and adding new nodes requires explicit configuration.
>
> A new deployment procedure that I've worked out using the Collections API would look like:
> * provision Zookeeper tier
> * provision all the Solr nodes, wait for them all to come up
> * upload collection config + solr.xml to Zookeeper
Re: Replacing legacyCloud Behaviour in Solr7
Well, first you can explicitly set legacyCloud=true by using the Collections API CLUSTERPROP command. I don't recommend this, mind you, as legacyCloud will not be supported forever.

I'm not following something here though. When you say: "The desired final state of such a deployment is a fully configured cluster ready to accept updates." are there any documents already in the index or is this really a new collection?

and "adding new nodes requires explicit configuration" Not sure what you mean here. Configuration of what? Just spinning up a Solr node pointing to the right ZooKeeper should be sufficient, or I'm not understanding at all.

If not, your proposed outline seems right with one difference: "if a node needs to be added: provision a machine, start up Solr, use ADDREPLICA from Collections API passing shard number and coreNodeName" coreNodeName isn't something you ordinarily need to bother with. To be specific, coreNodeName is usually something like core_node7. I suspect you're really talking about the "node" parameter to ADDREPLICA, something like 192.168.1.32:8983_solr, the entry from live_nodes.

Now, all that said, you may be better off just letting Solr add the replica where it wants; it'll usually put a new replica on a node without replicas, so specifying the collection and shard should be sufficient. Also, note that there are replica placement rules that can help enforce this kind of thing.

Best, Erick

On Mon, Oct 23, 2017 at 3:12 PM, Marko Babic wrote:
> Hi everyone,
>
> I'm working on upgrading a set of clusters from Solr 4.10.4 to Solr 7.1.0.
>
> Our deployment tooling no longer works given that legacyCloud defaults to
> false (SOLR-8256) and I'm hoping to get some advice on what to do going
> forward.
> > Our setup is as follows: > * we run in AWS with multiple independent Solr clusters, each with its own > Zookeeper tier > * each cluster hosts only a single collection > * each machine/node in the cluster has a single core / is a replica for one > shard in the collection > > We bring up new clusters as needed. This is entirely automated and basically > works as follows: > * we first provision and set up a fresh Zookeeper tier > * then, we provision a Solr bootstrapper machine that uploads collection > config, specifies numShards and starts up > * it's then easy provision the rest of the machines and have them > automatically join a shard in the collection by hooking them to the right > Zookeeper cluster and specifying numShards > * if a node needs to be added to the cluster we just need to spin a machine > up and start up Solr > > The desired final state of a such a deployment is a fully configured cluster > ready to accept updates. > > Now that legacyCloud is false I'm not sure how to preserve this pretty nice, > hands-off deployment style as the bootstrapping performed by the first node > provisioned doesn't create a collection and adding new nodes requires > explicit configuration. > > A new deployment procedure that I've worked out using the Collections API > would look like: > * provision Zookeeper tier > * provision all the Solr nodes, wait for them all to come up > * upload collection config + solr.xml to Zookeeper > * create collection using Collections API > * if a node needs to be added: provision a machine, start up Solr, use > ADDREPLICA from Collections API passing shard number and coreNodeName > > This isn’t a giant deal to build but it adds complexity that I'm not excited > about as deployment tooling needs to have some understanding of what the > global state of the cluster is before being able to create a collection or > when adding/replacing nodes. 
> > The questions I was hoping someone would have some time to help me with are: > > * Does the new deployment procedure I've suggested seem reasonable? Would we > be doing anything wrong/fighting best practices? > * Is there a way to keep cluster provisioning automated without having to > build additional orchestration logic into our deployment tooling (using > autoscaling, or triggers, or something I don’t know about)? > > Apologies for the wall of text and thanks. :) > > Marko >
How to Efficiently Extract Learning to Rank Similarity Features From Solr?
Hi, I'm trying to extract several similarity measures from Solr for use in a learning to rank model. Doing this mathematically involves taking the dot product of several different matrices, which is extremely fast for non-huge data sets (e.g., millions of documents and queries). However, to extract these similarity features from Solr, I have to issue a separate Solr request for each query, which introduces several bottlenecks. Are there more efficient means of computing these similarity measures for large numbers of queries (other than increased parallelism)? Thanks, Michael A. Alcorn
Replacing legacyCloud Behaviour in Solr7
Hi everyone,

I'm working on upgrading a set of clusters from Solr 4.10.4 to Solr 7.1.0. Our deployment tooling no longer works given that legacyCloud defaults to false (SOLR-8256) and I'm hoping to get some advice on what to do going forward.

Our setup is as follows:
* we run in AWS with multiple independent Solr clusters, each with its own Zookeeper tier
* each cluster hosts only a single collection
* each machine/node in the cluster has a single core / is a replica for one shard in the collection

We bring up new clusters as needed. This is entirely automated and basically works as follows:
* we first provision and set up a fresh Zookeeper tier
* then, we provision a Solr bootstrapper machine that uploads collection config, specifies numShards and starts up
* it's then easy to provision the rest of the machines and have them automatically join a shard in the collection by hooking them to the right Zookeeper cluster and specifying numShards
* if a node needs to be added to the cluster we just need to spin a machine up and start up Solr

The desired final state of such a deployment is a fully configured cluster ready to accept updates.

Now that legacyCloud is false I'm not sure how to preserve this pretty nice, hands-off deployment style, as the bootstrapping performed by the first node provisioned doesn't create a collection and adding new nodes requires explicit configuration.
A new deployment procedure that I've worked out using the Collections API would look like: * provision Zookeeper tier * provision all the Solr nodes, wait for them all to come up * upload collection config + solr.xml to Zookeeper * create collection using Collections API * if a node needs to be added: provision a machine, start up Solr, use ADDREPLICA from Collections API passing shard number and coreNodeName This isn’t a giant deal to build but it adds complexity that I'm not excited about as deployment tooling needs to have some understanding of what the global state of the cluster is before being able to create a collection or when adding/replacing nodes. The questions I was hoping someone would have some time to help me with are: * Does the new deployment procedure I've suggested seem reasonable? Would we be doing anything wrong/fighting best practices? * Is there a way to keep cluster provisioning automated without having to build additional orchestration logic into our deployment tooling (using autoscaling, or triggers, or something I don’t know about)? Apologies for the wall of text and thanks. :) Marko
Re: Solr boosting multiple fields using edismax parser.
Thanks for your reply. Can the recip function be used to boost a numeric field here: recip(ord(rating),100,1,1)? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
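For reference, recip is defined as a simple reciprocal:

```latex
\mathrm{recip}(x, m, a, b) = \frac{a}{m \cdot x + b}
```

so recip(ord(rating),100,1,1) evaluates to 1/(100*ord(rating)+1), which decays very quickly as the ordinal grows. One caveat: ord(rating) uses the position of the term in the index, not the rating value itself, so for a numeric field, applying recip to the field value directly (e.g. recip(rating,m,a,b)) may be closer to what's intended.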
Re: Solr boosting multiple fields using edismax parser.
You can pass additional bq params in the query. ~Aravind On Oct 23, 2017 4:10 PM, "ruby" wrote: > If I want to boost multiple fields using Edismax query parser, is following > the correct way of doing it: > > > > edismax > field1:(apple)^500 > field1:(orange)^400 > field1:(pear)^300 > field2:(4)^500 > field2:(2)^100 > recip(ms(NOW,mod_date),3.16e-11,1,1) > recip(ms(NOW,creation_date),3.16e-11,1,1) > > And if boost is configured in solrconfig.xml, can I still pass additional > boost queries through boost query? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
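As a concrete sketch of the handler-side configuration being discussed (the field names and boost values are the ones from the question; the exact layout is illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="bq">field1:(apple)^500</str>
    <str name="bq">field1:(orange)^400</str>
    <str name="bq">field2:(4)^500</str>
    <str name="bf">recip(ms(NOW,mod_date),3.16e-11,1,1)</str>
  </lst>
</requestHandler>
```

One detail worth noting: bq values placed under `defaults` are replaced entirely by any bq sent on the request, while values under `appends` are always added to the request, so whether request-time boost queries combine with configured ones depends on which section they live in.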
Solr boosting multiple fields using edismax parser.
If I want to boost multiple fields using the eDismax query parser, is the following the correct way of doing it: edismax field1:(apple)^500 field1:(orange)^400 field1:(pear)^300 field2:(4)^500 field2:(2)^100 recip(ms(NOW,mod_date),3.16e-11,1,1) recip(ms(NOW,creation_date),3.16e-11,1,1) And if boost is configured in solrconfig.xml, can I still pass additional boost queries through the boost query param? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Really slow facet performance in 6.6
John Davis wrote:
> We are seeing really slow facet performance with new solr release.
> This is on an index of 2M documents.

I am currently running some performance experiments on simple String faceting, comparing Solr 4 & 6. There is definitely a performance difference, but it is not trivial to pinpoint where it is.

My first thought was that it was tied to the Solr version, with Solr 6 being markedly slower than Solr 4. However, looking at segment count, I can see that Solr 6 has twice as many segments as Solr 4 for my test setup. I tried optimizing down to 10 segments, which flipped the result: suddenly Solr 6 was faster than Solr 4.

I'm still poking at this, but I guess my takeaway for now is to be sure to compare on fair terms. The strategy for creating segments can be tweaked, and (guessing a lot here) it seems that the Solr 6 defaults lean towards faster indexing (by having more small segments) at the cost of faceting performance. These JIRAs seem relevant:
https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/SOLR-9599

> 1. method=uif however that didn't help much (the facet fields have
> docValues=false since they are multi-valued). Debug info below.

docValues works fine with multi-values (at least for Strings).

- Toke Eskildsen
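To Toke's point, enabling docValues on a multiValued string field is a one-line schema change (field name is illustrative; note it requires a full reindex to take effect):

```xml
<field name="school" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```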
Re: Merging is not taking place with tiered merge policy
1> merging takes place up until the max segment size is reached (5G in the default TieredMergePolicy).

2> there are a couple of options, again config changes for TieredMergePolicy; a setting of 10 there might help. You could also try upping the max merged segment size (the default is 5G), e.g. 5000.

Best, Erick

On Mon, Oct 23, 2017 at 10:34 AM, chandrushanmugasundaram wrote:
> Thanks Erick.
>
> (Beginner in Solr.) A few questions.
>
> 1. Does merging take place only when we have deleted docs? When my segments reach a count of 35+ the search is getting slow. Only on performing a force merge of the index is the search efficient.
>
> 2. Is there any way we can reduce the number of segments in Solr automatically, without any cron job, just by altering some configuration in solrconfig.xml?
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
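For concreteness, a TieredMergePolicyFactory section in solrconfig.xml exposing the knobs Erick mentions might look like this (the values mirror the numbers in this thread, not recommendations):

```xml
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <!-- raise the max merged segment size from the 5 GB default -->
  <double name="maxMergedSegmentMB">5000</double>
</mergePolicyFactory>
```

Fewer segmentsPerTier means fewer, larger segments at the cost of more merge I/O during indexing.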
Re: Facets based on sampling
Docvalues don't work for multivalued fields. I just started a separate thread with more debug info. It is a bit surprising why facet computation is so slow even when the query matches hundreds of docs. On Mon, Oct 23, 2017 at 6:53 AM, alessandro.benedetti wrote: > Hi John, > first of all, I may state the obvious, but have you tried docValues ? > > Apart from that a friend of mine ( Diego Ceccarelli) was discussing a > probabilistic implementation similar to the hyperloglog[1] to approximate > facets counting. > I didn't have time to take a look in details / implement anything yet. > But it is on our To Do list :) > He may add some info here. > > Cheers > > > > > [1] > https://blog.yld.io/2017/04/19/hyperloglog-a-probabilistic-data-structure/ > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: Merging is not taking place with tiered merge policy
Thanks Erick.

(Beginner in Solr.) A few questions.

1. Does merging take place only when we have deleted docs? When my segments reach a count of 35+ the search is getting slow. Only on performing a force merge of the index is the search efficient.

2. Is there any way we can reduce the number of segments in Solr automatically, without any cron job, just by altering some configuration in solrconfig.xml?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Merging is not taking place with tiered merge policy
Amrit, Thanks for your reply. I have removed that 1000 1 15 false 1024 2 2 hdfs 1 0 -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
segment merge in solr not happening
I find that the Lucene segments in the backend are not merging and the segment count increases a lot. I changed the merge policy from LogByteSizeMergePolicy to TieredMergePolicy. I tried altering properties according to the Solr documentation but still my segment count is high. I am using Solr 6.1.x. **The index data is stored in HDFS.** My index config in solrconfig.xml: 1000 1 15 false 1024 10 1 10 10 hdfs 1 0 The only way we optimize is by force merging, which is IO-costly and also takes hours to complete. I have a cluster of three shards and a replication factor of 2. Can anyone help me figure out where I am going wrong? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Really slow facet performance in 6.6
Hello, We are seeing really slow facet performance with the new Solr release. This is on an index of 2M documents. A few things we've tried:

1. method=uif, however that didn't help much (the facet fields have docValues=false since they are multi-valued). Debug info below.
2. Changing the query (q=) that selects which documents to compute facets on didn't help a lot, except that repeating the same query was fast, presumably due to exact cache hits.

Sample debug info:

"timing": {
  "prepare": { "debug": {"time": 0.0}, "expand": {"time": 0.0}, "facet": {"time": 0.0}, "facet_module": {"time": 0.0}, "highlight": {"time": 0.0}, "mlt": {"time": 0.0}, "query": {"time": 0.0}, "stats": {"time": 0.0}, "terms": {"time": 0.0}, "time": 0.0 },
  "process": { "debug": {"time": 87.0}, "expand": {"time": 0.0}, "facet": {"time": 9814.0}, "facet_module": {"time": 0.0}, "highlight": {"time": 0.0}, "mlt": {"time": 0.0}, "query": {"time": 20.0}, "stats": {"time": 0.0}, "terms": {"time": 0.0}, "time": 9922.0 },
  "time": 9923.0
},
"facet-debug": {
  "elapse": 8310,
  "sub-facet": [ {
    "action": "field facet",
    "elapse": 8310,
    "maxThreads": 2,
    "processor": "SimpleFacets",
    "sub-facet": [
      {},
      { "appliedMethod": "UIF", "field": "school", "inputDocSetSize": 476, "requestedMethod": "UIF" },
      { "appliedMethod": "UIF", "elapse": 2575, "field": "work", "inputDocSetSize": 476, "requestedMethod": "UIF" },
      { "appliedMethod": "UIF", "elapse": 8310, "field": "level", "inputDocSetSize": 476, "requestedMethod": "UIF" }
    ]
  } ]
}

Thanks John
Re: Merging is not taking place with tiered merge policy
And please define what you mean by "merging is not working". One parameter is max segment size, which defaults to 5G. Segments at or near that size are not eligible for merging unless they have around 50% deleted docs.

Best, Erick

On Mon, Oct 23, 2017 at 3:11 AM, Amrit Sarkar wrote:
> Chandru,
>
> Didn't try the above config, but why have you defined both "mergePolicy" and
> "mergePolicyFactory", and passed different values for the same parameters?
>
> >> 10 >> 1 >> >> >> 10 >> 10 >>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Mon, Oct 23, 2017 at 11:00 AM, Chandru Shanmugasundaram <
> chandru.shanmugasunda...@exterro.com> wrote:
>
>> The following is my solrconfig.xml
>>
>> 1000 1 15 false 1024 10 1 10 10 hdfs 1 0
>>
>> Please let me know if I should tweak something above
>>
>> --
>> Thanks,
>> Chandru.S
Re: solr core replication
Great, thanks for bringing closure to this!

Oh, and one addendum. I wrote: "It'll probably be around forever since replication is used as a fall-back". Forget the "probably" there. In 7.x there are new replica types that use this as their way of distributing the index, see the PULL replica type. So forget the "probably" in that statement ;)

Best, Erick

On Mon, Oct 23, 2017 at 6:45 AM, Hendrik Haddorp wrote:
> Hi Erick,
>
> sorry for the slow reply. You are right, the information is not persisted. Once I do a restart there is no information about the replication source anymore. That explains why I could not find it anywhere persisted ;-) I thought I had tested that last week but must have not done so, as it worked just fine now.
>
> thanks,
> Hendrik
>
> On 20.10.2017 16:39, Erick Erickson wrote:
>>
>> Does that persist even after you restart Solr on the target cluster?
>>
>> And that clears up one bit of confusion I had, I didn't know how you were having each shard on the target cluster use a different master URL given they all use the same solrconfig file. I was guessing some magic with system variables, but it turns out you were way ahead of me and not configuring the replication in solrconfig at all.
>>
>> But no, I know of no API-level command that works to do what you're asking. I also don't know where that data is persisted, I'm afraid you'll have to go code-diving for all the help I can be.
>>
>> Using fetchindex this way in SolrCloud is something of an edge case. It'll probably be around forever since replication is used as a fall-back when a replica syncs, but there'll be some bits like this hanging around I'd guess.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 19, 2017 at 11:55 PM, Hendrik Haddorp wrote:
>>>
>>> Hi Erick,
>>>
>>> that is actually the call I'm using :-)
>>> If you invoke http://solr_target_machine:port/solr/core/replication?command=details after that you can see the replication status.
But even after a Solr restart >>> the >>> call still shows the replication relation and I would like to remove this >>> so >>> that the core looks "normal" again. >>> >>> regards, >>> Hendrik >>> >>> On 20.10.2017 02:31, Erick Erickson wrote: Little known trick: The fetchIndex replication API call can take any parameter you specify in your config. So you don't have to configure replication at all on your target collection, just issue the replication API command with masterUrl, something like: http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core NOTE, "core" above will be something like collection1_shard1_replica1 During the fetchindex, you won't be able to search on the target collection although the source will be searchable. Now, all that said this is just copying stuff. So let's say you've indexed to your source cluster and set up your target cluster (but don't index anything to the target or do the replication etc). Now if you shut down the target cluster and just copy the entire data dir from each source replica to each target replica then start all the target Solr instances up you'll be fine. Best, Erick On Thu, Oct 19, 2017 at 1:33 PM, Hendrik Haddorp wrote: > > Hi, > > I want to transfer a Solr collection from one SolrCloud to another one. > For > that I create a collection in the target cloud using the same config > set > as > on the source cloud but with a replication factor of one. After that > I'm > using the Solr core API with a "replication?command=fetchindex" command > to > transfer the data. In the last step I'm increasing the replication > factor. > This seems to work fine so far. When I invoke > "replication?command=details" > I can see my replication setup and check if the replication is done. In > the > end I would like to remove this relation again but there does not seem > to > be > an API call for that. 
Given that the replication should be a one time > replication according to the API on > https://lucene.apache.org/solr/guide/6_6/index-replication.html this > should > not be a big problem. It just does not look clean to me to leave this > in > the > system. Is there anything I'm missing? > > regards, > Hendrik >>> >>> >
Re: Goal: reverse chronological display Methods? (1) boost, and/or (2) disable idf
In addition: bf=recip(ms(NOW/DAY,unixdate),3.16e-11,5,0.1) is an additive boost. I tend to prefer multiplicative ones, but that is up to you [1]. You can control the order of magnitude of the values generated by that function, which means you control how much the date will affect the score. If you decide to go additive, be careful with the order of magnitude of the scores: your relevancy score magnitude will vary depending on the query and the index, while your additive boost is going to stay constant. Regards [1] https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/ --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
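For anyone wanting to see the magnitude being discussed: Solr's recip(x,m,a,b) computes a/(m*x+b), so with the constants above the additive boost is a/b = 50 for a brand-new document and decays toward 0 with age. A quick plain-Python sketch (illustrative only, not Solr code):

```python
def recip(x, m, a, b):
    """Solr's recip() function: a / (m*x + b)."""
    return a / (m * x + b)

MS_PER_YEAR = 3.156e10  # ~1 year in milliseconds; 3.16e-11 is its reciprocal

# Boost contribution by document age: starts near a/b = 50, decays with age.
for years in (0, 1, 5, 10):
    print(years, round(recip(years * MS_PER_YEAR, 3.16e-11, 5, 0.1), 3))
```

Tuning a (numerator) and b (floor) is how you pick the order of magnitude of the additive term relative to your typical relevancy scores.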
Re: LTR feature extraction performance issues
It strictly depends on the kind of features you are using. At the moment there is just one cache for all the features. This means that even if you have 1 query-dependent feature and 100 document-dependent features, a different value for the query-dependent one will invalidate the cache entry for the full vector [1]. You may look to optimise your features (where possible). [1] https://issues.apache.org/jira/browse/SOLR-10448 --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
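A toy illustration of why that hurts (plain Python, not Solr's actual implementation; all names here are made up): the cached value is the whole feature vector, keyed by the document plus every query-dependent (efi) value, so one changed efi forces a full recompute even when the document features did not change:

```python
cache = {}
calls = []  # records every expensive full-vector computation

def compute_all_features(doc_id, efi):
    # Stand-in for extracting the full feature vector (the expensive part).
    calls.append(doc_id)
    return [1.0] * 101  # e.g. 100 document features + 1 query-dependent one

def feature_vector(doc_id, efi):
    # One cache for ALL features, keyed by doc AND every efi value.
    key = (doc_id, tuple(sorted(efi.items())))
    if key not in cache:
        cache[key] = compute_all_features(doc_id, efi)
    return cache[key]

feature_vector("doc1", {"user_query": "solr"})
feature_vector("doc1", {"user_query": "solr"})    # same key: cache hit
feature_vector("doc1", {"user_query": "lucene"})  # new efi value: full recompute
```

Two full computations for three lookups: the second efi value invalidated the whole vector for doc1.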
Re: Facets based on sampling
Hi John, first of all, I may be stating the obvious, but have you tried docValues? Apart from that, a friend of mine (Diego Ceccarelli) was discussing a probabilistic implementation similar to HyperLogLog [1] to approximate facet counting. I didn't have time to look at the details / implement anything yet, but it is on our To Do list :) He may add some info here. Cheers [1] https://blog.yld.io/2017/04/19/hyperloglog-a-probabilistic-data-structure/ --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
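Until something like that exists in Solr, sampling can be approximated client-side. A rough sketch (plain Python, not a Solr feature; the data and sample size are invented): facet over a random sample of the matching docs and scale the counts back up:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible
docs = ["red"] * 70_000 + ["blue"] * 25_000 + ["green"] * 5_000  # facet values

sample = random.sample(docs, 10_000)      # facet only 10% of the result set
scale = len(docs) / len(sample)

approx = {}
for value in sample:
    approx[value] = approx.get(value, 0) + scale  # scaled-up facet count

# Ranking of facet values is usually preserved; counts are approximate.
top = sorted(approx, key=approx.get, reverse=True)
```

The trade-off is the usual one: large buckets come out accurate and correctly ordered, rare values may be missed entirely.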
RE: LTR feature cache performance issues
Has anyone had experience tuning feature caches? Do any of the values below look unreasonable? --Brian -Original Message- From: Brian Yee [mailto:b...@wayfair.com] Sent: Friday, October 20, 2017 1:41 PM To: solr-user@lucene.apache.org Subject: LTR feature extraction performance issues I enabled LTR feature extraction and response times spiked. I suppose that was to be expected, but are there any tips regarding performance? I have the feature values cache set up as described in the docs: Do I simply have to wait for the cache to fill up and hope that response times go down? Should I make these cache values bigger? * Brian
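For reference, since the cache snippet was stripped from the message above: the feature-values cache described in the Solr Reference Guide's LTR documentation looks like the fragment below (the sizing numbers are the guide's example values, not a recommendation; tune them for your workload):

```xml
<!-- solrconfig.xml: cache for LTR feature values (example from the Solr docs) -->
<cache name="QUERY_DOC_FV"
       class="solr.search.LRUCache"
       size="4096"
       initialSize="2048"
       autowarmCount="4096"
       regenerator="solr.search.NoOpRegenerator" />
```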
Re: solr core replication
Hi Erick, sorry for the slow reply. You are right, the information is not persisted. Once I do a restart there is no information about the replication source anymore. That explains why I could not find it anywhere persisted ;-) I thought I had tested that last week but must have not done so as it worked just fine now. thanks, Hendrik On 20.10.2017 16:39, Erick Erickson wrote: Does that persist even after you restart Solr on the target cluster? And that clears up one bit of confusion I had, I didn't know how you were having each shard on the target cluster use a different master URL given they all use the same solrconfig file. I was guessing some magic with system variables, but it turns out you were wy ahead of me and not configuring the replication in solrconfig at all. But no, I know of no API level command that works to do what you're asking. I also don't know where that data is persisted, I'm afraid you'll have to go code-diving for all the help I can be Using fetchindex this way in SolrCloud is something of an edge case. It'll probably be around forever since replication is used as a fall-back when a replica syncs, but there'll be some bits like this hanging around I'd guess. Best, Erick On Thu, Oct 19, 2017 at 11:55 PM, Hendrik Haddorp wrote: Hi Erick, that is actually the call I'm using :-) If you invoke http://solr_target_machine:port/solr/core/replication?command=details after that you can see the replication status. But even after a Solr restart the call still shows the replication relation and I would like to remove this so that the core looks "normal" again. regards, Hendrik On 20.10.2017 02:31, Erick Erickson wrote: Little known trick: The fetchIndex replication API call can take any parameter you specify in your config. 
So you don't have to configure replication at all on your target collection, just issue the replication API command with masterUrl, something like: http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core NOTE, "core" above will be something like collection1_shard1_replica1 During the fetchindex, you won't be able to search on the target collection although the source will be searchable. Now, all that said this is just copying stuff. So let's say you've indexed to your source cluster and set up your target cluster (but don't index anything to the target or do the replication etc). Now if you shut down the target cluster and just copy the entire data dir from each source replica to each target replica then start all the target Solr instances up you'll be fine. Best, Erick On Thu, Oct 19, 2017 at 1:33 PM, Hendrik Haddorp wrote: Hi, I want to transfer a Solr collection from one SolrCloud to another one. For that I create a collection in the target cloud using the same config set as on the source cloud but with a replication factor of one. After that I'm using the Solr core API with a "replication?command=fetchindex" command to transfer the data. In the last step I'm increasing the replication factor. This seems to work fine so far. When I invoke "replication?command=details" I can see my replication setup and check if the replication is done. In the end I would like to remove this relation again but there does not seem to be an API call for that. Given that the replication should be a one time replication according to the API on https://lucene.apache.org/solr/guide/6_6/index-replication.html this should not be a big problem. It just does not look clean to me to leave this in the system. Is there anything I'm missing? regards, Hendrik
Re: Solr nodes going into recovery mode and eventually failing
You mentioned that you are on v. 6.6, but in case someone else uses this, just to add that maxRamMB was added to FastLRUCache in version 6.4. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 23 Oct 2017, at 14:35, Zisis T. wrote: > > shamik wrote >> I was not aware of the maxRamMB parameter, looks like it's only available for >> queryResultCache. Is that what you are referring to? Can you please share >> your cache configuration? > > I've set up the filterCache entry inside solrconfig.xml as follows > > *<filterCache class="solr.FastLRUCache" size="512" autowarmCount="0" maxRamMB="120"/>* > > I had a look inside the FastLRUCache code and saw that maxRamMB has precedence > over the size. I can also confirm that I had more than 512 entries inside > the cache, so the above will work. > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Sort by field from another collection
Hi! I have one main collection of people and a few more collections with additional data. All search queries are on the main collection with joins to one or more additional collections. A simple example would be: (*:* {!join from=people_person_id to=people_person_id fromIndex=fundraising_donor_info v='total_donations_1y:[1000 TO 2000]'}) I need to sort results by fields from the additional collections (e.g. "total_donations_1y"). Is there any way to do that through the common query parameters? Or is the only way to use streaming expressions? Dmitry
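With streaming expressions, one way to sketch it (collection and field names are taken from the question; the extra "name" field and the exact expression are assumptions, untested): join the two collections on the shared key, then sort the joined tuples by the field from the secondary collection:

```
sort(
  innerJoin(
    search(people, q="*:*",
           fl="people_person_id,name",
           sort="people_person_id asc"),
    search(fundraising_donor_info,
           q="total_donations_1y:[1000 TO 2000]",
           fl="people_person_id,total_donations_1y",
           sort="people_person_id asc"),
    on="people_person_id"
  ),
  by="total_donations_1y desc"
)
```

Note that innerJoin requires both streams to be sorted on the join key, which is why both inner searches sort on people_person_id.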
RE: Certificate issue ERR_SSL_VERSION_OR_CIPHER_MISMATCH
I was able to resolve the issue. I was adding the certificate and then I had combined my certificate and private key. So when I added the certificate plus the certificate and private key it was breaking. I removed just the certificate and it resolved the issue. So I had my root certificates and the certificate plus private key and everything starting working correctly. Thank you, Kent Younge Systems Engineer USPS MTSC IT Support 600 W. Rock Creek Rd, Norman, OK 73069-8357 O:405 573 2273 -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Friday, October 20, 2017 4:33 PM To: solr-user@lucene.apache.org Subject: Re: Certificate issue ERR_SSL_VERSION_OR_CIPHER_MISMATCH On 10/19/2017 6:30 AM, Younge, Kent A - Norman, OK - Contractor wrote: > Built a clean Solr server imported my certificates and when I go to the > SSL/HTTPS page it tells me that I have ERR_SSL_VERSION_OR_CIPHER_MISMATCH in > Chrome and in IE tells me that I need to TURN ON TLS 1.0, TLS 1.1, and TLS > 1.2. What java version? What Java vendor? What operating system? The OS won't have a lot of impact on HTTPS, I just ask in case other information is desired, so we can tailor the information requests. I see other messages where you mention Solr 6.6, which requires Java 8. As Hoss mentioned to you in another thread, *all* of the SSL capability is provided by Java. The Jetty that ships with Solr includes a config for HTTPS. The included Jetty config *excludes* a handful of low-quality ciphers that your browser probably already refuses to use, but that's the only cipher-specific configuration. If you haven't changed the Jetty config in the Solr download, then Jetty defaults and your local Java settings will control everything else. As far as I am aware, Solr doesn't influence the SSL config at all. 
SSL_RSA_WITH_DES_CBC_SHA
SSL_DHE_RSA_WITH_DES_CBC_SHA
SSL_DHE_DSS_WITH_DES_CBC_SHA
SSL_RSA_EXPORT_WITH_RC4_40_MD5
SSL_RSA_EXPORT_WITH_DES40_CBC_SHA
SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA
SSL_DHE_DSS_EXPORT_WITH_DES40_CBC_SHA

It is extremely unlikely that Solr itself is causing these problems. It is more likely that there's something about your environment (java version, custom java config, custom Jetty config, browser customization, or maybe something else) that is resulting in a protocol and cipher list that your browser doesn't like. Thanks, Shawn
Re: Solr nodes going into recovery mode and eventually failing
shamik wrote > I was not aware of the maxRamMB parameter, looks like it's only available for > queryResultCache. Is that what you are referring to? Can you please share > your cache configuration? I've set up the filterCache entry inside solrconfig.xml as follows: *<filterCache class="solr.FastLRUCache" size="512" autowarmCount="0" maxRamMB="120"/>* I had a look inside the FastLRUCache code and saw that maxRamMB has precedence over the size. I can also confirm that I had more than 512 entries inside the cache, so the above will work. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
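A toy model of that precedence (plain Python, not Solr's FastLRUCache code): when both an entry-count limit and a RAM limit are configured, whichever is hit first drives eviction, so a small maxRamMB effectively overrides a larger size setting:

```python
from collections import OrderedDict

class RamBoundedLRU:
    """Toy LRU bounded by BOTH entry count and total RAM (illustration only)."""

    def __init__(self, max_entries, max_ram_bytes):
        self.max_entries, self.max_ram = max_entries, max_ram_bytes
        self.data, self.ram = OrderedDict(), 0

    def put(self, key, value, size_bytes):
        if key in self.data:
            _, old_sz = self.data.pop(key)
            self.ram -= old_sz
        self.data[key] = (value, size_bytes)
        self.ram += size_bytes
        # Evict least-recently-inserted entries until BOTH limits are satisfied.
        while len(self.data) > self.max_entries or self.ram > self.max_ram:
            _, (_, sz) = self.data.popitem(last=False)
            self.ram -= sz

# size=512 but a tight RAM budget: with 100-byte entries, the RAM cap
# limits the cache to 10 entries long before the 512-entry cap matters.
cache = RamBoundedLRU(max_entries=512, max_ram_bytes=1000)
for i in range(100):
    cache.put(i, "filter", 100)
```

That is the same shape as a 512-entry filterCache with a small maxRamMB: eviction counters can stay quiet on the size dimension while the RAM bound does all the work.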
Re: 3 color jvm memory usage bar
On Thu, 2017-10-19 at 08:56 -0700, Nawab Zada Asad Iqbal wrote: > I see three colors in the JVM usage bar. Dark Gray, light Gray, > white. (left to right). Only one dark and one light color made sense > to me (as i could interpret them as used vs available memory), but > there is light gray between dark gray and white parts. The light grey is the amount of memory reserved by the JVM. It is only visible if you do not specify Xms, so many people do not have that. Generally the dark grey (the amount of heap that is actively used to hold data) will fluctuate a lot and I don't find it very usable for observing and tweaking heap size. The GC-log is better. - Toke Eskildsen, Royal Danish Library
Re: Solr nodes going into recovery mode and eventually failing
Hi Shamik, I agree that your filter cache is not the reason for OOMs. Can you confirm that your fieldCache and fieldValueCache sizes are not consuming too much memory? The next on the list would be some heavy faceting with pivots, but you mentioned that all fields are low cardinality. Do you see any extremely slow queries in your logs? Can you check if there are some deep paging queries? If nothing else, you can always do a heap dump and see what’s in it. And about your filterCache hit ratio: how frequently do you commit? With 400 rq/h it can be that filters are not repeated between two commits. Do you have a high eviction rate? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 20 Oct 2017, at 20:10, shamik wrote: > > Zisis, thanks for chiming in. This is really interesting information and > probably in line with what I'm trying to fix. In my case, the facet fields are > certainly not high-cardinality ones. Most of them have a finite set of data, > the max being 200 (though it has a low usage percentage). Earlier I had > facet.limit=-1, but then scaled down to 200 to eliminate any performance > overhead. > > I was not aware of the maxRamMB parameter, looks like it's only available for > queryResultCache. Is that what you are referring to? Can you please share > your cache configuration? > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Problem JOIN Solr
Hi, I have a problem with the join function (a query across two collections). I have two collections, "ColAAA" and "ColBBB": ColAAA => field ABC, fieldtype "text_general" or "string"; ColBBB => fields XYZ and DEF, fieldtype "string". Example value of field ABC -> "SomeWord 250kg". With the join I want to subquery collection ColBBB on field DEF and use the resulting XYZ values as a query on field ABC of collection ColAAA. Query on collection ColAAA: {!join from=XYZ to=ABC fromIndex=ColBBB}DEF:Something What works: when the value of XYZ in ColBBB is "SomeWord" (without 250kg), it finds a match in ColAAA field ABC "SomeWord 250kg". What does not work: when the value of XYZ in ColBBB is "SomeWord 250kg", it does not find ColAAA field ABC "SomeWord 250kg". What do I miss? Greetings Guido Universitätsklinikum Schleswig-Holstein
Re: Merging is not taking place with tiered merge policy
Chandru, I didn't try the above config, but why have you defined both "mergePolicy" and "mergePolicyFactory", and passed different values for the same parameters? > 10 > 1 > > > 10 > 10 > > Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedin.com/in/sarkaramrit2 On Mon, Oct 23, 2017 at 11:00 AM, Chandru Shanmugasundaram < chandru.shanmugasunda...@exterro.com> wrote: > The following is my solrconfig.xml > > > 1000 > 1 > 15 > false > 1024 > > 10 > 1 > > > 10 > 10 > > hdfs > > 1 > 0 > > > > Please let me know if I should tweak something above > > > -- > Thanks, > Chandru.S >
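Keeping only the factory form avoids defining the policy twice. A sketch of a single definition (the element and parameter names here are the standard TieredMergePolicy ones and are an assumption about what the stripped XML above contained; the values are illustrative):

```xml
<!-- solrconfig.xml, inside <indexConfig>: define the merge policy once -->
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
```

With both a mergePolicy and a mergePolicyFactory present and disagreeing on the same parameters, it is ambiguous which values the index writer actually ends up with, which would explain merges not happening as expected.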