Re: How to block expensive solr queries
On Wed, Oct 9, 2019 at 9:59 AM Wei wrote:
> Thanks all. I debugged a bit and see that timeAllowed does not limit the
> stats call. I also think it would be useful for Solr to support a whitelist
> or blacklist of operations, as Toke suggested; I will create a JIRA issue
> for it. Currently the only option to explore seems to be adding a filter
> to Solr's embedded Jetty. Does anyone have experience doing that? Would I
> also need to change SolrDispatchFilter?
>
> On Tue, Oct 8, 2019 at 3:50 AM Toke Eskildsen wrote:
>
>> On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
>> > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>> ...
>> > Is there a way to block certain solr queries based on url pattern?
>> > i.e. ignore the stats.calcdistinct request in this case.
>>
>> It sounds like it is possible for users to issue arbitrary queries
>> against your Solr installation. As you have noticed, that makes it easy
>> to perform a Denial of Service (intentional or not). Filtering out
>> stats.calcdistinct won't help with the next request for
>> group.ngroups=true, facet.field=unique_id, rows=1 or something else.
>>
>> I recommend you flip your logic: only allow specific types of
>> requests, and put limits on those. To my knowledge that is not a
>> built-in feature of Solr.
>>
>> - Toke Eskildsen, Royal Danish Library
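For the Jetty-filter approach Wei asks about, the heart of such a filter is just a check on the raw query string before the request reaches SolrDispatchFilter. A minimal sketch of that check follows; the class name and the blocked-parameter list are made up for illustration, and in a real deployment this logic would sit inside a javax.servlet.Filter registered ahead of SolrDispatchFilter in Jetty's web.xml (Toke's advice to invert this into an allowlist applies equally well):

```java
import java.util.Set;

// Illustrative sketch only: the blocked-parameter names below are examples,
// not a recommendation. A real deployment would call isBlocked() from a
// servlet Filter registered before SolrDispatchFilter and return HTTP 403
// when it matches.
public class QueryParamBlacklist {
    private static final Set<String> BLOCKED =
        Set.of("stats.calcdistinct", "group.ngroups");

    /** Returns true if the raw query string names any blocked parameter. */
    public static boolean isBlocked(String queryString) {
        if (queryString == null) {
            return false;
        }
        for (String pair : queryString.split("&")) {
            String name = pair.split("=", 2)[0];
            if (BLOCKED.contains(name)) {
                return true;
            }
        }
        return false;
    }
}
```

As Toke notes, a blacklist like this only catches the expensive parameters you have already thought of; flipping the `contains` test into an allowlist of known-safe parameters is the more robust variant.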
Solr-8.2.0 Cannot create collection on CentOS 7.7
I have just installed Solr 8.2.0 on CentOS 7.7.1908. The Java version is as follows:

  openjdk version "11.0.4" 2019-07-16 LTS
  OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)
  OpenJDK 64-Bit Server VM 18.9 (build 11.0.4+11-LTS, mixed mode, sharing)

I am using the following command to create a collection "test" on SolrCloud:

  solr create_collection -c test

The output from the command follows:

  WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
           To turn off: bin/solr config -c test -p 8983 -action set-user-property -property update.autoCreateFields -value false
  ERROR: Failed to create collection 'test' due to: Underlying core creation failed while creating collection: test

The problem seems to be caused by the following error:

  Caused by: java.time.format.DateTimeParseException: Text '2019-10-11T04:46:03.971Z' could not be parsed: null
        at java.base/java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:2017)
        at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1920)
        at org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.parseInstant(ParseDateFieldUpdateProcessorFactory.java:230)
        at org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.validateFormatter(ParseDateFieldUpdateProcessorFactory.java:214)

Note that I have tested this and it works on Windows 10 with Solr 8.2.0 using the following Java version:

  openjdk version "11.0.2" 2019-01-15
  OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
  OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

The full detail from the solr.log file follows:

  2019-10-11 04:45:58.361 INFO  (qtp195801026-19) [   ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :create with params replicationFactor=1=-1=test=test=CREATE=1=json and sendToOCPQueue=true
  2019-10-11 04:45:58.445 INFO  (OverseerThreadFactory-9-thread-1-processing-n:192.168.1.33:8983_solr) [   ] o.a.s.c.a.c.CreateCollectionCmd Create collection test
  2019-10-11 04:45:58.735 INFO  (OverseerStateUpdate-72057977101680640-192.168.1.33:8983_solr-n_00) [   ] o.a.s.c.o.SliceMutator createReplica() { "operation":"ADDREPLICA", "collection":"test", "shard":"shard1", "core":"test_shard1_replica_n1", "state":"down", "base_url":"http://192.168.1.33:8983/solr", "type":"NRT", "waitForFinalState":"false"}
  2019-10-11 04:45:59.114 INFO  (qtp195801026-21) [   x:test_shard1_replica_n1] o.a.s.h.a.CoreAdminOperation core create command qt=/admin/cores=core_node2=test=true=test_shard1_replica_n1=CREATE=1=test=shard1=javabin=2&replicaType=NRT
  2019-10-11 04:45:59.119 INFO  (qtp195801026-21) [   x:test_shard1_replica_n1] o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 transient cores
  2019-10-11 04:46:00.389 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.RequestParams conf resource params.json loaded . version : 0
  2019-10-11 04:46:00.390 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.RequestParams request params refreshed to version 0
  2019-10-11 04:46:00.424 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrResourceLoader [test_shard1_replica_n1] Added 61 libs to classloader, from paths: [/opt/solr-8.2.0/contrib/clustering/lib, /opt/solr-8.2.0/contrib/extraction/lib, /opt/solr-8.2.0/contrib/langid/lib, /opt/solr-8.2.0/contrib/velocity/lib, /opt/solr-8.2.0/dist]
  2019-10-11 04:46:00.814 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrConfig Using Lucene MatchVersion: 8.2.0
  2019-10-11 04:46:01.323 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.s.IndexSchema [test_shard1_replica_n1] Schema name=default-config
  2019-10-11 04:46:03.017 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.s.IndexSchema Loaded schema default-config/1.6 with uniqueid field id
  2019-10-11 04:46:03.205 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.CoreContainer Creating SolrCore 'test_shard1_replica_n1' using configuration from collection test, trusted=true
  2019-10-11 04:46:03.212 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.m.r.SolrJmxReporter JMX monitoring for 'solr.core.test.shard1.replica_n1' (registry 'solr.core.test.shard1.replica_n1') enabled at server: com.sun.jmx.mbeanserver.JmxMBeanServer@606d8acf
  2019-10-11 04:46:03.258 INFO  (qtp195801026-21) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrCore [[test_shard1_replica_n1] ] Opening new SolrCore at [/opt/solr-8.2.0/server/solr/test_shard1_replica_n1], dataDir=[/opt/solr-8.2.0/server/solr/test_shard1_replica_n1/data/]
  2019-10-11 04:46:03.496 INFO  (qtp195801026-21) [c:test s:shard1
igain query parser generating invalid output
Hi,

I apologise in advance for the length of this email, but I want to share my discovery steps to make sure that I haven't missed anything during my investigation...

I am working on a classification project and will be using the classify(model()) stream function to classify documents. I have noticed that the generated models include many noise terms from the (lexically) early part of the term list.

To test, I have used the BBC articles fulltext and category dataset from Kaggle (https://www.kaggle.com/yufengdev/bbc-fulltext-and-category). I have indexed the data into a Solr collection (news_categories) and am performing the following operation to generate a model for documents categorised as "BUSINESS" (only keeping the 100th iteration):

  having(
    train(
      news_categories,
      features(
        news_categories,
        zkHost="localhost:9983",
        q="*:*",
        fq="role:train",
        fq="category:BUSINESS",
        featureSet="business",
        field="body",
        outcome="positive",
        numTerms=500
      ),
      fq="role:train",
      fq="category:BUSINESS",
      zkHost="localhost:9983",
      name="business_model",
      field="body",
      outcome="positive",
      maxIterations=100
    ),
    eq(iteration_i, 100)
  )

The output generated includes "noise" terms such as "1,011.15", "10.3m", "01", "02", "03", "10.50", "04", "05", "06", "07", "09", and these terms all have the same value for idfs_ds ("-Infinity").

Investigating the features() output, it seems that the issue is that the noise terms are being returned with NaN for the score_f field:

  "docs": [
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "1,011.15",
      "idf_d": "-Infinity",
      "index_i": 1,
      "id": "business_1"
    },
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "10.3m",
      "idf_d": "-Infinity",
      "index_i": 2,
      "id": "business_2"
    },
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "01",
      "idf_d": "-Infinity",
      "index_i": 3,
      "id": "business_3"
    },
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "02",
      "idf_d": "-Infinity",
      "index_i": 4,
      "id": "business_4"
    },...
I have examined the code in org/apache/solr/client/solrj/io/stream/FeatureSelectionStream.java and see that the scores being returned by {!igain} include NaN values, as follows:

  {
    "responseHeader":{
      "zkConnected":true,
      "status":0,
      "QTime":20,
      "params":{
        "q":"*:*",
        "distrib":"false",
        "positiveLabel":"1",
        "field":"body",
        "numTerms":"300",
        "fq":["category:BUSINESS",
          "role:train",
          "{!igain}"],
        "version":"2",
        "wt":"json",
        "outcome":"positive",
        "_":"1569982496170"}},
    "featuredTerms":[
      "0","NaN",
      "0.0051","NaN",
      "0.01","NaN",
      "0.02","NaN",
      "0.03","NaN",
      ...

Looking into org/apache/solr/search/IGainTermsQParserPlugin.java, it seems that when a term is not included in the positive or negative documents, the docFreq calculation (docFreq = xc + nc) is 0, which means that subsequent calculations divide by zero and result in NaN, generating these meaningless values for the computed score.

I have patched a local version of Solr to skip terms for which docFreq is 0 in the finish() method of IGainTermsQParserPlugin, and this is now the result:

  {
    "responseHeader":{
      "zkConnected":true,
      "status":0,
      "QTime":260,
      "params":{
        "q":"*:*",
        "distrib":"false",
        "positiveLabel":"1",
        "field":"body",
        "numTerms":"300",
        "fq":["category:BUSINESS",
          "role:train",
          "{!igain}"],
        "version":"2",
        "wt":"json",
        "outcome":"positive",
        "_":"1569983546342"}},
    "featuredTerms":[
      "3",-0.0173133558644304,
      "authority",-0.0173133558644304,
      "brand",-0.0173133558644304,
      "commission",-0.0173133558644304,
      "compared",-0.0173133558644304,
      "condition",-0.0173133558644304,
      "continuing",-0.0173133558644304,
      "deficit",-0.0173133558644304,
      "expectation",-0.0173133558644304,
      ...

To my (admittedly inexpert) eye, it seems like this is producing more reasonable results. With this change in place, train() now produces:

  "idfs_ds": [
    0.6212826193303013,
    0.6434237452075148,
    0.7169578292536639,
    0.741349282377823,
    0.86843471069652,
    1.0140549006400466,
    1.0639267306802198,
    1.0753554265038423,...
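The shape of the patch described above can be sketched as follows. This is an illustrative stand-in, not the actual IGainTermsQParserPlugin source: the method and variable names here are made up, and only the guard itself (skip any term whose combined positive + negative document frequency is zero) reflects the described fix:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only, not the real Solr code. It shows why terms
// absent from both the positive and negative document sets must be skipped:
// the per-term probability divides by docFreq, and 0/0 yields NaN.
public class IGainGuardSketch {

    static double xlogx(double x) {
        return x == 0.0 ? 0.0 : x * Math.log(x);
    }

    /** Scores terms given per-term {positiveCount, negativeCount} pairs. */
    static Map<String, Double> scoreTerms(Map<String, int[]> termCounts) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : termCounts.entrySet()) {
            int xc = e.getValue()[0]; // occurrences in positive docs
            int nc = e.getValue()[1]; // occurrences in negative docs
            int docFreq = xc + nc;
            if (docFreq == 0) {
                continue; // the patch: skip terms seen in neither class
            }
            double p = (double) xc / docFreq; // would be 0/0 = NaN if unguarded
            scores.put(e.getKey(), -(xlogx(p) + xlogx(1.0 - p)));
        }
        return scores;
    }
}
```

With the guard removed, a term with zero counts in both classes produces NaN, which then propagates into featuredTerms exactly as shown in the first response above.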
  "terms_ss": [
    "â",
    "company",
    "market",
    "firm",
    "month",
    "analyst",
    "chief",
    "time",...

I am not sure if I have missed anything, but this seems like it's producing better outcomes. I would appreciate any input on whether I have missed
Re: Zk Status Error
On 10/10/2019 9:00 AM, mdsholund wrote:
> I am also getting this error using ZK 3.5.5 and Solr 7.7.2. I have
> whitelisted mntr but still get a similar exception
>
> 2019-10-10 14:59:01.799 ERROR (qtp591391158-152) [   ] o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.solr.handler.admin.ZookeeperStatusHandler.monitorZookeeper(ZookeeperStatusHandler.java:189)
>
> As far as I know my ensemble is working fine. The output of the mntr
> command looks like it is all at least two values. Is there a way that I
> can see what it is choking on?

You may be running into this problem:

https://issues.apache.org/jira/browse/SOLR-13672

ZK 3.5 changed the output of the "conf" 4lw command in a way that is incompatible with Solr's code. We consider the problem to be a ZK bug, but worked around it in Solr because ZK's typical release schedule is very slow. A new Solr release will come out long before a new ZK release. Even Solr 8.2.0, which was updated to the ZK 3.5.5 client, has this problem. It will be fixed in 8.3.0 when that version is released.

Thanks,
Shawn
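To illustrate the failure mode Shawn describes: if the "conf" 4lw output contains a line without an equals sign, naive key=value splitting fails with exactly the ArrayIndexOutOfBoundsException: 1 seen in the stack trace above. The parser below is a simplified toy, not the actual ZookeeperStatusHandler code, and the sample "conf" lines are assumptions about the ZK 3.5 output rather than a verbatim capture:

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the SOLR-13672 failure mode. Both parsers below are
// simplified stand-ins, not the real Solr code; the input lines are an
// assumed example of ZK 3.5 "conf" output, which mixes key=value lines
// with lines containing no '=' at all.
public class ConfParseSketch {

    /** Naive parser: assumes every line is key=value, so a line with no
     *  '=' makes kv have length 1 and kv[1] throw AIOOBE: 1. */
    static Map<String, String> parseNaive(String confOutput) {
        Map<String, String> result = new HashMap<>();
        for (String line : confOutput.split("\n")) {
            String[] kv = line.split("=");
            result.put(kv[0].trim(), kv[1].trim());
        }
        return result;
    }

    /** Guarded parser, in the spirit of the Solr-side workaround:
     *  lines without a value are simply skipped. */
    static Map<String, String> parseGuarded(String confOutput) {
        Map<String, String> result = new HashMap<>();
        for (String line : confOutput.split("\n")) {
            String[] kv = line.split("=");
            if (kv.length < 2) {
                continue;
            }
            result.put(kv[0].trim(), kv[1].trim());
        }
        return result;
    }
}
```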
AutoAddReplicas doesn't work with TLOG and PULL replicas
We would like to use Solr in a "master-slave" configuration (3 TLOG replicas as "masters" and several PULL replicas as "slaves" for read queries), with the autoAddReplicas option turned on. Here is an example of the initialization query:

http://10.0.48.200:9092/solr/admin/collections?action=CREATE=true=search_cz=10=search_index=2=2=2=routing_key=compositeId

Picture_1 is a screenshot of the live configuration from the Admin UI. I then restart one server which hosts "slave" nodes. About 2 minutes after the new server starts, the autoAddReplicas process in Solr creates new replicas on the new server, but it does not respect the replica type: it always starts an NRT replica, which is wrong. See picture_2, taken after the server restart.

Do you have any solution for automatically surviving a one-server crash (auto-creating a replica of the correct type on a new server and migrating data) when using TLOG and PULL replicas?

Thank you for your answer.

David Kovar
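For reference, since the query string above lost its ampersands in transit, a well-formed Collections API CREATE call with explicit replica types looks roughly like the following. The host, collection name, and counts here are illustrative, not a reconstruction of David's exact parameters:

```shell
# Illustrative CREATE call (host, name, and counts are made-up examples).
# tlogReplicas / pullReplicas request the desired replica types per shard,
# and autoAddReplicas=true enables the recovery behaviour under discussion.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=search_cz\
&numShards=2\
&tlogReplicas=3\
&pullReplicas=2\
&autoAddReplicas=true\
&router.name=compositeId\
&router.field=routing_key"
```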
RE: Zk Status Error
I am also getting this error using ZK 3.5.5 and Solr 7.7.2. I have whitelisted mntr but still get a similar exception:

2019-10-10 14:59:01.799 ERROR (qtp591391158-152) [   ] o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.monitorZookeeper(ZookeeperStatusHandler.java:189)

As far as I know my ensemble is working fine. Every line of the mntr command's output looks like it has at least two values. Is there a way that I can see what it is choking on?

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html