Re: Urgent- General Question about document Indexing frequency in solr
Manisha,

The most general recommendation around commits is to not explicitly commit after every update. Solr has settings that commit automatically once a threshold is met, and delegating commits to that mechanism generally lets you ingest faster. This blog post goes into detail about how to set that up for your situation:

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Kind regards,
Scott

On Wed, Feb 3, 2021 at 5:44 PM Manisha Rahatadkar <manisha.rahatad...@anjusoftware.com> wrote:

> Hi All
>
> Looking for some help on document indexing frequency. I am using Apache Solr 7.7 and the SolrNet library to commit documents to Solr. The summary for this function is:
>
> // Summary:
> // Commits posted documents, blocking until index changes are flushed to disk and
> // blocking until a new searcher is opened and registered as the main query searcher,
> // making the changes visible.
>
> I understand that the documents get reindexed after every commit. I have noticed that as the number of documents increases, the reindexing takes longer, and sometimes I get a Solr connection timeout error.
> I have the following questions:
>
> 1. Is there any frequency suggested by Solr for document insert/update and reindex? Is there any standard recommendation?
> 2. If I remove the copy fields from managed-schema.xml, do I need to delete the existing indexed data from the Solr core and then insert the data and reindex it again?
>
> Thanks in advance.
>
> Regards
> Manisha

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
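As a rough illustration of leaving commit timing to Solr instead of committing after every update, the sketch below builds a JSON update request that relies on the `commitWithin` request parameter. The host, core name, document fields, and 10-second window are assumptions made up for this example. The autoCommit and autoSoftCommit thresholds covered in the blog post are configured separately in solrconfig.xml and are not shown here.

```python
import json
from urllib.parse import urlencode


def build_update_request(base_url, core, docs, commit_within_ms=10000):
    """Return (url, body) for a JSON update that lets Solr schedule the commit.

    No explicit commit=true is sent; commitWithin asks Solr to make the
    documents visible within the given number of milliseconds.
    """
    params = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url.rstrip('/')}/{core}/update?{params}"
    body = json.dumps(docs)
    return url, body


# Hypothetical host and core name, for illustration only.
url, body = build_update_request(
    "http://localhost:8983/solr", "mycore", [{"id": "1", "title": "example"}]
)
print(url)
```

Posting many documents per request this way, rather than one document followed by a blocking commit, is usually where most of the ingest speedup comes from.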
Distributing and scaling Lucene Monitor?
Has anyone built scaling around Lucene Monitor? I worked with it when it was Luwak, but I haven't had to scale it beyond a single node. There's all of the cluster-ish framework in Solr, but Lucene Monitor is fairly disconnected from that. I've seen the URP someone built around it, but that doesn't seem to deal with CRUD operations on the monitor queries themselves. So has anyone built this or given some thought about how to incorporate the monitor index into SolrCloud? Thank you, Scott -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Solr Cloud on Docker?
> > ly, he provides a number of example Docker configurations, from command line parameters to docker-compose files running multiple instances and ZooKeeper quorums.
> > - The Docker extra hosts parameter is useful for adding extra hosts to your container's hosts file, particularly if you have multiple NICs with internal and external interfaces and you want to force communication over a specific one.
> > - We use the Solr Prometheus exporter to collect node metrics. I've found I've needed to reduce the metrics to collect, as having this many nodes overwhelmed it occasionally. From memory it had something to do with concurrent modification of Future objects the collector uses, and it sometimes misses collection cycles. This is not Docker related but related to the size of the Solr cluster and the exporter's ability to handle it.
> > - We use the zkCli script a lot for updating configsets. As I did not want to have to copy them into a container to update them, I just download a copy of the Solr binaries and use it entirely for this ZooKeeper script. It's not elegant, but a number of our devs are not familiar with Docker and this was a nice compromise. Another alternative is to just use the REST API to do any configset manipulation.
> > - We load balance all of these nodes to external clients using an HAProxy Docker image. This, combined with the Docker restart policy and Solr replication and autoscaling capabilities, provides a very stable environment for us.
> >
> > All in all, migrating and running Solr on Docker has been brilliant. It was primarily driven by a need to scale our environment vertically on large hardware instances, as running 100 nodes on bare metal was too big a maintenance and administrative burden for us with a small dev and support team. To date it's been very stable and reliable, so I would recommend the approach if you are in a similar situation.
> > > > Thanks, > > > > Dwane > > > > > > > > > > > > > > > > From: Walter Underwood > > Sent: Saturday, 14 December 2019 6:04 PM > > To: solr-user@lucene.apache.org > > Subject: Solr Cloud on Docker? > > > > Does anyone have experience running a big Solr Cloud cluster on Docker > > containers? By “big”, I mean 35 million docs, 40 nodes, 8 shards, with 36 > > CPU instances. We are running version 6.6.2 right now, but could upgrade. > > > > If people have specific things to do or avoid, I’d really appreciate it. > > > > I got a couple of responses on the Slack channel, but I’d love more > > stories from the trenches. This is a direction for our company > architecture. > > > > We have a master/slave cluster (Solr 4.10.4) that is awesome. I can > > absolutely see running the slaves as containers. For Solr Cloud? Makes me > > nervous. > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Query terms and the match state
Lucene has a SynonymQuery and a BlendedTermQuery that do something like what you want in different ways. However, if you want to keep your existing schema and do this through Solr, you can use the constant score syntax in edismax on each term:

q=name:(corsair)^=1.0 name:(ddr)^=1.0 manu:(corsair)^=1.0 manu:(ddr)^=1.0

The resulting score is the number of clauses that matched, i.e. one point for each term in each field where it occurs. (Note, if you group the terms together in the parentheses like "name:(corsair ddr)^=1.0" you'll only know if either term matched -- the whole clause gets a score of 1.0).

For the techproducts example corpus:

[
  { "name":"CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail", "manu":"Corsair Microsystems Inc.", "score":3.0},
  { "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail", "manu":"Corsair Microsystems Inc.", "score":3.0},
  { "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM", "manu":"A-DATA Technology Inc.", "score":1.0}
]

You could use this as the basis for a function query to gain more control over your scoring.

Hope that helps!
-Scott

On Tue, Sep 3, 2019 at 1:35 PM Kumaresh AK wrote:

> Hello Solr Community!
>
> *Problem*: I wish to know if the result document matched all the terms in the query. The ranking used in Solr works most of the time. In some cases, where one of the terms is rare and occurs in a couple of fields, such documents trump a document which matches all the terms. Ideally I wish to have such a document (one that matches all terms) trump a document that matches only 9/10 terms but matches one of the rare terms twice.
> eg:
> *query1*
> field1:(a b c d) field2:(a b c d)
> Results of the above query look good.
>
> *query2*
> field1:(a b c 5) field2:(a b c 5)
> result:
> doc1: {field1: b c 5 field2: b c 5}
> doc21: {field1: a b c 5 field2: null}
>
> Results are almost good except that doc21 is trailing doc1. There are a few documents similar to doc1, and they push doc21 to the next page (I use the default page size = 10).
>
> I understand that this is how tf-idf works. I tried to boost certain fields to solve this problem, but that breaks normal cases (query1). So I set out to just solve the case where I wish to boost (or) augment a field with that information (as the ratio of matched-terms/total-terms).
>
> *Ask:* Is it possible to get back the terms of the query and their matched state?
>
> I tried
>
> - the debug=query option (with the default select handler)
> - with the terms in the debug response I could write a function query to know their match state
>
> Is this approach safe/performant for production use? Is there a better approach to solve this problem?
>
> Regards,
> Kumaresh

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
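If the per-term constant score clauses are generated by client code, a small helper along these lines could build the query string. This is only a sketch: the function name is made up, and it assumes the terms contain no characters that need escaping in Solr query syntax.

```python
def constant_score_query(fields, terms, score=1.0):
    """Build one constant-score clause per (field, term) pair.

    Each matching clause contributes exactly `score`, so the document
    score counts how many (field, term) pairs matched.
    """
    clauses = [f"{field}:({term})^={score}" for field in fields for term in terms]
    return " ".join(clauses)


q = constant_score_query(["name", "manu"], ["corsair", "ddr"])
print(q)
```

Because every (field, term) pair gets its own clause, a term found in two fields counts twice. A "matched all terms" check would need one clause per term spanning both fields instead.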
Re: BBox question
Hi Fernando,

Solr (Lucene) uses a tree-based filter called a BKD-tree. There's a good write-up of the approach over on the Elasticsearch blog:

https://www.elastic.co/blog/lucene-points-6.0

and a cool animation of it in action on YouTube:

https://www.youtube.com/watch?v=x9WnzOvsGKs

The blog write-up and Jira issue talk about performance vs other approaches.

k/r,
Scott

On Mon, Feb 4, 2019 at 1:17 PM Fernando Otero wrote:

> Hey guys,
> I was wondering if BBoxes use filters (i.e. go through all documents) or use the index to do a range filter?
> It's clear in the docs that the performance is better than geodist, but I couldn't find implementation details. I'm not sure if the performance comes from doing fewer comparisons, simpler calculations, or both (which I assume is the case).
>
> Thanks!
>
> --
> Fernando Otero
> Sr Engineering Manager, Panamera
> Buenos Aires - Argentina
> Email: fernando.ot...@olx.com

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Query over nested documents with an AND Operator
Hi Julia, Keep in mind that in order to facet on child document fields you'll need to use the block join facet component: https://lucene.apache.org/solr/guide/7_4/blockjoin-faceting.html For the query itself you probably need to specify each required attribute value, but looks like you're already heading down that path with the facets. Add required local queries wrapped in the default query parser. The local queries themselves would be block joins similar to this: "+{!parent which=contenttype_s:parentDocument}attributevalue_s:brass +{!parent which=contenttype_s:parentDocument}attributevalue_s:plastic" That requires that a parent document satisfies both child document constraints. Also, if you want to return the child documents you'll need to use the ChildDocTransformerFactory: "fl=id,[child parentFilter=contenttype_s:parentDocument]" (I'm not sure if that's required if you just want to facet on the child doc values and not display the other fields.) Hope that helps! -Scott On Fri, Feb 1, 2019 at 8:51 AM Mikhail Khludnev wrote: > Whats' your current query? It's probably a question of building boolean > query by combining Solr queries. > Note, this datamodel might be a little bit overwhelming, So, if number of > distinct attributename values is around a thousand, just handle it via > dynamic field without nesting docs: > > > brass > > 1 > > > > 4711 > > > > here is a short text dealing with plastic and > > brass > > > > here is a detailed description > > > > parentDocument > > > > > > > > > > > > 2 > > > > 4811 > > > > here is a shorttext > > > > here you will find a detailed > description > > > > parentDocument > > > > > > > > > > > > 2_1 > > > > material > > > > brass > > > > > > > > > > > > 2_2 > > > > material quality > > > > plastic > > > > > > > > > > > > > > > > I need an AND operator between my queries because I want to get as > > accurate hits as possible. I managed to search all Parent and Child > > Documents with one search term and get the right result. 
> > But if I want to search for, say, plastic and brass (that is, two or more search terms), I want to get both the parent document for the matching child documents (article 4811) and article 4711, because in that article the two words appear in the description. But the result of my query is always only article 4711. I know that I could also write the attributes into one field. However, I want to have a facet on the attribute name.
> >
> > I hope you can help me with this problem.
> >
> > Thank you very much,
> >
> > Kind regards
> >
> > Julia Gelszus
>
> --
> Sincerely yours
> Mikhail Khludnev

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
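The pair of required block join clauses from the reply above can be generated for any number of attribute values. The helper below is a minimal sketch: its name is hypothetical, the field names are the ones used in this thread, and it assumes the values need no query escaping.

```python
def required_child_constraints(parent_filter, child_field, values):
    """Build one required {!parent} block join clause per child value.

    Every clause must match, so a parent is returned only if it has a
    child document for each of the given values.
    """
    clauses = [
        f"+{{!parent which={parent_filter}}}{child_field}:{value}"
        for value in values
    ]
    return " ".join(clauses)


q = required_child_constraints(
    "contenttype_s:parentDocument", "attributevalue_s", ["brass", "plastic"]
)
print(q)
```

This covers only the child-document side. Matching article 4711, where both words appear in the parent's own description, would need an additional clause on the parent fields.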
Re: Need to perfom search and group the record on basis of domain,subject,from address and display the count of label i.e inbox,spam
Hi Swapnil,

There wasn't a question in your post, so I'm guessing you're having trouble getting started. Take a look at the JSON Facet API. That should get you most of the way there.

https://lucene.apache.org/solr/guide/7_5/json-facet-api.html

k/r,
Scott

On Fri, Feb 1, 2019 at 7:36 AM swap wrote:

> I need to perform a search, group the records by domain, subject, and from address, and display the count of each label (inbox, spam) and label status (read, unread) with it. The label and label status should be displayed as percentages.
>
> Scenario 1
> The document structure indexed in Solr is shown below. message_id is the unique field in Solr.
> {
> "email_date_time": 1548922689,
> "subject": "abcdef",
> "created": 1548932108,
> "domain": ".com",
> "message_id": "123456789ui",
> "label": "inbox",
> "from_address": "xxxbc.com",
> "email": "g...@gmail.com",
> "label_status": "unread"
> }
>
> {
> "email_date_time": 1548922689,
> "subject": "abcdef",
> "created": 1548932108,
> "domain": ".com",
> "message_id": "zxiu22",
> "label": "inbox",
> "from_address": "xxxbc.com",
> "email": "g...@gmail.com",
> "label_status": "unread"
> }
>
> {
> "email_date_time": 1548922689,
> "subject": "defg",
> "created": 1548932108,
> "domain": ".com",
> "message_id": "ftyuiooo899",
> "label": "inbox",
> "from_address": "xxxbc.com",
> "email": "f...@gmail.com",
> "label_status": "unread"
> }
>
> I have the following points to implement:
>
> 1. Perform the search and group the records by domain, subject, and from address, and display the count of each label (inbox, spam) and label status (read, unread) with it. The label and label status should be displayed as percentages.
>
> 2. Paginate the records along with implementation 1.
>
> The display will be as shown below:
>
> 1. domain name : @ subject:hello from address: abcd@i
>
> inbox percentage : 20% spam percentage : 80%
> read percentage : 30% unread percentage : 70%
>
> 2. domain name : @ subject:hi from address: abcd@i
>
> inbox percentage : 20% spam percentage : 80%
> read percentage : 30% unread percentage : 70%
>
> 3. domain name : @ subject:where from address: abcd@i
>
> inbox percentage : 20% spam percentage : 80%
> read percentage : 30% unread percentage : 70%

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
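One possible starting point with the JSON Facet API is to nest a terms facet per grouping field and count labels at the innermost level, then turn the counts into percentages on the client. The sketch below only builds the request body and does the arithmetic. The helper names are made up, the field names come from the sample documents, and the bucket shape (`val`, `count`) is assumed to match the terms facet response.

```python
import json


def build_facet_request(group_fields, count_fields, limit=10, offset=0):
    """Nest one terms facet per grouping field; the innermost level counts labels."""
    # Innermost facets: one terms facet per counted field (label, label_status).
    facet = {f: {"type": "terms", "field": f} for f in count_fields}
    # Wrap from the innermost grouping field outwards.
    for field in reversed(group_fields):
        facet = {field: {"type": "terms", "field": field, "facet": facet}}
    # Paginate the outermost grouping only.
    top = facet[group_fields[0]]
    top["limit"] = limit
    top["offset"] = offset
    return {"query": "*:*", "limit": 0, "facet": facet}


def percentages(buckets):
    """Convert terms facet buckets into {value: percent of total count}."""
    total = sum(b["count"] for b in buckets)
    if total == 0:
        return {}
    return {b["val"]: round(100.0 * b["count"] / total, 1) for b in buckets}


request = build_facet_request(
    ["domain", "subject", "from_address"], ["label", "label_status"]
)
print(json.dumps(request, indent=2))
print(percentages([{"val": "inbox", "count": 1}, {"val": "spam", "count": 4}]))
```

Paging here applies only to the outermost facet (domain). Paging over distinct domain/subject/from address combinations would need a different layout, for example a single combined field, which is an assumption beyond what the thread describes.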
Re: PatternReplaceFilterFactory problem
Hi Chris, You've included the field definition of type text_en, but in your queries you're searching the field "text", which is of type text_general. That may be the source of your problem, but if looking into that doesn't help send the definition of text_general as well. Hope that helps! -Scott On Mon, Jan 28, 2019 at 6:02 AM Chris Wareham < chris.ware...@graduate-jobs.com> wrote: > I'm trying to index some data which often includes domain names. I'd > like to remove the .com TLD, so I have modified the text_en field type > by adding a PatternReplaceFilterFactory filter. However, it doesn't > appear to be working as a search for "text:(mydomain.com)" matches > records but "text:(mydomain)" does not. > > positionIncrementGap="100"> > > > ignoreCase="true" synonyms="synonyms.txt"/> > ignoreCase="true"/> > > pattern="([-a-z])\.com" replacement="$1"/> > > protected="protwords.txt"/> > > > > > ignoreCase="true" synonyms="synonyms.txt"/> > ignoreCase="true"/> > > pattern="([-a-z])\.com" replacement="$1"/> > > protected="protwords.txt"/> > > > > > The actual field definitions are as follows: > > stored="true" required="true" /> > stored="true" required="true" /> > stored="false" /> > > > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
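To rule out the pattern itself, the replacement can be tried outside Solr. The sketch below uses Python's `re` module as a stand-in for the Java regex engine behind PatternReplaceFilterFactory (the two behave the same for this simple pattern, with `\1` in place of `$1`), and it assumes the token reaches the filter lowercased and intact as `mydomain.com`.

```python
import re

# Same pattern as in the field type: a letter or hyphen followed by ".com".
COM_SUFFIX = re.compile(r"([-a-z])\.com")


def strip_com(token):
    """Drop ".com" while keeping the character captured before it."""
    return COM_SUFFIX.sub(r"\1", token)


print(strip_com("mydomain.com"))
print(strip_com("mydomain.org"))
```

Since the pattern does what it should on a plain token, the field type mismatch described above is the more likely explanation. The Analysis screen in the admin UI shows what each filter actually emits for a given field.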
Re: Aggregate functions
Yes. Have a look at the Facet API: https://lucene.apache.org/solr/guide/7_5/json-facet-api.html On Mon, Jan 28, 2019 at 6:07 AM naga pradeep dhulipalla < naga.prade...@gmail.com> wrote: > Hi Team, > > > > Can we use SUM aggregate function in our SOLR queries. If not is there an > alternative to achieve this. > > My sample query looks like this as mentioned below. > > > > Select duration from tableName where > solr_query='{"q":"(appName:\"test\")"}' > > > > I need the aggregate SUM value of duration column. Thanks for your quick > help. > > > > Regards > > Pradeep > > +917204007740 > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
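A minimal sketch of what such a request body might look like with the JSON Facet API's `sum()` aggregation follows. The helper name and the `total_duration` label are made up; `appName` and `duration` are the fields from the question, and `duration` is assumed to be a numeric field.

```python
import json


def sum_request(query, field, name="total"):
    """Build a JSON request that returns no documents, only sum(field)."""
    return {"query": query, "limit": 0, "facet": {name: f"sum({field})"}}


body = json.dumps(sum_request('appName:"test"', "duration", name="total_duration"))
print(body)
```

The body would be POSTed to the collection's query handler; the sum then comes back under the `facets` section of the response rather than in the document list.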
Re: Region wise query routing with solr
Hi Shruti, Solr clusters should NOT span regions, so when a query hits a particular cluster in a region that query should be handled by nodes in that region and not forwarded to another. My recommendation is to check out cross-datacenter replication and route requests to the correct region (with a load balancer or DNS tricks) rather than queries to the correct cluster. https://lucene.apache.org/solr/guide/7_6/cdcr-architecture.html k/r, Scott On Mon, Jan 28, 2019 at 2:24 AM shruti suri wrote: > Hi, > > I want to configure Region wise query routing with Solr. Suppose, I have my > data center in Singapore and India so if user hit a query from India then > query should fall at Indian data center, likewise for Singapore. How can I > achieve this? Is there any such functionality in Solr or SolrCloud. > > Thanks > > > > - > Regards > Shruti > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Active node "kicked out" when starting a new node
Hi Teddie,

Take a look at the core.properties file on the cloned node or its clone. I suspect there's info in it that describes which collection and shard that node is responsible for. ZooKeeper maintains a mapping of node addresses to cores, and you can lock a node out of the cluster if you're not careful. This used to be a common mistake with naive autoscaling, where a "new" node would spin up with the same IP as an old node before the old one was properly removed from the cluster.

Solr 7 has better autoscaling capabilities now:

https://lucene.apache.org/solr/guide/7_6/solrcloud-autoscaling-overview.html

k/r,
Scott

On Mon, Jan 28, 2019 at 1:44 AM teddie_lee wrote:

> Hi,
>
> I have a SolrCloud cluster with 3 nodes running on AWS. My collection is created with numShards=1 and replicationFactor=3. Recently, because we needed to run a stress test, our ops cloned a new machine with exactly the same configuration as one of the nodes in the existing cluster (let's say the new machine is node4 and the node being cloned is node1).
>
> However, after I started node4 by mistake (node4 is supposed to start in standalone mode; I just forgot to remove the ZooKeeper configuration), I could see that node4 took the place of node1 in the Admin UI. Then I found that the directory 'items_shard1_replica_n1' under the path '../solr/server/solr/' no longer existed on node1. Instead, the directory had been copied to node4.
>
> I tried to stop Solr on node4 and restart Solr on node1, but to no avail. It seems like node1 can't rejoin the cluster automatically. Then I found that even when I started Solr on node4, the status of node4 was still 'Down' and never became 'Recovering', while the rest of the nodes in the cluster were 'Active'.
>
> So the final solution was to copy the directory 'items_shard1_replica_n1' from node4 back to node1 and restart Solr on node1. Then node1 joined the cluster automatically and everything seems fine.
>
> My question is why this would happen? Or are there any documents about how SolrCloud manages the cluster behind the scenes?
>
> Thanks,
> Teddie

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Log Statements: Collection, Shard, Replica, and Core Info Missing
Hi Alicia, You've probably already tried this but just to check all the basics, verify that each log4j2.xml file is the same on all of your servers. Then go to the logging config admin page on each machine and verify that none of the overrides have been enabled. The overrides there are temporary, so you can either reset them if they've been changed or restart the instance to get back to default. If none of that helps let us know how many nodes you're running, and double-check the file permissions on log4j2.xml. You could also make a slight modification to the format string just to verify that it's indeed being read. Hope that helps! Scott On Fri, Jan 25, 2019 at 6:35 PM Alicia Broederdorf wrote: > I’m using the SLF4J Reporter for logging metrics ( > https://lucene.apache.org/solr/guide/7_5/metrics-reporting.html#slf4j-reporter). > I have two collections with 5 shards each. Only 3 shards of one collection > are printing collection, shard, replica, and core data in the log > statements, the others do not. For the same metric log statement this data > is only present for 3 of the 10 shards. > > The three shards will have something like: 2019-01-25 21:41:05.297 INFO > (metrics-org.apache.solr.metrics.reporters.SolrSlf4jReporter-6-thread-1) > [c:coll_1 s:shard2 r:core_node13 x:coll_1_shard2_replica_n10] type=GAUGE, > name=SEARCHER.searcher.numDocs, value=236140 > > Others will have: 2019-01-25 21:41:07.125 INFO > (metrics-org.apache.solr.metrics.reporters.SolrSlf4jReporter-8-thread-1) [ > ] type=GAUGE, name=SEARCHER.searcher.numDocs, value=899794 > > > > Here is the config for my metrics log in log4j2.xml: > name="MetricsFile" > fileName="<%= @solr_logs %>/solr_metrics.log" > filePattern="<%= @solr_logs %>/solr_metrics.log.%i" > > > > %d{-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} > %X{replica} %X{core}] %c{1.} %m%n > > > > > > > > > > Any thoughts on how to get the collection, shard, replica, and core data > printed in every log statement? 
> > Thanks for the help! > Alicia > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Need help on Solr authorization
allback.succeeded(AbstractConnection.java:283)\n\tat > > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat > > org.eclipse.jetty.io > .SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat > > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat > > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat > > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat > > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat > > > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat > > java.lang.Thread.run(Thread.java:748)\nCaused by: > > javax.net.ssl.SSLHandshakeException: > > sun.security.validator.ValidatorException: PKIX path building failed: > > sun.security.provider.certpath.SunCertPathBuilderException: unable to > find > > valid certification path to requested target\n\tat > > sun.security.ssl.Alerts.getSSLException(Alerts.java:192)\n\tat > > sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)\n\tat > > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)\n\tat > > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)\n\tat > > > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)\n\tat > > > sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)\n\tat > > sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)\n\tat > > sun.security.ssl.Handshaker.process_record(Handshaker.java:961)\n\tat > > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)\n\tat > > > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)\n\tat > > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)\n\tat > > > 
sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)\n\tat > > > org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)\n\tat > > > org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)\n\tat > > > org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)\n\tat > > > org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359)\n\tat > > > org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)\n\tat > > > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)\n\tat > > > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)\n\tat > > org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)\n\tat > > > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)\n\tat > > > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\tat > > > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\tat > > > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\tat > > > org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:618)\n\t... 
> > 33 more\nCaused by: sun.security.validator.ValidatorException: PKIX path > > building failed: > > sun.security.provider.certpath.SunCertPathBuilderException: unable to > find > > valid certification path to requested target\n\tat > > > sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)\n\tat > > > sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)\n\tat > > sun.security.validator.Validator.validate(Validator.java:260)\n\tat > > > sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)\n\tat > > > sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)\n\tat > > > sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)\n\tat > > > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)\n\t... > > 53 more\nCaused by: > > sun.security.provider.certpath.SunCertPathBuilderException: unable to > find > > valid certification path to requested target\n\tat > > > sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)\n\tat > > > sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)\n\tat > > java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)\n\tat > > > sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)\n\t... > > 59 more\n", > > > > "code":500}} > > > > > > > > > > > > Regards, > > > > Sathish. > > > > > > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: [QA-search] About field setting
No, you have to tokenize before you filter, but the Keyword tokenizer outputs the whole input text as a single token. On Thu, Jan 17, 2019 at 11:36 PM 유정인 wrote: > hi > Can you use multiple query analyzers to search for or? > > Ex) > > positionIncrementGap="100" multiValued="true"> > > > > > >ignoreCase="true"/> > > > > > > > > > >ignoreCase="true"/> > > > > > > > > > >ignoreCase="true"/> > >ignoreCase="true" synonyms="synonyms.txt"/> > > > > > > > > > > Can you get synonyms to run before tokenzier? > > Ex) > > positionIncrementGap="100" multiValued="true"> > > > > > >ignoreCase="true"/> > > > > > > > > ignoreCase="true" synonyms="synonyms.txt"/> > > > >ignoreCase="true"/> > > > > > > > > > > > > thanks > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: regarding debugging solr in eclipse
This blog article might help: https://opensourceconnections.com/blog/2013/04/13/how-to-debug-solr-with-eclipse/ On Fri, Jan 18, 2019 at 6:53 AM SAGAR INGALE wrote: > Can anybody tell me how to debug solr in eclipse, if possible how can I > build a maven project and launch the jetty server in debug mode? > Thanks. Regards > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: So Many Zookeeper Warnings--There Must Be a Problem
Good! Hopefully that's your smoking gun.

The port settings are fine, but since you're deploying to separate servers you don't need different ports in the "server.x=" section. This section of the docs explains it better:

http://zookeeper.apache.org/doc/r3.4.7/zookeeperAdmin.html#sc_zkMulitServerSetup

On Thu, Jan 3, 2019 at 3:49 PM Joe Lerner wrote:

> Hi Scott,
>
> First, we are definitely misconfigured for the myid thing. Basically two of them were identifying as ID #2, and they are the two ZKs claiming to be the leader. Definitely something to straighten out!
>
> Our 3 lines in zoo.cfg look correct. Except they look like this:
>
> clientPort:2181
>
> server.1=host1:2190:2195
> server.2=host2:2191:2196
> server.3=host3:2192:2197
>
> Notice the port range, and overlap...
>
> Is that.../copacetic/?
>
> Thanks!
>
> Joe

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: So Many Zookeeper Warnings--There Must Be a Problem
Hi Joe,

Yeah, two leaders is definitely a problem. I'd fix that before wading through the error logs.

Check out zoo.cfg on each server. You should have three lines at the end similar to this:

server.1=host1:2181:2281
server.2=host2:2182:2282
server.3=host3:2183:2283

(substitute "host*" with the right IP or address of your servers)

Also on each server, check the file "myid". It should have a single number that maps to the list above. For example, on host1 your myid file should contain a single value of "1" in it. On host2 the file should contain "2".

You'll probably have to delete the contents of the zk data directory and rebuild your collections.

On Thu, Jan 3, 2019 at 2:47 PM Joe Lerner wrote:

> Hi,
>
> We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2), and 3 zookeeper instances (on servers #1, #2, and #3). Things work fine (although we had a couple of brief unexplained outages), but:
>
> One worrisome thing is that when I status zookeeper on #1 and #2, I get Mode=Leader on both--#3 shows follower. This seems to be a pretty permanent condition, at least right now as I look at it. And there isn't any big maintenance or anything going on.
>
> Also, we are getting *TONS* of continuous log warnings from our client applications. From one server it shows this:
>
> And from another server we get this:
>
> These are making our logs impossible to read, but worse, I assume indicate that something is wrong.
>
> Thanks for any help!
>
> Joe Lerner

-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
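The zoo.cfg and myid check described above is easy to script once the files have been collected from each server. The sketch below is illustrative only: the helper names are made up, the sample config reuses the placeholder hosts from the reply, and the `myid_by_host` mapping is assumed to have been gathered by hand.

```python
import re

# Matches lines such as "server.1=host1:2181:2281".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)", re.MULTILINE)


def parse_servers(zoo_cfg_text):
    """Map each server id in zoo.cfg to its host."""
    return {int(m.group(1)): m.group(2) for m in SERVER_LINE.finditer(zoo_cfg_text)}


def check_myids(servers, myid_by_host):
    """Report hosts whose myid is duplicated or disagrees with zoo.cfg."""
    problems = []
    seen = {}
    for host, myid in myid_by_host.items():
        if servers.get(myid) != host:
            problems.append(f"{host}: myid {myid} does not match zoo.cfg")
        if myid in seen:
            problems.append(f"{host}: myid {myid} already used by {seen[myid]}")
        seen.setdefault(myid, host)
    return problems


ZOO_CFG = """\
server.1=host1:2181:2281
server.2=host2:2182:2282
server.3=host3:2183:2283
"""

servers = parse_servers(ZOO_CFG)
# Two servers claiming id 2, as in the situation described in this thread.
for problem in check_myids(servers, {"host1": 2, "host2": 2, "host3": 3}):
    print(problem)
```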
Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)
Dani, It might be time to attach some instrumentation to one of your nodes. Finding out which classes are occupying the memory will help narrow the issue. Are you using a lot of facets, grouping, or stats during your queries? Also, when you were doing Master/Slave, was that on the same version of Solr as you're using now in SolrCloud mode? -Scott On Mon, Aug 28, 2017 at 4:50 AM, Daniel Ortega wrote: > Hi Scott, > > Yes, we think that our usage scenario falls into Index-Heavy/Query-Heavy > too. We have tested with several values in softcommit/hardcommit values > (from few seconds to minutes) with no appreciable improvements :( > > Thanks for your reply! > > - Daniel > > 2017-08-25 6:45 GMT+02:00 Scott Stults >: > > > Hi Dani, > > > > It seems like your use case falls into the Index-Heavy / Query-Heavy > > category, so you might try increasing your hard commit frequency to 15 > > seconds rather than 15 minutes: > > > > https://lucidworks.com/2013/08/23/understanding- > > transaction-logs-softcommit-and-commit-in-sorlcloud/ > > > > > > -Scott > > > > On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega < > > danielortegauf...@gmail.com > > > wrote: > > > > > Hi Scott, > > > > > > In our indexing service we are using that client too > > > (org.apache.solr.client.solrj.impl.CloudSolrClient) :) > > > > > > This is out Update Request Processor chain configuration: > > > > > > > > name > > > ="signature"> true > name="signatureField"> > > > hash false > > "signatureClass">solr.processor.Lookup3Signature > > > > > < > > > updateRequestProcessorChain processor="signature" name="dedupe"> > > > > class="solr.LogUpdateProcessorFactory" /> > > "solr.RunUpdateProcessorFactory" /> > < > > > requestHandler name="/update" class="solr.UpdateRequestHandler" > > > name= > > > "defaults"> dedupe > > > > > > > > Thanks for your reply :) > > > > > > - Dani > > > > > > 2017-08-24 14:49 GMT+02:00 Scott Stults > opensourceconnections.com > > > >: > > > > > > > Hi Daniel, > > > > > > > > SolrJ has 
a few client implementations to choose from: > CloudSolrClient, > > > > ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You > said > > > your > > > > query service uses CloudSolrClient, but it would be good to verify > > which > > > > implementation your indexing service uses. > > > > > > > > One of the problems you might be having is with your deduplication > > step. > > > > Can you post your Update Request Processor Chain? > > > > > > > > > > > > -Scott > > > > > > > > > > > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega < > > > > danielortegauf...@gmail.com> > > > > wrote: > > > > > > > > > Hi Scott, > > > > > > > > > > - *Can you describe the process that queries the DB and sends > records > > > to > > > > * > > > > > *Solr?* > > > > > > > > > > We are enqueueing ids during every ORACLE transaction (in > > > > insert/updates). > > > > > > > > > > An application dequeues every id and perform queries against dozen > of > > > > > tables in the relational model to retrieve the fields to build the > > > > > document. As we know that we are modifying the same ORACLE row in > > > > > different (but consecutive) transactions, we store only the last > > > version > > > > of > > > > > the modified documents in a map data structure. > > > > > > > > > > The application has a configurable interval to send the documents > > > stored > > > > in > > > > > the map to the update handler (we have tested different intervals > > from > > > > few > > > > > milliseconds to several seconds) using the SolrJ client. Actually > we > > > are > > > > > sending all the documents every 15 seconds. > > > > > > > > > > This application is developed using Java, Spring and Maven and we > > have > > > > > several instances. > > &g
Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)
Hi Dani, It seems like your use case falls into the Index-Heavy / Query-Heavy category, so you might try increasing your hard commit frequency to 15 seconds rather than 15 minutes: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ -Scott On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega wrote: > Hi Scott, > > In our indexing service we are using that client too > (org.apache.solr.client.solrj.impl.CloudSolrClient) :) > > This is out Update Request Processor chain configuration: > > name > ="signature"> true > hash false "signatureClass">solr.processor.Lookup3Signature > < > updateRequestProcessorChain processor="signature" name="dedupe"> class="solr.LogUpdateProcessorFactory" /> "solr.RunUpdateProcessorFactory" /> < > requestHandler name="/update" class="solr.UpdateRequestHandler" > name= > "defaults"> dedupe > > Thanks for your reply :) > > - Dani > > 2017-08-24 14:49 GMT+02:00 Scott Stults >: > > > Hi Daniel, > > > > SolrJ has a few client implementations to choose from: CloudSolrClient, > > ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You said > your > > query service uses CloudSolrClient, but it would be good to verify which > > implementation your indexing service uses. > > > > One of the problems you might be having is with your deduplication step. > > Can you post your Update Request Processor Chain? > > > > > > -Scott > > > > > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega < > > danielortegauf...@gmail.com> > > wrote: > > > > > Hi Scott, > > > > > > - *Can you describe the process that queries the DB and sends records > to > > * > > > *Solr?* > > > > > > We are enqueueing ids during every ORACLE transaction (in > > insert/updates). > > > > > > An application dequeues every id and perform queries against dozen of > > > tables in the relational model to retrieve the fields to build the > > > document. 
As we know that we are modifying the same ORACLE row in > > > different (but consecutive) transactions, we store only the last > version > > of > > > the modified documents in a map data structure. > > > > > > The application has a configurable interval to send the documents > stored > > in > > > the map to the update handler (we have tested different intervals from > > few > > > milliseconds to several seconds) using the SolrJ client. Actually we > are > > > sending all the documents every 15 seconds. > > > > > > This application is developed using Java, Spring and Maven and we have > > > several instances. > > > > > > -* Is it a SolrJ-based application?* > > > > > > Yes, it is. We aren't using the last version of SolrJ client (we are > > > currently using SolrJ v6.3.0). > > > > > > - *If it is, which client package are you using?* > > > > > > I don't know exactly what do you mean saying 'client package' :) > > > > > > - *How many documents do you send at once?* > > > > > > It depends on the defined interval described before and the number of > > > transactions executed in our relational database. From dozens to few > > > hundreds (and even thousands). > > > > > > - *Are you sending your indexing or query traffic through a load > > balancer?* > > > > > > We aren't using a load balancer for indexing, but we have all our Rest > > > Query services through an HAProxy (using 'leastconn' algorithm). The > Rest > > > Query Services performs queries using the CloudSolrClient. > > > > > > Thanks for your reply, > > > if you need any further information don't hesitate to ask > > > > > > Daniel > > > > > > 2017-08-23 14:57 GMT+02:00 Scott Stults > opensourceconnections.com > > > >: > > > > > > > Hi Daniel, > > > > > > > > Great background information about your setup! I've got just a few > more > > > > questions: > > > > > > > > - Can you describe the process that queries the DB and sends records > to > > > > Solr? > > > > - Is it a SolrJ-based application? 
> > > > - If it is, which client package are you using? > > > > - How many documents do you send at once? &g
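To put the 15 minutes versus 15 seconds suggestion in numbers, here is a rough sketch using the indexing rates Daniel reported in this thread (~100 documents per second, peaks of ~1000). It assumes, as a simplification, that the number of documents received between two hard commits is a fair proxy for how much the transaction log has to hold; the function name is just illustrative.

```python
def docs_between_hard_commits(docs_per_second, hard_commit_seconds):
    """Documents received between two consecutive hard commits."""
    return docs_per_second * hard_commit_seconds


# Rates come from the thread; the two intervals are the ones being compared.
for label, rate in (("typical", 100), ("peak", 1000)):
    for interval in (15 * 60, 15):
        pending = docs_between_hard_commits(rate, interval)
        print(f"{label:7s} {rate:5d} docs/s, commit every {interval:4d}s "
              f"-> {pending:7d} docs per commit window")
```

At the typical rate that is 90,000 documents per window with a 15 minute hard commit against 1,500 with a 15 second one, which is the kind of difference worth testing before looking elsewhere.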
Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)
Hi Daniel, SolrJ has a few client implementations to choose from: CloudSolrClient, ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You said your query service uses CloudSolrClient, but it would be good to verify which implementation your indexing service uses. One of the problems you might be having is with your deduplication step. Can you post your Update Request Processor Chain? -Scott On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega wrote: > Hi Scott, > > - *Can you describe the process that queries the DB and sends records to * > *Solr?* > > We are enqueueing ids during every ORACLE transaction (in insert/updates). > > An application dequeues every id and perform queries against dozen of > tables in the relational model to retrieve the fields to build the > document. As we know that we are modifying the same ORACLE row in > different (but consecutive) transactions, we store only the last version of > the modified documents in a map data structure. > > The application has a configurable interval to send the documents stored in > the map to the update handler (we have tested different intervals from few > milliseconds to several seconds) using the SolrJ client. Actually we are > sending all the documents every 15 seconds. > > This application is developed using Java, Spring and Maven and we have > several instances. > > -* Is it a SolrJ-based application?* > > Yes, it is. We aren't using the last version of SolrJ client (we are > currently using SolrJ v6.3.0). > > - *If it is, which client package are you using?* > > I don't know exactly what do you mean saying 'client package' :) > > - *How many documents do you send at once?* > > It depends on the defined interval described before and the number of > transactions executed in our relational database. From dozens to few > hundreds (and even thousands). 
> > - *Are you sending your indexing or query traffic through a load balancer?* > > We aren't using a load balancer for indexing, but we have all our Rest > Query services through an HAProxy (using 'leastconn' algorithm). The Rest > Query Services performs queries using the CloudSolrClient. > > Thanks for your reply, > if you need any further information don't hesitate to ask > > Daniel > > 2017-08-23 14:57 GMT+02:00 Scott Stults >: > > > Hi Daniel, > > > > Great background information about your setup! I've got just a few more > > questions: > > > > - Can you describe the process that queries the DB and sends records to > > Solr? > > - Is it a SolrJ-based application? > > - If it is, which client package are you using? > > - How many documents do you send at once? > > - Are you sending your indexing or query traffic through a load balancer? > > > > If you're sending documents to each replica as fast as they can take > them, > > you might be seeing a bottleneck at the shard leaders. The SolrJ > > CloudSolrClient finds out from Zookeeper which nodes are the shard > leaders > > and sends docs directly to them. > > > > > > -Scott > > > > On Tue, Aug 22, 2017 at 2:16 PM, Daniel Ortega < > > danielortegauf...@gmail.com> > > wrote: > > > > > *Main Problems* > > > > > > > > > We are involved in a migration from Solr Master/Slave infrastructure to > > > SolrCloud infrastructure. > > > > > > > > > > > > The main problems that we have now are: > > > > > > > > > > > >- Excessive resources consumption: Currently we have 5 instances > with > > 80 > > >processors/768 GB RAM each instance using SSD Hard Disk Drives that > > > doesn't > > >support the load that we have in the other architecture. In our > > >Master-Slave architecture we have only 7 Virtual Machines with lower > > > specs > > >(4 processors and 16 GB each instance using SSD Hard Disk Drives > too). 
> > > So, > > >at the moment our SolrCloud infrastructure is wasting several dozen > > > times > > >more resources than our Solr Master/Slave infrastructure. > > >- Despite spending more resources we have worst query times > (compared > > to > > >Solr in master/slave architecture) > > > > > > > > > *Search infrastructure (SolrCloud infrastructure)* > > > > > > > > > > > > As we cannot use DIH Handler (which is what we use in Solr > Master/Slave), > > > we > > > have developed an application which reads every transaction from &
Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)
Hi Daniel, Great background information about your setup! I've got just a few more questions: - Can you describe the process that queries the DB and sends records to Solr? - Is it a SolrJ-based application? - If it is, which client package are you using? - How many documents do you send at once? - Are you sending your indexing or query traffic through a load balancer? If you're sending documents to each replica as fast as they can take them, you might be seeing a bottleneck at the shard leaders. The SolrJ CloudSolrClient finds out from Zookeeper which nodes are the shard leaders and sends docs directly to them. -Scott On Tue, Aug 22, 2017 at 2:16 PM, Daniel Ortega wrote: > *Main Problems* > > > We are involved in a migration from Solr Master/Slave infrastructure to > SolrCloud infrastructure. > > > > The main problems that we have now are: > > > >- Excessive resources consumption: Currently we have 5 instances with 80 >processors/768 GB RAM each instance using SSD Hard Disk Drives that > doesn't >support the load that we have in the other architecture. In our >Master-Slave architecture we have only 7 Virtual Machines with lower > specs >(4 processors and 16 GB each instance using SSD Hard Disk Drives too). > So, >at the moment our SolrCloud infrastructure is wasting several dozen > times >more resources than our Solr Master/Slave infrastructure. >- Despite spending more resources we have worst query times (compared to >Solr in master/slave architecture) > > > *Search infrastructure (SolrCloud infrastructure)* > > > > As we cannot use DIH Handler (which is what we use in Solr Master/Slave), > we > have developed an application which reads every transaction from Oracle, > builds a document collection searching in the database and sends the result > to the */update* handler every 200 milliseconds using SolrJ client. 
This > application tries to delete the possible duplicates in each update window, > but we are using solr’s de-duplication techniques > <https://emea01.safelinks.protection.outlook.com/?url= > https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay% > 2Fsolr%2FDe-Duplication&data=02%7C01%7Cdortega%40idealista.com% > 7Cb169ea024abc4954927208d4bc6868eb%7Cd78b7929c2a34897ae9a7d8f8dc1 > a1cf%7C0%7C0%7C636340604697721266&sdata=WEhzoHC1Bf77K706% > 2Fj2wIWOw5gzfOgsP1IPQESvMsqQ%3D&reserved=0> > too. > > > > We are indexing ~100 documents per second (with peaks of ~1000 documents > per second). > > > > Every search query is centralized in other application which exposes a DSL > behind a REST API and uses SolrJ client too to perform queries. We have > peaks of 2000 QPS. > > *Cluster structure **(SolrCloud infrastructure)* > > > > At the moment, the cluster has 30 SolrCloud instances with the same specs > (Same physical hosts, same JVM Settings, etc.). > > > > *Main collection* > > > > In our use case we are using this collection as a NoSQL database basically. > Our document is composed of about 300 fields that represents an advert, and > is a denormalization of its relational representation in Oracle. > > > We are using all our nodes to store the collection in 3 shards. So, each > shard has 10 replicas. > > > At the moment, we are only indexing a subset of the adverts stored in > Oracle, but our goal is to store all the ads that we have in the DB (a few > tens of millions of documents). We have NRT requirements, so we need to > index every document as soon as posible once it’s changed in Oracle. > > > > We have defined the properties of each field (if it’s stored/indexed or > not, if should be defined as DocValue, etc…) considering the use of that > field. > > > > *Index size **(SolrCloud infrastructure)* > > > > The index size is currently above 6 GB, storing 1.300.000 documents in each > shard. So, we are storing 3.900.000 documents and the total index size is > 18 GB. 
> > > > *Indexation **(SolrCloud infrastructure)* > > > > The commits *aren’t* triggered by the application described before. The > hardcommit/softcommit interval are configured in Solr: > > > > - *HardCommit:* every 15 minutes (with opensearcher = false) >- *SoftCommit:* every 5 seconds > > > > *Apache Solr Version* > > > > We are currently using the last version of Solr (6.6.0) under an Oracle VM > (Java(TM) SE Runtime Environment (build 1.8.0_131-b11) Oracle (64 bits)) in > both deployments. > > > The question is... What is wrong here?!?!?! > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
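The shard-leader point above can be made concrete with the numbers from Daniel's description (3 shards, 10 replicas each, ~100 documents per second). This is only a back-of-the-envelope sketch: it assumes documents hash evenly across shards and that every replica indexes every document of its shard, and the function and key names are made up for illustration.

```python
def indexing_fanout(docs_per_second, shards, replicas_per_shard):
    """Estimate per-leader and cluster-wide indexing work for one collection."""
    return {
        # Documents each shard leader receives from the client.
        "docs_per_leader": docs_per_second / shards,
        # Copies each leader forwards to the other replicas of its shard.
        "forwards_per_leader": docs_per_second * (replicas_per_shard - 1) / shards,
        # Total index operations per second across all replicas.
        "cluster_index_ops": docs_per_second * replicas_per_shard,
    }


for rate in (100, 1000):
    print(rate, indexing_fanout(rate, shards=3, replicas_per_shard=10))
```

Under those assumptions each incoming document is indexed ten times, so 100 documents per second becomes roughly 1,000 index operations per second cluster-wide, with each of the three leaders forwarding about 300 copies per second.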
Re: solr jetty based auth and distributed solr requests
Radhakrishnan, I'm not sure offhand whether or not that's possible. It sounds like you've done enough analysis to write a good Jira ticket, so if nobody speaks up on the mailing list, go ahead and create one. Cheers, Scott On Tue, Aug 22, 2017 at 7:15 PM, radha krishnan wrote: > Hi, > > I enabled jetty basic auth for solr by making changes to jetty.xml and add > a 'realm.properties' > > while basic queries are working, queries involving more than one shard is > not working. i went through the code and figured out that in > HttpShardHandler, there is no provision to specify a username:password > > I went through a lot of JIRA's/posts and was not able to figure out whether > it is really possible to do. > > can we do a distributed operation with jetty base basic auth. can you > please give share the relevant links so that i can try it out. > > > Thanks, > Radhakrishnan > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Facet date Range without start and and date
No it's not. Use something like facet.date.start=-00-00T00:00:00Z and facet.date.end=3000-00-00T00:00:00Z. k/r, Scott On Mon, Jan 9, 2017 at 10:46 AM, nabil Kouici wrote: > Hi All, > Is it possible to have facet date range without specifying start and and > of the range. > Otherwise, is it possible to put in the same request start to min value > and end to max value. > Thank you. > Regards,NKI. > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
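A minimal sketch of building such a request with explicit, deliberately wide bounds follows. Everything specific here is an assumption for illustration: the field name created_at, the bounds, and the one-year gap are made up, and the parameters use the facet.range.* spelling, which as far as I know is the current form of the older facet.date.* names used above.

```python
from urllib.parse import urlencode


def date_range_facet_params(field, start, end, gap):
    """Build query parameters for a range facet with explicit bounds."""
    return {
        "q": "*:*",
        "rows": "0",
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


# Hypothetical field and bounds; pick values wide enough for your own data.
params = date_range_facet_params(
    "created_at", "1900-01-01T00:00:00Z", "2100-01-01T00:00:00Z", "+1YEAR"
)
query_string = urlencode(params)
print(query_string)
```

Note that the wider the bounds and the smaller the gap, the more buckets come back, so it may be worth looking up the real minimum and maximum first and facetting only over that span.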
Re: regarding extending classes in org.apache.solr.client.solrj.io.stream.metrics package
Radhakrishnan, That would be an appropriate Jira ticket. You can submit it here: https://issues.apache.org/jira/browse/solr Also, if you want to submit a patch, check out the guidelines (it's pretty easy): https://wiki.apache.org/solr/HowToContribute k/r, Scott On Tue, Jan 10, 2017 at 7:12 PM, radha krishnan wrote: > Hi, > > i want to extend the update(Tuple tuple) method in MaxMetric,. MinMetric, > SumMetric, MeanMetric classes. > > can you please make the below metioned variables and methods in the above > mentioned classes as protected so that it will be easy to extend > > variables > --- > > longMax > > doubleMax > > columnName > > > and > > methods > > --- > > init > > > > Thanks, > > Radhakrishnan D > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Max length of solr query
That doesn't seem like an efficient use of a search engine. Maybe what you want to do is use streaming expressions to process some data: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions k/r, Scott On Thu, Jan 12, 2017 at 11:36 AM, 武井宜行 wrote: > Hi,all > > My Application throws too large query to solr server with solrj > client.(Http Method is Post) > > I have two questions. > > At first,I would like to know the limit of clauses of Boolean Query.I Know > the number is restricted to 1024 by default, and I can increase the limit > by setting setMaxClauseCount,but what is the limit of increasing clauses? > > Next,if there is no limit of increasing clauses,is there the limit of query > length?My Application throws to large query like this with solrj client. > > item_id: OR item_id: OR item_id: ... > (The number of item_id is maybe over than one million) > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
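If the lookups have to stay as plain queries for now, one stopgap is to split the id list into batches that each stay under the clause limit. This is only a sketch, not a recommendation: it assumes the default limit of 1024 mentioned in the question, the ids are invented, and a million ids would still mean about a thousand round trips, which is why something like streaming expressions is the better fit.

```python
def batched_id_queries(field, ids, max_clauses=1024):
    """Yield OR queries over `field`, each with at most `max_clauses` clauses."""
    for start in range(0, len(ids), max_clauses):
        chunk = ids[start:start + max_clauses]
        yield " OR ".join(f"{field}:{item_id}" for item_id in chunk)


# Hypothetical ids, just to show the batching.
ids = list(range(1, 2501))
queries = list(batched_id_queries("item_id", ids))
print(len(queries), "queries; first one starts with:", queries[0][:40])
```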
Re: [More Like This] Query building
quency > >>>>> > for the term t . > >>>>> > Then we build the termQuery : > >>>>> > > >>>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, > tf)); > >>>>> > > >>>>> > In this way we lose a lot of precision. > >>>>> > Not sure why we do that. > >>>>> > I would prefer to keep the relation between terms and fields. > >>>>> > The MLT query can improve a lot the quality. > >>>>> > If i run the MLT on 2 fields : *description* and *facilities* for > >>>>> example. > >>>>> > It is likely I want to find documents with similar terms in the > >>>>> > description and similar terms in the facilities, without mixing up > >>>>> the > >>>>> > things and loosing the semantic of the terms. > >>>>> > > >>>>> > Let me know your opinion, > >>>>> > > >>>>> > Cheers > >>>>> > > >>>>> > > >>>>> > -- > >>>>> > -- > >>>>> > > >>>>> > Benedetti Alessandro > >>>>> > Visiting card : http://about.me/alessandro_benedetti > >>>>> > > >>>>> > "Tyger, tyger burning bright > >>>>> > In the forests of the night, > >>>>> > What immortal hand or eye > >>>>> > Could frame thy fearful symmetry?" > >>>>> > > >>>>> > William Blake - Songs of Experience -1794 England > >>>>> > > >>>>> > >>>>> -- > >>>>> Anshum Gupta > >>>>> > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Boosts for relevancy (shopping products)
You're not going to be able to look at field boosts by themselves to judge relevancy because it's very much a data-driven optimization problem. For example, if you only sell iPhone cases but no iPhones, a search for "black iphone" should show a bunch of black iPhone cases at the top of the results. But if you do sell iPhones themselves, you'll likely see them rank low in the results because they typically have names like "Apple iPhone 6s Plus 64 GB - Black" and your cases just have "iPhone Case - Black". More of the search terms match the shorter field value and so it scores better. Approach the problem methodically and collect data. There are several evaluation metrics that will not only help you quantify the problem but also gauge how much your tuning efforts have improved things. MRR and DCG are good places to start. https://en.wikipedia.org/wiki/Category:Information_retrieval_evaluation Also take a look at Quepid (full disclosure: my company makes it). It'll let the business folks rank the results for searches and you'll be able to do search regression tests against those judgement lists as you tweak things. k/r, Scott On Thu, Mar 17, 2016 at 4:36 AM, Robert Brown wrote: > Hi, > > I currently have an index of ~50m docs representing shopping products: > name, description, brand, category, etc. > > Our "qf" is currently setup as: > > name^5 > brand^2 > category^3 > merchant^2 > description^1 > > mm: 100% > ps: 5 > > I'm getting complaints from the business concerning relevancy, and was > hoping to get some constructive ideas/thoughts on whether these boosts look > semi-sensible or not, I think they were put in place pretty much at random. > > I know it's going to be a case of rounds upon rounds of testing, but maybe > there's a good starting point that will save me some time? > > My initial thoughts right now are to actually just search on the name > field, and maybe the brand (for things like "Apple Ipod"). 
> > Has anyone got a similar setup that could share some direction? > > Many Thanks, > Rob > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
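Two of the metrics in that Wikipedia category, mean reciprocal rank (MRR) and discounted cumulative gain (DCG), are simple enough to compute by hand from a judgement list. The sketch below uses the textbook definitions; the relevance grades are made-up examples, where each list holds the judged grade of the results in the order Solr returned them (0 meaning not relevant).

```python
import math


def reciprocal_rank(relevances):
    """1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, grade in enumerate(relevances, start=1):
        if grade > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(judged_queries):
    """Average reciprocal rank over a non-empty list of judged result lists."""
    return sum(reciprocal_rank(r) for r in judged_queries) / len(judged_queries)


def dcg(relevances, k=None):
    """Discounted cumulative gain: sum of grade / log2(rank + 1) over the top k."""
    top = relevances if k is None else relevances[:k]
    return sum(grade / math.log2(rank + 1)
               for rank, grade in enumerate(top, start=1))


# Hypothetical judgements for two searches, in returned order.
judged = {
    "black iphone": [0, 0, 3, 1],
    "ipod": [2, 0, 0, 0],
}
print("MRR:", mean_reciprocal_rank(list(judged.values())))
for query, grades in judged.items():
    print(query, "DCG@4:", round(dcg(grades, k=4), 3))
```

Recomputing these after each boost change, over the same set of judged searches, gives you a number to compare instead of an impression.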
Re: Scripting server side
Are you trying to manipulate the query with a script, or just the response? If it's the response you want to work with, I think your only options are using Velocity templates or XSLT. For working with the query you'll either have to make your own QParserPlugin or intercept the request before it gets to Solr. k/r, Scott On Sun, Jan 24, 2016 at 6:22 PM, Vincenzo D'Amore wrote: > Hi, > > looking at Solr documentation I found a pretty interesting processor which > is able to execute scripting languages server side. > > > http://lucene.apache.org/solr/5_4_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html > > As far as I understood, this is useful only during document update. > I'm just curious to know if there is something else that I can use before > or during the query execution. > > Best regards, > Vincenzo > > -- > Vincenzo D'Amore > email: v.dam...@gmail.com > skype: free.dev > mobile: +39 349 8513251 > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: upgrade SolrCloud
That appears to be the case. If you're apprehensive because you had trouble upgrading to 5.4.0, there was a bug in that release (fixed in 5.4.1) that could've bitten you: https://issues.apache.org/jira/browse/SOLR-8561 k/r, Scott On Thu, Jan 28, 2016 at 1:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] < craig.oak...@nih.gov> wrote: > I'm planning to upgrade (from 5.4.0 to 5.4.1) a SolrCloud with two > replicas (one shard). > > Am I correct in thinking I should be able simply to shutdown one node, > change it to using 5.4.1, restart the upgraded node, shutdown the other > node and upgrade it? Or are there caveats to consider? > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Solr+HDFS
pt=1 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 4181ms > INFO - 2016-01-28 22:16:32.971; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=2 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 65331ms > INFO - 2016-01-28 22:17:34.638; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=3 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 126998ms > INFO - 2016-01-28 22:18:35.764; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=4 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 188124ms > INFO - 2016-01-28 22:19:37.114; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=5 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 249474ms > INFO - 2016-01-28 22:20:38.629; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=6 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 310989ms > INFO - 2016-01-28 22:21:39.751; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=7 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 372111ms > INFO - 2016-01-28 22:22:40.854; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=8 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 433214ms > INFO - 2016-01-28 22:23:41.981; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=9 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 494341ms > INFO - 2016-01-28 22:24:43.088; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=10 on > > 
file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 555448ms > INFO - 2016-01-28 22:25:44.808; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=11 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 617168ms > INFO - 2016-01-28 22:26:45.934; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=12 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 678294ms > INFO - 2016-01-28 22:27:47.036; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=13 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 739396ms > INFO - 2016-01-28 22:28:48.504; [ UNCLASS] > org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=14 on > > file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0282933 > after 800864ms > > Some shards in the cluster can take hours to come back up. Any ideas? It > appears to wait 900 seconds for each of the tlog files. When there are 60+ > files - this takes a long time! > Thank you! > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
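The numbers in that last paragraph add up quickly. A back-of-the-envelope sketch, assuming (as observed above) roughly 900 seconds of lease-recovery retries per tlog file and, as a further assumption, that the files are handled one after another:

```python
def total_recovery_hours(tlog_files, seconds_per_file=900):
    """Worst-case startup delay if every tlog file waits out the full timeout."""
    return tlog_files * seconds_per_file / 3600


for files in (10, 60, 100):
    print(f"{files:3d} tlog files -> up to {total_recovery_hours(files):.1f} hours")
```

With 60 files that is up to 15 hours for a single shard, which matches the "hours to come back up" symptom.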
Re: Multi-lingual search
The IndicNormalizationFilter appears to work with Tamil. Is it not working for you? k/r, Scott On Mon, Feb 1, 2016 at 8:34 AM, vidya wrote: > Hi > > My use case is to index and able to query different languages in solr > which > are not in-built languages supported by solr. How can i implement this ? > > My input document consists of different languages in a field. I came across > "Solr in action" book with searching content in multiple languages i.e., > chapter 14. For built in languages i have implemented this approach. But > for > languages like Tamil, how to implement? Do i need to find for filter > classes > of that particular language or any libraries in specific. > > Please help me on this. > > Thanks in advance. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Multi-lingual-search-tp4254398.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: plugging an analyzer
There are a lot of things that can go wrong when you're wiring up a custom analyzer. I'd first check the simple things: * The custom jar is in Solr's classpath * The custom factory is actually used in a field type's analysis chain * A field is declared with that type * That field is used in a document * The tokenizer/filter is instantiated through the factory interfaces rather than directly (the init methods are only called on the factories). Hope that helps! k/r, Scott On Tue, Feb 2, 2016 at 3:04 AM, Roxana Danger < roxana.dan...@reedonline.co.uk> wrote: > Hello, > I would like to use some code embedded in an analyser. The problem is that > I need to pass some parameters for initializing it. My thought was to create > a plugin and initialize the parameters with the init( Map<String,String> > args ) or init( NamedList args ) methods as explained in > http://wiki.apache.org/solr/SolrPlugins. > But none of these methods are called when the schema is read and the > analyser constructed. I have also tried implementing the > ResourceLoaderAware interface, but the inform() method is not called > either. > Am I missing something to have my analyser running? When are these init methods > called and how can I trigger their call? Any suggestion that does not imply > dividing the code into Tokenizer/Filters? > > Thank you very much in advance, > Roxana > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: How to achieve exact string match query which includes spaces and quotes
This might be a good case for the Raw query parser (I haven't used it myself). https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-RawQueryParser k/r, Scott On Wed, Jan 13, 2016 at 12:05 PM, Erick Erickson wrote: > what _does_ matter is getting all that through the parser which means > you have to enclose things in quotes and escape them. > > For instance, consider this query stringFIeld:abc "i am not" > > this will get parsed as > stringField:abc defaultTextField:"i am not". > > To get around this you need to make sure the entire search gets > through the parser as a _single_ token by enclosing in quotes. But > then of course you have confusion because you have quotes in your > search term so you need to escape those, something like > stringField:"abc \"i am not\"" > > Here's a list for Lucene 5 > > https://lucene.apache.org/core/5_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters > > Best, > Erick > > On Wed, Jan 13, 2016 at 3:39 AM, Binoy Dalal > wrote: > > No. > > > > On Wed, 13 Jan 2016, 16:58 Alok Bhandari < > alokomprakashbhand...@gmail.com> > > wrote: > > > >> Hi Binoy thanks. > >> > >> But does it matter which query-parser I use , shall I use "lucene" > parser > >> or > >> "edismax" parser. > >> > >> > >> > >> -- > >> View this message in context: > >> > http://lucene.472066.n3.nabble.com/How-to-achieve-exact-string-match-query-which-includes-spaces-and-quotes-tp4250402p4250405.html > >> Sent from the Solr - User mailing list archive at Nabble.com. > >> > > -- > > Regards, > > Binoy Dalal > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
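A minimal sketch of the two approaches discussed in this thread, assuming the query string is assembled on the client side. The helper names quote_exact and raw_query are hypothetical; the escaping follows Erick's example, and the {!raw f=...} local-params form is the one described on the Other Parsers page linked above.

```python
def quote_exact(field: str, value: str) -> str:
    """Build a fielded clause for the classic (lucene) query parser.

    The value is wrapped in double quotes so it reaches the field as a
    single token; backslashes and embedded quotes are escaped first.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def raw_query(field: str, value: str) -> str:
    """Build a query for the raw query parser, which does no parsing or
    analysis of the value, so no escaping is applied here."""
    return f"{{!raw f={field}}}{value}"


if __name__ == "__main__":
    print(quote_exact("stringField", 'abc "i am not"'))
    print(raw_query("stringField", 'abc "i am not"'))
```

The first form matches the escaped query Erick wrote out by hand; the second trades that escaping for a parser that takes the whole value verbatim.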
Re: Permutations of entries in a multivalued field
Johannes, I think your best bet is to create a QParserPlugin that orders the terms of the incoming query. It sounds like you have control over the way that field is indexed, so you could enforce the same ordering when the document comes into Solr. If that's not the case then you'll also want to write an UpdateRequestProcessor: https://wiki.apache.org/solr/UpdateRequestProcessor Using a phrase query is probably not an option since you're probably working with > 3 terms and phrase slop wouldn't be able to extend past that. Hope that helps! -Scott On Wed, Dec 16, 2015 at 8:38 AM, Johannes Riedl < johannes.ri...@uni-tuebingen.de> wrote: > Hello all, > > we are facing the following problem: we use a multivalued string field > that contains entries of the kind A/B/C/, where A,B,C are terms. > We are now looking for a simple way to also find all permutations of > A/B/C, so e.g. B/A/C. As a workaround we added a new field that contains > all entries alphabetically sorted and guarantee sorting on the user side. > However - since this is limited in some ways - is there a simple way to > either index in a way such that solely A/B/C and all permutations are found > (using e.g. type=text is not an option since a term could occur in a > different entry of the multivalued field) or trigger an alphabetical > sorting of incoming queries. > > Thanks a lot for your feedback, best regards > > Johannes > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
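The ordering idea above can be sketched in a few lines. This is only an illustration of the normalization a QParserPlugin (query side) and an UpdateRequestProcessor (index side) would both have to apply, written here as a plain client-side function; normalize_entry is a hypothetical name and the "/" separator is taken from Johannes's A/B/C/ example.

```python
def normalize_entry(entry: str, sep: str = "/") -> str:
    """Return the entry with its terms sorted, so that every permutation
    of the same terms maps to one canonical string."""
    terms = sorted(t for t in entry.split(sep) if t)
    if not terms:
        return ""
    # Keep the trailing separator used in the original entries (A/B/C/).
    return sep.join(terms) + sep


if __name__ == "__main__":
    print(normalize_entry("B/A/C/"))
```

As long as the same function runs on both the indexed value and the incoming query, an exact match on the string field finds every permutation.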
Re: query to get parents without childs
Hi Novin, How are you associating parents with children? Is it a multivalued "children" field in the parent record? If so, you could query for records that have no value in that field, like "-children:[* TO *]". k/r, Scott On Wed, Dec 16, 2015 at 7:29 AM, Novin Novin wrote: > Hi guys, > > I have a few parents in the index without children. What would be the query to > get those? > > Thanks, > Novin > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
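A small sketch of how that query might be sent, assuming the parent documents really do carry a multivalued field named "children" (the name comes from the suggestion above, not from Novin's schema). The helper name is hypothetical.

```python
from urllib.parse import urlencode


def childless_parents_params(children_field: str = "children") -> dict:
    """Match all documents, then filter out those that have any value
    in the children field."""
    return {
        "q": "*:*",
        # Negated open-ended range: documents with no value in the field.
        "fq": f"-{children_field}:[* TO *]",
    }


if __name__ == "__main__":
    # URL-encoded form, ready to append to a /select request.
    print(urlencode(childless_parents_params()))
```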
Re: Highlighting large documents
There are two things going on that you should be aware of. The first is, Solr Highlighting is mainly concerned about putting a representative snippet in a results listing. There are a couple of configuration changes you need to do if you want to highlight a whole document, like setting the fragListBuilder to SingleFragListBuilder and the maxAnalyzedChars setting you've already mentioned: https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize Because full document highlighting is so different from highlighting snippets in a result list you'll want to configure two different highlighters: One for snippets and one for the full document. The other thing you need to know is that performance in highlighting is an active area of development. Right now the top docs in the current result list are calculated completely separate from the snippets (highlighting), which can lead to problems when the most relevant snippets are later in the document. What most people do is compromise by making the result list fast but inaccurate, and having the full-document highlight be accurate but slower. Hope that helps, -Scott On Fri, Dec 4, 2015 at 11:12 AM, Andrea Gazzarini wrote: > No no, sorry, the project is not yet started so I didn't experience your > issue, but I'll be a careful listener of this thread > > Best, > Andrea > > 2015-12-04 17:04 GMT+01:00 Zheng Lin Edwin Yeo : > > > Hi Andrea, > > > > I'm using the original highlighter. > > > > Below is my configuration for the highlighter in solrconfig.xml > > > > > > > >explicit > >10 > >json > >true > > text > > id, title, content_type, last_modified, url, score > > > > > on > >id, title, content, author > > true > >true > >html > > 200 > > 100 > > > > true > > signature > > true > > 100 > > > > > > > > > > Have you managed to solve the problem? 
> > > > Regards, > > Edwin > > > > > > On 4 December 2015 at 23:54, Andrea Gazzarini > > wrote: > > > > > Hi Zheng, > > > just curiousity, because shortly I will have to deal with a similar > > > scenario (Solr 5.3.1 + large documents + highlighting). > > > Which highlighter are you using? > > > > > > Andrea > > > > > > 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo : > > > > > > > Hi, > > > > > > > > I'm using Solr 5.3.0 > > > > > > > > I found that in large documents, sometimes I face situation that > when I > > > do > > > > a highlight query, the resultset that is returned does not contain > the > > > > highlighted query. There are actually matches in the documents, but > > just > > > > that they located further back in the documents. > > > > > > > > I have tried to increase the value of the hl.maxAnalyzedChars, as the > > > > default value is 51200, and I have documents that are much larger > than > > > > 51200 characters. Although this method works, but, when I increase > this > > > > value, the performance of the search and highlight drops. It can drop > > > from > > > > less than 0.5 seconds to more than 10 seconds. > > > > > > > > Would like to check, is this method of increasing the value of the > > > > hl.maxAnalyzedChars the best method to use, or is there other ways > > which > > > > can solve the same purpose, but without affecting the performance > much? > > > > > > > > Regards, > > > > Edwin > > > > > > > > > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
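A rough sketch of the "two highlighters" idea from the reply at the top of this thread, expressed as two sets of request parameters rather than solrconfig.xml handlers. The field name "content" comes from Edwin's configuration; the function names and the concrete numbers are assumptions. hl.fragsize=0 is the documented way to ask the original highlighter for the whole field value instead of fragments.

```python
def snippet_highlight_params(field: str = "content") -> dict:
    """Fast highlighting for the result list: short fragments, default
    hl.maxAnalyzedChars, so late matches in long documents may be missed."""
    return {"hl": "on", "hl.fl": field, "hl.fragsize": 100, "hl.snippets": 1}


def full_document_highlight_params(max_chars: int, field: str = "content") -> dict:
    """Slower, more accurate highlighting for a single opened document.

    max_chars is supplied by the caller and should be at least the length
    of the largest document to be highlighted.
    """
    return {
        "hl": "on",
        "hl.fl": field,
        "hl.fragsize": 0,  # whole field value, no fragmenting
        "hl.maxAnalyzedChars": max_chars,
    }


if __name__ == "__main__":
    print(snippet_highlight_params())
    print(full_document_highlight_params(max_chars=1_000_000))
```

The search page would use the first set for every hit, and only the document view would pay the cost of the second.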
Re: Highlighting tag problem
I see. There appears to be a gap in what you can match on and what will get highlighted: id, title, content_type, last_modified, url, score id, title, content, author, tag Unless you override fl or hl.fl in url parameters you can get a hit in content_type, last_modified, url, or score and those fields will not get highlighted. Try adding those fields to hl.fl. k/r, Scott On Fri, Dec 4, 2015 at 12:59 AM, Zheng Lin Edwin Yeo wrote: > Hi Scott, > > No, what's describe in SOLR-8334 is the tag appearing at the result, but at > the wrong position. > > For this problem, the situation is that when I do a highlight query, some > of the results in the resultset does not contain the search word in title, > content_type, last_modified and url, as specified in my solrconfig.xml > which I'm posted earlier on, and there is no tag in those results. So > I'm not sure why those results are returned. > > Regards, > Edwin > > > On 4 December 2015 at 01:03, Scott Stults < > sstu...@opensourceconnections.com > > wrote: > > > Edwin, > > > > Is this related to what's described in SOLR-8334? > > > > > > k/r, > > Scott > > > > On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo < > edwinye...@gmail.com> > > wrote: > > > > > Hi, > > > > > > I'm using Solr 5.3.0. > > > Would like to find out, during a search, sometimes there is a match in > > > content, but it is not highlighted (the word is not in the stopword > > list)? > > > Did I make any mistakes in my configuration? > > > > > > This is my highlighting request handler from solrconfig.xml. > > > > > > > > > > > > explicit > > > 10 > > > json > > > true > > > text > > > id, title, content_type, last_modified, url, score > > > > > > > on > > > id, title, content, author, tag > > >true > > > true > > > html > > > 200 > > > > > > true > > > signature > > > true > > > 100 > > > > > > > > > > > > > > > This is my pipeline for the field. 
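The comparison in the reply above can be reproduced mechanically. This is just a sketch that diffs the two comma-separated lists quoted from Edwin's handler; fields_not_highlighted is a hypothetical helper name.

```python
def fields_not_highlighted(fl: str, hl_fl: str) -> list:
    """Return the fields listed in fl that are missing from hl.fl,
    preserving the order in which they appear in fl."""
    highlighted = {name.strip() for name in hl_fl.split(",")}
    return [name.strip() for name in fl.split(",") if name.strip() not in highlighted]


if __name__ == "__main__":
    fl = "id, title, content_type, last_modified, url, score"
    hl_fl = "id, title, content, author, tag"
    print(fields_not_highlighted(fl, hl_fl))
```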
> > > > > > > > positionIncrementGap="100"> > > > > > > > > > > > > class="analyzer.solr5.jieba.JiebaTokenizerFactory" > > > segMode="SEARCH"/> > > > > > > > > > > > > > > > > > > > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/> > > > > > > > > words="stopwords.txt" /> > > > > > > > > generateWordParts="1" generateNumberParts="1" catenateWords="0" > > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > > > > > > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/> > > > > > > > > > > > > > > maxGramSize="15"/> > > > > > > > > > > > > > > > > > > class="analyzer.solr5.jieba.JiebaTokenizerFactory" > > > segMode="SEARCH"/> > > > > > > > > > > > > > > > > > > > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/> > > > > > > > > words="stopwords.txt" /> > > > > > > > > generateWordParts="0" generateNumberParts="0" catenateWords="0" > > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/> > > > > > > > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/> > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > Edwin > > > > > > > > > > > -- > > Scott Stults | Founder & Solutions Architect | OpenSource Connections, > LLC > > | 434.409.2780 > > http://www.opensourceconnections.com > > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Highlighting tag problem
Edwin, Is this related to what's described in SOLR-8334? k/r, Scott On Thu, Dec 3, 2015 at 5:07 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > I'm using Solr 5.3.0. > Would like to find out, during a search, sometimes there is a match in > content, but it is not highlighted (the word is not in the stopword list)? > Did I make any mistakes in my configuration? > > This is my highlighting request handler from solrconfig.xml. > > > > explicit > 10 > json > true > text > id, title, content_type, last_modified, url, score > > on > id, title, content, author, tag >true > true > html > 200 > > true > signature > true > 100 > > > > > This is my pipeline for the field. > > positionIncrementGap="100"> > > > > segMode="SEARCH"/> > > > > > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/> > > words="stopwords.txt" /> > > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/> > > > > maxGramSize="15"/> > > > > > > segMode="SEARCH"/> > > > > > > words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/> > > words="stopwords.txt" /> > > generateWordParts="0" generateNumberParts="0" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/> > > synonyms="synonyms.txt" ignoreCase="true" expand="false"/> > > > > > > > > > Regards, > Edwin > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Different Similarities for the same field
I haven't tried this before (overriding default similarity in a custom SearchComponent), but it looks like it should be possible. In QueryComponent.process() you can get a hold of the SolrIndexSearcher and call setSimilarity(). It also looks like this is set only once by default when the searcher is created, so you may need to set it back to the default similarity when you're done. k/r, Scott On Tue, Nov 24, 2015 at 10:25 AM, Markus, Sascha wrote: > Hi, > I implemented a Similarity which is based on the DefaultSimilarity changing > the calculation for the idf. > To work with this CustomSimilarity and the DefaultSimilarity from our > application I have one field with the default and a copyfield with my > similarity. > Concerning the extra space needed for this field I wonder if there is a way > to have my similarity or the default one on the SAME field. Because there > are no differences for the index. E.g. by creating a SearchComponent to > have something like solr/mySelect for queries with my similarity and the > usual solr/select for the default similarity? > How could I achive this, has anybody a hint? > > Cheers, > Sascha > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Highlighting content field problem when using JiebaTokenizerFactory
t; >>> > Jieba and StopFilter. The problem is still there.* > >>> > > >>> > 2.Does this problem occur only on Chinese search words? Does it > happen > >>> on > >>> > English search words? > >>> > *A) Yes, the same problem occurs on English words. For example, when > I > >>> > search for "word", it will highlight in this way: word* > >>> > > >>> > 3.To use FastVectorHighlighter, you seem to have to enable 3 term* > >>> > parameters in field declaration? I see only one is enabled. Please > >>> refer to > >>> > the answer in this stackoverflow question: > >>> > > >>> > > >>> > http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only > >>> > *A) I have tried to enable all 3 terms in the FastVectorHighlighter > >>> too, > >>> > >>> > but the same problem persists as well.* > >>> > > >>> > > >>> > Regards, > >>> > Edwin > >>> > > >>> > > >>> > On 22 October 2015 at 16:25, Scott Chu >>> <+scott@udngroup.com> > >>> > <+scott@udngroup.com <+scott@udngroup.com>>> wrote: > >>> > > >>> > > Hi solr-user, > >>> > > > >>> > > Can't judge the cause on fast glimpse of your definition but some > >>> > > suggestions I can give: > >>> > > > >>> > > 1. I take a look at Jieba. It uses a dictionary and it seems to do > a > >>> good > >>> > > job on CJK. I doubt this problem may be from those filters (note: I > >>> can > >>> > > understand you may use CJKWidthFilter to convert Japanese but > doesn't > >>> > > understand why you use CJKBigramFilter and EdgeNGramFilter). Have > you > >>> > tried > >>> > > commenting out those filters, say leave only Jieba and StopFilter, > >>> and > >>> > >>> > see > >>> > > if this problem disppears? > >>> > > > >>> > > 2.Does this problem occur only on Chinese search words? Does it > >>> happen on > >>> > > English search words? > >>> > > > >>> > > 3.To use FastVectorHighlighter, you seem to have to enable 3 term* > >>> > > parameters in field declaration? I see only one is enabled. 
Please > >>> refer > >>> > to > >>> > > the answer in this stackoverflow question: > >>> > > > >>> > > >>> > http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only > >>> > > > >>> > > > >>> > > Scott Chu,scott@udngroup.com <+scott@udngroup.com> <+ > >>> scott@udngroup.com <+scott@udngroup.com>> > >>> > > 2015/10/22 > >>> > > > >>> > > - Original Message - > >>> > > *From: *Zheng Lin Edwin Yeo >>> <+edwinye...@gmail.com> > >>> > <+edwinye...@gmail.com <+edwinye...@gmail.com>>> > >>> > > *To: *solr-user >>> <+solr-user@lucene.apache.org> > >>> > <+solr-user@lucene.apache.org <+solr-user@lucene.apache.org>>> > >>> > > *Date: *2015-10-20, 12:04:11 > >>> > > *Subject: *Re: Highlighting content field problem when using > >>> > > >>> > > JiebaTokenizerFactory > >>> > > > >>> > > Hi Scott, > >>> > > > >>> > > Here's my schema.xml for content and title, which uses > text_chinese. > >>> The > >>> > > >>> > > problem only occurs in content, and not in title. > >>> > > > >>> > > >>> stored="true" > >>> > > omitNorms="true" termVectors="true"/> > >>> > > stored="true" > >>> > > omitNorms="true" termVectors="true"/> > >>> > > > >>> > > > >>> > > >>> > > positionIncrementGap="100"> > >>> > > > >>> > > >>> > > segMode="SEARCH"
Re: Number of fields in qf & fq
Steve, Another thing debugQuery will give you is a breakdown of how much each field contributed to the final score of each hit. That's going to give you a nice shopping list of qf to weed out. k/r, Scott On Fri, Nov 20, 2015 at 9:26 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Hello Steve, > > debugQuery=true shows whether it's facets or query, whether it's query > parsing or searching (prepare vs process), cache statistics can tell about > its' efficiency; sometimes a problem is obvious from request parameters. > Simple sampling with jconsole or even by jstack can point on a smoking > gun. > > On Fri, Nov 20, 2015 at 4:08 PM, Steven White > wrote: > > > Thanks Erick. > > > > The 1500 fields is a design that I inherited. I'm trying to figure out > why > > it was done as such and what it will take to fix it. > > > > What about my other question: how does one go about debugging performance > > issues in Solr to find out where time is mostly spent? How do I know my > > Solr parameters, such as cache and what have you are set right? From > what > > I see, we are using the defaults off solrconfig.xml. > > > > I'm on Solr 5.2 > > > > Steve > > > > > > On Thu, Nov 19, 2015 at 11:36 PM, Erick Erickson < > erickerick...@gmail.com> > > wrote: > > > > > An fq is still a single entry in your filterCache so from that > > > perspective it's the same. > > > > > > And to create that entry, you're still using all the underlying fields > > > to search, so they have to be loaded just like they would be in a q > > > clause. > > > > > > But really, the fundamental question here is why your design even has > > > 1,500 fields and, more specifically, why you would want to search them > > > all at once. From a 10,000 ft. view, that's a very suspect design. 
> > > > > > Best, > > > Erick > > > > > > On Thu, Nov 19, 2015 at 4:06 PM, Walter Underwood < > wun...@wunderwood.org > > > > > > wrote: > > > > The implementation for fq has changed from 4.x to 5.x, so I’ll let > > > someone else answer that in detail. > > > > > > > > In 4.x, the result of each filter query can be cached. After that, > they > > > are quite fast. > > > > > > > > wunder > > > > Walter Underwood > > > > wun...@wunderwood.org > > > > http://observer.wunderwood.org/ (my blog) > > > > > > > > > > > >> On Nov 19, 2015, at 3:59 PM, Steven White > > wrote: > > > >> > > > >> Thanks Walter. I see your point. Does this apply to fq as will? > > > >> > > > >> Also, how does one go about debugging performance issues in Solr to > > find > > > >> out where time is mostly spent? > > > >> > > > >> Steve > > > >> > > > >> On Thu, Nov 19, 2015 at 6:54 PM, Walter Underwood < > > > wun...@wunderwood.org> > > > >> wrote: > > > >> > > > >>> With one field in qf for a single-term query, Solr is fetching one > > > posting > > > >>> list. With 1500 fields, it is fetching 1500 posting lists. It could > > > easily > > > >>> be 1500 times slower. > > > >>> > > > >>> It might be even slower than that, because we can’t guarantee that: > > a) > > > >>> every algorithm in Solr is linear, b) that all those lists will fit > > in > > > >>> memory. > > > >>> > > > >>> wunder > > > >>> Walter Underwood > > > >>> wun...@wunderwood.org > > > >>> http://observer.wunderwood.org/ (my blog) > > > >>> > > > >>> > > > >>>> On Nov 19, 2015, at 3:46 PM, Steven White > > > wrote: > > > >>>> > > > >>>> Hi everyone > > > >>>> > > > >>>> What is considered too many fields for qf and fq? On average I > will > > > have > > > >>>> 1500 fields in qf and 100 in fq (all of which are OR'ed). > Assuming > > I > > > can > > > >>>> (I have to check with the design) for qf, if I cut it down to 1 > > field, > > > >>> will > > > >>>> I see noticeable performance improvement? 
It will take a lot of > > > effort > > > >>> to > > > >>>> test this which is why I'm asking first. > > > >>>> > > > >>>> As is, I'm seeing 2-5 sec response time for searches on an index > of > > 1 > > > >>>> million records with total index size (on disk) of 4 GB. I gave > > Solr > > > 2 > > > >>> GB > > > >>>> of RAM (also tested at 4 GB) in both cases Solr didn't use more > > then 1 > > > >>> GB. > > > >>>> > > > >>>> Thanks in advanced > > > >>>> > > > >>>> Steve > > > >>> > > > >>> > > > > > > > > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: StringIndexOutOfBoundsException using spellcheck and synonyms
hread.run(Thread.java:722) > > Derek > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Stopping Solr on Linux when run as a service
Steve, In short, don't worry: it all gets taken care of. The way services work on Linux is that when the system shuts down, it basically calls "service (servicename) stop" on each service. That calls the bin/init.d/solr script with a "stop" argument, which in turn calls the bin/solr script with a "stop" argument (I'm referring to where the files are in the distribution, not where they get installed). k/r, Scott On Tue, Nov 10, 2015 at 9:40 AM, Steven White wrote: > Hi folks, > > This question may be more of a Linux one vs. Solr, but I have to start > someplace. > > I'm reading this link > https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production > to get Solr on Linux (I'm more of a Windows guy). > > The page provides a good intro on how to set up Solr to start as a service on > Linux. Now what I don't get is this: what happens when the system is > shutting down? How does Solr know to shut down gracefully when > nothing on that page talks about issuing a "stop" command on system > shutdown? Can someone shed some light on this? Like I said, I'm more of a > "Windows" guy. > > Thanks in advance!! > > Steve > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Solr Search: Access Control / Role based security
Susheel, This is perfectly fine for simple use-cases and has the benefit that the filterCache will help things stay nice and speedy. Apache ManifoldCF goes a bit further and ties back to your authentication and authorization mechanism: http://manifoldcf.apache.org/release/trunk/en_US/concepts.html#ManifoldCF+security+model k/r, Scott On Thu, Nov 5, 2015 at 2:26 PM, Susheel Kumar wrote: > Hi, > > I have seen couple of use cases / need where we want to restrict result of > search based on role of a user. For e.g. > > - if user role is admin, any document from the search result will be > returned > - if user role is manager, only documents intended for managers will be > returned > - if user role is worker, only documents intended for workers will be > returned > > Typical practise is to tag the documents with the roles (using a > multi-valued field) during indexing and then during search append filter > query to restrict result based on roles. > > Wondering if there is any other better way out there and if this common > requirement should be added as a Solr feature/plugin. > > The current security plugins are more towards making Solr apis/resources > secure not towards securing/controlling data during search. > > https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins > > > Please share your thoughts. > > Thanks, > Susheel > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
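A minimal sketch of the tag-and-filter practice Susheel describes, assuming documents carry a multivalued field named "roles" (a hypothetical name) and that the application, not the end user, decides which roles apply. Admins get no filter at all, per the first bullet in the question.

```python
from typing import Optional, Sequence


def role_filter(user_roles: Sequence[str], field: str = "roles") -> Optional[str]:
    """Return an fq string restricting results to the user's roles,
    or None when no restriction applies (admin)."""
    if "admin" in user_roles:
        return None
    if not user_roles:
        raise ValueError("a non-admin user must have at least one role")
    # Quote each role so unusual characters cannot alter the query.
    quoted = [
        '"' + role.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for role in user_roles
    ]
    return f"{field}:({' OR '.join(quoted)})"


if __name__ == "__main__":
    print(role_filter(["manager"]))
    print(role_filter(["worker", "manager"]))
    print(role_filter(["admin"]))
```

Because the same fq string recurs for every user with the same roles, it is a good fit for the filterCache mentioned in the reply.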
Re: Securing field level access permission by filtering the query itself
Good to hear! Depending on how far you want to take it, you can then scan the initial request coming in from the client (and the final response) for raw Solr fields -- that shouldn't happen. I've used mod_security as a general-purpose application firewall and would recommend it. k/r, Scott On Wed, Nov 4, 2015 at 1:40 PM, Douglas McGilvray wrote: > > Thanks Alessandro, I had overlooked the highlighting component. > > I will also add a reminder to exclude these fields from spellcheck fields, > (or maintain different spellcheck fields for different roles). > > @Scott - Once I started planning my code the penny finally dropped > regarding your point about aliasing the fields - it removes the need for > calculating which fields to request in the app itself. > > Regards, > D > > > > On 4 Nov 2015, at 14:53, Alessandro Benedetti > wrote: > > > > Of course it depends of all the query parameter you use and you process > in > > the response. > > The list you wrote should be ok if you use only those components. > > > > For example if you use highlight, it's not ok and you need to take care > of > > the highlighted fields as well. > > > > Cheers > > > > On 30 October 2015 at 14:51, Douglas McGilvray wrote: > > > >> > >> Scott thanks for the reply. I like the idea of mapping all the > fieldnames > >> internally, adding security through obscurity. My question therefore > would > >> be what is the definitive list of query parameters that one must filter > to > >> ensure a particular field is not exposed in the query response? Am I > >> missing in the following? > >> > >> fl > >> facect.field > >> facet.pivot > >> json.facet > >> terms.fl > >> > >> > >> kr > >> Douglas > >> > >> > >>> On 30 Oct 2015, at 07:37, Scott Stults < > >> sstu...@opensourceconnections.com> wrote: > >>> > >>> Douglas, > >>> > >>> Managing a per-user-group whitelist of fields outside of Solr seems the > >>> best approach. 
When the query comes in you can then filter out any > fields > >>> not contained in the whitelist before you send the request to Solr. The > >>> easy part will be to do that on URL parameters like fl. Depending on > how > >>> your app generates the actual query string, you may want to also scan > >> that > >>> for fielded query clauses (eg "badfield:value") and localParams (eg > >>> "{!dismax qf=badfield}value"). > >>> > >>> Secondly, you can map internal Solr fields to aliases using this syntax > >> in > >>> the fl parameter: "display_name:real_solr_name". So when the request > >> comes > >>> in from your app, first you'll map from the requested field alias names > >> to > >>> internal Solr names (while enforcing the whitelist), and then in the fl > >>> parameter supply the aliases you want sent in the response. > >>> > >>> > >>> k/r, > >>> Scott > >>> > >>> On Wed, Oct 28, 2015 at 6:58 PM, Douglas McGilvray > >> wrote: > >>> > >>>> Hi all, > >>>> > >>>> First I’d like to say the nested facets and the json facet api in > >>>> particular have made my world much better, I thank everyone involved, > >> you > >>>> are all awesome. > >>>> > >>>> In my implementation has much of the solr query building working on > the > >>>> browser, solr is behind a php server which acts as “proxy” and > doorman, > >>>> filtering at the document level according to user role and supplying > >> some > >>>> sensible maximums … > >>>> > >>>> However we now wish to filter just one or two potentially sensitive > >> fields > >>>> in one document type according to user role (as determined in the php > >>>> proxy). Duplicating documents (or cores) seems like overkill for just > >> two > >>>> fields in one document type .. 
I wondered if it would be feasible (in > >> the > >>>> interests of preventing malicious activity) to filter the query itself > >>>> whether it be parameters (fl, facet.fields, terms, etc) … or even deny > >> any > >>>> request in which fieldname occurs … > >>>> > >>>> Is there someway someone might obscure a fieldname in a request? > >>>> > >>>> Kind Regards & thanks in davacne, > >>>> Douglas > >>> > >>> > >>> > >>> > >>> -- > >>> Scott Stults | Founder & Solutions Architect | OpenSource Connections, > >> LLC > >>> | 434.409.2780 > >>> http://www.opensourceconnections.com > >> > >> > > > > > > -- > > -- > > > > Benedetti Alessandro > > Visiting card : http://about.me/alessandro_benedetti > > > > "Tyger, tyger burning bright > > In the forests of the night, > > What immortal hand or eye > > Could frame thy fearful symmetry?" > > > > William Blake - Songs of Experience -1794 England > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Question on index time de-duplication
At the top of the De-Duplication wiki page is a note about collapsing results. Once you have the signature (identical for each of the duplicates) you'll want to collapse your results, keeping the one with max date. https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results k/r, Scott On Thu, Oct 29, 2015 at 11:59 PM, Zheng Lin Edwin Yeo wrote: > Yes, you can try to use the SignatureUpdateProcessorFactory to do a hashing > of the content to a signature field, and group the signature field during > your search. > > You can find more information here: > https://cwiki.apache.org/confluence/display/solr/De-Duplication > > I have been using this method to group the index with duplicated content, > and it is working fine. > > Regards, > Edwin > > > On 30 October 2015 at 07:20, Shamik Bandopadhyay > wrote: > > > Hi, > > > > I'm looking to customizing index time de-duplication. Here's my use > case > > and what I'm trying to achieve. > > > > I've identical documents coming from different release year of a given > > product. I need to index them in Solr as they are required in individual > > year context. But there's a generic search which spans across all the > years > > and hence bring back duplicate/identical content. My goal is to only > return > > the latest document and filter out the rest. For e.g. if product A has > > identical documents for 2015, 2014 and 2013, search should only return > 2015 > > (latest document) and filter out the rest. > > > > What I'm thinking (if possible) during index time : > > > > Index all documents, but add a special tag (e.g. dedup=true) to 2013 and > > 2014 content, keeping 2015 (the latest release) untouched. During query > > time, I'll add a filter which will exclude contents tagged with "dedup". > > > > Just wondering if this is achievable by perhaps extending > > UpdateRequestProcessorFactory or > > customizing SignatureUpdateProcessorFactory ? > > > > Any pointers will be appreciated. 
> > > > Regards, > > Shamik > > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
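A minimal sketch of the collapse approach described in the reply above. The field names `signature` and `release_year` are assumptions for illustration; the thread names neither. It only builds the request parameters and assumes `release_year` is a numeric field, since the collapse parser's `max` option selects the group head by a numeric value.

```python
from urllib.parse import urlencode


def collapse_params(query, signature_field="signature", date_field="release_year"):
    """Build select parameters that keep one document per signature value.

    Both field names are hypothetical. The collapse filter keeps, for each
    distinct signature, the document with the largest value of date_field.
    """
    collapse = "{!collapse field=%s max=%s}" % (signature_field, date_field)
    return {"q": query, "fq": collapse}


params = collapse_params("installation guide")
# URL-encoded form, ready to append to a /select request.
query_string = urlencode(params)
```

Because the filter is applied at query time, the year-specific searches can omit it and still see every copy of the document.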
Re: Securing field level access permission by filtering the query itself
Douglas, Managing a per-user-group whitelist of fields outside of Solr seems the best approach. When the query comes in you can then filter out any fields not contained in the whitelist before you send the request to Solr. The easy part will be to do that on URL parameters like fl. Depending on how your app generates the actual query string, you may want to also scan that for fielded query clauses (eg "badfield:value") and localParams (eg "{!dismax qf=badfield}value"). Secondly, you can map internal Solr fields to aliases using this syntax in the fl parameter: "display_name:real_solr_name". So when the request comes in from your app, first you'll map from the requested field alias names to internal Solr names (while enforcing the whitelist), and then in the fl parameter supply the aliases you want sent in the response. k/r, Scott On Wed, Oct 28, 2015 at 6:58 PM, Douglas McGilvray wrote: > Hi all, > > First I’d like to say the nested facets and the json facet api in > particular have made my world much better, I thank everyone involved, you > are all awesome. > > In my implementation has much of the solr query building working on the > browser, solr is behind a php server which acts as “proxy” and doorman, > filtering at the document level according to user role and supplying some > sensible maximums … > > However we now wish to filter just one or two potentially sensitive fields > in one document type according to user role (as determined in the php > proxy). Duplicating documents (or cores) seems like overkill for just two > fields in one document type .. I wondered if it would be feasible (in the > interests of preventing malicious activity) to filter the query itself > whether it be parameters (fl, facet.fields, terms, etc) … or even deny any > request in which fieldname occurs … > > Is there someway someone might obscure a fieldname in a request? 
> > Kind Regards & thanks in advance, > Douglas -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
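The whitelist-plus-alias idea from the reply can be sketched roughly as follows. The roles, alias names and internal field names are all made up for illustration, and the check for fielded clauses is a deliberately blunt substring test (close to the "deny any request in which fieldname occurs" option raised in the question), not a real query parser.

```python
# Hypothetical per-role whitelist: public alias -> internal Solr field name.
ALLOWED_FIELDS = {
    "guest": {"title": "title_s"},
    "staff": {"title": "title_s", "salary": "salary_f"},
}


def build_fl(requested_aliases, role, allowed=ALLOWED_FIELDS):
    """Return an fl value containing only whitelisted fields for this role.

    Each entry uses the display_name:real_solr_name syntax so the response
    carries the alias rather than the internal field name.
    """
    mapping = allowed.get(role, {})
    parts = []
    for alias in requested_aliases:
        real = mapping.get(alias)
        if real is None:
            continue  # not whitelisted for this role: silently dropped
        parts.append("%s:%s" % (alias, real))
    return ",".join(parts)


def mentions_forbidden_field(param_values, forbidden_fields):
    """Conservative check: True if any forbidden internal field name appears
    anywhere in any parameter value (q, fq, facet.field, localParams...)."""
    return any(
        field in value for value in param_values for field in forbidden_fields
    )
```

A substring test will also reject harmless requests that merely contain the field name as a search term, which may be an acceptable trade-off for one or two sensitive fields.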
Re: Solr collection alias - how rank is affected
Collection statistics (such as the document frequencies that feed into scoring) aren't shared between collections, so the same document can score differently depending on which collection it lives in. However, if documents are distributed fairly randomly across the two collections, you won't notice. On Tue, Oct 27, 2015 at 3:21 PM, SolrUser1543 wrote: > How is document ranking affected when using a collection alias for > searching on two collections with same schema ? is it affected at all ? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4236776.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Does docValues impact termfreq ?
[...] wrote:
> If you mean using the term frequency function query, then I'm not sure there's a huge amount you can do to improve performance.
>
> The term frequency is a number that is used often, so it is stored in the index pre-calculated. Perhaps, if your data is not changing, optimising your index would reduce it to one segment, and thus might ever so slightly speed the aggregation of term frequencies, but I doubt it'd make enough difference to make it worth doing.
>
> Upayavira
>
> On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote:
>> Thanks, Jack. I did some more research and found similar results.
>>
>> In our application, we are making multiple (think: 50) concurrent requests to calculate term frequency on a set of documents in "real-time". The faster that results return, the better.
>>
>> Most of these requests are unique, so cache only helps slightly.
>>
>> This analysis is happening on a single solr instance.
>>
>> Other than moving to solr cloud and splitting out the processing onto multiple servers, do you have any suggestions for what might speed up termfreq at query time?
>>
>> Thanks,
>> Aki
>>
>> On Fri, Oct 23, 2015 at 7:21 PM, Jack Krupansky wrote:
>>> Term frequency applies only to the indexed terms of a tokenized field. DocValues is really just a copy of the original source text and is not tokenized into terms.
>>>
>>> Maybe you could explain how exactly you are using term frequency in function queries. More importantly, what is so "heavy" about your usage? Generally, moderate use of a feature is much more advisable to heavy usage, unless you don't care about performance.
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh <a...@marketmuse.com> wrote:
>>>> Hello,
>>>>
>>>> In our solr application, we use a Function Query (termfreq) very heavily.
>>>>
>>>> Index time and disk space are not important, but we're looking to improve performance on termfreq at query time.
>>>> I've been reading up on docValues. Would this be a way to improve performance?
>>>>
>>>> I had read that Lucene uses Field Cache for Function Queries, so performance may not be affected.
>>>>
>>>> And, any general suggestions for improving query performance on Function Queries?
>>>>
>>>> Thanks,
>>>> Aki
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
-- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
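For readers unfamiliar with the function query under discussion, here is a rough sketch of how a `termfreq` request might be assembled. The field name `text`, the alias prefix `tf_` and the sample terms are assumptions; the thread does not show the actual requests. Asking for several terms in one request is shown only as a possibility, not as a measured improvement.

```python
def termfreq_fl(field, terms, alias_prefix="tf_"):
    """Build an fl value that returns termfreq(field,'term') per document
    for each term, exposed under hypothetical aliases tf_0, tf_1, ...
    """
    parts = ["id"]
    for index, term in enumerate(terms):
        # Escape backslashes and single quotes inside the quoted term.
        safe = term.replace("\\", "\\\\").replace("'", "\\'")
        parts.append("%s%d:termfreq(%s,'%s')" % (alias_prefix, index, field, safe))
    return ",".join(parts)


fl_value = termfreq_fl("text", ["solr", "lucene"])
```

Note that `termfreq` counts occurrences of the exact indexed term, so the terms passed in should already match what the field's analysis chain produces.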
Re: Highlight with NGram and German S Sharp "ß"
Yep, I misunderstood the problem. The multiple tokens at the same offset might be messing things up. One thing you can do is copyField to a field that doesn't have n-grams and do something like f.textng.hl.alternateField= in your solrconfig. That'll use the other field during highlighting. Yeah, that'll increase your index size on disk. On Fri, Oct 16, 2015 at 10:07 AM, Jérôme Bernardes < jerome.bernar...@mappy.com> wrote: > Thanks for your reply Scott. > > I tried > > bs.language=de&bs.country=de > > Unfortunately the problem still occurs. > I have just discovered that the problem does not only affect "ß" but also > "æ" (which is mapped to "ae" > at query and index time) > q=hae --> hæna > So it seems to me that the problem is related to any single character that > is map to several characters using class="solr.MappingCharFilterFactory" > mapping="mapping-ISOLatin1Accent.txt"/> > > Jérôme > > > Le 13/10/2015 07:46, Scott Stults a écrit : > >> My guess is that the boundary scanner isn't configured right for your >> highlighter. Try setting the bs.language and bs.country parameters either >> in your request or in the requestHandler. >> >> >> k/r, >> Scott >> >> On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes < >> jerome.bernar...@mappy.com >> >>> wrote: >>> Dear Solr Users, >>> I am facing a problem with highligting on ngram fields. >>> Highlighting is working well, except for words with german character >>> "ß". >>> Eg : with q=rosen& >>> "highlighting": { >>> "gcl3r:12723710:6643": { >>> "textng": [ >>> "Rosensteinpark (Métro), Stuttgart (Allemagne)" >>> ] >>> }, >>> "gcl3r:2267495:780930": { >>> "textng": [ >>> "Rosenstraße, 94554 Moos (Allemagne)" >>> ] >>> } >>> } >>> Without "ß" words are highlight partially Rosensteinpark but >>> with "ß", the whole word is highlighted (Rosenstraße) >>> >>> - >>> This characters ß is mapped to "ss" at query and index time (using >>> >> mapping="mapping-ISOLatin1Accent.txt"/> >>> >>> ) >>> . 
>>> Here the schema.xml for the highlighted field. >>> >>> >>> >> mapping="mapping-ISOLatin1Accent.txt"/> >>> >>> >> pattern="[\s,;: >>> \-\']"/> >>> >> splitOnNumerics="0" >>> generateWordParts="1" >>> generateNumberParts="1" >>> catenateWords="0" >>> catenateNumbers="0" >>> catenateAll="0" >>> splitOnCaseChange="1" >>> preserveOriginal="1" >>> types="wdfftypes.txt" >>> /> >>> >>> >> ignoreCase="true" expand="true"/> >>> >> minGramSize="1"/> >>> >>> >>> >>> >> mapping="mapping-ISOLatin1Accent.txt"/> >>> >>> >> pattern="[\s,;: >>> \-\']"/> >>> >> splitOnNumerics="0" >>> generateWordParts="1" >>> generateNumberParts="0" >>> catenateWords="0" >>> catenateNumbers="0" >>> catenateAll="0" >>> splitOnCaseChange="0" >>> preserveOriginal="1" >>> types="wdfftypes.txt" >>> /> >>> >>> >>> >> pattern="^(.{20})(.*)?" replacement="$1" replace="all"/> >>> >>> >>> >>> Is it a problem in our configuration or a known bug ? >>> Regards >>> Jérôme >>> >>> >>> >> > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Autostart Zookeeper and Solr using scripting
Hi Adrian, I'd probably start with the expect command and "echo ruok | nc " for a simple script. You might also want to try the Netflix Exhibitor REST interface: https://github.com/Netflix/exhibitor/wiki/REST-Cluster k/r, Scott On Thu, Oct 15, 2015 at 2:01 AM, Adrian Liew wrote: > Hi, > > I am trying to implement some scripting to detect if all Zookeepers have > started in a cluster, then restart the solr servers. Has anyone achieved > this yet through scripting? > > I also saw there is the ZookeeperClient that is available in .NET via a > nuget package. Not sure if this could be also implemented to check if a > zookeeper is running. > > Any thoughts on anyone using a script to perform this? > > Regards, > Adrian > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
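The `ruok` check mentioned above can also be done without `nc`. The sketch below is one way to probe each ZooKeeper server and only proceed once all of them answer `imok`; host names and ports are supplied by the caller and nothing here is taken from Adrian's setup. On newer ZooKeeper releases the `ruok` command may need to be enabled through the four-letter-word whitelist, so treat a `False` result as "not confirmed" rather than "down".

```python
import socket


def zk_is_ok(host, port=2181, timeout=2.0):
    """Send the four-letter word 'ruok' and report whether the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16).strip() == b"imok"
    except OSError:
        # Connection refused, timed out, unresolvable host, and so on.
        return False


def ensemble_ready(servers, probe=zk_is_ok):
    """True only if every (host, port) pair in servers passes the probe."""
    return all(probe(host, port) for host, port in servers)
```

A wrapper script could poll `ensemble_ready` in a loop with a sleep between attempts and restart the Solr services once it returns `True`.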
Re: Highlighting content field problem when using JiebaTokenizerFactory
Edwin, Try setting hl.bs.language and hl.bs.country in your request or requestHandler: https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter#FastVectorHighlighter-UsingBoundaryScannerswiththeFastVectorHighlighter -Scott On Tue, Oct 13, 2015 at 5:04 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > I'm trying to use the JiebaTokenizerFactory to index Chinese characters in > Solr. It works fine with the segmentation when I'm using > the Analysis function on the Solr Admin UI. > > However, when I tried to do the highlighting in Solr, it is not > highlighting in the correct place. For example, when I search of 自然环境与企业本身, > it highlight 认为自然环境与企业本身的 > > Even when I search for English character like responsibility, it highlight > *responsibilit*y. > > Basically, the highlighting goes off by 1 character/space consistently. > > This problem only happens in content field, and not in any other fields. > Does anyone knows what could be causing the issue? > > I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0. > > > Regards, > Edwin > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
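As a rough illustration of the suggestion above, these are the request parameters involved when the FastVector highlighter is used with a break-iterator boundary scanner. The field name `content` comes from the question; the locale values `zh` and `CN` are assumptions, and the field is assumed to be indexed with term vectors, positions and offsets, which this highlighter requires.

```python
def fvh_boundary_params(query, field, language, country):
    """Highlighting parameters for the FastVector highlighter with a
    locale-aware boundary scanner."""
    return {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "true",
        "hl.bs.type": "BREAK_ITERATOR",  # locale settings apply to this scanner type
        "hl.bs.language": language,
        "hl.bs.country": country,
    }


params = fvh_boundary_params("responsibility", "content", "zh", "CN")
```

The same keys can be placed in the `defaults` section of a requestHandler instead of being sent with every request.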
Re: Highlight with NGram and German S Sharp "ß"
My guess is that the boundary scanner isn't configured right for your highlighter. Try setting the hl.bs.language and hl.bs.country parameters either in your request or in the requestHandler. k/r, Scott On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes wrote: > Dear Solr Users, > I am facing a problem with highlighting on ngram fields. > Highlighting is working well, except for words with german character > "ß". > Eg : with q=rosen& > "highlighting": { > "gcl3r:12723710:6643": { > "textng": [ > "Rosensteinpark (Métro), Stuttgart (Allemagne)" > ] > }, > "gcl3r:2267495:780930": { > "textng": [ > "Rosenstraße, 94554 Moos (Allemagne)" > ] > } > } > Without "ß" words are highlighted partially Rosensteinpark but > with "ß", the whole word is highlighted (Rosenstraße) > > - > The character ß is mapped to "ss" at query and index time (using > mapping="mapping-ISOLatin1Accent.txt"/> > > ) > . > Here the schema.xml for the highlighted field. > > > mapping="mapping-ISOLatin1Accent.txt"/> > > pattern="[\s,;: > \-\']"/> > splitOnNumerics="0" > generateWordParts="1" > generateNumberParts="1" > catenateWords="0" > catenateNumbers="0" > catenateAll="0" > splitOnCaseChange="1" > preserveOriginal="1" > types="wdfftypes.txt" > /> > > ignoreCase="true" expand="true"/> > minGramSize="1"/> > > > > mapping="mapping-ISOLatin1Accent.txt"/> > > pattern="[\s,;: > \-\']"/> > splitOnNumerics="0" > generateWordParts="1" > generateNumberParts="0" > catenateWords="0" > catenateNumbers="0" > catenateAll="0" > splitOnCaseChange="0" > preserveOriginal="1" > types="wdfftypes.txt" > /> > > > pattern="^(.{20})(.*)?" replacement="$1" replace="all"/> > > > > Is it a problem in our configuration or a known bug ? > Regards > Jérôme > > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Selective field query
Colin, The other thing you'll want to keep in mind (and you'll find this out with debugQuery) is that the query parser is going to take your ServiceName:(Search Service) and turn it into two queries -- ServiceName:(Search) ServiceName:(Service). That's because the query parser breaks on whitespace. My bet is you have a lot of entries with a name of "X Service" and the second part of your query is hitting them. Phrase Field might be your friend here: https://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29 -Scott On Mon, Oct 12, 2015 at 4:15 AM, Colin Hunter wrote: > Thanks Erick, I'm sure this will be valuable in implementing ngram filter > factory > > On Fri, Oct 9, 2015 at 4:38 PM, Erick Erickson > wrote: > > > Colin: > > > > Adding &debug=all to your query is your friend here, the > > parsed_query.toString will show you exactly what > > is searched against. > > > > Best, > > Erick > > > > On Fri, Oct 9, 2015 at 2:09 AM, Colin Hunter > wrote: > > > Ah ha... the copy field... makes sense. > > > Thank You. > > > > > > On Fri, Oct 9, 2015 at 10:04 AM, Upayavira wrote: > > > > > >> > > >> > > >> On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote: > > >> > Hi > > >> > > > >> > I am working on a complex search utility with an index created via > > data > > >> > import from an extensive MySQL database. > > >> > There are many ways in which the index is searched. One of the > utility > > >> > input fields searches only on a Service Name. However, if I target > the > > >> > query as q=ServiceName:"Searched service", this only returns an > exact > > >> > string match. If q=Searched Service, the query still returns results > > from > > >> > all indexed data. > > >> > > > >> > Is there a way to construct a query to only return results from one > > field > > >> > of a doc ? > > >> > I have tried setting index=false, stored=true on unwanted fields, > but > > >> > these > > >> > appear to have still been returned in results. 
> > >> > > >> q=ServiceName:(Searched Service) > > >> > > >> That'll look in just one field. > > >> > > >> Remember changing indexed to false doesn't impact the stuff already in > > >> your index. And the reason you are likely getting all that stuff is > > >> because you have a copyField that copies it over into the 'text' > field. > > >> If you'll never want to search on some fields, switch them to > > >> index=false, make sure you aren't doing a copyField on them, and then > > >> reindex. > > >> > > >> Upayavira > > >> > > > > > > > > > > > > -- > > > www.gfc.uk.net > > > > > > -- > www.gfc.uk.net > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
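The advice in this thread (restrict the search to one field, boost whole-phrase matches, and inspect the parsed query) might be combined along these lines. This is only a sketch of request parameters for the eDisMax parser; the field name `ServiceName` is the one from the thread, everything else is an assumption about how the request would be built.

```python
def service_name_params(user_text, field="ServiceName"):
    """Search a single field with eDisMax, boosting documents where the
    whole phrase matches, and ask Solr to explain how it parsed the query."""
    return {
        "q": user_text,
        "defType": "edismax",
        "qf": field,           # individual terms are matched only in this field
        "pf": field,           # phrase matches in the same field score higher
        "debugQuery": "true",  # response then includes the parsed query
    }


params = service_name_params("Search Service")
```

With `qf` limited to one field, the catch-all `text` copyField target is no longer searched, which addresses the "results from all indexed data" symptom in the original question.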
Re: are there any SolrCloud supervisors?
Something like Exhibitor for Zookeeper? Very cool! Don't worry too much about cleaning up the repo. When it comes time to integrate it with Solr or make it an Apache top-level project you can start with a fresh commit history :) -Scott On Fri, Oct 2, 2015 at 3:09 PM, r b wrote: > I've been working on something that just monitors ZooKeeper to add and > remove nodes from collections. the use case being I put SolrCloud in > an autoscaling group on EC2 and as instances go up and down, I need > them added to the collection. It's something I've built for work and > could clean up to share on GitHub if there is much interest. > > I asked in the IRC about a SolrCloud supervisor utility but wanted to > extend that question to this list. are there any more "full featured" > supervisors out there? > > > -renning > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Why is Process Total Time greater than Elapsed Time?
Thanks Hoss, sorry I wasn't clear. By Process Total Time I mean this structure in the debug response: debug -> timing -> process -> time. Elapsed time is what I get from SolrJ's API: SolrClient.query().getElapsedTime(). So I really expect elapsed time to be the greatest duration of all values. Do you know why that's not the case? Thank you, Scott On Thu, Sep 3, 2015 at 4:41 PM, Chris Hostetter wrote: > > depends on where you are reading "Process Total Time" from. that > terminology isn't something i've ever seen used in the context of solr > (fairly certain nothing in solr refers to anything that way) > > QTime is the amount of time spent processing a request before it starts > being written out over the wire to the client, so it is almost guaranteed > to be *less* than the total elapsed (wall clock) time witnessed by your > solrJ client ... but i have no idea what "Process Total Time" is if you > are seeing it greater than wall clock. > > : From what I can tell, each component processes the request sequentially. > So > : how can I see an Elapsed Time of 750ms (SolrJ client) and a Process Total > : Time of 1300ms? Does the Process Total Time add up the amount of time > each > : leaf reader takes, or some other concurrent things? > > > -Hoss > http://www.lucidworks.com/ > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Why is Process Total Time greater than Elapsed Time?
From what I can tell, each component processes the request sequentially. So how can I see an Elapsed Time of 750ms (SolrJ client) and a Process Total Time of 1300ms? Does the Process Total Time add up the amount of time each leaf reader takes, or some other concurrent things? Thank you, Scott
Re: Solr packages in Apache BigTop.
Jay, This is music to my ears. I've used the bigtop packages and would love to see the Solr portion of them keep pace with releases. Let me know where to start! Thank you, Scott On Sat, Mar 7, 2015 at 5:03 PM, jay vyas wrote: > Hi Solr. > > I work on the apache bigtop project, and am interested in integrating it > deeper with Solr, for example, for testing spark / solr integration cases. > > Is anyone in the Solr community interested in collaborating on testing > releases with us and maintaining Solr packaging in bigtop (with our help of > course) ? > > The advantage here is that we can synergize efforts: When new SOLR > releases come out, we can test them in bigtop to guarantee that there are > rpm/deb packages which work well with the hadoop ecosystem. > > For those that don't know, bigtop is the upstream apache bigdata packaging > project, we build hadoop, spark, solr, hbase and so on in rpm/deb format, > and supply puppet provisioners along with vagrant recipes for testing. > > -- > jay vyas > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Multi words query
A couple more things would help debug this. First, could you grab the specific Solr log entry when this query is sent? Also, have you changed the default schema at all? If you're querying "string" fields you have to exactly match what's indexed there, versus "text" which gets tokenized. k/r, Scott On Thu, Feb 12, 2015 at 4:22 AM, melb wrote: > I am using ruby gem rsolr and querying simply the collection by this query: > > response = solr.get 'select', :params => { > :q=>query, > :fl=> 'id,title,description,body', > :rows=>10 > } > > response["response"]["docs"].each{|doc| puts doc["id"] } > > I created a text field to copy all the fields to and the query handler > requests this field > > rgds, > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Multi-words-query-tp4185625p4185922.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: bulk indexing with optimistic lock
This isn't a Solr-specific answer, but the easiest approach might be to just collect the document IDs you're about to add, query for them, and then filter out the ones Solr already has (this'll give you a nice list for later reporting). You'll need to keep your batch sizes below maxBooleanClauses in solrconfig.xml. Overall, this might be simpler to maintain and less prone to bugs. k/r, Scott On Wed, Feb 11, 2015 at 4:59 AM, Sankalp Gupta wrote: > Hi All, > My server side we are trying to add multiple documents in a list and then > ask solr to add them in solr (using solrj client) and then after its > finished calling the commit. > Now we also want to control concurrency and for that we wanted to use > solr's optimistic lock/versioning feature. That is good but *in case of > bulk docs add, the solr doesn't perform add docs as expected.* It fails as > soon as it finds any doc with optimistic lock failure and return response > telling only the first failed doc (adding all docs before that and no docs > are added after that). *We require solr to add all docs for which no > versioning problem is there and return list of all failed docs. * > Please can anyone suggest a way to do this? > > Regards > Sankalp Gupta > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
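The pre-filtering approach described in the reply could look roughly like this. `fetch_existing_ids` is a hypothetical callback that runs the given query against Solr and returns the IDs it found; how it talks to Solr (SolrJ, HTTP, and so on) is left out. The chunk size is kept below the default `maxBooleanClauses` of 1024, and the sketch does not protect against another writer adding the same document between the lookup and the add.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def id_query(ids, id_field="id"):
    """Build a boolean query such as id:("a" OR "b") with quoted, escaped IDs."""
    quoted = [
        '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"') for value in ids
    ]
    return "%s:(%s)" % (id_field, " OR ".join(quoted))


def split_batch(docs, fetch_existing_ids, max_clauses=1000, id_field="id"):
    """Split docs into (new, already_indexed) using one lookup per chunk."""
    ids = [doc[id_field] for doc in docs]
    existing = set()
    for chunk in chunked(ids, max_clauses):
        existing.update(fetch_existing_ids(id_query(chunk, id_field)))
    new_docs = [doc for doc in docs if doc[id_field] not in existing]
    skipped = [doc for doc in docs if doc[id_field] in existing]
    return new_docs, skipped
```

The `skipped` list is the "nice list for later reporting" mentioned above; only `new_docs` would be sent to Solr in the bulk add.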
Re: SpellingQueryConverter and query parsing
Thank you, James, I'll do that. ResponseBuilder carries around with it the QParser, Query, and query string, so getting suggestions from parsed query terms shouldn't be a big deal. What looks to be hard is rewriting the original query with the suggestions. That's probably why the regex is used instead of the parser. -Scott On Tue, Jan 27, 2015 at 1:37 PM, Dyer, James wrote: > Having worked with the spellchecking code for the last few years, I've > often wondered the same thing, but I never looked seriously into it. I'm > sure there's probably some serious hurdles, hence the Query Converter. The > easy thing to do here is to use "spellcheck.q", and then pass in > space-delimited keywords. This bypasses the query converter entirely for > custom situations like yours. > > But please, if you find a way to plug the actual query parser into > spellcheck, consider opening a jira & contributing the code, even if what > you end up with isn't in a final polished state for general use. > > James Dyer > Ingram Content Group > > > -Original Message- > From: Scott Stults [mailto:sstu...@opensourceconnections.com] > Sent: Tuesday, January 27, 2015 11:26 AM > To: solr-user@lucene.apache.org > Subject: SpellingQueryConverter and query parsing > > Hello! > > SpellingQueryConverter "parses" the incoming query in sort of a quick and > dirty way with a regular expression. Is there a reason the query string > isn't parsed with the _actual_ parser, if one was configured for that type > of request? Even better, could the parsed query object be added to the > response in some way so that the query wouldn't need to be parsed twice? > The individual terms could then be visited and substituted in-place without > needing to worry about preserving the meaning of operators in the query. > > The motive in my question is, I may need to implement a QueryConverter > because I'm using a custom parser, and using that parser in the > QueryConverter itself seems like the right thing to do. 
That wasn't done > though in SpellingQueryConverter, so I want to find out why before I go > blundering into a known minefield. > > > Thanks! > -Scott > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
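James's `spellcheck.q` workaround amounts to sending the spellchecker a plain keyword list alongside the real query. A minimal sketch, assuming the application can extract the keywords itself (for example from its custom parser); the sample query and keywords are invented.

```python
def spellcheck_params(full_query, keywords):
    """Send the real query in q and space-delimited keywords in spellcheck.q,
    so the spellchecker does not have to run the query converter on q."""
    return {
        "q": full_query,
        "spellcheck": "true",
        "spellcheck.q": " ".join(keywords),
    }


params = spellcheck_params('title:(solr AND serch) OR body:"relevancy"', ["solr", "serch", "relevancy"])
```

The trade-off Scott points out still applies: suggestions come back per keyword, and mapping them back into the original query string is left to the caller.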
SpellingQueryConverter and query parsing
Hello! SpellingQueryConverter "parses" the incoming query in sort of a quick and dirty way with a regular expression. Is there a reason the query string isn't parsed with the _actual_ parser, if one was configured for that type of request? Even better, could the parsed query object be added to the response in some way so that the query wouldn't need to be parsed twice? The individual terms could then be visited and substituted in-place without needing to worry about preserving the meaning of operators in the query. The motive in my question is, I may need to implement a QueryConverter because I'm using a custom parser, and using that parser in the QueryConverter itself seems like the right thing to do. That wasn't done though in SpellingQueryConverter, so I want to find out why before I go blundering into a known minefield. Thanks! -Scott
Re: zkCli zkhost parameter
I did, but it looks like I mixed in the chroot too after every entry rather than once at the very end (thanks to David Smiley for catching that). I'll try again and update if it's still a problem. Thanks! -Scott On Sat, Apr 26, 2014 at 1:08 PM, Mark Miller wrote: > Have you tried a comma-separated list or are you going by documentation? > It should work. > -- > Mark Miller > about.me/markrmiller > > On April 26, 2014 at 1:03:25 PM, Scott Stults ( > sstu...@opensourceconnections.com) wrote: > > It looks like this only takes a single host as its value, whereas the > zkHost environment variable for Solr takes a comma-separated list. > Shouldn't the client also take a comma-separated list? > > k/r, > Scott > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
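The mistake described above (chroot repeated after every host) is easy to avoid by building the connection string in one place. A small sketch, with made-up host names and a hypothetical `/solr` chroot:

```python
def zk_host_string(servers, chroot=""):
    """Join (host, port) pairs with commas and append the chroot once, at the end."""
    hosts = ",".join("%s:%d" % (host, port) for host, port in servers)
    if chroot and not chroot.startswith("/"):
        chroot = "/" + chroot
    return hosts + chroot


zk_host = zk_host_string([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)], "/solr")
```

The resulting value has the form `host:port,host:port,host:port/chroot`, which is the same shape Solr's own zkHost setting expects.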
zkCli zkhost parameter
It looks like this only takes a single host as its value, whereas the zkHost environment variable for Solr takes a comma-separated list. Shouldn't the client also take a comma-separated list? k/r, Scott
JVM tuning?
We've been using a slightly older version of this script to start Solr in server environments: https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh The thing I especially like about it is its ability to dynamically cap memory usage, and the garbage collection log section is a great reference when we need to check gc times. My question is, does anyone else use a script like this to configure the JVM for Solr? Would it be useful to have this as a reference in solr/example/etc? Thanks! -Scott
Re: Thoughts on production deployment?
There's an RPM project on GitHub that comes close: https://github.com/boogieshafer/jetty-solr-rpm On Fri, Feb 1, 2013 at 6:19 AM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > When I was referring to the "different version of Jetty," I meant Jetty > Plus, which the wiki mentions. Is this no longer true? > > My Chef recipe makes assumptions about the OS and EBS volumes being > available, which can easily be fixed. > > Michael > Thanks for jumping in guys. I agree the SolrJetty page needs just a little > updating -- I commented at the bottom of SOLR-3159 about that. > > Michael and Paul, are your chef and ant recipes generic enough to share? My > next install is going to be on RHEL 6, so I can take a crack at an install > script that'll work there. It wouldn't be hard to translate between shell > and chef. > > Michael: The problem with adding a dependency on Jetty in your chef recipe > is that it's going to grab whatever version of Jetty was blessed by the > distro maintainers on your target platform. > -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Thoughts on production deployment?
Thanks for jumping in guys. I agree the SolrJetty page needs just a little updating -- I commented at the bottom of SOLR-3159 about that. Michael and Paul, are your chef and ant recipes generic enough to share? My next install is going to be on RHEL 6, so I can take a crack at an install script that'll work there. It wouldn't be hard to translate between shell and chef. Michael: The problem with adding a dependency on Jetty in your chef recipe is that it's going to grab whatever version of Jetty was blessed by the distro maintainers on your target platform.
Thoughts on production deployment?
Part of this is a rant, part is a plea to others who've run successful production deployments.

Solr is a second-class citizen when it comes to production deployment. Every recipe I've seen (RPM, DEB, chef, or puppet) makes assumptions that in one way or another run afoul of best practices when it comes to production use. And if you're not using one of these recipe formats to deploy Solr, you're building a SnowflakeServer (Martin Fowler's term).

Granted, Solr _can_ be deployed into any vanilla JEE container, so the deployment spec responsibility may be erroneously assigned to whichever you choose. BUT, if you want to get the maximum out of Solr you'll want to put it on its own box, running in its own tuned container, and that container should be the one that Solr's been tested on repeatedly by an army of build bots. Right now that blessed container is Jetty version 8.1.2.v20120308.

So the first problem with the recipes is that they declare a generic dependency on Jetty or Tomcat. The assumption there is that either can be treated as a generic OS facility to be shared with other apps. That's not true, because Solr is the driving force behind which version is deployed. The container can't be up- or downgraded without affecting Solr, and any other app running in there needs to be aware that Solr is taking first priority.

The next problem is that most recipes don't make a distinction between collections. "Solr" configuration goes in one folder, "Solr" data goes in another, and the logs and container stuff get scattered likewise. In reality, every collection can be configured differently and there is no generic "Solr" data.

Lastly, the package maintainers of all the major OS distributions have ignored Solr since around version 1.4. That means if you want a newer version you're going to download a tarball and make another snowflake. This might be attributable to thinking of Solr as just another web app that doesn't need special packaging.

Regardless, the consequence is that the only people who are deploying Solr according to best practices are those intimately familiar with Solr.

So what's the best way to fix this situation? Solr already ships with everything it needs except Java and a start-up script. Maybe the first step is to include a generic "install.sh" script that has a couple of distro-specific support scripts. That would be fairly agnostic toward package management systems and it would be useful to sysadmins right away. It would also help package maintainers update their build specs.

What do _you_ think?

-Scott
Re: Will SolrCloud always slice by ID hash?
Thanks, guys. Yeah, separate rolling collections seem like the better way to go.

-Scott

On Sat, Dec 29, 2012 at 1:30 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/SOLR-4237
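To make the rolling-collections idea concrete, here is a minimal sketch of the bookkeeping it implies: one collection per day, keep the newest N, and retire whatever falls outside the window. The `logs_` prefix, the `YYYYMMDD` suffix, and both function names are assumptions made for illustration; nothing in the thread prescribes them, and actually creating or deleting the collections in Solr is left out.

```python
from datetime import date, timedelta


def rolling_collections(today, days_to_keep, prefix="logs_"):
    """Return the daily collection names inside the retention window, newest first.

    The naming scheme (prefix + YYYYMMDD) is a hypothetical convention.
    """
    return [
        f"{prefix}{(today - timedelta(days=n)):%Y%m%d}"
        for n in range(days_to_keep)
    ]


def expired_collections(existing, today, days_to_keep, prefix="logs_"):
    """Return existing daily collections that have aged out of the window."""
    keep = set(rolling_collections(today, days_to_keep, prefix))
    return sorted(
        name for name in existing
        if name.startswith(prefix) and name not in keep
    )
```

Dropping a whole expired collection avoids deleting documents by date range out of a single, ever-growing index, which is presumably why it looked like the better option here.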
Will SolrCloud always slice by ID hash?
I'm going to be building a Solr cluster and I want to have a rolling set of slices so that I can keep a fixed number of days in my collection. If I send an update to a particular slice leader, will it always hash the unique key and (probably) forward the doc to another leader? Thank you, Scott
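For readers unfamiliar with the behavior being asked about, the following is a rough sketch of hash-based routing in general, not SolrCloud's actual implementation. `zlib.crc32` is only a stand-in hash, and `slice_for` and `route` are hypothetical names. It shows why the slice that receives an update is not necessarily the slice that ends up storing the document.

```python
import zlib


def slice_for(doc_id, num_slices):
    """Pick a slice index from a stable hash of the unique key.

    crc32 is a stand-in; a real cluster may use a different hash and
    assign hash ranges to slices rather than taking a modulus.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_slices


def route(doc_id, receiving_slice, num_slices):
    """Return (target_slice, forwarded) for an update sent to receiving_slice."""
    target = slice_for(doc_id, num_slices)
    return target, target != receiving_slice
```

Under a scheme like this, the target depends only on the document's key, so choosing a particular slice to send to does not by itself control where the document lives.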
Re: Do Highlighting + proximity using surround query parser
I got this working the way you describe it (in the getHighlightQuery() method). The span queries were tripping it up, so I extracted the query terms and created a DisMax query from them. There'll be a loss of accuracy in the highlighting, but in my case that's better than no highlighting.

Should I just go ahead and submit a patch to SOLR-2703?

On Tue, Jan 10, 2012 at 9:35 AM, Ahmet Arslan wrote:
> > I am not able to do highlighting with surround query parser on the returned results.
> > I have tried the highlighting component but it does not return highlighted results.
>
> Highlighter does not recognize Surround Query. It must be re-written to enable highlighting in o.a.s.search.QParser#getHighlightQuery() method.
>
> Not sure this functionality should be added in SOLR-2703 or a separate jira issue.

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com