Re: How to check if a solr core is down and is ready for a solr re-start
I have to ask a completely different question: "Why are the replicas down in the first place?" Having to restart the Solr node is a sledgehammer of a solution; I'd put more effort into finding out why that's happening in the first place. Are you getting OOM errors? Any other exception? Is the OOM killer script executing? Best, Erick On Thu, Aug 31, 2017 at 10:59 AM, Minu Theresa Thomas wrote: > Hello Team, > > I have had a few experiences where restarting a Solr node is the only option when > a core goes down. I am trying to automate the restart of a Solr server when > a core goes down or the replica is unresponsive over a period of time. > > I have a script to check if the cores/replicas associated with a node are > up. I have two approaches - one is to get the cores from the Solr CLUSTERSTATUS > API and do a PING on each core. If at least one core on the node doesn't > respond to ping, then mark that node down and do a restart after a few retries. > The second is to get the cores from the Solr CLUSTERSTATUS API along with their > status. If the status is down, then mark that node down and do a restart > after a few retries. > > Which is the best way/recommended approach to check if a core associated > with a node is down and is ready for a Solr service restart? > > Thanks!
Re: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
Hi Markus, I did encounter the error before. It was because the server's memory usage was very high, at almost 98%. Not all of that usage came from Solr; there are other programs and virtual machines running on the server. What is the memory usage of the system, and do you have other applications running on the server? Regards, Edwin On 31 August 2017 at 16:07, Stephan Schubert wrote: > Hi Markus, > > I don't know what client you use, but if you are using SolrJ, enabling the > logging could be an option to "dig deeper" into the problem. This can be > the output, for example via log4j on log level info: > > ... > 2017-08-31 10:01:56 INFO ZooKeeper:438 - Initiating client connection, > connectString=ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983, > ZKHOST4:9983,ZKHOST5:9983 > sessionTimeout=60 > watcher=org.apache.solr.common.cloud.SolrZkClient$3@14379273 > 2017-08-31 10:01:56 INFO ClientCnxn:876 - Socket connection established > to SOLRHOST/ZKHOST3:9983, initiating session > 2017-08-31 10:01:56 INFO ClientCnxn:1299 - Session establishment complete > on server SOLRHOST/ZKHOST3:9983, sessionid = 0x45e35eaa9fd3584, negotiated > timeout = 4 > 2017-08-31 10:01:56 INFO ZkStateReader:688 - Updated live nodes from > ZooKeeper... (0) -> (4) > 2017-08-31 10:01:56 INFO ZkClientClusterStateProvider:134 - Cluster at > ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983 ready > > > > > > From: Markus Jelsma > To: solr-user@lucene.apache.org > Date: 31.08.2017 10:00 > Subject: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are > disabled. > > > > Hello Stephan, > > I know that restarting stuff can sometimes cure what's wrong, but we are > not going to; we want to get rid of the problem, not restart Microsoft > Windows whenever things run slow. Also, there is no indexing going on > right now. > > We also see these sometimes; this explains at least why it cannot talk to > ZooKeeper, but why.. 
> o.a.s.c.RecoveryStrategy Socket timeout on send prep recovery cmd, > retrying.. > > This has been going on with just one of our nodes for over two hours; > other nodes are fine. And why is this bad node green in the cluster overview? > > Thanks, > Markus > > -Original message- > > From:Stephan Schubert > > Sent: Thursday 31st August 2017 9:52 > > To: solr-user@lucene.apache.org > > Subject: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled. > > > > Hi Markus, > > > > try to stop your indexing/update processes and restart your ZooKeeper > > instances (not all at the same time of course). This is what I do in > these > > cases and it has helped me so far. > > > > > > > > > > From: Markus Jelsma > > To: Solr-user > > Date: 31.08.2017 09:49 > > Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled. > > > > > > > > Hello, > > > > One node is behaving badly, at least according to the logs, but the node > > > is green in the cluster overview although the logs claim recovery fails > > all the time. It is not the first time this message pops up in the logs > of > > one of the nodes; why can it not talk to ZooKeeper? I am missing a reason. > > > > The cluster is not extremely busy at the moment, we allow plenty of file > > > descriptors, there are no firewall restrictions; I cannot think of any > > problem in our infrastructure. > > > > What's going on? What can I do? Can the error be explained a bit > further? > > > > Thanks, > > Markus > > > > 8/31/2017, 9:34:34 AM > > ERROR false > > RequestHandlerBase > > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates > > > are disabled. > > 8/31/2017, 9:34:34 AM > > ERROR false > > RequestHandlerBase > > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates > > > are disabled. > > 8/31/2017, 9:34:36 AM > > ERROR false > > RequestHandlerBase > > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates > > > are disabled. 
> > 8/31/2017, 9:34:38 AM > > ERROR false > > RecoveryStrategy > > Could not publish as ACTIVE after succesful recovery > > 8/31/2017, 9:34:38 AM > > ERROR true > > RecoveryStrategy > > Recovery failed - trying again... (0) > > 8/31/2017, 9:34:49 AM > > ERROR false > > RequestHandlerBase > > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates > > > are disabled. > > 8/31/2017, 9:34:49 AM > > ERROR false > > RequestHandlerBase > > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates > > > are disabled. > > 8/31/2017, 9:34:50 AM > > ERROR false > > RecoveryStrategy > > Could not publish as ACTIVE after succesful recovery > > 8/31/2017, 9:34:50 AM > > ERROR false > > RecoveryStrategy > > Recovery failed - trying again... (1) > > 8/31/2017, 9:35:36 AM > > ERROR false > > RequestHandlerBase > > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates > > > are disabled. > > > > > > > > > > > >
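If SolrJ-side logging (as Stephan suggests) isn't enough, turning up the ZooKeeper client categories on the Solr server side can show why the session is dropping. A sketch for the log4j 1.2 setup that Solr 6.x ships with -- the category names are taken from the classes visible in the log excerpts above, but treat the exact file location as an assumption about a default install:

```properties
# server/resources/log4j.properties (default Solr 6.x layout -- an assumption)
# ZooKeeper client internals: connection attempts, session expiry, timeouts
log4j.logger.org.apache.zookeeper=DEBUG
# Solr's ZK wrapper layer (SolrZkClient, ZkStateReader appear in the logs above)
log4j.logger.org.apache.solr.common.cloud=DEBUG
```

DEBUG on these categories is verbose, so it is best enabled only while reproducing the problem.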
Re: Facet on a Payload field type?
If the "middle tier" of your application doesn't already have an easy key-value lookup that you keep this translation data in (which would surprise me, because I've never seen anyone care about this type of "late-binding" translation of search results w/o also caring about late-binding translation of other aspects of the UI) then you could always create a side car collection in solr: one document per "word" using the english term as the id, with a lowercased copy field for searching + one field per language with the translations if available. After doing your main query, toss all the facet.field terms in the response into a second query to your side car "translation" collection using the "terms" parser and setting the "rows" == the total number of terms you're asking to translate and fl=id,fr (or fl=id,es ... whatever language the user wants) https://lucene.apache.org/solr/guide/6_6/other-parsers.html ...then use those results to translate the final output. : Date: Thu, 31 Aug 2017 14:12:38 -0500 : From: Webster Homer : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Facet on a Payload field type? : : You are describing the idea pretty accurately. Apparently Endeca has : something that sort of supports this, which we used for the problem. 
: : On Thu, Aug 31, 2017 at 1:59 PM, Chris Hostetter : wrote: : : > : > ok, so lemme attempt to restate your objective to ensure no : > miscommunication: : > : > 1) you have fields like "color" : > 2) you want to index english words into the color field for docs : > 3) you want to search/filter against these fields using english words as : > input : > 4) you want to facet on the fields like "color" : > 5) you want the list of terms:counts displayed to the end user when : > faceting on these fields to be in a variety of different langauges, based : > on a "user_lang" option specified at query time and a set of known : > translations : > 6) if no english->user_lang translation is available for a particular : > term, you want to display the eglish workd when displaying the facet : > counts : > : > does that sound right? : > : > based on your objective, attempting to embed/encode the various : > translations into the terms when indexing (as payloads, or an : > alternative field or prefixed terms, etc...) seems like a vastly : > overcomplicated way to deal with this problem. : > : > If i were in your shoes, i would keep the translation aspect of the : > displya completely distinct from Solr, and after solr has returned the : > response then loop over the facet.field temrs and do a lookup in some : > other (cached) key/value translation mapping in your middle layer -- : > replacing the english word with the translation if it exists. : > : > This has the added benefit of allowing you to tweak the translations w/o : > reindexing any docs. : > : > Practically speaking: the idea of encoding these translations as payloads : > wouldn't make sense -- because payloads exist per *occurance* of the term : > -- ie: it wouldn't make sense to put "es=rojo;fr=rouge" in the payload of : > a term "red" when indexing a document, because you want those translations : > for all instances of red -- not just that instance of red in that : > singlular document. 
: > : > : > : > : Date: Mon, 28 Aug 2017 13:29:00 -0500 : > : From: Webster Homer : > : Reply-To: solr-user@lucene.apache.org : > : To: solr-user@lucene.apache.org : > : Subject: Re: Facet on a Payload field type? : > : : > : The issue is, that we lack translations for much of our attribute data. : > We : > : do have English versions. The idea is to use the English values for the : > : faceted values and for the filters, but be able to retrieve different : > : language versions of the term to the caller. : > : If we have a facet on color if the value is red, be able to retrieve rojo : > : for Spanish etc... : > : : > : Also users can switch regions between searches. If a user starts out in : > : French, executes a search, selects a facet then switches to German they : > : should get the German for the facet (if it exists) even when they : > : originally used French. If all of the searching was in English where we : > : have the data, we could then show French (or German etc) for the facet : > : value. : > : : > : The real field value that we use for filtering would be in English but : > the : > : values returned to the user would be in the language of their locale or : > : English if we don't have a translation for it. The idea being that the : > : translations would be stored in the payloads : > : : > : On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter < : > hossman_luc...@fucit.org> : > : wrote: : > : : > : > : > : > : The payload idea was from my boss, it's similar to how they did this : > in : > : > : Endeca. : > : > ... : > : > : My alternate idea is to have sets of facet fields for different : > : > languages, : > : > : then let our service layer determine the correct on
Re: Facet on a Payload field type?
You are describing the idea pretty accurately. Apparently Endeca has something that sort of supports this, which we used for the problem. On Thu, Aug 31, 2017 at 1:59 PM, Chris Hostetter wrote: > > ok, so lemme attempt to restate your objective to ensure no > miscommunication: > > 1) you have fields like "color" > 2) you want to index english words into the color field for docs > 3) you want to search/filter against these fields using english words as > input > 4) you want to facet on the fields like "color" > 5) you want the list of terms:counts displayed to the end user when > faceting on these fields to be in a variety of different languages, based > on a "user_lang" option specified at query time and a set of known > translations > 6) if no english->user_lang translation is available for a particular > term, you want to display the english word when displaying the facet > counts > > does that sound right? > > based on your objective, attempting to embed/encode the various > translations into the terms when indexing (as payloads, or an > alternative field or prefixed terms, etc...) seems like a vastly > overcomplicated way to deal with this problem. > > If I were in your shoes, I would keep the translation aspect of the > display completely distinct from Solr, and after solr has returned the > response then loop over the facet.field terms and do a lookup in some > other (cached) key/value translation mapping in your middle layer -- > replacing the english word with the translation if it exists. > > This has the added benefit of allowing you to tweak the translations w/o > reindexing any docs. 
> > Practically speaking: the idea of encoding these translations as payloads > wouldn't make sense -- because payloads exist per *occurrence* of the term > -- ie: it wouldn't make sense to put "es=rojo;fr=rouge" in the payload of > a term "red" when indexing a document, because you want those translations > for all instances of red -- not just that instance of red in that > singular document. > > > > : Date: Mon, 28 Aug 2017 13:29:00 -0500 > : From: Webster Homer > : Reply-To: solr-user@lucene.apache.org > : To: solr-user@lucene.apache.org > : Subject: Re: Facet on a Payload field type? > : > : The issue is, that we lack translations for much of our attribute data. > We > : do have English versions. The idea is to use the English values for the > : faceted values and for the filters, but be able to retrieve different > : language versions of the term to the caller. > : If we have a facet on color if the value is red, be able to retrieve rojo > : for Spanish etc... > : > : Also users can switch regions between searches. If a user starts out in > : French, executes a search, selects a facet then switches to German they > : should get the German for the facet (if it exists) even when they > : originally used French. If all of the searching was in English where we > : have the data, we could then show French (or German etc) for the facet > : value. > : > : The real field value that we use for filtering would be in English but > the > : values returned to the user would be in the language of their locale or > : English if we don't have a translation for it. The idea being that the > : translations would be stored in the payloads > : > : On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter < > hossman_luc...@fucit.org> > : wrote: > : > : > > : > : The payload idea was from my boss, it's similar to how they did this > in > : > : Endeca. > : > ... 
> : > : My alternate idea is to have sets of facet fields for different > : > languages, > : > : then let our service layer determine the correct one for the user's > : > : language, but I'm curious as to how others have solved this. > : > > : > Let's back up for a minute -- can you please explain your ultimate > goal, > : > from a "solr client application" perspective? (assuming we have no > : > knowledge of how/how you might have used Endeca in the past) > : > > : > What is it you want your application to be able to do when indexing > docs > : > to solr and making queries to solr? give us some real world examples > : > > : > > : > > : > (If i had to guess: i gather maybe you're just dealing with a > "keywords" > : > type field that you want to facet on -- and maybe you could use a diff > : > field for each langauge, or encode the langauges as a prefix on each > term > : > and use facet.prefix to restrict the facet contraints returned) > : > > : > > : > > : > https://people.apache.org/~hossman/#xyproblem > : > XY Problem > : > > : > Your question appears to be an "XY Problem" ... that is: you are > dealing > : > with "X", you are assuming "Y" will help you, and you are asking about > "Y" > : > without giving more details about the "X" so that we can understand the > : > full issue. Perhaps the best solution doesn't involve "Y" at all? > : > See Also: http://www.perlmonks.org/index.pl?node_id=542341 > : > > : > > : > > : > : > : > : On Wed, Aug 23, 2017 at 2:10 PM, Mar
RE: data import class not found
I just tried putting the solr-dataimporthandler-6.6.0.jar in server/solr/lib and I got past the problem. I still don't understand why it is not found in /dist. -Original Message- From: Steve Pruitt [mailto:bpru...@opentext.com] Sent: Thursday, August 31, 2017 3:05 PM To: solr-user@lucene.apache.org Subject: [EXTERNAL] - data import class not found I still can't understand how Solr establishes the classpath. I have a custom entity processor that subclasses EntityProcessorBase. When I execute the /dataimport call I get java.lang.NoClassDefFoundError: org/apache/solr/handler/dataimport/EntityProcessorBase no matter how I state in solrconfig.xml to locate the solr-dataimporthandler jar. I have tried the existing lib directives in solrconfig.xml, the form from the Ref Guide, try anything. But I always get the class not found error. The DataImportHandler class is found when Solr starts; since EntityProcessorBase is in the same jar, why is it not found? I have not tried putting it in the core's lib directory, thinking the above should work. Of course, the 3rd choice is only an experiment. Thanks. -S
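For reference, the Ref Guide's form of the directive looks like the following. The `dir` value is resolved relative to the core's instanceDir, so the number of `..` segments in the fallback path is an assumption about the default server/solr layout and a common source of "class not found" surprises:

```xml
<!-- in solrconfig.xml; dir is resolved relative to the core's instanceDir -->
<lib dir="${solr.install.dir:../../../..}/dist/"
     regex="solr-dataimporthandler-.*\.jar" />
```

A jar containing a custom entity processor can be added with another `<lib>` line pointing at its own directory.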
data import class not found
I still can't understand how Solr establishes the classpath. I have a custom entity processor that subclasses EntityProcessorBase. When I execute the /dataimport call I get java.lang.NoClassDefFoundError: org/apache/solr/handler/dataimport/EntityProcessorBase no matter how I state in solrconfig.xml to locate the solr-dataimporthandler jar. I have tried the existing lib directives in solrconfig.xml, the form from the Ref Guide, try anything. But I always get the class not found error. The DataImportHandler class is found when Solr starts; since EntityProcessorBase is in the same jar, why is it not found? I have not tried putting it in the core's lib directory, thinking the above should work. Of course, the 3rd choice is only an experiment. Thanks. -S
Re: Facet on a Payload field type?
OK, so lemme attempt to restate your objective to ensure no miscommunication: 1) you have fields like "color" 2) you want to index english words into the color field for docs 3) you want to search/filter against these fields using english words as input 4) you want to facet on the fields like "color" 5) you want the list of terms:counts displayed to the end user when faceting on these fields to be in a variety of different languages, based on a "user_lang" option specified at query time and a set of known translations 6) if no english->user_lang translation is available for a particular term, you want to display the english word when displaying the facet counts does that sound right? based on your objective, attempting to embed/encode the various translations into the terms when indexing (as payloads, or an alternative field or prefixed terms, etc...) seems like a vastly overcomplicated way to deal with this problem. If I were in your shoes, I would keep the translation aspect of the display completely distinct from Solr, and after solr has returned the response then loop over the facet.field terms and do a lookup in some other (cached) key/value translation mapping in your middle layer -- replacing the english word with the translation if it exists. This has the added benefit of allowing you to tweak the translations w/o reindexing any docs. Practically speaking: the idea of encoding these translations as payloads wouldn't make sense -- because payloads exist per *occurrence* of the term -- ie: it wouldn't make sense to put "es=rojo;fr=rouge" in the payload of a term "red" when indexing a document, because you want those translations for all instances of red -- not just that instance of red in that singular document. : Date: Mon, 28 Aug 2017 13:29:00 -0500 : From: Webster Homer : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Facet on a Payload field type? 
: : The issue is, that we lack translations for much of our attribute data. We : do have English versions. The idea is to use the English values for the : faceted values and for the filters, but be able to retrieve different : language versions of the term to the caller. : If we have a facet on color if the value is red, be able to retrieve rojo : for Spanish etc... : : Also users can switch regions between searches. If a user starts out in : French, executes a search, selects a facet then switches to German they : should get the German for the facet (if it exists) even when they : originally used French. If all of the searching was in English where we : have the data, we could then show French (or German etc) for the facet : value. : : The real field value that we use for filtering would be in English but the : values returned to the user would be in the language of their locale or : English if we don't have a translation for it. The idea being that the : translations would be stored in the payloads : : On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter : wrote: : : > : > : The payload idea was from my boss, it's similar to how they did this in : > : Endeca. : > ... : > : My alternate idea is to have sets of facet fields for different : > languages, : > : then let our service layer determine the correct one for the user's : > : language, but I'm curious as to how others have solved this. : > : > Let's back up for a minute -- can you please explain your ultimate goal, : > from a "solr client application" perspective? (assuming we have no : > knowledge of how/how you might have used Endeca in the past) : > : > What is it you want your application to be able to do when indexing docs : > to solr and making queries to solr? 
give us some real world examples : > : > : > (If I had to guess: I gather maybe you're just dealing with a "keywords" : > type field that you want to facet on -- and maybe you could use a diff : > field for each language, or encode the languages as a prefix on each term : > and use facet.prefix to restrict the facet constraints returned) : > : > : > : > https://people.apache.org/~hossman/#xyproblem : > XY Problem : > : > Your question appears to be an "XY Problem" ... that is: you are dealing : > with "X", you are assuming "Y" will help you, and you are asking about "Y" : > without giving more details about the "X" so that we can understand the : > full issue. Perhaps the best solution doesn't involve "Y" at all? : > See Also: http://www.perlmonks.org/index.pl?node_id=542341 : > : > : > : : > : On Wed, Aug 23, 2017 at 2:10 PM, Markus Jelsma < : > markus.jel...@openindex.io> : > : wrote: : > : : > : > Technically they could, facetting is possible on TextField, but it : > would : > : > be useless for facetting. Payloads are only used for scoring via a : > custom : > : > Similarity. Payloads also can only contain one byte of information (or : > was : > : > it 64 bits?) : > : > : > : > Payloads are not something you want to use when dealing with : >
Re: query with wild card with AND taking lot of time
A field:* query always takes a long time, and should be avoided if at all possible. Solr/Lucene is still going to try to rank the documents based on that, even though there's nothing to really rank: every single document where that field is not empty will have the same score for that part of the ranking. I don't know what the purpose of adding that is in your case. On Thu, Aug 31, 2017 at 2:38 PM, Josh Lincoln wrote: > The closest thing to an execution plan that I know of is debug=true. That'll > show timings of some of the components. > I also find it useful to add echoParams=all when troubleshooting. That'll > show every param solr is using for the request, including params set in > solrconfig.xml and not passed in the request. This can help explain the > debug output (e.g. what queryparser is being used, if fields are being > expanded through field aliases, etc.). > > On Thu, Aug 31, 2017 at 1:35 PM suresh pendap > wrote: > > > Hello everybody, > > > > We are seeing that the below query is running very slow and taking > almost 4 > > seconds to finish > > > > > > [] webapp=/solr path=/select > > > > params={df=_text_&distrib=false&fl=id&shards.purpose=4& start=0&fsv=true&sort=modified_dtm+desc&shard.url=http:// > > :8983/solr/flat_product_index_shard7_replica1/ > %7Chttp://:8983/solr/flat_product_index_shard7_ > replica2/%7Chttp://:8983/solr/flat_product_index_ > shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+ > AND+abstract_or_primary_product_id:*+AND+(gtin:< > numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW= > 1504196301534&isShard=true&timeAllowed=25000&wt=javabin} > > hits=0 status=0 QTime=3663 > > > > > > It seems like the abstract_or_primary_product_id:* clause is > contributing > > to the overall response time. It seems that the > > abstract_or_primary_product_id:* . clause is not adding any value in the > > query criteria and can be safely removed. Is my understanding correct? 
> > > > I would like to know if the order of the clauses in the AND query would > > affect the response time of the query? > > > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10 > > > > Doesn't Lucene/Solr pick up the optimal query execution plan? > > > > Is there anyway to look at the query execution plan generated by Lucene? > > > > Regards > > Suresh > > >
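To make the point concrete, here is the query from the thread rewritten the way the replies suggest: the no-op abstract_or_primary_product_id:* clause dropped, and the remaining pure-filter clauses moved into individual fq parameters so nothing is left for the scorer to do. The field names come from the log above; the gtin value is a placeholder for the elided numeric value:

```python
# Original form: everything in q, so every clause participates in scoring.
slow = {
    "q": ("product_identifier_type:DOTCOM_OFFER"
          " AND abstract_or_primary_product_id:*"   # matches ~every doc; scored anyway
          " AND (gtin:12345)"                        # 12345 is a placeholder value
          " AND -product_class_type:BUNDLE AND -hasProduct:N"),
}

# Rewritten form: filters are not scored, and each one is cached independently.
fast = {
    "q": "*:*",
    "fq": [
        "product_identifier_type:DOTCOM_OFFER",
        "gtin:12345",
        "-product_class_type:BUNDLE",
        "-hasProduct:N",
        # abstract_or_primary_product_id dropped entirely; if "field has a value"
        # must still be enforced, use abstract_or_primary_product_id:[* TO *]
    ],
}
```

Whether the clause can be dropped depends on whether any documents actually lack that field, which is the question raised above.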
Re: query with wild card with AND taking lot of time
The closest thing to an execution plan that I know of is debug=true. That'll show timings of some of the components. I also find it useful to add echoParams=all when troubleshooting. That'll show every param solr is using for the request, including params set in solrconfig.xml and not passed in the request. This can help explain the debug output (e.g. what queryparser is being used, if fields are being expanded through field aliases, etc.). On Thu, Aug 31, 2017 at 1:35 PM suresh pendap wrote: > Hello everybody, > > We are seeing that the below query is running very slow and taking almost 4 > seconds to finish > > > [] webapp=/solr path=/select > > params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http:// > :8983/solr/flat_product_index_shard7_replica1/%7Chttp://:8983/solr/flat_product_index_shard7_replica2/%7Chttp://:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin} > hits=0 status=0 QTime=3663 > > > It seems like the abstract_or_primary_product_id:* clause is contributing > to the overall response time. It seems that the > abstract_or_primary_product_id:* . clause is not adding any value in the > query criteria and can be safely removed. Is my understanding correct? > > I would like to know if the order of the clauses in the AND query would > affect the response time of the query? > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10 > > Doesn't Lucene/Solr pick up the optimal query execution plan? > > Is there anyway to look at the query execution plan generated by Lucene? > > Regards > Suresh >
Re: query with wild card with AND taking lot of time
Thanks Lincoln for your suggestions. It was very helpful. I am still curious as to why is the original query taking long time. It is something that Lucene should have ideally optimized. Is there any way to see the execution plan used by Lucene? Thanks Suresh On Thu, Aug 31, 2017 at 11:11 AM, Josh Lincoln wrote: > As I understand it, using a different fq for each clause makes the > resultant caches more likely to be used in future requests. > > For the query > fq=first:bob AND last:smith > a subsequent query for > fq=first:tim AND last:smith > won't be able to use the fq cache from the first query. > > However, if the first query was > fq=first:bob > fq=last:smith > and subsequently > fq=first:tim > fq=last:smith > then the second query will at least benefit from the last:smith cache > > Because fq clauses are always ANDed, this does not work for ORed clauses. > > I suppose if some conditions are frequently used together it may be better > to put them in the same fq so there's only one cache. E.g. if an ecommerce > site reqularly queried for featured:Y AND instock:Y > > On Thu, Aug 31, 2017 at 1:48 PM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > > > > > 2) Because all your clauses are more like filters and are ANDed > together, > > > you'll likely get better performance by putting them _each_ in an fq > > > E.g. > > > fq=product_identifier_type:DOTCOM_OFFER > > > fq=abstract_or_primary_product_id:[* TO *] > > > > > > why is this the case? is it just better to have no logic operators in > the > > filter queries? > > > > > > > > On Thu, Aug 31, 2017 at 1:47 PM, Josh Lincoln > > wrote: > > > > > Suresh, > > > Two things I noticed. > > > 1) If your intent is to only match records where there's something, > > > anything, in abstract_or_primary_product_id, you should use > fieldname:[* > > > TO > > > *] but that will exclude records where that field is empty/missing. 
If > > you > > > want to match records even if that field is empty/missing, then you > > should > > > remove that clause entirely > > > 2) Because all your clauses are more like filters and are ANDed > together, > > > you'll likely get better performance by putting them _each_ in an fq > > > E.g. > > > fq=product_identifier_type:DOTCOM_OFFER > > > fq=abstract_or_primary_product_id:[* TO *] > > > fq=gtin: > > > fq=product_class_type:BUNDLE > > > fq=hasProduct:N > > > > > > > > > On Thu, Aug 31, 2017 at 1:35 PM suresh pendap > > > > wrote: > > > > > > > Hello everybody, > > > > > > > > We are seeing that the below query is running very slow and taking > > > almost 4 > > > > seconds to finish > > > > > > > > > > > > [] webapp=/solr path=/select > > > > > > > > params={df=_text_&distrib=false&fl=id&shards.purpose=4& > > > start=0&fsv=true&sort=modified_dtm+desc&shard.url=http:// > > > > :8983/solr/flat_product_index_shard7_replica1/ > > > %7Chttp://:8983/solr/flat_product_index_shard7_ > > > replica2/%7Chttp://:8983/solr/flat_product_index_ > > > > > shard7_replica0/&rows=11&version=2&q=product_ > identifier_type:DOTCOM_OFFER+ > > > AND+abstract_or_primary_product_id:*+AND+(gtin:< > > > numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW= > > > 1504196301534&isShard=true&timeAllowed=25000&wt=javabin} > > > > hits=0 status=0 QTime=3663 > > > > > > > > > > > > It seems like the abstract_or_primary_product_id:* clause is > > > contributing > > > > to the overall response time. It seems that the > > > > abstract_or_primary_product_id:* . clause is not adding any value in > > the > > > > query criteria and can be safely removed. Is my understanding > correct? > > > > > > > > I would like to know if the order of the clauses in the AND query > would > > > > affect the response time of the query? > > > > > > > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10 > > > > > > > > Doesn't Lucene/Solr pick up the optimal query execution plan? 
> > > > > > > > Is there anyway to look at the query execution plan generated by > > Lucene? > > > > > > > > Regards > > > > Suresh > > > > > > > > > >
Re: query with wild card with AND taking lot of time
As I understand it, using a different fq for each clause makes the resultant caches more likely to be used in future requests. For the query fq=first:bob AND last:smith a subsequent query for fq=first:tim AND last:smith won't be able to use the fq cache from the first query. However, if the first query was fq=first:bob fq=last:smith and subsequently fq=first:tim fq=last:smith then the second query will at least benefit from the last:smith cache. Because fq clauses are always ANDed, this does not work for ORed clauses. I suppose if some conditions are frequently used together it may be better to put them in the same fq so there's only one cache. E.g. if an ecommerce site regularly queried for featured:Y AND instock:Y On Thu, Aug 31, 2017 at 1:48 PM David Hastings wrote: > > > > 2) Because all your clauses are more like filters and are ANDed together, > > you'll likely get better performance by putting them _each_ in an fq > > E.g. > > fq=product_identifier_type:DOTCOM_OFFER > > fq=abstract_or_primary_product_id:[* TO *] > > > why is this the case? is it just better to have no logic operators in the > filter queries? > > > > On Thu, Aug 31, 2017 at 1:47 PM, Josh Lincoln > wrote: > > > Suresh, > > Two things I noticed. > > 1) If your intent is to only match records where there's something, > > anything, in abstract_or_primary_product_id, you should use fieldname:[* > > TO > > *] but that will exclude records where that field is empty/missing. If > you > > want to match records even if that field is empty/missing, then you > should > > remove that clause entirely > > 2) Because all your clauses are more like filters and are ANDed together, > > you'll likely get better performance by putting them _each_ in an fq > > E.g. 
> > fq=product_identifier_type:DOTCOM_OFFER > > fq=abstract_or_primary_product_id:[* TO *] > > fq=gtin: > > fq=product_class_type:BUNDLE > > fq=hasProduct:N > > > > > > On Thu, Aug 31, 2017 at 1:35 PM suresh pendap > > wrote: > > > > > Hello everybody, > > > > > > We are seeing that the below query is running very slow and taking > > almost 4 > > > seconds to finish > > > > > > > > > [] webapp=/solr path=/select > > > > > > params={df=_text_&distrib=false&fl=id&shards.purpose=4& > > start=0&fsv=true&sort=modified_dtm+desc&shard.url=http:// > > > :8983/solr/flat_product_index_shard7_replica1/ > > %7Chttp://:8983/solr/flat_product_index_shard7_ > > replica2/%7Chttp://:8983/solr/flat_product_index_ > > > shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+ > > AND+abstract_or_primary_product_id:*+AND+(gtin:< > > numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW= > > 1504196301534&isShard=true&timeAllowed=25000&wt=javabin} > > > hits=0 status=0 QTime=3663 > > > > > > > > > It seems like the abstract_or_primary_product_id:* clause is > > contributing > > > to the overall response time. It seems that the > > > abstract_or_primary_product_id:* . clause is not adding any value in > the > > > query criteria and can be safely removed. Is my understanding correct? > > > > > > I would like to know if the order of the clauses in the AND query would > > > affect the response time of the query? > > > > > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10 > > > > > > Doesn't Lucene/Solr pick up the optimal query execution plan? > > > > > > Is there anyway to look at the query execution plan generated by > Lucene? > > > > > > Regards > > > Suresh > > > > > >
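For what it's worth, the split-into-separate-fq advice above can be sketched in a few lines of Python. The host, collection, and filter values below are placeholders loosely based on the thread (the gtin clause is omitted because its value was elided in the original mail) — this is a sketch, not the original request:

```python
from urllib.parse import urlencode

def build_select_url(base_url, q, filters):
    """Build a /select URL with each filter clause as its own fq parameter,
    so each clause gets its own filterCache entry."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return base_url + "/select?" + urlencode(params)

# Placeholder host/collection.
url = build_select_url(
    "http://localhost:8983/solr/flat_product_index",
    "*:*",
    ["product_identifier_type:DOTCOM_OFFER",
     "abstract_or_primary_product_id:[* TO *]",
     "-product_class_type:BUNDLE",
     "-hasProduct:N"])
print(url)
```

Each fq value is URL-encoded independently, so every clause lands in the filterCache on its own and can be reused by later queries that share it.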
How to check if a solr core is down and is ready for a solr re-start
Hello Team, I have had a few experiences where restarting a Solr node is the only option when a core goes down. I am trying to automate the restart of a Solr server when a core goes down or a replica is unresponsive over a period of time. I have a script to check whether the cores/replicas associated with a node are up. I have two approaches. One is to get the cores from the Solr CLUSTERSTATUS API and do a PING on each core; if at least one core on the node doesn't respond to the ping, mark that node down and restart it after a few retries. The second is to get the cores from the Solr CLUSTERSTATUS API along with their status; if the status is down, mark that node down and restart it after a few retries. Which is the best/recommended approach to check whether a core associated with a node is down and the node is ready for a Solr service restart? Thanks!
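For readers automating this: the second approach — reading replica state out of a CLUSTERSTATUS response — might look roughly like the sketch below. The JSON shape follows the Collections API CLUSTERSTATUS response, but the node, collection, and core names are invented:

```python
def down_replicas_on_node(cluster_status, node_name):
    """Return (collection, shard, core) triples for replicas on node_name
    whose state is not 'active', given a parsed CLUSTERSTATUS response."""
    down = []
    collections = cluster_status["cluster"]["collections"]
    for coll_name, coll in collections.items():
        for shard_name, shard in coll["shards"].items():
            for replica in shard["replicas"].values():
                if replica["node_name"] == node_name and replica["state"] != "active":
                    down.append((coll_name, shard_name, replica["core"]))
    return down

# Hypothetical response fragment:
status = {"cluster": {"collections": {"products": {"shards": {"shard1": {
    "replicas": {
        "core_node1": {"core": "products_shard1_replica1",
                       "node_name": "host1:8983_solr", "state": "down"},
        "core_node2": {"core": "products_shard1_replica2",
                       "node_name": "host2:8983_solr", "state": "active"},
    }}}}}}}
print(down_replicas_on_node(status, "host1:8983_solr"))
# → [('products', 'shard1', 'products_shard1_replica1')]
```

The retry/restart logic would wrap this, e.g. only acting when the same replicas stay non-active across several polls.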
Re: query with wild card with AND taking lot of time
> > 2) Because all your clauses are more like filters and are ANDed together, > you'll likely get better performance by putting them _each_ in an fq > E.g. > fq=product_identifier_type:DOTCOM_OFFER > fq=abstract_or_primary_product_id:[* TO *] why is this the case? is it just better to have no logic operators in the filter queries? On Thu, Aug 31, 2017 at 1:47 PM, Josh Lincoln wrote: > Suresh, > Two things I noticed. > 1) If your intent is to only match records where there's something, > anything, in abstract_or_primary_product_id, you should use fieldname:[* > TO > *] but that will exclude records where that field is empty/missing. If you > want to match records even if that field is empty/missing, then you should > remove that clause entirely > 2) Because all your clauses are more like filters and are ANDed together, > you'll likely get better performance by putting them _each_ in an fq > E.g. > fq=product_identifier_type:DOTCOM_OFFER > fq=abstract_or_primary_product_id:[* TO *] > fq=gtin: > fq=product_class_type:BUNDLE > fq=hasProduct:N > > > On Thu, Aug 31, 2017 at 1:35 PM suresh pendap > wrote: > > > Hello everybody, > > > > We are seeing that the below query is running very slow and taking > almost 4 > > seconds to finish > > > > > > [] webapp=/solr path=/select > > > > params={df=_text_&distrib=false&fl=id&shards.purpose=4& > start=0&fsv=true&sort=modified_dtm+desc&shard.url=http:// > > :8983/solr/flat_product_index_shard7_replica1/ > %7Chttp://:8983/solr/flat_product_index_shard7_ > replica2/%7Chttp://:8983/solr/flat_product_index_ > shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+ > AND+abstract_or_primary_product_id:*+AND+(gtin:< > numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW= > 1504196301534&isShard=true&timeAllowed=25000&wt=javabin} > > hits=0 status=0 QTime=3663 > > > > > > It seems like the abstract_or_primary_product_id:* clause is > contributing > > to the overall response time. 
It seems that the > > abstract_or_primary_product_id:* . clause is not adding any value in the > > query criteria and can be safely removed. Is my understanding correct? > > > > I would like to know if the order of the clauses in the AND query would > > affect the response time of the query? > > > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10 > > > > Doesn't Lucene/Solr pick up the optimal query execution plan? > > > > Is there anyway to look at the query execution plan generated by Lucene? > > > > Regards > > Suresh > > >
Re: query with wild card with AND taking lot of time
Suresh, Two things I noticed. 1) If your intent is to only match records where there's something, anything, in abstract_or_primary_product_id, you should use fieldname:[* TO *] but that will exclude records where that field is empty/missing. If you want to match records even if that field is empty/missing, then you should remove that clause entirely 2) Because all your clauses are more like filters and are ANDed together, you'll likely get better performance by putting them _each_ in an fq E.g. fq=product_identifier_type:DOTCOM_OFFER fq=abstract_or_primary_product_id:[* TO *] fq=gtin: fq=product_class_type:BUNDLE fq=hasProduct:N On Thu, Aug 31, 2017 at 1:35 PM suresh pendap wrote: > Hello everybody, > > We are seeing that the below query is running very slow and taking almost 4 > seconds to finish > > > [] webapp=/solr path=/select > > params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http:// > :8983/solr/flat_product_index_shard7_replica1/%7Chttp://:8983/solr/flat_product_index_shard7_replica2/%7Chttp://:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin} > hits=0 status=0 QTime=3663 > > > It seems like the abstract_or_primary_product_id:* clause is contributing > to the overall response time. It seems that the > abstract_or_primary_product_id:* . clause is not adding any value in the > query criteria and can be safely removed. Is my understanding correct? > > I would like to know if the order of the clauses in the AND query would > affect the response time of the query? > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10 > > Doesn't Lucene/Solr pick up the optimal query execution plan? > > Is there anyway to look at the query execution plan generated by Lucene? > > Regards > Suresh >
query with wild card with AND taking lot of time
Hello everybody, We are seeing that the below query is running very slow, taking almost 4 seconds to finish: [] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://:8983/solr/flat_product_index_shard7_replica1/%7Chttp://:8983/solr/flat_product_index_shard7_replica2/%7Chttp://:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin} hits=0 status=0 QTime=3663 It seems like the abstract_or_primary_product_id:* clause is contributing to the overall response time. It seems that the abstract_or_primary_product_id:* clause is not adding any value to the query criteria and can be safely removed. Is my understanding correct? I would also like to know whether the order of the clauses in the AND query would affect the response time of the query. E.g. f1:3 AND f2:10 AND f3:* vs. f3:* AND f1:3 AND f2:10 Doesn't Lucene/Solr pick the optimal query execution plan? Is there any way to look at the query execution plan generated by Lucene? Regards Suresh
Re: slow solr facet processing
A possible improvement for some multiValued fields might be to use the "uif" facet method (UnInvertedField was the default method for multiValued fields in 4.x) I'm not sure if you would need to reindex without docValues on that field to try it though. Example: to enable on the "union" field, add f.union.facet.method=uif Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466 -Yonik On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler wrote: > Hi, > > in the meantime I came across the reason for the slow facet processing > capacities of SOLR since version 5.x > > https://issues.apache.org/jira/browse/SOLR-8096 > https://issues.apache.org/jira/browse/LUCENE-5666 > > compared to version 4.x > > Various library networks across the world are suffering from the same > symptoms: > > Facet processing is one of the most important features of a search server > (for us) and it seems (at least IMHO) there is no solution for the issue > since March 2015 (release date for the last SOLR 4 version) > > What are the plans / ideas of the solr developers for a possible future > solution? Or maybe there is already a solution I haven't seen so far. > > Thanks for a feedback > > Günter > > > > On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote: >> >> Hi, >> >> I can't figure out the reason why the facet processing in version 6 needs >> significantly more time compared to version 4. 
>> >> The debugging response (for 30 million documents) >> >> solr 4 >> <double name="time">280.0</double> <lst name="query"><double name="time">0.0</double></lst> <lst name="facet"><double name="time">280.0</double></lst> >> (once the query is cached) >> before caching: between 1.5 and 2 sec >> >> >> solr 6.x (my last try was with 6.6) >> without docvalues for faceting fields (same schema as version 4) >> <double name="time">5874.0</double> <lst name="query"><double name="time">0.0</double></lst> <lst name="facet"><double name="time">5873.0</double></lst> <lst name="facet_module"><double name="time">0.0</double></lst> >> the time is not getting better even after repeating the query several >> times >> >> >> solr 6.6 with docvalues for faceting fields >> <double name="time">9837.0</double> <lst name="query"><double name="time">0.0</double></lst> <lst name="facet"><double name="time">9837.0</double></lst> <lst name="facet_module"><double name="time">0.0</double></lst> >> >> used query (our productive system with version 4) >> >> http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count >> >> >> Running the queries on smaller indices (8 million docs) the difference is >> similar although the absolute figures for processing time are smaller >> >> >> Any hints as to why there are such huge differences?
>> >> Günter >> >> >> >> >> >> >> >> >> > > -- > Universität Basel > Universitätsbibliothek > Günter Hipler > Projekt SwissBib > Schoenbeinstrasse 18-20 > 4056 Basel, Schweiz > Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103 > E-Mail guenter.hip...@unibas.ch > URL: www.swissbib.org / http://www.ub.unibas.ch/ >
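A minimal sketch of Yonik's suggestion — setting the per-field f.&lt;field&gt;.facet.method=uif override on a request. The "union" field name is taken from the query above; the helper and params dict are purely illustrative:

```python
def with_uif(params, fields):
    """Return a copy of a Solr params dict with the per-field
    f.<name>.facet.method=uif override (added in SOLR-8466) applied."""
    out = dict(params)
    for f in fields:
        out["f.%s.facet.method" % f] = "uif"
    return out

params = {"q": "*:*", "facet": "true", "facet.field": ["union", "format"]}
print(with_uif(params, ["union"]))
```

Only the named fields get the override; other facet fields keep whatever method Solr would otherwise choose.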
Re: "What is Solr" in Google search results
Well, isn't it always the same with Wikipedia? It's already there .. so it has to be correct. If you're trying to remove it, you have to prove it - but there is not even proof it should be there in the first place oO You really need to have time to go through that kind of argument ... -Stefan On Aug 31, 2017 4:37 PM, "Vincenzo D'Amore" wrote: Hi Rick, right, I've already tried to correct the wikipedia page, to be honest, I've just removed the sentence "Solr is the second-most... etc." But my change has been discarded because I missed to add a valid motivation. Anyway, not sure I'm the most representative person to discuss this in the wikipedia talk page :) but I'll try to do whatever I can And just to share with you my thought, my principal motivation is that even if DB Engines has a proven accuracy, the sentence in question has not be considered so relevant to explain what is Solr. For sure, it should be used as first one. On Thu, Aug 31, 2017 at 5:53 AM, Rick Leir wrote: > Vincenzo, > This is a discussion for the wikipedia 'talk' page. My sense is that > information must be verifiable, and that the popularity rating at > db-engines is not transparent. Would you like to start the discussion? > Cheers -- Rick > > On August 30, 2017 5:17:25 PM MDT, Vincenzo D'Amore > wrote: > >Hi All, > > > >googling for "what is Solr" I found this as *first* sentence: > > > >"Solr is the second-most popular enterprise search engine after > >Elasticsearch. ... " > > > >The description comes from wikipedia https://en. > >wikipedia.org/wiki/Apache_Solr > > > >Now, well, I'm a little upset, because I think this is a misleading > >description, this answer does not really... well, answer the question. > > > >And even... because Solr is not the first most popular :))) > > > >Ok, seriously, the first sentence (or the answer at all) should not > >define > >the position of the search engine in a list, in a kind of competition > >where > >Solr has the second place.
> >If it is the first, the second or whatever most popular is not the > >right > >answer. > > > >So I want inform the community and search for an advice, if any, how to > >have a better description in the Google results page. > > > >If you have any comments or questions, please let me know. > > > >Best regards, > >Vincenzo > > > > > >-- > >Vincenzo D'Amore > >email: v.dam...@gmail.com > >skype: free.dev > >mobile: +39 349 8513251 <349%20851%203251> > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251 <349%20851%203251>
RE: "What is Solr" in Google search results
Wikipedia seems to be better now. Thank you, Peaceray. Honestly, though, by the numbers, I think the comment was correct. Elasticsearch has a much smoother on-ramp for IT developers, but it is much harder to customize relevancy and integrate with BigData pipelines. IT developers are the big voters here. Now, Google will simply index this thread, and then show different rich snippets to all of us here... -Original Message- From: Vincenzo D'Amore [mailto:v.dam...@gmail.com] Sent: Thursday, August 31, 2017 10:37 AM To: solr-user@lucene.apache.org Subject: Re: "What is Solr" in Google search results Hi Rick, right, I've already tried to correct the wikipedia page, to be honest, I've just removed the sentence "Solr is the second-most... etc." But my change has been discarded because I missed to add a valid motivation. Anyway, not sure I'm the most representative person to discuss this in the wikipedia talk page :) but I'll try to do whatever I can And just to share with you my thought, my principal motivation is that even if DB Engines has a proven accuracy, the sentence in question has not be considered so relevant to explain what is Solr. For sure, it should be used as first one. On Thu, Aug 31, 2017 at 5:53 AM, Rick Leir wrote: > Vincenzo, > This is a discussion for the wikipedia 'talk' page. My sense is that > information must be verifiable, and that the popularity rating at > db-engines is not transparent. Would you like to start the discussion? > Cheers -- Rick > > On August 30, 2017 5:17:25 PM MDT, Vincenzo D'Amore > > wrote: > >Hi All, > > > >googling for "what is Solr" I found this as *first* sentence: > > > >"Solr is the second-most popular enterprise search engine after > >Elasticsearch. ... " > > > >The description comes from wikipedia https://en. > >wikipedia.org/wiki/Apache_Solr > > > >Now, well, I'm a little upset, because I think this is a misleading > >description, this answer does not really... well, answer the question. > > > >And even... 
because Solr is not the first most popular :))) > > > >Ok, seriously, the first sentence (or the answer at all) should not > >define the position of the search engine in a list, in a kind of > >competition where Solr has the second place. > >If it is the first, the second or whatever most popular is not the > >right answer. > > > >So I want inform the community and search for an advice, if any, how > >to have a better description in the Google results page. > > > >If you have any comments or questions, please let me know. > > > >Best regards, > >Vincenzo > > > > > >-- > >Vincenzo D'Amore > >email: v.dam...@gmail.com > >skype: free.dev > >mobile: +39 349 8513251 <349%20851%203251> > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251 <349%20851%203251>
Re: Solr Reindex Issue - Can't able to Reindex Old Data
I have no idea where to even start. Have you looked at your Solr logs to see if there are helpful error messages? What is "reindex"? Something from some program you're running? 'cause it's not a field option for Solr schema field definitions, if you're putting that in a Solr schema I wouldn't even expect the core to initialize. You might review: https://wiki.apache.org/solr/UsingMailingLists Best, Erick On Wed, Aug 30, 2017 at 6:34 PM, @Nandan@ wrote: > Hi , > > I am using Apache Solr with Cassandra Database. In my table, I have 20 > rows. Due to some changes, I changed my Solr schema and Reindex schema with > below option as > > *reindex=true and deleteAll=false* > > After Reindexing my Solr Schema, I am not able to do reindex my old data > which are present in my table before. I am only able to retrieve newly > added data which is done after reindexing. > > Please help in this issue. > > Thanks
Re: Index relational database
To pile on here: When you denormalize you also get some functionality that you do not get with Solr joins, they've been called "pseudo joins" in Solr for a reason. If you just use the simple approach of indexing the two tables then joining across them you can't return fields from both tables in a single document. To do that you need to use parent/child docs which has its own restrictions. So rather than worry excessively about which is faster, I'd recommend you decide on the functionality you need as a starting point. Best, Erick On Thu, Aug 31, 2017 at 7:34 AM, Walter Underwood wrote: > There is no way tell which is faster without trying it. > > Query speed depends on the size of the data (rows), the complexity of the > join, which database, what kind of disk, etc. > > Solr speed depends on the size of the documents, the complexity of your > analysis chains, what kind of disk, how much CPU is available, etc. > > We have one query that extracts 9 million documents from MySQL in about 20 > minutes. We have another query on a different MySQL database that takes 90 > minutes to get 7 million documents. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > >> On Aug 31, 2017, at 12:54 AM, Renuka Srishti >> wrote: >> >> Thanks Erick, Walter >> But I think join query will reduce the performance. Denormalization will be >> the better way than join query, am I right? >> >> >> >> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood >> wrote: >> >>> Think about making a denormalized view, with all the fields needed in one >>> table. That view gets sent to Solr. Each row is a Solr document. >>> >>> It could be implemented as a view or as SQL, but that is a useful mental >>> model for people starting from a relational background. 
>>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org >>> http://observer.wunderwood.org/ (my blog) >>> >>> On Aug 30, 2017, at 9:14 AM, Erick Erickson >>> wrote: First, it's often best, by far, to denormalize the data in your solr >>> index, that's what I'd explore first. If you can't do that, the join query parser might work for you. On Aug 30, 2017 4:49 AM, "Renuka Srishti" wrote: > Thanks Susheel for your response. > Here is the scenario about which I am talking: > > - Let suppose there are two documents doc1 and doc2. > - I want to fetch the data from doc2 on the basis of doc1 fields which > are related to doc2. > > How to achieve this efficiently. > > > Thanks, > > Renuka Srishti > > > On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar > wrote: > >> Hello Renuka, >> >> I would suggest to start with your use case(s). May be start with your >> first use case with the below questions >> >> a) What is that you want to search (which fields like name, desc, city >> etc.) >> b) What is that you want to show part of search result (name, city >>> etc.) >> >> Based on above two questions, you would know what data to pull in from >> relational database and create solr schema and index the data. >> >> You may first try to denormalize / flatten the structure so that you >>> deal >> with one collection/schema and query upon it. >> >> HTH. >> >> Thanks, >> Susheel >> >> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < >> renuka.srisht...@gmail.com> >> wrote: >> >>> Hii, >>> >>> What is the best way to index relational database, and how it impacts > on >>> the performance? >>> >>> Thanks >>> Renuka Srishti >>> >> > >>> >>> >
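As a toy illustration of the denormalization advice in this thread — joining two "tables" in code and emitting one flat Solr document per row (the table and field names here are invented, not from the original posts):

```python
def denormalize(orders, customers):
    """Join orders to customers in code and emit one flat dict per
    Solr document, duplicating customer fields onto every order."""
    by_id = {c["id"]: c for c in customers}
    docs = []
    for o in orders:
        c = by_id[o["customer_id"]]
        docs.append({
            "id": "order-%d" % o["id"],     # Solr uniqueKey
            "order_total_f": o["total"],
            "customer_name_s": c["name"],   # copied onto each order doc
            "customer_city_s": c["city"],
        })
    return docs

customers = [{"id": 1, "name": "Ada", "city": "Basel"}]
orders = [{"id": 10, "customer_id": 1, "total": 99.5}]
docs = denormalize(orders, customers)
print(docs)
```

The same flattening could equally be a SQL view, as Walter suggests; the point is that each resulting row/dict carries every field you want to search or display.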
slow solr facet processing
Hi, in the meantime I came across the reason for the slow facet processing capacities of SOLR since version 5.x https://issues.apache.org/jira/browse/SOLR-8096 https://issues.apache.org/jira/browse/LUCENE-5666 compared to version 4.x Various library networks across the world are suffering from the same symptoms: Facet processing is one of the most important features of a search server (for us) and it seems (at least IMHO) there has been no solution for the issue since March 2015 (release date for the last SOLR 4 version) What are the plans / ideas of the solr developers for a possible future solution? Or maybe there is already a solution I haven't seen so far. Thanks for a feedback Günter On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote: Hi, I can't figure out the reason why the facet processing in version 6 needs significantly more time compared to version 4. The debugging response (for 30 million documents) solr 4 <double name="time">280.0</double> <lst name="query"><double name="time">0.0</double></lst> <lst name="facet"><double name="time">280.0</double></lst> (once the query is cached) before caching: between 1.5 and 2 sec solr 6.x (my last try was with 6.6) without docvalues for faceting fields (same schema as version 4) <double name="time">5874.0</double> <lst name="query"><double name="time">0.0</double></lst> <lst name="facet"><double name="time">5873.0</double></lst> <lst name="facet_module"><double name="time">0.0</double></lst> the time is not getting better even after repeating the query several times solr 6.6 with docvalues for faceting fields <double name="time">9837.0</double> <lst name="query"><double name="time">0.0</double></lst> <lst name="facet"><double name="time">9837.0</double></lst> <lst name="facet_module"><double name="time">0.0</double></lst> used query (our productive system with version 4)
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count Running the queries on smaller indices (8 million docs) the difference is similar, although the absolute figures for processing time are smaller. Any hints as to why there are such huge differences? Günter -- Universität Basel Universitätsbibliothek Günter Hipler Projekt SwissBib Schoenbeinstrasse 18-20 4056 Basel, Schweiz Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103 E-Mail guenter.hip...@unibas.ch URL: www.swissbib.org / http://www.ub.unibas.ch/
Re: "What is Solr" in Google search results
Hi Rick, right, I've already tried to correct the wikipedia page; to be honest, I just removed the sentence "Solr is the second-most... etc." But my change was discarded because I neglected to add a valid motivation. Anyway, I'm not sure I'm the most representative person to discuss this on the wikipedia talk page :) but I'll try to do whatever I can And just to share my thought with you, my principal motivation is that even if DB-Engines has proven accuracy, the sentence in question is not relevant enough to explain what Solr is. For sure, it should not be used as the first one. On Thu, Aug 31, 2017 at 5:53 AM, Rick Leir wrote: > Vincenzo, > This is a discussion for the wikipedia 'talk' page. My sense is that > information must be verifiable, and that the popularity rating at > db-engines is not transparent. Would you like to start the discussion? > Cheers -- Rick > > On August 30, 2017 5:17:25 PM MDT, Vincenzo D'Amore > wrote: > >Hi All, > > > >googling for "what is Solr" I found this as *first* sentence: > > > >"Solr is the second-most popular enterprise search engine after > >Elasticsearch. ... " > > > >The description comes from wikipedia https://en. > >wikipedia.org/wiki/Apache_Solr > > > >Now, well, I'm a little upset, because I think this is a misleading > >description, this answer does not really... well, answer the question. > > > >And even... because Solr is not the first most popular :))) > > > >Ok, seriously, the first sentence (or the answer at all) should not > >define > >the position of the search engine in a list, in a kind of competition > >where > >Solr has the second place. > >If it is the first, the second or whatever most popular is not the > >right > >answer. > > > >So I want inform the community and search for an advice, if any, how to > >have a better description in the Google results page. > > > >If you have any comments or questions, please let me know.
> > > >Best regards, > >Vincenzo > > > > > >-- > >Vincenzo D'Amore > >email: v.dam...@gmail.com > >skype: free.dev > >mobile: +39 349 8513251 <349%20851%203251> > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251 <349%20851%203251>
Re: Index relational database
There is no way to tell which is faster without trying it. Query speed depends on the size of the data (rows), the complexity of the join, which database, what kind of disk, etc. Solr speed depends on the size of the documents, the complexity of your analysis chains, what kind of disk, how much CPU is available, etc. We have one query that extracts 9 million documents from MySQL in about 20 minutes. We have another query on a different MySQL database that takes 90 minutes to get 7 million documents. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 31, 2017, at 12:54 AM, Renuka Srishti > wrote: > > Thanks Erick, Walter > But I think join query will reduce the performance. Denormalization will be > the better way than join query, am I right? > > > > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood > wrote: > >> Think about making a denormalized view, with all the fields needed in one >> table. That view gets sent to Solr. Each row is a Solr document. >> >> It could be implemented as a view or as SQL, but that is a useful mental >> model for people starting from a relational background. >> >> wunder >> Walter Underwood >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >> >> >>> On Aug 30, 2017, at 9:14 AM, Erick Erickson >> wrote: >>> >>> First, it's often best, by far, to denormalize the data in your solr >> index, >>> that's what I'd explore first. >>> >>> If you can't do that, the join query parser might work for you. >>> >>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" >>> wrote: >>> Thanks Susheel for your response. Here is the scenario about which I am talking: - Let suppose there are two documents doc1 and doc2. - I want to fetch the data from doc2 on the basis of doc1 fields which are related to doc2. How to achieve this efficiently. Thanks, Renuka Srishti On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar wrote: > Hello Renuka, > > I would suggest to start with your use case(s).
May be start with your > first use case with the below questions > > a) What is that you want to search (which fields like name, desc, city > etc.) > b) What is that you want to show part of search result (name, city >> etc.) > > Based on above two questions, you would know what data to pull in from > relational database and create solr schema and index the data. > > You may first try to denormalize / flatten the structure so that you >> deal > with one collection/schema and query upon it. > > HTH. > > Thanks, > Susheel > > On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > renuka.srisht...@gmail.com> > wrote: > >> Hii, >> >> What is the best way to index relational database, and how it impacts on >> the performance? >> >> Thanks >> Renuka Srishti >> > >> >>
Re: Solr index getting replaced instead of merged
>Can anyone tell is it possible to paginate the data using Solr UI? Use the start/rows parameters; start is zero-based, e.g. start=0, rows=10 start=10, rows=10 start=20, rows=10 On Thu, Aug 31, 2017 at 8:21 AM, Agrawal, Harshal (GE Digital) < harshal.agra...@ge.com> wrote: > Hello All, > > If I check out clear option while indexing 2nd table it worked.Thanks > Gurdeep :) > Can anyone tell is it possible to paginate the data using Solr UI? > If yes please tell me the features which I can use? > > Regards > Harshal > > From: Agrawal, Harshal (GE Digital) > Sent: Wednesday, August 30, 2017 4:36 PM > To: 'solr-user@lucene.apache.org' > Cc: Singh, Susnata (GE Digital) > Subject: Solr index getting replaced instead of merged > > Hello Guys, > > I have installed solr in my local system and was able to connect to > Teradata successfully. > For single table I am able to index the data and query it also but when I > am trying for multiple tables in the same schema and doing indexing one by > one respectively. > I can see datasets getting replaced instead of merged . > > Can anyone help me please: > > Regards > Harshal > > >
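The start/rows progression above can be wrapped in a tiny helper (page size and hit count are just examples):

```python
def pages(total_hits, rows=10):
    """Yield (start, rows) pairs for standard Solr start/rows paging."""
    for start in range(0, total_hits, rows):
        yield start, rows

print(list(pages(35, rows=10)))
# → [(0, 10), (10, 10), (20, 10), (30, 10)]
```

For deep paging (very large start values), Solr's cursorMark is generally recommended instead, since large start offsets get progressively more expensive.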
RE: Solr index getting replaced instead of merged
Hello All, If I check out the clear option while indexing the 2nd table it worked. Thanks Gurdeep :) Can anyone tell me if it is possible to paginate the data using the Solr UI? If yes, please tell me which features I can use. Regards Harshal From: Agrawal, Harshal (GE Digital) Sent: Wednesday, August 30, 2017 4:36 PM To: 'solr-user@lucene.apache.org' Cc: Singh, Susnata (GE Digital) Subject: Solr index getting replaced instead of merged Hello Guys, I have installed Solr on my local system and was able to connect to Teradata successfully. For a single table I am able to index the data and query it, but when I try the same for multiple tables in the same schema, indexing them one by one, I can see the datasets getting replaced instead of merged. Can anyone help me, please? Regards Harshal
Re: Index relational database
When indexing a relational database, it is generally best to denormalize it,
either in a view or in your indexing code.

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti wrote:
> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your Solr
> > > index, that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >> - Let's suppose there are two documents, doc1 and doc2.
> > >> - I want to fetch the data from doc2 on the basis of doc1 fields which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently?
> > >>
> > >> Thanks,
> > >> Renuka Srishti
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest starting with your use case(s). Maybe start with your
> > >>> first use case and the below questions:
> > >>>
> > >>> a) What is it that you want to search (which fields, like name, desc,
> > >>>    city etc.)
> > >>> b) What is it that you want to show as part of the search result
> > >>>    (name, city etc.)
> > >>>
> > >>> Based on the above two questions, you would know what data to pull in
> > >>> from the relational database, create the Solr schema, and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > >>> deal with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com> wrote:
> > >>>
> > >>>> Hii,
> > >>>>
> > >>>> What is the best way to index a relational database, and how does it
> > >>>> impact the performance?
> > >>>>
> > >>>> Thanks
> > >>>> Renuka Srishti
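The denormalization advice in this thread can be sketched in code: join the parent and child rows once at indexing time and emit one flat document per child row. The table and field names below are made up purely for illustration:

```python
def denormalize(authors, books):
    """Flatten a one-to-many relation (author -> books) into one
    Solr-style document per book, copying parent fields into each doc."""
    by_id = {a["id"]: a for a in authors}
    docs = []
    for b in books:
        a = by_id[b["author_id"]]
        docs.append({
            "id": f'book-{b["id"]}',     # distinct uniqueKey per row
            "title": b["title"],
            "author_name": a["name"],    # duplicated on purpose
            "author_city": a["city"],
        })
    return docs

authors = [{"id": 1, "name": "Ann", "city": "Oslo"}]
books = [{"id": 10, "author_id": 1, "title": "Solr in Practice"},
         {"id": 11, "author_id": 1, "title": "Indexing 101"}]
docs = denormalize(authors, books)
```

The duplication of parent fields is the point: each document is self-contained, so queries need no join at search time. The same flattening can equally live in a SQL view that the indexer reads from.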
Re: Index relational database
Thanks all for sharing your thoughts :)

On Thu, Aug 31, 2017 at 5:28 PM, Susheel Kumar wrote:
> Yes, if you can avoid join and work with a flat/denormalized structure then
> that's the best.
>
> On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti <
> renuka.srisht...@gmail.com> wrote:
>
> > Thanks Erick, Walter
> > But I think join query will reduce the performance. Denormalization will
> > be the better way than join query, am I right?
> >
> > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> > > Think about making a denormalized view, with all the fields needed in
> > > one table. That view gets sent to Solr. Each row is a Solr document.
> > >
> > > It could be implemented as a view or as SQL, but that is a useful
> > > mental model for people starting from a relational background.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/ (my blog)
> > >
> > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson wrote:
> > > >
> > > > First, it's often best, by far, to denormalize the data in your solr
> > > > index, that's what I'd explore first.
> > > >
> > > > If you can't do that, the join query parser might work for you.
> > > >
> > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" <renuka.srisht...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks Susheel for your response.
> > > >> Here is the scenario about which I am talking:
> > > >>
> > > >> - Let suppose there are two documents doc1 and doc2.
> > > >> - I want to fetch the data from doc2 on the basis of doc1 fields
> > > >>   which are related to doc2.
> > > >>
> > > >> How to achieve this efficiently.
> > > >>
> > > >> Thanks,
> > > >> Renuka Srishti
> > > >>
> > > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar <susheel2...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hello Renuka,
> > > >>>
> > > >>> I would suggest to start with your use case(s). May be start with
> > > >>> your first use case with the below questions
> > > >>>
> > > >>> a) What is that you want to search (which fields like name, desc,
> > > >>>    city etc.)
> > > >>> b) What is that you want to show part of search result (name, city
> > > >>>    etc.)
> > > >>>
> > > >>> Based on above two questions, you would know what data to pull in
> > > >>> from relational database and create solr schema and index the data.
> > > >>>
> > > >>> You may first try to denormalize / flatten the structure so that you
> > > >>> deal with one collection/schema and query upon it.
> > > >>>
> > > >>> HTH.
> > > >>>
> > > >>> Thanks,
> > > >>> Susheel
> > > >>>
> > > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > > >>> renuka.srisht...@gmail.com> wrote:
> > > >>>
> > > >>>> Hii,
> > > >>>>
> > > >>>> What is the best way to index relational database, and how it
> > > >>>> impacts on the performance?
> > > >>>>
> > > >>>> Thanks
> > > >>>> Renuka Srishti
Re: Index relational database
Yes, if you can avoid join and work with a flat/denormalized structure then
that's the best.

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti wrote:
> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your solr
> > > index, that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >> - Let suppose there are two documents doc1 and doc2.
> > >> - I want to fetch the data from doc2 on the basis of doc1 fields which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently.
> > >>
> > >> Thanks,
> > >> Renuka Srishti
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest to start with your use case(s). May be start with your
> > >>> first use case with the below questions
> > >>>
> > >>> a) What is that you want to search (which fields like name, desc, city
> > >>>    etc.)
> > >>> b) What is that you want to show part of search result (name, city etc.)
> > >>>
> > >>> Based on above two questions, you would know what data to pull in from
> > >>> relational database and create solr schema and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > >>> deal with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com> wrote:
> > >>>
> > >>>> Hii,
> > >>>>
> > >>>> What is the best way to index relational database, and how it impacts
> > >>>> on the performance?
> > >>>>
> > >>>> Thanks
> > >>>> Renuka Srishti
Antwort: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
Hi Markus,

I don't know which client you use, but if you are using SolrJ, enabling
logging could be an option to "dig deeper" into the problem. This can be the
output, for example via log4j on log level INFO:

...
2017-08-31 10:01:56 INFO ZooKeeper:438 - Initiating client connection,
connectString=ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983
sessionTimeout=60 watcher=org.apache.solr.common.cloud.SolrZkClient$3@14379273
2017-08-31 10:01:56 INFO ClientCnxn:876 - Socket connection established
to SOLRHOST/ZKHOST3:9983, initiating session
2017-08-31 10:01:56 INFO ClientCnxn:1299 - Session establishment complete
on server SOLRHOST/ZKHOST3:9983, sessionid = 0x45e35eaa9fd3584, negotiated
timeout = 4
2017-08-31 10:01:56 INFO ZkStateReader:688 - Updated live nodes from
ZooKeeper... (0) -> (4)
2017-08-31 10:01:56 INFO ZkClientClusterStateProvider:134 - Cluster at
ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983 ready

From: Markus Jelsma
To: solr-user@lucene.apache.org
Date: 31.08.2017 10:00
Subject: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

Hello Stephan,

I know that restarting stuff can sometimes cure what's wrong, but we are not
going to; we want to get rid of the problem, not restart Microsoft Windows
whenever things run slow. Also, there is no indexing going on right now.

We also see these sometimes; this explains at least why it cannot talk to
ZooKeeper, but why..

-----Original message-----
> From: Stephan Schubert
> Sent: Thursday 31st August 2017 9:52
> To: solr-user@lucene.apache.org
> Subject: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
>
> Hi Markus,
>
> try to stop your indexing/update processes and restart your ZooKeeper
> instances (not all at the same time, of course). This is what I do in these
> cases and it has helped me so far.
>
> From: Markus Jelsma
> To: Solr-user
> Date: 31.08.2017 09:49
> Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
>
> Hello,
>
> One node is behaving badly, at least according to the logs, but the node is
> green in the cluster overview although the logs claim recovery fails all
> the time. It is not the first time this message pops up in the logs of one
> of the nodes. Why can it not talk to ZooKeeper? I'm missing a reason.
>
> The cluster is not extremely busy at the moment, we allow plenty of file
> descriptors, and there are no firewall restrictions; I cannot think of any
> problem in our infrastructure.
>
> What's going on? What can I do? Can the error be explained a bit further?
>
> Thanks,
> Markus
>
> 8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:36 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:38 AM  ERROR  false  RecoveryStrategy
>   Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:38 AM  ERROR  true   RecoveryStrategy
>   Recovery failed - trying again... (0)
> 8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
>   Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
>   Recovery failed - trying again... (1)
> 8/31/2017, 9:35:36 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
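For reference, the client-side logging Stephan describes usually comes down to raising the log level for the ZooKeeper and Solr cloud packages in the application's log4j 1.x configuration (Solr 6.x ships log4j 1.2). A minimal sketch; the appender setup and pattern are assumptions, only the two logger lines are essential:

```properties
# Keep the root quiet so only the interesting categories are verbose
log4j.rootLogger=WARN, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# ZooKeeper client-side connection logging (session setup, timeouts)
log4j.logger.org.apache.zookeeper=INFO
# SolrJ / SolrCloud state updates (live nodes, cluster state)
log4j.logger.org.apache.solr.common.cloud=INFO
```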
RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
Hello Stephan,

I know that restarting stuff can sometimes cure what's wrong, but we are not
going to; we want to get rid of the problem, not restart Microsoft Windows
whenever things run slow. Also, there is no indexing going on right now.

We also see these sometimes; this explains at least why it cannot talk to
ZooKeeper, but why..

o.a.s.c.RecoveryStrategy Socket timeout on send prep recovery cmd, retrying..

This has been going on with just one of our nodes for over two hours; other
nodes are fine. And why is this bad node green in the cluster overview?

Thanks,
Markus

-----Original message-----
> From: Stephan Schubert
> Sent: Thursday 31st August 2017 9:52
> To: solr-user@lucene.apache.org
> Subject: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
>
> Hi Markus,
>
> try to stop your indexing/update processes and restart your ZooKeeper
> instances (not all at the same time, of course). This is what I do in these
> cases and it has helped me so far.
>
> From: Markus Jelsma
> To: Solr-user
> Date: 31.08.2017 09:49
> Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
>
> Hello,
>
> One node is behaving badly, at least according to the logs, but the node is
> green in the cluster overview although the logs claim recovery fails all
> the time. It is not the first time this message pops up in the logs of one
> of the nodes. Why can it not talk to ZooKeeper? I'm missing a reason.
>
> The cluster is not extremely busy at the moment, we allow plenty of file
> descriptors, and there are no firewall restrictions; I cannot think of any
> problem in our infrastructure.
>
> What's going on? What can I do? Can the error be explained a bit further?
>
> Thanks,
> Markus
>
> 8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:36 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:38 AM  ERROR  false  RecoveryStrategy
>   Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:38 AM  ERROR  true   RecoveryStrategy
>   Recovery failed - trying again... (0)
> 8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> 8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
>   Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
>   Recovery failed - trying again... (1)
> 8/31/2017, 9:35:36 AM  ERROR  false  RequestHandlerBase
>   org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
Re: Index relational database
Thanks Erick, Walter
But I think a join query will reduce performance. Denormalization will be
the better way than a join query, am I right?

On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood wrote:
> Think about making a denormalized view, with all the fields needed in one
> table. That view gets sent to Solr. Each row is a Solr document.
>
> It could be implemented as a view or as SQL, but that is a useful mental
> model for people starting from a relational background.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Aug 30, 2017, at 9:14 AM, Erick Erickson wrote:
> >
> > First, it's often best, by far, to denormalize the data in your solr
> > index, that's what I'd explore first.
> >
> > If you can't do that, the join query parser might work for you.
> >
> > On Aug 30, 2017 4:49 AM, "Renuka Srishti" wrote:
> >
> >> Thanks Susheel for your response.
> >> Here is the scenario about which I am talking:
> >>
> >> - Let suppose there are two documents doc1 and doc2.
> >> - I want to fetch the data from doc2 on the basis of doc1 fields which
> >>   are related to doc2.
> >>
> >> How to achieve this efficiently.
> >>
> >> Thanks,
> >> Renuka Srishti
> >>
> >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar wrote:
> >>
> >>> Hello Renuka,
> >>>
> >>> I would suggest to start with your use case(s). May be start with your
> >>> first use case with the below questions
> >>>
> >>> a) What is that you want to search (which fields like name, desc, city
> >>>    etc.)
> >>> b) What is that you want to show part of search result (name, city etc.)
> >>>
> >>> Based on above two questions, you would know what data to pull in from
> >>> relational database and create solr schema and index the data.
> >>>
> >>> You may first try to denormalize / flatten the structure so that you
> >>> deal with one collection/schema and query upon it.
> >>>
> >>> HTH.
> >>>
> >>> Thanks,
> >>> Susheel
> >>>
> >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> >>> renuka.srisht...@gmail.com> wrote:
> >>>
> >>>> Hii,
> >>>>
> >>>> What is the best way to index relational database, and how it impacts
> >>>> on the performance?
> >>>>
> >>>> Thanks
> >>>> Renuka Srishti
Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
Hi Markus,

try to stop your indexing/update processes and restart your ZooKeeper
instances (not all at the same time, of course). This is what I do in these
cases and it has helped me so far.

From: Markus Jelsma
To: Solr-user
Date: 31.08.2017 09:49
Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

Hello,

One node is behaving badly, at least according to the logs, but the node is
green in the cluster overview although the logs claim recovery fails all the
time. It is not the first time this message pops up in the logs of one of the
nodes. Why can it not talk to ZooKeeper? I'm missing a reason.

The cluster is not extremely busy at the moment, we allow plenty of file
descriptors, and there are no firewall restrictions; I cannot think of any
problem in our infrastructure.

What's going on? What can I do? Can the error be explained a bit further?

Thanks,
Markus

8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:36 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:38 AM  ERROR  false  RecoveryStrategy
  Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:38 AM  ERROR  true   RecoveryStrategy
  Recovery failed - trying again... (0)
8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
  Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
  Recovery failed - trying again... (1)
8/31/2017, 9:35:36 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
6.6 Cannot talk to ZooKeeper - Updates are disabled.
Hello,

One node is behaving badly, at least according to the logs, but the node is
green in the cluster overview although the logs claim recovery fails all the
time. It is not the first time this message pops up in the logs of one of the
nodes. Why can it not talk to ZooKeeper? I'm missing a reason.

The cluster is not extremely busy at the moment, we allow plenty of file
descriptors, and there are no firewall restrictions; I cannot think of any
problem in our infrastructure.

What's going on? What can I do? Can the error be explained a bit further?

Thanks,
Markus

8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:34 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:36 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:38 AM  ERROR  false  RecoveryStrategy
  Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:38 AM  ERROR  true   RecoveryStrategy
  Recovery failed - trying again... (0)
8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:49 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
  Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:50 AM  ERROR  false  RecoveryStrategy
  Recovery failed - trying again... (1)
8/31/2017, 9:35:36 AM  ERROR  false  RequestHandlerBase
  org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
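A quick way to check basic ZooKeeper reachability from the affected node, independent of Solr, is a four-letter-word probe: a healthy ZooKeeper answers `ruok` with `imok` on its client port. The sketch below is a generic liveness check, not a fix for the recovery problem in this thread; the host/port you'd target (e.g. a ZKHOST on 9983) depends on your setup, and the demo runs against a stand-in local server so it is self-contained:

```python
import socket
import threading

def zk_four_letter(host, port, word=b"ruok", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the reply.

    A healthy server replies to 'ruok' with 'imok'."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(word)
        s.shutdown(socket.SHUT_WR)   # signal end of request
        chunks = []
        while True:
            data = s.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()

# Demo against a stand-in server (a real check would target ZKHOST:9983):
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

def fake_zk():
    conn, _ = srv.accept()
    conn.recv(16)           # read the command
    conn.sendall(b"imok")   # reply like a healthy ZooKeeper
    conn.close()

threading.Thread(target=fake_zk, daemon=True).start()
reply = zk_four_letter("127.0.0.1", srv.getsockname()[1])
```

If `ruok` answers but Solr still logs "Cannot talk to ZooKeeper", the problem is more likely session timeouts (GC pauses, load) than raw connectivity.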
Antwort: Re: Bug in Solr 6.6.0? "Cannot change DocValues type from SORTED_SET to SORTED"
No, not really. I'm wondering why neither server (each collection has a
replica on a different server) has a problem with it on 6.5, but after
updating both instances to Solr 6.6 the error occurs. As I already said, if I
revert the instances back to 6.5 everything seems to be fine. This is a
little bit strange in my opinion ;)

From: Erick Erickson
To: solr-user
Date: 30.08.2017 21:34
Subject: Re: Bug in Solr 6.6.0? "Cannot change DocValues type from SORTED_SET to SORTED"

P.S. Perhaps the defaults changed when you upgraded for some reason?

Erick

On Wed, Aug 30, 2017 at 11:15 AM, Erick Erickson wrote:
> This usually means you changed multiValued from true to false or vice
> versa, then added more docs.
>
> So since each segment is its own "mini index", different segments have
> different expectations, and when you query, this error is thrown.
>
> Most of the time when you change a field's type in the schema you have
> to re-index from scratch. And I'd delete *:* first (or just use a new
> collection and alias).
>
> Best,
> Erick
>
> On Wed, Aug 30, 2017 at 10:04 AM, Stephan Schubert wrote:
>> After I tried an update from Solr 6.5.0 to Solr 6.6.0 (SolrCloud mode), I
>> receive the following error in one collection:
>>
>> "Cannot change DocValues type from SORTED_SET to SORTED for field
>> "index_todelete""
>>
>> I had a look at the index values (if set, all are true or not filled;
>> checked via faceting in the working instance) and I can't see any special
>> issue with this field. If I move back to Solr 6.5.0, the Solr collection
>> comes up normally with the same set of index data. So I assume there was
>> some change in 6.6.0, but I couldn't find anything in the release notes
>> nor in any known issues in JIRA.
>>
>> Does anyone have an idea what's going on here? The field doesn't even have
>> docValues or multiValued set, so I don't understand the error message here.
>>
>> Configuration in schema.xml:
>> <field ... stored="true" type="boolean"/>
>>
>> Error Log:
>> java.util.concurrent.ExecutionException:
>> org.apache.solr.common.SolrException: Unable to create core
>> [GLOBAL-Fileshares-Index_shard1_replica2]
>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>     at org.apache.solr.core.CoreContainer.lambda$load$6(CoreContainer.java:586)
>>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.solr.common.SolrException: Unable to create core
>> [GLOBAL-Fileshares-Index_shard1_replica2]
>>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:935)
>>     at org.apache.solr.core.CoreContainer.lambda$load$5(CoreContainer.java:558)
>>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
>>     ... 5 more
>> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:977)
>>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
>>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:920)
>>     ... 7 more
>> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
>>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
>>     at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1071)
>>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:949)
>>     ... 9 more
>> Caused by: java.lang.IllegalArgumentException: cannot change DocValues
>> type from SORTED_SET to SORTED for field "index_todelete"
>>     at org.apache.lucene.index.FieldInfo.setDocValuesType(FieldInfo.java:212)
>>     at org.apache.lucene.index.FieldInfos$Builder.addOrUpdateInternal(FieldInfos.java:430)
>>     at org.apache.lucene.index.FieldInfos$Builder.add(FieldInfos.java:438)
>>     at org.apache.lucene.index.FieldInfos$B