Problem with DIH
Hi, I'm using DIH for updating my core. I'm using a stored procedure for doing full/delta imports. To avoid delta imports running for a long time, I limit the rows returned to a maximum of 100,000 at a time. On average the delta import runs for less than 1 minute. For the last couple of days I have been noticing that my delta imports have been running for a couple of hours and try to update all the records in the core. I'm not sure why that has been happening. I can't reproduce this event on demand; it happens randomly. Has anyone noticed this kind of behavior? And secondly, are there any Solr logs that will tell me what is getting updated or what exactly is happening in the DIH? Any suggestions appreciated. Document count: 20 million. Solr 4.9, 3 nodes in the SolrCloud. Thanks, J
Re: Solr Synonyms, Escape space in case of multi words
Hi David, I think you should specify a tokenizerFactory on the synonym filter, as shown below:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>

So your field type should be:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

On Wed, Oct 15, 2014 at 7:25 PM, David Philip davidphilipshe...@gmail.com wrote: Sorry, the analysis page clip is getting trimmed off and hence the indentation is lost. Here it is: ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care. Expected: ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | ride care

On Wed, Oct 15, 2014 at 7:21 PM, David Philip davidphilipshe...@gmail.com wrote: contd.. The expectation was that "ride care" should not have been split into two tokens. It should have been as below. Please correct me/point me to where I am wrong. Input: ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care. Output: ridemakers ride ridemakerz ride ridemark ride makers makerz ride care

On Wed, Oct 15, 2014 at 7:16 PM, David Philip davidphilipshe...@gmail.com wrote: Hi All, I remember using multi-word synonyms in Solr 3.x. For multi-word entries, I escaped the space with a backslash [\] and it worked as intended. Ex: ride\ makers, riders, rider\ guards. Each one mapped to the others, so when I searched for "ride makers" I obtained results for all of them. The field type was the same as below. I have the same setup in Solr 4.10, but now the multi-word space escape is getting ignored. It is tokenizing on spaces.

synonyms.txt: ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care

Analysis page: ridemakers ride ridemakerz ride ridemark ride makers makerz care

Field type:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

Could you please tell me what the issue could be? How do I handle multi-word cases? Thanks - David
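To see why the escaping matters, here is a minimal Python sketch of splitting a synonyms.txt line: terms are separated on unescaped commas, while a backslash-escaped space survives inside a term. This is purely illustrative and not Solr's actual synonyms.txt parser.

```python
import re

def parse_synonym_line(line):
    """Split one synonyms.txt-style line on unescaped commas, then
    unescape backslash-escaped characters (e.g. 'ride\\ care' -> 'ride care').
    Toy illustration of the escaping rules, not Solr's real parser."""
    # split on commas that are not preceded by a backslash
    terms = re.split(r'(?<!\\),', line)
    return [re.sub(r'\\(.)', r'\1', t.strip()) for t in terms]

print(parse_synonym_line(r"ridemakers, ride\ makers, ridemakerz"))
# -> ['ridemakers', 'ride makers', 'ridemakerz']
```

The multi-word entry survives the split; whether it survives analysis then depends on the tokenizer handed to the synonym filter, which is why the tokenizerFactory attribute matters.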
Re: Get Data under Highlight Json value pair
How can I get the value? Is there any option in the query syntax? Currently I use hl=on and hl.fl=<list of fields>.

highlighting: {
  "": {
    "CatalogSearch_en-US": ["<em>VCM</em>"],
    "Name_en-US": ["<em>VCM</em> <em>TO</em> <em>LAPTOP</em> CABLE"],
    "Description_en-US": [".\n<em>VCM</em> (<em>Vehicle Communication Module</em>) / VMM (<em>Vehicle</em> Measurement <em>Module</em>) to Laptop Cable.\n\nPrevious part"]},
  "": {
    "CatalogSearch_en-US": ["<em>VCM</em> <em>II</em>"],
    "Name_en-US": ["<em>VCM</em> <em>II</em> <em>DLC</em> CABLE"],
    "Description_en-US": [".\n<em>VCM</em> <em>II</em> <em>DLC</em> cable"]},
  "": {
    "CatalogSearch_en-US": ["<em>VCM</em>"],
    "Name_en-US": ["8' DLC TO <em>VCM</em> <em>I</em> <em>CABLE</em>"],
    "Description_en-US": ["8' DLC to <em>VCM</em> <em>I</em> <em>cable</em>."]},

Thanks, Ravi

I know I'm a little late now ;-) Anyway, I ran into the same problem and figured out the cause and solution: you did not set a uniqueKey field in your index, hence the key in the JSON is empty, which results in problems when parsing the JSON string in JS, leaving only one key-value pair. -- View this message in context: http://lucene.472066.n3.nabble.com/Get-Data-under-Highlight-Json-value-pair-tp4149041p4164494.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How should one search on all fields? *:XX does not work
Hey, what do you think this is, Elasticsearch?!! Or... try LucidWorks Search, which supports ALL as a pseudo-field name. It supports * as well. See: https://docs.lucidworks.com/display/lweug/Field+Queries Whether LucidWorks still supports their (my!) query parser in their new Fusion product is unclear - I couldn't find any reference in the doc. -- Jack Krupansky -Original Message- From: Aaron Lewis Sent: Thursday, October 16, 2014 1:47 AM To: solr-user@lucene.apache.org Subject: How should one search on all fields? *:XX does not work Hi, I'm trying to match all fields, so I tried this: *:XX Is that a bad practice? It doesn't seem to be supported either -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
Re: How to use less than and greater than in data-config file of solr
Hi, Since it is an XML file you need to encode the greater-than and less-than signs as &gt; and &lt;. Ahmet

On Thursday, October 16, 2014 8:52 AM, madhav bahuguna madhav.bahug...@gmail.com wrote: I have two tables and I want to link them using greater-than and less-than conditions. They have nothing in common; the only way I can link them is using range values. I am able to do this in MySQL, but how do I do this in Solr in my data-config.xml file? This is how my data-config file looks:

<entity name="business_colors"
        query="SELECT business_colors_id, business_rating_from, business_rating_to, business_text, hex_colors, rgb_colors, business_colors_modify FROM business_colors WHERE business_rating_from &gt;= '${businessmasters.business_point}' AND business_rating_to &lt; '${businessmasters.business_point}'"
        deltaQuery="SELECT business_colors_id FROM business_colors WHERE business_colors_modify &gt; '${dih.last_index_time}'"
        parentDeltaQuery="SELECT business_id FROM businessmasters WHERE business_point &lt; ${business_colors.business_rating_from} AND business_point &gt;= ${business_colors.business_rating_from}">
  <field column="business_colors_id" name="id"/>
  <field column="business_rating_from" name="business_rating_from" indexed="true" stored="true"/>
  <field column="business_rating_to" name="business_rating_to" indexed="true" stored="true"/>
  <field column="business_text" name="business_text" indexed="true" stored="true"/>
  <field column="hex_colors" name="hex_colors" indexed="true" stored="true"/>
  <field column="rgb_colors" name="rgb_colors" indexed="true" stored="true"/>
  <field column="business_colors_modify" name="business_colors_modify" indexed="true" stored="true"/>
</entity>

When I run a full import, the data does not get indexed and no error is shown. What is wrong with this? Can anyone help and advise? What I have seen is that if I replace AND with OR it works fine, or if I use just one condition instead of both it works fine. Can anyone advise and help me achieve what I want to do?

I have also posted this question on Stack Overflow: http://stackoverflow.com/questions/26397084/how-use-less-than-and-greater-than-in-data-config-file-of-solr -- Regards Madhav Bahuguna
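For reference, the XML entity encoding Ahmet describes can be checked with Python's standard library; this is a sketch, and the SQL string is abbreviated from the question.

```python
from xml.sax.saxutils import escape, unescape

# escape() turns the XML-reserved characters (&, <, >) into entities,
# which is what an attribute value inside data-config.xml needs
sql = "business_rating_from >= '${businessmasters.business_point}' AND business_rating_to < '${businessmasters.business_point}'"
escaped = escape(sql)
print(escaped)

# unescape() restores the original SQL
assert unescape(escaped) == sql
```

The SQL itself is unchanged; only its representation inside the XML attribute uses &gt; and &lt;.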
Re: Does Solr support this?
Nope, not yet. Someone did propose a JavascriptRequestHandler or such, which would allow you to code such things in JavaScript (obviously), but I don't believe that has been accepted or completed yet. Upayavira On Thu, Oct 16, 2014, at 03:48 AM, Aaron Lewis wrote: Hi, I'm trying to do a second query if the first query is empty, e.g. if this returns no rows: title:XX AND subject:YY then do title:XX I can do that with two queries, but I'm wondering if I can merge them into a single one? -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
Re: How should one search on all fields? *:XX does not work
On 16 October 2014 06:50, Jack Krupansky j...@basetechnology.com wrote: Hey, what do you think this is, Elasticsearch?!! LoL. AFAIK, Elasticsearch does it by auto-copying all fields to _all_. So it's easy enough to replicate with a single copyField * -> text instruction, with the appropriate loss of precision, analyzers, etc. I don't think eDisMax supports '*' in the fl value, does it? Otherwise, that would be a solution. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: eDismax - boost function of multiple values
Hey Ahmet, thanks for your answer. I've read about this on the following page: http://wiki.apache.org/solr/FunctionQuery (Using FunctionQuery, point 3): "The bf parameter actually takes a list of function queries separated by whitespace and each with an optional boost." If I write it the way you suggested, the result is the same: only inhabitants is ranked up and importance is ignored. greetings

Ahmet Arslan iori...@yahoo.com schrieb am 20:26 Dienstag, 14.Oktober 2014: Hi Jens, Where did you read that you can write it separated by whitespace? bq and bf can both be defined multiple times: q=foo&bf=ord(inhabitants)bf=ord(importance) Ahmet

On Tuesday, October 14, 2014 6:34 PM, Jens Mayer mjen...@yahoo.com.INVALID wrote: Hey everyone, I have a question about the boost function of Solr. The documentation says about multiple function queries that I can write them separated by whitespace. Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3 Now I have two fields I'd like to boost: inhabitants and importance. The field inhabitants contains the inhabitants of cities, and the field importance contains a priority value - cities have the value 10, suburbs the value 5 and streets the value 1. If I use the bf parameter I can boost inhabitants so that cities with the most inhabitants are ranked up. Example: q=foo&bf=ord(inhabitants) The same happens if I boost importance. Example: q=foo&bf=ord(importance) But if I try to combine both so that importance and inhabitants are ranked up, only inhabitants is ranked up and importance is ignored. Example: q=foo&bf=ord(inhabitants) ord(importance) Does anyone know how I can fix this problem? greetings
Boost on basis of field is present or not in found documents
Where should I make changes in the config files if I want to boost on the basis of whether a field is present in the found documents? Explanation: I have documents with fields name, address, id, and number, where number may or may not exist. I need to rank documents higher when number is not present. I thought of using the exists function in my qf, but that is not working. I am using the edismax query parser. Thanks -- Rahul Ranjan
Re: Does Solr support this?
I'm doing something similar with a custom search component. See SOLR-6502 https://issues.apache.org/jira/browse/SOLR-6502 On Thu, Oct 16, 2014 at 8:14 AM, Upayavira u...@odoko.co.uk wrote: Nope, not yet. Someone did propose a JavascriptRequestHandler or such, which would allow you to code such things in JavaScript (obviously), but I don't believe that has been accepted or completed yet. Upayavira On Thu, Oct 16, 2014, at 03:48 AM, Aaron Lewis wrote: Hi, I'm trying to do a second query if the first query is empty, e.g. if this returns no rows: title:XX AND subject:YY then do title:XX I can do that with two queries, but I'm wondering if I can merge them into a single one? -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
Re: eDismax - boost function of multiple values
Hi, I forgot one ampersand in my example. Did you add it? q=foo&bf=ord(inhabitants)&bf=ord(importance) Ahmet

On Thursday, October 16, 2014 4:50 PM, Jens Mayer mjen...@yahoo.com.INVALID wrote: Hey Ahmet, thanks for your answer. I've read about this on the following page: http://wiki.apache.org/solr/FunctionQuery (Using FunctionQuery, point 3): "The bf parameter actually takes a list of function queries separated by whitespace and each with an optional boost." If I write it the way you suggested, the result is the same: only inhabitants is ranked up and importance is ignored. greetings

Ahmet Arslan iori...@yahoo.com schrieb am 20:26 Dienstag, 14.Oktober 2014: Hi Jens, Where did you read that you can write it separated by whitespace? bq and bf can both be defined multiple times: q=foo&bf=ord(inhabitants)bf=ord(importance) Ahmet

On Tuesday, October 14, 2014 6:34 PM, Jens Mayer mjen...@yahoo.com.INVALID wrote: Hey everyone, I have a question about the boost function of Solr. The documentation says about multiple function queries that I can write them separated by whitespace. Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3 Now I have two fields I'd like to boost: inhabitants and importance. The field inhabitants contains the inhabitants of cities, and the field importance contains a priority value - cities have the value 10, suburbs the value 5 and streets the value 1. If I use the bf parameter I can boost inhabitants so that cities with the most inhabitants are ranked up. Example: q=foo&bf=ord(inhabitants) The same happens if I boost importance. Example: q=foo&bf=ord(importance) But if I try to combine both so that importance and inhabitants are ranked up, only inhabitants is ranked up and importance is ignored. Example: q=foo&bf=ord(inhabitants) ord(importance) Does anyone know how I can fix this problem? greetings
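The difference that missing ampersand makes is easy to see by building the query string programmatically. A sketch with Python's standard library (parameter values taken from the thread): passing a list with doseq=True emits one bf parameter per function, which is what "bf can be defined multiple times" requires.

```python
from urllib.parse import urlencode

# Two separate bf parameters: each function query is boosted independently
params = {"q": "foo", "defType": "edismax",
          "bf": ["ord(inhabitants)", "ord(importance)"]}
qs = urlencode(params, doseq=True)
print(qs)
# -> q=foo&defType=edismax&bf=ord%28inhabitants%29&bf=ord%28importance%29
```

Without the second ampersand, the parser sees a single malformed bf value and the second function is silently lost.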
Re: Boost on basis of field is present or not in found documents
Hi, Can't you mix the not, exists, and if functions? https://cwiki.apache.org/confluence/display/solr/Function+Queries boost=if(not(exists(number)),100,1) On Thursday, October 16, 2014 5:13 PM, Rahul rahul1...@gmail.com wrote: Where should I make changes in the config files if I want to boost on the basis of whether a field is present in the found documents? Explanation: I have documents with fields name, address, id, and number, where number may or may not exist. I need to rank documents higher when number is not present. I thought of using the exists function in my qf, but that is not working. I am using the edismax query parser. Thanks -- Rahul Ranjan
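The semantics of that function query can be modeled in a few lines of Python (a toy sketch, not Solr code). Note the argument order: if(cond,a,b) returns a when cond is true, so to rank documents without the field higher, the true branch of not(exists(number)) must carry the larger value.

```python
def if_(cond, true_val, false_val):
    # Solr's if(cond, a, b) picks a when cond is true
    return true_val if cond else false_val

def exists(doc, field):
    # Solr's exists(field): true when the document has a value for it
    return doc.get(field) is not None

def boost(doc):
    # if(not(exists(number)),100,1): docs missing 'number' get the bigger boost
    return if_(not exists(doc, "number"), 100, 1)

print(boost({"name": "a"}), boost({"name": "b", "number": 42}))
# -> 100 1
```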
Re: eDismax - boost function of multiple values
Spaces should work just fine. Can you show us exactly what is happening with the score that leads you to the conclusion that it isn't working? Some testing from an example collection I have...

No boost:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax
id,price,yearpub,score
db9780819562005,13.21,1989,0.40321594
db1562399055,17.87,2001,0.28511673
db0072519096,66.67,2008,0.28511673
db0140236392,10.88,1994,0.28511673
db04,44.99,2007,0.25200996
db07,19.77,2005,0.25200996
db0763777595,24.44,2002,0.25200996
db0879305835,43.58,2011,0.24947715
db1933550309,18.99,2004,0.24691834
db02,40.09,2009,0.21383755

Boost of just yearpub:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29
id,price,yearpub,score
db0879305835,43.58,2011,11.069619
db1847195881,33.62,2010,10.635455
db02,40.09,2009,10.233932
db0072519096,66.67,2008,9.897689
db0316033723,23.1,2008,9.821208
db04,44.99,2007,9.465844
db05,44.99,2007,9.419684
db9780061336461,12.18,2007,9.398244
db07,19.77,2005,8.662797
db1933550309,18.99,2004,8.256611

Boost of yearpub and price, using just a space as separator:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29%20ord%28price%29
id,price,yearpub,score
db0072519096,66.67,2008,28.933228
db0879305835,43.58,2011,28.15772
db04,44.99,2007,27.414654
db05,44.99,2007,27.371819
db02,40.09,2009,27.009602
db1847195881,33.62,2010,26.636993
db9780201896831,57.43,1997,24.749598
db0767914384,37.87,1997,22.835175
db0316033723,23.1,2008,21.037462
db0763777595,24.44,2002,19.58986

The score keeps increasing with each boost. Regards, Garth
Add core in solr.xml | Problem with starting SOLRcloud
Hello, Our platform has 4 Solr instances and 3 ZooKeepers (Solr 4.1.0). I want to add a new core to my SolrCloud. I added the new core to the solr.xml file:

<core name="collection2" instanceDir="collection2"/>

I put the config files in the directory collection2, uploaded the new config to ZooKeeper, and started Solr. Solr did not start up and gives the following error:

Oct 16, 2014 4:57:06 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=collection1 state=recovering
Oct 16, 2014 4:57:06 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
Oct 16, 2014 4:57:06 PM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Oct 16, 2014 4:59:06 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: I was asked to wait on state recovering for 31.114.2.237:8910_solr but I still do not see the requested state. I see state: active live:true
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
Oct 16, 2014 4:59:06 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... (0) core=collection1
Oct 16, 2014 4:59:06 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Wait 2.0 seconds before trying to recover again (1)
Oct 16, 2014 4:59:08 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=collection1 state=recovering
Oct 16, 2014 4:59:08 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
Oct 16, 2014 4:59:08 PM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false

What's wrong with my setup? Any help would be appreciated! Roy -- View this message in context: http://lucene.472066.n3.nabble.com/Add-core-in-solr-xml-Problem-with-starting-SOLRcloud-tp4164524.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom JSON
Hello, I'm trying to use the new custom JSON feature described in https://issues.apache.org/jira/browse/SOLR-6304. I'm running Solr 4.10.1. It seems that the new feature, or more specifically the /update/json/docs endpoint, is not enabled out-of-the-box except in the schemaless example. Is there some dependence of the feature on schemaless mode? I've tried pulling the endpoint definition and related pieces out of the example-schemaless solrconfig.xml and adding those to the standard solrconfig.xml in the main example, but I've run into a cascade of issues. Right now I'm getting a "This IndexSchema is not mutable" exception when I try to post to the /update/json/docs endpoint. My real question is: what's the easiest way to get this feature up and running quickly, and is this documented somewhere? I'm trying to do a quick proof-of-concept to verify that we can move from our current flat JSON ingestion to a more natural use of structured JSON. Thanks, Scott Dawson
Re: Problem with DIH
This seems a little abstract. What I'd do is double-check that the SQL is working correctly by running the stored procedure outside of Solr and seeing what you get. You should also be able to look at the corresponding .properties file and see the inputs used for the delta import. If the data import XML is called dih-example.xml, then the properties file should be called dih-example.properties and be in the same conf directory (for the collection). Example contents are:

#Fri Oct 10 14:53:44 EDT 2014
last_index_time=2014-10-10 14\:53\:44
healthtopic.last_index_time=2014-10-10 14\:53\:44

Again, I'm suggesting you double-check that the SQL is working correctly. If that isn't the problem, provide more details on your data import handler, e.g. the XML with some modifications (no passwords). On Thu, Oct 16, 2014 at 2:11 AM, Jay Potharaju jspothar...@gmail.com wrote: Hi, I'm using DIH for updating my core. I'm using a stored procedure for doing full/delta imports. To avoid delta imports running for a long time, I limit the rows returned to a maximum of 100,000 at a time. On average the delta import runs for less than 1 minute. For the last couple of days I have been noticing that my delta imports have been running for a couple of hours and try to update all the records in the core. I'm not sure why that has been happening. I can't reproduce this event on demand; it happens randomly. Has anyone noticed this kind of behavior? And secondly, are there any Solr logs that will tell me what is getting updated or what exactly is happening in the DIH? Any suggestions appreciated. Document count: 20 million. Solr 4.9, 3 nodes in the SolrCloud. Thanks, J
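To inspect those delta-import inputs programmatically, a few lines of Python suffice. This is an illustrative sketch for the simple key=value properties format shown above, where timestamps carry backslash-escaped colons.

```python
def parse_dih_properties(text):
    """Parse key=value pairs from a DIH .properties file,
    un-escaping the backslash-escaped colons in timestamps."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the date comment header
        key, _, value = line.partition("=")
        props[key] = value.replace("\\:", ":")
    return props

sample = ("#Fri Oct 10 14:53:44 EDT 2014\n"
          "last_index_time=2014-10-10 14\\:53\\:44\n"
          "healthtopic.last_index_time=2014-10-10 14\\:53\\:44")
print(parse_dih_properties(sample)["last_index_time"])
# -> 2014-10-10 14:53:44
```

Comparing last_index_time against the rows the stored procedure returns for that timestamp is a quick way to see whether the delta query is selecting far more than expected.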
Frequent recovery of nodes in SolrCloud
Hi, Recently we shifted to SolrCloud (4.10.1) from a traditional master-slave configuration. We have only one collection and it has only one shard. The cloud cluster contains 12 nodes in total (on 8 machines; on 4 machines we run two instances each), out of which one is the leader. Whenever I view the cluster status using http://IP:PORT/solr/#/~cloud, it shows at least one node (sometimes 2-3) with status "recovering". We are using an HAProxy load balancer, and there too it often shows nodes recovering. This is happening for all nodes in the cluster. What would be the problem here? How do I check this in the logs? -- View this message in context: http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom JSON
Noble, Thanks. You're right. I had some things incorrectly configured, but now I can put structured JSON into Solr using the out-of-the-box solrconfig.xml. One additional question: is there any way to query Solr and receive the original structured JSON document in response? Or does the flattening process that happens during indexing obliterate the original structure with no way to reconstruct it? Thanks again, Scott On Thu, Oct 16, 2014 at 2:10 PM, Noble Paul noble.p...@gmail.com wrote: The endpoint /update/json/docs is enabled implicitly in Solr irrespective of the solrconfig.xml. In schemaless mode the fields are created automatically by Solr. If you have all the fields created in your schema.xml it will work. If you need an id field, please use a copyField to create one. --Noble On Thu, Oct 16, 2014 at 8:42 PM, Scott Dawson sc.e.daw...@gmail.com wrote: Hello, I'm trying to use the new custom JSON feature described in https://issues.apache.org/jira/browse/SOLR-6304. I'm running Solr 4.10.1. It seems that the new feature, or more specifically the /update/json/docs endpoint, is not enabled out-of-the-box except in the schemaless example. Is there some dependence of the feature on schemaless mode? I've tried pulling the endpoint definition and related pieces out of the example-schemaless solrconfig.xml and adding those to the standard solrconfig.xml in the main example, but I've run into a cascade of issues. Right now I'm getting a "This IndexSchema is not mutable" exception when I try to post to the /update/json/docs endpoint. My real question is: what's the easiest way to get this feature up and running quickly, and is this documented somewhere? I'm trying to do a quick proof-of-concept to verify that we can move from our current flat JSON ingestion to a more natural use of structured JSON. Thanks, Scott Dawson -- - Noble Paul
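For reference, the mapping of nested JSON to flat Solr fields in this feature is controlled by the split and f request parameters introduced in SOLR-6304. A sketch of building such a request URL follows; the localhost address, collection name, and JSON paths are hypothetical examples, not from the thread.

```python
from urllib.parse import urlencode

# split marks where child records begin in the JSON;
# each f maps a target Solr field to a JSON path (names made up here)
params = [
    ("split", "/exams"),
    ("f", "first:/first"),
    ("f", "grade:/exams/grade"),
]
qs = urlencode(params)
url = "http://localhost:8983/solr/collection1/update/json/docs?" + qs
print(url)
```

As the thread notes, this flattening is one-way: the index stores the mapped fields, not the original nested document, so the structure cannot be reconstructed from the index alone.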
Re: import solr source to eclipse
I had a problem with the "ant eclipse" answer - it was unable to resolve javax.activation for the Javadoc. Updating solr/contrib/dataimporthandler-extras/ivy.xml as follows did the trick for me:

- <dependency org="javax.activation" name="activation" rev="${/javax.activation/activation}" conf="compile-*"/>
+ <dependency org="javax.activation" name="activation" rev="${/javax.activation/activation}" conf="compile-default"/>

What I'm trying to do is construct a failing unit test for something that I think is a bug. But the first thing is to be able to run tests, probably in Eclipse, though the command line might be good enough, if not ideal. On Tue, Oct 14, 2014 at 10:38 AM, Erick Erickson erickerick...@gmail.com wrote: I do exactly what Anurag mentioned, but _only_ when what I want to debug is, for some reason, not accessible via unit tests. It's very easy to do. It's usually much faster, though, to use unit tests, which you should be able to run from Eclipse without starting a server at all. In IntelliJ, you just ctrl-click on the file and the menu gives you a choice of running or debugging the unit test; I'm sure Eclipse does something similar. There are zillions of unit tests to choose from, and for new development it's a Good Thing to write the unit test first... Good luck! Erick On Tue, Oct 14, 2014 at 1:37 AM, Anurag Sharma anura...@gmail.com wrote: Another alternative is to launch the Jetty server from outside and attach to it remotely from Eclipse:

java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7666 -jar start.jar

The above command waits until the application attach succeeds. On Tue, Oct 14, 2014 at 12:56 PM, Rajani Maski rajinima...@gmail.com wrote: Configure Eclipse with the Jetty plugin. Create a Solr folder under your Solr-Java-Project and run the project (Run As) on the Jetty server. This blog [1] may help you configure Solr within Eclipse.
[1] http://hokiesuns.blogspot.in/2010/01/setting-up-apache-solr-in-eclipse.html On Tue, Oct 14, 2014 at 12:06 PM, Ali Nazemian alinazem...@gmail.com wrote: Thank you very much for your guidance, but how can I run the Solr server inside Eclipse? Best regards. On Mon, Oct 13, 2014 at 8:02 PM, Rajani Maski rajinima...@gmail.com wrote: Hi, The best tutorial for setting up Solr [Solr 4.7] in Eclipse/IntelliJ is documented in the Solr in Action book, Appendix A, *Working with the Solr codebase*. On Mon, Oct 13, 2014 at 6:45 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: The way I do this, from a terminal:

svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/ lucene-solr-trunk
cd lucene-solr-trunk
ant eclipse

... and then, from Eclipse, import an existing Java project and select the directory where you placed lucene-solr-trunk. On Sun, Oct 12, 2014 at 7:09 AM, Ali Nazemian alinazem...@gmail.com wrote: Hi, I am going to import the Solr source code into Eclipse for some development purposes. Unfortunately, every tutorial I found for this purpose is outdated and did not work. So would you please give me some hints about how I can import the Solr source code into Eclipse? Thank you very much. -- A.Nazemian -- A.Nazemian
Re: Frequent recovery of nodes in SolrCloud
Hello, you have one shard and 11 replicas? Hmm...
- Why do you have to keep two nodes on some machines?
- Physical hardware or virtual machines?
- What is the size of this index?
- Is this all on a local network, or are there links with potential outages or failures in between?
- What is the query load?
- Have you had a look at garbage collection?
- Do you use the internal ZooKeeper?
- How many nodes?
- Any observers?
- What kind of load does ZooKeeper show?
- How much RAM do these nodes have available?
- Do some servers get into swapping?
- ...
How about some more details in terms of sizing and topology? Cheers, --Jürgen

On 16.10.2014 18:41, sachinpkale wrote: Hi, Recently we shifted to SolrCloud (4.10.1) from a traditional master-slave configuration. We have only one collection and it has only one shard. The cloud cluster contains 12 nodes in total (on 8 machines; on 4 machines we run two instances each), out of which one is the leader. Whenever I view the cluster status using http://IP:PORT/solr/#/~cloud, it shows at least one node (sometimes 2-3) with status "recovering". We are using an HAProxy load balancer, and there too it often shows nodes recovering. This is happening for all nodes in the cluster. What would be the problem here? How do I check this in the logs? -- View this message in context: http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html Sent from the Solr - User mailing list archive at Nabble.com. -- Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С уважением *i.A. Jürgen Wagner* Head of Competence Center Intelligence Senior Cloud Consultant Devoteam GmbH, Industriestr.
3, 70565 Stuttgart, Germany Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543 E-Mail: juergen.wag...@devoteam.com mailto:juergen.wag...@devoteam.com, URL: www.devoteam.de http://www.devoteam.de/ Managing Board: Jürgen Hatzipantelis (CEO) Address of Record: 64331 Weiterstadt, Germany; Commercial Register: Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
Complex boost statement
Edismax, SolrNet. I'm thinking that SolrNet is going to be my problem, because I can only send one boost parameter. Is it possible to have a boost value: if(exists(query({!v=BUS_CITY:regina}))(BUS_IS_NEARBY),20,1) Thanks, Corey
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Shawn, Please find the answers to your questions.
1. Java version: java version "1.7.0_51", Java(TM) SE Runtime Environment (build 1.7.0_51-b13), Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
2. OS: CentOS Linux release 7.0.1406 (Core)
3. Everything is 64-bit: OS, Java, and CPU.
4. Java args: -Djava.io.tmpdir=/opt/tomcat1/temp -Dcatalina.home=/opt/tomcat1 -Dcatalina.base=/opt/tomcat1 -Djava.endorsed.dirs=/opt/tomcat1/endorsed -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181 -DzkClientTimeout=2 -DhostContext=solr -Dport=8081 -Dhost=server1.mydomain.com -Dsolr.solr.home=/opt/solr/home1 -Dfile.encoding=UTF8 -Duser.timezone=UTC -XX:+UseG1GC -XX:MaxPermSize=128m -XX:PermSize=64m -Xmx2048m -Xms128m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties
5. The ZooKeeper ensemble has 3 ZooKeeper instances, which are external, not embedded.
6. Container: Apache Tomcat version 7.0.42
*Additional observations:* I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc, then compared the two lists. Eyeballing the first few lines of ids in both lists, even though each list has an equal number of documents (96,309 each), the document ids in them appear to be *mutually exclusive*. I did not find even a single common id in those lists; I tried at least 15 manually. It looks to me like the replicas are disjoint sets. Thanks. On Thu, Oct 16, 2014 at 1:41 AM, Shawn Heisey apa...@elyograg.org wrote: On 10/15/2014 10:24 PM, S.L wrote: Yes, I tried those two queries with distrib=false; I get 0 results for the first and 1 result for the second query (i.e. server 3 shard 2 replica 2) consistently. However, if I run the same second query (i.e. server 3 shard 2 replica 2) with distrib=true, I sometimes get a result and sometimes not. Shouldn't this query always return a result when it's pointing to a core that seems to have that document, regardless of distrib=true or false? Unfortunately I don't see anything in particular in the logs pointing to any information. BTW, you asked me to replace the request handler; I use the select request handler, so I cannot replace it with anything else - is that a problem? If you send the query with distrib=true (which is the default value in SolrCloud), then it treats it just as if you had sent it to /solr/collection instead of /solr/collection_shardN_replicaN, so it's a full distributed query. The distrib=false is required to turn that behavior off and ONLY query the index on the actual core where you sent it. I only said to replace those things as appropriate. Since you are using /select, it's no problem that you left it that way. If I were to assume that you used /select, but you didn't, the URLs as I wrote them might not have worked. As discussed, this means that your replicas are truly out of sync. It's difficult to know what caused it, especially if you can't see anything in the log when you indexed the missing documents. We know you're on Solr 4.10.1. This means that your Java is a 1.7 version, since Java 7 is required. Here's where I ask a whole lot of questions about your setup. What is the precise Java version, and which vendor's Java are you using? What operating system is it on? Is everything 64-bit, or is any piece (CPU, OS, Java) 32-bit? On the Solr admin UI dashboard, it lists all parameters used when starting Java, labelled as Args. Can you include those? Is ZooKeeper external, or embedded in Solr? Is it a 3-server (or more) ensemble? Are you using the example Jetty, or did you provide your own servlet container? We recommend 64-bit Oracle Java, the latest 1.7 version.
OpenJDK (since version 1.7.x) should be pretty safe as well, but IBM's Java should be avoided. IBM does very aggressive runtime optimizations. These can make programs run faster, but they are known to negatively affect Lucene/Solr. Thanks, Shawn
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
On 10/16/2014 6:27 PM, S.L wrote: 1. Java Version: java version 1.7.0_51, Java(TM) SE Runtime Environment (build 1.7.0_51-b13), Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

I believe that build 51 is one of those that is known to have bugs related to Lucene. If you can upgrade this to 67, that would be good, but I don't know that it's a pressing matter. It looks like the Oracle JVM, which is good.

2. OS: CentOS Linux release 7.0.1406 (Core) 3. Everything is 64-bit: OS, Java, and CPU. 4. Java Args: -Djava.io.tmpdir=/opt/tomcat1/temp -Dcatalina.home=/opt/tomcat1 -Dcatalina.base=/opt/tomcat1 -Djava.endorsed.dirs=/opt/tomcat1/endorsed -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181 -DzkClientTimeout=2 -DhostContext=solr -Dport=8081 -Dhost=server1.mydomain.com -Dsolr.solr.home=/opt/solr/home1 -Dfile.encoding=UTF8 -Duser.timezone=UTC -XX:+UseG1GC -XX:MaxPermSize=128m -XX:PermSize=64m -Xmx2048m -Xms128m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties

I would not use the G1 collector myself, but with the heap at only 2GB, I don't know that it matters all that much. Even a worst-case collection probably is not going to take more than a few seconds, and you've already increased the zookeeper client timeout. http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

5. Zookeeper ensemble has 3 zookeeper instances, which are external and are not embedded. 6.
Container: I am using Apache Tomcat Version 7.0.42 *Additional Observations:* I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc, then compared the two lists. Eyeballing the first few lines of ids in both lists, even though each list has an equal number of documents (96309 each), the document ids in them seem to be *mutually exclusive*; I did not find even a single common id in those lists (I tried at least 15 manually), so it looks to me like the replicas are disjoint sets.

Are you sure you hit both replicas of the same shard number? If you are, then it sounds like something is going wrong with your document routing, or maybe your clusterstate is really messed up. Recreating the collection from scratch and doing a full reindex might be a good plan ... assuming this is possible for you. You could create a whole new collection, and then when you're ready to switch, delete the original collection and create an alias so your app can still use the old name. How much total RAM do you have on these systems, and how large are those index shards? With a shard having 96K documents, it sounds like your whole index is probably just shy of 300K documents. Thanks, Shawn
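The eyeball comparison above can be made exact with standard tools: export the id list from each replica (one id per line, e.g. from the distrib=false&fl=id&sort=id+asc queries), sort both files, and use comm to find the intersection. A small sketch with stand-in data instead of the real 96K-id exports:

```shell
# Stand-in id exports; in the real case these come from the two replicas.
printf 'doc1\ndoc2\ndoc3\n' > replica1_ids.txt
printf 'doc3\ndoc4\ndoc5\n' > replica2_ids.txt

# comm requires sorted input.
sort replica1_ids.txt -o replica1_ids.txt
sort replica2_ids.txt -o replica2_ids.txt

# -12 suppresses lines unique to each file, leaving only ids present in BOTH
# replicas. An empty result confirms the replicas are truly disjoint.
comm -12 replica1_ids.txt replica2_ids.txt > common_ids.txt
wc -l < common_ids.txt
```

This turns "I tried at least 15 manually" into a complete check over all 96309 ids in a second or two.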
Re: import solr source to eclipse
Sorry, not an Eclipse guy, I'll have to wait for them to chime in... Kudos for trying to construct a unit test illustrating the error though, that'll be a great help! Erick

On Thu, Oct 16, 2014 at 4:14 PM, Dan Davis dansm...@gmail.com wrote: I had a problem with the ant eclipse answer - it was unable to resolve javax.activation for the Javadoc. Updating solr/contrib/dataimporthandler-extras/ivy.xml as follows did the trick for me:

- <dependency org="javax.activation" name="activation" rev="${/javax.activation/activation}" conf="compile->*"/>
+ <dependency org="javax.activation" name="activation" rev="${/javax.activation/activation}" conf="compile->default"/>

What I'm trying to do is construct a failing unit test for something that I think is a bug. But the first thing is to be able to run tests, probably in eclipse, but the command line might be good enough although not ideal.

On Tue, Oct 14, 2014 at 10:38 AM, Erick Erickson erickerick...@gmail.com wrote: I do exactly what Anurag mentioned, but _only_ when what I want to debug is, for some reason, not accessible via unit tests. It's very easy to do. It's usually much faster though to use unit tests, which you should be able to run from eclipse without starting a server at all. In IntelliJ, you just ctrl-click on the file and the menu gives you a choice of running or debugging the unit test; I'm sure Eclipse does something similar. There are zillions of units to choose from, and for new development it's a Good Thing to write the unit test first... Good luck! Erick

On Tue, Oct 14, 2014 at 1:37 AM, Anurag Sharma anura...@gmail.com wrote: Another alternative is to launch the jetty server from outside and attach it remotely from eclipse. java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7666 -jar start.jar The above command waits until the application attach succeeds.

On Tue, Oct 14, 2014 at 12:56 PM, Rajani Maski rajinima...@gmail.com wrote: Configure eclipse with the Jetty plugin.
Create a Solr folder under your Solr-Java-Project and Run the project [Run as] on Jetty Server. This blog[1] may help you to configure Solr within eclipse. [1] http://hokiesuns.blogspot.in/2010/01/setting-up-apache-solr-in-eclipse.html On Tue, Oct 14, 2014 at 12:06 PM, Ali Nazemian alinazem...@gmail.com wrote: Thank you very much for your guides but how can I run solr server inside eclipse? Best regards. On Mon, Oct 13, 2014 at 8:02 PM, Rajani Maski rajinima...@gmail.com wrote: Hi, The best tutorial for setting up Solr[solr 4.7] in eclipse/intellij is documented in Solr In Action book, Apendix A, *Working with the Solr codebase* On Mon, Oct 13, 2014 at 6:45 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: The way I do this: From a terminal: svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/ lucene-solr-trunk cd lucene-solr-trunk ant eclipse ... And then, from your Eclipse import existing java project, and select the directory where you placed lucene-solr-trunk On Sun, Oct 12, 2014 at 7:09 AM, Ali Nazemian alinazem...@gmail.com wrote: Hi, I am going to import solr source code to eclipse for some development purpose. Unfortunately every tutorial that I found for this purpose is outdated and did not work. So would you please give me some hint about how can I import solr source code to eclipse? Thank you very much. -- A.Nazemian -- A.Nazemian
Re: Frequent recovery of nodes in SolrCloud
And what is your zookeeper timeout? When it's too short that can lead to this behavior. Best, Erick

On Thu, Oct 16, 2014 at 4:34 PM, Jürgen Wagner (DVT) juergen.wag...@devoteam.com wrote: Hello, you have one shard and 11 replicas? Hmm...
- Why do you have to keep two nodes on some machines?
- Physical hardware or virtual machines?
- What is the size of this index?
- Is this all on a local network or are there links with potential outages or failures in between?
- What is the query load?
- Have you had a look at garbage collection?
- Do you use the internal Zookeeper?
- How many nodes?
- Any observers?
- What kind of load does Zookeeper show?
- How much RAM do these nodes have available?
- Do some servers get into swapping?
- ...
How about some more details in terms of sizing and topology? Cheers, --Jürgen

On 16.10.2014 18:41, sachinpkale wrote: Hi, Recently we have shifted to SolrCloud (4.10.1) from a traditional Master-Slave configuration. We have only one collection and it has only one shard. The cloud cluster contains 12 nodes in total (on 8 machines; on 4 machines we have two instances running each), out of which one is the leader. Whenever I see the cluster status using http://IP:HOST/solr/#/~cloud, it shows at least one (sometimes 2-3) node status as recovering. We are using an HAProxy load balancer and there also, many times it shows the nodes are recovering. This is happening for all nodes in the cluster. What would be the problem here? How do I check this in logs? -- View this message in context: http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html Sent from the Solr - User mailing list archive at Nabble.com. -- Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С уважением i.A. Jürgen Wagner Head of Competence Center Intelligence Senior Cloud Consultant Devoteam GmbH, Industriestr.
3, 70565 Stuttgart, Germany Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543 E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de Managing Board: Jürgen Hatzipantelis (CEO) Address of Record: 64331 Weiterstadt, Germany; Commercial Register: Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Shawn, 1. I will upgrade to the 67 JVM shortly. 2. This is a new collection; I was facing a similar issue in 4.7, and based on Erick's recommendation I updated to 4.10.1 and created a new collection. 3. Yes, I am hitting the replicas of the same shard and I see the lists are completely non-overlapping. I am using CloudSolrServer to add the documents. 4. I have a 3 physical node cluster, with each node having 16GB of memory. 5. I also have a custom request handler defined in my solrconfig.xml as below. However, I am not using it; I am only using the default select handler. My MyCustomHandler class has been added to the source and included in the build, but is not being used for any requests yet.

<requestHandler name="/mycustomselect" class="solr.MyCustomHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">suggestAggregate</str>
    <str name="spellcheck.dictionary">direct</str>
    <!-- <str name="spellcheck.dictionary">wordbreak</str> -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

6.
The clusterstate.json is copied below:

{"dyCollection1":{
    "shards":{
      "shard1":{
        "range":"8000-d554",
        "state":"active",
        "replicas":{
          "core_node3":{
            "state":"active",
            "core":"dyCollection1_shard1_replica1",
            "node_name":"server3.mydomain.com:8082_solr",
            "base_url":"http://server3.mydomain.com:8082/solr"},
          "core_node4":{
            "state":"active",
            "core":"dyCollection1_shard1_replica2",
            "node_name":"server2.mydomain.com:8081_solr",
            "base_url":"http://server2.mydomain.com:8081/solr",
            "leader":"true"}}},
      "shard2":{
        "range":"d555-2aa9",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"dyCollection1_shard2_replica1",
            "node_name":"server1.mydomain.com:8081_solr",
            "base_url":"http://server1.mydomain.com:8081/solr",
            "leader":"true"},
          "core_node6":{
            "state":"active",
            "core":"dyCollection1_shard2_replica2",
            "node_name":"server3.mydomain.com:8081_solr",
            "base_url":"http://server3.mydomain.com:8081/solr"}}},
      "shard3":{
        "range":"2aaa-7fff",
        "state":"active",
        "replicas":{
          "core_node2":{
            "state":"active",
            "core":"dyCollection1_shard3_replica2",
            "node_name":"server1.mydomain.com:8082_solr",
            "base_url":"http://server1.mydomain.com:8082/solr",
            "leader":"true"},
          "core_node5":{
            "state":"active",
            "core":"dyCollection1_shard3_replica1",
            "node_name":"server2.mydomain.com:8082_solr",
            "base_url":"http://server2.mydomain.com:8082/solr"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"2",
    "autoAddReplicas":"false"}}

Thanks! On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey apa...@elyograg.org wrote:
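When checking whether both replicas of the same shard were really hit, it helps to pull the leader cores straight out of the clusterstate.json dump. A sketch with no extra tooling (no jq), run here against a trimmed stand-in for the real file:

```shell
# Trimmed stand-in for the clusterstate.json posted in the thread.
cat > clusterstate.json <<'EOF'
{"dyCollection1":{"shards":{
 "shard1":{"replicas":{
   "core_node3":{"core":"dyCollection1_shard1_replica1"},
   "core_node4":{"core":"dyCollection1_shard1_replica2","leader":"true"}}},
 "shard2":{"replicas":{
   "core_node1":{"core":"dyCollection1_shard2_replica1","leader":"true"}}}}}}
EOF

# A replica entry that carries "leader":"true" names that shard's leader;
# grep isolates those entries and sed keeps just the core name.
grep -o '"core":"[^"]*"[^}]*"leader":"true"' clusterstate.json |
  sed 's/"core":"\([^"]*\)".*/\1/'
```

Against the real file this lists one leader core per shard, which is a quick sanity check that exactly one replica per shard claims leadership.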
Re: Frequent recovery of nodes in SolrCloud
- Why do you have to keep two nodes on some machines? - These are very powerful machines (32-core, 64GB) and our index size is 1GB. We are allocating 7GB to the JVM, so we thought it would be OK to have two instances on the same machine.
- Physical hardware or virtual machines? - Physical hardware.
- What is the size of this index? - 1GB.
- Is this all on a local network or are there links with potential outages or failures in between? - Local network.
- What is the query load? - 10K requests per minute.
- Have you had a look at garbage collection? - GC time is generally 5-10%. I have attached a screenshot.
- Do you use the internal Zookeeper? - No. We have set up an external Zookeeper ensemble with 3 instances. Following is the ZooKeeper configuration:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889
Also, in solr.xml, we have zkClientTimeout set to 3.
- How many nodes? - 3.
- Any observers? - I don't know what observers are. Can you please explain?
- What kind of load does Zookeeper show? - Load is normal, I guess. Need to double-check.
- How much RAM do these nodes have available? - Each SOLR node has 7GB allocated. For ZooKeeper, we have not allocated the memory explicitly.
- Do some servers get into swapping? - Not sure. How do I check that?

On Fri, Oct 17, 2014 at 2:04 AM, Jürgen Wagner (DVT) juergen.wag...@devoteam.com wrote:
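On the "how do I check swapping?" question: on Linux, the usual approach is to watch the si/so (swap-in/swap-out) columns of vmstat; sustained nonzero values under load mean the node is actively swapping. A sketch of the check, parsing a canned vmstat sample so the logic is self-contained:

```shell
# Canned output in the shape of `vmstat 5`: one header pair, then data rows.
# On a live node you would pipe real vmstat output through the same awk.
sample='procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0  20480  81920  12288 409600   12    7    30    25  150  300  5  2 92  1'

# si and so are fields 7 and 8 of each data row; skip the two header lines.
echo "$sample" | awk 'NR>2 { if ($7 > 0 || $8 > 0) print "swapping detected" }'
# prints "swapping detected" for the sample row above (si=12, so=7)
```

A quick one-off alternative is to look at the swpd column or `free` output, but si/so over time is what shows swapping actually happening rather than stale pages parked in swap.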
Re: Frequent recovery of nodes in SolrCloud
From the ZooKeeper side, we have the following configuration:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889
Also, in solr.xml, we have zkClientTimeout set to 3.

On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson erickerick...@gmail.com wrote: And what is your zookeeper timeout? When it's too short that can lead to this behavior. Best, Erick
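Erick's question matters because ZooKeeper clamps whatever session timeout the client asks for: by default a session is negotiated into the range [2*tickTime, 20*tickTime], so Solr's zkClientTimeout is only honored inside that window. A small sketch deriving the window from the tickTime posted above (the "set to 3" value in the thread looks truncated, so it is not used here):

```shell
# Effective ZooKeeper session-timeout window for the posted zoo.cfg.
tickTime=2000                    # ms, from the ensemble config in the thread

min_session=$((2 * tickTime))    # ZooKeeper's default minSessionTimeout
max_session=$((20 * tickTime))   # ZooKeeper's default maxSessionTimeout

echo "session timeout window: ${min_session}-${max_session} ms"
# prints "session timeout window: 4000-40000 ms"
```

With tickTime=2000, a GC pause or load spike longer than the negotiated session timeout makes ZooKeeper expire the session, which is exactly the kind of event that sends a SolrCloud node into recovery.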
Re: Frequent recovery of nodes in SolrCloud
Also, the PingRequestHandler is configured as:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>

On Fri, Oct 17, 2014 at 9:07 AM, Sachin Kale sachinpk...@gmail.com wrote: From the ZooKeeper side, we have the following configuration: tickTime=2000 dataDir=/var/lib/zookeeper clientPort=2181 initLimit=5 syncLimit=2 server.1=192.168.70.27:2888:3888 server.2=192.168.70.64:2889:3889 server.3=192.168.70.26:2889:3889 Also, in solr.xml, we have zkClientTimeout set to 3.
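With a healthcheckFile configured, /admin/ping answers OK only while that file exists next to the core, so HAProxy drops the node from rotation whenever the file is absent (Solr toggles it via /admin/ping?action=enable|disable). A local sketch of that file-based behavior, with the ping itself stubbed out since no server is involved:

```shell
healthcheck=server-enabled.txt

ping_status() {
  # Stand-in for GET /admin/ping: healthy only when the healthcheck file exists.
  if [ -f "$healthcheck" ]; then echo "OK"; else echo "SERVICE_UNAVAILABLE"; fi
}

touch "$healthcheck"; ping_status   # prints "OK" - node stays in the LB pool
rm -f "$healthcheck"; ping_status   # prints "SERVICE_UNAVAILABLE" - node drained
```

This is worth checking in the frequent-recovery scenario: if something removes or fails to create server-enabled.txt, HAProxy will report nodes as down even when Solr itself is healthy.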
Re: Custom JSON
The original JSON is not stored; the fields are extracted and the data is thrown away. On Fri, Oct 17, 2014 at 1:18 AM, Scott Dawson sc.e.daw...@gmail.com wrote: Noble, Thanks. You're right. I had some things incorrectly configured, but now I can put structured JSON into Solr using the out-of-the-box solrconfig.xml. One additional question: Is there any way to query Solr and receive the original structured JSON document in response? Or does the flattening process that happens during indexing obliterate the original structure with no way to reconstruct it? Thanks again, Scott On Thu, Oct 16, 2014 at 2:10 PM, Noble Paul noble.p...@gmail.com wrote: The end point /update/json/docs is enabled implicitly in Solr irrespective of the solrconfig.xml. In schemaless mode the fields are created automatically by Solr. If you have all the fields created in your schema.xml it will work. If you need an id field, please use a copy field to create one. --Noble On Thu, Oct 16, 2014 at 8:42 PM, Scott Dawson sc.e.daw...@gmail.com wrote: Hello, I'm trying to use the new custom JSON feature described in https://issues.apache.org/jira/browse/SOLR-6304. I'm running Solr 4.10.1. It seems that the new feature, or more specifically, the /update/json/docs endpoint, is not enabled out-of-the-box except in the schemaless example. Is there some dependence of the feature on schemaless mode? I've tried pulling the endpoint definition and related pieces of the example-schemaless solrconfig.xml and adding those to the standard solrconfig.xml in the main example, but I've run into a cascade of issues. Right now I'm getting a "This IndexSchema is not mutable" exception when I try to post to the /update/json/docs endpoint. My real question is -- what's the easiest way to get this feature up and running quickly, and is this documented somewhere? I'm trying to do a quick proof-of-concept to verify that we can move from our current flat JSON ingestion to a more natural use of structured JSON.
Thanks, Scott Dawson -- - Noble Paul
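For context on the SOLR-6304 feature the thread discusses: /update/json/docs flattens nested JSON according to the split and f parameters, which is why only the extracted fields survive. A sketch that assembles such a request (the host and collection name are hypothetical, and nothing is actually sent; the split/f mapping follows the style of the SOLR-6304 examples):

```shell
# Hypothetical endpoint; adjust host/collection for a real cluster.
url='http://localhost:8983/solr/collection1/update/json/docs'

# split carves one Solr document per element of the exams array;
# each f maps a JSON path to a Solr field.
params='split=/exams&f=first:/first&f=last:/last&f=subject:/exams/subject&f=test:/exams/test'

cat > doc.json <<'EOF'
{"first":"John","last":"Doe",
 "exams":[{"subject":"Maths","test":"term1"},
          {"subject":"Biology","test":"term1"}]}
EOF

# Print the request that would be issued, for inspection.
echo "curl '${url}?${params}' -H 'Content-Type: application/json' --data-binary @doc.json"
```

Run against a live node, this would index two documents (one per exam), each carrying the parent's first/last fields, which illustrates concretely why the original nested structure cannot be recovered from the index afterwards.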