Re: Edismax mm and efficiency
On Fri, Sep 5, 2014 at 9:34 PM, Walter Underwood wun...@wunderwood.org wrote: What would be a high mm value, 75%? Walter, I suppose that the length of the search result influences the run time. So, for a particular query and index, a high mm value is one that significantly reduces the length of the result set. -- Sincerely yours, Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
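For intuition on what a percentage mm value means in practice: Solr's minimum-should-match spec computes the required number of optional clauses from the percentage and rounds the result down. The sketch below is illustrative arithmetic only, not Solr's parser (real mm syntax also supports negative values and conditional expressions like 2&lt;-25%):

```python
def required_clauses(num_optional_clauses: int, mm: str) -> int:
    """Approximate how many optional clauses must match for a simple mm value.

    Handles only a bare integer ("2") or a bare percentage ("75%");
    real Solr mm syntax is considerably richer.
    """
    if mm.endswith("%"):
        pct = int(mm[:-1])
        # the number computed from the percentage is rounded down
        return num_optional_clauses * pct // 100
    return int(mm)

print(required_clauses(4, "75%"))  # 3 of 4 clauses must match
print(required_clauses(5, "75%"))  # 3 (3.75 rounds down)
```

So for a four-term query, mm=75% requires three of the four terms to match, which is what shrinks the result set.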
Solr Suggestion not working in solr PLZ HELP
Suggestion config in solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">content</str>
    <str name="weightField"></str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Suggestion query: localhost:28080/solr/suggest?q=foobat

The above throws the exception below:

<response>
  <lst name="responseHeader"><int name="status">500</int><int name="QTime">12</int></lst>
  <lst name="error">
    <str name="msg">No suggester named default was configured</str>
    <str name="trace">java.lang.IllegalArgumentException: No suggester named default was configured
      at org.apache.solr.handler.component.SuggestComponent.getSuggesters(SuggestComponent.java:353)
      at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:158)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149)
      at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:559)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:336)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:653)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926)
      at java.lang.Thread.run(Thread.java:745)</str>
    <int name="code">500</int>
  </lst>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Suggestion-not-working-in-solr-PLZ-HELP-tp4159351.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr Suggestion not working in solr PLZ HELP
Hi Vaibhav, could you check whether the directory referenced by *suggest.dictionary* (mySuggester) is present; if not, try creating it with mkdir, and if the problem still persists, try giving the full path. I also found a good article at the link below; check that too. [http://romiawasthy.blogspot.com/2014/06/configure-solr-suggester.html] Regards, Amey

Date: Wed, 17 Sep 2014 00:03:33 -0700 From: vaibhav.h.pa...@gmail.com To: solr-user@lucene.apache.org Subject: Solr Suggestion not working in solr PLZ HELP [quoted message trimmed: suggester configuration and the "No suggester named default was configured" stack trace, as in the previous message]
Solr(j) API for manipulating the schema(.xml)?
Is there an API to manipulate/consolidate the schema(.xml) of a Solr core? Through SolrJ? Context: We already have a generic indexing/searching framework (based on Lucene) where any component can act as a so-called IndexDataProvider. This provider delivers the field types and also the entities to be (converted into documents and then) indexed. Each of these IndexDataProviders has its own Lucene index, so we already have the information needed for the Solr schema.xml. Hope the intention is clear. And yes, manipulating the schema.xml is basically only needed when the field types change. That's why I am looking for a way to consolidate the schema.xml (upon boot, initialization of the IndexDataProviders, ...). In 99.999% of cases it won't change, but I'd like to keep the possibility for an IndexDataProvider to hand in its schema. Also, again driven by the dynamic nature of our framework: can I easily create new cores over SolrJ or the Solr REST API?
Problem deploying solr-4.10.0.war in Tomcat
Hello, I've dropped solr-4.10.0.war in Tomcat 7's webapp directory. When I start the Java web server, the following message appears in catalina.out: --- INFO: Starting Servlet Engine: Apache Tomcat/7.0.55 Sep 17, 2014 11:35:59 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive /archives/apache-tomcat-7.0.55_solr_8983/webapps/solr-4.10.0.war Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error filterStart Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Context [/solr-4.10.0] startup failed due to previous errors -- Any help would be much appreciated. Cheers, Philippe
RE: Problem deploying solr-4.10.0.war in Tomcat
Yes, this is a nasty error. You have not set up logging libraries properly: https://cwiki.apache.org/confluence/display/solr/Configuring+Logging -Original message- From:phi...@free.fr phi...@free.fr Sent: Wednesday 17th September 2014 11:51 To: solr-user@lucene.apache.org Subject: Problem deploying solr-4.10.0.war in Tomcat Hello, I've dropped solr-4.10.0.war in Tomcat 7's webapp directory. When I start the Java web server, the following message appears in catalina.out: --- INFO: Starting Servlet Engine: Apache Tomcat/7.0.55 Sep 17, 2014 11:35:59 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive /archives/apache-tomcat-7.0.55_solr_8983/webapps/solr-4.10.0.war Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error filterStart Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Context [/solr-4.10.0] startup failed due to previous errors -- Any help would be much appreciated. Cheers, Philippe
Ping handler during initial warmup
As far as I can see, when a Solr instance is started (whether standalone or SolrCloud), a PingRequestHandler will wait until index warmup is complete before returning (at least with useColdSearcher=false) which may take a while. This poses a problem in that a load balancer either needs to wait for the result or employ a short timeout for timely failover. Of course the request is eventually served, but it would be better to be able to switch over to another server until warmup is complete. So, is it possible to configure a ping handler to return quickly with non-OK status if a search handler is not yet available? This would allow the load balancer to quickly fail over to another server. I couldn't find anything like this in the docs, but I'm still hopeful. I'm aware of the possibility of using a health state file, but I'd rather have a way of doing this automatically. --Ere
solr 4.8 Tika stripping out all xml tags
I'm processing a zip file with an xml file inside. The TikaEntityProcessor opens the zip and reads the file, but it strips the xml tags even though I have supplied the htmlMapper="identity" attribute. It maintains any html that is contained in a CDATA section but seems to strip the other xml tags. Is this due to the recursive nature of opening the zip file? Is the identity value somehow lost? My understanding is that this should work in version 4.8. Thanks. Below is my config info.

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="kmlfiles" dataSource="null" rootEntity="false" baseDir="mydirectory"
            fileName=".*\.kmz$" onError="skip" processor="FileListEntityProcessor" recursive="false">
      <!-- field defs -->
      <entity name="kmlImport" processor="TikaEntityProcessor" datasource="kmlfiles"
              htmlMapper="identity" format="xml" transformer="TemplateTransformer"
              url="${kmlfiles.fileAbsolutePath}" recursive="true">
        <!-- more field defs -->
        <entity name="xml" processor="XPathEntityProcessor" ForEach="/kml"
                dataSource="fds" dataField="kmlImport.text">
          <field xpath="//name" column="name" />
          <!-- more field defs -->
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

-- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-8-Tika-stripping-out-all-xml-tags-tp4159419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 4.8 Tika stripping out all xml tags
Sorry... adding more information. Note that it does wrap my data in html, but only after it strips all my xml tags out. So the data I am interested in parsing, which would be

<name>something</name> <description>something</description> <coordinates>12345,12345,0</coordinates>

ends up like

<p>\n something \t\n something \n 12345,12345,0

etc. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-8-Tika-stripping-out-all-xml-tags-tp4159419p4159430.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to preserve 0 after decimal point?
: second, and assuming your problem is really that you're looking at the : _display_, you should get back exactly what you put in so I'm guessing Not quite ... With the numeric types, the numeric value is both indexed and stored so that there is no search/sort inconsistency between 1.1, 1.10, 001.1, etc. Those are all the number 1.1 and are treated as such. If you have an input string that you want preserved verbatim, then you need to use a string type. It doesn't matter whether those strings *look* like numbers or not; if you consider 27.50 to be different from 27.5, then those aren't numbers, they are strings. -Hoss http://www.lucidworks.com/
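Hoss's distinction is easy to demonstrate outside Solr: any numeric parse collapses 27.50 and 27.5 into the same value, so a trailing zero can only survive as a string. A quick Python illustration (not Solr itself, just the same principle):

```python
# Parsed as numbers, the two inputs are indistinguishable...
assert float("27.50") == float("27.5")
# ...and formatting the number back out drops the trailing zero.
assert str(float("27.50")) == "27.5"
# As strings they remain distinct, which is what a Solr string field preserves.
assert "27.50" != "27.5"
print("ok")
```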
Re: Solr(j) API for manipulating the schema(.xml)?
Right, you can create new cores over the REST API. As far as changing the schema, there's no good way that I know of to do that programmatically. In the SolrCloud world, though, you can upload the schema to ZooKeeper and have it automatically distributed to all the nodes. Best, Erick On Wed, Sep 17, 2014 at 2:28 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Is there an API to manipulate/consolidate the schema(.xml) of a Solr-core? Through SolrJ? [rest of quoted message trimmed]
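For the core-creation half of the question, the CoreAdmin CREATE action in Solr 4.x is a plain HTTP call. The sketch below only builds the URL; the host, core name, and instanceDir are placeholder assumptions to adapt to your installation:

```python
from urllib.parse import urlencode

# Placeholder values; the instanceDir and config files must already
# exist on the Solr server for the CREATE call to succeed.
params = {
    "action": "CREATE",
    "name": "newcore",
    "instanceDir": "newcore",
    "config": "solrconfig.xml",
    "schema": "schema.xml",
}
create_url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(create_url)
```

Issuing that URL with curl, SolrJ's CoreAdminRequest, or any HTTP client then creates the core.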
Re: How to preserve 0 after decimal point?
Really! Ya learn something new every day. On Wed, Sep 17, 2014 at 10:48 AM, Chris Hostetter hossman_luc...@fucit.org wrote: [quoted message trimmed]
Re: MaxScore
See if SOLR-5831 https://issues.apache.org/jira/browse/SOLR-5831 helps. Peter On Tue, Sep 16, 2014 at 11:32 PM, William Bell billnb...@gmail.com wrote: What we need is a function like scale(field,min,max) but only operates on the results that come back from the search results. scale() takes the min, max from the field in the index, not necessarily those in the results. I cannot think of a solution. max() only looks at one field, not across fields in the results. I tried a query() but cannot think of a way to get the max value of a field ONLY in the results... Ideas? -- Bill Bell billnb...@gmail.com cell 720-256-8076
Loading an index (generated by map reduce) in SolrCloud
Hello I have generated a lucene index (with 6 shards) using Map Reduce. I want to load this into a SolrCloud Cluster inside a collection. Is there any out of the box way of doing this? Any ideas are much appreciated Thanks Nitin
How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term
The Solr wiki says: "A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality." https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming (Full section reproduced below.) I can see how in the example from the wiki both the stemmed and the original term get indexed, but I don't see how the original term gets more weight than the stemmed term. Wouldn't this require a filter that gives terms with the keyword attribute more weight? What am I missing? Tom

-

A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. This filter emits two tokens for each input token, one of them marked with the Keyword attribute. Stemmers that respect keyword attributes will pass through the token so marked without change. So the effect of this filter is to index both the original word and the stemmed version. The 4 stemmers listed above all respect the keyword attribute. For terms that are not changed by stemming, this will result in duplicate, identical tokens in the document. This can be alleviated by adding the RemoveDuplicatesTokenFilterFactory.

<fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term
I'm not 100% on this, but I imagine this is what happens (using -> to mean "tokenized to"): Suppose that you index "I am running home" -> am run running home. If you then query "running home" -> run running home, both "run" and "running" match indexed tokens, so you get a higher score than if you query "runs home" -> run runs home, where only "run" matches. - Original Message - The Solr wiki says "A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality." [rest of quoted message trimmed] -- Diego Fernandez - 爱国 Software Engineer GSS - Diagnostics
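The scoring effect Diego describes can be simulated with a toy analyzer. This is an illustration only, not the Lucene filter chain: the stemmer below is a crude stand-in for the Porter stemmer, and real RemoveDuplicatesTokenFilter deduplicates per position rather than globally.

```python
def toy_stem(token):
    # Crude stand-in for a real stemmer.
    for suffix in ("ning", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def analyze(text):
    """KeywordRepeat-style analysis: keep the original token (as if
    keyword-marked, so the stemmer leaves it alone) plus its stemmed
    form, then drop duplicates."""
    tokens = set()
    for tok in text.lower().split():
        tokens.add(tok)            # original passes through unstemmed
        tokens.add(toy_stem(tok))  # stemmed copy
    return tokens

indexed = analyze("I am running home")        # {'i', 'am', 'running', 'run', 'home'}
exact = indexed & analyze("running home")     # 3 overlapping terms
stem_only = indexed & analyze("runs home")    # 2 overlapping terms ('runs' was never indexed)
print(len(exact), len(stem_only))             # more overlap -> higher score for the exact form
```

No filter explicitly boosts the keyword-marked token; the exact query form simply matches more indexed terms than a differently inflected form does, which is where the extra score comes from.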
Re: Loading an index (generated by map reduce) in SolrCloud
Details please. You say MapReduce. Is this the MapReduceIndexerTool? If so, you can use the --go-live option to auto-merge them. Your Solr instances need to be running over HDFS though. If you don't have Solr running over HDFS, you can just copy the results for each shard to the right place. What that means is that you must ensure that the shards produced via MRIT get copied to the corresponding Solr local directory for each shard. If you put the wrong one in the wrong place you'll have trouble with multiple copies of documents showing up when you re-add any doc that already exists in your Solr installation. BTW, I'd surely stop all my Solr instances while copying all this around. Best, Erick On Wed, Sep 17, 2014 at 1:41 PM, KNitin nitin.t...@gmail.com wrote: Hello I have generated a lucene index (with 6 shards) using Map Reduce. I want to load this into a SolrCloud Cluster inside a collection. Is there any out of the box way of doing this? Any ideas are much appreciated Thanks Nitin
Re: Loading an index (generated by map reduce) in SolrCloud
FWIW, I do a lot of moving Lucene indexes around, and as long as the core is unloaded it's never been an issue for Solr to be running at the same time. If you move a core into the correct hierarchy for a replica, you can call the Collections API's CREATESHARD action with the appropriate params (make sure you use createNodeSet to point to the right server) and Solr will load the index appropriately. It's easier to create a dummy shard and see where data lands on your installation than to try to guess. Ex:

PORT=8983
SHARD=myshard
COLLECTION=mycollection
SOLR_HOST=box1.mysolr.corp
curl "http://${SOLR_HOST}:${PORT}/solr/admin/collections?action=CREATESHARD&shard=${SHARD}&collection=${COLLECTION}&createNodeSet=${SOLR_HOST}:${PORT}_solr"

One file to watch out for if you are moving cores across machines/JVMs is the core.properties file, which you don't want to duplicate to another server/location when moving a data directory. I don't recommend trying to move transaction logs around either. On Wed, Sep 17, 2014 at 5:22 PM, Erick Erickson erickerick...@gmail.com wrote: Details please. You say MapReduce. Is this the MapReduceIndexerTool? [rest of quoted message trimmed]
Re: Implementing custom analyzer for multi-language stemming
If each token has a language attribute on it, then when I search by word and language with highlighting switched on, every word of the sentence gets highlighted. Because of that, this solution does not fit. -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-custom-analyzer-for-multi-language-stemming-tp4150156p4159550.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Loading an index (generated by map reduce) in SolrCloud
Hi, my case is a little simpler. For example, I have 100 collections in my Solr cloud, and I want to back up 20 of them so I can restore them later. I think I can just copy the index and log for each shard/core to another location, then delete the collections. Later, I can create new collections (likely with different names), then copy the index and log back into the right directory structure on the node. After that, I can either reload the collection or the core. However, some testing shows this does not work: I could not reload the collection or core. I have not tried re-starting the Solr cloud. Can someone point out the best way to achieve this goal? I prefer not to re-start the Solr cloud. Shushuai

From: ralph tice ralph.t...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, September 17, 2014 6:53 PM Subject: Re: Loading an index (generated by map reduce) in SolrCloud [quoted message trimmed]
Re: Ping handler during initial wamup
On 9/17/2014 7:06 AM, Ere Maijala wrote: So, is it possible to configure a ping handler to return quickly with non-OK status if a search handler is not yet available? [rest of quoted message trimmed] If it's not horribly messy to implement, returning a non-OK status immediately when there is no available searcher seems like a good idea. Please file an improvement issue in Jira. This can be handled on the load balancer end by configuring a quick timeout on load balancer health checks and doing them very frequently. I've got haproxy in front of my Solr servers. My checks happen every five seconds, with a 4990 millisecond timeout. My ping handler query (defined in solrconfig.xml) is q=*:*&rows=1 ... so it's very simple and fast. Because of efficiencies in the *:* query and caching, I doubt this puts much of a load on Solr. It would probably be acceptable to do the health checks once a second, although with typical Solr logging configs you'd end up with a LOT of log data. If you configure logging at the WARN level, this would not be a worry. Thanks, Shawn
Re: Ping handler during initial wamup
On 9/17/2014 8:07 PM, Shawn Heisey wrote: [quoted message trimmed] At the URL below, you can see a trimmed version of my haproxy config. I've got more than I show here, but this is the part that handles my main Solr index: http://apaste.info/0vk The ncmain core is a core that has no index, with the shards parameter built into the config, so the application has no idea that it's talking to a sharded index that actually lives on two separate servers. Thanks, Shawn
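For readers without access to the paste link, the general shape of the haproxy health check Shawn describes might look like the following. The backend name, server addresses, core name, and ping path here are assumptions for illustration, not Shawn's actual config:

```
backend solr_main
    # probe each node every 5s with a sub-interval timeout,
    # so a warming or hung node is failed out quickly
    option httpchk GET /solr/ncmain/admin/ping
    timeout check 4990ms
    server solr1 10.0.0.1:8983 check inter 5s fall 2 rise 1
    server solr2 10.0.0.2:8983 check inter 5s fall 2 rise 1
```

The key idea is that the check interval and timeout are short relative to warmup time, so traffic shifts to an already-warm node instead of queueing behind a cold one.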
Re: Loading an index (generated by map reduce) in SolrCloud
If you are updating or deleting from your indexes, I don't believe it is possible to get a consistent copy of the index from the file system directly without monkeying with hard links. The safest thing is to use the ADDREPLICA command in the Collections API and then an UNLOAD from the Core API if you want to take the data offline. If you don't care to use additional servers/JVMs, you can use the replication handler to make a backup instead. This older discussion covers most any backup strategy I can think of: http://grokbase.com/t/lucene/solr-user/12c37h0g18/backing-up-solr-4-0 On Wed, Sep 17, 2014 at 9:01 PM, shushuai zhu ss...@yahoo.com.invalid wrote: Hi, my case is a little simpler. For example, I have 100 collections now in my solr cloud, and I want to backup 20 of them so I can restore them later. [rest of quoted message trimmed]
SolrCloud deleted all existing indexes after update query
I'm using SOLR-hs_0.06, based on SOLR 4.10. I have SolrCloud with external ZooKeepers. I manually indexed with DIH from MySQL on each instance - we have a lot of dbs, so it's one db per Solr instance. All was just fine - I could search and so on. Then I sent update queries (a lot of them, 100k or more) like this: 192.168.1.1:8983/solr/mycollection/update/json + DATA in POST. IP addresses were selected from a pool, so there were many queries on each Solr instance. These queries performed well, but when I tried to search (after manually committing), I saw only the data added with the update queries. All the data from DIH was deleted, and the data on disk was also deleted. I can still see the import result on the dataimport page - but there is no data in the index. There are no errors in the logs. I just don't know what to do with this. P.S. Sorry for my English. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-deleted-all-existing-indexes-after-update-query-tp4159566.html Sent from the Solr - User mailing list archive at Nabble.com.