Re: unable to facet range query
Hi,

If you don't use the QueryElevationComponent (QEC), just remove it from the configuration file. If you need it, change queryFieldType from integer to text_sw or something like that: http://wiki.apache.org/solr/QueryElevationComponent#queryFieldType The word 'promotions' is not a numeric value; that's why you are getting the exception.

On Thursday, December 12, 2013 8:43 AM, Nutan nutanshinde1...@gmail.com wrote:

My schema has:

<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>

This is the field I want to facet on:

<field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>
<fieldType name="integer" class="solr.IntField" omitNorms="true" positionIncrementGap="0"/>

I replaced the above fieldType with this:

<fieldType name="integer" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

But now this shows an error in the elevate component. My elevate.xml is:

<?xml version="1.0" encoding="UTF-8" ?>
<elevate>
  <query text="promotions">
    <doc id="2" />
    <doc id="7" />
  </query>
</elevate>

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">integer</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

*Logs:*

Caused by: org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
        at org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:218)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:592)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:801)
        ... 13 more
Caused by: org.apache.solr.common.SolrException: Invalid Number: promotions
        at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:122)

I read that range queries are for numeric fields; isn't IntField a numeric one? What are the other datatypes that support range queries?
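A minimal sketch of the fix suggested above, assuming the schema defines a non-numeric type such as string or text_sw. queryFieldType only controls how the elevation query text ("promotions") is analyzed for matching; it does not change the id field itself:

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <!-- analyze elevation query text with a non-numeric field type -->
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>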
Re: Solr hardware memory question
On Thu, 2013-12-12 at 02:46 +0100, Joel Bernstein wrote:
> Curious how many documents per shard you were planning?

350-500 million, optimized to a single segment as the data are not changing.

> The number of documents per shard and field type will drive the amount of RAM needed to sort and facet.

Very true. It makes a lot of sense to separate the RAM requirements for the Lucene/Solr structures and OS caching. It seems that Gil is working on about the same project as we are, so I will elaborate in this thread:

We would like to perform some sort of grouping on URL, so that the same page harvested at different points in time is only displayed once. This is probably the heaviest functionality, as the cardinality of the field will be near the number of documents.

For plain(er) faceting, things like MIME type, harvest date and site seem relevant. Those fields have lower cardinality and they are single-valued, so the memory requirements are something like

  #docs * log2(#unique_values) bits

With 500M documents and 1000 values, that is 600MB. With 20 shards, we are looking at 12GB per simple facet field.

Regards,
Toke Eskildsen
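Spelling out that arithmetic, per shard and then across the cluster:

  500,000,000 docs * log2(1000) ≈ 500,000,000 * 10 bits
                                ≈ 5 * 10^9 bits ≈ 625 MB per shard
  20 shards * ~600 MB           ≈ 12 GB per facet field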
Re: Constantly increasing time of full data import
One more stack trace which is active during indexing. This task is also executed on the same single-threaded executor as registering the new searcher:

searcherExecutor-48-thread-1 prio=10 tid=0x7f24c0715000 nid=0x3de6 runnable [0x7f24b096d000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:111)
        at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:131)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:311)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
        at org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:118)
        at org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:465)
        at org.apache.solr.search.LRUCache.warm(LRUCache.java:188)
        at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:2035)
        at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

   Locked ownable synchronizers:
        - 0x7f2880335d38 (a java.util.concurrent.ThreadPoolExecutor$Worker)

Maybe warming queries are blocking the commit? But then why does it increase under not-so-high load (1000-2000 requests per hour) and not during very low load?

Best,
Michał
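The trace above shows LRUCache.warm regenerating queryResultCache entries on the new searcher. If that autowarming turns out to be the bottleneck, one hedged experiment is to lower (or zero out) autowarmCount on the caches in solrconfig.xml; the sizes below are illustrative defaults, not recommendations:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

With autowarmCount="0" the new searcher registers without replaying cached queries, at the cost of colder caches immediately after each commit.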
Re: Change Velocity Template Directory in Solr 4.6
Hi Olson,

You are correct, the v.base_dir parameter is not used at all after SOLR-4882. {corename}/conf/velocity is the only option. The solr.allow.unsafe.resourceloading system property does not affect this behavior. The wiki needs an update (Confluence does not mention the v.base_dir parameter). Do you want to add your findings to the velocity wiki page?

P.S. If you don't have a wiki account, anyone can create one. But to edit the wiki, your username should be added to the wiki contributors group. This is achieved by sending an e-mail to the solr-user mailing list.

On Wednesday, December 11, 2013 10:49 PM, O. Olson olson_...@yahoo.it wrote:

Thank you iorixxx. Yes, when I run:

java -Dsolr.allow.unsafe.resourceloading=true -jar start.jar

and I then load the root of my site, I get:

ERROR - 2013-12-11 14:36:03.434; org.apache.solr.common.SolrException; null:java.io.IOException: Unable to find resource 'browse.vm'
        at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:174)
        at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:50)
        stacktrace truncated

In the above case, in solrconfig.xml I have set:

<str name="v.base_dir">MyVMTemplates</str>

and my velocity templates are in /corename/conf/MyVMTemplates. If you look at the VelocityResponseWriter at http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_6/solr/contrib/velocity/src/java/org/apache/solr/response/VelocityResponseWriter.java?revision=1541081&view=markup nowhere does it use v.base_dir. So it seems that you need to name the velocity template directory "velocity". (I tried to set it to /corename/conf/velocity and it works without any errors.)

Thank you,
O. O.
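For reference, a minimal layout matching the behavior described above (the core name is illustrative):

corename/conf/solrconfig.xml:
  <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>

corename/conf/velocity/browse.vm    <-- templates must live under conf/velocity
corename/conf/velocity/header.vm

A request with wt=velocity&v.template=browse then resolves browse.vm from that fixed directory; v.base_dir is ignored.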
Subscribe for this mailing list
I want to subscribe to this Solr mailing list.

Thanks and Best Regards,
Gabriel Zhang
Re: Subscribe for this mailing list
Hello!

To subscribe, please send an email to solr-user-subscr...@lucene.apache.org

--
Regards,
Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

> I want to subscribe to this Solr mailing list. Thanks and Best Regards, Gabriel Zhang
Re: Getting Solr Document Attributes from a Custom Function
Hi,

Thanks a lot, that helps.

Regards,
Mukund

On Thu, Dec 12, 2013 at 1:18 AM, Kydryavtsev Andrey werde...@yandex.ru wrote:

As I know (not 100% sure actually), function queries don't work with multivalued fields. Why do you need multivalued fields here? Your price and numberOfCities don't look multivalued. At least you can try to use, you know, some tricky format like "50;40;20" to index a multivalued field as single-valued and then parse this into a list of values in the function.

11.12.2013, 11:13, Mukundaraman valakumaresan muk...@8kmiles.com:

Hi Kydryavtsev,

Thanks a lot, it works. But how do I pass a multivalued field's values to a function query? Can they be passed as a String array?

Thanks & Regards,
Mukund

On Tue, Dec 10, 2013 at 12:05 PM, Kydryavtsev Andrey werde...@yandex.ru wrote:

You can implement it in this way: index the number of cities as a new int field (like <field name="numberOfCities">2</field>) and implement a user function like customFunction(price, numberOfCities, 1, 2000, 5). A custom parser should parse this into a list of value sources. From the first two field sources we can get the per-doc value for these particular fields; the other three will be ConstValueSource instances - just constants - so we can access all 5 values and implement a custom formula per doc id. Find examples in ValueSourceParser and Solr functions like DefFunction or MinFloatFunction.

10.12.2013, 09:31, Mukundaraman valakumaresan muk...@8kmiles.com:

Hi Hoss,

Thanks a lot for your response. The actual problem is: for every record that I query, I have to execute a formula and sort the records based on the value of the formula. The formula has elements from the record. For example, for the following document I need to apply the formula

(maxprice - solrprice)/(maxprice - minprice) + count(cities)/totalcities

where maxprice, minprice and totalcities will be available at run time. So for the following record, it has to execute as (1 - *5000*)/(1-2000) + *2*/5 (where 5000 and 2, which are in bold, are from the document):

<doc>
  <field name="id">apartment_1</field>
  <field name="name">Casa Grande</field>
  <field name="locality">chennai</field>
  <field name="locality">bangalore</field>
  <field name="price">5000</field>
</doc>

Thanks & Regards,
Mukund

On Tue, Dec 10, 2013 at 12:22 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

Smells like an XY problem ... Can you please describe what your end goal is in writing a custom function, and what you would do with things like the "name" field inside your function?

In general, accessing stored field values for indexed documents can be prohibitively expensive; it rather defeats the entire point of the inverted index data structure. If you help us understand what your goal is, people may be able to offer performant suggestions.

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

: Date: Mon, 9 Dec 2013 20:24:15 +0530
: From: Mukundaraman valakumaresan muk...@8kmiles.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Getting Solr Document Attributes from a Custom Function
:
: Hi All,
:
: I have written a custom Solr function and I would like to read a property
: of the document inside my custom function. Is it possible to get that using
: Solr?
:
: For eg. inside the floatVal method, I would like to get the value of the
: attribute "name"
:
: public class CustomValueSource extends ValueSource {
:
:     @Override
:     public FunctionValues getValues(Map context,
:             AtomicReaderContext readerContext) throws IOException {
:         return new FloatDocValues(this) {
:             @Override
:             public float floatVal(int doc) {
:                 /***
:                  getDocument(doc).getAttribute("name")
:                 */
:             }
:         };
:     }
: }
:
: Thanks & Regards
: Mukund

-Hoss
http://www.lucidworks.com/
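Following Andrey's outline (single-valued fields for price and city count, three runtime constants parsed from the function arguments), a rough sketch of such a parser. The class name, function name, and registration are hypothetical, and the APIs assumed are Solr/Lucene 4.x:

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Hypothetical parser for: rankscore(price, numberOfCities, maxprice, minprice, totalcities)
public class RankScoreParser extends ValueSourceParser {

  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    final ValueSource priceSource = fp.parseValueSource();  // per-doc field values
    final ValueSource citiesSource = fp.parseValueSource(); // per-doc field values
    final float maxPrice = fp.parseFloat();                 // runtime constants
    final float minPrice = fp.parseFloat();
    final float totalCities = fp.parseFloat();

    return new ValueSource() {
      @Override
      public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
          throws IOException {
        final FunctionValues price = priceSource.getValues(context, readerContext);
        final FunctionValues cities = citiesSource.getValues(context, readerContext);
        return new FloatDocValues(this) {
          @Override
          public float floatVal(int doc) {
            // (maxprice - price) / (maxprice - minprice) + cities / totalcities
            return (maxPrice - price.floatVal(doc)) / (maxPrice - minPrice)
                + cities.floatVal(doc) / totalCities;
          }
        };
      }

      @Override
      public String description() {
        return "rankscore";
      }

      @Override
      public boolean equals(Object o) {
        return o == this;
      }

      @Override
      public int hashCode() {
        return System.identityHashCode(this);
      }
    };
  }
}

Registered in solrconfig.xml with <valueSourceParser name="rankscore" class="com.example.RankScoreParser"/> (names hypothetical), it could then drive sorting, e.g. sort=rankscore(price,numberOfCities,1,2000,5) desc.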
Re: unable to facet range query
I removed the QEC, but faceting still gives this error:

Unable to range facet on field:id{type=integer,properties=indexed,stored,omitNorms,required, required=true}
Re: Cloud graph gone after manually editing clusterstate.json
Michael, that only shows that the HTTP request is a success. The white page might be caused by:
a) an invalid JSON structure, which should be easy to check
b) missing information inside the clusterstate; therefore it would be good to know the difference between the original file and your modified one.

-Stefan

On Wednesday, December 11, 2013 at 5:06 PM, michael.boom wrote:

I had a look, but all looks fine there too:

[Wed Dec 11 2013 17:04:41 GMT+0100 (CET)] runRoute get #/~cloud
GET tpl/cloud.html?_=1386777881244 200 OK 57ms
GET /solr/zookeeper?wt=json&_=1386777881308 200 OK 509ms
GET /solr/zookeeper?wt=json&path=%2Flive_nodes&_=1386777881822 200 OK 62ms
GET /solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json&_=1386777881886 200 OK 84ms

-
Thanks,
Michael
RE: Solr hardware memory question
Thanks for this - I haven't any previous experience with utilising SSDs in the way you suggest, so I guess I need to start learning! And thanks for the Danish-webscale URL, looks like very informed reading. (Yes, I think we're working in similar industries with similar constraints and expectations.)

Compiling my answers into one email:

> Curious how many documents per shard you were planning? The number of documents per shard and field type will drive the amount of RAM needed to sort and facet.

- Number of documents per shard, I think about 200 million. That's a bit of a rough estimate based on other Solrs we run though. Which I think means we hold a lot of data for each document, though I keep arguing to keep this to the truly required minimum. We also have many facets, some of which are pretty large. (I'm stretching my understanding here, but I think most documents have many 'entries' in many facets, so these really hit us performance-wise.)

> I try to keep a 1-to-1 ratio of Solr nodes to CPUs with a few spare for the operating system. I utilise MMapDirectory to manage memory via the OS.

So at this moment I'm guessing that we'll have 56 Solr-dedicated CPUs across 2 physical 32-CPU servers and _hopefully_ 256GB RAM on each. This would give 28 shards, and each would have 5GB Java memory (in Tomcat), leaving 126GB on each server for the OS and MMap. (I believe the Solr theory for this doesn't accurately work out, but we can accept the edge cases where this will fail.)

I can also see that our hardware requirements will depend on usage as well as the volume of data, and I've been pondering how best we can structure our index/es to facilitate a long-term service (which means that, given it's a lot of data, I need to structure the data so that new usage doesn't require re-indexing). But at this early stage, as people say, we need to prototype, test, profile etc., and to do that I need the hardware to run the trials (policy dictates that I buy the production hardware now, before profiling - I get to control much of the design and construction so I don't argue with this!)

Thanks for all the comments everyone, all very much appreciated :)
Gil

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: 11 December 2013 12:02
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On Tue, 2013-12-10 at 17:51 +0100, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset of ~60TB, which for our data and schema typically gives a Solr index size of 1/10th - i.e., 6TB. Given there's a general rule that the amount of hardware memory should exceed the size of the Solr index (exceed to also allow for the operating system etc.), how have people handled this situation?

By acknowledging that it is cheaper to buy SSDs instead of trying to compensate for slow spinning drives with excessive amounts of RAM. Our plans for an estimated 20TB of indexes out of 372TB of raw web data is to use SSDs controlled by a single machine with 512GB of RAM (or was it 256GB? I'll have to ask the hardware guys): https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

As always YMMV, and the numbers you quote elsewhere indicate that your queries are quite complex. You might want to do a bit of profiling to see if they are heavy enough to make the CPU the bottleneck.

Regards,
Toke Eskildsen, State and University Library, Denmark
Re: Equivalent of SQL JOIN in SOLR across multiple cores
I had gone through the link http://wiki.apache.org/solr/Join and it says there is a limitation in JOIN: the resulting documents can contain fields from only one of the two cores.

I have used the query below:

http://localhost:8983/solr/coreTO/select?q={!join from=docId to=id fromIndex=coreFROM}query

I want consolidation of fields from multiple cores, and there are two fields in common across all cores. I have data stored in normalized form across 3 cores on the same JVM. I want to merge and select multiple fields depending on a WHERE clause/common fields in each core.

Any help would be appreciated!
Re: Cloud graph gone after manually editing clusterstate.json
Hi guys, thanks for the replies!

The JSON was valid - I validated it, and the only diff between the files was my edit. But actually, it got fixed by itself: when I got to work today, everything was working as it should. Maybe it was something on my machine or browser; I can't put a finger on the problem.

-
Thanks,
Michael
Re: SolrCloud and MoreLikeThis: SOLR-788
Hi;

SOLR-4414 has no patch. Its related issue has patches, but it seems fixed since Solr 4.1.

Thanks;
Furkan KAMACI

2013/12/12 gf80 giuseppe_fe...@hotmail.com

Hi guys, could you kindly help me to apply the patch for MoreLikeThis on SolrCloud? I'm using Solr 4.6 and I'm using SolrCloud with 10 shards. The problem is described here https://issues.apache.org/jira/browse/SOLR-4414 but I think that it was solved but not yet delivered in Solr 4.6.

Thanks a lot in advance,
Giuseppe

P.S. Rakudten: Did you figure out the problem applying the patch? Tx
RE: SolrCloud and MoreLikeThis: SOLR-788
Hi, thanks for the answer. I think you mean the issue SOLR-788, isn't it? If yes, I think it's solved as you say, but I see "Fix Version/s: 4.1, 5.0", so is it possible that it's not yet delivered in Solr 4.6? However, I think that solving the related issue does not solve my problem :(.

I am just trying to find a workaround. For example, is there any way to ask SolrCloud "where is the document with this id?" If yes, I can try to customize MoreLikeThis to ask this question before asking for MLT on the owner shard. Obviously, I am assuming that after selecting the right shard, the MLT answer includes documents in other shards.

Let me know if you have any suggestion, thanks in advance,
-giuseppe

Date: Thu, 12 Dec 2013 03:51:04 -0800
From: ml-node+s472066n4106355...@n3.nabble.com
To: giuseppe_fe...@hotmail.com
Subject: Re: SolrCloud and MoreLikeThis: SOLR-788

Hi;

SOLR-4414 has no patch. Its related issue has patches, but it seems fixed since Solr 4.1.

Thanks;
Furkan KAMACI
RE: Solr hardware memory question
On Thu, 2013-12-12 at 11:10 +0100, Hoggarth, Gil wrote:
> Thanks for this - I haven't any previous experience with utilising SSDs in the way you suggest, so I guess I need to start learning!

There's a bit of a divide in the Lucene/Solr world on this. Everybody agrees that SSDs in themselves are great for Lucene/Solr searches, compared to a spinning-drives solution. How much better is another matter, and the issue gets confusing when RAM caching is factored in. Some are also very concerned about the reliability of SSDs and the write performance degradation without TRIM (you need a quite specific setup to have TRIM enabled on a server with SSDs in RAID). Guessing that your 6TB index is not heavily updated, the TRIM part should not be one of your worries though.

At Statsbiblioteket, we have been using SSDs for our search servers since 2008. That was back when random write performance was horrible and a large drive was 64GB. As you have probably guessed, we are very much in the SSD camp. We have done some testing, and for simple searches (i.e. a lot of IO and comparatively little CPU usage) we have observed that SSDs + 10% index size RAM for caching deliver something like 80% of pure RAM speed: https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/ Your mileage will surely vary.

> [...] leaving 126GB on each server for the OS and MMap. [...]

So about the same as your existing 3TB setup? Seems like you will get the same performance then. I must say that 1-minute response times would be very hard to sell at our library, even for a special search only used by a small and dedicated audience. Even your goal of 20 seconds seems adverse to exploratory search.

May I be so frank as to suggest a course of action? Buy one ½TB Samsung 840 EVO SSD, fill it with indexes and test it in a machine with 32GB of RAM, thus matching the 1/20 index-size RAM that your servers will have. Such a drive costs £250 on Amazon, and the experiment would spare you a lot of speculation and time. Next, conclude that SSDs are the obvious choice and secure the 840 for your workstation with reference to "further testing".

> I can also see that our hardware requirements will also depend on usage as well as the volume of data, and I've been pondering how best we can structure our index/es to facilitate a long term service (which means that, given it's a lot of data, I need to structure the data so that new usage doesn't require re-indexing.)

We definitely have this problem too. We have resigned ourselves to re-indexing the data after some months of real-world usage.

Regards,
Toke Eskildsen, State and University Library, Denmark
Sudden Solr crash after commit
In the last days, one of my Tomcat servlets, running only a Solr instance, crashed unexpectedly twice. Low memory usage, nothing written in the Tomcat log, and the last thing happening in the Solr log is 'end_commit_flush' followed by 'UnInverted multi-valued field' for the fields faceted during the newSearcher run. Right after this, the Tomcat crashed leaving no trace.

Has anyone experienced a similar issue before?

Thanks,
Manu
Re: SolrCloud and MoreLikeThis: SOLR-788
Hi;

Yes, I am talking about SOLR-788. It says 4.1, so it means it was fixed in 4.1. On the other hand, some patches are applied both for ongoing versions and trunk; 5.0 is the trunk version of Solr.

For your other question: what do you mean by "where is the document with this id?" If you want to learn the shard that a document belongs to, you can do that:

http://localhost:8983/solr/collection1/select?q=*%3A*&fl=url%2C+%5Bshard%5D&wt=json&indent=true

Thanks;
Furkan KAMACI

2013/12/12 gf80 giuseppe_fe...@hotmail.com

Hi, thanks for the answer. I think you mean the issue SOLR-788, isn't it? If yes, I think it's solved as you say, but I see "Fix Version/s: 4.1, 5.0", so is it possible that it's not yet delivered in Solr 4.6? However, I think that solving the related issue does not solve my problem :(.

I am just trying to find a workaround. For example, is there any way to ask SolrCloud "where is the document with this id?" If yes, I can try to customize MoreLikeThis to ask this question before asking for MLT on the owner shard. Obviously, I am assuming that after selecting the right shard, the MLT answer includes documents in other shards.

Let me know if you have any suggestion, thanks in advance,
-giuseppe
Re: [Solr Wiki] Your wiki account data
Sorry, your first email hit the spam box!
Metrics in monitoring SolrCloud
Hi,

I'm trying to add SolrCloud to our internal monitoring tools, and I wonder if anybody else has proceeded in this direction and could maybe provide some tips.

I would want to be able to get from SolrCloud:
1. The status of each collection - meaning can it serve queries or not.
2. Average query time per collection.
3. Number of requests per second/minute for each collection.

Would I need to implement some Solr plugins for this, or does the information already exist?

Thanks!

-
Thanks,
Michael
RE: SolrCloud and MoreLikeThis: SOLR-788
Great, thanks very much for your kind support. With this query I can perform a sort of workaround to the SOLR-4414 issue - what do you think, am I wrong?

Anyway, I am new to Solr and SolrCloud too, but I am having a full immersion with it these days to index a very large volume of documents. So any hint is appreciated; for instance, I don't know if the choice to use 10 shards on the same server (30GB RAM) is good, and how much the number of shards impacts indexing time.

Tx a lot,
-giuseppe

Date: Thu, 12 Dec 2013 06:38:09 -0800
From: ml-node+s472066n4106382...@n3.nabble.com
To: giuseppe_fe...@hotmail.com
Subject: Re: SolrCloud and MoreLikeThis: SOLR-788

Hi;

Yes, I am talking about SOLR-788. It says 4.1, so it means it was fixed in 4.1. On the other hand, some patches are applied both for ongoing versions and trunk; 5.0 is the trunk version of Solr.

For your other question: what do you mean by "where is the document with this id?" If you want to learn the shard that a document belongs to, you can do that:

http://localhost:8983/solr/collection1/select?q=*%3A*&fl=url%2C+%5Bshard%5D&wt=json&indent=true

Thanks;
Furkan KAMACI
Re: Configurable collectors for custom ranking
Regarding my original goal, which is to perform a math function using the scaled score and a field value, and sort on the result - how does this fit in? Must I implement another custom PostFilter with a higher cost than the scale PostFilter?

Thanks,
Peter

On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan peterlkee...@gmail.com wrote:

Thanks very much for the guidance. I'd be happy to donate a working solution.

Peter

On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein joels...@gmail.com wrote:

SOLR-5020 has the commit info; it's mainly changes to SolrIndexSearcher, I believe. They might apply to 4.3. I think as long as you have the finish method, that's all you'll need. If you can get this working it would be excellent if you could donate back the Scale PostFilter.

On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan peterlkee...@gmail.com wrote:

This is what I was looking for, but the DelegatingCollector 'finish' method doesn't exist in 4.3.0 :( Can this be patched in, and are there any other PostFilter dependencies on 4.5?

Thanks,
Peter

On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein joels...@gmail.com wrote:

Here is one approach to use in a postfilter:

1) In the collect() method, call score for each doc. Use the scores to create your scaleInfo.
2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs.
3) Don't delegate any documents to lower collectors in the collect() method.
4) In the finish method, create a score mapping (use the hppc IntFloatOpenHashMap) with your top X docIds pointing to their score, using the priorityQueue created in step 2. Then iterate the bitset (also created in step 2), sending down each doc to the lower collectors, retrieving and scaling the score from the score map. If the document is not in the score map, then send down 0. You'll have to set up a dummy scorer to feed to the lower collectors. The CollapsingQParserPlugin has an example of how to do this.

On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan peterlkee...@gmail.com wrote:

Hi Joel,

I thought about using a PostFilter, but the problem is that the 'scale' function must be done after all matching docs have been scored but before adding them to the PriorityQueue that sorts just the rows to be returned. Doing the 'scale' function wrapped in a 'query' is proving to be too slow when it visits every document in the index.

In the Collector, I can see how to get the field values like this:

indexSearcher.getSchema().getField("myfield").getType().getValueSource(SchemaField, QParser).getValues()

But 'getValueSource' needs a QParser, which isn't available. And I can't create a QParser without a SolrQueryRequest, which isn't available.

Thanks,
Peter

On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein joels...@gmail.com wrote:

Peter,

It sounds like you could achieve what you want to do in a PostFilter rather than extending the TopDocsCollector. Is there a reason why a PostFilter won't work for you?

Joel

On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan peterlkee...@gmail.com wrote:

Quick question: in the context of a custom collector, how does one get the values of a field of type 'ExternalFileField'?

Thanks,
Peter

On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan peterlkee...@gmail.com wrote:

Hi Joel,

This is related to another thread on function query matching (http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513). The patch in SOLR-4465 will allow me to extend TopDocsCollector and perform the 'scale' function on only the documents matching the main dismax query. As you mention, it is a slightly intrusive design and requires that I manage my own PriorityQueue (and a local duplicate of HitQueue), but should work. I think a better design would hide the PQ from the plugin.

Thanks,
Peter

On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein joels...@gmail.com wrote:

Hi Peter,

I've been meaning to revisit configurable ranking collectors, but I haven't yet had a chance. It's on the shortlist of things I'd like to tackle though.

On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan peterlkee...@gmail.com wrote:

I looked at SOLR-4465 and SOLR-5045, where it appears that there is a goal to be able to do custom sorting and ranking in a PostFilter. So far, it looks like only custom aggregation can be implemented in a PostFilter (5045).
Re: SolrCloud and MoreLikeThis: SOLR-788
Hi;

Yes, you can do it that way. On the other hand, you can start a new thread about your second question; I can help you decide the shard size and other parameters. However, you should know that it depends on your system and your needs.

Thanks;
Furkan KAMACI

2013/12/12 gf80 giuseppe_fe...@hotmail.com

Great, thanks very much for your kind support. With this query I can perform a sort of workaround to the SOLR-4414 issue - what do you think, am I wrong?

Anyway, I am new to Solr and SolrCloud too, but I am having a full immersion with it these days to index a very large volume of documents. So any hint is appreciated; for instance, I don't know if the choice to use 10 shards on the same server (30GB RAM) is good, and how much the number of shards impacts indexing time.

Tx a lot,
-giuseppe
Re: Sudden Solr crash after commit
Hi Manuel;

Multi-valued faceting via un-inverted fields was introduced in Solr 1.4; you can check it here: https://issues.apache.org/jira/browse/SOLR-475

Could you give more details about your system (e.g. Solr version) and other parameters?

Thanks;
Furkan KAMACI

2013/12/12 Manuel Le Normand manuel.lenorm...@gmail.com

In the last days, one of my Tomcat servlets, running only a Solr instance, crashed unexpectedly twice. Low memory usage, nothing written in the Tomcat log, and the last thing happening in the Solr log is 'end_commit_flush' followed by 'UnInverted multi-valued field' for the fields faceted during the newSearcher run. Right after this, the Tomcat crashed leaving no trace. Has anyone experienced a similar issue before?

Thanks,
Manu
Re: unable to facet range query
Hi,

Did you reindex after the schema change?

On Thursday, December 12, 2013 11:51 AM, Nutan nutanshinde1...@gmail.com wrote:

I removed the QEC, but faceting still gives this error:

Unable to range facet on field:id{type=integer,properties=indexed,stored,omitNorms,required, required=true}
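For reference, a sketch of what should work after a full reindex. Range faceting requires a Trie-based numeric type (the legacy solr.IntField does not support it); the host, core name, and range bounds below are illustrative:

<fieldType name="integer" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>

http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.range=id&facet.range.start=0&facet.range.end=1000&facet.range.gap=100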
Re: Metrics in monitoring SolrCloud
Hi Michael,

Not sure about 1) collection status, but Solr exposes 2) and 3) via JMX and the MBean request handler:

http://wiki.apache.org/solr/SolrJmx
https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler

On Thursday, December 12, 2013 4:47 PM, michael.boom my_sky...@yahoo.com wrote:

Hi,

I'm trying to add SolrCloud to our internal monitoring tools, and I wonder if anybody else has proceeded in this direction and could maybe provide some tips.

I would want to be able to get from SolrCloud:
1. The status of each collection - meaning can it serve queries or not.
2. Average query time per collection.
3. Number of requests per second/minute for each collection.

Would I need to implement some Solr plugins for this, or does the information already exist?

Thanks!

-
Thanks,
Michael
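As a sketch (host, core name, and handler paths are the stock examples; per-handler statistics are reported per core, so collection-level numbers need aggregating across shards):

http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=QUERYHANDLER&wt=json

The /select entry in the response carries counters such as "requests", "avgTimePerRequest" and "avgRequestsPerSecond" that a monitoring tool can poll.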
Re: CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used
Hi,

This is a known issue, resolved in SOLR-5408. It's fixed in trunk and 4x, and if there is a 4.6.1 it will be in there; if not, it will be in Solr 4.7. https://issues.apache.org/jira/browse/SOLR-5408

Joel

On Wed, Dec 11, 2013 at 11:36 PM, Umesh Prasad umesh.i...@gmail.com wrote:

Issue occurs in a single-segment index also.

sort: score desc,floSalesRank asc

response: {
  numFound: 21461,
  start: 0,
  maxScore: 4.4415073,
  docs: [
    {
      floSalesRank: 0,
      score: 0.123750895,
      [docid]: 9208
    },
    ...

On Thu, Dec 12, 2013 at 9:50 AM, Umesh Prasad umesh.i...@gmail.com wrote:

Hi All,

I am using the new CollapsingQParserPlugin for grouping and found that it works incorrectly when I use multiple sort criteria.

http://localhost:8080/solr/toys/select/?q=car%20and%20toys&version=2.2&start=0&rows=10&indent=on&sort=score%20desc,floSalesRank%20asc&facet=on&facet.field=store_path&facet.mincount=1&bq=store_path:%22mgl/ksc/gcv%22^10&wt=json&fl=score,floSalesRank,[docid]&bq=id:STFDCHZM3552AHXE^1000&fq={!collapse%20field=item_id}

sort: score desc,floSalesRank asc
fl: score,floSalesRank,[docid]
start: 0
q: car and toys
facet.field: store_path
fq: {!collapse field=item_id}

response: {
  numFound: 21461,
  start: 0,
  maxScore: 4.447499,
  docs: [
    {
      floSalesRank: 0,
      score: 0.12396862,
      [docid]: 9703
    },
    ...

I found a bug opened for the same: https://issues.apache.org/jira/browse/SOLR-5408. The bug is closed, but I am not really sure that it works, especially for multi-segment indexes. I am using Solr 4.6.0 and my index contains 4 segments.

Has anyone else faced the same issue?

---
Thanks & Regards
Umesh Prasad

--
Joel Bernstein
Search Engineer at Heliosearch
Re: Configurable collectors for custom ranking
The sorting is going to happen in the lower-level collectors. You need a value source that returns the score of the document being collected. Here is how you can make this happen:

1) Create an object in your PostFilter that simply holds the current score. Place this object in the SearchRequest context map. Update object.score as you pass the docs and scores to the lower collectors.

2) Create a value source that checks the SearchRequest context for the object that's holding the current score. Use this object to return the current score when called.

For example, if you give the value source a handle called score, a compound function call will look like this: sum(score(), field(x))

Joel

On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan peterlkee...@gmail.com wrote:

Regarding my original goal, which is to perform a math function using the scaled score and a field value, and sort on the result - how does this fit in? Must I implement another custom PostFilter with a higher cost than the scale PostFilter?

Thanks,
Peter
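A minimal sketch of that pattern, assuming Solr/Lucene 4.x APIs. The class names and the "scoreHolder" context key are hypothetical, and the PostFilter is assumed to set holder.current before delegating each document:

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;
import org.apache.solr.request.SolrRequestInfo;

// Mutable holder; the PostFilter updates 'current' for each doc it passes down.
class ScoreHolder {
  public float current;
}

// Value source that reports the score the PostFilter most recently set.
public class CurrentScoreValueSource extends ValueSource {

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
      throws IOException {
    // The PostFilter is assumed to have put the holder into the request context.
    final ScoreHolder holder = (ScoreHolder) SolrRequestInfo.getRequestInfo()
        .getReq().getContext().get("scoreHolder");
    return new FloatDocValues(this) {
      @Override
      public float floatVal(int doc) {
        return holder.current;
      }
    };
  }

  @Override
  public String description() { return "score()"; }

  @Override
  public boolean equals(Object o) { return o instanceof CurrentScoreValueSource; }

  @Override
  public int hashCode() { return getClass().hashCode(); }
}

Exposed through a one-line ValueSourceParser registered as "score" (not shown), a request could then sort with sort=sum(score(),field(x)) desc.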
Re: Solr hardware memory question
Hello, Gil,

I'm wondering if you've been in touch with the HathiTrust people, because I imagine your use cases are somewhat similar. They've done some blogging around getting digitized texts indexed at scale, which is what I assume you're doing: http://www.hathitrust.org/blogs/Large-scale-Search

Michael Della Bitta
Applications Developer
appinions inc. - appinions.com

On Thu, Dec 12, 2013 at 5:10 AM, Hoggarth, Gil gil.hogga...@bl.uk wrote:

Thanks for this - I haven't any previous experience with utilising SSDs in the way you suggest, so I guess I need to start learning! And thanks for the Danish-webscale URL, looks like very informed reading. (Yes, I think we're working in similar industries with similar constraints and expectations.)
Re: Solr Profiler
I've used VisualVM quite a bit, but I'm not sure that it's going to top any of the other products mentioned in this thread. It's free, though, so there's that!

Michael Della Bitta
Applications Developer
appinions inc. - appinions.com

On Thu, Dec 12, 2013 at 12:39 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

Are you looking for a Java profiler? Or a Solr monitoring tool?

For a profiler I'd recommend YourKit - http://www.yourkit.com/
For Solr monitoring I'd recommend our SPM - http://sematext.com/spm/solr-performance-monitoring/index.html

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Dec 11, 2013 at 3:46 PM, Monica Skidmore monica.skidm...@careerbuilder.com wrote:

We're trying to improve the speed of some custom Solr code we've written, and we'd like to use a profiler to help us focus our efforts. However, we've tried both JProfiler and NewRelic, and we've found it challenging to configure them correctly to be able to tell where our bottlenecks really are. What profilers/configurations have people successfully used for Solr?

Monica Skidmore
Engineering Lead, Core Search
CareerBuilder.com
RE: Load existing HDFS files into solr?
Hi Chen,

I'm not aware of any direct integration between the two at this time. You might ping the Hive user list with this question too.

That said, I've been thinking about whether it makes sense to build a Hive StorageHandler for Solr. That at least seems like a quick way to go. However, it might also be possible to just plug a Hive InputFormat into Mark's MapReduce/Solr stuff? See: https://github.com/markrmiller/solr-map-reduce-example

Cheers,
Timothy Potter
www.lucidworks.com

From: cynosure cynosure...@gmail.com
Sent: Thursday, December 12, 2013 12:11 AM
To: solr-user@lucene.apache.org
Subject: Load existing HDFS files into solr?

Folks,

Our current data is stored in Hive tables. Is there a way to tell Solr to index the existing HDFS files directly, or do I have to import each Hive table into Solr? Can anyone point me to some reference?

Thank you very much!
Chen
Solr OOM crash
Hello,

We are experiencing unexplained OOM crashes. We have already seen it a few times, over our different Solr instances. The crash happens only at a single shard of the collection.

Environment details:
1. Solr 4.3, running on Tomcat.
2. 24 shards.
3. Indexing rate of ~800 docs per minute.

Solrconfig.xml:
1. Merge factor 4
2. Soft commit every 10 min
3. Hard commit every 30 min

Main findings:
1. Solr logs: No query failures prior to the OOM, but DOUBLE the number of soft and hard commits in comparison to other shards.
2. Analyzing the dump (VisualVM): The byte[] class takes 4GB out of the 5GB allocated to the JVM, mainly referenced by a CompressingStoredFieldsReader GC root (which, by looking at the code, we suspect was created by CompressingStoredFieldsWriter.merge).

Sub findings:
1. GC logs: Showed 108 GC fails prior to the crash.
2. CPU: Overall usage seems fine, but the % of CPU time spent in GC stays high for 6 min before the OOM.
3. Memory: Half an hour before the OOM the usage slowly rises, until it gets to 5.4GB.

Has anyone encountered a higher-than-normal commit rate that seems to increase the merge rate and cause what I described?
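For diagnosing this kind of crash, a hedged example of Tomcat JVM options that capture a heap dump and GC detail at the moment of failure (the paths and heap size are illustrative, not tuning advice; the flags themselves are standard HotSpot options):

JAVA_OPTS="$JAVA_OPTS -Xmx5g \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/solr \
  -verbose:gc -Xloggc:/var/log/solr/gc.log \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

A heap dump taken at the actual OOM (rather than one triggered manually beforehand) usually makes the dominating GC root unambiguous.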
Re: Configurable collectors for custom ranking
This is pretty cool, and worthy of adding to Solr in Action (v2) and the other books. With function queries, flexible filter processing and caching, custom collectors, and post filters, there's a lot of flexibility here. Btw, the query times using a custom collector to scale/recompute scores is excellent (will have to see how it compares to your outlined solution). Thanks, Peter On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein joels...@gmail.com wrote: The sorting is going to happen in the lower level collectors. You need a value source that returns the score of the document being collected. Here is how you can make this happen: 1) Create an object in your PostFilter that simply holds the current score. Place this object in the SearchRequest context map. Update object.score as you pass the docs and scores to the lower collectors. 2) Create a values source that checks the SearchRequest context for the object that's holding the current score. Use this object to return the current score when called. For example if you give the value source a handle called score a compound function call will look like this: sum(score(), field(x)) Joel On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan peterlkee...@gmail.com wrote: Regarding my original goal, which is to perform a math function using the scaled score and a field value, and sort on the result, how does this fit in? Must I implement another custom PostFilter with a higher cost than the scale PostFilter? Thanks, Peter On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan peterlkee...@gmail.com wrote: Thanks very much for the guidance. I'd be happy to donate a working solution. Peter On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein joels...@gmail.com wrote: SOLR-5020 has the commit info, it's mainly changes to SolrIndexSearcher I believe. They might apply to 4.3. I think as long you have the finish method that's all you'll need. If you can get this working it would be excellent if you could donate back the Scale PostFilter. On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan peterlkee...@gmail.com wrote: This is what I was looking for, but the DelegatingCollector 'finish' method doesn't exist in 4.3.0 :( Can this be patched in and are there any other PostFilter dependencies on 4.5? Thanks, Peter On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein joels...@gmail.com wrote: Here is one approach to use in a postfilter 1) In the collect() method call score for each doc. Use the scores to create your scaleInfo. 2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs. 3) Don't delegate any documents to lower collectors in the collect() method. 4) In the finish method create a score mapping (use the hppc IntFloatOpenHashMap) with your top X docIds pointing to their score, using the priorityQueue created in step 2. Then iterate the bitset (also created in step 2) sending down each doc to the lower collectors, retrieving and scaling the score from the score map. If the document is not in the score map then send down 0. You'll have setup a dummy scorer to feed to lower collectors. The CollapsingQParserPlugin has an example of how to do this. On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan peterlkee...@gmail.com wrote: Hi Joel, I thought about using a PostFilter, but the problem is that the 'scale' function must be done after all matching docs have been scored but before adding them to the PriorityQueue that sorts just the rows to be returned. Doing the 'scale' function wrapped in a 'query' is proving to be too slow when it visits every document in the index. 
In the Collector, I can see how to get the field values like this: indexSearcher.getSchema().getField("myfield").getType().getValueSource(SchemaField, QParser).getValues() But 'getValueSource' needs a QParser, which isn't available. And I can't create a QParser without a SolrQueryRequest, which isn't available. Thanks, Peter

On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein joels...@gmail.com wrote: Peter, It sounds like you could achieve what you want to do in a PostFilter rather than extending the TopDocsCollector. Is there a reason why a PostFilter won't work for you? Joel

On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan peterlkee...@gmail.com wrote: Quick question: In the context of a custom collector, how does one get the values of a field of type 'ExternalFileField'? Thanks,
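[For reference, a rough, untested sketch of the postfilter recipe Joel outlines above, against the Solr 4.5+ API. Class names (ScalePostFilter, SettableScorer) are illustrative; this simplified version keeps every hit's score instead of only the top X in a priority queue, and CollapsingQParserPlugin shows the production-quality version of the dummy-scorer pattern.]

import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.OpenBitSet;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;
import com.carrotsearch.hppc.IntFloatOpenHashMap;

public class ScalePostFilter extends ExtendedQueryBase implements PostFilter {

  public ScalePostFilter() {
    setCache(false); // post filters are not cached
    setCost(100);    // cost >= 100 marks this query as a post filter
  }

  @Override
  public DelegatingCollector getFilterCollector(final IndexSearcher searcher) {
    return new DelegatingCollector() {
      // steps 1-3: record every hit and its raw score; do NOT delegate yet
      private final OpenBitSet hits = new OpenBitSet(searcher.getIndexReader().maxDoc());
      private final IntFloatOpenHashMap rawScores = new IntFloatOpenHashMap();
      private Scorer matchScorer;
      private int base;
      private float min = Float.MAX_VALUE;
      private float max = -Float.MAX_VALUE;

      @Override
      public void setScorer(Scorer s) throws IOException {
        this.matchScorer = s; // keep our own handle; the delegate gets a dummy later
      }

      @Override
      public void setNextReader(AtomicReaderContext context) throws IOException {
        this.base = context.docBase; // the delegate is repositioned in finish()
      }

      @Override
      public void collect(int doc) throws IOException {
        float score = matchScorer.score();
        int globalDoc = base + doc;
        hits.set(globalDoc);
        rawScores.put(globalDoc, score);
        min = Math.min(min, score);
        max = Math.max(max, score);
      }

      @Override
      public void finish() throws IOException {
        // step 4: replay the hits, pushing scaled scores down through a dummy scorer
        float range = (max > min) ? (max - min) : 1.0f;
        SettableScorer dummy = new SettableScorer();
        for (AtomicReaderContext leaf : searcher.getTopReaderContext().leaves()) {
          delegate.setNextReader(leaf);
          delegate.setScorer(dummy);
          int leafMax = leaf.docBase + leaf.reader().maxDoc();
          int doc = hits.nextSetBit(leaf.docBase);
          while (doc >= 0 && doc < leafMax) {
            dummy.score = (rawScores.get(doc) - min) / range; // scale into 0..1
            delegate.collect(doc - leaf.docBase);
            doc = hits.nextSetBit(doc + 1);
          }
        }
        if (delegate instanceof DelegatingCollector) {
          ((DelegatingCollector) delegate).finish();
        }
      }
    };
  }

  // Minimal stand-in scorer that reports whatever score we set on it.
  private static class SettableScorer extends Scorer {
    float score;
    SettableScorer() { super(null); }
    @Override public float score() { return score; }
    @Override public int freq() { return 1; }
    @Override public int docID() { return -1; }
    @Override public int nextDoc() { return -1; }
    @Override public int advance(int target) { return -1; }
    @Override public long cost() { return 1; }
  }
}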
Re: Configurable collectors for custom ranking
Thanks. I agree, this is powerful stuff. One of the reasons that I haven't gotten back to pluggable collectors is that I've been using PostFilters instead. When you start doing stuff with scores in postfilters you'll run into the bug in SOLR-5416. This will affect you when you use facets in combination with the QueryResultCache or tag-and-exclude faceting. The patch in SOLR-5416 resolves this issue. You'll just need your PostFilter to implement ScoreFilter, and the SolrIndexSearcher will know how to handle things. The DelegatingCollector.finish() method is so new that these kinds of bugs are still being cleaned out of the system. SOLR-5416 should be in Solr 4.7.
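[For reference, a rough sketch of the score() value source half of Joel's recipe from earlier in the thread. The ScoreHolder and class names are illustrative: the PostFilter would put a ScoreHolder in the SolrQueryRequest context map and update it before delegating each doc, and a ValueSourceParser registered under the name score would look the holder up (e.g. via fp.getReq().getContext()) and construct this value source, so sum(score(), field(x)) resolves against the current document's scaled score.]

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;

// The mutable holder that the PostFilter drops into the request context
// map and updates just before delegating each document.
class ScoreHolder {
  volatile float current;
}

public class CurrentScoreValueSource extends ValueSource {

  private final ScoreHolder holder;

  public CurrentScoreValueSource(ScoreHolder holder) {
    this.holder = holder;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    return new FloatDocValues(this) {
      @Override
      public float floatVal(int doc) {
        // whatever score the PostFilter last pushed down for the
        // document currently being collected
        return holder.current;
      }
    };
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CurrentScoreValueSource
        && ((CurrentScoreValueSource) o).holder == holder;
  }

  @Override
  public int hashCode() {
    return System.identityHashCode(holder);
  }

  @Override
  public String description() {
    return "score()";
  }
}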
Re: Metrics in monitoring SolrCloud
Hi Michael; I've implemented a management console and dashboard for this kind of purpose. Later I want to make it an open-source project for the people who need it. It is more complicated, but a very flexible and pluggable management console and dashboard. I suggest you look at the Solr admin page. You can see what you can get, and you should debug the incoming and outgoing requests with a tool like Firebug, so you can understand what is going on behind the scenes. Of course, you should read the wiki to learn what Solr exposes via JMX and HTTP. If you want a basic mechanism, start with a CloudSolrServer that connects to ZooKeeper. You can learn the cluster state from there. If you check it periodically, you can stay up to date with the current state of the cluster. That's the way I started. If you have any questions, feel free to ask; I can answer them. Thanks; Furkan KAMACI

On Thursday, December 12, 2013, Ahmet Arslan iori...@yahoo.com wrote: Hi Michael, Not sure about the collection status, but Solr exposes 2) and 3) via: http://wiki.apache.org/solr/SolrJmx https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler

On Thursday, December 12, 2013 4:47 PM, michael.boom my_sky...@yahoo.com wrote: Hi, I'm trying to add SolrCloud to our internal monitoring tools, and I wonder if anybody else has proceeded in this direction and could maybe provide some tips. I would want to be able to get from SolrCloud: 1. The status of each collection - meaning, can it serve queries or not. 2. Average query time per collection. 3. Number of requests per second/minute for each collection. Would I need to implement some Solr plugins for this, or does the information already exist? Thanks! - Thanks, Michael
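[For reference, a minimal SolrJ 4.x sketch of the CloudSolrServer approach Furkan describes. The zkHost string and collection name are placeholders; "can it serve queries" is approximated here as "every shard has at least one active replica on a live node".]

import java.util.Collection;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;

public class ClusterStatusCheck {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.connect();
    ClusterState state = server.getZkStateReader().getClusterState();
    Collection<Slice> slices = state.getSlices("mycollection"); // null if the collection is unknown
    if (slices != null) {
      for (Slice slice : slices) {
        boolean shardUp = false;
        // a shard can serve queries if at least one replica is active on a live node
        for (Replica replica : slice.getReplicas()) {
          boolean active = "active".equals(replica.getStr(ZkStateReader.STATE_PROP));
          boolean live = state.getLiveNodes().contains(replica.getStr(ZkStateReader.NODE_NAME_PROP));
          if (active && live) {
            shardUp = true;
          }
        }
        System.out.println("shard " + slice.getName() + " queryable=" + shardUp);
      }
    }
    server.shutdown();
  }
}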
Re: Metrics in monitoring SolrCloud
Hi Michael, You may want to give http://sematext.com/spm/solr-performance-monitoring a shot. It's got the metrics you need, plus others, plus alerts. Ping me if you want a Christmas discount. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Re: solr OOM Crash
Hi Sandra, Not a direct answer, but if you are seeing this around merges, have you tried relaxing the merge factor to, say, 10? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/

On Thu, Dec 12, 2013 at 12:10 PM, Sandra Scott scottsandr...@gmail.com wrote: Hello, We are experiencing unexplained OOM crashes. We have already seen it a few times, across our different Solr instances. The crash happens only at a single shard of the collection. Environment details: 1. Solr 4.3, running on Tomcat. 2. 24 shards. 3. Indexing rate of ~800 docs per minute. Solrconfig.xml: 1. Merge factor 4. 2. Soft commit every 10 min. 3. Hard commit every 30 min. Main findings: 1. Solr logs: No query failures prior to the OOM, but DOUBLE the amount of soft and hard commits in comparison to other shards. 2. Analyzing the dump (VisualVM): Class byte[] takes 4 GB out of the 5 GB allocated to the JVM, mainly referenced by the CompressingStoredFieldsReader GC root (which, by looking at the code, we suspect was created by CompressingStoredFieldsWriter.merge). Sub findings: 1. GC logs: Showed 108 GC fails prior to the crash. 2. CPU: Overall usage seems fine, but the % of CPU time spent in GC stays high from 6 min before the OOM. 3. Memory: Half an hour before the OOM the usage slowly rises, until it gets to 5.4 GB. Has anyone encountered a higher than normal commit rate that seems to increase the merge rate and cause what I described?
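[For reference, the knob Otis refers to lives in solrconfig.xml; a minimal sketch with the surrounding config elided. In Solr 4.x, mergeFactor is translated onto the default TieredMergePolicy settings.]

<indexConfig>
  <!-- a higher merge factor means fewer, less aggressive merges at the
       cost of more segments; Otis suggests relaxing the current 4 to 10 -->
  <mergeFactor>10</mergeFactor>
</indexConfig>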
Re: Solr Cloud error with shard update
I have created a Jira issue here: https://issues.apache.org/jira/browse/SOLR-5551
Re: Change Velocity Template Directory in Solr 4.6
Thank you very much for the confirmation, iorixxx. When I started this thread on Dec. 6, I did not know about the Confluence wiki (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide). I learned about it through another thread I started (http://lucene.472066.n3.nabble.com/Use-of-Deprecated-Classes-SortableIntField-SortableFloatField-SortableDoubleField-tp4105762p4106001.html). I think it is much more up to date and has a lot more information than the official Solr wiki, and I will be reading it before posting here. Thank you again for your help. O. O.
Re: LanguageIdentifierUpdateProcessor uses only firstValue() on multivalued fields
Hmm... I haven't run into the case where null was returned in a multi-valued scenario yet... I probably just haven't tested that case. I likely need to add a null check there - thanks for pointing it out. -Trey

On Fri, Nov 29, 2013 at 6:10 AM, Müller, Stephan muel...@ponton-consulting.de wrote: Hello Trey, thank you for this example. We've solved it by omitting the multivalued field and passing the distinct string fields instead; still, I'll go ahead and propose a patch so the language processor is able to concatenate multiple values by default. I think it's a reasonable feature (and I can't remember ever having contributed a patch to an open-source project). My thoughts on the patch implementation are much the same as yours: iterating on getValues(). I'll have this discussed on the dev list and probably in JIRA. One thing: how do you guard against a possible NPE in line 129, for (final Object inputValue : inputField.getValues()) { ? SolrInputField.getValues() will return null if the associated value was null. It does not create an empty Collection. That, btw, seems to be a minor bug in the javadoc, which does not state that this method can return null. Regards, Stephan - srm [...]

The langsToPrepend variable above will contain a set of languages, where detectLanguage was called separately for each value in the multivalued field. If you just want to concatenate all the values and detect the language once (as opposed to only using the first value in the multivalued field, like it does today), just concatenate each of the input values in the first loop and call detectLanguage once at the end. I wrote code that does this for an example in the Solr in Action book. The particular example was detecting languages for each value in a multivalued field and then pre-pending the language to the text for the multivalued field (so the analyzer would know which stemmer to use, as they were being dynamically substituted based upon the language). The code is available here if you are interested: https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldLanguageIdentifierUpdateProcessor.java Good luck! -Trey

On Wed, Nov 27, 2013 at 10:16 AM, Müller, Stephan Mueller@ponton-consulting.de wrote: I suspect that it is an oversight for a use case that was not considered. I mean, it should probably either ignore or convert non-text/string values. Ok, I'll see that I provide a patch against trunk. It actually ignores non-string values, but is unable to check the remaining values of a multivalued field. Hmmm... are you using JSON input? I mean, how are the types being set? Solr XML doesn't have a way to set the value types. No. It's a field with multiValued=true. That results in a SolrInputField where value (which is defined to be Object) actually holds a List. This list is populated with Integer, String, Date, you name it. I'm talking about the actual Java datatypes. The values in the list are probably set by this 3rd-party Textbodyprocessor thingy. Now the language processor just asks for field.getValue(). This is delegated to the SolrInputField, which in turn calls firstValue(). Interestingly enough, it already is able to handle a Collection as its value. But if the value is a collection, it just returns the first element. You could work around it with an update processor that copied the field and massaged the multiple values into what you really want the language detection to see. You could even implement that processor as a JavaScript script with the stateless script update processor.
Our workaround would be to not feed the multivalued field but only the String fields (which are also included in the multivalued field). Filing a bug/feature request and providing the patch will take some time, as I haven't set up a fully working trunk in my IDEA installation. But I'm eager to do it :) Regards, Stephan -- Jack Krupansky

-----Original Message----- From: Müller, Stephan Sent: Wednesday, November 27, 2013 5:02 AM To: solr-user@lucene.apache.org Subject: LanguageIdentifierUpdateProcessor uses only firstValue() on multivalued fields

Hello, this is a repost. This message was originally posted on the 'general' list, but it was suggested that the 'user' list might be a better place to ask. Original Message: Hi, we are passing a multivalued field to the LanguageIdentifierUpdateProcessor. This multivalued field contains arbitrary types (Integer, String, Date). Now, the LanguageIdentifierUpdateProcessor.concatFields(SolrInputDocument doc, String[] fields), which btw does not use the parameter fields, is unable to parse all values of the multivalued field. The call Object content =
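[For what it's worth, an untested sketch of what the patched method Stephan proposes might look like: iterate getValues() instead of taking the first value, guarding against the null return discussed above. Details of the real concatFields() in LanguageIdentifierUpdateProcessor differ.]

// Drop-in sketch for LanguageIdentifierUpdateProcessor.concatFields()
private String concatFields(SolrInputDocument doc, String[] fields) {
  StringBuilder sb = new StringBuilder();
  for (String fieldName : fields) {
    SolrInputField field = doc.getField(fieldName);
    if (field == null || field.getValues() == null) {
      continue; // getValues() returns null when the stored value was null
    }
    for (Object value : field.getValues()) {
      if (value instanceof String) { // skip Integer, Date, etc.
        sb.append((String) value).append(' ');
      }
    }
  }
  return sb.toString();
}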
Re: Prioritize search returns by URL path?
Thanks Chris. I think you've hit the nail on the head. I understand your concern about prioritizing content simply by content type, and generally I'd agree with you. However, our situation is a bit unusual. We don't use our wiki feature as true wikis. We publish only authoritative content to them, and to our blogs, so those really are the things we want returned first. The wikis most often contain the information we want our customers to find. Thanks again for the syntax help. We'll give it a try. JRG
Re: FuzzyLookupFactory fwfsta.bin
This error is misleading. It tries to load the suggester index from the storeDir parameter even on the first run, when the index has not been created to begin with, and hence errors. (It will create the index itself when a build command is issued.) I believe you will not see the error once the suggester index is built, and the error on the first run does not have any consequences in terms of functionality.

On Wed, Dec 11, 2013 at 4:53 AM, Harun Reşit Zafer harun.za...@tubitak.gov.tr wrote: With the configuration below:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
    <str name="storeDir">fuzzy_suggest_analyzing</str>
    <str name="buildOnCommit">true</str>
    <str name="suggestAnalyzerFieldType">text_tr</str>
    <str name="sourceLocation">suggestions.txt</str>
    <!-- Suggester properties -->
    <bool name="exactMatchFirst">true</bool>
    <bool name="preserveSep">false</bool>
  </lst>
  <str name="queryAnalyzerFieldType">lowercase</str>
</searchComponent>

I got the error: ...\solr-4.6.0\example\solr\collection1\data\fuzzy_suggest_analyzing\fwfsta.bin (The system cannot find the file specified) -- Harun Reşit Zafer TÜBİTAK BİLGEM BTE Metin Madenciliği ve Kaynaştırma Sistemleri Bölümü T +90 262 675 3268 W http://www.hrzafer.com
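[For reference, the build can be triggered explicitly with a request like the one below. The host, port, core, and the /suggest handler path are placeholders for whatever request handler is wired to this search component in your solrconfig.xml; spellcheck.dictionary must match the name given above.]

http://localhost:8983/solr/collection1/suggest?q=test&spellcheck=true&spellcheck.dictionary=suggest&spellcheck.build=true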
Unable to check Solr 4.6 SPLITSHARD command progress
I have a big index, approx. 350 GB in a single shard, which I want to split. The SPLITSHARD command initiates successfully, as I can see in the logs. (It times out, but from reading the forums here that is the expected behavior.) The problem is it never completes, even after a full day, and I doubt it is actually running in the background, because the two newly created shard folders are of constant size (in KBs) and are not changing over time. In ZooKeeper the two shards are active and marked as under construction. The logs do not show any error; the last log entry is like: INFO - 2013-12-12 14:48:36.110; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 range=8000- Now, what I'm asking is whether it is possible to check the progress of this command, even if it is something as simple as files changing in some directory, or taking a thread dump - anything that just assures me that it is progressing. I'd also be happy to know how the overseer performs the split. I tried looking into its code but was unable to figure out how it does so. Also, in the ZooKeeper files related to the overseer I was unable to find anything related to the split command. Thanks, Binit
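[One low-tech way to confirm the split is still doing work, along the thread-dump line Binit suggests: dump the Solr JVM's threads and look for the splitter class from the log entry above. The PID is a placeholder.]

jstack <solr-pid> | grep -A 10 SolrIndexSplitter

[If SolrIndexSplitter frames keep showing up across repeated dumps, the split is still running; if they never appear, it has likely died silently.]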
custom group sort in solr
Hi, I want to use Solr/Lucene's grouping feature with some customisations, like:
- sorting the groups based on average scores instead of max scores, or some other complex computation over scores
- grouping articles based on some computation instead of a field value
So far it seems like I have to write some code for it. Can someone please point me in the right direction?
- If I have to write a plugin, which files do I need to check?
- Which part of the code currently executes the grouping feature? Does it happen in Solr or Lucene? Is it SearchHandler?
Parvesh Garg http://www.zettata.com
Re: custom group sort in solr
Hi, You may try writing a custom function and sorting your groups according to the result of the custom function. If that might work, check out ValueSourceParser, ValueSource and their descendant classes for a better understanding. Thanks & Regards, Mukund
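[For reference, a skeleton of the plugin route Mukund suggests, against the Solr 4.x API. The class and function name are placeholders, and the returned value source is a stub; real average-score-per-group logic would need its own ValueSource/FunctionValues implementation.]

import org.apache.lucene.queries.function.ValueSource;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Registered in solrconfig.xml as:
//   <valueSourceParser name="mygroupscore" class="com.example.MyGroupScoreParser"/>
// and then usable in function queries and sorts, e.g. sort=mygroupscore(myfield) desc
public class MyGroupScoreParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // parse the argument inside the parentheses: a field or another function
    ValueSource arg = fp.parseValueSource();
    return arg; // stub: a real implementation wraps this in a custom ValueSource
  }
}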
Re: Updating shard range in Zookeeper
The ZooKeeper client plug-in for Eclipse is the tool you're looking for; you can edit the clusterstate directly. http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper Another option is to use the bundled zkcli script (distributed with Solr 4.5 and above) and upload a new clusterstate with the new shard range. Good luck
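[A sketch of that round trip with the bundled script, assuming your zkcli version provides the getfile/putfile commands; the zkhost and paths are placeholders. If getfile is missing in your version, the JSON can also be copied from the admin UI's Cloud > Tree view.]

cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd getfile /clusterstate.json clusterstate.json
(edit the shard range in the local clusterstate.json, then:)
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd putfile /clusterstate.json clusterstate.json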
Re: Sudden Solr crash after commit
Running Solr 4.3, sharded collection, Tomcat 7.0.39. Faceting on multivalued fields works perfectly fine; I was describing this log to emphasize the fact that the servlet failed right after a new searcher was opened and the event listener finished running a warming faceting query.