Intermittent issue in solr index update
Hi, I am intermittently facing a "Cannot talk to ZooKeeper" error during Solr index updates. The strange thing is that while this happens there are no errors in the ZooKeeper logs, and all shards show as active in the Solr admin panel. Please find the logs and Solr server configuration below.

Logs:

ERROR (qtp41903949-261266) [c:documents s:shard1 r:core_node4 x:documents] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
	at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1490)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:678)
	at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
	at org.apache.solr.update.processor.AsiteDocumentUpdateReqProcessor.processAdd(AsiteDocumentUpdateReqProcessorFactory.java:125)
	at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:274)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:239)
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:157)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
	at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
	at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:518)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:745)

Solr server configuration:
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 processors)
RAM: 128 GB usable
System type: 64-bit
OS: Windows Server 2012 Standard

Thanks & Regards,
Bhaumik Joshi
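[Editor's note, not from the thread: this error usually means the Solr node's own ZooKeeper session expired, often because of a long GC pause on the Solr side, which would explain why the ZooKeeper logs are clean and the shards still show active. One common mitigation is raising the client session timeout in solr.xml; the sketch below is illustrative only, and the 30-second value is an assumption rather than a tuned recommendation.]

```xml
<!-- solr.xml (sketch): raise the ZooKeeper session timeout so that short
     GC pauses on the Solr node do not expire the session, which is what
     triggers "Cannot talk to ZooKeeper - Updates are disabled". -->
<solrcloud>
  <str name="host">${host:}</str>
  <int name="hostPort">${jetty.port:8983}</int>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>
```

If raising the timeout helps, it is worth also capturing GC logs to confirm pause times, since the timeout only hides the underlying pauses.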
Issue in SolrInputDocument
Hi, I am getting the below error while converting JSON to my object. I am using Gson (gson-2.2.4.jar) to generate JSON from an object and an object from JSON. Gson's fromJson() method throws the error below.

Note: This was working fine with solr-solrj-5.2.0.jar, but it causes this issue when I use solr-solrj-6.1.0.jar. As far as I checked, the SolrInputDocument class changed in solr-solrj-5.5.0.

java.lang.IllegalArgumentException: Can not set org.apache.solr.common.SolrInputDocument field com.test.common.MySolrMessage.body to com.google.gson.internal.LinkedTreeMap
	at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
	at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
	at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81)
	at java.lang.reflect.Field.set(Field.java:764)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:108)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:185)
	at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
	at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:81)
	at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:1)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:106)
	at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:185)
	at com.google.gson.Gson.fromJson(Gson.java:825)
	at com.google.gson.Gson.fromJson(Gson.java:790)
	at com.google.gson.Gson.fromJson(Gson.java:739)
	at com.google.gson.Gson.fromJson(Gson.java:711)

public class MySolrMessage<T> implements IMessage {
    private static final long serialVersionUID = 1L;
    private T body = null;
    private String collection;
    private int action;
    private int errorCode;
    private long msgId;
    // a few parameterized constructors
    // getters and setters for all of the above attributes
}

public interface IMessage extends Serializable {
    public long getMsgId();
    public void setMsgId(long id);
    public Object getBody();
    public void setBody(Object o);
    public void setErrorCode(int ec);
    public int getErrorCode();
}

public class Request {
    LinkedList<IMessage> msgList = new LinkedList<IMessage>();
    public Request() { }
    public Request(LinkedList<IMessage> l) { this.msgList = l; }
    public LinkedList<IMessage> getMsgList() { return this.msgList; }
}

@JsonAutoDetect(JsonMethod.FIELD)
@JsonSerialize(include = JsonSerialize.Inclusion.NON_NULL)
public class Request2 {
    @JsonProperty
    @JsonDeserialize(as = LinkedList.class, contentAs = MySolrMessage.class)
    LinkedList<MySolrMessage<SolrInputDocument>> msgList = new LinkedList<MySolrMessage<SolrInputDocument>>();
    public Request2() { }
    public Request2(LinkedList<MySolrMessage<SolrInputDocument>> l) { this.msgList = l; }
    public LinkedList<MySolrMessage<SolrInputDocument>> getMsgList() { return this.msgList; }
}

public class Test {
    public static void main(String[] args) {
        SolrInputDocument solrDocument = new SolrInputDocument();
        solrDocument.addField("id", "1234");
        solrDocument.addField("name", "test");
        MySolrMessage<SolrInputDocument> asm = new MySolrMessage<SolrInputDocument>(solrDocument, "collection1", 1);
        IMessage message = asm;
        List<IMessage> msgList = new ArrayList<IMessage>();
        msgList.add(message);
        LinkedList<IMessage> ex = new LinkedList<IMessage>();
        ex.addAll(msgList);
        Request request = new Request(ex);
        try {
            Gson gson = (new GsonBuilder()).serializeNulls().create();
            String json = gson.toJson(request);
            Gson gson2 = new Gson();
            Request2 retObj = gson2.fromJson(json, Request2.class); // this throws the above error
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Any idea?

Thanks & Regards,
Bhaumik Joshi
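[Editor's note, not from the thread: the root cause is that Gson has no reflective way to rebuild a SolrInputDocument, so it leaves its generic LinkedTreeMap in the `body` field. The usual workaround is to register a custom deserializer for that type. The sketch below is a hypothetical adapter: it assumes the document was serialized as a flat {"field": value} JSON object, so it must be adapted to whatever shape your serializer actually emits.]

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonDeserializationContext;
import com.google.gson.JsonDeserializer;
import com.google.gson.JsonElement;
import com.google.gson.JsonParseException;
import org.apache.solr.common.SolrInputDocument;

import java.lang.reflect.Type;
import java.util.Map;

// Hypothetical adapter: turns a flat {"field": value, ...} JSON object back
// into a SolrInputDocument instead of Gson's default LinkedTreeMap.
public class SolrInputDocumentAdapter implements JsonDeserializer<SolrInputDocument> {
    @Override
    public SolrInputDocument deserialize(JsonElement json, Type type,
                                         JsonDeserializationContext ctx) throws JsonParseException {
        SolrInputDocument doc = new SolrInputDocument();
        // Assumption: serialized form is a flat map of field name -> value.
        // If your JSON nests SolrInputField objects, unwrap their value here.
        for (Map.Entry<String, JsonElement> e : json.getAsJsonObject().entrySet()) {
            doc.addField(e.getKey(), ctx.deserialize(e.getValue(), Object.class));
        }
        return doc;
    }

    // Convenience factory: a Gson instance with the adapter registered.
    public static Gson gson() {
        return new GsonBuilder()
                .registerTypeAdapter(SolrInputDocument.class, new SolrInputDocumentAdapter())
                .create();
    }
}
```

A matching JsonSerializer would be needed on the writing side so that both directions agree on the flat-map shape.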
Re: Disabling solr scoring
Thanks Hoss, got the point.

Bhaumik Joshi

From: Chris Hostetter
Sent: Friday, July 8, 2016 4:52 PM
To: solr-user
Subject: Re: Disabling solr scoring

: Can you please elaborate? I am passing user defined sort field and order whenever i search.

I think Mikhail just misunderstood your question -- he was giving an example of how to override the default sort (which uses score) with one that would ensure scores are not computed.

: > Is there any way to completely disable scoring in solr cloud as i am
: > always passing sort parameter whenever i search.

In general, you don't have to do anything special. Solr's internal code looks at the sort specified, and the fields requested (via the fl param), to determine if/when scores need to be computed while collecting documents. If scores aren't needed for any reason, that info is passed down to the low-level Lucene document matching/collection code, which optimizes collection so scores aren't computed.

-Hoss
http://www.lucidworks.com/
Re: Disabling solr scoring
Can you please elaborate? I am passing a user-defined sort field and order whenever I search.

Thanks & Regards,
Bhaumik Joshi

From: Mikhail Khludnev
Sent: Friday, July 8, 2016 4:13 AM
To: solr-user
Subject: Re: Disabling solr scoring

What about sort=_docid_ asc ?

On July 8, 2016 at 13:50, "Bhaumik Joshi" <bhaumik.jo...@outlook.com> wrote:
> Hi,
>
> Is there any way to completely disable scoring in solr cloud as i am
> always passing sort parameter whenever i search.
>
> And disabling scoring will improve performance?
>
> Thanks & Regards,
>
> Bhaumik Joshi
Disabling solr scoring
Hi,

Is there any way to completely disable scoring in Solr cloud, given that I am always passing a sort parameter whenever I search?

And will disabling scoring improve performance?

Thanks & Regards,
Bhaumik Joshi
Re: Passing Ids in query takes more time
Thanks Jeff. TermsQueryParser worked for me.

Thanks & Regards,
Bhaumik Joshi

From: Jeff Wartes
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query, for one ID. Or'ing a lookup for 80k ids, though, is basically 80k searches as far as Solr is concerned, so it's not altogether surprising that it takes a while. Your complaint seems to be that the query planner doesn't know in advance that <query> should be run first, and then the id selection applied to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, you might get better results from that: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If your <query> is the real discriminating factor in your search, you could just search for <query> and then apply your ID list as a PostFilter: http://yonik.com/advanced-filter-caching-in-solr/ I guess that'd look something like &fq={!terms f=<id-field> cache=false cost=101}<ids> -- a cost >= 100 should qualify it as a post filter, which only operates on an already-found result set instead of the full index. (Note: I haven't confirmed that the Terms query parser supports post filtering.)

3. I'm not really aware of any storage engine that'll love doing a filter on 80k ids at once, but a key-value store like Cassandra might work out better for that.

4. There is a thing called a JoinQParserPlugin (https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser) that can join to another collection (https://issues.apache.org/jira/browse/SOLR-4905). But I've never used it, and there are some significant restrictions.

On 5/5/16, 2:46 AM, "Bhaumik Joshi" wrote:

>Hi,
>
>I am retrieving ids from collection1 based on some query and passing those ids
>as a query to collection2, and the query to collection2 which contains the ids
>takes much more time than a normal query.
>
>Que. 1 - When passing ids in the query, why does it take more time than a normal
>query, even though we are narrowing the criteria by passing ids?
>
>e.g. query-1: doc_id:(111 222 333 444 ...) AND <query> is slower
>(passing 80k ids takes 7-9 sec) than query-2: <query> only (700-800
>ms). Both return 250 records with the same set of fields.
>
>Que. 2 - Any idea how I can achieve the above (get ids from one collection and
>pass those ids to the other one) efficiently, or any other way to get data
>from one collection based on the response of the other collection?
>
>Thanks & Regards,
>
>Bhaumik Joshi
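[Editor's note, not from the thread: to make the post-filter idea above concrete, here is a hedged sketch of the two request styles. The field name doc_id, the id values, and the cost are illustrative; the terms parser takes a comma-separated list, and cache=false with cost >= 100 asks Solr to apply the filter only to documents already matched by the main query.]

```text
# Slow: 80k ids OR'ed into the main query
q=doc_id:(111 222 333 ...) AND <query>

# Faster: run <query> first, apply the id list as an uncached post filter
q=<query>&fq={!terms f=doc_id cache=false cost=101}111,222,333
```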
Re: Passing IDs in query takes more time
Thanks Erick. TermsQueryParser worked for me.

Thanks & Regards,
Bhaumik Joshi

From: Erick Erickson
Sent: Friday, May 6, 2016 10:00 AM
To: solr-user
Subject: Re: Passing IDs in query takes more time

Well, you're parsing 80K IDs and forming them into a query. Consider what has to happen. Even in the very best case of the <query> clause being evaluated first, for every doc that satisfies that clause the inverted index must be examined 80,000 times to see if that doc matches one of the IDs in your huge clause, for scoring purposes.

You might be better off moving the 80K list to an fq clause like fq={!cache=false}docid:(111 222 333). Additionally, you probably want to use the TermsQueryParser, something like:

fq={!terms f=id cache=false}111,222,333

see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

In any case, though, an 80K clause will slow things down considerably.

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi wrote:
> Hi,
>
> I am retrieving ids from collection1 based on some query and passing those
> ids as a query to collection2, and the query to collection2 which contains the ids
> takes much more time than a normal query.
>
> Que. 1 - When passing ids in the query, why does it take more time than a normal
> query, even though we are narrowing the criteria by passing ids?
>
> e.g. query-1: doc_id:(111 222 333 444 ...) AND <query> is slower
> (takes 7-9 sec) than <query> only (700-800 ms). Please note that in this case I am
> passing 80k ids in <query> and retrieving 250 rows.
>
> Que. 2 - Any idea how I can achieve the above (get ids from one collection and
> pass those ids to the other one) efficiently, or any other way to get data
> from one collection based on the response of the other collection?
>
> Thanks & Regards,
>
> Bhaumik Joshi
Passing Ids in query takes more time
Hi,

I am retrieving ids from collection1 based on some query and passing those ids as a query to collection2, and the query to collection2 which contains the ids takes much more time than a normal query.

Que. 1 - When passing ids in the query, why does it take more time than a normal query, even though we are narrowing the criteria by passing ids?

e.g. query-1: doc_id:(111 222 333 444 ...) AND <query> is slower (passing 80k ids takes 7-9 sec) than query-2: <query> only (700-800 ms). Both return 250 records with the same set of fields.

Que. 2 - Any idea how I can achieve the above (get ids from one collection and pass those ids to the other one) efficiently, or any other way to get data from one collection based on the response of the other collection?

Thanks & Regards,
Bhaumik Joshi
Passing IDs in query takes more time
Hi,

I am retrieving ids from collection1 based on some query and passing those ids as a query to collection2, and the query to collection2 which contains the ids takes much more time than a normal query.

Que. 1 - When passing ids in the query, why does it take more time than a normal query, even though we are narrowing the criteria by passing ids?

e.g. query-1: doc_id:(111 222 333 444 ...) AND <query> is slower (takes 7-9 sec) than <query> only (700-800 ms). Please note that in this case I am passing 80k ids in <query> and retrieving 250 rows.

Que. 2 - Any idea how I can achieve the above (get ids from one collection and pass those ids to the other one) efficiently, or any other way to get data from one collection based on the response of the other collection?

Thanks & Regards,
Bhaumik Joshi
Re: Solr Sharding Strategy
Hi Toke - I tried pausing the indexing fully but got only a slight improvement, so the impact of indexing is not that much.

Shawn - To answer your question: I am sending one document per update request. My test Solr cloud is configured with 2 shards on one machine, and each shard has one replica on another machine. So, to check whether network latency is the bottleneck, I disabled the replicas and ran the test, but didn't get an improvement. Another thing I tried, to balance the load and provide more CPU and memory resources: I configured only 2 shards, each on a separate machine with no replicas, and ran the test, but in that case performance got worse.

Talking about production, we want to have 2 shards in order to make the platform scalable and future-proof. Just to inform you, we have 22 collections in production; 4 are major in terms of volume and complexity and are frequently used for querying and indexing, and the rest are comparatively minor with fewer query and index hits. Below are the production index statistics.

No. of collections: 22 collections having 139 million documents with an index size of 85 GB.
Major collections: 4 collections having 134 million documents with an index size of 77 GB.
Minor collections: 18 collections having 5 million documents with an index size of 8 GB.

So, any idea how to improve query performance with these statistics, along with the Index-heavy (100 index updates per sec) and Query-heavy (100 queries per sec) scenario?

Thanks & Regards,
Bhaumik Joshi

From: Shawn Heisey
Sent: Tuesday, April 12, 2016 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On 4/11/2016 6:31 AM, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> sec) and Query-heavy (100 queries per sec) scenario.
>
> Index stats: 10 million documents and 16 GB index size
>
> Which sharding strategy is best suited in above scenario?
> Please share reference resources which state a detailed comparison of
> single shard over multi shard, if any.
>
> Meanwhile we did some tests with SolrMeter (standalone java tool for
> stress tests with Solr) for single shard and two shards.
>
> Index stats of test solr cloud: 0.7 million documents and 1 GB index size.
>
> As observed in the test, average query time with 2 shards is much higher
> than single shard.

On the same hardware, multiple shards will usually be slower than one shard, especially under a high load. Sharding can give good results with *more* hardware, providing more CPU and memory resources. When the query load is high, there should be only one core (shard replica) per server, and Solr works best when it is running on bare metal, not virtualized.

Handling 100 queries per second will require multiple copies of your index on separate hardware. This is a fairly high query load. There are installations handling much higher loads, of course. Those installations have a LOT of replicas and some way to balance load across them.

For 10 million documents and 16GB of index, I'm not sure that I would shard at all, just make sure that each machine has plenty of memory -- probably somewhere in the neighborhood of 24GB to 32GB. That assumes that Solr is the only thing running on that server, and that if it's virtualized, making sure that the physical server's memory is not oversubscribed.

Regarding your specific numbers: the low queries per second may be caused by one or more of these problems, or perhaps something I haven't thought of:
1) your queries are particularly heavy.
2) updates are interfering by tying up scarce resources.
3) you don't have enough memory in the machine.

How many documents are in each update request that you are sending? In another thread on the list, you have stated that you have a 1 second maxTime on autoSoftCommit. This is *way* too low, and a *major* source of performance issues.
Very few people actually need that level of latency -- a maxTime measured in minutes may be fast enough, and is much friendlier for performance. Thanks, Shawn
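[Editor's note, not from the thread: Shawn's commit advice can be sketched as a solrconfig.xml fragment. The intervals below are illustrative values under the assumption that visibility of new documents within a minute is acceptable, not a tuned recommendation for this cluster.]

```xml
<!-- solrconfig.xml (sketch): commit settings per Shawn's advice -->
<autoCommit>
  <!-- Hard commit for durability: flush the transaction log regularly... -->
  <maxTime>60000</maxTime>
  <!-- ...but do not open a new searcher, so it stays cheap. -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- Soft commit controls visibility; minutes, not 1 second. -->
  <maxTime>60000</maxTime>
</autoSoftCommit>
```

Every soft commit opens a new searcher and invalidates caches, which is why a 1-second interval under constant indexing hurts query performance so badly.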
Re: Soft commit does not affecting query performance
Hi Bill,

Please find the reference below.

http://www.cloudera.com/documentation/enterprise/5-4-x/topics/search_tuning_solr.html

* "Enable soft commits and set the value to the largest value that meets your requirements. The default value of 1000 (1 second) is too aggressive for some environments."

Thanks & Regards,
Bhaumik Joshi

From: billnb...@gmail.com
Sent: Monday, April 11, 2016 7:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Soft commit does not affecting query performance

Why do you think it would?

Bill Bell
Sent from mobile

> On Apr 11, 2016, at 7:48 AM, Bhaumik Joshi wrote:
>
> Hi All,
>
> We are doing query performance tests with different soft commit intervals. In
> the tests with a 1 sec soft commit interval and a 1 min soft commit interval
> we didn't notice any improvement in query timings.
>
> We tested with SolrMeter (standalone java tool for stress tests with Solr)
> for 1 sec soft commit and 1 min soft commit.
>
> Index stats of test solr cloud: 0.7 million documents and 1 GB index size.
>
> The Solr cloud has 2 shards and each shard has one replica.
>
> Please find the detailed test readings below (all timings are in milliseconds):
>
> Soft commit - 1sec
> Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q Time  Total Client time  Avg Client time
> 1            5            100            44340         443         48834              488
> 5            5            101            128914        1276        143239             1418
> 10           5            104            295325        2839        330931             3182
> 25           5            102            675319        6620        793874             7783
>
> Soft commit - 1min
> Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q Time  Total Client time  Avg Client time
> 1            5            100            44292         442         48569              485
> 5            5            105            131389        1251        147174             1401
> 10           5            102            299518        2936        337748             3311
> 25           5            108            742639        6876        865222             8011
>
> As theory suggests, soft commit affects query performance, but in my case it
> doesn't. Can you put some light on this?
> Also suggest if I am missing something here.
>
> Regards,
> Bhaumik Joshi

[Asite]
The Hyperloop Station Design Competition - A 48hr design collaboration, from mid-day, 23rd May 2016.
REGISTER HERE: http://www.buildearthlive.com/hyperloop
Re: Solr Sharding Strategy
OK, I will try pausing the indexing fully and will check the impact. In the performance test, queries are issued sequentially.

Thanks & Regards,
Bhaumik Joshi

From: Toke Eskildsen
Sent: Monday, April 11, 2016 11:13 PM
To: Bhaumik Joshi
Cc: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On Tue, 2016-04-12 at 05:57 +0000, Bhaumik Joshi wrote:
> //Insert Document
> UpdateResponse resp = cloudServer.add(doc, 1000);

Don't insert documents one at a time, if it can be avoided:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

Try pausing the indexing fully when you do your query test, to check how big the impact of indexing is.

When you run your query performance test, are the queries issued sequentially or in parallel?

- Toke Eskildsen, State and University Library, Denmark
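[Editor's note, not from the thread: Toke's batching advice boils down to sending one add() per chunk of documents instead of one add() per document. The helper below is a hypothetical sketch of that chunking; the 500-document batch size and the cloudServer/docs names in the comment are illustrative.]

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {
    // Split a list into fixed-size chunks so documents can be indexed with
    // one client.add(chunk) request per chunk instead of one per document.
    public static <T> List<List<T>> of(List<T> items, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    // Sketch of use with SolrJ (names are placeholders, not compiled here):
    //   for (List<SolrInputDocument> chunk : Batches.of(docs, 500)) {
    //       cloudServer.add(chunk);   // one request per 500 docs
    //   }
    //   cloudServer.commit();         // commit once, not per document
}
```

Batching cuts per-request overhead (HTTP round trips, routing, transaction-log syncs) dramatically compared to single-document adds with commitWithin=1000 on every call.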
Re: Solr Sharding Strategy
Please note that all caches were disabled in the mentioned test.

With 2 shards:
Intended queries and updates = 10 per sec
Actual queries per sec = 3.3
Actual updates per sec = 10
So for 302 queries the avg query time is 2192 ms.

With 1 shard:
Intended queries and updates = 10 per sec
Actual queries per sec = 9.7
Actual updates per sec = 10.3
So for 302 queries the avg query time is 83 ms.

We do a soft commit when we insert/update a document.

//Insert Document
UpdateResponse resp = cloudServer.add(doc, 1000);
if (resp.getStatus() == 0) {
    success = true;
}

//Update Document
UpdateRequest req = new UpdateRequest();
req.setCommitWithin(1000);
req.add(docs);
UpdateResponse resp = req.process(cloudServer);
if (resp.getStatus() == 0) {
    success = true;
}

Here are the commit settings in solrconfig.xml:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>2</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Thanks & Regards,
Bhaumik Joshi

From: Daniel Collins
Sent: Monday, April 11, 2016 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

I'd also ask about your indexing times: what QTime do you see for indexing (in both scenarios), and what commit times are you using (which Toke already asked)?

Not entirely sure how to read your table, but looking at the indexing side of things, with 2 shards there is inherently more work to do, so you would expect indexing latency to increase (we have to index in 1 shard, and then index in the 2nd shard, so logically it's twice the workload). Your table suggests you managed 10 updates per second, but you never managed 25 updates per second with either 1 shard or 2 shards. Though the numbers don't make sense: you managed 13.9 updates per sec on 1 shard, and 21.9 updates per sec on 2 shards. That suggests to me that in the single-shard case, your searches are causing your indexing to throttle; maybe the resourcing favors searches, so the indexing threads aren't getting a look in...
Whereas in the 2-shard case, it seems clear (as Toke said) that search isn't really hitting the index much. Not sure where the bottleneck is, but it's not on the index, which is why your indexing load can get more requests through.

On 11 April 2016 at 15:36, Toke Eskildsen wrote:
> On Mon, 2016-04-11 at 11:23 +, Bhaumik Joshi wrote:
> > We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> > sec) and Query-heavy (100 queries per sec) scenario.
> > Index stats: 10 million documents and 16 GB index size
> > Which sharding strategy is best suited in above scenario?
>
> Sharding reduces query throughput and can improve query latency as well
> as indexing speed. For small indexes, the overhead of sharding is likely
> to worsen query latency. So as always, it depends.
>
> Qualified guess: Don't use multiple shards, but consider using replicas.
>
> > Please share reference resources which states detailed comparison of
> > single shard over multi shard if any.
>
> Sorry, could not find the one I had in mind.
>
> > Meanwhile we did some tests with SolrMeter (Standalone java tool for
> > stress tests with Solr) for single shard and two shards.
> >
> > Index stats of test solr cloud: 0.7 million documents and 1 GB index
> > size.
> >
> > As observed in test average query time with 2 shards is much higher
> > than single shard.
>
> Makes sense: Your shards are so small that the actual time spent on the
> queries is very low. So relatively, the overhead of distributed (aka
> multi-shard) searching is high, negating any search-gain you got by
> sharding. I would not have expected the performance drop-off to be that
> large (factor 20-60) though.
>
> Your query speed is unusually low for an index of your size, which leads
> me to believe that your indexing is slowing everything down. This is
> often due to too frequent commits and/or too many warm up queries.
> > There is a bit about it at > https://wiki.apache.org/solr/SolrPerformanceFactors > > > - Toke Eskildsen, State and University Library, Denmark > > > >
Soft commit does not affecting query performance
Hi All,

We are doing query performance tests with different soft commit intervals. In the tests with a 1 sec soft commit interval and a 1 min soft commit interval we didn't notice any improvement in query timings.

We tested with SolrMeter (standalone java tool for stress tests with Solr) for 1 sec soft commit and 1 min soft commit.

Index stats of test solr cloud: 0.7 million documents and 1 GB index size.
The Solr cloud has 2 shards and each shard has one replica.

Please find the detailed test readings below (all timings are in milliseconds):

Soft commit - 1sec
Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q Time  Total Client time  Avg Client time
1            5            100            44340         443         48834              488
5            5            101            128914        1276        143239             1418
10           5            104            295325        2839        330931             3182
25           5            102            675319        6620        793874             7783

Soft commit - 1min
Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q Time  Total Client time  Avg Client time
1            5            100            44292         442         48569              485
5            5            105            131389        1251        147174             1401
10           5            102            299518        2936        337748             3311
25           5            108            742639        6876        865222             8011

As theory suggests, soft commit affects query performance, but in my case it doesn't. Can you put some light on this? Also suggest if I am missing something here.

Regards,
Bhaumik Joshi
Solr Sharding Strategy
Hi,

We are using Solr 5.2.0 and we have an Index-heavy (100 index updates per sec) and Query-heavy (100 queries per sec) scenario.

Index stats: 10 million documents and 16 GB index size

Which sharding strategy is best suited to the above scenario? Please share reference resources which give a detailed comparison of single shard vs. multi shard, if any.

Meanwhile we did some tests with SolrMeter (standalone java tool for stress tests with Solr) for single shard and two shards.

Index stats of test solr cloud: 0.7 million documents and 1 GB index size.

As observed in the test, average query time with 2 shards is much higher than single shard. Please find the detailed readings below:

2 Shards
Intended q/sec  Actual q/min  Actual q/sec  Intended u/sec  Actual u/min  Actual u/sec  Total Queries  Total Q time (ms)  Avg Q Time (ms)  Avg Q Time (sec)  Total Client time (ms)  Avg Client time (ms)
10              198           3.3           10              600           10            302            662176             2192             2.192             756603                  2505
25              168           2.8           25              1314          21.9          301            2019735            6710             6.71              2370018                 7873

1 Shard
Intended q/sec  Actual q/min  Actual q/sec  Intended u/sec  Actual u/min  Actual u/sec  Total Queries  Total Q time (ms)  Avg Q Time (ms)  Avg Q Time (sec)  Total Client time (ms)  Avg Client time (ms)
10              582           9.7           10              618           10.3          302            25081              83               0.083             55612                   184
25              1026          17.1          25              834           13.9          306            33366              109              0.109             259392                  847

Note: the query returns 250 rows and matches 57880 documents.

Thanks & Regards,

Bhaumik Joshi
Developer
Asite, A4, Shivalik Business Center, B/h. Rajpath Club, Opp. Kens Ville Golf Academy, Bodakdev, Ahmedabad 380054, Gujarat, India.
T: +91 (079) 4021 1900 Ext: 5234 | M: +91 94282 99055 | E: bjo...@asite.com
W: www.asite.com | Twitter: @Asite | Facebook: facebook.com/Asite