Re: Solr performance issues
Thanks all. I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD each). The index size is about 1.5TB, with many updates every 5 minutes and complex queries with faceting, and a response time of 100ms, which is acceptable for us.

Toke Eskildsen,

Is the index updated while you are searching? *No*

Do you do any faceting or other heavy processing as part of a search? *No*

How many hits does a search typically have and how many documents are returned? *The test measured QTime only, with no documents returned and the number of hits varying from 50,000 to 50,000,000.*

How many concurrent searches do you need to support? How fast should the response time be? *Maybe 100 concurrent searches at 100ms with facets.*

Is splitting the shard into two shards on the same node, so that each shard sits on a single EBS volume, better than using LVM?

Thanks

On Mon, Dec 29, 2014 at 2:00 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

> Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
>> We've installed a cluster of one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2 General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we created a 1.5TB logical volume using LVM to fit our index.
>
> Your search speed will be limited by the slowest storage in your group, which would be your 500GB EBS. The General Purpose SSD option means (as far as I can read at http://aws.amazon.com/ebs/details/#piops) a baseline of 3 IOPS/GB = 1500 IOPS for that volume, with bursts of 3000 IOPS. Unfortunately they do not say anything about latency.
>
> For comparison, I checked the system logs from a local test with our 21TB / 7 billion documents index. It used ~27,000 IOPS during the test, with mean search time a bit below 1 second. That was with ~100GB RAM for disk cache, which is about ½% of the index size. The test was with simple term queries (1-3 terms) and some faceting.
>
> Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1 * 1300 IOPS ~= 1400 IOPS. All else being equal (which is never the case), getting 1-3 second response times for a 1.1TB index, when one link in the storage chain is capped at a few thousand IOPS, you are using networked storage and you have little RAM for caching, does not seem unrealistic. If possible, you could try temporarily boosting performance of the EBS to see if raw IO is the bottleneck.
>
>> The response time is between 1 and 3 seconds for simple queries (1 token).
>
> Is the index updated while you are searching? Do you do any faceting or other heavy processing as part of a search? How many hits does a search typically have and how many documents are returned? How many concurrent searches do you need to support? How fast should the response time be?
>
> - Toke Eskildsen
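Toke's back-of-envelope scaling can be written out explicitly. This sketch uses only the numbers quoted in the thread; the linear IOPS-per-terabyte scaling is his stated simplification, not a measured law:

```python
# Back-of-envelope: scale the IOPS/TB observed on the 21TB / 7-billion-doc
# test index down to a 1.1TB shard, and compare against the gp2 EBS baseline.
OBSERVED_IOPS = 27_000
OBSERVED_TB = 21

def estimated_iops(index_tb):
    """Estimate IOPS for an index of the given size, assuming IOPS scale
    linearly with index size (Toke's simplifying assumption)."""
    return index_tb * OBSERVED_IOPS / OBSERVED_TB

shard_iops = estimated_iops(1.1)   # ~1400 IOPS for the 1.1TB shard
gp2_baseline = 3 * 500             # gp2 EBS: 3 IOPS per GB, so a 500GB volume -> 1500
print(round(shard_iops), gp2_baseline)
```

The point of the comparison: the estimated demand of a single shard is already close to the smallest volume's baseline, so one busy volume in the LVM group can cap the whole node.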
Re: Solr performance issues
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
> I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge (30GB RAM and 600GB General Purpose SSD). The index size is about 1.5TB, with many updates every 5 minutes, complex queries and faceting, and a response time of 100ms that is acceptable for us.
>
> Toke Eskildsen,
> Is the index updated while you are searching? *No*
> Do you do any faceting or other heavy processing as part of a search? *No*
> How many hits does a search typically have and how many documents are returned? *The test measured QTime only, with no documents returned and the number of hits varying from 50,000 to 50,000,000.*
> How many concurrent searches do you need to support? How fast should the response time be? *Maybe 100 concurrent searches at 100ms with facets.*
> Is splitting the shard into two shards on the same node, so that each shard sits on a single EBS volume, better than using LVM?

The basic problem is simply that the system has so little memory that it must read large amounts of data from the disk when it does a query. There is not enough RAM to cache the important parts of the index.

RAM is much faster than disk, even SSD. Typical consumer-grade DDR3-1600 memory has a data transfer rate of about 12800 megabytes per second. If it's ECC memory (which I would say is a requirement), the transfer rate is probably a little slower than that: figuring 9 bits for every byte gets us about 11377 MB/s. That's only an estimate, and it could be wrong in either direction, but I'll go ahead and use it.

http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

If your SSD is SATA, the transfer rate will be limited to approximately 600MB/s -- the 6 gigabit per second transfer rate of the newest SATA standard. That makes memory about 18 times as fast as SATA SSD. I saw one PCI Express SSD that claimed a transfer rate of 2900 MB/s; even that is only about one fourth of the estimated speed of DDR3-1600 with ECC.

I don't know what interface technology Amazon uses for their SSD volumes, but I would bet on it being the cheaper option, which would mean SATA. The networking between the EC2 instance and the EBS storage is unknown to me and may be a further bottleneck.

http://ocz.com/enterprise/z-drive-4500/specifications

Bottom line -- you need a lot more memory. Speeding up the disk may *help* ... but it will not replace that simple requirement. With EC2 as the platform, you may need more instances and more shards.

Your 200 million document index that works well with only 90GB of total memory ... that's surprising to me. That means that the important parts of that index *do* fit in memory ... but if the index gets much larger, performance is likely to drop off sharply.

Thanks,
Shawn
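Shawn's bandwidth comparison is simple arithmetic on the figures he quotes. A sketch (all constants are his rough estimates, not measurements):

```python
# Rough bandwidth comparison from the thread.
DDR3_1600 = 12_800          # MB/s, PC3-12800 peak transfer rate
ECC = DDR3_1600 * 8 / 9     # ~11,377 MB/s after 9-bits-per-byte ECC framing
SATA3 = 600                 # MB/s, 6 Gbit/s SATA
PCIE_SSD = 2_900            # MB/s, the PCIe drive mentioned above

print(f"ECC DDR3 ~ {ECC:.0f} MB/s")
print(f"RAM vs SATA SSD: ~{ECC / SATA3:.1f}x")    # ~19x; the thread rounds to ~18x
print(f"RAM vs PCIe SSD: ~{ECC / PCIE_SSD:.1f}x") # roughly one fourth of RAM speed
```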
Re: Solr performance issues
Thanks Shawn. What do you mean by the important parts of the index? And how do I calculate their size?

Thanks,
Mahmoud

Sent from my iPhone

On Dec 29, 2014, at 8:19 PM, Shawn Heisey apa...@elyograg.org wrote:

> The basic problem is simply that the system has so little memory that it must read large amounts of data from the disk when it does a query. There is not enough RAM to cache the important parts of the index.
>
> [...]
>
> Bottom line -- you need a lot more memory. Speeding up the disk may *help* ... but it will not replace that simple requirement. With EC2 as the platform, you may need more instances and more shards.
>
> Thanks,
> Shawn
Re: Solr performance issues
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
> What do you mean by the important parts of the index? And how do I calculate their size?

I have no formal education in what's important when it comes to doing a query, but I can make some educated guesses. Starting with this as a reference:

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene410/package-summary.html#file-names

I would guess that the segment info (*.si) files and the term index (*.tip) files are supremely important to *always* have in memory, and they are fairly small. Next would be the term dictionary (*.tim) files. The term dictionary is pretty big, and is very important for fast queries.

Frequencies, positions, and norms may also be important, depending on exactly what kind of query you have. Frequencies and positions are quite large. Frequencies are critical for relevance ranking (the default sort by score), and positions are important for phrase queries. Position data may also be used by relevance ranking, but I am not familiar enough with it to say for sure.

If you have docValues defined, then the *.dvm and *.dvd files are used for facets and sorting on those specific fields. The *.dvd files can be very big, depending on your schema.

The *.fdx and *.fdt files become important when actually retrieving results after the matching documents have been determined. The stored data is compressed, so additional CPU power is required to uncompress it before it is sent to the client. Stored data may be large or small, depending on your schema. Stored data does not directly affect search speed, but if memory space is limited, every block of stored data that gets retrieved will push some other part of the index out of the OS disk cache, which means it might need to be re-read from disk on the next query.

Thanks,
Shawn
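To answer "how to calculate their size" concretely: one rough approach is to sum the index files by extension on disk. A minimal sketch, assuming a typical Solr data directory layout (the path below is hypothetical):

```python
import os
from collections import defaultdict

def index_sizes_by_extension(index_dir):
    """Sum Lucene segment file sizes in index_dir, grouped by extension."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1] or name  # '.tim', '.tip', 'segments_1', ...
            totals[ext] += os.path.getsize(path)
    return dict(totals)

# Extensions Shawn singles out as most important to keep in the disk cache.
HOT = {'.si', '.tip', '.tim'}

INDEX_DIR = '/var/solr/data/coreName/data/index'  # hypothetical; use your core's path
if os.path.isdir(INDEX_DIR):
    sizes = index_sizes_by_extension(INDEX_DIR)
    hot = sum(size for ext, size in sizes.items() if ext in HOT)
    print(f"must-cache files: {hot / 2**30:.1f} GiB of {sum(sizes.values()) / 2**30:.1f} GiB total")
```

Comparing the hot total against the RAM left over for the OS disk cache gives a first approximation of whether the "important parts" can stay resident.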
RE: Solr performance issues
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge (30GB RAM and 600GB General Purpose SSD). The index size is about 1.5TB, with many updates every 5 minutes, complex queries and faceting, and a response time of 100ms that is acceptable for us.

So you have

Setup 1: 3 * (30GB RAM + 600GB SSD) for a total of 1.5TB index, 200M docs. Acceptable performance.
Setup 2: 3 * (60GB RAM + 1TB SSD + 500GB SSD) for a total of 3.3TB index, 350M docs. Poor performance.

The only real difference, besides doubling everything, is the LVM? I understand why you find that to be the culprit, but from what I can read, the overhead should not be anywhere near enough to cause the performance drop you are describing. Could it be that some snapshotting or backup was running when you tested?

Splitting your shards and doubling the number of machines, as you suggest, would result in

Setup 3: 6 * (60GB RAM + 600GB SSD) for a total of 3.3TB index, 350M docs.

which would be remarkably similar to your setup 1. I think that would be the next logical step, unless you can easily do a temporary boost of your IOPS.

BTW: You are getting dangerously close to your storage limits here - it seems that a single large merge could make you run out of space.

- Toke Eskildsen
Re: Solr performance issues
On 12/26/2014 7:17 AM, Mahmoud Almokadem wrote:
> We've installed a cluster of one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2 General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we created a 1.5TB logical volume using LVM to fit our index.
>
> The response time is between 1 and 3 seconds for simple queries (1 token). Has LVM become a bottleneck for our index?

SSD is very fast, but its speed is very slow when compared to RAM. The problem here is that Solr must read data off the disk in order to do a query, and even at SSD speeds, that is slow. LVM is not the problem here, though it's possible that it may be a contributing factor. You need more RAM.

For Solr to be fast, a large percentage (ideally 100%, but smaller fractions can often be enough) of the index must be loaded into unused RAM by the operating system. Your information seems to indicate that the index is about 3 terabytes. If that's the index size, I would guess that you would need somewhere between 1 and 2 terabytes of total RAM for speed to be acceptable. Because RAM is *very* expensive on Amazon and is not available in sizes like 256GB-1TB, that typically means a lot of their virtual machines, with a lot of shards in SolrCloud. You may find that real hardware is less expensive in the long term than cloud hardware for very large Solr indexes.

Thanks,
Shawn
RE: Solr performance issues
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> We've installed a cluster of one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2 General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we created a 1.5TB logical volume using LVM to fit our index.

Your search speed will be limited by the slowest storage in your group, which would be your 500GB EBS. The General Purpose SSD option means (as far as I can read at http://aws.amazon.com/ebs/details/#piops) a baseline of 3 IOPS/GB = 1500 IOPS for that volume, with bursts of 3000 IOPS. Unfortunately they do not say anything about latency.

For comparison, I checked the system logs from a local test with our 21TB / 7 billion documents index. It used ~27,000 IOPS during the test, with mean search time a bit below 1 second. That was with ~100GB RAM for disk cache, which is about ½% of the index size. The test was with simple term queries (1-3 terms) and some faceting.

Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1 * 1300 IOPS ~= 1400 IOPS. All else being equal (which is never the case), getting 1-3 second response times for a 1.1TB index, when one link in the storage chain is capped at a few thousand IOPS, you are using networked storage and you have little RAM for caching, does not seem unrealistic. If possible, you could try temporarily boosting performance of the EBS to see if raw IO is the bottleneck.

> The response time is between 1 and 3 seconds for simple queries (1 token).

Is the index updated while you are searching? Do you do any faceting or other heavy processing as part of a search? How many hits does a search typically have and how many documents are returned? How many concurrent searches do you need to support? How fast should the response time be?

- Toke Eskildsen
Solr performance issues
Dears,

We've installed a cluster of one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2 General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we created a 1.5TB logical volume using LVM to fit our index.

The response time is between 1 and 3 seconds for simple queries (1 token). Has LVM become a bottleneck for our index?

Thanks for help.
Re: Solr performance issues
Likely lots of disk + network IO, yes. Put SPM for Solr on your nodes to double check.

Otis

On Dec 26, 2014, at 09:17, Mahmoud Almokadem prog.mahm...@gmail.com wrote:

> Dears,
>
> We've installed a cluster of one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2 General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we created a 1.5TB logical volume using LVM to fit our index.
>
> The response time is between 1 and 3 seconds for simple queries (1 token). Has LVM become a bottleneck for our index?
>
> Thanks for help.
Solr performance issues for simple query - q=*:* with start and rows
We have a Solr core with about 115 million documents. We are trying to migrate data by running a simple *:* query with start and rows params. The performance is becoming too slow in Solr: it's taking almost 2 minutes to get 4000 rows, and migration is just too slow. Log snippet below:

INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select params={start=55450000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select params={start=55470000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select params={start=55490000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037

I've taken some thread dumps on the Solr server, and most of the time the threads seem to be busy in the stacks below. Is there anything that can be done to improve the performance? Is this a known issue? It's very surprising that fetching just a few thousand rows starting at certain offsets takes on the order of minutes.

"395883378@qtp-162198005-7" prio=10 tid=0x7f4aa0636000 nid=0x295a runnable [0x7f42865dd000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
        at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
        at org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
        at org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)

"1154127582@qtp-162198005-3" prio=10 tid=0x7f4aa0613800 nid=0x2956 runnable [0x7f42869e1000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
        at org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:210)
        at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:62)
        at org.apache.lucene.search.Scorer.score(Scorer.java:64)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1491)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
        at
Re: Solr performance issues for simple query - q=*:* with start and rows
Hi,

How many shards do you have? This is a known issue with deep paging with multiple shards, see https://issues.apache.org/jira/browse/SOLR-1726. You may be more successful in going to each shard, one at a time (with distrib=false), to avoid this issue.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:

> We have a Solr core with about 115 million documents. We are trying to migrate data by running a simple *:* query with start and rows params. The performance is becoming too slow in Solr: it's taking almost 2 minutes to get 4000 rows, and migration is just too slow.
>
> [...]
>
> I've taken some thread dumps on the Solr server, and most of the time the threads seem to be busy in the stacks below. Is there anything that can be done to improve the performance? Is this a known issue? It's very surprising that fetching just a few thousand rows starting at certain offsets takes on the order of minutes.
>
> [log snippet and thread dumps quoted in the original message above]
Re: Solr performance issues for simple query - q=*:* with start and rows
Jan,

Would the same distrib=false help for distributed faceting? We are running into a similar issue with facet paging.

Dmitry

On Mon, Apr 29, 2013 at 11:58 AM, Jan Høydahl jan@cominvent.com wrote:

> Hi,
>
> How many shards do you have? This is a known issue with deep paging with multiple shards, see https://issues.apache.org/jira/browse/SOLR-1726. You may be more successful in going to each shard, one at a time (with distrib=false), to avoid this issue.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:
>
>> We have a Solr core with about 115 million documents. We are trying to migrate data by running a simple *:* query with start and rows params. The performance is becoming too slow in Solr: it's taking almost 2 minutes to get 4000 rows, and migration is just too slow.
>>
>> [log snippet and thread dumps quoted in the original message above]
Re: Solr performance issues for simple query - q=*:* with start and rows
We have a single shard, and all the data is in a single box only. Definitely looks like deep-paging is having problems. Just to understand, is the searcher looping over the result set everytime and skipping the first start count? This will definitely take a toll when we reach higher start values. On 4/29/13 2:28 PM, Jan Høydahl wrote: Hi, How many shards do you have? This is a known issue with deep paging with multi shard, see https://issues.apache.org/jira/browse/SOLR-1726 You may be more successful in going to each shard, one at a time (with distrib=false) to avoid this issue. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com: We have a solr core with about 115 million documents. We are trying to migrate data and running a simple query with *:* query and with start and rows param. The performance is becoming too slow in solr, its taking almost 2 mins to get 4000 rows and migration is being just too slow. 
Logs snippet below: INFO: [coreName] webapp=/solr path=/select params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=168308 INFO: [coreName] webapp=/solr path=/select params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=122771 INFO: [coreName] webapp=/solr path=/select params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=137615 INFO: [coreName] webapp=/solr path=/select params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=141223 INFO: [coreName] webapp=/solr path=/select params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=97474 INFO: [coreName] webapp=/solr path=/select params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=98115 INFO: [coreName] webapp=/solr path=/select params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=143822 INFO: [coreName] webapp=/solr path=/select params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=118066 INFO: [coreName] webapp=/solr path=/select params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=121498 INFO: [coreName] webapp=/solr path=/select params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=164062 INFO: [coreName] webapp=/solr path=/select params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=165518 INFO: [coreName] webapp=/solr path=/select params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=118163 INFO: [coreName] webapp=/solr path=/select params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=141642 INFO: [coreName] webapp=/solr path=/select params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479 status=0 QTime=145037 I've taken some thread dumps in the solr server and most of the time the 
threads seem to be busy mostly in the following stacks. Is there anything that can be done to improve the performance? Is it a known issue? It's very surprising that querying for just a few rows starting at certain offsets takes on the order of minutes.
395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
at org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
at org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 runnable [0x7f42869e1000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252) at
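The stacks above show why deep paging is so expensive: the collector has to gather the best (start + rows) hits into a priority queue before it can throw the first start of them away, so the cost grows with the offset even when only 4000 rows are returned. A minimal Python sketch of that cost model (an illustration, not Solr code):

```python
import heapq

def deep_page(scores, start, rows):
    """Simulate what a TopDocsCollector-style search does for start+rows:
    keep the best (start + rows) hits, then discard the first `start`.
    The queue size, and hence the work, grows with the page offset."""
    top_n = start + rows                  # queue size depends on the offset
    best = heapq.nlargest(top_n, scores)  # collect the top (start + rows) hits
    return best[start:start + rows]       # everything before `start` is wasted work

# Toy illustration: fetching 2 rows at offset 5 still examines the top 7 hits.
hits = [0.1, 0.9, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4]
page = deep_page(hits, start=5, rows=2)   # [0.4, 0.3]
```

At start=55,000,000 the queue holds 55 million entries per request, which matches the minutes-long QTimes in the log.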
Re: Solr performance issues for simple query - q=*:* with start and rows
Abhishek, There is a wiki page about this: http://wiki.apache.org/solr/CommonQueryParameters (search for pageDoc and pageScore). On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam abhi.sanou...@gmail.com wrote: [snip]
Re: Solr performance issues for simple query - q=*:* with start and rows
We've found that you can do a lot for yourself by using a filter query over a natural range in your data to page through it, instead of using start and rows. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com wrote: [snip]
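The range-based paging idea can be sketched as follows. This is a hypothetical helper, not from the thread: the "id" field name is an assumption about a suitable sortable field in the schema, and wiring the params into an actual Solr client is left out.

```python
def range_page_params(field, last_seen, page_size):
    """Build Solr query params that page via a filter query on a naturally
    ordered field instead of start/rows, so Solr never has to skip over
    millions of prior hits. `{x TO *]` is an exclusive lower bound."""
    if last_seen is None:
        fq = f"{field}:[* TO *]"          # first page: no lower bound
    else:
        fq = f"{field}:{{{last_seen} TO *]"  # strictly after the last doc seen
    return {
        "q": "*:*",
        "fq": fq,
        "sort": f"{field} asc",   # stable order so pages never overlap
        "rows": page_size,
        "start": 0,               # offset stays 0 on every page
    }

params = range_page_params("id", 100, 4000)
```

Each request then reads the paging-field value of the last returned document and passes it as last_seen for the next page, keeping every request cheap regardless of how deep the migration has progressed.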
Re: Solr performance issues for simple query - q=*:* with start and rows
I guess so; you'd have to use a filter query to page through the set of documents you were faceting against and sum them all at the end. It's not quite the same operation as paging through results, because facets are aggregate statistics, but if you're willing to go to the trouble, I bet it would also help performance. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan solrexp...@gmail.com wrote: Michael, Interesting! Do (Can) you apply this to facet searches as well? Dmitry On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: [snip]
Re: Solr performance issues for simple query - q=*:* with start and rows
Thanks. The only question is how to smoothly transition to this model. Our facet (string) fields contain timestamp prefixes that are reverse-ordered, starting from the freshest value. In theory, we could try computing the filter queries for those, but before doing so we would need the matched ids from Solr, so it becomes at least a two-pass algorithm? Our biggest concern with paging in general is that the system seems to pass far more data back and forth than is needed for computing the values. On Mon, Apr 29, 2013 at 4:14 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: [snip]
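The "sum them all at the end" step is straightforward once each filter-query slice returns its own facet counts: totals are just the element-wise sum of per-slice value-to-count maps. A toy sketch with made-up slice data:

```python
from collections import Counter

def merge_facet_counts(per_slice_counts):
    """Merge facet results (value -> count) from several filter-query
    slices into one total, as if the whole set had been faceted at once."""
    total = Counter()
    for counts in per_slice_counts:
        total.update(counts)  # adds counts for shared keys, inserts new ones
    return dict(total)

# Hypothetical facet counts from two range slices of the index.
slices = [{"2013-04": 10, "2013-03": 4}, {"2013-04": 2, "2013-02": 7}]
merged = merge_facet_counts(slices)   # {'2013-04': 12, '2013-03': 4, '2013-02': 7}
```

This works because facet counts are additive across disjoint document slices; it would not apply to non-additive statistics.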
Re: Occasional Solr performance issues
On Mon, Oct 29, 2012 at 7:04 AM, Shawn Heisey s...@elyograg.org wrote: They are indeed Java options. The first two control the maximum and starting heap sizes. NewRatio controls the relative size of the young and old generations, making the young generation considerably larger than it is by default. The others are garbage collector options. This seems to be a good summary: http://www.petefreitag.com/articles/gctuning/ Here's the official Sun (Oracle) documentation on GC tuning: http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html Thank you Shawn! Those are exactly the documents that I need. Google should hire you to fill in the pages when someone searches for Java garbage collection. Interestingly, I just checked, and bing.com does list the Oracle page on the first page of results. I shudder to think that I might have to switch search engines! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote: Warming doesn't seem to be a problem here -- all your warm times are zero, so I am going to take a guess that it may be a heap/GC issue. I would recommend starting with the following additional arguments to your JVM. Since I have no idea how solr gets started on your server, I don't know where you would add these: -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled Thanks. I've added those flags to the command line that I use to start Solr. Those are Java flags, not Solr, correct? I'm googling the flags now, but I find it interesting that I cannot find a canonical reference for them. This allocates 4GB of RAM to Java, sets up a larger than normal Eden space in the heap, and uses garbage collection options that usually fare better in a server environment than the default. Java memory management options are like religion to some people ... I may start a flamewar with these recommendations. ;) The best I can tell you about these choices: They made a big difference for me. Thanks. I will experiment with them empirically. The first step is to learn to read the debug info, though. I've been googling for days, but I must be missing something. Where is the information that I pasted in pastebin documented? I would also recommend switching to a Sun/Oracle JVM. I have heard that previous versions of Solr were not happy on variants like OpenJDK; I have no idea whether that might still be the case with 4.0. If you choose to do this, you probably have package choices in Ubuntu. I know that in Debian, the package is called sun-java6-jre ... Ubuntu is probably something similar. Debian has a CLI command 'update-java-alternatives' that will quickly switch between different java implementations that are installed. Hopefully Ubuntu also has this. 
If not, you might need the following command instead to switch the main java executable: update-alternatives --config java Thanks, I will take a look at the current Oracle JVM. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
On 10/28/2012 2:28 PM, Dotan Cohen wrote: On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote: Warming doesn't seem to be a problem here -- all your warm times are zero, so I am going to take a guess that it may be a heap/GC issue. I would recommend starting with the following additional arguments to your JVM. Since I have no idea how solr gets started on your server, I don't know where you would add these: -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled Thanks. I've added those flags to the Solr line that I use to start Solr. Those are Java flags, not Solr, correct? I'm googling the flags now, but I find it interesting that I cannot find a canonical reference for them. They are indeed Java options. The first two control the maximum and starting heap sizes. NewRatio controls the relative size of the young and old generations, making the young generation considerably larger than it is by default. The others are garbage collector options. This seems to be a good summary: http://www.petefreitag.com/articles/gctuning/ Here's the official Sun (Oracle) documentation on GC tuning: http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html Thanks, Shawn
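The split implied by NewRatio is simple ratio arithmetic: old/young = NewRatio, so the young generation gets heap / (NewRatio + 1). A quick back-of-the-envelope calculation (exact sizes are JVM-dependent; this is only the ratio math):

```python
def generation_sizes_mb(heap_mb, new_ratio):
    """Approximate young/old generation split for -XX:NewRatio:
    old/young = new_ratio, so young = heap / (new_ratio + 1)."""
    young = heap_mb / (new_ratio + 1)
    return young, heap_mb - young

# With -Xmx4096M -XX:NewRatio=1, roughly half the heap is young generation,
# versus about a quarter at the old server default of NewRatio=3.
young, old = generation_sizes_mb(4096, 1)   # (2048.0, 2048.0)
```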
Re: Occasional Solr performance issues
On Wed, Oct 24, 2012 at 4:33 PM, Walter Underwood wun...@wunderwood.org wrote: Please consider never running optimize. That should be called force merge. Thanks. I have been letting the system run for about two days already without an optimize. I will let it run a week, then merge to see the effect. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
I spoke too soon! Whereas three days ago, when the index was new, 500 records could be written to it in 3 seconds, that operation is now taking a minute and a half, sometimes longer. I ran optimize() but that did not help the writes. What can I do to improve the write performance? Even opening the Logging tab of the Solr instance is taking quite a long time. In fact, I just left it for 20 minutes and it still hasn't come back with anything. I do have an SSH window open on the server hosting Solr and it doesn't look overloaded at all:
$ date; du -sh data/; uptime; free -m
Fri Oct 26 13:15:59 UTC 2012
578M    data/
13:15:59 up 4 days, 17:59, 1 user, load average: 0.06, 0.12, 0.22
             total       used       free     shared    buffers
Mem:         14980        3237      11743          0        284
-/+ buffers/cache:         729      14250
Swap:            0           0          0
-- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
On 10/26/2012 7:16 AM, Dotan Cohen wrote: [snip] Taking all the information I've seen so far, my bet is on either cache warming or heap/GC trouble as the source of your problem. It's now specific information-gathering time. Can you gather all the following information and put it into a web paste page, such as pastie.org, and reply with the link? I have gathered the same information from my test server and created a pastie example. http://pastie.org/5118979 On the dashboard of the GUI, it lists all the jvm arguments. Include those. Click Java Properties and gather the java.runtime.version and java.specification.vendor information. After one of the long update times, pause/stop your indexing application. Click on your core in the GUI, open Plugins/Stats, and paste the following bits with a header to indicate what each section is: CACHE-filterCache CACHE-queryResultCache CORE-searcher Thanks, Shawn
Re: Occasional Solr performance issues
On Fri, Oct 26, 2012 at 4:02 PM, Shawn Heisey s...@elyograg.org wrote: [snip] Thank you Shawn. The information is here: http://pastebin.com/aqEfeYVA -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
On 10/26/2012 9:41 AM, Dotan Cohen wrote: [snip] Warming doesn't seem to be a problem here -- all your warm times are zero, so I am going to take a guess that it may be a heap/GC issue. I would recommend starting with the following additional arguments to your JVM. Since I have no idea how solr gets started on your server, I don't know where you would add these: -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled This allocates 4GB of RAM to Java, sets up a larger than normal Eden space in the heap, and uses garbage collection options that usually fare better in a server environment than the default. Java memory management options are like religion to some people ... I may start a flamewar with these recommendations. ;) The best I can tell you about these choices: They made a big difference for me. I would also recommend switching to a Sun/Oracle JVM. I have heard that previous versions of Solr were not happy on variants like OpenJDK; I have no idea whether that might still be the case with 4.0. If you choose to do this, you probably have package choices in Ubuntu. I know that in Debian, the package is called sun-java6-jre ... Ubuntu is probably something similar. Debian has a CLI command 'update-java-alternatives' that will quickly switch between different java implementations that are installed. Hopefully Ubuntu also has this. 
If not, you might need the following command instead to switch the main java executable: update-alternatives --config java Thanks, Shawn
Re: Occasional Solr performance issues
On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com wrote: [snip] Thanks, Erick. I'll play around with different configurations. So far, just removing the periodic optimize command has worked wonders. I'll see how much it helps or hurts to run it daily, or more or less frequently. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
Please consider never running optimize. That should be called force merge.

wunder

On Oct 24, 2012, at 3:28 AM, Dotan Cohen wrote:
> Thanks, Erick. I'll play around with different configurations. So far just removing the periodic optimize command worked wonders. I'll see how much it helps or hurts to run that daily, or more or less frequently. [...]
Re: Occasional Solr performance issues
Maybe you've been looking at it, but one thing that I didn't see on a fast scan was that maybe the commit bit is the problem. When you commit, eventually the segments will be merged and a new searcher will be opened (this is true even if you're NOT optimizing). So you're effectively committing every 1-2 seconds, creating many segments which get merged, but more importantly opening new searchers (which you are getting, since you pasted the message: Overlapping onDeckSearchers=2).

You could pinpoint this by NOT committing explicitly; just set your autocommit parameters (or specify commitWithin in your indexing program, which is preferred). Try setting it at a minute or so and see if your problem goes away. The NRT stuff happens on soft commits, so you have that option to have the documents immediately available for search.

Best
Erick

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
> I've got a script writing ~50 documents to Solr at a time, then committing. Each of these documents is no longer than 1 KiB of text, some much less. Usually the write-and-commit will take 1-2 seconds or less, but sometimes it can go over 60 seconds. During a recent time of over-60-second write-and-commits, I saw that the server did not look overloaded:
>
> $ uptime
>  14:36:46 up 19:20, 1 user, load average: 1.08, 1.16, 1.16
> $ free -m
>              total       used       free     shared    buffers     cached
> Mem:         14980       2091      12889          0        233       1243
> -/+ buffers/cache:        613      14366
> Swap:            0          0          0
>
> Other than Solr, nothing is running on this machine other than stock Ubuntu Server services (no Apache, no MySQL). The machine is running on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS volume. What might make some queries take so long, while others perform fine? Thanks.
>
> -- 
> Dotan Cohen
> http://gibberish.co.il
> http://what-is-what.com
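In solrconfig.xml, Erick's suggestion could look something like the fragment below; a sketch only -- the one-minute hard commit follows his suggestion, while the soft-commit interval is an illustrative assumption for near-real-time visibility, not a tuned value:

```xml
<!-- Sketch: server-side autocommit in place of explicit client commits. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit about once a minute -->
    <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>            <!-- soft commit makes docs searchable quickly -->
  </autoSoftCommit>
</updateHandler>
```

With this in place the indexing script would stop calling commit entirely (or pass commitWithin on the add request instead).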
Re: Occasional Solr performance issues
When Solr is slow, I'm seeing these in the logs:

[collection1] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
[collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Googling, I found this in the FAQ:

"Typically the way to avoid this error is to either reduce the frequency of commits, or reduce the amount of warming a searcher does while it's on deck (by reducing the work in newSearcher listeners, and/or reducing the autowarmCount on your caches)"
http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

I happen to know that the script will try to commit once every 60 seconds. How does one reduce the work in newSearcher listeners? What effect will this have? What effect will reducing the autowarmCount on caches have?

Thanks.

-- 
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Occasional Solr performance issues
Hello!

You can check if the long warming is causing the overlapping searchers. Check the Solr admin panel and look at the cache statistics; there should be a warmupTime property. Lowering the autowarmCount should lower the time needed to warm up. However, you can also look at your warming queries (if you have such) and see how long they take.

-- 
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> When Solr is slow, I'm seeing these in the logs:
> [collection1] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
> [collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> [...] How does one reduce the work in newSearcher listeners? What effect will this have? What effect will reducing the autowarmCount on caches have? Thanks.
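The autowarmCount knob Rafał mentions lives on the cache definitions in solrconfig.xml; a sketch with illustrative sizes (the numbers are assumptions, not tuned recommendations):

```xml
<!-- Sketch: autowarmCount controls how many entries are replayed into
     the new searcher's cache when it warms; sizes here are illustrative. -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache"
             size="512" initialSize="512" autowarmCount="16"/>
<!-- documentCache entries are keyed by internal doc ids and are not autowarmed -->
<documentCache class="solr.LRUCache"
             size="512" initialSize="512" autowarmCount="0"/>
```

Lowering the autowarmCount values shortens warmup at the cost of a colder cache after each commit.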
Re: Occasional Solr performance issues
Are you using Solr 3X? The occasional long commit should no longer show up in Solr 4.

- Mark

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
> I've got a script writing ~50 documents to Solr at a time, then committing. Each of these documents is no longer than 1 KiB of text, some much less. Usually the write-and-commit will take 1-2 seconds or less, but sometimes it can go over 60 seconds. [...] What might make some queries take so long, while others perform fine? Thanks.
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 5:02 PM, Rafał Kuć r@solr.pl wrote:
> You can check if the long warming is causing the overlapping searchers. Check the Solr admin panel and look at the cache statistics; there should be a warmupTime property.

Thank you, I have gone over the Solr admin panel twice and I cannot find the cache statistics. Where are they?

> Lowering the autowarmCount should lower the time needed to warm up. However, you can also look at your warming queries (if you have such) and see how long they take.

Thank you, I will look at that!

-- 
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 5:27 PM, Mark Miller markrmil...@gmail.com wrote: Are you using Solr 3X? The occasional long commit should no longer show up in Solr 4. Thank you Mark. In fact, this is the production release of Solr 4. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
On 10/22/2012 9:58 AM, Dotan Cohen wrote:
> Thank you, I have gone over the Solr admin panel twice and I cannot find the cache statistics. Where are they?

If you are running Solr4, you can see individual cache autowarming times here, assuming your core is named collection1:

http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

The warmup time for the entire searcher can be found here:

http://server:port/solr/#/collection1/plugins/core?entry=searcher

If you are on an older Solr release, everything is in various sections of the stats page. Do a page search for "warmup" multiple times to see them all:

http://server:port/solr/corename/admin/stats.jsp

Thanks,
Shawn
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
> If you are running Solr4, you can see individual cache autowarming times here, assuming your core is named collection1:
> http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
> http://server:port/solr/#/collection1/plugins/cache?entry=filterCache
> The warmup time for the entire searcher can be found here:
> http://server:port/solr/#/collection1/plugins/core?entry=searcher

Thank you Shawn! I can see how I missed that data. I'm reviewing it now. Solr has a low barrier to entry, but quite a learning curve. I'm loving it!

I see that the server is using less than 2 GiB of memory, whereas it is a dedicated Solr server with 16 GiB of memory. I understand that I can increase the query and document caches to increase performance, but I worry that this will increase the warm-up time to unacceptable levels. What is a good strategy for increasing the caches yet preserving performance after an optimize operation?

Thanks.

-- 
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Occasional Solr performance issues
Perhaps you can grab a snapshot of the stack traces when the 60 second delay is occurring? You can get the stack traces right in the admin ui, or you can use another tool (jconsole, visualvm, jstack cmd line, etc).

- Mark

On Mon, Oct 22, 2012 at 1:47 PM, Dotan Cohen dotanco...@gmail.com wrote:
> I see that the server is using less than 2 GiB of memory, whereas it is a dedicated Solr server with 16 GiB of memory. I understand that I can increase the query and document caches to increase performance, but I worry that this will increase the warm-up time to unacceptable levels. What is a good strategy for increasing the caches yet preserving performance after an optimize operation? [...]
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
> Perhaps you can grab a snapshot of the stack traces when the 60 second delay is occurring? You can get the stack traces right in the admin ui, or you can use another tool (jconsole, visualvm, jstack cmd line, etc)

Thanks. I've refactored so that the index is optimized once per hour, instead of after each dump of commits. But when I need to increase the optimize frequency in the future I will go through the stack traces. Thanks!

In any case, the server has an extra 14 GiB of memory available. How might I make the best use of that for Solr, assuming both heavy reads and writes?

Thanks.

-- 
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Occasional Solr performance issues
First, stop optimizing. You do not need to manually force merges. The system does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and might be the cause of your problem.

Second, the OS will use the extra memory for file buffers, which really helps performance, so you might not need to do anything. This will work better after you stop forcing merges. A forced merge replaces every file, so the OS needs to reload everything into file buffers.

wunder

On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:
> Thanks. I've refactored so that the index is optimized once per hour, instead of after each dump of commits. [...] In any case, the server has an extra 14 GiB of memory available. How might I make the best use of that for Solr, assuming both heavy reads and writes?
Re: Occasional Solr performance issues
Has the Solr team considered renaming the optimize function to avoid leading people down the path of this antipattern?

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org wrote:
> First, stop optimizing. You do not need to manually force merges. The system does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and might be the cause of your problem. [...]
Re: Occasional Solr performance issues
Lucene already did that: https://issues.apache.org/jira/browse/LUCENE-3454

Here is the Solr issue: https://issues.apache.org/jira/browse/SOLR-3141

People over-use this regardless of the name. In Ultraseek Server, it was called force merge and we had to tell people to stop doing that nearly every month.

wunder

On Oct 22, 2012, at 1:39 PM, Michael Della Bitta wrote:
> Has the Solr team considered renaming the optimize function to avoid leading people down the path of this antipattern? [...]

-- 
Walter Underwood
wun...@wunderwood.org
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 4:39 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:
> Has the Solr team considered renaming the optimize function to avoid leading people down the path of this antipattern?

If it were never the right thing to do, it could simply be removed. The problem is that it's sometimes the right thing to do -- but it depends heavily on the use cases and trade-offs. The best thing is to simply document what it does and the cost of doing it.

-Yonik
http://lucidworks.com
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood wun...@wunderwood.org wrote:
> First, stop optimizing. You do not need to manually force merges. The system does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and might be the cause of your problem.

Thanks. Looking at the index statistics, I see that within minutes after running optimize, the stats say the index needs to be reoptimized. Though the index still reads and writes fine even in that state.

> Second, the OS will use the extra memory for file buffers, which really helps performance, so you might not need to do anything. This will work better after you stop forcing merges. A forced merge replaces every file, so the OS needs to reload everything into file buffers.

I don't see that the memory is being used:

$ free -g
             total       used       free     shared    buffers     cached
Mem:            14          2         12          0          0          1
-/+ buffers/cache:          0         14
Swap:            0          0          0

-- 
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Occasional Solr performance issues
On Mon, Oct 22, 2012 at 10:44 PM, Walter Underwood wun...@wunderwood.org wrote: Lucene already did that: https://issues.apache.org/jira/browse/LUCENE-3454 Here is the Solr issue: https://issues.apache.org/jira/browse/SOLR-3141 People over-use this regardless of the name. In Ultraseek Server, it was called force merge and we had to tell people to stop doing that nearly every month. Thank you for those links. I commented on the Solr bug. There are some very insightful comments in there. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Occasional Solr performance issues
On 10/22/2012 3:11 PM, Dotan Cohen wrote:
> Thanks. Looking at the index statistics, I see that within minutes after running optimize, the stats say the index needs to be reoptimized. Though the index still reads and writes fine even in that state.

As soon as you make any change at all to an index, it's no longer optimized. Delete one document, add one document, anything. Most of the time you will not see a performance increase from optimizing an index that consists of one large segment and a bunch of very tiny segments or deleted documents.

> I don't see that the memory is being used:
>
> $ free -g
>              total       used       free     shared    buffers     cached
> Mem:            14          2         12          0          0          1
> -/+ buffers/cache:          0         14
> Swap:            0          0          0

How big is your index, and did you run this right after a reboot? If you did, then the cache will be fairly empty, and Solr has only read enough from the index files to open the searcher. The number is probably too small to show up on a gigabyte scale. As you issue queries, the cached amount will get bigger.

If your index is small enough to fit in the 14GB of free RAM that you have, you can manually populate the disk cache by going to your index directory and doing 'cat * > /dev/null' from the commandline or a script. The first time you do it, it may go slowly, but if you immediately do it again, it will complete VERY fast -- the data will all be in RAM.

The 'free -m' command in your first email shows cache usage of 1243MB, which suggests that maybe your index is considerably smaller than your available RAM. Having loads of free RAM is a good thing for just about any workload, but especially for Solr. Try running the free command without the -g so you can see those numbers in kilobytes.

I have seen a tendency towards creating huge caches in Solr because people have lots of memory. It's important to realize that the OS is far better at the overall job of caching the index files than Solr itself is. Solr caches are meant to cache result sets from queries and filters, not large sections of the actual index contents. Make the caches big enough that you see some benefit, but not big enough to suck up all your RAM.

If you are having warm time problems, make the autowarm counts low. I have run into problems with warming on my filter cache, because we have filters that are extremely hairy and slow to run. I had to reduce my autowarm count on the filter cache to FOUR, with a cache size of 512. When it is 8 or higher, it can take over a minute to autowarm.

Thanks,
Shawn
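Shawn's cat trick can be sketched as below. To keep the sketch safe to run anywhere it uses a throwaway directory with a stand-in file; in real use you would point INDEX_DIR at your core's data/index directory (that path is an assumption about the install):

```shell
# Hedged sketch of warming the OS page cache with cat, against a throwaway
# directory. For real use: INDEX_DIR=/var/solr/collection1/data/index (path
# is an assumption; adjust for your install).
INDEX_DIR=$(mktemp -d)
dd if=/dev/zero of="$INDEX_DIR/_0.frq" bs=1024 count=16 2>/dev/null  # stand-in segment file
cat "$INDEX_DIR"/* > /dev/null   # first pass: reads from disk, fills the OS cache
cat "$INDEX_DIR"/* > /dev/null   # second pass: served straight from RAM
warmed_kib=$(du -sk "$INDEX_DIR" | cut -f1)
echo "warmed ${warmed_kib} KiB"
rm -rf "$INDEX_DIR"
```

On a real multi-gigabyte index the first pass takes a while and the second returns almost instantly, which is itself a quick check that the index fits in RAM.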
Re: Occasional Solr performance issues
On Tue, Oct 23, 2012 at 3:52 AM, Shawn Heisey s...@elyograg.org wrote:
> As soon as you make any change at all to an index, it's no longer optimized. Delete one document, add one document, anything. Most of the time you will not see a performance increase from optimizing an index that consists of one large segment and a bunch of very tiny segments or deleted documents.

I've since realized that by experimentation. I've probably saved quite a few minutes of reading time by investing hours of experiment time!

> If your index is small enough to fit in the 14GB of free RAM that you have, you can manually populate the disk cache by going to your index directory and doing 'cat * > /dev/null' from the commandline or a script. The first time you do it, it may go slowly, but if you immediately do it again, it will complete VERY fast -- the data will all be in RAM.

The cat trick to get the files in RAM is great. I would not have thought that would work for binary files. The index is small, much less than the available RAM, for the time being. Therefore, there was nothing to fill it with, I now understand. Both 'free' outputs were after the system had been running for some time.

> It's important to realize that the OS is far better at the overall job of caching the index files than Solr itself is. Solr caches are meant to cache result sets from queries and filters, not large sections of the actual index contents. Make the caches big enough that you see some benefit, but not big enough to suck up all your RAM.

I see, thanks.

> If you are having warm time problems, make the autowarm counts low. I have run into problems with warming on my filter cache, because we have filters that are extremely hairy and slow to run. I had to reduce my autowarm count on the filter cache to FOUR, with a cache size of 512. When it is 8 or higher, it can take over a minute to autowarm.

I will have to experiment with the warming. Thank you for the tips.

-- 
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Solr Performance Issues
How much of your memory are you allocating to the JVM and how much are you leaving free?

If you don't leave enough free memory for the OS, the OS won't have a large enough disk cache, and you will be hitting the disk for lots of queries. You might want to monitor your disk I/O using iostat and look at the iowait. If you are doing phrase queries and your *prx file is significantly larger than the available memory, then when a slow phrase query hits Solr, the contention for disk I/O with other queries could be slowing everything down.

You might also want to look at the 90th and 99th percentile query times in addition to the average. For our large indexes, we found at least an order of magnitude difference between the average and 99th percentile queries. Again, if Solr gets hit with a few of those 99th percentile slow queries and you're not hitting your caches, chances are you will see serious contention for disk I/O. Of course, if you don't see any waiting on I/O, then your bottleneck is probably somewhere else. :)

See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for more background on our experience.

Tom Burton-West
University of Michigan Library
www.hathitrust.org

On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com wrote:
> Hi everyone,
>
> I have an index corresponding to ~2.5 million documents. The index size is 43GB. The configuration of the machine which is running Solr is: Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB RAM, and 250 GB HDD.
>
> I'm observing a strange trend in the queries that I send to Solr. The query times for queries that I send earlier are much lower than for the queries I send afterwards. For instance, if I write a script to query Solr 5000 times (with 5000 distinct queries, most of them containing not more than 3-5 words) with 10 threads running in parallel, the average time for queries goes from ~50ms in the beginning to ~6000ms. Is this expected, or is there something wrong with my configuration?
>
> Currently I've configured the queryResultCache and the documentCache to contain 2048 entries (hit ratios for both are close to 50%). Apart from this, a general question that I want to ask is: is such hardware enough for this scenario? I'm aiming at achieving around 20 queries per second with the hardware mentioned above.
>
> Thanks,
> Regards,
> -- 
> - Siddhant

-- 
View this message in context: http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
Sent from the Solr - User mailing list archive at Nabble.com.
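Tom's point about 90th/99th percentiles can be checked with nothing more than sort and awk; a sketch using the nearest-rank method on invented QTime values (feed it your real response-time log instead of the sample file):

```shell
# Hedged sketch: nearest-rank percentiles over one latency (ms) per line.
# The sample numbers below are made up purely for illustration.
percentile() {  # $1 = percentile; expects numerically sorted values on stdin
  awk -v p="$1" '{ a[NR] = $1 }
    END { idx = int((NR * p + 99) / 100);   # nearest-rank: ceil(NR*p/100)
          print a[idx] }'
}
printf '%s\n' 40 45 50 55 60 70 90 150 800 6000 > /tmp/qtimes.txt
echo "avg: $(awk '{s+=$1} END{print s/NR}' /tmp/qtimes.txt) ms"
echo "p90: $(sort -n /tmp/qtimes.txt | percentile 90) ms"
echo "p99: $(sort -n /tmp/qtimes.txt | percentile 99) ms"
```

With these sample values the average is 736 ms while the 99th percentile is 6000 ms, the order-of-magnitude gap Tom describes.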
Re: Solr Performance Issues
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk caching.

I think that at any point in time, there can be at most as many concurrent requests as there are threads, which happens to make sense btw (does it?). As I increase the number of threads, the load average shown by top goes up to as high as 80%. But if I keep the number of threads low (~10), the load average never goes beyond ~8. So probably that's the number of requests I can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion as to how much hardware should be sufficient for a Solr deployment with an index size of ~43GB, containing around 2.5 million documents? I'm expecting it to serve at least 20 requests per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West tburtonw...@gmail.com wrote:
> How much of your memory are you allocating to the JVM and how much are you leaving free? If you don't leave enough free memory for the OS, the OS won't have a large enough disk cache, and you will be hitting the disk for lots of queries. You might want to monitor your disk I/O using iostat and look at the iowait. [...] You might also want to look at the 90th and 99th percentile query times in addition to the average. For our large indexes, we found at least an order of magnitude difference between the average and 99th percentile queries. [...]

-- 
- Siddhant
Re: Solr Performance Issues
You've probably already looked at this, but here goes anyway. The first question probably should have been what are you measuring? I've been fooled before by looking at, say, average response time and extrapolating. You're getting 20 qps if your response time is 1 second, but you have 20 threads running simultaneously, ditto if you're getting 2 second response time and 40 threads. So And what is response time? It would clarify things a lot if you broke out which parts of the operation are taking the time. Going from memory, debugQuery=on will let you know how much time was spent in various operations in SOLR. It's important to know whether it was the searching, assembling the response, or transmitting the data back to the client. If your timings are all just how long it takes the response to get back to the client, you could even be hammered by network latency. How many threads does it take to peg the CPU? And what response times are you getting when your number of threads is around 10? Erick On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel siddhantg...@gmail.comwrote: I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk caching. I think that at any point of time, there can be a maximum of number of threads concurrent requests, which happens to make sense btw (does it?). As I increase the number of threads, the load average shown by top goes up to as high as 80%. But if I keep the number of threads low (~10), the load average never goes beyond ~8). So probably thats the number of requests I can expect Solr to serve concurrently on this index size with this hardware. Can anyone give a general opinion as to how much hardware should be sufficient for a Solr deployment with an index size of ~43GB, containing around 2.5 million documents? I'm expecting it to serve at least 20 requests per second. Any experiences? 
Thanks On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West tburtonw...@gmail.com wrote: How much of your memory are you allocating to the JVM and how much are you leaving free? If you don't leave enough free memory for the OS, the OS won't have a large enough disk cache, and you will be hitting the disk for lots of queries. You might want to monitor your disk I/O using iostat and look at the iowait. If you are doing phrase queries and your *prx file is significantly larger than the available memory, then when a slow phrase query hits Solr, the contention for disk I/O with other queries could slow everything down. You might also want to look at the 90th and 99th percentile query times in addition to the average. For our large indexes, we found at least an order of magnitude difference between the average and the 99th percentile queries. Again, if Solr gets hit with a few of those 99th percentile slow queries and you're not hitting your caches, chances are you will see serious contention for disk I/O. Of course if you don't see any waiting on i/o, then your bottleneck is probably somewhere else :) See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for more background on our experience. Tom Burton-West, University of Michigan Library, www.hathitrust.org -- View this message in context: http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html Sent from the Solr - User mailing list archive at Nabble.com. -- Siddhant
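Tom's suggestion to look beyond the average can be sketched in a few lines; the nearest-rank percentile helper and the sample timings below are illustrative, not from the thread:

```python
# Sketch of why averages hide tail latency: mean vs. nearest-rank
# percentile over a set of query times in milliseconds (made-up data).
def percentile(samples, pct):
    ordered = sorted(samples)
    # Nearest-rank method: take the ceiling-rank value for pct.
    idx = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[idx]

times_ms = [40] * 98 + [900, 6000]   # mostly fast, two slow phrase queries
mean_ms = sum(times_ms) / len(times_ms)
print(mean_ms)                   # 108.2 -- the average looks acceptable
print(percentile(times_ms, 99))  # 900   -- the 99th percentile does not
```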
Solr Performance Issues
Hi everyone, I have an index corresponding to ~2.5 million documents. The index size is 43GB. The machine running Solr has a dual-processor quad-core Xeon 5430 at 2.66GHz (Harpertown) with 2 x 12MB cache, 8GB RAM, and a 250GB HDD. I'm observing a strange trend in the queries that I send to Solr: the query times for queries I send earlier are much lower than for the queries I send afterwards. For instance, if I write a script to query Solr 5000 times (with 5000 distinct queries, most of them containing no more than 3-5 words) with 10 threads running in parallel, the average query time goes from ~50ms in the beginning to ~6000ms. Is this expected, or is there something wrong with my configuration? Currently I've configured the queryResultCache and the documentCache to hold 2048 entries each (hit ratios for both are close to 50%). Apart from this, a general question: is such hardware enough for this scenario? I'm aiming at around 20 queries per second with the hardware mentioned above. Thanks, Regards, -- Siddhant
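For reference, cache sizes like the ones described are set in solrconfig.xml; a hypothetical fragment matching the numbers above (only size="2048" comes from this thread, the class and warming attributes are illustrative defaults):

```xml
<!-- Hypothetical solrconfig.xml fragment; sizes as described, other
     attribute values are illustrative, not taken from this setup. -->
<queryResultCache class="solr.LRUCache" size="2048" initialSize="512" autowarmCount="256"/>
<documentCache class="solr.LRUCache" size="2048" initialSize="512"/>
```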
Re: Solr Performance Issues
How many outstanding queries do you have at a time? Is it possible that when you start, you have only a few queries executing concurrently, but as your test runs you have hundreds? This really is a question of how your load test is structured. You might get a better sense of how it works if your tester had a limited number of threads running, so the max number of concurrent requests Solr was serving at once was capped (30, 50, whatever). But no, I wouldn't expect Solr to bog down the way you're describing just because it was running for a while. HTH Erick
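A minimal version of the capped-concurrency load tester Erick describes might look like the sketch below (the Solr URL in the usage comment is a placeholder; none of this is from the original Ruby script):

```python
# Sketch of a load test with a hard cap on concurrency: workers pull
# queries from a shared queue, so at most num_threads requests are ever
# in flight at once.
import queue
import threading
import time

def run_load_test(queries, num_threads, fetch):
    """Run fetch(query) for every query; return per-query latencies in seconds."""
    pending = queue.Queue()
    for q in queries:
        pending.put(q)
    latencies = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                q = pending.get_nowait()
            except queue.Empty:
                return  # no more queries; this worker is done
            start = time.monotonic()
            fetch(q)
            elapsed = time.monotonic() - start
            with lock:
                latencies.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies

# Hypothetical usage against a local Solr instance:
# import urllib.parse, urllib.request
# fetch = lambda q: urllib.request.urlopen(
#     "http://localhost:8983/solr/select?q=" + urllib.parse.quote(q)).read()
# lat = run_load_test(open("queries.txt").read().splitlines(), 10, fetch)
```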
Re: Solr Performance Issues
Hi Erick, The way the load test works is that it picks up 5000 queries and splits them according to the number of threads (so if we have 10 threads, it schedules 10 threads, each one sending 500 queries). So it might be possible that the number of queries at a later point in time is greater than the number of queries earlier on; I'm not very sure about that, though. It's a simple Ruby script that starts up the threads, calls the search function in each thread, and then waits for each of them to exit. How many queries per second can we expect Solr to serve, given this kind of hardware? If what you suggest is true, then is it possible that while Solr is serving a query, another query hits it, which increases the response time even further? I'm not sure about that, but yes, I can observe the query times going up as I increase the number of threads. Thanks, Regards, -- Siddhant
Re: Solr Performance Issues
I don't mean to turn this into a sales pitch, but there is a tool for Java app performance management that you may find helpful. It's called New Relic (www.newrelic.com) and can be installed in 2 minutes. It can give you very deep visibility inside Solr and other Java apps. (Full disclosure: I work at New Relic.) Mike -- View this message in context: http://old.nabble.com/Solr-Performance-Issues-tp27864278p27872139.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr performance issues
On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote: 2. I use acts_as_solr, and by default it only makes POST requests, even for /select. With that setup the response time for most queries, simple or complex, ranged from 150ms to 600ms, with an average of 250ms. I changed the select request to use GET requests instead, and now the response time is down to 10ms to 60ms. Has anyone seen that before? Why is it doing that? Are the GET requests being cached by the Ruby stuff? No, I'm sure that the results aren't being cached by Ruby's library, solr-ruby, or acts_as_solr. But even with no caching, I've seen differences between GET and POST on Linux with the Python client when persistent HTTP connections were in use. I tracked it down to the POST being written in two parts, triggering Nagle's algorithm in the networking stack. There was another post that mentioned this a couple of years ago: http://markmail.org/message/45qflvwnakhripqp I would welcome patches with tests that allow solr-ruby to send most requests with GET, and the ones that actually send a body beyond just parameters (delete, update, commit) as POST. Erik
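If switching everything to GET isn't an option, the Nagle interaction Erik describes can also be sidestepped at the socket level; a small sketch (the host and port are placeholders, and this is not from solr-ruby):

```python
# Sketch: disable Nagle's algorithm (TCP_NODELAY) so the second write of
# a two-part POST is sent immediately instead of waiting for the ACK of
# the first part. Host/port are placeholders, not from the thread.
import socket

def open_nodelay_connection(host="localhost", port=8983):
    sock = socket.create_connection((host, port))
    # Send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```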
Re: Solr performance issues
On Fri, Jun 20, 2008 at 8:32 AM, Erik Hatcher [EMAIL PROTECTED] wrote: Are the GET requests being cached by the Ruby stuff? No, I'm sure that the results aren't being cached by Ruby's library, solr-ruby, or acts_as_solr. I confirm that the results are not cached by Ruby's library. But even with no caching, I've seen differences between GET and POST on Linux with the Python client when persistent HTTP connections were in use. I tracked it down to the POST being written in two parts, triggering Nagle's algorithm in the networking stack. There was another post that mentioned this a couple of years ago: http://markmail.org/message/45qflvwnakhripqp I would welcome patches with tests that allow solr-ruby to send most requests with GET, and the ones that actually send a body beyond just parameters (delete, update, commit) as POST. Erik I made a few modifications, but it still needs more testing... Sebastien
Solr performance issues
Hi, I've been using Solr for a little while without worrying too much about how it works, but now it's becoming a bottleneck in my application. I have a couple of issues with it: 1. My index always gets slower and slower when committing/optimizing, for some obscure reason. It goes from 1 second with a new index to 45 seconds with an index holding the same amount of data but used for a few days. Restarting Solr doesn't fix it. The only way I found to fix it is to delete the whole index completely by deleting the index folder. When I then rebuild the index, everything goes back to normal and fast... and then performance slowly deteriorates again. So the amount of data is not a factor, because rebuilding the index from scratch fixes the problem, and I am sending optimize once in a while... maybe even too often. 2. I use acts_as_solr, and by default it only makes POST requests, even for /select. With that setup the response time for most queries, simple or complex, ranged from 150ms to 600ms, with an average of 250ms. I changed the select request to use GET requests instead, and now the response time is down to 10ms to 60ms. Has anyone seen that before? Why is it doing that? Thanks in advance, Sebastien