Re: Improving Solr performance
The tests are performed with a self-made program. Its arguments are the number of threads and the path to a file containing the available queries (in the last test, only one query). When each thread is created, it records the current time in milliseconds; when it receives the response to its query, it logs the difference from that initial time.

In my last post, I listed the results of the 100-thread example ordered by response time. Ordered by thread creation time, the results are:

100 simultaneous queries: 9265, 11922, 12375, 4109, 4890, 7093, 21875, 8547, 13562, 13219, 1531, 11875, 21281, 31985, 11703, 7391, 32031, 22172, 21469, 13875, 1969, 11406, 8172, 9609, 16953, 13828, 17282, 22141, 16625, 2203, 24985, 2375, 25188, 2891, 5047, 6422, 20860, 7594, 23125, 32281, 32016, 5312, 23125, 11484, 10344, 11500, 18172, 3937, 11547, 13500, 28297, 20594, 24641, 7063, 24797, 12922, 1297, 8984, 20625, 13407, 23203, 32016, 15922, 21875, 8750, 12875, 23203, 26453, 26016, 11797, 31782, 24672, 21625, 7672, 18985, 14672, 22157, 26485, 23328, 9907, 5563, 24625, 14078, 4703, 25844, 12328, 11484, 6437, 25937, 26437, 18484, 13719, 16328, 28687, 23141, 14016, 26437, 13187, 25031, 31969 ms

-- View this message in context: http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2254121.html
Sent from the Solr - User mailing list archive at Nabble.com.
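A minimal sketch of the kind of test program described above, not the poster's actual tool. It assumes the timestamp is taken when each thread object is created and that latency is logged in whole milliseconds. The names `run_load_test` and `http_query` and the URL in the comment are hypothetical. The demo uses a sleep as a stand-in query so that it runs without a Solr server.

```python
import threading
import time
import urllib.request


def run_load_test(num_threads, run_query):
    """Run run_query in num_threads threads; return each latency in ms."""
    latencies_ms = [None] * num_threads

    def worker(index, created_at):
        run_query()
        # Difference between response time and thread creation time.
        latencies_ms[index] = int((time.monotonic() - created_at) * 1000)

    threads = []
    for i in range(num_threads):
        # The start timestamp is taken when the thread is created.
        created_at = time.monotonic()
        threads.append(threading.Thread(target=worker, args=(i, created_at)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies_ms  # ordered by thread creation


def http_query(url):
    """Return a callable that fetches url and reads the whole response."""
    def fetch():
        with urllib.request.urlopen(url) as response:
            response.read()
    return fetch


# Stand-in query (50 ms sleep) so the sketch runs without a server.
# Against a real server one would pass, for example,
# http_query("http://localhost:8983/solr/select?q=solr")  (hypothetical URL).
print(run_load_test(10, lambda: time.sleep(0.05)))
```

Sorting the returned list gives the "ordered by response time" view from the earlier post; leaving it unsorted gives the creation-order view shown here.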
Re: Improving Solr performance
On the one hand, I found those comments about the reasons for sharding really interesting. The documentation agrees with you about why to split an index into several shards (problems with large index sizes), but I can't find any explanation of the drawbacks of using shards as a kind of Access Control List. I guess there must be some, and they could be critical in this design. Any examples?

On the other hand, there are the performance problems. I have configured big caches and launched a test of simultaneous requests (all with the same query), without committing during the test. The caches are initially empty, and after the test they show:

name: queryResultCache
stats:
  lookups: 1129
  hits: 1120
  hitratio: 0.99
  inserts: 16
  evictions: 0
  size: 9
  warmupTime: 0
  cumulative_lookups: 1129
  cumulative_hits: 1120
  cumulative_hitratio: 0.99
  cumulative_inserts: 16
  cumulative_evictions: 0

name: documentCache
stats:
  lookups: 6750
  hits: 6440
  hitratio: 0.95
  inserts: 310
  evictions: 0
  size: 310
  warmupTime: 0
  cumulative_lookups: 6750
  cumulative_hits: 6440
  cumulative_hitratio: 0.95
  cumulative_inserts: 310
  cumulative_evictions: 0

Although most of the queries are cache hits, performance still depends on the number of simultaneous queries:

1 simultaneous query: 3437 ms (cache miss)

2 simultaneous queries: 594, 954 ms

10 simultaneous queries: 1047, 1313, 1438, 1797, 1922, 2094, 2250, 2500, 2938, 3000 ms

50 simultaneous queries: 1203, 1453, 1453, 1437, 1625, 1953, 5688, 12938, 14953, 16281, 15984, 16453, 15812, 16469, 16563, 16844, 17703, 16843, 17359, 16828, 18235, 18219, 18172, 18203, 17672, 17344, 17453, 18484, 18157, 18531, 18297, 18359, 18063, 18516, 18125, 17516, 18562, 18016, 18187, 18610, 18703, 18672, 17829, 18344, 18797, 18781, 18265, 18875, 18250, 18812 ms

100 simultaneous queries: 1297, 1531, 1969, 2203, 2375, 2891, 3937, 4109, 4703, 4890, 5047, 5312, 5563, 6422, 6437, 7063, 7093, 7391, 7594, 7672, 8172, 8547, 8750, 8984, 9265, 9609, 9907, 10344, 11406, 11484, 11484, 11500, 11547, 11703, 11797, 11875, 11922, 12328, 12375, 12875, 12922, 13187, 13219, 13407, 13500, 13562, 13719, 13828, 13875, 14016, 14078, 14672, 15922, 16328, 16625, 16953, 17282, 18172, 18484, 18985, 20594, 20625, 20860, 21281, 21469, 21625, 21875, 21875, 22141, 22157, 22172, 23125, 23125, 23141, 23203, 23203, 23328, 24625, 24641, 24672, 24797, 24985, 25031, 25188, 25844, 25937, 26016, 26437, 26453, 26437, 26485, 28297, 28687, 31782, 31985, 31969, 32016, 32031, 32016, 32281 ms

Is this expected? Is there any technique to make response time less dependent on the number of simultaneous queries? (For budget reasons, replicating to more servers is not an option.)

Thanks in advance (and thanks also for the previous comments)

-- View this message in context: http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2249108.html
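One way to read the figures above is as throughput rather than latency. The sketch below assumes that all queries in a batch start at roughly the same moment, so the batch lasts as long as its slowest response; the post does not give start offsets, so this is only an approximation. The function name `throughput_qps` is hypothetical, and the input values are the largest response times in each list above.

```python
# Slowest response time (ms) per concurrency level, taken from the lists above.
slowest_ms = {1: 3437, 2: 954, 10: 3000, 50: 18875, 100: 32281}


def throughput_qps(num_queries, slowest_response_ms):
    """Queries per second, assuming the batch ends with its slowest response."""
    return num_queries / (slowest_response_ms / 1000.0)


for n, ms in slowest_ms.items():
    print(f"{n:>3} simultaneous: {throughput_qps(n, ms):.2f} queries/s")
```

Under that assumption, the 10-, 50- and 100-query batches all come out between roughly 2.6 and 3.3 queries per second. That would be consistent with the server completing requests at a roughly fixed rate regardless of how many arrive together, although the numbers alone do not show where the limit is.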
Token Counter
Hello,

I would like to know if there is a simple procedure or tool for displaying the number of occurrences of each token in the query results.

Thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2227795.html
Re: Token Counter
As I understand it, a faceted search would be useful if keywords were a multivalued field in which each field value is a single token. What I want is to display the number of occurrences of the tokens that appear in an indexed (and stored) text field.

-- View this message in context: http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2228991.html
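As an illustration of what is being asked for, here is a minimal client-side sketch that counts tokens in the stored text of a result set. It is not a Solr feature: the function name `count_tokens` and the sample texts are hypothetical, and the simple lowercase-and-split tokenization is an assumption that will not match whatever analyzer the field uses in Solr.

```python
import re
from collections import Counter


def count_tokens(texts):
    """Count how often each token occurs across the given field values."""
    counts = Counter()
    for text in texts:
        # Naive tokenization: lowercase, then runs of word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Hypothetical stored field values taken from the documents of a result set.
stored_texts = ["Solr is fast", "Solr scales well", "fast and simple"]

for token, occurrences in count_tokens(stored_texts).most_common(3):
    print(token, occurrences)
```

This counts every occurrence, including repeats within one document, which differs from a facet count of matching documents.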
Improving Solr performance
I have deployed a 5-shard infrastructure where:

shard1 has 3124422 docs
shard2 has 920414 docs
shard3 has 602772 docs
shard4 has 2083492 docs
shard5 has 11915639 docs

Total index size: 100GB

The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420, and I run the server using Jetty (from the Solr example download) with:

java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar

The response time for a single query is around 2-3 seconds. However, if I execute several queries at the same time, performance degrades immediately:

1 simultaneous query: 2516 ms
2 simultaneous queries: 4250, 4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms
...

Using JConsole to monitor the server's Java process, I checked that heap memory and CPU usage don't reach their upper limits, so the server shouldn't be behaving as if it were overloaded.

Can anyone suggest how I should tune the instance so that response time is less dependent on the number of simultaneous queries?

Thanks in advance

-- View this message in context: http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2210843.html
Re: Improving Solr performance
1 - Yes, all the shards are on the same machine.
2 - The machine has 7.8GB of RAM, and I assign 3.4GB to the Solr server.
3 - The shard sizes (GB) are 17, 5, 3, 11, 64.

-- View this message in context: http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211135.html
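The three answers above can be combined into a rough memory budget. The sketch below is plain arithmetic on the posted figures. It assumes that whatever RAM is not assigned to the Solr heap is an upper bound on what the operating system could use to cache index files, and it ignores memory used by the OS itself and by other processes. The variable names are hypothetical.

```python
# Figures taken from the reply above.
shard_sizes_gb = [17, 5, 3, 11, 64]
machine_ram_gb = 7.8
solr_heap_gb = 3.4

index_size_gb = sum(shard_sizes_gb)
# Upper bound on RAM left outside the Solr heap (assumption: nothing else uses it).
ram_outside_heap_gb = machine_ram_gb - solr_heap_gb
cacheable_fraction = ram_outside_heap_gb / index_size_gb

print(f"Total index size: {index_size_gb} GB")
print(f"RAM outside the heap: {ram_outside_heap_gb:.1f} GB")
print(f"Share of the index that could fit there: {cacheable_fraction:.1%}")
```

On these numbers, at most about 4.4 GB, or roughly 4% of the 100 GB index, could sit in memory outside the heap at any one time. Whether this accounts for the slowdown under concurrent queries is not something the figures in this thread establish.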
Re: Improving Solr performance
The reason for this distribution is the kind of documents. Although they all share the same schema structure (and Solr conf), each document belongs to 1 of 5 different kinds. Each kind corresponds to a specific shard, so the client tool avoids searching all the shards when the user selects just one or a few kinds: it runs a multi-shard query against only the relevant shards. I believe this is the right approach, but correct me if I am wrong.

The real problem with this architecture is the correlation between concurrent users and response time:

1 query: n seconds
2 queries: 2*n seconds each
3 queries: 3*n seconds each
and so on.

This is a real headache, because a single query has an acceptable response time, but when many users access the server at once, performance degrades severely.

-- View this message in context: http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211305.html
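The kind-to-shard routing described above could be sketched as follows, using Solr's `shards` request parameter for distributed search. This is not the poster's client tool: the kind names, host, port, core names, and the function name `build_query_url` are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical mapping from document kind to the shard that holds it.
SHARD_FOR_KIND = {
    "kind1": "localhost:8983/solr/shard1",
    "kind2": "localhost:8983/solr/shard2",
    "kind3": "localhost:8983/solr/shard3",
    "kind4": "localhost:8983/solr/shard4",
    "kind5": "localhost:8983/solr/shard5",
}


def build_query_url(query, kinds,
                    base="http://localhost:8983/solr/shard1/select"):
    """Build a distributed query URL that searches only the selected kinds."""
    # The shards parameter is a comma-separated list of shard addresses.
    shards = ",".join(SHARD_FOR_KIND[kind] for kind in kinds)
    return base + "?" + urlencode({"q": query, "shards": shards})


url = build_query_url("solr", ["kind1", "kind3"])
print(url)
# Decode the query string again to show which shards would be searched.
print(parse_qs(urlsplit(url).query)["shards"])
```

A query restricted to two kinds then touches two shards instead of five; with all five kinds selected, it is an ordinary query across every shard.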