Re: Improving Solr performance

2011-01-14 Thread supersoft

The tests are performed with a homemade program. Its arguments are the number
of threads and the path to a file containing the available queries (in the
last test, only one). When each thread is created, it records the current time
(in milliseconds), and when it receives the response to its query, it logs
the difference from that initial time.
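
For readers who want to reproduce this kind of test, here is a minimal sketch
of such a harness in Python. It is not the original program: the name
run_load_test and the send_query callback are hypothetical, and the timestamp
is taken when the thread starts running rather than when it is created.

```python
import threading
import time


def run_load_test(num_threads, queries, send_query):
    """Start num_threads threads; each sends one query and records its latency.

    send_query is a caller-supplied function (an assumption of this sketch)
    that performs the request against Solr and returns once the response
    has arrived.
    """
    latencies_ms = [None] * num_threads

    def worker(index):
        # Cycle through the available queries if there are fewer than threads.
        query = queries[index % len(queries)]
        start = time.monotonic()
        send_query(query)
        latencies_ms[index] = (time.monotonic() - start) * 1000.0

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Results are indexed by thread creation order, not by response order.
    return latencies_ms
```

Sorting the returned list gives the "ordered by response" view, while the
list as returned corresponds to the "ordered by creation" view.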

In the last post, I listed the results of the 100-thread example ordered
by response time. The same results ordered by thread creation time are:

100 simultaneous queries: 9265, 11922, 12375, 4109, 4890, 7093, 21875, 8547,
13562, 13219, 1531, 11875, 21281, 31985, 11703, 7391, 32031, 22172, 21469,
13875, 1969, 11406, 8172, 9609, 16953, 13828, 17282, 22141, 16625, 2203,
24985, 2375, 25188, 2891, 5047, 6422, 20860, 7594, 23125, 32281, 32016,
5312, 23125, 11484, 10344, 11500, 18172, 3937, 11547, 13500, 28297, 20594,
24641, 7063, 24797, 12922, 1297, 8984, 20625, 13407, 23203, 32016, 15922,
21875, 8750, 12875, 23203, 26453, 26016, 11797, 31782, 24672, 21625, 7672,
18985, 14672, 22157, 26485, 23328, 9907, 5563, 24625, 14078, 4703, 25844,
12328, 11484, 6437, 25937, 26437, 18484, 13719, 16328, 28687, 23141, 14016,
26437, 13187, 25031, 31969 ms
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2254121.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Improving Solr performance

2011-01-13 Thread supersoft

On the one hand, I found the comments about the reasons for sharding really
interesting. The documentation agrees with you about why to split an index
into several shards (problems with large index sizes), but I can't find any
explanation of the drawbacks of using shards as an Access Control List. I
guess there should be some, and they could be critical in this design. Any
examples?

On the other hand, the performance problems. I have configured big caches
and launched a test of simultaneous requests (all with the same query) without
committing during the test. The caches are initially empty, and after the
test their statistics are:

name:  queryResultCache
stats:
  lookups               1129
  hits                  1120
  hitratio              0.99
  inserts               16
  evictions             0
  size                  9
  warmupTime            0
  cumulative_lookups    1129
  cumulative_hits       1120
  cumulative_hitratio   0.99
  cumulative_inserts    16
  cumulative_evictions  0

name:  documentCache
stats:
  lookups               6750
  hits                  6440
  hitratio              0.95
  inserts               310
  evictions             0
  size                  310
  warmupTime            0
  cumulative_lookups    6750
  cumulative_hits       6440
  cumulative_hitratio   0.95
  cumulative_inserts    310
  cumulative_evictions  0
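
As a quick check of the reported hit ratios, the small Python sketch below
recomputes them from the hits and lookups figures above. The hit_ratio helper
is only illustrative and is not part of Solr.

```python
def hit_ratio(hits, lookups):
    """Return hits / lookups, or 0.0 when there were no lookups."""
    return hits / lookups if lookups else 0.0


# (hits, lookups) copied from the cache statistics in this post.
caches = {
    "queryResultCache": (1120, 1129),
    "documentCache": (6440, 6750),
}

for name, (hits, lookups) in caches.items():
    misses = lookups - hits
    print(f"{name}: ratio={hit_ratio(hits, lookups):.2f}, misses={misses}")
```

This gives 9 misses for queryResultCache and 310 for documentCache, so only a
small fraction of the lookups had to go to the index.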

Although most of the queries are cache hits, the performance still depends
on the number of simultaneous queries:

1 simultaneous query: 3437 ms (cache miss)

2 simultaneous queries: 594, 954 ms

10 simultaneous queries: 1047, 1313, 1438, 1797, 1922, 2094, 2250, 2500,
2938, 3000 ms

50 simultaneous queries: 1203, 1453, 1453, 1437, 1625, 1953, 5688, 12938,
14953, 16281, 15984, 16453, 15812, 16469, 16563, 16844, 17703, 16843, 17359,
16828, 18235, 18219, 18172, 18203, 17672, 17344, 17453, 18484, 18157, 18531,
18297, 18359, 18063, 18516, 18125, 17516, 18562, 18016, 18187, 18610, 18703,
18672, 17829, 18344, 18797, 18781, 18265, 18875, 18250, 18812

100 simultaneous queries: 1297, 1531, 1969, 2203, 2375, 2891, 3937, 4109,
4703, 4890, 5047, 5312, 5563, 6422, 6437, 7063, 7093, 7391, 7594, 7672,
8172, 8547, 8750, 8984, 9265, 9609, 9907, 10344, 11406, 11484, 11484, 11500,
11547, 11703, 11797, 11875, 11922, 12328, 12375, 12875, 12922, 13187, 13219,
13407, 13500, 13562, 13719, 13828, 13875, 14016, 14078, 14672, 15922, 16328,
16625, 16953, 17282, 18172, 18484, 18985, 20594, 20625, 20860, 21281, 21469,
21625, 21875, 21875, 22141, 22157, 22172, 23125, 23125, 23141, 23203, 23203,
23328, 24625, 24641, 24672, 24797, 24985, 25031, 25188, 25844, 25937, 26016,
26437, 26453, 26437, 26485, 28297, 28687, 31782, 31985, 31969, 32016, 32031,
32016, 32281 ms

Is this the expected behavior? Is there any technique to make the response
time less dependent on the number of simultaneous queries? (For economic
reasons, replicating to more servers is not an option.)

Thanks in advance (and thanks also for the previous comments)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2249108.html


Token Counter

2011-01-10 Thread supersoft

Hello,

I would like to know whether there is a simple procedure or tool for
displaying the number of occurrences of each token in query results.

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2227795.html


Re: Token Counter

2011-01-10 Thread supersoft

As I understand it, a faceted search would be useful if keywords were a
multivalued field in which each value is a single token.

I want to display the occurrences of the tokens which appear in an indexed
(and stored) text field.
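
To make the goal concrete, here is a minimal client-side sketch in Python. It
assumes the stored field has already been fetched with the query results as a
list of dictionaries, and it uses a simple regular-expression tokenizer, which
will not necessarily match the tokens produced by the Solr analyzer for that
field. The count_tokens name and the sample documents are hypothetical.

```python
import re
from collections import Counter


def count_tokens(docs, field):
    """Count token occurrences in one stored text field across result docs."""
    counts = Counter()
    for doc in docs:
        text = doc.get(field, "")
        # Lowercase and split on non-word characters (an assumption; the
        # real analyzer chain may tokenize differently).
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Hypothetical result documents with a stored "text" field.
docs = [
    {"id": "1", "text": "Solr shards and Solr caches"},
    {"id": "2", "text": "caches help"},
]
for token, n in count_tokens(docs, "text").most_common():
    print(token, n)
```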
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2228991.html


Improving Solr performance

2011-01-07 Thread supersoft

I have deployed a 5-shard infrastructure where:

shard1 has 3124422 docs
shard2 has 920414 docs
shard3 has 602772 docs
shard4 has 2083492 docs
shard5 has 11915639 docs

Total size of the indexes: 100GB

The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420, and I
run the server using Jetty (from the Solr example download) with:

java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar

The response time for a single query is around 2-3 seconds. Nevertheless, if
I execute several queries at the same time, the performance degrades
immediately:

1 simultaneous query: 2516 ms
2 simultaneous queries: 4250, 4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms
...
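
One way to read these numbers is as throughput rather than latency. The
Python sketch below divides the number of queries in each batch by the time
of the slowest response. It assumes all queries in a batch start at the same
moment, which the test only approximates, and throughput_qps is an
illustrative helper name.

```python
# Latencies in milliseconds, copied from the measurements in this post.
measurements_ms = {
    1: [2516],
    2: [4250, 4469],
    3: [5781, 6219, 6219],
    4: [6484, 7203, 7719, 7781],
}


def throughput_qps(latencies_ms):
    """Queries completed per second, taking the slowest response as batch time."""
    return len(latencies_ms) / (max(latencies_ms) / 1000.0)


for concurrency, latencies in sorted(measurements_ms.items()):
    print(f"{concurrency} concurrent: {throughput_qps(latencies):.2f} queries/s")
```

Under that assumption the throughput stays between roughly 0.4 and 0.5
queries per second at every concurrency level, which is what one would see
if the queries were competing for a single shared resource.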

Using JConsole to monitor the server's Java process, I checked that heap
memory and CPU usage don't reach their upper limits, so the server should
not be behaving as if it were overloaded. Can anyone suggest how I should
tune the instance so that it is not so strongly dependent on the number of
simultaneous queries?

Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2210843.html


Re: Improving Solr performance

2011-01-07 Thread supersoft

1 - Yes, all the shards are on the same machine
2 - The machine has 7.8GB of RAM and I assign 3.4GB to the Solr server
3 - The shard sizes (GB) are 17, 5, 3, 11, 64
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211135.html


Re: Improving Solr performance

2011-01-07 Thread supersoft

The reason for this distribution is the kind of documents. Although they all
share the same schema structure (and Solr configuration), each document
belongs to 1 of 5 different kinds.

Each kind corresponds to a specific shard, so the client tool avoids
searching all the shards when the user selects just one or a few kinds. The
tool runs a multi-shard query against only the relevant shards. I guess this
is a sound approach, but correct me if I am wrong.
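
The shard selection described above can be sketched in Python as follows. It
relies on Solr's shards request parameter for distributed search (a
comma-separated list of host:port/path entries). The host, port, core names,
kind names, and the build_query_url helper are all hypothetical; the real
client tool is not shown in this thread.

```python
from urllib.parse import urlencode

# Hypothetical mapping from document kind to the shard that holds it.
SHARD_BY_KIND = {
    "kind1": "localhost:8983/solr/shard1",
    "kind2": "localhost:8983/solr/shard2",
    "kind3": "localhost:8983/solr/shard3",
    "kind4": "localhost:8983/solr/shard4",
    "kind5": "localhost:8983/solr/shard5",
}


def build_query_url(query, kinds,
                    base="http://localhost:8983/solr/shard1/select"):
    """Build a distributed query URL that targets only the selected kinds."""
    shards = ",".join(SHARD_BY_KIND[kind] for kind in kinds)
    # urlencode percent-encodes ':', '/' and ',' inside parameter values.
    return base + "?" + urlencode({"q": query, "shards": shards})


# A user who selects two kinds only hits the two corresponding shards.
print(build_query_url("title:solr", ["kind2", "kind5"]))
```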

The real problem with this architecture is the correlation between the number
of concurrent users and the response time:
1 query: n seconds
2 queries: 2*n seconds for each query
3 queries: 3*n seconds for each query
and so on...

This is a real headache, because a single query has an acceptable response
time, but when many users are accessing the server the performance drops
sharply.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Improving-Solr-performance-tp2210843p2211305.html