Thanks David

I will try both options. I am in fact doing some performance testing now.
I have created a 100,000-record result set, and it takes around 5 seconds
(end to end) on my internal server to return it (with 1 user). I am
only doing 6 significant searches on this set: one for the main results
and one for each of the 5 top-level categories. This is only on my test
server, not on the larger production server, and I am happy with this
performance. If, however, I were to do my second-level category search,
which has around 40 nodes in it, that would be 30 searches. I am not sure
how this would perform.

What I am seeing is that search is CPU-hungry but not memory-hungry. This
makes sense to me.

Q - I have test data set up in my tests that has some random junk in it
followed by a word such as "fish" at the end. I am starting to think that
I may have set up the test data wrong and should use a lot of different
words in the result set, because I am sure that Ferret will cache the
search, which would give me a false impression of search speed.
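One way to get more varied test data is to generate records from a large made-up vocabulary with a skewed, Zipf-like word frequency, so a few words are common and most are rare, loosely like real text. This is a minimal sketch under that assumption: the vocabulary size, words per record and the 1/rank weighting are all illustrative choices, not anything Ferret requires, and the records would still need to be indexed however your application already does it.

```python
import itertools
import random
import string


def make_vocabulary(size, rng):
    """Return `size` distinct made-up words of 3-10 lowercase letters, sorted."""
    words = set()
    while len(words) < size:
        length = rng.randint(3, 10)
        words.add("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return sorted(words)  # sort so the output does not depend on set ordering


def make_records(n_records, vocab_size=5000, words_per_record=20, seed=42):
    """Generate (records, vocab) with Zipf-like word frequencies.

    The word at position r (0-based) in vocab gets weight 1/(r+1): a few
    words are common and most are rare.
    """
    rng = random.Random(seed)
    vocab = make_vocabulary(vocab_size, rng)
    # Precompute cumulative weights once; passing raw weights would redo this per call.
    cum_weights = list(itertools.accumulate(1.0 / (r + 1) for r in range(vocab_size)))
    records = [
        " ".join(rng.choices(vocab, cum_weights=cum_weights, k=words_per_record))
        for _ in range(n_records)
    ]
    return records, vocab
```

Searching for many different words drawn from `vocab` (common and rare alike), rather than always "fish", should make any caching effect visible as a difference between first and repeated queries for the same term.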

I will create more test data at the weekend, but my instinct is
that the method you outlined above may be faster.

I have 5 top-level categories and this will not change much. Depending
on the search, there will be a lot more results in one category than in
the rest after the initial search.

Drilling into the second-level categories, the most nodes I have in a
single second-level category is around 40 at the moment, although this is
likely to grow over time. Again, the results will not normally be evenly
distributed over the result set, but assuming for now that they were, and
I had 500,000 records and drilled into the second-tier category
structure, I would have 100,000 records in this category. I would be
doing 40 searches over 100,000 records.

Q - What do you think will perform faster in this instance?

I would love to have the time to build an x-dimensional, memory-resident
result (bucket set) that kept all the results parameterised for all the
categories, built at the time of the initial search. It would be
memory-hungry but would make searching through categories, nodes and
parameters in subsequent searches lightning fast.
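A simplified, two-level version of that bucket-set idea might look like the sketch below: one pass over the initial result set groups document ids by (top-level category, second-level node), after which category counts, node counts and drill-downs are dictionary lookups instead of separate searches. The (doc_id, category, node) tuple shape and the category names in the test are hypothetical; in practice these values would come from stored fields on each hit. Memory cost is roughly one stored id per hit, so 500,000 hits means 500,000 ids held per search.

```python
from collections import defaultdict


def build_buckets(hits):
    """Group one initial result set by (category, node) in a single pass.

    `hits` is an iterable of (doc_id, category, node) tuples (a hypothetical
    shape chosen for this sketch).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for doc_id, category, node in hits:
        buckets[category][node].append(doc_id)
    return buckets


def category_counts(buckets):
    """Number of hits in each top-level category."""
    return {cat: sum(len(ids) for ids in nodes.values()) for cat, nodes in buckets.items()}


def node_counts(buckets, category):
    """Number of hits in each second-level node of one category."""
    # .get() on a defaultdict does not create missing keys.
    return {node: len(ids) for node, ids in buckets.get(category, {}).items()}


def drill_down(buckets, category, node):
    """Document ids for one node, without running another search."""
    return buckets.get(category, {}).get(node, [])
```

The trade-off is the one described above: memory is spent once per initial search so that the 40 per-node queries become in-memory lookups.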

Would this be a great addition, or am I missing something?

I am really interested in performance-testing scenarios. As stated
above, I only have one word, "FISH", in my test data, preceded by random
made-up characters, e.g. "sadssderssdaatg FISH".

Q - Would I be better using more words in my test data?

Also, I am interested in the round-trip performance of search: the time
from when the user clicks Search to when they get the results back. I
will measure this on the production server in the production
environment. My rule of thumb is that it should not take longer than 8
seconds to return the results, or the user will refresh (even worse for
performance). With one user on my test system, 6 searches over 100,000
records take 5 seconds at the moment.
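Measuring from the client side rather than the server side can be as simple as timing a full HTTP request with a wall-clock timer and comparing it to the 8-second rule of thumb. This is a minimal sketch: the search URL is hypothetical (e.g. `http://localhost:3000/search?q=fish`), and it only times the HTML response arriving, not browser rendering or loading of images, CSS or JavaScript, so a real browser will see somewhat longer times.

```python
import time
import urllib.request

LIMIT_SECONDS = 8.0  # rule-of-thumb limit from the post


def http_get(url, timeout):
    """Fetch a URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def timed_round_trip(url, fetch=http_get, limit=LIMIT_SECONDS):
    """Return (elapsed_seconds, ok, within_limit) for one request, as seen by the client."""
    start = time.perf_counter()
    try:
        fetch(url, timeout=limit * 2)  # allow slow responses so they get measured
        ok = True
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        ok = False
    elapsed = time.perf_counter() - start
    return elapsed, ok, ok and elapsed <= limit
```

The `fetch` parameter is there so the timing logic can be exercised without a live server.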

I am expecting a large number of concurrent searches. I am
defining concurrency as someone searching at the same time as another 
user is either searching or waiting for the results to be returned.

Most testing tools that I can see only show you what is happening on the
server. I am interested in the user's perspective.

I had the thought of setting up a script that would open a number of
browser sessions, do random searches concurrently and hammer the server
to see when (1) search breaks, (2) something else breaks, or (3) search
goes over the 8-second limit.
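A rough version of that script could simulate users as concurrent HTTP clients rather than real browser sessions: each simulated user runs a few random searches back to back, and the report counts failures, responses over 8 seconds and the median and worst latency. This is a sketch under stated assumptions: the base URL (e.g. `http://localhost:3000/search?q=`) and the search terms are hypothetical, and because it uses plain HTTP requests it does not include browser rendering time.

```python
import random
import statistics
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

LIMIT_SECONDS = 8.0  # rule-of-thumb limit from the post


def http_get(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def one_user(base_url, terms, searches, fetch, seed):
    """Simulate one user running `searches` random searches back to back."""
    rng = random.Random(seed)
    results = []
    for _ in range(searches):
        url = base_url + urllib.parse.quote(rng.choice(terms))
        start = time.perf_counter()
        try:
            fetch(url, timeout=LIMIT_SECONDS * 2)
            ok = True
        except OSError:  # connection errors, HTTP errors and timeouts
            ok = False
        results.append((time.perf_counter() - start, ok))
    return results


def hammer(base_url, terms, users=10, searches_per_user=5, fetch=http_get):
    """Run `users` simulated users concurrently and summarise what they saw."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_user, base_url, terms, searches_per_user, fetch, seed)
                   for seed in range(users)]
        samples = [s for f in futures for s in f.result()]
    latencies = sorted(t for t, ok in samples if ok)
    return {
        "requests": len(samples),
        "failures": sum(1 for _, ok in samples if not ok),
        "over_limit": sum(1 for t in latencies if t > LIMIT_SECONDS),
        "median_s": statistics.median(latencies) if latencies else None,
        "max_s": latencies[-1] if latencies else None,
    }
```

One way to use it is to call `hammer` with increasing `users` (10, 20, 40, ...) and note the first level at which `failures` or `over_limit` becomes non-zero.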

Q - Does anyone have any experience in this area? Even better, does
anyone have a script to do this? If not, and I do write one, would it be
of value to the wider community?

Sorry for the long-winded post. My search page and category search are
the most critical part of my site, and I am anal about their performance
because if they do not work, my site will not work.

Thanks once again for all your assistance. Sorry for any stupid or 
ignorant thoughts/remarks.

Ferret rocks!

Clare


-- 
Posted via http://www.ruby-forum.com/.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk