[ https://issues.apache.org/jira/browse/NUTCH-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated NUTCH-50:
-----------------------------------

    Fix Version/s: 2.0
                       (was: 1.2)

> Benchmarks & Performance goals
> ------------------------------
>
>                 Key: NUTCH-50
>                 URL: https://issues.apache.org/jira/browse/NUTCH-50
>             Project: Nutch
>          Issue Type: Task
>          Components: searcher
>         Environment: Linux, Windows
>            Reporter: byron miller
>            Assignee: Chris A. Mattmann
>             Fix For: 2.0
>
>
> I am interested in developing a strategy and toolset for benchmarking Nutch 
> search.  Please give your feedback on the following approaches, or offer 
> recommendations for setting standards and goals.
> Example test case(s):
> -- single node --
> JDK 1.4.x 32 bit/Linux Platform
> Single Node/2 gigs of memory
> Single Index/Segment
> 1 million pages
> JDK 1.4.x 32 bit/Linux Platform
> Single Node/2 gigs of memory
> Single Index/Segment
> 10 million pages
> -- dual node --
> JDK 1.4.2 32 bit/Linux Platform
> 2 Nodes/2 gigs of memory
> 2 Indexes/Segments (1 per node)
> 1 million pages
> -- test queries --
> * single term
> * term AND term
> * exact "small phrase"
> * lang:en term
> * term cluster
> -- standards --
> 10 results per page
> ---------------------
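> As a starting point, here is a minimal sketch of how that query mix could be 
> driven through the searcher API and timed.  It assumes the NutchBean/Query 
> classes from the searcher package; the concrete query strings are just 
> illustrative stand-ins for each case above.
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.nutch.searcher.Hits;
>     import org.apache.nutch.searcher.NutchBean;
>     import org.apache.nutch.searcher.Query;
>     import org.apache.nutch.util.NutchConfiguration;
>
>     public class QueryBenchmark {
>       public static void main(String[] args) throws Exception {
>         // One illustrative query string per case in the test-query list.
>         String[] queries = {
>           "apache",                  // single term
>           "apache nutch",            // term AND term (terms are ANDed by default)
>           "\"open source search\"",  // exact "small phrase"
>           "lang:en crawler",         // lang:en term (needs the query-lang plugin)
>           "distributed web crawler"  // term cluster
>         };
>         Configuration conf = NutchConfiguration.create();
>         NutchBean bean = new NutchBean(conf);
>         for (int i = 0; i < queries.length; i++) {
>           long start = System.currentTimeMillis();
>           // 10 results per page, per the standards above.
>           Hits hits = bean.search(Query.parse(queries[i], conf), 10);
>           long elapsed = System.currentTimeMillis() - start;
>           System.out.println(queries[i] + ": " + hits.getTotal()
>               + " hits in " + elapsed + " ms");
>         }
>       }
>     }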
> For me, a test case will help prove out scalability, bottlenecks, application 
> environments, settings, and such.  Given the number of customizations 
> available, we need to look at establishing the best baseline configuration 
> for a given number of documents, along with some kind of scaling curve.  For 
> example, a 10-node system may only scale X percent better for some reason, 
> and that reason is the bottleneck for that scenario.
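>
> For concreteness, one way to quantify that: define speedup = T_1 / T_N and 
> efficiency = speedup / N, where T_N is the average query latency on N nodes.  
> For example, if a benchmark query averages 800 ms on 1 node and 120 ms on 
> 10 nodes, speedup is 800/120, or about 6.7x, and efficiency is about 67%, so 
> roughly a third of the added capacity is being lost to the bottleneck.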
> Test cases would serve multiple purposes: measuring performance, response 
> time, and application stability. 
> Tools/possibilities:
> * JMX components
> * http://grinder.sourceforge.net/
> * JMeter
> * others???
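>
> If none of the above fits, a small hand-rolled load driver would also do.  
> Below is a minimal sketch using plain threads and HttpURLConnection; the 
> search.jsp URL and "query" parameter are assumptions about the deployed 
> search webapp and would need adjusting.
>
>     import java.io.InputStream;
>     import java.net.HttpURLConnection;
>     import java.net.URL;
>
>     public class SearchLoadDriver {
>       // Hypothetical endpoint; adjust host, port, and path to the deployment.
>       static final String BASE = "http://localhost:8080/search.jsp?query=";
>       static final int THREADS = 4;
>       static final int REQUESTS_PER_THREAD = 250;
>
>       public static void main(String[] args) throws Exception {
>         Thread[] workers = new Thread[THREADS];
>         for (int i = 0; i < THREADS; i++) {
>           workers[i] = new Thread(new Runnable() {
>             public void run() {
>               for (int j = 0; j < REQUESTS_PER_THREAD; j++) {
>                 try {
>                   long start = System.currentTimeMillis();
>                   URL url = new URL(BASE + "nutch");
>                   HttpURLConnection con = (HttpURLConnection) url.openConnection();
>                   InputStream in = con.getInputStream();
>                   byte[] buf = new byte[4096];
>                   while (in.read(buf) != -1) { /* drain the response */ }
>                   in.close();
>                   long elapsed = System.currentTimeMillis() - start;
>                   System.out.println(Thread.currentThread().getName()
>                       + ": " + elapsed + " ms");
>                 } catch (Exception e) {
>                   e.printStackTrace();
>                 }
>               }
>             }
>           });
>           workers[i].start();
>         }
>         for (int i = 0; i < THREADS; i++) {
>           workers[i].join();
>         }
>       }
>     }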
> ---------------------
> Query "stuffing" - use of dictionary that contains broad & vastly different 
> terms. Something that could be scripted as a "warm up" for production systems 
> as well.  Possibly combine terms from our logs of common search queries to 
> use as a benchmark?
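>
> A minimal warm-up sketch along those lines, assuming a hypothetical terms.txt 
> dictionary (one broad term per line, e.g. pulled from common-query logs) and 
> the same NutchBean API as above:
>
>     import java.io.BufferedReader;
>     import java.io.FileReader;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.nutch.searcher.NutchBean;
>     import org.apache.nutch.searcher.Query;
>     import org.apache.nutch.util.NutchConfiguration;
>
>     public class WarmUp {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = NutchConfiguration.create();
>         NutchBean bean = new NutchBean(conf);
>         BufferedReader reader = new BufferedReader(new FileReader("terms.txt"));
>         String term;
>         int count = 0;
>         while ((term = reader.readLine()) != null) {
>           // Results are discarded; the goal is to warm OS and index caches.
>           bean.search(Query.parse(term, conf), 10);
>           count++;
>         }
>         reader.close();
>         System.out.println("Warmed up with " + count + " queries");
>       }
>     }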
> What feedback/ideas do you have on building a good test case/stress-testing 
> system/framework?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
