On Tue, Dec 28, 2010 at 10:47 AM, Smiley, David W. <dsmi...@mitre.org> wrote:
> Presently, I’m working on Lucene’s benchmark contrib module to evaluate the
> performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon
> range queries), and then I’ll work on a more efficient probably non-geohash
> implementation but based on the same underlying concept of a hierarchical
> grid.  I’m using the geonames.org data set.  Unfortunately, the benchmark
> code seems very oriented to a generic title-body document whereas I’m
> looking to create lat-lon pairs… and furthermore to create documents
> containing multiple lat-lon pairs, and even furthermore a query generator
> that generates random box queries centered on a random location from the
> data set.  I seem to be stretching the benchmark framework beyond the
> use-case it was designed for and so perhaps it won’t be committable but at
> least I’ll have a patch for other geospatial birds-of-a-feather like you to
> use.
>
> Stretch away.  The Title/Body orientation is just a relic of what we have
> done in the past, it doesn't have to stay that way.

Just for reference, a couple of us are using a Python front-end to
contrib/benchmark that Mike developed:

http://code.google.com/p/luceneutil/

This is nice because it's designed for you to just declare 'competitors'
(two checkouts of solrcene); you then run the Python script and it
gives you the relative comparison. Because they are two different
checkouts, it's simple to compare different approaches, and each
checkout can run with a different index (e.g. different codecs or test
index format changes).

I thought it might be interesting to you because it tests a variety of
queries beyond the "standard" set, such as numeric range, sorting,
primary-key lookup, span queries, etc. The framework also ensures that
you are bringing back the same results in the same order, runs multiple
iterations (including iterations in new JVMs), makes it easy to test
optimized indexes, optimized with deletions, multi-segment, and
multi-segment with deletions, and can output text, HTML, or JIRA
format for convenience.
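To make the "same results in the same order" check concrete, here is a
minimal Python sketch of that kind of comparison. It is not luceneutil's
actual code: it assumes each competitor's hits for one query are
available as a list of hypothetical (doc_id, score) pairs, and it reports
the rank of the first disagreement.

```python
from typing import List, Optional, Tuple

# Hypothetical hit representation: (doc_id, score) as reported by one competitor.
Hit = Tuple[int, float]

def first_mismatch(baseline: List[Hit], candidate: List[Hit],
                   tol: float = 1e-5) -> Optional[int]:
    """Return the rank of the first differing hit, or None if both lists match."""
    for rank, ((doc_a, score_a), (doc_b, score_b)) in enumerate(zip(baseline, candidate)):
        # Same doc in the same slot, with scores equal up to a small tolerance.
        if doc_a != doc_b or abs(score_a - score_b) > tol:
            return rank
    if len(baseline) != len(candidate):
        # Common prefix matches but one competitor returned more hits.
        return min(len(baseline), len(candidate))
    return None
```

A score tolerance is used because two checkouts with different codecs or
scoring tweaks may legitimately differ in the last few bits of a float.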

Currently we are generally testing with a line file format from
Wikipedia, but besides geonames I wanted to point out that Wikipedia
does include lat/long information for many articles (this is a major
source for much of the geonames place data!).
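As a rough sketch of the extraction step, assuming the coordinates appear
as a {{coord}} template in the article wikitext (the common case, though
not the only form used on Wikipedia), something like the following
Python could pull out a decimal lat/long pair. The function names, and
the idea of appending two extra tab-separated columns to each line-file
row, are hypothetical; the real line-file layout would need to be decided.

```python
import re
from typing import List, Optional, Tuple

# Matches the first {{coord|...}} template; case-insensitive since both
# "coord" and "Coord" appear in wikitext.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^}]*)\}\}", re.IGNORECASE)
HEMISPHERES = ("N", "S", "E", "W")

def _to_decimal(parts: List[str], hemi: str) -> float:
    """Convert degree/minute/second parts (any prefix of them) to decimal degrees."""
    value = sum(float(p) / (60 ** i) for i, p in enumerate(parts))
    return -value if hemi.upper() in ("S", "W") else value

def parse_coord(wikitext: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from the first {{coord}} template, or None."""
    m = COORD_RE.search(wikitext)
    if not m:
        return None
    # Keep positional arguments only; named ones like display=title are dropped.
    args = [a.strip() for a in m.group(1).split("|") if "=" not in a]
    hemis = [i for i, a in enumerate(args) if a.upper() in HEMISPHERES]
    try:
        if len(hemis) >= 2:
            # Hemisphere form, e.g. {{coord|51|30|26|N|0|7|39|W}}
            ns, ew = hemis[0], hemis[1]
            lat = _to_decimal(args[:ns], args[ns])
            lon = _to_decimal(args[ns + 1:ew], args[ew])
        else:
            # Signed decimal form, e.g. {{coord|40.7128|-74.0060}}
            lat, lon = float(args[0]), float(args[1])
    except (ValueError, IndexError):
        return None
    return lat, lon

def add_latlon_columns(line: str, wikitext: str) -> str:
    """Hypothetical: append lat and lon as two extra tab-separated columns."""
    coord = parse_coord(wikitext)
    lat, lon = (str(coord[0]), str(coord[1])) if coord else ("", "")
    row = line.rstrip("\n")
    return f"{row}\t{lat}\t{lon}"
```

Articles without a parsable template get empty columns rather than being
dropped, so the document set stays identical to the existing line file.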

It would definitely be cool if we could test spatial queries with this
as well, e.g. by parsing out the lat/long from the Wikipedia XML,
adding it to the line files, and adding some spatial queries to the
default list of queries being tested.
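For the query side, a minimal sketch of the random box generator
described above (boxes centered on a randomly chosen location from the
data set) might look like this in Python. The point list, the
half-width parameter, and the (min_lat, min_lon, max_lat, max_lon)
tuple are assumptions; turning each tuple into an actual Lucene/Solr
spatial query is left out because that depends on the field type being
benchmarked.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)

def random_box_queries(points: Sequence[Tuple[float, float]], n: int,
                       half_width_deg: float, seed: int = 42) -> List[Box]:
    """Generate n boxes, each centered on a randomly chosen (lat, lon) point.

    Boxes are clamped to the valid lat/lon range rather than wrapped across
    the poles or the dateline, which keeps the sketch simple.
    """
    rng = random.Random(seed)  # fixed seed so every run sees the same queries
    boxes: List[Box] = []
    for _ in range(n):
        lat, lon = rng.choice(points)
        boxes.append((
            max(-90.0, lat - half_width_deg),
            max(-180.0, lon - half_width_deg),
            min(90.0, lat + half_width_deg),
            min(180.0, lon + half_width_deg),
        ))
    return boxes

if __name__ == "__main__":
    sample = [(51.5072, -0.1275), (40.7128, -74.006), (-33.8688, 151.2093)]
    for box in random_box_queries(sample, 3, 0.5):
        print(box)
```

A fixed seed keeps the query set identical across runs and competitors,
which is what makes a side-by-side comparison meaningful.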
