At 9:26 -0500 2/8/06, Susie Stephens wrote:
I will find out more about the Uniprot subgraph that we used for the
VLDB paper, and see if we can make it available.
However, I really like Eric Jain's offer of providing stable data
sets of different sizes for benchmarking. It makes sense to me to
have an independent organization providing the data sets.
Susie
I love this idea, but I would go a bit further: it would be even nicer for us
non-biologists if it also included some example queries to run (and
maybe even the correct answer sets). I think if that existed, we
could push some of the triple store developers to use it as a
benchmark, which would help both communities...
Eric Miller wrote:
On Feb 8, 2006, at 6:22 AM, Eric Jain wrote:
Ian Wilson wrote:
We will thus want to maintain a local copy of this extract (on
the wiki?) so changes in the graph don't change the benchmarking
results.
The data in http://www.isb-sib.ch/~ejain/rdf/data/ is indeed
updated every two weeks, but I could also provide some more stable
data sets for benchmarking if there is interest, perhaps with 1M,
10M and 100M triples?
I think this would be extremely useful for a variety of
communities trying to assess issues of scalability; the more
"connected" graph subsets for testing, the better.
thanks in advance!
-- eric miller http://www.w3.org/people/em/
semantic web activity lead http://www.w3.org/2001/sw/
w3c world wide web consortium http://www.w3.org/
--
Professor James Hendler Director
Joint Institute for Knowledge Discovery 301-405-2696
UMIACS, Univ of Maryland 301-314-9734 (Fax)
College Park, MD 20742 http://www.cs.umd.edu/~hendler
Web Log: http://www.mindswap.org/blog/author/hendler