At 9:26 -0500 2/8/06, Susie Stephens wrote:
I will find out more about the UniProt subgraph that we used for the VLDB paper, and see if we can make it available.

However, I really like Eric Jain's offer of providing stable data sets of different sizes for benchmarking. It makes sense to me to have an independent organization providing the data sets.

Susie




I love this idea, but I would go a bit further: it would be even nicer for us non-biologists if it also included some example queries to run (and maybe even the correct answer sets). I think if that existed, we could push some of the triple store developers to use it as a benchmark, which would help both communities.




Eric Miller wrote:



 On Feb 8, 2006, at 6:22 AM, Eric Jain wrote:


 Ian Wilson wrote:

We will thus want to maintain a local copy of this extract (on the wiki?) so changes in the graph don't change the benchmarking results.


The data in http://www.isb-sib.ch/~ejain/rdf/data/ is indeed updated every two weeks, but I could also provide some more stable data sets for benchmarking if there is interest, perhaps with 1M, 10M and 100M triples?


I think this would be extremely useful for a variety of communities trying to assess issues of scalability; the more "connected" graph subsets available for testing, the better.

 thanks in advance!

 -- eric miller                              http://www.w3.org/people/em/
 semantic web activity lead               http://www.w3.org/2001/sw/
 w3c world wide web consortium            http://www.w3.org/




--
Professor James Hendler                   Director
Joint Institute for Knowledge Discovery           301-405-2696
UMIACS, Univ of Maryland                          301-314-9734 (Fax)
College Park, MD 20742                    http://www.cs.umd.edu/~hendler
Web Log: http://www.mindswap.org/blog/author/hendler
