Andrea Splendiani wrote:
Hi,
In the context of a data-integration project, I'm doing some
preliminary analysis to see whether it makes sense to use a
triple-store to set up a backend/repository.
I have some experience with Jena, and I know of projects making use of
Virtuoso or Sesame.
However, I'm not aware of a review/benchmark of these systems that
covers both performance and features.
I've seen a few links like:
http://esw.w3.org/topic/LargeTripleStores
or
http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html
But I would like to know how these systems scale with large
knowledge bases (load/query).
Here is a rather old one:
http://simile.mit.edu/reports/stores/stores.pdf. A more current report
is: http://www.springerlink.com/content/m14k476lr726x1g2/
I would also like to get a rough intuition of how much sense it makes
to store data such as sequences and microarray values in them,
and how usable SPARQL is for querying on these values.
Is there anyone who can provide me with some good pointers?
Or is this an area that you think needs more exploration?
To me, the semantic web and ontologies have the potential to facilitate
meta-analysis of microarray data by helping researchers identify
comparable datasets, provided the metadata describing the
samples/experiments is richly captured. Using the semantic web to
represent large tables of measurement values might be overkill. Also,
it's difficult to compete with all the commercial and public tools that
already exist for large-scale microarray data querying and analysis.
Just my personal 2 cents.
-Kei
It seems to me that to the question "why did you use this triplestore?",
the usual answer is "I've tried a few and this one worked".
best,
Andrea Splendiani