I guess I asked the question wrong - the linked open data project
currently identifies a specific set of data resources that are linked
together - so the "entity" is definable - I didn't mean to ask how
big the whole Semantic Web is - I meant how many triples are in this
particular group - the set that are described on http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
I've been able to download pictures of this graph every few months or
so, and you can see the number of datasets growing, but the last
published number of triples for the thing (as stated on that page) is
from over a year ago, and a whole bunch of stuff has been added and
some of these have grown a lot - so we have a publicly shared, large-
scale, RDF data resource that can be used for benchmarking, trying
different interfaces and new technologies, etc.
So it would be really nice to get a number every now and then so we
could plot growth, explain to people what is in it better, etc.
I know, I know, I know all the technical reasons this is relatively
meaningless, but I gotta tell you, when I hear someone say "20 billion
triples," I can tell you it causes people to pay attention --
problem is I would like to use a number that has some validity before
I start quoting it....
On Nov 20, 2008, at 5:12 AM, Michael Hausenblas wrote:
My 2c in order to capture this for others as well:
http://community.linkeddata.org/MediaWiki/index.php?HowBigIsTheDangedThing
Cheers,
Michael
----------------------------------------------------------
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland
----------------------------------------------------------
Jim Hendler wrote:
So I've been to a number of talks lately where the size of the
current (Sept 08 diagram) Linked Open Data cloud, in triples, has
been stated - with numbers that vary quite widely. The esw wiki
says 2B triples as of 2007, which isn't very useful given the
growth we've seen in the past year -- I've also seen the various
blog posts and mail threads saying why we shouldn't cite meaningless
numbers and such - but frankly, I've recently been on a bunch of
panels with DB guys, and I'd love to have a reasonable number to
quote -- anyone have a good estimate of the size of the danged
thing (number of triples in the whole as an RDF graph would be
nice) -- would also be nice for general audiences where big numbers
tend to impress and for research purposes (for example, we know how
far we can compress the triples for an in memory approach we are
playing with, but we want to figure out how much memory we need for
the whole cloud - we want to know if we need to shell out for the
16G iphone)
anyway, if anyone has a decent estimate, or even a smart educated
guess, I'd love to hear it
JH
"If we knew what we were doing, it wouldn't be called research,
would it?." - Albert Einstein
Prof James Hendler http://www.cs.rpi.edu/~hendler
Tetherless World Constellation Chair
Computer Science Dept
Rensselaer Polytechnic Institute, Troy NY 12180