Sandro Hawke wrote:
We all like to think "p2p", distributed, etc.,
but the fact is that we love it too much, disregarding the basic
economic reasons that underlie how the world (in fairness) works.

But let's put a constraint.

Let's imagine that we don't live forever and that the time one should
work on a topic should be limited (e.g. 10 years is a good span; I
began in 2002, so 3 years left). Don't you want to see some actual
advantage delivered to the end user within this timeframe? I do, and
very strongly.

Yes, I thought it a little ironic that you, of all people, were being
cast as a centralist.  (I'm sure no insult was intended by anyone, of
course.)  In practice, yes, we'd all love more decentralization, if we
could have it for free, but sometimes it's impractically expensive.

Let me try to be more clear about my use case, though.  I am in no way
complaining about Google or Sindice; they are great.  But by their
nature (as I understand it, at least), they are not complete, and will
not be able to do one particular (important) thing I want.

I'd like to be able to run queries like this: tell me all showings of
Star Trek in Cambridge, MA, on 2009-05-17.  (I'm not talking about the
natural language part of that; I just want to be able to run the SPARQL
equivalent of that natural language query.)  And I really do want the
answer to be complete; if a showing is missing from my result set,
that's because that showing is not being properly published.  (Right
now, Google has a special mechanism, different from its normal search
engine, to handle this particular example, because it's so compelling.
I want something general, of course, that handles all queries -- not
just movie times.)

I think this is doable if by "properly published" we include the notion
of backlinking.  I propose this rule: whenever you publish some RDF, you
must notify all the backlink servers for all the URIs you use in your
content.
Sandro,

Amen re. backlinks, and they should even exist where the source isn't RDF :-) This is basically what I mean by the "owl:shameAs" pattern, since in due course it "shames" the original data owner into considering structured data granularity by making impression opportunity costs palpable. It also provides attribution.
If you don't do that, your content will not be fully
searchable.  (In some cases, you will have to register a SPARQL
endpoint instead of numerous graphs.  This is part of what makes this
feasible.)

Yes.
So, I'm picturing a market for backlink servers.  Everyone minting URIs
for other people to use should pick some (probably two or three)
backlink servers.  They don't have to run the service themselves.  They
might or might not have to pay for the service, depending on how the
market evolves.
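A minimal sketch of the publisher's side of the proposed rule (notify the backlink servers for every URI you use), assuming each URI minter declares its chosen servers per namespace in some registry. The registry, all URLs, the vocabulary, and the function names below are hypothetical; IRIs are recognized naively as strings starting with http:// or https://.

```python
from collections import defaultdict

# Hypothetical registry: each URI minter lists the (two or three) backlink
# servers it picked for its namespace. All URLs here are made up.
BACKLINK_REGISTRY = {
    "http://example.org/films/": [
        "http://backlinks-a.example/",
        "http://backlinks-b.example/",
    ],
    "http://example.net/places/": ["http://backlinks-c.example/"],
}


def uris_in(triples):
    """Collect every IRI used anywhere in the published triples.

    Simplification (assumption): terms are plain strings, and any term that
    starts with http:// or https:// is treated as an IRI, not a literal.
    """
    found = set()
    for triple in triples:
        for term in triple:
            if term.startswith(("http://", "https://")):
                found.add(term)
    return found


def notification_messages(source_url, triples, registry):
    """Build one notification per backlink server the publisher must contact.

    Each message says: the document at source_url mentions these URIs.
    URIs whose namespace has no registered backlink server are skipped.
    """
    per_server = defaultdict(set)
    for uri in uris_in(triples):
        for prefix, servers in registry.items():
            if uri.startswith(prefix):
                for server in servers:
                    per_server[server].add(uri)
    return [
        {"server": server, "source": source_url, "uris": sorted(uris)}
        for server, uris in sorted(per_server.items())
    ]


if __name__ == "__main__":
    showing = "http://cinema.example/showings/42"
    triples = [
        (showing, "http://vocab.example/film", "http://example.org/films/star-trek"),
        (showing, "http://vocab.example/city", "http://example.net/places/cambridge-ma"),
        (showing, "http://vocab.example/date", "2009-05-17"),
    ]
    for msg in notification_messages("http://cinema.example/showings.rdf",
                                     triples, BACKLINK_REGISTRY):
        print(msg)
```

Here the cinema tells both of the film minter's servers and the place minter's one server about its showing; URIs in unregistered namespaces (the made-up vocab.example terms) generate no notification.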

So when I mention <http://lod.openlinksw.com/void/Dataset>, which is part of any <http://lod.openlinksw.com> instance (meaning: anyone will be able to make personal and service-specific variants in the cloud or in their own setup, etc.), plus discoverable SPARQL endpoints that expose these stats (which also include backlinks to the original data sources), I hope it's a little clearer how congruent the vision I espouse is with yours :-)

It might be that Sindice comes to dominate this market; they (you)
probably have the best base technology to use for it at the moment.  But
the point is that if there is a market, and a standard interface, then
the service can probably be relied upon.
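A sketch of what such a standard interface might look like, assuming Python and an in-memory toy provider. The `BacklinkService` protocol and its `notify`/`backlinks` methods are hypothetical, not an existing specification. Because any provider implementing the same two methods is interchangeable, a client can ask every server the URI minter registered and merge the answers.

```python
from typing import Protocol


class BacklinkService(Protocol):
    """Hypothetical standard interface a backlink provider would implement."""

    def notify(self, source: str, uris: list[str]) -> None:
        """Record that the document at `source` mentions each of `uris`."""
        ...

    def backlinks(self, uri: str) -> set[str]:
        """Return the documents reported as mentioning `uri`."""
        ...


class InMemoryBacklinkServer:
    """Toy provider (illustration only): an in-memory URI -> sources index."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = {}

    def notify(self, source: str, uris: list[str]) -> None:
        for uri in uris:
            self._index.setdefault(uri, set()).add(source)

    def backlinks(self, uri: str) -> set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self._index.get(uri, set()))


def all_backlinks(uri: str, services: list[BacklinkService]) -> set[str]:
    """Union the answers of every server registered for `uri`.

    Publishers notify all of a minter's servers, so merging their answers
    still finds a document even if one server missed its notification.
    """
    found: set[str] = set()
    for service in services:
        found |= service.backlinks(uri)
    return found
```

The union is the design choice that matters: a showing reported to any one of the minter's servers still appears in the result, which is what the completeness argument depends on.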

The market is too big for 1000 Googles. The network is scale-free, so no single entity can pull it off effectively; something will give. All we can do is build a federation that has user-configurable traversal paths (what happens when a user interacts with the Web, establishes a beachhead via a representation, and then beams SPARQL from there, covertly or overtly).

Put bluntly, the Google model is obsolete in this context, really :-)


--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software Web: http://www.openlinksw.com
