[
https://issues.apache.org/jira/browse/CLEREZZA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021485#comment-13021485
]
Henry Story commented on CLEREZZA-463:
--------------------------------------
" the local caches I think should be named differently from the remote graph."
Ideally, local graphs should have relative URIs. The coder would then not need
to know the local deployment hostname when developing his code. This is
similar to how RDF/XML allows one to write relative URIs in the XML, or how
one can use relative URLs in N3, or indeed in HTML: these let an editor see
how his documents link together on the file system before anyone publishes
them to a server. In fact, that is also how JSR 311 works. If this turns out
to be problematic with current RDF implementations, then at least the API
could hide whatever workaround is used to get past limitations in RDF stores,
such as prefixing a relative URI with http://zz.localhost.
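As a sketch of that workaround: the store could keep absolute names internally by resolving relative graph names against a fixed base, while application code only ever sees the relative form. The base http://zz.localhost/ and the object name below are illustrative assumptions, not Clerezza API:

```scala
import java.net.URI

object GraphNames {
  // Hypothetical internal base used only to satisfy RDF stores that
  // require absolute graph names; application code never sees it.
  val internalBase = new URI("http://zz.localhost/")

  // Relative name as written by the coder -> absolute name as stored.
  def toAbsolute(relativeName: String): URI =
    internalBase.resolve(relativeName)

  // Absolute name as stored -> relative name as seen by the coder.
  def toRelative(absoluteName: URI): URI =
    internalBase.relativize(absoluteName)
}
```

Both directions are needed so the store can answer queries in the same relative terms the code was written in.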
But in a world where local graphs had only relative URIs, things would be
good locally, yet communication with the external world would suffer, because
the local system would not know when foreign servers were speaking about it.
To be aware of global communication, the local system does need to know where
foreign documents link to the local documents. Though this is not such a big
deal either.
Remote graphs, as I mentioned, are most easily named after the remote
resource from which they come, especially in a graph database. One could
decide that every graph stored locally is just a temporary representation of
a remote graph, which would be useful for temporal reasoning - to keep track
of changes to versions of a resource, for example. Pushing that logic
further, one may want to distinguish, in a multi-user system, between graphs
for the same resource when requested by different people, as explained in
great detail in CLEREZZA-490. It is clear then that naming a remote graph by
the URL of the resource it represents is not the final solution. But unless
we have a function from (user, time, URI) -> graph, giving the graph the name
of the resource is certainly the easiest option, as it makes reading the
database a lot easier: one just needs to look at the name of a graph to know
where it came from.
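Naming remote graphs by the URL they came from makes the cache a simple map keyed by that URL. A minimal sketch; RemoteGraphCache and the toy Graph type are hypothetical names, not Clerezza API:

```scala
import java.net.URI
import scala.collection.mutable

// Toy stand-in for a real RDF graph type.
final case class Graph(triples: Set[(String, String, String)])

class RemoteGraphCache {
  // The graph is stored under the name of the resource it was fetched
  // from, so reading the store tells you immediately where it came from.
  private val byResource = mutable.Map.empty[URI, Graph]

  def store(resource: URI, graph: Graph): Unit =
    byResource(resource) = graph

  def lookup(resource: URI): Option[Graph] =
    byResource.get(resource)
}
```

The richer (user, time, URI) -> graph function would replace the single-URI key with a compound one, at the cost of this direct readability.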
The fully correct system for remote graphs would be something like this in N3:
:g202323 a :Graph;
   = { ... };
   :fetchedFrom <https://remote.com/>;
   :fetchedBy <http://bblfish.net/people/henry/card#me>;
   :representation <file:/tmp/repr/202323>;
   :httpMeta [ :etag "sdfsdfsddfs";
               :validTo "2012...."^^xsd:dateTime;
               # ... redirect info?
             ] .
:g202324 a :Graph;
   = { ... };
   :fetchedFrom <https://remote.com/>;
   :fetchedBy <http://farewellutopia.com/reto#me>;
   :representation <file:/tmp/repr/202324>;
   :httpMeta [ :etag "ddfsdfsddfd";
               :validTo "2012...."^^xsd:dateTime;
               # ... redirect info?
             ] .
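The same per-fetch metadata could be carried in code as a plain record. A sketch with hypothetical field names mirroring the N3 properties above:

```scala
import java.net.URI
import java.time.Instant

// HTTP caching metadata recorded at fetch time (:httpMeta in the N3).
final case class HttpMeta(etag: Option[String], validTo: Option[Instant])

// One fetched graph: where it came from, who fetched it, where the raw
// representation bytes were stored, and the HTTP metadata.
final case class FetchedGraph(
  fetchedFrom:    URI,      // :fetchedFrom
  fetchedBy:      URI,      // :fetchedBy - WebID of the requesting user
  representation: URI,      // :representation
  httpMeta:       HttpMeta) // :httpMeta
```

Keeping fetchedBy per record is what makes the per-user distinction of CLEREZZA-490 possible later without a schema change.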
This would allow the proxy to:
- be much more useful in debugging: when a remote document is broken, it can
help the user see where the problem is
- know when to fetch new remote representations
- know how to distinguish representations sent to different users
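The second point, knowing when to fetch a new representation, amounts to a freshness check over the stored HTTP metadata. A minimal sketch, assuming a hypothetical CacheEntry holding the ETag and expiry from the fetch:

```scala
import java.time.Instant

// Hypothetical cache metadata recorded when the graph was fetched.
final case class CacheEntry(etag: Option[String], validTo: Option[Instant])

object Freshness {
  // A representation with no expiry, or whose expiry has passed, must be
  // revalidated against the remote server; otherwise serve from cache.
  def mustRevalidate(entry: CacheEntry, now: Instant): Boolean =
    entry.validTo.forall(v => !now.isBefore(v))

  // When revalidating, send a conditional GET so that an unchanged
  // resource costs only a 304 Not Modified, not a full transfer.
  def conditionalHeaders(entry: CacheEntry): Map[String, String] =
    entry.etag.map(e => Map("If-None-Match" -> e)).getOrElse(Map.empty)
}
```
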
It is arguable that the better remote systems are written (the more RESTful
they are), the more the name of the remote graph can simply be the name of
the remote resource, since those will be names of unchanging entities. Given
that, we could start by naming remote graphs the RESTful way, as that is
easiest and most likely to force us to be RESTful ourselves.
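The issue description quoted below asks for a simple fetch() entry point with control over caching. A hypothetical sketch of what that interface could look like; none of these names are actual Clerezza API:

```scala
import java.net.URI

// How the caller wants the cache treated for this request.
sealed trait CachePolicy
case object Default    extends CachePolicy // honour ETag / valid-until
case object ForceFresh extends CachePolicy // always hit the remote server
case object LocalOnly  extends CachePolicy // never go over the network

// Either the graph of the remote resource, or why it could not be had.
sealed trait FetchResult
final case class GraphResult(triples: Set[(String, String, String)]) extends FetchResult
final case class Unavailable(reason: String) extends FetchResult

trait SemWebProxy {
  def fetch(resource: URI, policy: CachePolicy = Default): FetchResult
}
```

Returning Unavailable as a value rather than throwing covers the "return a message if the resource does not exist, or is unavailable" requirement directly.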
> create a SemWebProxy bundle
> ---------------------------
>
> Key: CLEREZZA-463
> URL: https://issues.apache.org/jira/browse/CLEREZZA-463
> Project: Clerezza
> Issue Type: New Feature
> Reporter: Henry Story
> Assignee: Henry Story
> Labels: cache, web, webid
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> A Semantic Web CMS like Clerezza is all going to be about fetching data from
> the web and using it to create interesting services. Fetching remote graphs
> should therefore be a simple and very reliable service. The service should
> act as a semantic web proxy/cache service. It should
> - be able to fetch a remote resource
> - return a local cached version if the remote resource has not been updated
> (this implies it should understand the logic of HTTP ETags, valid-until,
> and so on)
> - keep track of redirects
> - keep track of which resources are information resources and which are not
> (e.g. http://xmlns.com/foaf/0.1/knows is not an information resource but a
> relation, and so redirects to the ontology)
> - allow the user to specify whether he wants a fresh version fetched
> remotely, or to force use of the local version
> - return a graph of that remote resource
> - also return a message if the resource does not exist, or is unavailable
> Longer term:
> - be able to return graphs for how resources were in the past
> - fetch graphs as a user - so that it can authenticate with WebID to remote
> resources and get additional information
> - know how to get GRDDL transforms to make any XML easily transformable into
> graphs
> In my latest 'mistaken' checkin ( r1081290 which should have been a
> development branch really, but it's easier to fix now than to undo) this
> role is taken by the
> org.apache.clerezza.platform.users.WebDescriptionProvider, as a large part of
> this was correctly done there by Reto. So the proposal is that the proxy part
> of the WebDescriptionProvider should be moved to its own module, and that the
> WebDescriptionProvider should use that proxy service.
> This service will be needed for fetching web pages on the web. It should be
> built to be efficient and parallelisable. Perhaps Scala Actors are the right
> thing to use here (I am looking into this).
> Since this service should be usable by SSPs that need to use remote data, it
> should have a class containing a fetch() method that implements the
> WebRendering function https://issues.apache.org/jira/browse/CLEREZZA-356
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira