[
https://issues.apache.org/jira/browse/CLEREZZA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021485#comment-13021485
]
Henry Story commented on CLEREZZA-463:
--------------------------------------
" the local caches I think should be named differently from the remote graph."
Ideally, local graphs should have relative URIs. The coder would then not need
to know the local deployment hostname when developing his code. This is
similar to how RDF/XML allows one to write relative URIs in the XML, or how
one can use relative URLs in N3, or indeed in HTML: these let an editor see
how his documents link together on the file system before anyone publishes
them to a server. In fact, that is also how JSR 311 works. If this turns out
to be problematic with current RDF implementations, then at least the API
could hide whatever workaround is used to get past limitations in RDF stores,
such as prefixing a relative URI with http://zz.localhost.
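As a sketch of that workaround: the store could keep absolute names internally by resolving relative graph names against a fixed base, while application code only ever sees the relative form. The base http://zz.localhost/ and the object name below are illustrative assumptions, not Clerezza API:

```scala
import java.net.URI

object GraphNames {
  // Hypothetical internal base used only to satisfy RDF stores that
  // require absolute graph names; application code never sees it.
  val internalBase = new URI("http://zz.localhost/")

  // Relative name as written by the coder -> absolute name as stored.
  def toAbsolute(relativeName: String): URI =
    internalBase.resolve(relativeName)

  // Absolute name as stored -> relative name as seen by the coder.
  def toRelative(absoluteName: URI): URI =
    internalBase.relativize(absoluteName)
}
```

Both directions are needed so the store can answer queries in the same relative terms the code was written in.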
But in a world where local graphs had only relative URIs, things would be
good locally, yet communication with the external world would suffer, because
the local system would not know when foreign servers were speaking about it.
To be aware of global communication, the local system does need to know where
foreign documents link to the local documents. Though this is not such a big
deal either.
Remote graphs, as I mentioned, are most easily named after the remote
resource from which they come, especially in a graph database. One could
decide that every graph stored locally is just a temporary representation of
a remote graph, which would be useful for temporal reasoning - to keep track
of changes to versions of a resource, for example. Pushing that logic
further, one may want to distinguish, in a multi-user system, between graphs
for the same resource when requested by different people, as explained in
great detail in CLEREZZA-490. It is clear then that naming a remote graph by
the URL of the resource it represents is not the final solution. But unless
we have a function from (user, time, URI) -> graph, giving the graph the name
of the resource is certainly the easiest option, as it makes reading the
database a lot easier: one just needs to look at the name of a graph to know
where it came from.
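Naming remote graphs by the URL they came from makes the cache a simple map keyed by that URL. A minimal sketch; RemoteGraphCache and the toy Graph type are hypothetical names, not Clerezza API:

```scala
import java.net.URI
import scala.collection.mutable

// Toy stand-in for a real RDF graph type.
final case class Graph(triples: Set[(String, String, String)])

class RemoteGraphCache {
  // The graph is stored under the name of the resource it was fetched
  // from, so reading the store tells you immediately where it came from.
  private val byResource = mutable.Map.empty[URI, Graph]

  def store(resource: URI, graph: Graph): Unit =
    byResource(resource) = graph

  def lookup(resource: URI): Option[Graph] =
    byResource.get(resource)
}
```

The richer (user, time, URI) -> graph function would replace the single-URI key with a compound one, at the cost of this direct readability.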
The fully correct system for remote graphs would be something like this in N3:
:g202323 a :Graph;
   = { ... };
   :fetchedFrom <https://remote.com/>;
   :fetchedBy <http://bblfish.net/people/henry/card#me>;
   :representation <file:/tmp/repr/202323>;
   :httpMeta [ :etag "sdfsdfsddfs";
               :validTo "2012...."^^xsd:dateTime;
               # ... redirect info?
             ] .
:g202324 a :Graph;
   = { ... };
   :fetchedFrom <https://remote.com/>;
   :fetchedBy <http://farewellutopia.com/reto#me>;
   :representation <file:/tmp/repr/202324>;
   :httpMeta [ :etag "ddfsdfsddfd";
               :validTo "2012...."^^xsd:dateTime;
               # ... redirect info?
             ] .
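The same per-fetch metadata could be carried in code as a plain record. A sketch with hypothetical field names mirroring the N3 properties above:

```scala
import java.net.URI
import java.time.Instant

// HTTP caching metadata recorded at fetch time (:httpMeta in the N3).
final case class HttpMeta(etag: Option[String], validTo: Option[Instant])

// One fetched graph: where it came from, who fetched it, where the raw
// representation bytes were stored, and the HTTP metadata.
final case class FetchedGraph(
  fetchedFrom:    URI,      // :fetchedFrom
  fetchedBy:      URI,      // :fetchedBy - WebID of the requesting user
  representation: URI,      // :representation
  httpMeta:       HttpMeta) // :httpMeta
```

Keeping fetchedBy per record is what makes the per-user distinction of CLEREZZA-490 possible later without a schema change.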
This would allow the proxy to:
- be much more useful in debugging: when a remote document is broken, it can
help the user see where the problem is
- know when to fetch new remote representations
- know how to distinguish representations sent to different users
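The second point, knowing when to fetch a new representation, amounts to a freshness check over the stored HTTP metadata. A minimal sketch, assuming a hypothetical CacheEntry holding the ETag and expiry from the fetch:

```scala
import java.time.Instant

// Hypothetical cache metadata recorded when the graph was fetched.
final case class CacheEntry(etag: Option[String], validTo: Option[Instant])

object Freshness {
  // A representation with no expiry, or whose expiry has passed, must be
  // revalidated against the remote server; otherwise serve from cache.
  def mustRevalidate(entry: CacheEntry, now: Instant): Boolean =
    entry.validTo.forall(v => !now.isBefore(v))

  // When revalidating, send a conditional GET so that an unchanged
  // resource costs only a 304 Not Modified, not a full transfer.
  def conditionalHeaders(entry: CacheEntry): Map[String, String] =
    entry.etag.map(e => Map("If-None-Match" -> e)).getOrElse(Map.empty)
}
```
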
It is arguable that the better remote systems are written (the more RESTful
they are), the more the name of the remote graph can simply be the name of
the remote resource, since those will be names of unchanging entities. Given
that, we could start by naming remote graphs the RESTful way, as that is
easiest and most likely to force us to be RESTful ourselves.
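The issue description quoted below asks for a simple fetch() entry point with control over caching. A hypothetical sketch of what that interface could look like; none of these names are actual Clerezza API:

```scala
import java.net.URI

// How the caller wants the cache treated for this request.
sealed trait CachePolicy
case object Default    extends CachePolicy // honour ETag / valid-until
case object ForceFresh extends CachePolicy // always hit the remote server
case object LocalOnly  extends CachePolicy // never go over the network

// Either the graph of the remote resource, or why it could not be had.
sealed trait FetchResult
final case class GraphResult(triples: Set[(String, String, String)]) extends FetchResult
final case class Unavailable(reason: String) extends FetchResult

trait SemWebProxy {
  def fetch(resource: URI, policy: CachePolicy = Default): FetchResult
}
```

Returning Unavailable as a value rather than throwing covers the "return a message if the resource does not exist, or is unavailable" requirement directly.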
> create a SemWebProxy bundle
> ---------------------------
>
> Key: CLEREZZA-463
> URL: https://issues.apache.org/jira/browse/CLEREZZA-463
> Project: Clerezza
> Issue Type: New Feature
> Reporter: Henry Story
> Assignee: Henry Story
> Labels: cache, web, webid
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> A Semantic Web CMS like Clerezza is all going to be about fetching data from
> the web and using it to create interesting services. Fetching remote graphs
> should therefore be a simple and very reliable service. The service should
> act as a semantic web proxy/cache service. It should
> - be able to fetch a remote resource
> - return a local cached version if the remote resource has not been updated
> (this implies it should understand the logic of HTTP ETags, valid-until,
> and so on)
> - keep track of redirects
> - keep track of which resources are information resources and which are not
> (e.g. http://xmlns.com/foaf/0.1/knows is not an information resource but a
> relation, and so redirects to the ontology)
> - allow the user to specify whether he wants a fresh version fetched
> remotely, or to force use of the local version
> - return a graph of that remote resource
> - also return a message if the resource does not exist, or is unavailable
> Longer term:
> - be able to return graphs for how resources were in the past
> - fetch graphs as a user - so that it can authenticate with WebID to remote
> resources and get additional information
> - know how to get GRDDL transforms to make any XML easily transformable into
> graphs
> In my latest 'mistaken' checkin ( r1081290 which should have been a
> development branch really, but it's easier to fix now than to undo) this
> role is taken by the
> org.apache.clerezza.platform.users.WebDescriptionProvider, as a large part of
> this was correctly done there by Reto. So the proposal is that the proxy part
> of the WebDescriptionProvider should be moved to its own module, and that the
> WebDescriptionProvider should use that proxy service.
> This service will be needed for fetching web pages on the web. It should be
> built to be efficient and parallelisable. Perhaps Scala Actors are the right
> thing to use here (I am looking into this).
> Since this service should be usable by SSPs that need to use remote data, it
> should have a class containing a fetch() method that implements the
> WebRendering function https://issues.apache.org/jira/browse/CLEREZZA-356
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira