[jira] Commented: (CLEREZZA-463) create a SemWebProxy bundle

Henry Story (JIRA) Wed, 16 Mar 2011 04:30:55 -0700

    [ 
https://issues.apache.org/jira/browse/CLEREZZA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007428#comment-13007428
 ]


Henry Story commented on CLEREZZA-463:
--------------------------------------

Yes. I'll leave the big optimization pieces out of this as much as possible. 
The really important optimization we want is the one described here:

http://metacircular.wordpress.com/2007/02/07/towards-polite-http-retrieval-in-scala/

That is, we should remember the ETag and the valid-until (expiry) date of each 
resource we download, so that we can do conditional GETs.
It should also be possible to download multiple URLs in parallel.
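
To make the two points above concrete, here is a minimal sketch in plain Java. 
It is not Clerezza code: CachedEntry, ParallelFetch and the fetcher function 
are hypothetical names used only for illustration, and the actual HTTP call is 
left to whatever client the bundle ends up using. The only protocol facts 
assumed are that a stored ETag is sent back in an If-None-Match header, and 
that a server may answer such a request with 304 Not Modified.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical per-URL cache metadata; not part of Clerezza.
final class CachedEntry {
    final String etag;      // value of the ETag response header, or null
    final Instant expires;  // valid-until date taken from the response, or null

    CachedEntry(String etag, Instant expires) {
        this.etag = etag;
        this.expires = expires;
    }

    // Still within its validity period: the local copy can be served
    // without contacting the remote server at all.
    boolean isFresh(Instant now) {
        return expires != null && now.isBefore(expires);
    }

    // Request headers that turn a plain GET into a conditional GET.
    Map<String, String> conditionalHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) {
            headers.put("If-None-Match", etag);
        }
        return headers;
    }
}

// Hypothetical helper: applies a caller-supplied fetcher to many URLs at once.
final class ParallelFetch {
    // Starts one asynchronous task per URL, then collects the results
    // in the same order as the input list.
    static <T> List<T> fetchAll(List<String> urls, Function<String, T> fetcher) {
        List<CompletableFuture<T>> futures = new ArrayList<>();
        for (String url : urls) {
            futures.add(CompletableFuture.supplyAsync(() -> fetcher.apply(url)));
        }
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            results.add(future.join());
        }
        return results;
    }
}
```

The intended flow would be: if isFresh() is true, return the cached graph; 
otherwise send a GET with conditionalHeaders(), and keep the cached graph if 
the answer is 304. Whether Scala Actors or plain futures are the better fit 
for the parallel part is still open.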

Perhaps I should open a new issue called "optimize the SemWebProxy bundle"?

[1] 
http://fmpwizard-scala.posterous.com/using-apache-httpclient-authentication-in-sca

> create a SemWebProxy bundle
> ---------------------------
>
>                 Key: CLEREZZA-463
>                 URL: https://issues.apache.org/jira/browse/CLEREZZA-463
>             Project: Clerezza
>          Issue Type: New Feature
>            Reporter: Henry Story
>            Assignee: Henry Story
>              Labels: cache, web, webid
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> A Semantic Web CMS like Clerezza is all going to be about fetching data from 
> the web and using it to create interesting services. Fetching remote graphs 
> should therefore be a simple and very reliable service. The service should 
> act as a semantic web proxy/cache service. It should
> - be able to fetch a remote resource
> - return a local cached version if the remote resource has not been updated
>   (this implies it should understand the logic of HTTP ETags, valid-until, 
> and so on)
> - keep track of redirects
> - keep track of which resources are information resources and which are not 
> (e.g. http://xmlns.com/foaf/0.1/knows is not an information resource, but a 
> relation, and so redirects to the ontology)
> - allow the user to specify whether a fresh version should be fetched 
> remotely, or to force the use of the local version
> - return a graph of that remote resource
> - also return a message if the resource does not exist, or is unavailable
> Longer term:
> - be able to return graphs for how resources were in the past
> - fetch graphs as a user - so that it can authenticate with WebID to remote 
> resources and get additional information
> - know how to get GRDDL transforms to make any xml easily transformable into 
> graphs
> In my latest 'mistaken' checkin ( r1081290 which should have been a 
> development branch really, but it's easier to fix now than to undo) this 
> role is taken by the 
> org.apache.clerezza.platform.users.WebDescriptionProvider, as a large part of 
> this was correctly done there by Reto. So the proposal is that the proxy part 
> of the WebDescriptionProvider should be moved to its own module, and that the 
> WebDescriptionProvider should use that proxy service.
> This service will be needed for fetching pages from the web. It should be 
> built to be efficient and parallelisable. Perhaps Scala Actors are the right 
> thing to use here (I am looking into this).
> Since this service should be usable by SSPs that need to use remote data, it 
> should have a class containing a fetch() method that implements the 
> WebRendering function (see https://issues.apache.org/jira/browse/CLEREZZA-356).
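
As an illustration of the "keep track of redirects" and "information 
resource" requirements in the description above, here is a rough sketch in 
plain Java. It assumes the usual Linked Data reading of HTTP status codes (a 
2xx answer means the URI names an information resource, a 303 See Other means 
the thing is described by the document at the Location header). RedirectLog 
and its methods are hypothetical names, not existing Clerezza API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bookkeeping for what the proxy learned about each URI.
final class RedirectLog {
    enum Kind { INFORMATION_RESOURCE, DESCRIBED_BY_OTHER_DOCUMENT, UNKNOWN }

    // requested URI -> URI of the document that describes it
    private final Map<String, String> describedBy = new LinkedHashMap<>();

    // Records the outcome of one request and classifies the requested URI.
    // Other redirects (301, 302, 307) and errors are left as UNKNOWN here;
    // a fuller version would follow them and record each hop.
    Kind record(String requestedUri, int status, String location) {
        if (status >= 200 && status < 300) {
            return Kind.INFORMATION_RESOURCE;
        }
        if (status == 303 && location != null) {
            describedBy.put(requestedUri, location);
            return Kind.DESCRIBED_BY_OTHER_DOCUMENT;
        }
        return Kind.UNKNOWN;
    }

    // The document to load the graph from: the redirect target if one was
    // recorded, otherwise the URI itself.
    String documentFor(String uri) {
        return describedBy.getOrDefault(uri, uri);
    }
}
```

With something like this, a second request for http://xmlns.com/foaf/0.1/knows 
could go straight to the cached ontology document instead of repeating the 
redirect.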

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
