[
https://issues.apache.org/jira/browse/CLEREZZA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007428#comment-13007428
]
Henry Story commented on CLEREZZA-463:
--------------------------------------
Yes. I'll leave the big optimization pieces out of this as much as possible.
The really important optimization we want is the one described here:
http://metacircular.wordpress.com/2007/02/07/towards-polite-http-retrieval-in-scala/
That is, we should remember the ETag and the valid-until date of each resource
we download, so that we can do conditional GETs.
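
To make the conditional GET idea concrete, here is a minimal sketch of the
bookkeeping involved. It is not taken from the linked article or from Clerezza:
the class ConditionalGetSketch, its CacheEntry type and all method names are
assumptions made for illustration, and the actual HTTP transport is left out.
It only shows what to remember, which request header to send (If-None-Match),
and how a 304 or 200 answer would update the remembered entry.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: every name here is hypothetical, not part of Clerezza.
final class ConditionalGetSketch {

    /** What is remembered about a previously downloaded resource. */
    static final class CacheEntry {
        final String etag;        // value of the ETag response header, may be null
        final Instant validUntil; // e.g. parsed from the Expires header, may be null
        final String body;

        CacheEntry(String etag, Instant validUntil, String body) {
            this.etag = etag;
            this.validUntil = validUntil;
            this.body = body;
        }
    }

    /** True if the cached copy can be used without contacting the server at all. */
    static boolean isFresh(CacheEntry entry, Instant now) {
        return entry != null && entry.validUntil != null && now.isBefore(entry.validUntil);
    }

    /** Extra request headers that turn a plain GET into a conditional GET. */
    static Map<String, String> conditionalHeaders(CacheEntry entry) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (entry != null && entry.etag != null) {
            headers.put("If-None-Match", entry.etag);
        }
        return headers;
    }

    /** Applies the server's answer: 304 keeps the old body, 200 replaces it. */
    static CacheEntry update(CacheEntry old, int status, String etag,
                             Instant validUntil, String body) {
        if (status == 304 && old != null) {
            return new CacheEntry(etag != null ? etag : old.etag, validUntil, old.body);
        }
        if (status == 200) {
            return new CacheEntry(etag, validUntil, body);
        }
        return old; // other statuses (errors, redirects) are out of scope here
    }
}
```

A caller would first check isFresh(), then send the GET with
conditionalHeaders(), and finally pass the response status and headers to
update().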
It should also be possible to download multiple URLs in parallel.
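
One possible shape for the parallel download, sketched with the standard
java.util.concurrent classes rather than Scala Actors. The class
ParallelFetchSketch and the fetchOne parameter are assumptions for
illustration; fetchOne stands in for whatever actually retrieves a single URL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Sketch only: names are hypothetical, not part of Clerezza.
final class ParallelFetchSketch {

    /**
     * Fetches all URLs with at most maxParallel downloads running at once.
     * Results are keyed by URL in input order. If fetchOne throws, join()
     * rethrows the failure wrapped in a CompletionException.
     */
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetchOne,
                                        int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            // Submit every download first so that they run concurrently.
            Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                pending.put(url, CompletableFuture.supplyAsync(() -> fetchOne.apply(url), pool));
            }
            // Then wait for each result in turn.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, CompletableFuture<String>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Bounding the pool size keeps the proxy polite towards remote servers, which
fits the spirit of the article linked above.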
Perhaps I should open a new issue called "optimize the SemWebProxy bundle"?
[1]
http://fmpwizard-scala.posterous.com/using-apache-httpclient-authentication-in-sca
> create a SemWebProxy bundle
> ---------------------------
>
> Key: CLEREZZA-463
> URL: https://issues.apache.org/jira/browse/CLEREZZA-463
> Project: Clerezza
> Issue Type: New Feature
> Reporter: Henry Story
> Assignee: Henry Story
> Labels: cache, web, webid
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> A Semantic Web CMS like Clerezza is all about fetching data from the web and
> using it to create interesting services. Fetching remote graphs should
> therefore be a simple and very reliable service. The service should act as a
> semantic web proxy/cache service. It should:
> - be able to fetch a remote resource
> - return a locally cached version if the remote resource has not been updated
> (this implies it should understand the logic of HTTP ETags, valid-until,
> and so on)
> - keep track of redirects
> - keep track of which resources are information resources and which are not
> (e.g. http://xmlns.com/foaf/0.1/knows is not an information resource, but a
> relation, and so redirects to the ontology)
> - allow the user to specify whether a fresh version should be fetched
> remotely, or to force the use of the local version
> - return a graph of that remote resource
> - also return a message if the resource does not exist, or is unavailable
> Longer term:
> - be able to return graphs for how resources were in the past
> - fetch graphs as a user - so that it can authenticate with WebID to remote
> resources and get additional information
> - know how to get GRDDL transforms, so that any XML is easily transformable
> into graphs
> In my latest 'mistaken' check-in (r1081290, which should really have been on a
> development branch, but it is easier to fix now than to undo), this role is
> taken by org.apache.clerezza.platform.users.WebDescriptionProvider, as a large
> part of this was correctly done there by Reto. So the proposal is that the
> proxy part of the WebDescriptionProvider should be moved to its own module,
> and that the WebDescriptionProvider should use that proxy service.
> This service will be needed for fetching pages from the web. It should be
> built to be efficient and parallelizable. Perhaps Scala Actors are the right
> thing to use here (I am looking into this).
> Since this service should be usable by SSPs that need remote data, it
> should have a class containing a fetch() method that implements the
> WebRendering function (https://issues.apache.org/jira/browse/CLEREZZA-356).
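
As a rough illustration of the fetch() requirements in the description above
(cached copy, forced remote fetch, forced local copy, and a message when the
resource is missing), here is a small in-memory sketch. Nothing in it is
Clerezza API: SemWebProxySketch, CachePolicy and FetchResult are hypothetical
names, a String stands in for a real graph type, and redirect tracking and
freshness checks are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: all names are hypothetical; a String stands in for a graph.
final class SemWebProxySketch {

    enum CachePolicy { DEFAULT, FORCE_REMOTE, FORCE_LOCAL }

    /** Either a graph or a message explaining why there is none. */
    static final class FetchResult {
        final String graph;   // null on failure
        final String message; // null on success

        FetchResult(String graph, String message) {
            this.graph = graph;
            this.message = message;
        }
    }

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> remote; // returns null if unavailable

    SemWebProxySketch(Function<String, String> remote) {
        this.remote = remote;
    }

    FetchResult fetch(String uri, CachePolicy policy) {
        String cached = cache.get(uri);
        if (policy == CachePolicy.FORCE_LOCAL) {
            return cached != null
                    ? new FetchResult(cached, null)
                    : new FetchResult(null, "no local copy of " + uri);
        }
        if (policy == CachePolicy.DEFAULT && cached != null) {
            return new FetchResult(cached, null); // freshness check omitted here
        }
        String fetched = remote.apply(uri);
        if (fetched == null) {
            return new FetchResult(null, uri + " does not exist or is unavailable");
        }
        cache.put(uri, fetched);
        return new FetchResult(fetched, null);
    }
}
```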
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira