Hi all, during FOSS4G 2008 and the associated code sprint Justin and I had a look at the WPS specification and the current GeoServer community module, looking for a way to move it forward. This mail summarizes some of our findings and ideas. Don't read it as a "will do all of this by next month", but more as a set of wishful thoughts that I will probably try to turn into reality, slowly, in my spare time.
FIX EXISTING ISSUES

We explored the current WPS module and found a number of issues that we'd like to address:
- lack of unit testing
- lack of xml request support for GetCapabilities, lack of KVP support for Execute (neither of them is mandatory, but not having the choice of KVP for Execute is actually quite an annoyance, as KVP is often perceived as more immediate to use)
- the transmuters API seems overkill: it requires a class per type handled, and class level javadoc is missing. Justin has implemented on his notebook a tentative replacement that does the same job as the existing 12 classes in 3 simple classes and still leaves the door open to handle raster data (the current complex data handler simply assumes the data to be parsed is XML, but the input could be anything, such as a geotiff or a compressed shapefile). A rough sketch of the general shape is further down.
Once this is fixed, we can try to hook up reading feature collections from remote servers.

AVOID THE MIDDLE MAN IF POSSIBLE

We would like to avoid GML encoding/decoding (or coverage encoding/decoding) when possible, and directly access the datastores if the source is local. We can try to recognize that from the host, port and first part of the path, also dealing with proxies and the like, but as a first step we can just have a special URL that is resolved to the current geoserver, something like:

  .?service=wfs&version=1.0.0&request=GetFeature&typeName=topp:states

(think of it as a relative link: the WPS request is posted to http://host:port/geoserver/ows, so "." would represent exactly that URL).
Once we recognize that the request is local, we want to leverage the work the Dispatcher is already doing up to a point, that is, parsing the request, executing it, and returning the results without actually encoding them. For example, WFS returns a number of FeatureCollection objects out of GetFeature, and the same happens with WCS. We can then plug those feature collections directly into the process, allowing for more efficient handling of large amounts of data (second sketch further down).

SCALE UP WITH REAL REMOTE REQUESTS

If the request is really remote, we have to be prepared to parse whatever is coming in. Now, a process will generally need to access the content of a feature collection multiple times, so we either need to:
- store the feature collection in memory, hoping it's not too big
- store the feature collection in a persistent datastore (postgis, spatialite, even a shapefile would do), hand back a feature collection coming from that datastore, and eventually clean up the datastore when the Execute is done (meaning we need some extension allowing after-the-fact cleanup of those collections)
For starters we'll follow the first path, the second one is its natural evolution. A detail that bugs me is that the second option is inefficient for those processes that only need to access the feature collection once, streaming over it; for those it would be ok to use a streaming parser and just scan over the remote input stream once. We could create a marker interface to identify those processes and act accordingly (third sketch further down). If the remote input happens to be a coverage instead, we can just download it as is, put it into a temporary directory, and create a coverage out of it. Actually, for some processes we could again decide to pre-process it a little, such as tiling it, in order to get memory efficient processing as opposed to reading the coverage as one single big block of memory.
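To make some of the above a bit more concrete, here are a few very rough sketches. First, the transmuter replacement: I don't have Justin's prototype at hand, so the names and shape below are just made up to show the general idea, a small hierarchy handling literals and arbitrary byte streams instead of one class per supported type:

  // Purely illustrative, not Justin's actual code: one small hierarchy
  // instead of one transmuter class per supported type.
  import java.io.InputStream;
  import java.io.OutputStream;

  // Base class: every process input/output is handled by one of these
  public abstract class ProcessParameterIO {
      protected final Class<?> binding; // the Java type this handler produces/consumes

      protected ProcessParameterIO(Class<?> binding) {
          this.binding = binding;
      }

      public boolean canHandle(Class<?> type) {
          return binding.isAssignableFrom(type);
      }
  }

  // Literal values: plain strings converted to numbers, booleans and the like
  abstract class LiteralPPIO extends ProcessParameterIO {
      protected LiteralPPIO(Class<?> binding) {
          super(binding);
      }

      public abstract Object decode(String value) throws Exception;

      public abstract String encode(Object value) throws Exception;
  }

  // Complex values: anything travelling as a byte stream, so no XML assumption,
  // a geotiff or a zipped shapefile works just as well as GML
  abstract class ComplexPPIO extends ProcessParameterIO {
      protected final String mimeType;

      protected ComplexPPIO(Class<?> binding, String mimeType) {
          super(binding);
          this.mimeType = mimeType;
      }

      public abstract Object decode(InputStream input) throws Exception;

      public abstract void encode(Object value, OutputStream output) throws Exception;
  }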
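Second, the local reference shortcut from the AVOID THE MIDDLE MAN section. Again, nothing of this exists yet and the names are invented; it's only meant to show where the decision between in-process execution and real remote parsing would sit:

  // Invented names, just to show the shape of the idea
  abstract class InputReferenceResolver {

      // the WPS request is posted to http://host:port/geoserver/ows, so a
      // reference starting with "." points back at this very GeoServer
      boolean isLocal(String href) {
          return href.startsWith(".?");
      }

      Object resolve(String href) throws Exception {
          if (isLocal(href)) {
              // let the Dispatcher parse and execute the request, but stop
              // before encoding: we want the live FeatureCollection (or
              // coverage), not GML/GeoTIFF bytes to re-parse
              return executeLocally(href.substring(2));
          }
          // truly remote: download and parse whatever comes back
          return downloadAndParse(href);
      }

      // hooks to be implemented on top of the Dispatcher and the parsers
      protected abstract Object executeLocally(String kvpRequest) throws Exception;

      protected abstract Object downloadAndParse(String href) throws Exception;
  }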
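And third, the marker interface for single-pass processes mentioned in the SCALE UP section. The name is invented, and the whole point is that it carries no methods at all:

  // Processes implementing this only need one streaming pass over their
  // inputs, so the WPS can feed them straight from the parser instead of
  // building an in-memory or database backed copy first
  public interface StreamingProcess {
      // intentionally empty, it's just a marker
  }

The WPS would then simply do an instanceof check before deciding whether to materialize the inputs or stream them.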
STORE=TRUE

For long running processes it makes a lot of sense to actually support storing. Yet just storing the result is kind of unsatisfactory; my guess is that most client code would be interested in accessing the results of the computation using WMS/WFS. Following this line of thinking, it would be nice to allow two types of store operation mode:
* plain storage of the outputs on the file system (as the standard requires)
* storing the results in a datastore/coverage, registering the result in the catalog, and having the WPS response return a GetFeature or a GetCoverage as the "web accessible url" that store requires us to put in the response
Given that most of the computations have a temporary or per-user nature, it would be nice if we could decide whether to store the results in a per-session catalog that only the current user sees, and that eventually expires, killing the registered resources along with it, or in the full catalog. This could be done by adding a per-session catalog to be used side by side with the "global" one, and having the normal services use a wrapper that looks for data first in the session one, and then in the global one (rough sketch in the PS).

Well, that's it. Thoughts?

Cheers
Andrea
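PS: just to show what I mean by the session catalog wrapper, here is a very rough sketch; the method names are made up and the real catalog interface is obviously richer, it's only the lookup order that matters:

  // Invented names, only meant to show the lookup order
  interface SimpleCatalog {
      Object getLayerByName(String name);
  }

  class SessionAwareCatalog implements SimpleCatalog {
      private final SimpleCatalog session; // per-user, expires with the session
      private final SimpleCatalog global;  // the usual GeoServer catalog

      SessionAwareCatalog(SimpleCatalog session, SimpleCatalog global) {
          this.session = session;
          this.global = global;
      }

      // services keep talking to a single catalog; lookups hit the session
      // catalog first and fall back to the global one
      public Object getLayerByName(String name) {
          Object layer = session.getLayerByName(name);
          return layer != null ? layer : global.getLayerByName(name);
      }
  }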
