>
> I could make things more explicit on the main proposal page, but each
> section contains links to more detailed information, including the
> actual "patch" to be seen inline. Namely:
> <http://geoserver.org/display/GEOS/GSIP+69+-+API+Proposal>
> As it's all about new stuff, when it comes strictly to the API
> proposal, I judged it convenient to have the whole proposal seen
> inline. Yet there's also a link to the GitHub branch where the whole
> work lives: the API proposal, the code migration for exemplary use
> cases, and the alternative JDBC backend:
> <https://github.com/groldan/geoserver/tree/GSIP69>
>
>
All right, it took half a day to go through the proposal and the code.
The proposal direction is good: I agree with all the needs it expresses
and with the general idea that we should be able to filter, page and load
stuff interactively.
I strongly disagree with the idea that rolling a new filter API is better
than using the OGC one; this is a first show stopper for me.
The Predicate API is very limited and has no spatial filter support, and
since GeoServer core already depends heavily on GeoTools, the whole
reasoning about Predicate advantages is pretty empty. I actually see a
lot more weaknesses in rolling a new API:
- it violates the KISS principle, adding more stuff does not make anything
  simpler
- it does not make writing encoders any easier; on the contrary it demands
  more code to be written, while we already have a pretty simple way to
  split a complex OGC filter into a supported and an unsupported part, and
  lots of encoders that we can mimic and/or copy code from
- it does not avoid external dependencies, as GeoTools is already there
- it misses a lot of expressiveness: instead of writing clumsy Predicates
  that can only run in memory (since they are not "well known") we could
  actually use an API that can get translated to SQL (thinking about the
  name matching filters in the GUI here)
- the idea that the domain is different is quite surprising: most of the
  elements that grow in big numbers have a bbox attached to them, so they
  are indeed spatial. One of the things you normally want to do in a
  security subsystem is restrict access by geographic area, and we could
  not express that with Predicate
Moreover, with OGC filters it would be really easy to create a datastore
based catalog implementation if we wanted to, and it would be a much
better proof of concept than the current ones (more on this later).
The only drawback of Filter is that it is supposed to be a "closed" API,
with no way to implement a new filter, but that is actually less of a
limitation, since the model is rich and it is easily worked around by
implementing whatever filter function is missing; the little sketch below
shows what I mean.
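To make the above concrete, here is a minimal sketch using the GeoTools
FilterFactory2 that GeoServer already depends on. Mind, the property names
("name", "boundingBox", "id") and the "isSecured" function are just
placeholders I made up for illustration, not something in the proposal:

import java.util.Arrays;

import org.geotools.factory.CommonFactoryFinder;
import org.opengis.filter.Filter;
import org.opengis.filter.FilterFactory2;

public class CatalogFilterSketch {

    public static void main(String[] args) {
        FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2(null);

        // name matching like the GUI does, encodable to a SQL LIKE
        Filter byName = ff.like(ff.property("name"), "topo*", "*", "?", "\\");

        // spatial restriction, e.g. a security rule limiting access by area
        // ("boundingBox" is a placeholder for whatever bbox property the
        // catalog objects end up exposing)
        Filter inArea = ff.bbox("boundingBox", -10, 35, 30, 60, "EPSG:4326");

        // anything the model lacks can be expressed as a filter function
        // instead of a brand new Filter class ("isSecured" is made up)
        Filter secured = ff.equals(ff.function("isSecured", ff.property("id")),
                ff.literal(Boolean.TRUE));

        Filter query = ff.and(Arrays.asList(byName, inArea, secured));
        System.out.println(query);
    }
}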
Moving forward, I would advise against having to check whether the store
can sort data or not; it just makes the caller code more complicated and
forces it into workarounds when sorting is really needed.
In GeoTools we have code that merge-sorts using disk space if necessary,
and can sort any amount of data with a small memory footprint (of course,
at the price of performance).
It would be better to have a method that checks whether sorting can be
done fast instead: code that only needs sorting as an optimization can
leverage it or take an alternate path, while code that really needs
sorting just asks for it and has it done by the catalog implementation,
without having to repeat the fallback logic in every place that needs
sorting for good.
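Just to illustrate, in purely hypothetical interface terms (none of these
names are in the proposal), what I have in mind is something along these
lines:

import java.util.Iterator;

import org.opengis.filter.Filter;
import org.opengis.filter.sort.SortBy;

// Hypothetical sketch: the catalog always honours the requested sort
// order, falling back to a disk backed merge sort when the backend cannot
// sort natively, while callers that only want sorting as an optimization
// can check whether it comes cheap.
interface CatalogQuerySketch {

    // true if the backend can sort natively (and thus cheaply) on the property
    boolean canSortQuickly(Class<?> type, String propertyName);

    // always returns results in the requested order; the implementation
    // decides whether to push the sort down or merge-sort on disk
    <T> Iterator<T> query(Class<T> type, Filter filter, SortBy... sortOrder);
}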
One other small thing: these methods are supposed to access the file
system or the network, but they don't throw any exception... I can live
with that, most likely the calling code has nothing meaningful to do in
case of exception anyway, but I thought I'd point it out.
A thing that I find surprising instead is seeing no trace of a
transaction concept: if the intent is to move to a model where multiple
GeoServer instances share the same db and write to it in parallel, being
able to use transactions seems quite important; there is a need for
coordination that is not addressed by this proposal.
The modifications done below and above the API changes are simple proofs
of concept, meaning the validation of the API is low and the checks on
its quality are low as well, not something we'd want to fast track on a
code base that we want to make more stable.
Let's start with what's above the API. All we have is a handful of
examples, but the code base is largely unchanged. On one side this means
the new code is not going to be touched very much; on the other side it
means we get no benefit from the new API and we're not validating it or
its implementation at all. It looks like a trojan horse to get the higher
level modifications in later, which will actually destabilize the code
base when we are already in RC or bugfix release mode.
Moreover, several of the predicates created have no chance of being
encoded natively, since they are not "well known".
In fact the authorization subsystem should be changed too in order to
leverage the new scalability API, so that it returns a filter instead of
checking a single layer point by point (see the sketch after the next
list for the kind of thing I mean).
The same goes for the GUI filtering code, which:
- loads the full data set in memory in case the store is not able to sort
  on the desired attribute
- builds a predicate that is not encodable (with an OGC filter we could
  actually encode it instead)
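For the authorization part, something along these lines is what I have in
mind; the SecurityFilterSource name and the whole shape of it are made up
just to show the idea of merging access rules into the query instead of
post-filtering each item:

import org.geotools.factory.CommonFactoryFinder;
import org.opengis.filter.Filter;
import org.opengis.filter.FilterFactory2;

class SecuredQuerySketch {

    // hypothetical: the security subsystem expresses what the user may see
    // as a Filter (e.g. a workspace name match, or a BBOX for area limits)
    interface SecurityFilterSource {
        Filter accessFilterFor(String userName, Class<?> infoType);
    }

    static Filter secure(Filter userQuery, SecurityFilterSource security,
            String userName, Class<?> infoType) {
        FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2(null);
        // the access rules get ANDed with the caller's query, so the whole
        // thing can be encoded natively instead of checked layer by layer
        return ff.and(userQuery, security.accessFilterFor(userName, infoType));
    }
}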
The bits below the API are baffling too. Both the JE and JDBC
implementations are based on a key/value store where the value is the XML
dump of each CatalogInfo.
This makes the whole point about filter encoding moot, as there is almost
no filter being encoded down at the DB level.
Say I want to do a GetMap request with a single layer: we know the name,
yet we end up scanning the whole database, loading all the blobs, parsing
them back into memory and applying the filter in memory. Sure, it scales
to 100M items, but nobody will want to wait for the time it takes this
implementation to do a simple GetMap.
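For comparison, this is the kind of lookup I'd expect from a JDBC backed
catalog for that GetMap case; the table and column names ("object",
"type", "name", "blob") are invented, the point is just that the name
resolves via an indexed query instead of a full scan and deserialization
of every blob:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class LayerLookupSketch {

    // resolves a layer by name with a single indexed query, instead of
    // scanning the whole table and parsing every XML blob in memory
    String findLayerBlob(Connection cx, String layerName) throws SQLException {
        PreparedStatement ps = cx.prepareStatement(
                "SELECT blob FROM object WHERE type = ? AND name = ?");
        try {
            ps.setString(1, "LayerInfo");
            ps.setString(2, layerName);
            ResultSet rs = ps.executeQuery();
            try {
                return rs.next() ? rs.getString(1) : null;
            } finally {
                rs.close();
            }
        } finally {
            ps.close();
        }
    }
}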
I know they are community modules, but even a community module should
have a fighting chance of being used; this implementation seems so weak
that I don't believe anyone will actually want to use it, and in order to
validate the API we should have an implementation that actually makes use
of some of its concepts (some actual native filtering, for example).
(Little aside: nice to see BoneCP in the mix, I actually wanted to try
out that connection pool myself.)
Long story short, the proposal seems weak in some API points, and the
implementation is a proof of concept which I don't think should be
allowed to land on trunk right now.
But I want to offer a reasonable alternative: shall we move to a
time-boxed 6 month release cycle? 4 months to add new stuff, 2 months to
stabilize, rinse and repeat, pushing out 3-4 bugfix releases in the
stable series.
This way we don't have these long waits for new features to show up, and
this much needed scalability improvement can land on a stable series in
the next 8-9 months (assuming 2 months to release 2.2.0 and a 6 month
cycle), be vetted and improved so that it's an API and an implementation
we can all be proud of.
I really want to see this GSIP in, just not now and not in its current
state. And I'm willing to put forward resources to help make it become a
reality.
I really do hope that the rest of the PSC chimes in as well; this is an
important GSIP and it deserves other people's opinions (besides my
personal rants).
Ah, next week I'll also try to prepare a GSIP for the 6 month release
cycle, unless of course there are very negative reactions to the idea.
Cheers
Andrea
--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549
http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf
-------------------------------------------------------