Re: [Geotools-devel] distributed rendering

Andrea Aime Sun, 22 Feb 2015 00:53:17 -0800

On Mon, Feb 16, 2015 at 5:23 PM, Rich Fecher <rfec...@gmail.com> wrote:


>  I can dig into the concept of GetMapCallback building a DirectLayer -
> from that standpoint we will fit into the spirit of geoserver/geotools
> public APIs much better I think, and still have full control, similar to
> the full control I jumped on with the custom map output format approach.
> However, my concerns are that we want to re-use as much code as possible,
> and we'd prefer not to be a unique snowflake in the geotools ecosystem if
> we can make this concept of broader appeal.
>

To get this "solid" one would need an implementation that is part of
GeoTools. See my comment about maintaining compatibility towards something
that is not part of the API.
If it's small changes that can be locked down with tests I don't see a
particular problem. If instead we have to perform architectural changes
that would make maintenance harder for something that does not contribute
any direct value to GeoTools, then we have a more difficult proposition.


> As a direct layer, I feel we may essentially end up in the same boat of
> wanting to re-use components of the streaming renderer in ways that are not
> available through the public API, which in the end is not really a
> maintainable proposition for us.  But I also feel we will likely be a
> unique snowflake if we just talk specifically "WMS rendering" when we talk
> about exposing/expressing distributed computation intuitively in
> geoserver/geotools.  It may end up being hard to justify any further hooks
> than you have already suggested without the broader appeal of perhaps
> thinking about the problem generally as rendering is the special case of
> "my data store wants to expose distributed processing."
>
>
Yes, pushing on the idea of the store being willing to help with rendering
seems like a more natural way to go, since we have already made some effort
in that direction.


> Here's discussion for the group that I don't think is on this thread:
>
> Myself: I know I tried to use a render transform first, much like the use
>> case where I subsample/decimate the results.  One key difference that made
>> a render transform awkward to use for the distributed rendering was that
>> the style rules are outside of the render transform and you can have
>> multiple featuretypestyle's with different render transforms for each
>> layer.  I want to be able to send one query that projects the
>> processing/rendering of the data, local to the data and not do it multiple
>> times.  The render transform technique was outside of this scope.  I say
>> "processing" because I wonder if it makes sense to expose as part of the
>> public API something at the DataStore level or the FeatureCollection level
>> that somehow marks it as distributed, and the same intuitive hooks for
>> distributing WPS processes can be used for distributing rendering too?
>>
>>> Andrea: Datastore and collection are just data providers, if you have a
>>> good distributed implementation of them more power to you, but I don't
>>> see how marking them as parallel will benefit the user of the collection?
>>> As data providers, processing is not their job (again, if you are hiding
>>> processing below the surface, that's fine, the normal code should not be
>>> bothered by that though), but you can pass them FeatureVisitor, and
>>> there we might have a more natural integration with map/reduce
>>> style processes, provided we figure out some way to mark a visitor as
>>> something serializable (so that it can be sent to other nodes), maybe
>>> with an indication of the preferred distribution strategy (slice over
>>> space, time, attribute ranges?),
>>> and "reduceable" (given the results of the N distributed visits, how do
>>> I put them back together?).
>>> And the same could be of course used locally to leverage multiple
>>> cpu/cores.
>>
>>
> To clarify, yes we have a good distributed implementation of the data
> provider hooks. As a point of reference as to the backend approaches we're
> leveraging, essentially we are looking at initially supporting the stores
> that most closely resemble Google's BigTable (Accumulo right now, then
> HBase because it's very similar, then perhaps Cassandra, with the
> understanding that there could be a bit more to take on there).  Who knows,
> we may look into more in the future.  But the idea that our feature
> collection can stream back features and lots of them to a client (in this
> case geoserver) is not quite enough in our situation.  We need to be able
> to distribute the processing too.  I really think Andrea is getting at
> something with the visitor pattern, particularly when you look at the
> ContentFeatureSource accepts() method. The real win is if we can decouple
> the processing from the data source, because as a source I would want to
> run any process defined by a third party that can work on my feature data
> (or grid coverages :-) ...or am I getting too greedy there). I think the
> distribution strategy can be encapsulated by the data provider, but that
> "reduceable" part of how do I put the results back together really has to
> be encapsulated by the process.
>

One downside of this is that currently StreamingRenderer does not use
visitors, and if we were to make it do so, I guess the
visitor would be in charge of creating the RenderedFeature and feeding the
paint queue... which is probably not easy to
distribute either...
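
To make the visitor idea discussed above a bit more concrete, here is a minimal sketch in plain Java. Nothing in it is GeoTools API: `DistributableVisitor`, `newInstance`, `merge`, `CountMaxVisitor` and `SlicedStore` are hypothetical names, features are stood in for by plain integers, and the "nodes" are just slices of a list processed in a loop. It only illustrates the split of responsibilities: the store picks the distribution strategy, the visitor knows how to reduce.

```java
import java.io.Serializable;
import java.util.List;

/** Hypothetical contract: a visitor that can be shipped to nodes and reduced afterwards. */
interface DistributableVisitor<F, R> extends Serializable {
    /** Returns a fresh, empty visitor to run against one slice of the data. */
    DistributableVisitor<F, R> newInstance();

    void visit(F feature);

    /** The "reduce" step: folds another partial visitor's result into this one. */
    void merge(DistributableVisitor<F, R> other);

    R result();
}

/** Example visitor: counts the features and tracks the maximum value seen. */
class CountMaxVisitor implements DistributableVisitor<Integer, long[]> {
    private static final long serialVersionUID = 1L;
    private long count = 0;
    private long max = Long.MIN_VALUE;

    public DistributableVisitor<Integer, long[]> newInstance() {
        return new CountMaxVisitor();
    }

    public void visit(Integer feature) {
        count++;
        max = Math.max(max, feature);
    }

    public void merge(DistributableVisitor<Integer, long[]> other) {
        long[] partial = other.result();
        count += partial[0];
        max = Math.max(max, partial[1]);
    }

    public long[] result() {
        return new long[] { count, max };
    }
}

/** Stand-in for a store that owns the distribution strategy (here: fixed-size slices). */
class SlicedStore {
    private final List<Integer> features;
    private final int sliceSize;

    SlicedStore(List<Integer> features, int sliceSize) {
        this.features = features;
        this.sliceSize = sliceSize;
    }

    /** Visits each slice with its own visitor copy, then merges the partial results. */
    <R> R accepts(DistributableVisitor<Integer, R> visitor) {
        DistributableVisitor<Integer, R> total = visitor.newInstance();
        for (int start = 0; start < features.size(); start += sliceSize) {
            int end = Math.min(start + sliceSize, features.size());
            DistributableVisitor<Integer, R> partial = visitor.newInstance();
            for (Integer feature : features.subList(start, end)) {
                partial.visit(feature);
            }
            total.merge(partial);
        }
        return total.result();
    }
}
```

In a real setup each slice would run on a different node and the partial visitors would travel over the wire, which is why the sketch makes the visitor `Serializable`. A rendering visitor would be much harder to write than this one, since its partial result is an image plus paint-queue state rather than two numbers.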


>
> Re:
>
>> PS: in any case we might have a problem with labelling: if the style has
>> any label, the distributed rendering thing cannot be applied... I guess
>> your GetMapCallback could split the style in two parts and create two
>> layers, a traditional one for labels, and a distributed one for
>> everything else.
>>
>
> Did I miss something in assuming that as long as I composite the
> distributed layers in the same order as StreamingRenderer ordinarily does
> with the drawOptimized() method and MergeLayersRequest I can handle any
> style, including labels?  Of course I have to keep the images rendered by my
> distributed nodes separate for each featuretypestyle and for the labels,
> but then in my "reduceable" part, I composite the results for each
> featuretypestyle and the labels in the usual order, trying to re-use the
> existing StreamingRenderer pattern.
>

Yes, you did miss an important element: labeling with conflict resolution
is a global problem. A butterfly label in Australia can cause a storm in
USA labeling.
It's also why labeling mixes so badly with tiling: you have a label that
crosses a tile border and you don't know if the label will be visible in the
nearby tile, because the conflict resolution there is a separate process
from the current tile's.
And that's why by default, unless the vendor options say otherwise, we have
to throw away labels crossing the current map boundaries.

Now, as a result, if you distribute your label rendering, you get a
different labeling than a centralized system, with potentially a lot fewer
labels, depending on how many pieces you split your target image into.
Your optimization is not transparent anymore.
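
A toy illustration of the point, as a sketch only: labels are reduced to 1D pixel intervals, conflict resolution is a simple greedy pass in priority order, and each "node" drops labels that cross its own boundary. `Label`, `LabelPlacer`, `place` and `placeTiled` are made-up names, and the real label cache is far more involved than this.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical 1D label occupying pixels [start, end) along the x axis. */
class Label {
    final String text;
    final int start;
    final int end;

    Label(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }

    boolean overlaps(Label other) {
        return start < other.end && other.start < end;
    }
}

class LabelPlacer {
    /**
     * Greedy conflict resolution inside [min, max): labels are tried in list
     * (priority) order, and any label not fully inside the area is dropped.
     */
    static List<String> place(List<Label> labels, int min, int max) {
        List<Label> placed = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (Label label : labels) {
            if (label.start < min || label.end > max) {
                continue; // outside the area, or crossing its boundary
            }
            boolean conflict = false;
            for (Label other : placed) {
                if (label.overlaps(other)) {
                    conflict = true;
                    break;
                }
            }
            if (!conflict) {
                placed.add(label);
                names.add(label.text);
            }
        }
        return names;
    }

    /** Same labels, but each of the equal-width tiles resolves conflicts on its own. */
    static List<String> placeTiled(List<Label> labels, int width, int tiles) {
        List<String> names = new ArrayList<>();
        int tileWidth = width / tiles;
        for (int t = 0; t < tiles; t++) {
            names.addAll(place(labels, t * tileWidth, (t + 1) * tileWidth));
        }
        return names;
    }
}
```

With labels A = [10, 30), B = [40, 60) and C = [55, 75) on a 100 pixel wide map, the centralized pass keeps A and B (C loses to B). Split in two tiles, B crosses the border at 50 and is dropped, so C gets drawn instead: the output is A and C. Split in four tiles, only C survives.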


>
> Re:
>
>> In terms of how the initial request is handled, I am a little unclear on
>> the choices here.  I believe we can and should restrict ourselves to SLDs
>> which are 'easy to distribute'.  I think trading some style options for
>> fast WMS responses with data sets involving billions of features is
>> reasonable.  As we get a better handle on system performance, I'd be
>> interested in supporting more styling as we zoom into a range which we know
>> we can range render quickly enough.
>
>
> It seems to me to add an unnecessary level of complication to choose your
> style based on zoom level, when as Chris eloquently pointed out we're
> really going for "data size independence." We're solving the problem not of
> being restricted by our style, we're solving the problem of being
> restricted by the amount of data.  Really what we are trying to leverage
> here is twofold: distribution of processing and short-circuiting traversal
> (taking advantage of the fact that an image represents a finite amount of
> information, we're talking pixels here or more generally samples organized
> by bands in multiple dimensions).  You can only render a pixel once for a
> map request.
>

"You can only render a pixel once for a map request." is a statement that
is wrong most of the time. You need specific styles/conditions to apply
that kind of optimization, in particular:
* Your symbols are small enough
* Your geometry fits fully into a pixel

GeoTools ScreenMap is applied only under those conditions, it's not a
blanket optimization. I believe (hope) it's the same for your case.
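
The kind of check described above can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the actual GeoTools ScreenMap code: `PixelMask` and `shouldDraw` are hypothetical names, the bounding box is assumed to already be in pixel units and to include the symbol size, and "fits into a pixel" is approximated by "smaller than one pixel in both directions", using the pixel under the box center.

```java
import java.util.BitSet;

/** Simplified stand-in for a screen map: one bit per pixel that was already painted. */
class PixelMask {
    private final int width;
    private final int height;
    private final BitSet painted;

    PixelMask(int width, int height) {
        this.width = width;
        this.height = height;
        this.painted = new BitSet(width * height);
    }

    /**
     * Decides whether a feature needs to be drawn, given the bounding box of
     * its rendered symbol in pixel units. Only sub-pixel features are ever
     * skipped; anything larger is always drawn.
     */
    boolean shouldDraw(double minX, double minY, double maxX, double maxY) {
        boolean subPixel = (maxX - minX) < 1.0 && (maxY - minY) < 1.0;
        if (!subPixel) {
            return true; // the optimization does not apply
        }
        int x = (int) Math.floor((minX + maxX) / 2);
        int y = (int) Math.floor((minY + maxY) / 2);
        if (x < 0 || y < 0 || x >= width || y >= height) {
            return false; // center falls off screen, nothing visible to draw
        }
        int index = y * width + x;
        if (painted.get(index)) {
            return false; // an earlier sub-pixel feature already covers this pixel
        }
        painted.set(index);
        return true;
    }
}
```

Note the skip only happens for sub-pixel features. A large polygon, a thick stroke or a big point symbol overlapping an already painted pixel still has to be drawn, which is why "one render per pixel" does not hold in general.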

I don't have particular feedback for the rest of the discussion, just a
suggestion: whatever you propose, try to do so from the point of view of
how it benefits GeoTools and other users of GeoTools, and consider how it
will affect maintenance of the library, which is often left to the spare
time of the individual contributors.

Cheers
Andrea

-- 

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel
