>
> To get this "solid" one would need an implementation that is part of
> GeoTools. See my comment about maintaining compatibility towards something
> that is not
> part of the API.
> If it's small changes that can be locked down with tests I don't see a
> particular problem, if we have instead to perform architectural changes
> that would make
> maintenance harder for something that does not contribute any direct value
> to GeoTools, then we have a more difficult proposition.
I think we're on the same page. I like your recent comment on
geoserver-devel, which I think applies to GeoTools in this case as well:
"you basically have only two types of behaviors, those that have tests
failing when you change them, and those that get broken, it's just a matter
of time..." My hope is that this thread on the forum may eventually lead to
some sort of future roadmap or proposal that could make this behavior
maintainable and expand the flexibility of the API, particularly in enabling
third parties to take advantage of distributed systems (but of course I'd
also be happy if uDig's rendering can leverage any of this discussion).
One downside of this is that currently StreamingRenderer does not use
> visitors, and if we were to make it do so, I guess the
> visitor would be in charge of creating the RenderedFeature and feeding the
> paint queue... which is probably not easy to
> distribute either...
>
Here's how I envision this suggestion; please let me know if I'm on the
right track. Pulling from your previous discussion, the visitor would
essentially fit something like this description:
> another StreamingRenderer clone that can be serialized and sent over
> to the various nodes, I'd keep it purely descriptive instead, something
> like (by broad strokes):
>
> LayerRenderingContext {
> ReferencedEnvelope renderingArea;
> Rectangle paintArea;
> RenderingHints java2dHints;
> Map rendererHints;
> List<FeatureTypeStyle> styles;
> }
>
Each feature source would be handed a "StreamingRenderer clone" in the
accepts method, but they would all share a rendering request queue that a
single painter thread would work off.
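As a minimal sketch of that purely descriptive context, assuming placeholder field types instead of the real GeoTools classes (ReferencedEnvelope, Rectangle, FeatureTypeStyle, etc.) so it stays self-contained:

```java
import java.io.Serializable;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only -- not a GeoTools class. The fields mirror the
// quoted LayerRenderingContext, but use placeholder types so the sketch is
// self-contained and trivially serializable for shipping to data nodes.
class LayerRenderingContextSketch implements Serializable {
    double minX, minY, maxX, maxY;     // placeholder for ReferencedEnvelope renderingArea
    int paintWidth, paintHeight;       // placeholder for Rectangle paintArea
    Map<String, String> rendererHints; // placeholder for Map rendererHints
    List<String> styleDescriptors;     // placeholder for List<FeatureTypeStyle>
}
```

The point being that the context carries only data, so any node that already has the rendering code can reconstruct renderer state from it.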
The default implementation of every feature source would feed the paint
queue with rendering requests such as PaintShapeRequest, and you're right
that feeding a paint queue with these single-feature requests won't work
for the distributed rendering case, although just subsampling could still
work. But again, to subsample polygons and lines properly the data nodes
are basically doing the work of rendering them anyway, so it makes sense to
take that next step and just composite the rendered images as a reduce
step. The way I'd envision taking that next step is that my feature source
could serialize the StreamingRenderer clone, have the data nodes use the
clone to render the image, and then in the end feed the paint queue with a
single MergeLayersRequest instead of a rendering request per feature. One
side effect I see is that the RenderListener won't get events for each
feature rendered (though if the whole point of this is scalability, we'd
really have to compromise on that regardless).
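To sketch the two ways a feature source might feed the shared queue -- per-feature requests in the default case, one merge request per layer in the distributed case. The class names here are illustrative stand-ins, not the actual StreamingRenderer internals:

```java
import java.awt.image.BufferedImage;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the paint queue handoff described above.
class PaintQueueSketch {
    interface RenderRequest {}

    // Default path: the feature source enqueues one request per feature.
    static class PaintShapeRequest implements RenderRequest {
        final Object shape; // placeholder for a decimated geometry/shape
        PaintShapeRequest(Object shape) { this.shape = shape; }
    }

    // Distributed path: the reduce step enqueues a single request carrying
    // the images already rendered by the data nodes.
    static class MergeLayersRequest implements RenderRequest {
        final List<BufferedImage> nodeImages;
        MergeLayersRequest(List<BufferedImage> nodeImages) { this.nodeImages = nodeImages; }
    }

    // Shared by all feature sources; a single painter thread drains it.
    final BlockingQueue<RenderRequest> paintQueue = new LinkedBlockingQueue<>();
}
```

Either way the painter thread stays single-threaded and order-preserving; only what each request represents changes.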
Yes, you did miss an important element: labeling with conflict resolution
> is a global problem, a butterfly label in australia can cause a storm in
> USA labelling.
> It's also why labeling mixes so badly with tiling, you have a label that
> crosses a tile border and you don't know if the label will be visible in the
> nearby tile because the conflict resolution there is a separate process
> from the current tile one.
> And that's why by default, unless the vendor options say otherwise, we
> have to throw away labels crossing the current map boundaries.
>
> Now, as a result, if you distribute your label rendering, you get a
> different labelling than a centralized system, with potentially a lot less
> labels, depending on
> how many bits you split your target image into. Your optimization is not
> transparent anymore.
>
Yep, you're right: global conflict resolution would be better (more
accurate) done centrally. I'm probably reading too much into "depending on
how many bits you split your target image into," but to clarify for others:
we're not distributing different sections of the target image (like tiling)
- our target image is the full map request at each node. The data within
each node is rendered using the existing StreamingRenderer code (as much as
possible) and the labels are kept separate. In the end, the merge layers
request keeps all of the labels as the top layer. Except for this global
conflict resolution, the distributed side of labeling should work similarly
to centralized labeling - we have a range on our space filling curve for
each distributed result image, so we also make sure the order is consistent
when we merge. Each node will try to use the same bit of logic to
deconflict labels within its dataset (and each node's data is organized
spatially, so overlap between labels on different nodes may be slightly
alleviated). I missed this point initially, so I'm glad you pointed it out,
and I think our final implementation should probably at least give the
option of returning the labels (although subsampled) rather than rendering
them on the data node, so global conflict resolution can be done (less
optimized, more accurate).
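The consistent merge order I'm describing can be sketched with plain Java 2D (this is not GeoTools code, just an illustration of the reduce step): composite one image per feature type style in style order, then the label image last, mirroring the centralized layering:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Illustrative reduce step: merge distributed results in a stable order.
class MergeSketch {
    static BufferedImage merge(int w, int h, List<BufferedImage> styleImages,
                               BufferedImage labels) {
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        for (BufferedImage img : styleImages) {
            g.drawImage(img, 0, 0, null); // feature type styles, bottom-up
        }
        g.drawImage(labels, 0, 0, null);  // labels always composited last
        g.dispose();
        return out;
    }
}
```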
"You can only render a pixel once for a map request." is a statement that
> is wrong most of the time, you need specific styles/conditions to apply
> that kind of optimization,
> in particular:
> * Your symbols are small enough
> * Your geometry fits fully into a pixel
>
> GeoTools ScreenMap is applied only under those conditions, it's not a
> blanket optimization. I believe (hope) it's the same for your case.
>
You know, I realized after I sent that email that I misspoke... you won't
let me get away with anything :-). In that quote I should have quit while I
was ahead, because what I meant was "an image represents a finite amount of
information," and at best I oversimplified it (and generally was just
wrong) with the render/paint-a-pixel-once comment. You can paint a pixel
multiple times, and if you think of exposing blending functions like
glBlendFunc, or Java 2D composites, you'd have options for overwriting the
existing value, keeping it, or doing a number of different things
concerning alpha. In my concept of subsampling I try to separate that
concern as much as possible, so I essentially have a function that says
whether or not to subsample a pixel that was painted (so the only reason
I'd call it an oversimplification is that my version of "isPainted()" is
the resolution of this more complex function, rather than just "was it
touched once"). In the end the user just needs to know the use cases
(styles) for their layer that they might enable subsampling on (and
distributed rendering with subsampling might be considered a bit of an
advanced feature). I touched above on the options I expose to a user to
resolve my "isPainted" method: things like only subsampling on the topmost
feature type style, subsampling using a number of times painted, and
subsampling using a threshold alpha.
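Those options could be pinned down as a single predicate, something like the following (hypothetical names and signature, just to make the idea concrete):

```java
// Hedged sketch of the "isPainted" decision described above: the user picks
// a policy, and the subsampling logic asks one question -- should this pixel
// be treated as already painted? Thresholds and names are illustrative only.
class IsPaintedSketch {
    enum Policy { TOPMOST_STYLE_ONLY, PAINT_COUNT, ALPHA_THRESHOLD }

    static boolean isPainted(Policy policy, int paintCount, int alpha,
                             int minPaintCount, int minAlpha, boolean topmostStyle) {
        switch (policy) {
            case TOPMOST_STYLE_ONLY: return topmostStyle && paintCount > 0;
            case PAINT_COUNT:        return paintCount >= minPaintCount;
            case ALPHA_THRESHOLD:    return alpha >= minAlpha;
            default:                 return false;
        }
    }
}
```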
I think what you're describing, how ScreenMap limits the optimization to
symbols/geometries that are small (i.e. pixel/subpixel), is a subset of our
problem. Correct me if I'm wrong, but if our space filling curve value
represents more than a pixel (and our geometry is fully contained in that
space filling curve value), but all of the pixels "are painted" (again,
according to whatever function the user is comfortable with), we can skip
that range of space filling curve values and never need to read that
geometry. Considering the case where the entire target image is covered
with opaque pixels rendered from our topmost feature type style, the
subsample algorithm would skip to the end of the space filling curve range,
regardless of the size of the geometries represented (i.e. they may be much
larger than subpixel).
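In other words, the skip decision reduces to something like this (illustrative only; the real logic would work against the painted-pixel state for the pixels a curve range maps to):

```java
// Hypothetical sketch: a whole range of space filling curve values can be
// skipped -- no geometry in it read at all -- when every pixel that range
// maps to already satisfies the user's "isPainted" test. Unlike ScreenMap,
// this does not depend on the geometries being subpixel in size.
class SfcSkipSketch {
    static boolean canSkipRange(boolean[] paintedPixelsInRange) {
        for (boolean painted : paintedPixelsInRange) {
            if (!painted) return false; // at least one pixel still needs data
        }
        return true; // fully covered: skip the whole curve range
    }
}
```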
Don't have particular feedback for the rest of the discussion, just a
> suggestion, whatever you propose, try to do so from the point of view of
> how it benefits GeoTools and other
> users of GeoTools, and considering how it will affect maintenance of the
> library, which is often left to the spare time of the individual
> contributors.
>
Again, completely on the same page here. From my perspective of the
conversation at this point, the most promising idea so far may be the
re-use of the feature source accepts method with the streaming renderer
clone as a visitor. If this, or something like it, could start to become a
common place to implement more friendly functionality for distributed
processing, I think it could really benefit the community of users looking
into distributed computation. And if processes (WPS) could implement some
form of serialization and some form of reduce, perhaps a distributable
function could be decoupled from a distributable data source, which I think
could really benefit the API. Again, that's just my perspective, but I
wouldn't mind helping out if I get agreement from others.
Another long response...sorry!
Rich
On Sun, Feb 22, 2015 at 3:51 AM, Andrea Aime <andrea.a...@geo-solutions.it>
wrote:
> On Mon, Feb 16, 2015 at 5:23 PM, Rich Fecher <rfec...@gmail.com> wrote:
>
>> I can dig into the concept of GetMapCallback building a DirectLayer -
>> from that standpoint we will ill fit into the spirit of geoserver/geotools
>> public API's much better I think, and still have full control, similar to
>> the full control I jumped on with the custom map output format approach.
>> However, my concerns are that we want to re-use as much code as possible,
>> and we'd prefer not to be a unique snowflake in the geotools ecosystem if
>> we can make this concept of broader appeal.
>>
>
> To get this "solid" one would need an implementation that is part of
> GeoTools. See my comment about maintaining compatibility towards something
> that is not
> part of the API.
> If it's small changes that can be locked down with tests I don't see a
> particular problem, if we have instead to perform architectural changes
> that would make
> maintenance harder for something that does not contribute any direct value
> to GeoTools, then we have a more difficult proposition.
>
>
>> As a direct layer, I feel we may essentially end up in the same boat of
>> wanting to re-use components of the streaming renderer in ways that are not
>> available through the public API, which in the end is not really a
>> maintainable proposition for us. But I also feel we will likely be a
>> unique snowflake if we just talk specifically "WMS rendering" when we talk
>> about exposing/expressing distributed computation intuitively in
>> geoserver/geotools. It may end up being hard to justify any further hooks
>> than you have already suggested without the broader appeal of perhaps
>> thinking about the problem generally as rendering is the special case of
>> "my data store wants to expose distributed processing."
>>
>>
> Yes, pushing on the idea of the store willing to help rendering seems like
> a more natural way to go, since we already have made some effort in that
> direction.
>
>
>> Here's discussion for the group that I don't think is on this thread:
>>
>> Myself: I know I tried to use a render transform first, much like the use
>>> case where I subsample/decimate the results. One key difference that made
>>> a render transform awkward to use for the distributed rendering was that
>>> the style rules are outside of the render transform and you can have
>>> multiple featuretypestyle's with different render transforms for each
>>> layer. I want to be able to send one query that projects the
>>> processing/rendering of the data, local to the data and not do it multiple
>>> times. The render transform technique was outside of this scope. I say
>>> "processing" because I wonder if it makes sense to expose as part of the
>>> public API something at the DataStore level or the FeatureCollection level
>>> that somehow marks it as distributed, and the same intuitive hooks for
>>> distributing WPS processes can be used for distributing rendering too?
>>>
>>>> Andrea: Datastore and collection are just data providers, if you have a
>>>> good distributed implementation of them more power to you, but I don't
>>>> see how marking them as parallel will benefit the user of the
>>>> collection?
>>>> As data providers, processing is not their job (again, if you are
>>>> hiding processing below the surface, that's fine, the normal code should
>>>> not be
>>>> bothered by that though), but you can pass them FeatureVisitor, and
>>>> there we might have a more natural integration with map/reduce
>>>> style processes, provided we figure out some way to mark a visitor as
>>>> something serializable (so that it can be sent to other nodes), maybe
>>>> with an indication of the preferred distribution strategy (slice over
>>>> space, time, attribute ranges?),
>>>> and "reduceable" (given the results of the N distributed visits, how do
>>>> I put it back?).
>>>> And the same could be of course used locally to leverage multiple
>>>> cpu/cores.
>>>
>>>
>> To clarify, yes we have a good distributed implementation of the data
>> provider hooks. As a point of reference as to the backend approaches we're
>> leveraging, essentially we are looking at initially supporting the stores
>> that most closely resemble Google's BigTable (Accumulo right now, then
>> HBase because its very similar, then perhaps Cassandra, with the
>> understanding that there could be a bit more to take on there). Who knows,
>> we may look into more in the future. But the idea that our feature
>> collection can stream back features and lots of them to a client (in this
>> case geoserver) is not quite enough in our situation. We need to be able
>> to distribute the processing too. I really think Andrea is getting at
>> something with the visitor pattern, particularly when you look at the
>> ContentFeatureSource accepts() method. The real win is if we can decouple
>> the processing from the data source, because as a source I would want to
>> run any process defined by a third party that can work on my feature data
>> (or grid coverages :-) ...or am I getting too greedy there). I think the
>> distribution strategy can be encapsulated by the data provider, but that
>> "reduceable" part of how do I put the results back together really has to
>> be encapsulated by the process.
>>
>
> One downside of this is that currently StreamingRenderer does not use
> visitors, and if we were to make it do so, I guess the
> visitor would be in charge of creating the RenderedFeature and feeding the
> paint queue... which is probably not easy to
> distribute either...
>
>
>>
>> Re:
>>
>>> PS: in any case we might have a problems with labelling, if the style
>>> has any
>>> label the distributed rendering thing cannot be applied... I guess your
>>> GetMapCallback
>>> could split the style in two parts and create two layers, a traditional
>>> one
>>> for labels, and a distributed one for everything else.
>>>
>>
>> Did I miss something in assuming that as long as I composite the
>> distributed layers in the same order as StreamingRenderer ordinarily does
>> with the drawOptimized() method and MergeLayersRequest I can handle any
>> style, including labels? Of course have to keep the images rendered by my
>> distributed nodes separate for each featuretypestyle and for the labels,
>> but then in my "reduceable" part, I composite the results for each
>> featuretypestyle and the labels in the usual order, trying to re-use the
>> existing StreamingRenderer pattern.
>>
>
> Yes, you did miss an important element: labeling with conflict resolution
> is a global problem, a butterfly label in australia can cause a storm in
> USA labelling.
> It's also why labeling mixes so badly with tiling, you have a label that
> crosses a tile border and you don't know if the label will be visible in the
> nearby tile because the conflict resolution there is a separate process
> from the current tile one.
> And that's why by default, unless the vendor options say otherwise, we
> have to throw away labels crossing the current map boundaries.
>
> Now, as a result, if you distribute your label rendering, you get a
> different labelling than a centralized system, with potentially a lot less
> labels, depending on
> how many bits you split your target image into. Your optimization is not
> transparent anymore.
>
>
>>
>> Re:
>>
>>> In terms of how the initial request is handled, I am a little unclear on
>>> the choices here. I believe we can and should restrict ourselves to SLDs
>>> which are 'easy to distribute'. I think trading some style options for
>>> fast WMS responses with data sets involving billions of features is
>>> reasonable. As we get a better handle on system preformance, I'd be
>>> interested in supporting more styling as we zoom into a range which we know
>>> we can range render quickly enough.
>>
>>
>> It seems to me to add an unnecessary level of complication to choose your
>> style based on zoom level, when as Chris eloquently pointed out we're
>> really going for "data size independence." We're solving the problem not of
>> being restricted by our style, we're solving the problem of being
>> restricted by the amount of data. Really what we are trying to leverage
>> here is twofold: distribution of processing and short-circuiting traversal
>> (taking advantage of the fact that an image represents a finite amount of
>> information, we're talking pixels here or more generally samples organized
>> by bands in multiple dimensions). You can only render a pixel once for a
>> map request.
>>
>
> "You can only render a pixel once for a map request." is a statement that
> is wrong most of the time, you need specific styles/conditions to apply
> that kind of optimization,
> in particular:
> * Your symbols are small enough
> * Your geometry fits fully into a pixel
>
> GeoTools ScreenMap is applied only under those conditions, it's not a
> blanket optimization. I believe (hope) it's the same for your case.
>
> Don't have particular feedback for the rest of the discussion, just a
> suggestion, whatever you propose, try to do so from the point of view of
> how it benefits GeoTools and other
> users of GeoTools, and considering how it will affect maintenance of the
> library, which is often left to the spare time of the individual
> contributors.
>
> Cheers
> Andrea
>
> --
> ==
> GeoServer Professional Services from the experts! Visit
> http://goo.gl/NWWaa2 for more information.
> ==
>
> Ing. Andrea Aime
> @geowolf
> Technical Lead
>
> GeoSolutions S.A.S.
> Via Poggio alle Viti 1187
> 55054 Massarosa (LU)
> Italy
> phone: +39 0584 962313
> fax: +39 0584 1660272
> mob: +39 339 8844549
>
> http://www.geo-solutions.it
> http://twitter.com/geosolutions_it
>
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel