Hi Simone,

Thanks for the feedback!

As a quick response, for #1, I agree that using mosaicing / an image 
pyramid would be a great option.  I was mainly working at the prototype 
phase, and I wanted to have a discussion on the mailing lists 
(especially since changes are required in ImageIO-Ext or GeoTools and 
GeoServer).

For #2, I do like the idea of having a cache in the ImageInputStream.  
From that suggestion, I take it that you'd be willing to entertain 
changes to the current ImageInputStreams and the addition of some way 
to cache data.

In terms of caching, do you have any suggestions?  Also, I'd be 
interested in any advice for how we can configure that cache and make 
those options available to a GeoServer admin appropriately.
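
To make the caching question concrete, here is a minimal sketch of one
possible shape for a shared cache: fixed-size blocks keyed by source and
block index, held in a bounded LRU map that several streams could share.
All of the names (BlockCache, BlockLoader, maxBlocks) are hypothetical and
are not part of ImageIO-Ext or GeoTools; the loader stands in for a ranged
read from HDFS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical callback that fetches one block from the backing store (e.g. a ranged HDFS read). */
interface BlockLoader {
    byte[] load(String source, long blockIndex);
}

/** Hypothetical bounded LRU cache of blocks, meant to be shared by all streams. */
final class BlockCache {
    private final BlockLoader loader;
    private final Map<String, byte[]> blocks;

    BlockCache(final int maxBlocks, BlockLoader loader) {
        this.loader = loader;
        // accessOrder=true makes LinkedHashMap iterate from least to most recently used.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least recently used block once the bound is exceeded.
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the cached block, loading it on a miss. Callers wait while a load is in flight. */
    synchronized byte[] get(String source, long blockIndex) {
        String key = source + "#" + blockIndex;
        byte[] block = blocks.get(key);
        if (block == null) {
            block = loader.load(source, blockIndex);
            blocks.put(key, block);
        }
        return block;
    }

    /** Number of blocks currently held. */
    synchronized int size() {
        return blocks.size();
    }
}
```

The single coarse lock keeps the sketch short, but it also makes cache hits
wait behind a slow load.  A real implementation would probably want
per-block locking so that only the request which misses is blocked, and
maxBlocks is the kind of knob an admin would need to be able to set.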

Further, at a high-level, should the goal for this work be a community 
module?

Cheers,

Jim

On 04/22/2016 01:49 PM, Simone Giannecchini wrote:
> Dear Jim,
> quick feedback.
>
> First of all congratulations on making this work. As I suspected the
> bottleneck is getting the data out of HDFS.
> I can think about two things (which are not mutually exclusive):
>
> -1- Maybe complex, put smaller bits into HDFS and use the mosaic to
> serve or even develop a light(er)weight layer that can pull the
> granules.
>
> This would help with WMS requests over large files as you'll end up
> using smaller chunks to satisfy them most of the time
>
> -2- We could build a more complex ImageInputStream that:
>
> - has an internal cache (file and/or memory) that does not get thrown
> away upon each request but tends to live longer for each single file
> in HDFS
> - we would have different streams reuse the same cache. Multiple
> requests might read data from the cache concurrently but when data is
> not there, we would block the thread for the request, go back to HDFS,
> pull the data, write to the cache and so on
>
> We could put together 1 and 2 to make things faster.
>
> Hope this helps, anyway, I am in favour of exploring this in order to
> allow the GeoServer stack to support data from HDFS.
>
> Regards,
> Simone Giannecchini
> Ing. Simone Giannecchini
> @simogeo
> Founder/Director
>
> GeoSolutions S.A.S.
> Via di Montramito 3/A
> 55054  Massarosa (LU)
> Italy
> phone: +39 0584 962313
> fax:     +39 0584 1660272
> mob:   +39  333 8128928
>
> http://www.geo-solutions.it
> http://twitter.com/geosolutions_it
>
> On Sun, Apr 17, 2016 at 9:49 PM, Jim Hughes <jn...@ccri.com> wrote:
>> Hi all,
>>
>> I want to report on my success with registering and displaying GeoTiffs
>> stored on HDFS.  There are some limitations with this approach;
>> particularly, I am unsure if there's any way to cache / memory-map the
>> data.  As such, I believe each request is re-downloading the entire file.
>>
>> Generally, I hope to document my approach well enough so that others
>> could follow it (if needed) and to solicit feedback.  In terms of
>> feedback, I'd love to hear 1) if there are improvements, and 2) if the
>> changes are reasonable enough to be considered for a proposal/merge request.
>>
>> That out of the way, here's the rough outline:
>>
>> 1.  Register additional URL handlers.
>> 2.  Convince validation layers in GeoServer that 'hdfs' is an ok URL scheme.
>> 3.  Get bytes out of the HDFS file.
>>
>> For step 1, note that Java's URL scheme handling is pluggable via
>> java.net.URLStreamHandler.  The docs (1) point out that one can call
>> URL.setURLStreamHandlerFactory to set up a factory which provides such a
>> handler.  This method can only be called once per JVM, and folks from the
>> internet (2) do yoga since Tomcat already registers a factory.  They
>> seem to have missed the fact that the Tomcat factory actually lets you
>> add your own.  I provide a gist (3) to show a little bean which will
>> instantiate a Hadoop URL handler and try to install it using both of
>> those methods.
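
For readers without the gist at hand, the JVM-wide half of that
registration can be sketched as below.  This is not the gist's code:
DemoHandlerFactory and its made-up "demo" scheme are hypothetical stand-ins
for the Hadoop factory so that the example runs with the JDK alone, and
HandlerInstaller is likewise an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

/** Hypothetical factory for a made-up "demo" scheme; returning null leaves other schemes to the defaults. */
final class DemoHandlerFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"demo".equals(protocol)) {
            return null;
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL target) {
                return new URLConnection(target) {
                    @Override
                    public void connect() {
                        // Nothing to connect to in this toy handler.
                    }

                    @Override
                    public InputStream getInputStream() {
                        // Echo the URL path back as the "file contents".
                        byte[] body = getURL().getPath().getBytes(StandardCharsets.UTF_8);
                        return new ByteArrayInputStream(body);
                    }
                };
            }
        };
    }
}

/** Hypothetical helper which attempts the JVM-wide registration. */
final class HandlerInstaller {
    /** Returns true if the factory was installed, false if another factory was already set. */
    static boolean install(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error alreadySet) {
            // The JVM accepts a single factory; inside Tomcat this is where one
            // would fall back to the container's own hook for user factories.
            return false;
        }
    }
}
```

With the factory installed, new URL("demo://host/hello").openStream()
returns a stream over the bytes of "/hello"; a second call to install
returns false because the JVM slot is already taken.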
>>
>> There are two places I found in GeoServer which validate the URL given
>> in the page for adding a GeoTiff.  The first is the GeoServer
>> FileExistValidator which calls out to a Wicket UrlValidator. Telling the
>> Wicket class to allow_all_schemes knocks out that issue.  For the
>> second, in the FileModel, one needs to provide a happy path for URLs
>> which are not local to the filesystem.  Those two small changes are here
>> (4).
>>
>> Once GeoServer will register a GeoTiff coverage with a non-'file://'
>> URL, we need to read the bytes.  Javax has an abstract class,
>> javax.imageio.spi.ImageInputStreamSpi, which adapts between instances of
>> a particular class and an ImageInputStream.
>>
>> For my prototype, I wrote a subclass which takes a
>> string, checks if it starts with "hdfs", creates a URL, and returns new
>> MemoryCacheImageInputStream(url.openStream()).  The only problem with
>> this approach is that there is already an implementation which handles
>> Strings, and GeoTools's ImageIOExt tries the first one and skips any
>> others.  One can update that handling (5) slightly to try all the
>> handlers.  It'd probably be better to update (6) to try url.openStream
>> as a fallback.
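
A minimal sketch of such a subclass follows.  It is not the prototype's
actual code: the class name, vendor string and version are hypothetical,
and the scheme is a constructor argument (rather than a hard-coded "hdfs")
only so that the example can be exercised with a file: URL.  Returning null
for strings it does not recognise is a choice made here so that a caller
which tries several SPIs can move on to the next one.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Locale;
import javax.imageio.spi.ImageInputStreamSpi;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Hypothetical SPI which turns URL strings of one scheme into ImageInputStreams. */
final class SchemeStringImageInputStreamSpi extends ImageInputStreamSpi {
    private final String scheme;

    SchemeStringImageInputStreamSpi(String scheme) {
        // Vendor name and version are placeholders; String.class is the accepted input type.
        super("example-vendor", "0.1", String.class);
        this.scheme = scheme;
    }

    @Override
    public ImageInputStream createInputStreamInstance(Object input, boolean useCache, File cacheDir)
            throws IOException {
        if (!(input instanceof String)) {
            throw new IllegalArgumentException("Expected a String but got: " + input);
        }
        String spec = (String) input;
        if (!spec.startsWith(scheme + ":")) {
            // Not our scheme: decline so another SPI can be tried.
            return null;
        }
        // Resolving the URL relies on a stream handler for the scheme being registered.
        URL url = new URL(spec);
        // Bytes are buffered in memory as they are read; nothing outlives the stream.
        return new MemoryCacheImageInputStream(url.openStream());
    }

    @Override
    public String getDescription(Locale locale) {
        return "ImageInputStream over URL strings with scheme '" + scheme + "'";
    }
}
```

Because MemoryCacheImageInputStream only buffers what a single stream has
read, every new stream starts from an empty buffer, which is consistent
with the repeated downloads described below.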
>>
>> During testing, I worked with the sfdem.tif which ships with GeoServer.
>> The hdfs layer was a little slower than the local filesystem layer, but
>> it wasn't unusable.  To crank things up, I tried out a 600+ megabyte
>> GeoTiff from Natural Earth, and it was downright slow.  Using a network
>> monitor, I was able to observe network traffic consistent with the
>> entire file being re-read for most requests.  I think this approach may
>> be slightly useful for layers which are infrequently accessed and then
>> only by a few users.
>>
>> Thanks to everyone who had suggestions and encouragement for the
>> original thread!
>>
>> Cheers,
>>
>> Jim
>>
>> Step 1: Register additional URL handlers:
>>
>> 1.
>> http://download.java.net/jdk7/archive/b123/docs/api/java/net/URL.html#URL%28java.lang.String,%20java.lang.String,%20int,%20java.lang.String%29
>>
>> 2. http://skife.org/java/url/library/2012/05/14/java_url_handlers.html
>>
>> 3. Gist for a bean to register the Hadoop URL handlers:
>> https://gist.github.com/jnh5y/1739baa42466d66e383fa26ffd7235ca
>>
>> Step 2: GeoServer changes:
>> 4.
>> https://github.com/jnh5y/geoserver/commit/5320f26a0574f034433aa96097054ec1ec782d45
>> The FileModel change could be a little more robust.
>>
>> Step 3: GeoTools changes:
>> 5.
>> https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49
>>
>> Or one could modify the URL handling here:
>> 6.
>> https://github.com/geosolutions-it/imageio-ext/blob/master/library/streams/src/main/java/it/geosolutions/imageio/stream/input/spi/URLImageInputStreamSpi.java#L88-L97
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Find and fix application performance issues faster with Applications Manager
>> Applications Manager provides deep performance insights into multiple tiers 
>> of
>> your business applications. It resolves application problems quickly and
>> reduces your MTTR. Get your free trial!
>> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
>> _______________________________________________
>> GeoTools-Devel mailing list
>> GeoTools-Devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/geotools-devel

