Dear Jim,
quick feedback.

First of all, congratulations on making this work. As I suspected, the
bottleneck is getting the data out of HDFS.
I can think of two things (which are not mutually exclusive):

-1- Maybe complex: put smaller bits into HDFS and use the mosaic to
serve them, or even develop a light(er)weight layer that can pull the
granules.

This would help with WMS requests over large files, as you'll end up
using smaller chunks to satisfy them most of the time.
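
To make the "smaller chunks" point concrete, here is a minimal Java
sketch, assuming the large raster has been cut into a regular grid of
granules. GranuleGrid, its parameters and the granule_<col>_<row>.tif
naming are hypothetical names made up for illustration; a real mosaic
would consult its own index rather than do this arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical regular grid of granules; not an existing GeoTools class. */
final class GranuleGrid {
    private final double minX, minY, tileW, tileH;
    private final int cols, rows;

    GranuleGrid(double minX, double minY, double tileW, double tileH, int cols, int rows) {
        this.minX = minX;
        this.minY = minY;
        this.tileW = tileW;
        this.tileH = tileH;
        this.cols = cols;
        this.rows = rows;
    }

    /** Names of the granules that intersect (or touch) the box x0..x1, y0..y1. */
    List<String> granulesFor(double x0, double y0, double x1, double y1) {
        List<String> names = new ArrayList<>();
        double maxX = minX + cols * tileW;
        double maxY = minY + rows * tileH;
        if (x1 < minX || x0 > maxX || y1 < minY || y0 > maxY) {
            return names; // request is entirely outside the raster
        }
        int c0 = clamp((int) Math.floor((x0 - minX) / tileW), cols);
        int c1 = clamp((int) Math.floor((x1 - minX) / tileW), cols);
        int r0 = clamp((int) Math.floor((y0 - minY) / tileH), rows);
        int r1 = clamp((int) Math.floor((y1 - minY) / tileH), rows);
        for (int r = r0; r <= r1; r++) {
            for (int c = c0; c <= c1; c++) {
                names.add("granule_" + c + "_" + r + ".tif"); // hypothetical naming scheme
            }
        }
        return names;
    }

    /** Keeps a column or row index inside 0..count-1. */
    private static int clamp(int index, int count) {
        return Math.max(0, Math.min(count - 1, index));
    }
}
```

A request that falls inside one cell maps to a single granule, so only
that file has to come out of HDFS instead of the whole image.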

-2- We could build a more complex ImageInputStream that:

- has an internal cache (file and/or memory) that does not get thrown
away upon each request but tends to live longer for each single file
in HDFS
- lets different streams reuse the same cache. Multiple
requests might read data from the cache concurrently, but when data is
not there, we would block the thread for the request, go back to HDFS,
pull the data, write it to the cache, and so on
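
A minimal Java sketch of the shared cache idea, assuming data is cached
in fixed-size blocks keyed by file name and block index.
SharedBlockCache and BlockLoader are hypothetical names for this
illustration, not existing GeoTools or imageio-ext classes, and the
loader stands in for whatever actually reads from HDFS. The sketch has
no eviction and no file-backed tier.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical block cache that several streams over the same file could share. */
final class SharedBlockCache {

    /** Fetches one block from the slow store; must return an empty array past end of file. */
    interface BlockLoader {
        byte[] load(String file, long blockIndex) throws IOException;
    }

    private final ConcurrentMap<String, byte[]> blocks = new ConcurrentHashMap<>();
    private final BlockLoader loader;
    private final int blockSize;

    SharedBlockCache(BlockLoader loader, int blockSize) {
        this.loader = loader;
        this.blockSize = blockSize;
    }

    /** Returns the cached block, loading it at most once; callers of a missing block wait. */
    byte[] block(String file, long blockIndex) {
        String key = file + "#" + blockIndex;
        return blocks.computeIfAbsent(key, k -> {
            try {
                return loader.load(file, blockIndex);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Copies up to len bytes starting at pos into dst; returns the count, or -1 at end of file. */
    int read(String file, long pos, byte[] dst, int off, int len) {
        int copied = 0;
        while (copied < len) {
            long p = pos + copied;
            byte[] b = block(file, p / blockSize);
            int inBlock = (int) (p % blockSize);
            if (inBlock >= b.length) {
                break; // short or empty block: end of file reached
            }
            int n = Math.min(len - copied, b.length - inBlock);
            System.arraycopy(b, inBlock, dst, off + copied, n);
            copied += n;
        }
        return (copied == 0 && len > 0) ? -1 : copied;
    }
}
```

An ImageInputStream implementation could delegate its read calls to one
such cache per GeoServer instance. Blocks already present are served
without blocking, while ConcurrentHashMap.computeIfAbsent makes
concurrent requests for the same missing block wait for a single load.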

We could put together 1 and 2 to make things faster.

Hope this helps, anyway, I am in favour of exploring this in order to
allow the GeoServer stack to support data from HDFS.

Regards,
Simone Giannecchini
Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.
Via di Montramito 3/A
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax:     +39 0584 1660272
mob:   +39  333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


On Sun, Apr 17, 2016 at 9:49 PM, Jim Hughes <jn...@ccri.com> wrote:
> Hi all,
>
> I want to report on my success with registering and displaying GeoTiffs
> stored on HDFS.  There are some limitations with this approach;
> particularly, I am unsure if there's any way to cache / memory-map the
> data.  As such, I believe each request is re-downloading the entire file.
>
> Generally, I hope to document my approach well enough so that others
> could follow it (if needed) and to solicit feedback.  In terms of
> feedback, I'd love to hear 1) if there are improvements, and 2) if the
> changes are reasonable enough to be considered for a proposal/merge request.
>
> That out of the way, here's the rough outline:
>
> 1.  Register additional URL handlers.
> 2.  Convince validation layers in GeoServer that 'hdfs' is an ok URL scheme.
> 3.  Get bytes out of the HDFS file.
>
> For step 1, note that Java's URL scheme is pluggable via
> java.net.URLStreamHandler.  The docs(1) point out that one can call
> URL.setURLStreamHandlerFactory to set up a Factory to provide such a
> handler.  This method can only be called once, and folks from the
> internet (2) do yoga since Tomcat already registers a factory.  They
> seem to have missed the fact that the Tomcat factory actually lets you
> add your own.  I provide a gist (3) to show a little bean which will
> instantiate a Hadoop URL handler and try to install it using both of
> those methods.
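
A minimal Java sketch of the plain JVM registration path, assuming no
factory has been installed yet. DemoSchemeFactory and the "demo" scheme
are hypothetical and only echo the URL path back as bytes; for HDFS one
would presumably install Hadoop's
org.apache.hadoop.fs.FsUrlStreamHandlerFactory instead. The Tomcat
route mentioned above is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

/** Hypothetical factory that serves a made-up "demo" scheme. */
final class DemoSchemeFactory implements URLStreamHandlerFactory {

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"demo".equals(protocol)) {
            return null; // let the JVM fall back to its built-in handlers (http, file, ...)
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() {
                        // nothing to connect to in this illustration
                    }

                    @Override
                    public InputStream getInputStream() {
                        // echo the URL path as the "content" of the resource
                        byte[] body = u.getPath().getBytes(StandardCharsets.UTF_8);
                        return new ByteArrayInputStream(body);
                    }
                };
            }
        };
    }

    /** Tries to install the factory; returns false if the JVM already has one. */
    static boolean install() {
        try {
            URL.setURLStreamHandlerFactory(new DemoSchemeFactory());
            return true;
        } catch (Error alreadySet) {
            // setURLStreamHandlerFactory throws java.lang.Error on a second call
            return false;
        }
    }
}
```

The false return value is the case a container hits when it has already
registered its own factory, which is why a second registration path is
needed there.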
>
> There are two places I found in GeoServer which validate the URL given
> in the page for adding a GeoTiff.  The first is the GeoServer
> FileExistValidator which calls out to a Wicket UrlValidator. Telling the
> Wicket class to allow_all_schemes knocks out that issue.  For the
> second, in the FileModel, one needs to provide a happy path for URLs
> which are not local to the filesystem.  Those two small changes are here
> (4).
>
> Once GeoServer will register a GeoTiff coverage with a non-'file://'
> URL, we need to read the bytes.  The javax.imageio API has an abstract
> class, javax.imageio.spi.ImageInputStreamSpi, which adapts between
> instances of a particular class and an ImageInputStream.
>
> For my prototype, I wrote a subclass of this class which takes a
> string, checks if it starts with "hdfs", creates a URL, and returns new
> MemoryCacheImageInputStream(url.openStream()).  The only problem with
> this approach is that there is already an implementation which handles
> Strings, and GeoTools's ImageIOExt tries the first one and skips any
> others.  One can update that handling (5) slightly to try all the
> handlers.  It'd probably be better to update (6) to try url.openStream
> as a fallback.
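
A rough Java sketch of the kind of provider described above, assuming a
URL handler for the scheme is already installed. PrefixedUrlStringSpi
and its prefix parameter are hypothetical names for this illustration;
the actual prototype lives in the commits linked below and may differ.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;
import javax.imageio.spi.ImageInputStreamSpi;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Hypothetical provider: turns Strings with a given URL prefix into image input streams. */
final class PrefixedUrlStringSpi extends ImageInputStreamSpi {
    private final String prefix;

    PrefixedUrlStringSpi(String prefix) {
        super("example", "0.1", String.class); // vendor name, version, accepted input class
        this.prefix = prefix;
    }

    @Override
    public String getDescription(Locale locale) {
        return "Streams for String inputs starting with " + prefix;
    }

    @Override
    public ImageInputStream createInputStreamInstance(Object input, boolean useCache, File cacheDir)
            throws IOException {
        if (!(input instanceof String) || !((String) input).startsWith(prefix)) {
            throw new IllegalArgumentException("Unsupported input: " + input);
        }
        URL url = URI.create((String) input).toURL();
        // The wrapper caches what it has read in memory, but every new stream
        // starts again from url.openStream(); closing the wrapper does not
        // close the underlying stream.
        return new MemoryCacheImageInputStream(url.openStream());
    }
}
```

Constructed with the prefix "hdfs", this matches the description above.
Each call opens a fresh stream from the URL, which is consistent with
the observation that the file is re-read on most requests.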
>
> During testing, I worked with the sfdem.tif which ships with GeoServer.
> The hdfs layer was a little slower than the local filesystem layer, but
> it wasn't unusable.  To crank things up, I tried out a 600+ megabyte
> GeoTiff from Natural Earth, and it was downright slow.  Using a network
> monitor, I was able to observe network traffic consistent with the
> entire file being re-read for most requests.  I think this approach may
> be slightly useful for layers which are infrequently accessed and then
> only by a few users.
>
> Thanks to everyone who had suggestions and encouragement for the
> original thread!
>
> Cheers,
>
> Jim
>
> Step 1: Register additional URL handlers:
>
> 1.
> http://download.java.net/jdk7/archive/b123/docs/api/java/net/URL.html#URL%28java.lang.String,%20java.lang.String,%20int,%20java.lang.String%29
>
> 2. http://skife.org/java/url/library/2012/05/14/java_url_handlers.html
>
> 3. Gist for a bean to register the Hadoop URL handlers:
> https://gist.github.com/jnh5y/1739baa42466d66e383fa26ffd7235ca
>
> Step 2: GeoServer changes:
> 4.
> https://github.com/jnh5y/geoserver/commit/5320f26a0574f034433aa96097054ec1ec782d45
> The FileModel change could be a little more robust.
>
> Step 3: GeoTools changes:
> 5.
> https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49
>
> Or one could modify the URL handling here:
> 6.
> https://github.com/geosolutions-it/imageio-ext/blob/master/library/streams/src/main/java/it/geosolutions/imageio/stream/input/spi/URLImageInputStreamSpi.java#L88-L97
>
>
>
>
> ------------------------------------------------------------------------------
> Find and fix application performance issues faster with Applications Manager
> Applications Manager provides deep performance insights into multiple tiers of
> your business applications. It resolves application problems quickly and
> reduces your MTTR. Get your free trial!
> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> _______________________________________________
> GeoTools-Devel mailing list
> GeoTools-Devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geotools-devel
