Re: [Geotools-devel] [Geoserver-devel] Reading GeoTiffs from HDFS

2016-12-10 Thread Andrea Aime
On Tue, Nov 29, 2016 at 7:39 PM, Devon Tucker 
wrote:

> Hey everyone,
>
> Chatted with Jim about this a couple of weeks ago and I wanted to revisit it,
> since we'd like to do something similar with S3 instead of Hadoop, and many
> of the changes would be much the same.
>
> I'm interested in whether anyone has any objections to some of these
> changes. In particular, the change to GeoServer's File validation to allow
> URLs of any protocol, and especially this change to the code which searches
> for an appropriate ImageInputStreamSpi, as detailed here:
>
> https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195ba
> bc1abb6f49
>
> IMO it's a pretty sensible change; I think we do similar things elsewhere
> (catching exceptions from SPIs that we try and moving on even if they throw
> an exception).
>
> Thoughts anyone? Any reason not to do this? I can do a PR pretty quickly
> for it.
>

I don't have a clear view of the consequences, but I'm indeed skeptical, as
the check is done in a utility class that's used in
places other than the GeoTiff case you're focusing on.

I would prefer an approach that registers an ImageInputStreamSpi that
knows how to deal with a certain protocol instead, possibly placing it at a
lower priority to make sure the ones optimized for random-access files are
tried first.
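
For the sake of discussion, a rough sketch of what such a provider and its
registration could look like. The class name, vendor string and the
memory-backed stream below are illustrative assumptions, not existing
GeoTools code:

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Iterator;
import java.util.Locale;
import javax.imageio.spi.IIORegistry;
import javax.imageio.spi.ImageInputStreamSpi;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Hypothetical provider for a single remote protocol (e.g. "s3" or "hdfs"). */
public class ProtocolImageInputStreamSpi extends ImageInputStreamSpi {

    private final String protocol;

    public ProtocolImageInputStreamSpi(String protocol) {
        super("example", "1.0", URL.class); // accepts java.net.URL inputs
        this.protocol = protocol;
    }

    @Override
    public ImageInputStream createInputStreamInstance(Object input, boolean useCache, File cacheDir)
            throws IOException {
        URL url = (URL) input;
        if (!protocol.equalsIgnoreCase(url.getProtocol())) {
            return null; // not our protocol, let another provider handle it
        }
        // Simplest possible wrapper; a real one would stream and cache more carefully.
        return new MemoryCacheImageInputStream(url.openStream());
    }

    @Override
    public String getDescription(Locale locale) {
        return "ImageInputStream provider for " + protocol + ":// URLs (sketch)";
    }

    /** Registers the provider and orders every existing provider ahead of it. */
    public static void register(String protocol) {
        IIORegistry registry = IIORegistry.getDefaultInstance();
        ProtocolImageInputStreamSpi spi = new ProtocolImageInputStreamSpi(protocol);
        registry.registerServiceProvider(spi, ImageInputStreamSpi.class);
        Iterator<ImageInputStreamSpi> it =
                registry.getServiceProviders(ImageInputStreamSpi.class, false);
        while (it.hasNext()) {
            ImageInputStreamSpi other = it.next();
            if (other != spi) {
                // keep the file-optimized providers winning for plain files
                registry.setOrdering(ImageInputStreamSpi.class, other, spi);
            }
        }
    }
}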

Cheers
Andrea

-- 
==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.
==

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054  Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it



Re: [Geotools-devel] [Geoserver-devel] Reading GeoTiffs from HDFS

2016-11-29 Thread Devon Tucker
Hey everyone,

Chatted with Jim about this a couple of weeks ago and I wanted to revisit it,
since we'd like to do something similar with S3 instead of Hadoop, and many
of the changes would be much the same.

I'm interested in whether anyone has any objections to some of these
changes. In particular, the change to GeoServer's File validation to allow
URLs of any protocol, and especially this change to the code which searches
for an appropriate ImageInputStreamSpi, as detailed here:

https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49

IMO it's a pretty sensible change; I think we do similar things elsewhere
(catching exceptions from SPIs that we try and moving on even if they throw
an exception).

Thoughts anyone? Any reason not to do this? I can do a PR pretty quickly
for it.
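
For reference, the lookup pattern amounts to something like the sketch below;
the names are illustrative only, not the actual GeoTools utility or the code
in the linked commit:

import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.spi.IIORegistry;
import javax.imageio.spi.ImageInputStreamSpi;
import javax.imageio.stream.ImageInputStream;

public final class StreamSpiLookup {

    /** Returns the first stream any provider can build for the input, skipping providers that fail. */
    public static ImageInputStream firstWorkingStream(Object input) {
        IIORegistry registry = IIORegistry.getDefaultInstance();
        Iterator<ImageInputStreamSpi> spis =
                registry.getServiceProviders(ImageInputStreamSpi.class, true);
        while (spis.hasNext()) {
            ImageInputStreamSpi spi = spis.next();
            if (!spi.getInputClass().isInstance(input)) {
                continue; // this provider does not handle this kind of input at all
            }
            try {
                ImageInputStream stream = spi.createInputStreamInstance(
                        input, ImageIO.getUseCache(), ImageIO.getCacheDirectory());
                if (stream != null) {
                    return stream;
                }
            } catch (IOException | RuntimeException e) {
                // A provider that chokes on e.g. an hdfs:// or s3:// URL should not
                // abort the whole lookup; move on and try the next one.
            }
        }
        return null;
    }
}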

Cheers

On Fri, Apr 22, 2016 at 1:41 PM, Even Rouault 
wrote:

> On Friday, 22 April 2016 at 22:06:39, Jim Hughes wrote:
> > Hi Chris,
> >
> > Nice!  That's a fun find.
> >
> > Generally, I do like the idea of using Map/Reduce or Spark to
> > pre-generate tiles or an image pyramid.  We've kicked around the idea of
> > GWC + M/R a few times in passing.  If one has Hadoop infrastructure
> > hanging around, it might make sense to use GeoTrellis, SpatialHadoop
> > (GeoJini), etc. for some of that processing.
> >
> > Either way, being able to read the odd raster file straight from hdfs://
> > or s3:// and have it cached in memory seems like an amusing/useful
> > project.  I'm hopeful we can nail down the details.
>
> Probably a bit off topic, but in case it might be useful: GDAL can, for
> example, read remote HTTP files with most of its drivers through its
> /vsicurl/ (and, in the fresh new 2.1.0, /vsis3/) virtual file systems.
> Perhaps that could be used through the imageio-ext GDAL bridge.
>
> http://www.gdal.org/cpl__vsi_8h.html#a4f791960f2d86713d16e99e9c0c36258
> http://www.gdal.org/cpl__vsi_8h.html#a5b4754999acd06444bfda172ff2aaa16
> http://download.osgeo.org/gdal/workshop/foss4ge2015/workshop_gdal.html#__RefHeading__5995_1333016408
> https://sgillies.net/2016/04/05/rasterio-0-34.html
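
As a rough illustration of what /vsicurl/ gives you, a minimal sketch using
the GDAL Java bindings. It assumes the gdal jar and native library are
installed; the URL is a placeholder, and this is not the imageio-ext bridge
itself:

import org.gdal.gdal.Dataset;
import org.gdal.gdal.gdal;

public class VsiCurlSketch {
    public static void main(String[] args) {
        gdal.AllRegister(); // register all GDAL drivers, GeoTIFF included
        // /vsicurl/ makes GDAL read the remote file via HTTP range requests
        // instead of downloading it entirely; the URL below is a placeholder.
        Dataset ds = gdal.Open("/vsicurl/http://example.com/data/ortho.tif");
        if (ds == null) {
            System.err.println("GDAL could not open the remote file");
            return;
        }
        System.out.println("Opened remote GeoTIFF via /vsicurl/");
        ds.delete(); // release the native dataset handle
    }
}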
>
> >
> > Cheers,
> >
> > Jim
> >
> > On 04/22/2016 02:27 PM, Chris Snider wrote:
> > > I did find this reference (helpful ?):
> > > https://github.com/openreserach/bin2seq/blob/master/src/main/java/com/openresearchinc/hadoop/sequencefile/GeoTiff.java
> > >
> > > " /@formatter:off
> > > /**
> > >
> > >   *
> > >   * A program to demo retrive attributes from Geotiff images as Hadoop
> > >   SequenceFile stored on hdfs:// or s3://
> > >   *
> > >   *
> > >   * @author heq
> > >   */
> > >
> > > // @formatter:on"
> > >
> > > Chris Snider
> > > Senior Software Engineer
> > > Intelligent Software Solutions, Inc.
> > >
> > >
> > >
> > > -Original Message-
> > > From: Chris Snider [mailto:chris.sni...@issinc.com]
> > > Sent: Friday, April 22, 2016 12:11 PM
> > > To: Jim Hughes ; Simone Giannecchini 
> > > Cc: geoserver-de...@lists.sourceforge.net; GeoTools Developers list 
> > > Subject: Re: [Geoserver-devel] [Geotools-devel] Reading GeoTiffs from HDFS
> > >
> > > Hi,
> > >
> > > I don't know that much about HDFS, but is there something that can be
> > > set up like a map/reduce function directly on the HDFS servers that can
> > > do some of the restriction of the byte-level data returned? Yarn/Sparql,
> > > some other acronym?  I assume it would have to be an administrator's
> > > responsibility to add said process to the server stack, if it is even
> > > possible.
> > >
> > > Chris Snider
> > > Senior Software Engineer
> > > Intelligent Software Solutions, Inc.
> > >
> > >
> > > -Original Message-
> > > From: Jim Hughes [mailto:jn...@ccri.com]
> > > Sent: Friday, April 22, 2016 12:05 PM
> > > To: Simone Giannecchini 
> > > Cc: geoserver-de...@lists.sourceforge.net; GeoTools Developers list
> > > Subject: Re: [Geoserver-devel] [Geotools-devel] Reading GeoTiffs from HDFS
> > >
> > > Hi Simone,
> > >
> > > Thanks for the feedback!
> > >
> > > As a quick response, for #1, I agree that using mosaicing / an image
> > > pyramid would be a great option.  I was mainly working at the prototype
> > > phase, and I wanted to have a discussion on the mailing lists
> > > (especially since changes are required in ImageIO-Ext or GeoTools and
> > > GeoServer).
> > >
> > > For #2, I do like the idea of having a cache in the ImageInputStream.
> > >
> > > From that suggestion, I take it that you'd be willing to entertain
> > > changes to the current ImageInputStreams and the addition of some way
> > > to cache data.
> > >
> > > In terms of caching, do you have any suggestions?  Also, I'd be
> > > interested in any advice for how we can configure that cache and make
> > > those options available to a 

Re: [Geotools-devel] [Geoserver-devel] Reading GeoTiffs from HDFS

2016-04-22 Thread Jim Hughes
Hi Chris,

Nice!  That's a fun find.

Generally, I do like the idea of using Map/Reduce or Spark to 
pre-generate tiles or an image pyramid.  We've kicked around the idea of 
GWC + M/R a few times in passing.  If one has Hadoop infrastructure 
hanging around, it might make sense to use GeoTrellis, SpatialHadoop 
(GeoJini), etc. for some of that processing.

Either way, being able to read the odd raster file straight from hdfs:// 
or s3:// and have it cached in memory seems like an amusing/useful 
project.  I'm hopeful we can nail down the details.

Cheers,

Jim

On 04/22/2016 02:27 PM, Chris Snider wrote:
> I did find this reference (helpful ?):
> https://github.com/openreserach/bin2seq/blob/master/src/main/java/com/openresearchinc/hadoop/sequencefile/GeoTiff.java
>
> " /@formatter:off
> /**
>   *
>   * A program to demo retrive attributes from Geotiff images as Hadoop 
> SequenceFile stored on hdfs:// or s3://
>   *
>   *
>   * @author heq
>   */
> // @formatter:on"
>
> Chris Snider
> Senior Software Engineer
> Intelligent Software Solutions, Inc.
>
>
>
> -Original Message-
> From: Chris Snider [mailto:chris.sni...@issinc.com]
> Sent: Friday, April 22, 2016 12:11 PM
> To: Jim Hughes ; Simone Giannecchini 
> 
> Cc: geoserver-de...@lists.sourceforge.net; GeoTools Developers list 
> 
> Subject: Re: [Geoserver-devel] [Geotools-devel] Reading GeoTiffs from HDFS
>
> Hi,
>
> I don't know that much about HDFS, but is there something that can be set up
> like a map/reduce function directly on the HDFS servers that can do some of
> the restriction of the byte-level data returned? Yarn/Sparql, some other acronym?
> I assume it would have to be an administrator's responsibility to add said
> process to the server stack, if it is even possible.
>
> Chris Snider
> Senior Software Engineer
> Intelligent Software Solutions, Inc.
>
>
> -Original Message-
> From: Jim Hughes [mailto:jn...@ccri.com]
> Sent: Friday, April 22, 2016 12:05 PM
> To: Simone Giannecchini 
> Cc: geoserver-de...@lists.sourceforge.net; GeoTools Developers list 
> 
> Subject: Re: [Geoserver-devel] [Geotools-devel] Reading GeoTiffs from HDFS
>
> Hi Simone,
>
> Thanks for the feedback!
>
> As a quick response, for #1, I agree that using mosaicing / an image
> pyramid would be a great option.  I was mainly working at the prototype
> phase, and I wanted to have a discussion on the mailing lists
> (especially since changes are required in ImageIO-Ext or GeoTools and
> GeoServer).
>
> For #2, I do like the idea of having a cache in the ImageInputStream.
> From that suggestion, I take it that you'd be willing to entertain
> changes to the current ImageInputStreams and the addition of some way
> to cache data.
>
> In terms of caching, do you have any suggestions?  Also, I'd be
> interested in any advice for how we can configure that cache and make
> those options available to a GeoServer admin appropriately.
>
> Further, at a high-level, should the goal for this work be a community
> module?
>
> Cheers,
>
> Jim
>
> On 04/22/2016 01:49 PM, Simone Giannecchini wrote:
>> Dear Jim,
>> quick feedback.
>>
>> First of all, congratulations on making this work. As I suspected, the
>> bottleneck is getting the data out of HDFS.
>> I can think of two things (which are not mutually exclusive):
>>
>> -1- Maybe complex: put smaller bits into HDFS and use the mosaic to
>> serve them, or even develop a light(er)weight layer that can pull the
>> granules.
>>
>> This would help with WMS requests over large files, as you'll end up
>> using smaller chunks to satisfy them most of the time.
>>
>> -2- We could build a more complex ImageInputStream that:
>>
>> - has an internal cache (file and/or memory) that does not get thrown
>> away upon each request but tends to live longer for each single file
>> in HDFS
>> - we would have different streams reuse the same cache. Multiple
>> requests might read data from the cache concurrently, but when data is
>> not there, we would block the thread for the request, go back to HDFS,
>> pull the data, write it to the cache and so on
>>
>> We could put together 1 and 2 to make things faster.
>>
>> Hope this helps, anyway, I am in favour of exploring this in order to
>> allow the GeoServer stack to support data from HDFS.
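
To make -2- a bit more concrete, below is a minimal sketch of such a
shared-cache stream. The BlockFetcher interface, the fixed block size and the
unbounded in-memory map are assumptions for illustration, not an existing
GeoTools or imageio-ext API:

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.imageio.stream.ImageInputStreamImpl;

public class CachedRemoteImageInputStream extends ImageInputStreamImpl {

    /** Pulls one block of bytes from the remote store (HDFS, S3, ...); assumed interface. */
    public interface BlockFetcher {
        byte[] fetchBlock(String path, long blockIndex, int blockSize) throws IOException;
    }

    private static final int BLOCK_SIZE = 1 << 20; // 1 MiB blocks, arbitrary choice

    // One cache per JVM, shared by every stream; key = "path#blockIndex".
    // A real implementation would bound it (LRU), maybe spill to disk, and block
    // concurrent readers of the same missing block instead of fetching it twice.
    private static final ConcurrentMap<String, byte[]> CACHE = new ConcurrentHashMap<>();

    private final String path;
    private final long length;
    private final BlockFetcher fetcher;

    public CachedRemoteImageInputStream(String path, long length, BlockFetcher fetcher) {
        this.path = path;
        this.length = length;
        this.fetcher = fetcher;
    }

    private byte[] block(long blockIndex) throws IOException {
        String key = path + "#" + blockIndex;
        byte[] cached = CACHE.get(key);
        if (cached != null) {
            return cached; // served from the shared cache, no trip to the remote store
        }
        byte[] fetched = fetcher.fetchBlock(path, blockIndex, BLOCK_SIZE);
        CACHE.putIfAbsent(key, fetched);
        return fetched;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == 1 ? one[0] & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        checkClosed();
        bitOffset = 0;
        if (streamPos >= length) {
            return -1; // end of the remote file
        }
        long blockIndex = streamPos / BLOCK_SIZE;
        int offsetInBlock = (int) (streamPos % BLOCK_SIZE);
        byte[] block = block(blockIndex);
        int available = (int) Math.min((long) (block.length - offsetInBlock), length - streamPos);
        int toCopy = Math.min(len, available);
        System.arraycopy(block, offsetInBlock, b, off, toCopy);
        streamPos += toCopy;
        return toCopy;
    }

    @Override
    public long length() {
        return length; // known up front in this sketch
    }
}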
>>
>> Regards,
>> Simone Giannecchini
>> ==
>> GeoServer Professional Services from the experts!
>> Visit http://goo.gl/it488V for more information.
>> ==
>> Ing. Simone Giannecchini
>> @simogeo
>> Founder/Director
>>
>> GeoSolutions S.A.S.
>> Via di Montramito 3/A
>> 55054  Massarosa (LU)
>> Italy
>> phone: +39 0584 962313
>> fax: +39 0584 1660272
>> mob:   +39  333 8128928
>>
>> http://www.geo-solutions.it
>> http://twitter.com/geosolutions_it
>>