Re: Processing Large Images in Spark?

2015-04-07 Thread Steve Loughran
On 6 Apr 2015, at 23:05, Patrick Young patrick.mckendree.yo...@gmail.com wrote: does anyone have any thoughts on storing a really large raster in HDFS? Seems like if I just dump the image into HDFS as is, it'll get stored in blocks all across the

Re: Processing Large Images in Spark?

2015-04-07 Thread andy petrella
Heya, You might be interested in looking at GeoTrellis. They use RDDs of Tiles to process big images, as Landsat ones can be (especially 8). However, I see you have only 1G per file, so I guess you only care about a single band? Or is it a reboxed pic? Note: I think the GeoTrellis image format is

Processing Large Images in Spark?

2015-04-06 Thread Patrick Young
Hi all, I'm new to Spark and wondering if it's appropriate to use for some image processing tasks on pretty sizable (~1 GB) images. Here is an example use case. Amazon recently put the entire Landsat 8 archive in S3: http://aws.amazon.com/public-data-sets/landsat/ I have a bunch of GDAL-based

Processing Large Images in Spark?

2015-04-06 Thread patrick.mckendree.young
-images-td6752.html Further, I'd like to have the imagery in HDFS rather than on the file system to avoid I/O bottlenecks if possible! Thanks for any ideas and advice! -Patrick