Martin,

This is extremely cool and much needed in the geospatial community! My company, DigitalGlobe, has done a lot of work in this area and has open sourced many of the packages that can be found on GitHub today. Rasdaman [1] and PostGIS Raster are other open source examples of how to do this in relational databases. We have also done a lot of research on how to store pixels and query for them in HBase/Hadoop and Elasticsearch. There are many options for this one!
Adam

[1] http://rasdaman.org/

On Tue, Nov 10, 2015 at 6:09 AM, Martin Desruisseaux <[email protected]> wrote:
> Hello all
>
> At the BigData Apache Conference in Budapest, I attended some
> meetings about exploiting geospatial big data using the SQL language. I
> thought that we could make some long-term plans that could impact the
> SIS-180 (Place a crude JDBC driver over Dbase files) work [1]. This
> email is not a request for any change now. This is just a proposal about
> some possible long-term plans.
>
> In one or two years, Apache SIS will hopefully have some DataStore
> implementations ready for production use. But we have a strong request
> for the capability to use DataStores with big-data technologies like
> Hadoop. This request increases the challenge of writing a SQL driver,
> since a sophisticated SQL driver would need to be able to restructure
> query plans according to the available clusters.
>
> I had a discussion with people from the Apache Drill project
> (https://drill.apache.org/), which already provides SQL parsing
> capabilities in various big-data environments. In my understanding,
> instead of writing our own SQL parser in SIS we could take the following
> approach:
>
> 1. Complete the org.apache.sis.storage.DataStore API (it is currently
>    very minimalist).
> 2. Have the ShapeFile store extend the abstract SIS DataStore.
> 3. In a separate module, write a "SIS DataStore to Drill DataStore"
>    adapter. It should work for any SIS DataStore, not only the
>    ShapeFile one.
>
> In my understanding, once we have a Drill DataStore implementation (I do
> not know yet what the exact name is in the Drill API), we should
> automatically get big-data-ready SQL for any SIS DataStore. If for any
> reason Drill DataStore is considered not suitable, we could fall back on
> Apache Calcite (http://calcite.apache.org/), which is the SQL parser
> used under the hood by Drill. Another project that may be worth
> exploring is Magellan: Geospatial Analytics on Spark [2].
>
> My proposal could be summarized as follows: maybe in 2016 or 2017, we
> could consider putting the SIS SQL support in its own module and
> allowing it to run not only for ShapeFile, but for any SIS DataStore,
> if possible using technology like Drill designed for big-data
> environments.
>
> Any thoughts?
>
>     Martin
>
>
> [1] https://issues.apache.org/jira/browse/SIS-180
> [2] https://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/
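
The three-step approach in the quoted proposal (complete a store API, have a concrete store implement it, then write one generic adapter) could be sketched roughly as below. This is only an illustration of the adapter idea under stated assumptions: `FeatureStore`, `RowSource`, `InMemoryFeatureStore`, `FeatureStoreRowSource` and `AdapterDemo` are hypothetical names invented for this sketch. They are not the Apache SIS `DataStore` API, and they are not the Drill API, whose exact name the proposal itself leaves open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Step 1 (hypothetical): a stand-in for a completed store API.
// The real org.apache.sis.storage.DataStore looks different.
interface FeatureStore {
    List<String> columnNames();
    Iterator<Map<String, Object>> features();
}

// Hypothetical stand-in for what a SQL engine would consume.
// This is NOT the Drill API; it only marks where such an API would plug in.
interface RowSource {
    List<String> schema();
    Iterator<Object[]> rows();
}

// Step 2 (hypothetical): a concrete store implementing the common API.
// An in-memory list replaces the ShapeFile store to keep the sketch runnable.
final class InMemoryFeatureStore implements FeatureStore {
    private final List<String> columns;
    private final List<Map<String, Object>> data;

    InMemoryFeatureStore(List<String> columns, List<Map<String, Object>> data) {
        this.columns = columns;
        this.data = data;
    }

    @Override public List<String> columnNames() { return columns; }
    @Override public Iterator<Map<String, Object>> features() { return data.iterator(); }
}

// Step 3 (hypothetical): one adapter that works for ANY FeatureStore,
// turning each feature into a positional row in schema order.
final class FeatureStoreRowSource implements RowSource {
    private final FeatureStore store;

    FeatureStoreRowSource(FeatureStore store) { this.store = store; }

    @Override public List<String> schema() { return store.columnNames(); }

    @Override public Iterator<Object[]> rows() {
        final List<String> columns = store.columnNames();
        final Iterator<Map<String, Object>> it = store.features();
        return new Iterator<Object[]>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public Object[] next() {
                Map<String, Object> feature = it.next();
                Object[] row = new Object[columns.size()];
                for (int i = 0; i < row.length; i++) {
                    row[i] = feature.get(columns.get(i));
                }
                return row;
            }
        };
    }
}

class AdapterDemo {
    // Builds a two-feature store, wraps it in the adapter, and formats each row.
    static List<String> run() {
        Map<String, Object> f1 = new LinkedHashMap<>();
        f1.put("name", "a");
        f1.put("id", 1);
        Map<String, Object> f2 = new LinkedHashMap<>();
        f2.put("name", "b");
        f2.put("id", 2);

        FeatureStore store = new InMemoryFeatureStore(
                Arrays.asList("name", "id"), Arrays.asList(f1, f2));
        RowSource source = new FeatureStoreRowSource(store);

        List<String> out = new ArrayList<>();
        for (Iterator<Object[]> it = source.rows(); it.hasNext();) {
            out.add(Arrays.toString(it.next()));
        }
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The point of the sketch is that the adapter depends only on the common store interface, so any new store implementation would be exposed to the SQL layer without further adapter code.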
