Re: Long-term thoughts about big-data queries in SIS

Adam Estrada Tue, 10 Nov 2015 10:21:25 -0800

Martin,

This is extremely cool and much needed in the geospatial community! My
company, DigitalGlobe, has done a lot with this and has open sourced
many of its packages, which can be found on GitHub today. Rasdaman[1]
and PostGIS Raster are other open source examples of how to do this in
relational databases. We have also done a lot of research on how to
store pixels and query for them in HBase/Hadoop and Elasticsearch.
There are many options for this!


Adam

[1] http://rasdaman.org/

On Tue, Nov 10, 2015 at 6:09 AM, Martin Desruisseaux
<[email protected]> wrote:
> Hello all
>
> At the Apache Big Data conference in Budapest, I attended some
> meetings about exploiting geospatial big data using SQL. I thought
> that we could make some long-term plans that could impact the
> SIS-180 (Place a crude JDBC driver over Dbase files) work [1]. This
> email is not a request for any change now. It is just a proposal about
> some possible long-term plans.
>
> In one or two years, Apache SIS will hopefully have some DataStore
> implementations ready for production use. But we have a strong request
> for the capability to use DataStores with big-data technologies like
> Hadoop. This request increases the challenge of writing a SQL driver,
> since a sophisticated SQL driver would need to be able to restructure
> query plans according to the available clusters.
>
> I had a discussion with people from the Apache Drill project
> (https://drill.apache.org/), which already provides SQL parsing
> capabilities in various big-data environments. In my understanding,
> instead of writing our own SQL parser in SIS we could take the
> following approach:
>
>  1. Complete the org.apache.sis.storage.DataStore API (it is currently
>     very minimalist).
>  2. Have the ShapeFile store extend the abstract SIS DataStore.
>  3. In a separate module, write a "SIS DataStore to Drill DataStore"
>     adapter. It should work for any SIS DataStore, not only the
>     ShapeFile one.
>
> In my understanding, once we have a Drill DataStore implementation (I
> do not know yet what the exact name is in the Drill API), we should
> automatically get big-data-ready SQL for any SIS DataStore. If for any
> reason the Drill DataStore is considered unsuitable, we could fall back
> on Apache Calcite (http://calcite.apache.org/), which is the SQL parser
> used under the hood by Drill. Another project that may be worth
> exploring is Magellan: Geospatial Analytics on Spark [2].
>
> My proposal could be summarized as follows: maybe in 2016 or 2017, we
> could consider putting the SIS SQL support in its own module and allow
> it to run not only for ShapeFile, but for any SIS DataStore, if possible
> using a technology like Drill that is designed for big-data
> environments.
>
> Any thoughts?
>
>     Martin
>
>
> [1] https://issues.apache.org/jira/browse/SIS-180
> [2] https://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/
>
>
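
As a rough illustration of the three-step adapter idea in the quoted list, here is a minimal Java sketch. Everything in it is an assumption made for illustration: FeatureStore is a hypothetical stand-in for the SIS DataStore API (which the message describes as still minimalist), RowSource is a hypothetical stand-in for whatever table abstraction Drill or Calcite actually expects (the message notes the exact Drill name is not yet known), and the sample data is made up. It only shows the shape of "write the adapter once, reuse it for any store".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataStoreAdapterSketch {

    /** Hypothetical stand-in for a SIS DataStore: yields features as name/value maps. */
    public interface FeatureStore {
        List<Map<String, Object>> features();
    }

    /** Hypothetical stand-in for the table abstraction a SQL engine would scan. */
    public interface RowSource {
        List<String> columns();
        Iterator<Object[]> rows();
    }

    /** The adapter: written once, usable with any FeatureStore implementation. */
    public static final class FeatureStoreAdapter implements RowSource {
        private final FeatureStore store;
        private final List<String> columns;

        public FeatureStoreAdapter(FeatureStore store) {
            this.store = store;
            // Assumption: all features share the schema of the first one.
            List<Map<String, Object>> all = store.features();
            this.columns = all.isEmpty() ? List.of() : List.copyOf(all.get(0).keySet());
        }

        @Override
        public List<String> columns() {
            return columns;
        }

        @Override
        public Iterator<Object[]> rows() {
            // Flatten each feature into a positional row matching columns().
            List<Object[]> out = new ArrayList<>();
            for (Map<String, Object> feature : store.features()) {
                Object[] row = new Object[columns.size()];
                for (int i = 0; i < row.length; i++) {
                    row[i] = feature.get(columns.get(i));
                }
                out.add(row);
            }
            return out.iterator();
        }
    }

    /** Toy in-memory store playing the role of the ShapeFile store (made-up data). */
    public static FeatureStore sampleStore() {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("name", "alpha");
        a.put("value", 10);
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("name", "beta");
        b.put("value", 20);
        return () -> List.of(a, b);
    }

    public static void main(String[] args) {
        RowSource table = new FeatureStoreAdapter(sampleStore());
        System.out.println(table.columns());
        for (Iterator<Object[]> it = table.rows(); it.hasNext();) {
            System.out.println(Arrays.toString(it.next()));
        }
    }
}
```

The point of the design is that the SQL side only ever sees RowSource, so a new store type would need no SQL-specific code. A real adapter would stream rows and expose a typed schema rather than copy everything into a list.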
