Re: [Suggest] Add geo function to core

Mo Sarwat Tue, 17 Jan 2023 06:03:16 -0800

Grigory,

Thanks a lot for chiming in - I really like the PostGIS to PostgreSQL analogy. 
That is exactly what Sedona (an Apache project) is to Spark. Spark core should 
remain light / generic enough (similar to PostgreSQL) and all spatial 
functionalities should be pluggable extensions (Sedona). Otherwise, the core 
will be unnecessarily heavy to maintain, release, and integrate.


Sedona already supports geohashing, among many other standard geospatial 
functions, and these work seamlessly with Spark for the end user. If 
something is missing, I would highly recommend that we bring it to the Sedona 
community; that will directly benefit the Spark users who are doing geo.
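
For readers new to the term, here is a minimal sketch of Geohash encoding 
(the algorithm described on the Wikipedia page linked further down this 
thread), in plain Python. It is not Sedona's implementation; the function name 
geohash_encode and its parameters are made up for illustration.

```python
# Minimal Geohash encoder (illustrative sketch; not Sedona's or SIS's code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet: no a, i, l, o


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair (degrees) as a Geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0        # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the interval: emit 1
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the interval: emit 0
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```

A shorter hash is always a prefix of a longer one for the same point, which is 
what makes geohashes handy as coarse grouping keys. In Sedona this is 
exposed as a SQL function rather than hand-written code; check the Sedona 
docs for the exact name and signature.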

Implementing geospatial functionality in Spark core would duplicate work 
that has already been done. Databricks, for instance, already uses Sedona 
internally for its geospatial capabilities.

Finally, I would like to mention that I am totally willing to be corrected on 
that, especially if you have tried Sedona with Spark and found that it does not 
serve the purpose at all. But please try it first, and let's come up with a few 
capabilities it cannot provide unless they are implemented in Spark core. 
Then we can suggest those capabilities to the Spark community.

Thanks,
-Mo
 

On 2023/01/17 03:09:06 Grigory Pomadchin wrote:
> Hey folks,
> 
> Traditionally GIS functionality is distributed a bit separately - i.e.
> PostGIS is a great example; and indeed for GIS needs Sedona / GeoMesa /
> GeoWave may work out; I think GeoMesa implements GeoHash (see
> https://www.geomesa.org/documentation/stable/user/spark/sparksql_functions.html
> - could be used as an inspiration at least);
> 
> I'm pretty sure Databricks provides some GIS functions (H3) at this point.
> Could be an argument for having something in the core / officially supported
> by the Spark community?
> 
> I'd really love to see some relatively lightweight (JTS + Proj4j / SIS)
> library with basic expressions and optimization rules in the wild that is
> usable in the Spark native interfaces primarily; so there is no need to
> figure out the API / way to set it up and / or resolve peculiar
> dependencies. Could be a step towards Spark GIS types standardization.
> 
> Best,
> 
> Grigory
> 
> On Mon, Jan 16, 2023 at 6:21 PM Mo Sarwat <mosar...@apache.org> wrote:
> 
> > Martin, thanks for chiming in and mentioning Apache SIS. However, Mark was
> > asking about Geo in Spark, which Sedona already supports.
> >
> > Yet, I like the idea of making all dependencies within the Apache family.
> > I believe a good solution would be for you (or the SIS community at large)
> > to include Apache SIS in Sedona to replace libs like GeoTools. The Sedona
> > community would definitely welcome your contribution :)
> >
> > Regards,
> > -Mo
> >
> > On 2023/01/16 22:24:14 Martin Desruisseaux wrote:
> > > Hello Mark
> > >
> > > Indeed Sedona is surely a serious candidate. Maybe one aspect to take into
> > consideration, depending how "core" the geospatial services would be, is
> > that Sedona depends on an LGPL library (GeoTools, bundled separately) for
> > map projections, Shapefile and GeoTIFF support. So those features could not
> > be in core since category X dependencies shall be optional.
> > >
> > > Regarding referencing by coordinates (including map projections), I'm
> > aware of 3 libraries having a license compatible with Apache:
> > >
> > > * Apache SIS (Apache License)
> > > * PROJ4J (Apache license)
> > > * PROJ-JNI (MIT license)
> > >
> > > PROJ-JNI is a binding to the PROJ native library using Java Native Interface
> > (JNI). PROJ is the most well known map projection library, but it is
> > difficult to bundle native code in a Java application.
> > >
> > > I'm not in a neutral position to say that, but I believe that Apache
> > SIS is the most powerful open source pure-Java referencing library. But it
> > is relatively big, about 4 MB for the referencing module with its
> > dependencies, not counting the optional EPSG geodetic dataset (because not
> > compatible with Apache license). Apache SIS is not the library with the
> > largest number of map projections (PROJ4J has more), but it handles some
> > difficult problems and scales well with three- or four-dimensional data (or
> > more).
> > >
> > > PROJ4J is a lightweight library which may be sufficient if data are
> > mostly two-dimensional (limited 3D support seems also possible) and if
> > uncertainty of a few metres in coordinate transformations (depending how
> > datum shifts are specified) is acceptable.
> > >
> > > It is possible to write some code in an implementation-independent way
> > using GeoAPI interfaces, which aim to do what JDBC interfaces do for
> > databases. Apache SIS and PROJ-JNI are implementations of GeoAPI
> > interfaces, so by using those interfaces you can let users choose among
> > those two implementations. I think that GeoAPI wrappers could easily be
> > contributed to PROJ4J as well if there is a desire for that.
> > >
> > > Regarding Geohash, if we are talking about the algorithm described at
> > https://en.wikipedia.org/wiki/Geohash, then SIS already supports it. SIS
> > also supports the Military Grid Reference System (MGRS), which can be seen
> > as another kind of geohash with better characteristics.
> > >
> > > Regards,
> > >
> > >     Martin
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> >
> >
> >
> 

