Re: Apache Sedona

Paweł Kociński Fri, 21 Aug 2020 13:51:44 -0700

- Grant support for Scala 2.12 and Spark 3.0
I meant Python here.

- Implement loading geospatial data sources (geojson, shapefile, osm, wkb,
wkt) from Dataframe API like
-- spark.read.format("geojson").load(path)


It is possible, and I think it would make loading the data easier for users
(I also agree that it is not a priority).

- Add broadcast join for joining big and small dataframe

Agree

- Fix issue with 3D geometries while loading shapefile

Exactly

- Add support for multiline geojson (I have some code on my local branch)

We have to write our own implementation in that case; it will require some
work, but it is doable.
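
To illustrate what "multiline" means here, below is a minimal sketch in plain
Python, without Spark. It assumes the whole file content is already available
as one string; how that string is obtained per file in Spark is left open. The
function name features_from_multiline_geojson is hypothetical and is not part
of any existing API.

```python
import json


def features_from_multiline_geojson(text):
    """Parse one whole GeoJSON document (possibly pretty-printed over
    many lines) and return one row per feature."""
    doc = json.loads(text)
    kind = doc.get("type")
    if kind == "FeatureCollection":
        features = doc.get("features", [])
    elif kind == "Feature":
        features = [doc]
    else:
        # Assumption: anything else is a bare geometry object.
        features = [{"type": "Feature", "geometry": doc, "properties": {}}]
    rows = []
    for feature in features:
        rows.append({
            "geometry": feature.get("geometry"),
            # GeoJSON allows "properties": null, so normalise it to {}.
            "properties": feature.get("properties") or {},
        })
    return rows


sample = """{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [21.0, 52.2]},
      "properties": {"name": "a"}
    },
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [19.9, 50.1]},
      "properties": null
    }
  ]
}"""

rows = features_from_multiline_geojson(sample)
```

A line-by-line reader would fail on this sample because no single line is a
complete JSON value, which is why the document has to be parsed as a whole.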

- Add direct writing to geospatial databases like PostgreSQL

I have to analyze the Spark code and will come back with a solution.

- Remove NullPointer exception when there is null value within data or data
is wrong within some rows.

I meant the SQL functions; sometimes they should return null/Option
instead of raising a NullPointerException.

- geohash spatial join

I think in some cases it can be more suitable for users. It should not be
hard to implement, and it brings additional value.
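
As a minimal sketch of the idea in plain Python (not Sedona code): encode each
point as a geohash and join on equal hashes. The geohash function follows the
standard base-32 encoding; geohash_join and the (key, lon, lat) tuple layout
are assumptions made for this example. Note that this simple version misses
pairs that lie close together but on opposite sides of a cell border; handling
that would need neighbouring cells as well.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lon, lat, precision=5):
    """Standard geohash: interleave longitude and latitude bisection bits."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def geohash_join(left, right, precision=5):
    """Equi-join two lists of (key, lon, lat) on their geohash cell."""
    buckets = defaultdict(list)
    for key, lon, lat in right:
        buckets[geohash(lon, lat, precision)].append(key)
    pairs = []
    for key, lon, lat in left:
        for other in buckets.get(geohash(lon, lat, precision), []):
            pairs.append((key, other))
    return pairs


left = [("a", 10.40744, 57.64911)]
right = [("x", 10.40750, 57.64900), ("y", 21.0, 52.2)]
pairs = geohash_join(left, right)
```

Because the join key is a plain string, the same idea maps onto an ordinary
equi-join on a geohash column, with precision controlling the cell size.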

On Fri, 21 Aug 2020 at 11:08, Jia Yu <jiayu198...@gmail.com> wrote:

> Hi Paweł and CCed sedona-dev and other committers,
>
> Please find my opinion below.
>
> - Grant support for Scala 2.12 and Spark 3.0
> Jia: The Scala and Java code in the master branch already supports Spark 3.0
> with Scala 2.12. We still need the following: Scala 2.12 support in Sedona
> for other Spark versions, and Scala 2.12 support in all Python APIs.
>
> - Implement loading geospatial data sources (geojson, shapefile, osm, wkb,
> wkt) from Dataframe API like
> -- spark.read.format("geojson").load(path)
> Jia: Direct DataFrame API support requires a bit more coding effort. I am
> actually not sure whether this function in the DataFrame API is extensible.
> If it is, I am not against it, but it is not the top priority.
>
> - Add broadcast join for joining big and small dataframe
> Jia: Yes, we should have it here:
> https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/apache/spark/sql/geosparksql/strategy/join/TraitJoinQueryExec.scala#L67
>
> - Fix issue with 3D geometries while loading shapefile
> Jia: How do we fix it? Convert them to 2D geometries and discard the Z
> or M dimension?
>
> - Add support for multiline geojson (I have some code on my local branch)
> Jia: This is not easy. Spark's DataFrame reader has a JSON API:
> https://spark.apache.org/docs/latest/sql-data-sources-json.html I am not
> sure whether we can leverage it.
>
> - Add direct writing to geospatial databases like PostgreSQL
> Jia: Good point. Any particular challenge on this?
>
> - Add more geospatial functions
> Jia: Agree.
>
> - Remove NullPointer exception when there is null value within data or
> data is wrong within some rows
> Jia: I believe this has been solved by "allowTopologyInvalidGeometries"
> and "skipSyntaxInvalidGeometries"
> https://datasystemslab.github.io/GeoSpark/tutorial/rdd/#create-a-generic-spatialrdd-behavoir-changed-in-v120
>
> - geohash spatial join
> Jia: Yes, we can do that. But will it bring in any benefit as opposed to
> the existing spatial join algorithm?
>
> Thanks,
> Jia
>
> On Wed, Aug 19, 2020 at 10:22 AM Paweł Kociński <pawel93kocin...@gmail.com>
> wrote:
>
>> Hi Jia,
>> I hope you are well. Do we have any features to add to Apache Sedona
>> after the code is merged?
>> My ideas for tasks:
>> - Grant support for Scala 2.12 and Spark 3.0
>> - Implement loading geospatial data sources (geojson, shapefile, osm,
>> wkb, wkt) from Dataframe API like
>> -- spark.read.format("geojson").load(path)
>> I have some code, but code migration is holding me back
>> - Add broadcast join for joining big and small dataframe
>> - Fix issue with 3D geometries while loading shapefile
>> - Add support for multiline geojson (I have some code on my local branch)
>> - Add direct writing to geospatial databases like PostgreSQL
>> - Add more geospatial functions
>> - Remove NullPointer exception when there is null value within data or
>> data is wrong within some rows
>> - geohash spatial join
>>
>> What do you think?
>>
>> Regards,
>> Paweł
>>
>>
>> On Mon, 17 Aug 2020 at 07:45, Jia Yu <jiayu198...@gmail.com> wrote:
>>
>>> Hello Paweł,
>>>
>>> I just posted the current situation to priv...@sedona.apache.org. The
>>> current problem is that I have everything ready to be imported into the
>>> ASF GitHub repo (https://github.com/apache/incubator-sedona), but one
>>> committer (Masha from Facebook), who contributed thousands of lines to
>>> GeoSpark, still has not submitted her CLA. The entire process is
>>> currently blocked by this.
>>>
>>> Mohamed and I have tried to reach her a couple of times in the
>>> past 3 weeks but got no reply. I have asked the champion how we can
>>> proceed in this case. Let's see what happens.
>>>
>>> Thanks,
>>> Jia
>>>
>>>
>>> On Sun, Aug 16, 2020 at 9:06 AM Paweł Kociński <
>>> pawel93kocin...@gmail.com> wrote:
>>>
>>>> Hi Jia,
>>>> Do we know when the first release of Apache Sedona will occur? Can I
>>>> help with something to make it happen? I have a few ideas and some code
>>>> that will be useful in the future.
>>>>
>>>> Regards,
>>>> Pawel
>>>>
>>>
