[GitHub] [incubator-sedona] jiayuasu merged pull request #537: [SEDONA-59] Make pyspark dependency of sedona python optional

2021-08-25 Thread GitBox


jiayuasu merged pull request #537:
URL: https://github.com/apache/incubator-sedona/pull/537


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sedona.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: Will publish Sedona new version 1.1.0 soon

2021-08-25 Thread Paweł Kociński
Hi,
Regarding https://issues.apache.org/jira/browse/SEDONA-59, the first option seems
most reasonable: we can assume that Spark is already installed, and it does not
cause any confusion when installing with tools like poetry and pipenv.
Best Regards,
Paweł

Wed, 25 Aug 2021 at 11:18, Jia Yu wrote:

> Sedona committers, contributors and users,
>
> The new Sedona version is long overdue because of this PySpark <= 3.0.1
> bug in Pipfile. I promise I will roll out the new version in the next 1 or
> 2 weeks regardless of the progress of all pending PRs. The new version will
> be 1.1.0 which contains the major update of R language bindings and Sedona
> raster support.
>
> @Paweł Kociński  Can you check out the
> proposal in https://issues.apache.org/jira/browse/SEDONA-59 ? Is it a
> good idea?
>
> Please let me know if you have any questions :-)
>
> Thanks,
> Jia Yu
>


[GitHub] [incubator-sedona] jiayuasu commented on pull request #537: [SEDONA-59] Make pyspark dependency of sedona python optional

2021-08-25 Thread GitBox


jiayuasu commented on pull request #537:
URL: https://github.com/apache/incubator-sedona/pull/537#issuecomment-905821127


   This is great. I just recalled one thing: Sedona Python also works for Spark 
2.3, although we have no plan to officially support it. Could you please also 
change the Spark version in both the Pipfile and setup.py to "pyspark>=2.3.0"? 
This way, users who are still on the old 2.3 version can continue to use Sedona.
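The requested change can be sketched as a setup.py fragment. Only the `spark` extra name and the `pyspark>=2.3.0` floor come from this thread; the package metadata and the other dependencies below are placeholders, not the actual file contents:

```python
# Hypothetical setup.py excerpt: pyspark lives in an optional "spark"
# extra instead of install_requires, with the floor lowered to 2.3.0.
# The non-pyspark dependencies listed here are placeholders.
from setuptools import setup, find_packages

setup(
    name="apache-sedona",
    packages=find_packages(),
    install_requires=[
        "attrs",    # placeholder
        "shapely",  # placeholder
    ],
    extras_require={
        # pulled in only via: pip install apache-sedona[spark]
        "spark": ["pyspark>=2.3.0"],
    },
)
```

With this layout, a plain `pip install apache-sedona` leaves any pre-installed Spark untouched.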






Issues migrating to Sedona from Geospark

2021-08-25 Thread Graybeal, Brandon (he, him)
Hello!

My name is Brandon Graybeal and I am a Data Engineer at Arity, which is a 
Telematics company owned by Allstate. We have been using the Geospark library 
in our production pipelines for a year now, and it has been great. However, we 
are migrating to EMR and Spark 3, so I need to upgrade all our pipelines to 
utilize Apache Sedona instead. In the process of following some of the examples 
and programming guides on your site, I am running into some unexpected issues.

To give some context, I am trying to POC the new library from a spark shell. We 
are just doing a spatial join on one spark dataframe with a point geometry to 
another spark dataframe that has a polygon geometry. The join appears to work, 
and whenever I call the function Adapter.toDf(geoRdd, spark) it returns the 
left and right geometries properly. However, I get an exception I can’t quite 
decipher whenever I try to pull other columns through using 
Adapter.toDf(geoRdd, tripRDD.fieldNames, tractRDD.fieldNames, spark). As far as 
I can tell, I am calling the function the same way that is shown in the 
examples folder on Github.

I have attached the scala script that I have been running, as well as the 
resulting exception messages. Any guidance or help you all could provide would 
be greatly appreciated!

Thanks!
Brandon Graybeal
Data Analytics Engineer
222 W Merchandise Mart Plaza | Suite 875 | Chicago, IL 60654
brandon.grayb...@arity.com | 312-999-7744 | 
arity.com


scala> geocodedDF.show(5)
+--------------------+--------------------+
|        leftgeometry|       rightgeometry|
+--------------------+--------------------+
|POLYGON ((-95.521...|POINT (-95.520286...|
|POLYGON ((-76.510...|POINT (-76.480293...|
|POLYGON ((-77.582...|POINT (-77.574119...|
|POLYGON ((-96.643...|POINT (-96.425285...|
|POLYGON ((-121.88...|POINT (-121.86756...|
+--------------------+--------------------+
only showing top 5 rows


scala> geocodedDF2.show(5)
21/08/25 18:27:36 WARN TaskSetManager: Lost task 0.0 in stage 13.0 (TID 1205) 
(ip-10-97-77-192.intr.ue1.prd.aws.cloud.arity.com executor 1): 
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
java.lang.String is not a valid external type for schema of geometry
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else 
newInstance(class org.apache.spark.sql.sedona_sql.UDT.GeometryUDT).serialize AS 
leftgeometry#81
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 1, id), StringType), true, false) AS id#82
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 2, organizationId), StringType), true, false) 
AS organizationId#83
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 3, clusterLat), StringType), true, false) AS 
clusterLat#84
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 4, clusterLon), StringType), true, false) AS 
clusterLon#85
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 5, wkt), StringType), true, false) AS wkt#86
if (assertnotnull(input[0, 

[jira] [Commented] (SEDONA-59) Remove explicit pyspark dependency

2021-08-25 Thread Sebastian Eckweiler (Jira)


[ 
https://issues.apache.org/jira/browse/SEDONA-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404403#comment-17404403
 ] 

Sebastian Eckweiler commented on SEDONA-59:
---

I created [https://github.com/apache/incubator-sedona/pull/537]

> Remove explicit pyspark dependency
> --
>
> Key: SEDONA-59
> URL: https://issues.apache.org/jira/browse/SEDONA-59
> Project: Apache Sedona
>  Issue Type: Improvement
>Reporter: Sebastian Eckweiler
>Priority: Normal
>
> The currently published sedona python package has an explicit dependency on 
> pyspark.
> When used on spark platforms such as Databricks spark comes pre-installed, 
> but not integrated with pip. A `pip install sedona` will thus install another 
> pyspark copy - which in the best case is just superfluous. In the worst case 
> it might cause trouble in combination with the pre-installed spark.
> Workarounds, such as installing sedona without dependencies, can work for a 
> while. But this is fragile: as soon as dependency validation (as performed, 
> e.g., by setuptools entry points) comes around, it will break.
>  
> I guess there are two options:
>  * Remove the pyspark dependency completely, considering it "obvious".
>  * Add pyspark as an optional `extras_require` to an extra called "spark".
>  This would allow a pip install as below, which would get sedona and the 
> corresponding pyspark distribution:
> {code:java}
> pip install sedona[spark]{code}
>  
> I'd be willing to create a corresponding pull request if one of the options 
> is accepted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-sedona] sebbegg commented on pull request #537: Make pyspark dependency of sedona python optional

2021-08-25 Thread GitBox


sebbegg commented on pull request #537:
URL: https://github.com/apache/incubator-sedona/pull/537#issuecomment-905448291


   Note: I also bumped the pyspark dependency in `setup.py` to 2.4, so it's 
consistent with the `Pipfile`.






[GitHub] [incubator-sedona] sebbegg opened a new pull request #537: Make pyspark dependency of sedona python optional

2021-08-25 Thread GitBox


sebbegg opened a new pull request #537:
URL: https://github.com/apache/incubator-sedona/pull/537


   ## Is this PR related to a proposed Issue?
   
   Yes - this implements https://issues.apache.org/jira/browse/SEDONA-59
   
   ## What changes were proposed in this PR?
   
   This PR moves the `pyspark` dependency to an extra called `spark`, allowing 
users to:
   
   - Get apache-sedona and the pyspark pip package in one go:
   ```bash
   pip install apache-sedona[spark]
   ```
   - Get only apache-sedona and its dependencies, except for pyspark:
   ```bash
   pip install apache-sedona
   ```
   
   ## How was this patch tested?
   
   Built python wheel locally and tested that the two `pip install` commands 
above work as expected.
   
   ## Did this PR include necessary documentation updates?
   
   Added a note in the install docs.
   
   
   
   Sebastian Eckweiler (sebastian.eckwei...@daimler.com), Mercedes-Benz AG 
on behalf of MBition GmbH 
([Imprint](https://github.com/Daimler/daimler-foss/blob/master/LEGAL_IMPRINT.md))






[GitHub] [incubator-sedona] swamirishi commented on pull request #536: [SEDONA-36] Parquet reader & Writers

2021-08-25 Thread GitBox


swamirishi commented on pull request #536:
URL: https://github.com/apache/incubator-sedona/pull/536#issuecomment-905344497


   @Imbruced I have addressed the review comments. Can you check if I need to 
change something else






[GitHub] [incubator-sedona] swamirishi commented on a change in pull request #536: [SEDONA-36] Parquet reader & Writers

2021-08-25 Thread GitBox


swamirishi commented on a change in pull request #536:
URL: https://github.com/apache/incubator-sedona/pull/536#discussion_r695577641



##
File path: 
core/src/main/java/org/apache/sedona/core/formatMapper/ParquetReader.java
##
@@ -0,0 +1,38 @@
+package org.apache.sedona.core.formatMapper;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.sedona.core.enums.GeometryType;
+import org.apache.sedona.core.formatMapper.parquet.ParquetFormatMapper;
+import org.apache.sedona.core.geometryObjects.Circle;
+import org.apache.sedona.core.io.parquet.ParquetFileReader;
+import org.apache.sedona.core.spatialRDD.SpatialRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+
+import java.io.IOException;
+import java.util.List;
+
+public class ParquetReader extends RddReader {
+    public static <T extends Geometry> SpatialRDD<T> createSpatialRDD(JavaRDD<GenericRecord> rawRDD,
+                                                                      ParquetFormatMapper<T> formatMapper,
+                                                                      GeometryType geometryType) {
+        SpatialRDD<T> spatialRDD = new SpatialRDD<>(geometryType);
+        spatialRDD.rawSpatialRDD = rawRDD.mapPartitions(formatMapper);
+        return spatialRDD;
+    }
+
+    public static <T extends Geometry> SpatialRDD<T> readToGeometryRDD(JavaSparkContext sc,
+                                                                       List<String> inputPath,

Review comment:
   The input Paths can be a path regex. I have changed the testcase 
accordingly








Will publish Sedona new version 1.1.0 soon

2021-08-25 Thread Jia Yu
Sedona committers, contributors and users,

The new Sedona version is long overdue because of this PySpark <= 3.0.1 bug
in Pipfile. I promise I will roll out the new version in the next 1 or 2
weeks regardless of the progress of all pending PRs. The new version will
be 1.1.0 which contains the major update of R language bindings and Sedona
raster support.

@Paweł Kociński  Can you check out the proposal
in https://issues.apache.org/jira/browse/SEDONA-59 ? Is it a good idea?

Please let me know if you have any questions :-)

Thanks,
Jia Yu


Re: Py4JError

2021-08-25 Thread Jia Yu
This is most likely because you didn't include Sedona Java Jars. Please
read:
https://sedona.apache.org/download/overview/#prepare-python-adapter-jar

On Tue, Aug 17, 2021 at 7:54 PM BEHNAZ NIKKHAH 
wrote:

> Hi
> It is my first project using Apache Sedona and I get the following error
> when I want to read a csv file as a pointRDD.
>
> Py4JError: PointRDD does not exist in the JVM
>
> Would you please help me to fix it?
>
> Thanks,
> Behnaz
>


[jira] [Commented] (SEDONA-55) Publish Python artifact 1.0.2

2021-08-25 Thread Jia Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SEDONA-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404295#comment-17404295
 ] 

Jia Yu commented on SEDONA-55:
--

We will roll out this version in one or two weeks.

> Publish Python artifact 1.0.2
> -
>
> Key: SEDONA-55
> URL: https://issues.apache.org/jira/browse/SEDONA-55
> Project: Apache Sedona
>  Issue Type: Task
>Reporter: Artur Dryomov
>Priority: Major
>
> As noted in release notes for [the {{1.0.1}} 
> release|http://sedona.apache.org/download/release-notes/#sedona-101] there 
> was a configuration issue, resulting in PySpark version mismatch. 
> Unfortunately suggested workarounds do not work with [tools like 
> {{pip-tools}}|https://github.com/jazzband/pip-tools] which auto-generate 
> dependencies.
> It seems like this kind of change fits a patch release, so it would be great 
> to have a {{1.0.2}} release resolving this inconvenience.





[jira] [Commented] (SEDONA-59) Remove explicit pyspark dependency

2021-08-25 Thread Jia Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SEDONA-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404294#comment-17404294
 ] 

Jia Yu commented on SEDONA-59:
--

This is an awesome solution. Can you create this PR at your earliest 
convenience? We are planning to roll out 1.1.0 in the next one or two weeks.

> Remove explicit pyspark dependency
> --
>
> Key: SEDONA-59
> URL: https://issues.apache.org/jira/browse/SEDONA-59
> Project: Apache Sedona
>  Issue Type: Improvement
>Reporter: Sebastian Eckweiler
>Priority: Normal
>
> The currently published sedona python package has an explicit dependency on 
> pyspark.
> When used on spark platforms such as Databricks spark comes pre-installed, 
> but not integrated with pip. A `pip install sedona` will thus install another 
> pyspark copy - which in the best case is just superfluous. In the worst case 
> it might cause trouble in combination with the pre-installed spark.
> Workarounds, such as installing sedona without dependencies, can work for a 
> while. But this is fragile: as soon as dependency validation (as performed, 
> e.g., by setuptools entry points) comes around, it will break.
>  
> I guess there are two options:
>  * Remove the pyspark dependency completely, considering it "obvious".
>  * Add pyspark as an optional `extras_require` to an extra called "spark".
>  This would allow a pip install as below, which would get sedona and the 
> corresponding pyspark distribution:
> {code:java}
> pip install sedona[spark]{code}
>  
> I'd be willing to create a corresponding pull request if one of the options 
> is accepted.





[jira] [Closed] (SEDONA-51) Build fails: /packages cannot be represented as URI

2021-08-25 Thread Jia Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SEDONA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Yu closed SEDONA-51.

Resolution: Won't Fix

We currently only focus on widely used Java versions, such as Java 8, 11, and 13.

> Build fails: /packages cannot be represented as URI
> ---
>
> Key: SEDONA-51
> URL: https://issues.apache.org/jira/browse/SEDONA-51
> Project: Apache Sedona
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Docker 3.3.3 on Windows 10 with WSL2.0
>Reporter: Bart
>Priority: Normal
>  Labels: Debian, newbie, setup
> Fix For: 1.0.1
>
> Attachments: log.txt
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When running mvn clean install -DskipTests Maven throws this error.
>  
> mvn -v:
> Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
> Maven home: /opt/maven/apache-maven-3.8.1
> Java version: 16.0.1, vendor: Oracle Corporation, runtime: 
> /usr/local/openjdk-16
> Default locale: en, platform encoding: UTF-8
> OS name: "linux", version: "5.4.72-microsoft-standard-wsl2", arch: "amd64", 
> family: "unix"
> I'm setting it up in this Docker image:
> openjdk:16.0.1-jdk-slim-buster
>  
>  





[jira] [Commented] (SEDONA-51) Build fails: /packages cannot be represented as URI

2021-08-25 Thread Jia Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SEDONA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404291#comment-17404291
 ] 

Jia Yu commented on SEDONA-51:
--

It seems that you can fix it by changing the Java version to 1.8:

 

https://stackoverflow.com/questions/60308229/scala-packages-cannot-be-represented-as-uri

> Build fails: /packages cannot be represented as URI
> ---
>
> Key: SEDONA-51
> URL: https://issues.apache.org/jira/browse/SEDONA-51
> Project: Apache Sedona
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Docker 3.3.3 on Windows 10 with WSL2.0
>Reporter: Bart
>Priority: Normal
>  Labels: Debian, newbie, setup
> Fix For: 1.0.1
>
> Attachments: log.txt
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When running mvn clean install -DskipTests Maven throws this error.
>  
> mvn -v:
> Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
> Maven home: /opt/maven/apache-maven-3.8.1
> Java version: 16.0.1, vendor: Oracle Corporation, runtime: 
> /usr/local/openjdk-16
> Default locale: en, platform encoding: UTF-8
> OS name: "linux", version: "5.4.72-microsoft-standard-wsl2", arch: "amd64", 
> family: "unix"
> I'm setting it up in this Docker image:
> openjdk:16.0.1-jdk-slim-buster
>  
>  





[GitHub] [incubator-sedona] jiayuasu commented on pull request #536: [SEDONA-36] Parquet reader & Writers

2021-08-25 Thread GitBox


jiayuasu commented on pull request #536:
URL: https://github.com/apache/incubator-sedona/pull/536#issuecomment-905314053


   @swamirishi 
   
   Thank you again for your patience! I traveled back to WSU and the new 
semester just started. So I was kind of overwhelmed by many things on my plate. 
Now I am ready to work with you!
   
   I personally am very excited about this PR, but I think the reasons why 
several committers hesitate to approve it are:
   
   (1) It does not have a detailed proposal explaining the proposed file 
structure and how it works. We know the overall idea of having a geospatial 
Parquet file, and it is an awesome idea for sure, but it is unclear to us how 
this PR achieves it.
   (2) It contains 44 file changes, which are too many to be reviewed by a 
human. It touches several critical places in Sedona: for instance, it adds a 
new Parquet module, adds some Sedona exceptions, and changes some dependencies. 
More importantly, it changes core/SpatialRDD.java, which will affect many 
places.
   
   Therefore, here are some action items:
   
   (1) Would you please add a Sedona website doc in this PR to explain:
* The usage of the Sedona Parquet reader and writer
* The structure of the proposed geo Parquet file, especially the 
metadata structure
* The algorithm: how it skips irrelevant data chunks by comparing the 
spatial query predicate with the Parquet metadata
   
   You can put this doc at https://sedona.apache.org/tutorial/ as a programming 
guide for geospatial Parquet.
   
   (2) For the file changes themselves:
* Please fix the comments raised by @Imbruced 
* Please address my comment on the SpatialRDD change
* Please remove all whitespace-only file changes, and changes that just 
re-format the code
   
   Once these changes and the doc are done and we fully understand exactly what 
this PR is doing, I will approve it and let users give it a try. I believe that 
once this PR is accepted, your follow-up PRs for the Parquet file will quickly 
pass our review.
   
   Thank you again for your contribution. We really appreciate it!
   
   Jia Yu
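The chunk-skipping algorithm requested for the doc can be sketched independently of Sedona or Parquet: each chunk's metadata carries a bounding box, the spatial query predicate is reduced to an envelope, and chunks whose boxes cannot intersect it are never read. A minimal stdlib-only illustration (all names below are made up; this is not Sedona's actual implementation):

```python
# Minimal sketch of metadata-based chunk skipping (hypothetical names,
# not Sedona's actual implementation). Each chunk's metadata stores the
# bounding box of the geometries it contains; chunks whose boxes cannot
# intersect the query envelope are skipped without being read.
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap (touching edges count as overlap)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def chunks_to_read(chunk_bboxes: List[BBox], query_envelope: BBox) -> List[int]:
    """Return indices of chunks whose bounding box intersects the query."""
    return [i for i, box in enumerate(chunk_bboxes) if intersects(box, query_envelope)]

if __name__ == "__main__":
    metadata = [
        (-96.0, 29.0, -94.0, 31.0),    # chunk 0
        (-78.0, 37.0, -76.0, 39.0),    # chunk 1
        (-123.0, 36.0, -121.0, 38.0),  # chunk 2
    ]
    query = (-95.6, 29.5, -95.4, 29.9)  # small envelope inside chunk 0
    print(chunks_to_read(metadata, query))  # → [0]
```

A real reader would apply the same test per row group using min/max column statistics; the doc requested above would pin down the exact metadata layout.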






[GitHub] [incubator-sedona] jiayuasu commented on a change in pull request #536: [SEDONA-36] Parquet reader & Writers

2021-08-25 Thread GitBox


jiayuasu commented on a change in pull request #536:
URL: https://github.com/apache/incubator-sedona/pull/536#discussion_r695540349



##
File path: core/src/main/java/org/apache/sedona/core/spatialRDD/SpatialRDD.java
##
@@ -129,9 +138,24 @@
  * The sample number.
  */
 private int sampleNumber = -1;
-
-public int getSampleNumber()
-{
+/**
+ * Geometry Type Defaults to Geometry Collection
+ */
+private GeometryType geometryType;
+
+public SpatialRDD() {

Review comment:
   Is there a reason why you need to add geometry type to SpatialRDD? 
Because you need to use this information to define the type of geospatial 
objects in a Parquet file?
   
   I don't think this is a good idea. Currently, this generic SpatialRDD is not 
bound to a particular type; it allows mixed spatial objects in one RDD. E.g.,
   
   POINT (XXX)
   POLYGON (XXX)
   MULTIPOLYGON (XXX)
   POINT (XXX)
   
   This scenario is actually quite common in the real world. A WKT file of state 
or country boundaries may consist of both POLYGON and MULTIPOLYGON.
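The mixed-type scenario is easy to demonstrate with plain WKT strings: one dataset can carry several geometry type tags at once, which is why a single per-RDD geometry type cannot describe it. A stdlib-only illustration (the helper name is made up):

```python
# Stdlib-only illustration that one WKT dataset can legitimately mix
# geometry types, so a single per-RDD geometry type cannot describe it.
# The helper name is made up for this sketch.
def wkt_type(wkt: str) -> str:
    """Return the geometry type tag of a WKT string, e.g. 'POLYGON'."""
    return wkt.strip().split("(", 1)[0].strip().upper()

if __name__ == "__main__":
    # A state/country boundary file often mixes POLYGON and MULTIPOLYGON.
    records = [
        "POINT (30 10)",
        "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
        "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)))",
        "POINT (40 40)",
    ]
    print(sorted({wkt_type(w) for w in records}))
    # → ['MULTIPOLYGON', 'POINT', 'POLYGON']
```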








[jira] [Created] (SEDONA-59) Remove explicit pyspark dependency

2021-08-25 Thread Sebastian Eckweiler (Jira)
Sebastian Eckweiler created SEDONA-59:
-

 Summary: Remove explicit pyspark dependency
 Key: SEDONA-59
 URL: https://issues.apache.org/jira/browse/SEDONA-59
 Project: Apache Sedona
  Issue Type: Improvement
Reporter: Sebastian Eckweiler


The currently published sedona python package has an explicit dependency on 
pyspark.

When used on spark platforms such as Databricks spark comes pre-installed, but 
not integrated with pip. A `pip install sedona` will thus install another 
pyspark copy - which in the best case is just superfluous. In the worst case it 
might cause trouble in combination with the pre-installed spark.

Workarounds, such as installing sedona without dependencies, can work for a 
while. But this is fragile: as soon as dependency validation (as performed, 
e.g., by setuptools entry points) comes around, it will break.

 

I guess there are two options:
 * Remove the pyspark dependency completely, considering it "obvious".
 * Add pyspark as an optional `extras_require` to an extra called "spark".
This would allow a pip install as below, which would get sedona and the 
corresponding pyspark distribution:

{code:java}
pip install sedona[spark]{code}
 

I'd be willing to create a corresponding pull request if one of the options 
is accepted.


