Sebastian Eckweiler created SEDONA-59:
-----------------------------------------

             Summary: Remove explicit pyspark dependency
                 Key: SEDONA-59
                 URL: https://issues.apache.org/jira/browse/SEDONA-59
             Project: Apache Sedona
          Issue Type: Improvement
            Reporter: Sebastian Eckweiler


The currently published Sedona Python package has an explicit dependency on
pyspark.

On Spark platforms such as Databricks, Spark comes pre-installed but is not
integrated with pip. A `pip install sedona` will therefore install another copy
of pyspark, which in the best case is merely superfluous and in the worst case
can conflict with the pre-installed Spark.

Workarounds, such as installing sedona without its dependencies, can work for a
while. But this is fragile: as soon as dependency validation comes into play,
e.g. via setuptools entry points, it will break.

 

I see two options:
 * Remove the pyspark dependency completely, considering it to be "obvious".
 * Add pyspark as an optional dependency via `extras_require`, under an extra
called "spark" (a setup.py sketch follows below). This would allow a pip install
as below, which would pull in sedona together with the corresponding pyspark
distribution:

{code:bash}
pip install sedona[spark]
{code}
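For illustration, a minimal sketch of how the second option could look in setup.py. This is only an assumed layout: the package name, the example core dependencies, and the pyspark version bound are placeholders, not the actual Sedona configuration.

{code:python}
# Hypothetical setup.py sketch: pyspark is moved out of install_requires
# and into an optional "spark" extra. All names and versions are examples.
from setuptools import setup, find_packages

setup(
    name="sedona",                      # placeholder package name
    version="0.0.0",                    # placeholder version
    packages=find_packages(),
    install_requires=[
        "attrs",                        # example core dependency
        "shapely",                      # example core dependency
    ],
    extras_require={
        # pyspark is only installed when the extra is requested,
        # i.e. via `pip install sedona[spark]`
        "spark": ["pyspark>=2.4"],      # example version bound
    },
)
{code}

With such a setup, a plain `pip install sedona` on platforms that already provide Spark (e.g. Databricks) would skip pyspark entirely, while local users could still opt in through the extra.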
 

I'd be willing to create a corresponding pull request if one of the options is
accepted.



