Hello, I've been trying to run Sedona for Python on Databricks for 2 days and I think I've stumbled upon a bug.

*Configuration*:
- Spark 3.0.1
- Scala 2.12
- Python 3.7

*Libraries*:
- apache-sedona (from PyPI)
- org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating (from Maven)

*What I'm trying to do:*
I'm trying to load a series of Shapefiles into a dataframe for geospatial analysis. See the code snippet below, based on your example notebook <https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb>

> from sedona.core.formatMapper.shapefileParser import ShapefileReader
> from sedona.register import SedonaRegistrator
> from sedona.utils.adapter import Adapter
>
> SedonaRegistrator.registerAll(spark)
> shape_rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, file_name)
> df = Adapter.toDf(shape_rdd, spark)

*Bug*:
ShapefileReader.readToGeometryRDD() currently throws the following error:

> Py4JJavaError: An error occurred while calling z:org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD.
> : java.lang.NoClassDefFoundError: org/opengis/referencing/FactoryException
>     at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:79)
>     at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
>     at py4j.Gateway.invoke(Gateway.java:295)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:251)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: org.opengis.referencing.FactoryException
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
>     at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)

Adding the org.apache.sedona:sedona-core-3.0_2.12:1.0.0-incubating library from Maven doesn't solve the error.
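
Whether a given JAR actually ships the missing class can be checked by listing its entries, since a JAR is a ZIP archive and a class a.b.C is stored as the entry a/b/C.class. The following is only a minimal sketch of that check, assuming the candidate JARs have been downloaded to a local directory; the function name jars_containing_class and the "jars" directory are hypothetical and not part of Sedona or Databricks.

```python
import zipfile
from pathlib import Path


def jars_containing_class(jar_paths, class_name):
    """Return the JAR paths (as strings) whose archive contains class_name."""
    # A fully qualified class a.b.C is stored in a JAR as a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar_path in jar_paths:
        with zipfile.ZipFile(jar_path) as jar:
            if entry in jar.namelist():
                matches.append(str(jar_path))
    return matches


if __name__ == "__main__":
    # Hypothetical local directory holding the downloaded Maven JARs
    jars = sorted(Path("jars").glob("*.jar"))
    print(jars_containing_class(jars, "org.opengis.referencing.FactoryException"))
```

An empty result for the Sedona JARs would be consistent with the ClassNotFoundException above, although it only inspects the files themselves, not what the cluster's class loader ends up seeing.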
Adding the org.datasyslab:geospark:1.3.1 library from Maven solves the error, but it creates conflicts with the underlying org.locationtech.jts dependencies.

This makes me think there is a missing OpenGIS dependency in the sedona-python-adapter.

Regards,
G. Dugernier

--
Grégory Dugernier
Software Engineer
www.aloalto.com