Hi Gregory,

Thanks for letting us know. This is not a bug: we cannot bundle the GeoTools jars with Sedona because of license issues. We did, however, forget to update the docs and the Jupyter notebook examples. I have just updated them. Please read them here:
https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb
(Make sure you disable the browser cache or open it in an incognito window)
http://sedona.apache.org/download/overview/#install-sedona-python

In short, you need to add the following coordinates in the notebook:

    from pyspark.sql import SparkSession
    from sedona.utils import KryoSerializer, SedonaKryoRegistrator

    spark = SparkSession. \
        builder. \
        appName('appName'). \
        config("spark.serializer", KryoSerializer.getName). \
        config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
        config("spark.jars.repositories",
               'https://repo.osgeo.org/repository/release,'
               'https://download.java.net/maven/2'). \
        config('spark.jars.packages',
               'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating,'
               'org.geotools:gt-main:24.0,'
               'org.geotools:gt-referencing:24.0,'
               'org.geotools:gt-epsg-hsql:24.0'). \
        getOrCreate()

On Wed, Feb 10, 2021 at 2:35 AM Grégory Dugernier <[email protected]> wrote:

> Hello,
>
> I've been trying to run Sedona for Python on Databricks for 2 days and I
> think I've stumbled upon a bug.
>
> *Configuration*:
>
> - Spark 3.0.1
> - Scala 2.12
> - Python 3.7
>
> *Libraries*:
>
> - apache-sedona (from PyPI)
> - org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating
>   (from Maven)
>
> *What I'm trying to do:*
>
> I'm trying to load a series of Shapefiles into a dataframe for
> geospatial analysis.
> See code snippet below, based on your example notebook
> <https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb>
>
> > from sedona.core.formatMapper.shapefileParser import ShapefileReader
> > from sedona.register import SedonaRegistrator
> > from sedona.utils.adapter import Adapter
> >
> > SedonaRegistrator.registerAll(spark)
> > shape_rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, file_name)
> > df = Adapter.toDf(shape_rdd, spark)
>
> *Bug*:
>
> The ShapefileReader.readToGeometryRDD() currently throws the following
> error:
>
> > Py4JJavaError: An error occurred while calling
> > z:org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD.
> > : java.lang.NoClassDefFoundError: org/opengis/referencing/FactoryException
> >     at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:79)
> >     at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> >     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
> >     at py4j.Gateway.invoke(Gateway.java:295)
> >     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> >     at py4j.commands.CallCommand.execute(CallCommand.java:79)
> >     at py4j.GatewayConnection.run(GatewayConnection.java:251)
> >     at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.lang.ClassNotFoundException: org.opengis.referencing.FactoryException
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> >     at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>
> Adding the org.apache.sedona:sedona-core-3.0_2.12:1.0.0-incubating library
> from Maven doesn't solve the error. Adding the org.datasyslab:geospark:1.3.1
> library from Maven solves the error, but it creates conflicts with the
> underlying org.locationtech.jts dependencies.
> This makes me think there is a missing OpenGIS dependency in the
> sedona-python-adapter.
>
> Regards,
> G. Dugernier
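
As a follow-up to the configuration in the reply above: the value of spark.jars.packages is a single comma-separated string of Maven coordinates, and a stray space or a missing comma is easy to introduce when editing it by hand. Below is a minimal sketch of building that string from its parts. The helper name sedona_packages and its parameters are hypothetical (they are not part of Sedona or Spark); the default versions are simply the ones quoted in this thread and may need adjusting for your cluster.

```python
# Hypothetical helper, not part of Sedona or Spark: it only joins Maven
# coordinates into the comma-separated form used for spark.jars.packages.

# GeoTools modules listed in the reply above.
GEOTOOLS_MODULES = ("gt-main", "gt-referencing", "gt-epsg-hsql")


def sedona_packages(sedona_version="1.0.0-incubating",
                    spark_compat="3.0",
                    scala_version="2.12",
                    geotools_version="24.0"):
    """Return a spark.jars.packages value for Sedona plus the GeoTools jars."""
    adapter = (
        "org.apache.sedona:sedona-python-adapter-"
        f"{spark_compat}_{scala_version}:{sedona_version}"
    )
    geotools = [
        f"org.geotools:{module}:{geotools_version}"
        for module in GEOTOOLS_MODULES
    ]
    # No spaces around the commas: each entry must be a bare coordinate.
    return ",".join([adapter, *geotools])


packages = sedona_packages()
print(packages)
```

The resulting string can then be passed as config('spark.jars.packages', packages) when building the SparkSession, in place of the hand-written literal.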
