Hi Gregory,

Please let us know if you get your issue fixed. I know many of our users are also using Databricks clusters, and we are interested in the solution as well.

Thanks,
Jia

On Wed, Feb 10, 2021 at 5:17 AM Grégory Dugernier <g...@aloalto.com> wrote:

> Thank you for the quick reply!
>
> It seems my particular situation is a bit more complex than that, since
> I'm running the notebook on a Databricks cluster, and the default Spark
> config doesn't seem to allow for more jar repositories (GeoTools isn't on
> Maven Central), nor does creating a new SparkSession appear to work. I've
> tried to download the jars and add them manually to the cluster but it
> doesn't seem to work either. But at least I know where the issue's at!
>
> Thanks again for your help,
> Regards
>
> On Wed, 10 Feb 2021 at 12:22, Jia Yu <ji...@apache.org> wrote:
>
>> Hi Gregory,
>>
>> Thanks for letting us know. This is not a bug. We cannot include GeoTools
>> jars due to license issues. But indeed we forgot to update the docs and
>> Jupyter notebook examples. I just updated them. Please read them here:
>>
>> https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb
>>
>> (Make sure you disable the browser cache or open it in an incognito
>> window)
>> http://sedona.apache.org/download/overview/#install-sedona-python
>>
>> In short, you need to add the following coordinates in the notebook:
>>
>> spark = SparkSession. \
>>     builder. \
>>     appName('appName'). \
>>     config("spark.serializer", KryoSerializer.getName). \
>>     config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
>>     config("spark.jars.repositories",
>>            'https://repo.osgeo.org/repository/release,'
>>            'https://download.java.net/maven/2'). \
>>     config('spark.jars.packages',
>>            'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating,'
>>            'org.geotools:gt-main:24.0,'
>>            'org.geotools:gt-referencing:24.0,'
>>            'org.geotools:gt-epsg-hsql:24.0'). \
>>     getOrCreate()
>>
>> On Wed, Feb 10, 2021 at 2:35 AM Grégory Dugernier <g...@aloalto.com> wrote:
>>
>>> Hello,
>>>
>>> I've been trying to run Sedona for Python on Databricks for 2 days and I
>>> think I've stumbled upon a bug.
>>>
>>> *Configuration*:
>>>
>>> - Spark 3.0.1
>>> - Scala 2.12
>>> - Python 3.7
>>>
>>> *Libraries*:
>>>
>>> - apache-sedona (from PyPI)
>>> - org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating
>>>   (from Maven)
>>>
>>> *What I'm trying to do:*
>>>
>>> I'm trying to load a series of Shapefiles into a dataframe for
>>> geospatial analysis. See the code snippet below, based on your example
>>> notebook
>>> <https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb>
>>>
>>> > from sedona.core.formatMapper.shapefileParser import ShapefileReader
>>> > from sedona.register import SedonaRegistrator
>>> > from sedona.utils.adapter import Adapter
>>> >
>>> > SedonaRegistrator.registerAll(spark)
>>> > shape_rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext,
>>> >                                               file_name)
>>> > df = Adapter.toDf(shape_rdd, spark)
>>>
>>> *Bug*:
>>>
>>> The ShapefileReader.readToGeometryRDD() currently throws the following
>>> error:
>>>
>>> > Py4JJavaError: An error occurred while calling
>>> > z:org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD.
>>> > : java.lang.NoClassDefFoundError: org/opengis/referencing/FactoryException
>>> >   at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:79)
>>> >   at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >   at java.lang.reflect.Method.invoke(Method.java:498)
>>> >   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>> >   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
>>> >   at py4j.Gateway.invoke(Gateway.java:295)
>>> >   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>> >   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> >   at py4j.GatewayConnection.run(GatewayConnection.java:251)
>>> >   at java.lang.Thread.run(Thread.java:748)
>>> > Caused by: java.lang.ClassNotFoundException: org.opengis.referencing.FactoryException
>>> >   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>> >   at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
>>> >   at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
>>> >   at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>>>
>>> Adding the org.apache.sedona:sedona-core-3.0_2.12:1.0.0-incubating
>>> library from Maven doesn't solve the error. Adding the
>>> org.datasyslab:geospark:1.3.1 library from Maven solves the error, but
>>> it creates conflicts with the underlying org.locationtech.jts
>>> dependencies. This makes me think there is a missing OpenGIS dependency
>>> in the sedona-python-adapter.
>>>
>>> Regards,
>>> G. Dugernier
>>>
>>> --
>>> Grégory Dugernier
>>> Software Engineer
>>>
>>> g...@aloalto.com <f...@aloalto.com>
>>> +32 (0)484 11 26 09
>>>
>>> www.aloalto.com
>>> +32 (0)2 736 10 17
>>>
>>> --
>>> DISCLAIMER : The content of this e-mail message does not constitute a
>>> commitment of S.A. ALOALTO N.V. or its subsidiaries/affiliates. This
>>> e-mail and any attachments thereto may contain information which is
>>> confidential and/or protected by intellectual property rights and are
>>> intended for the intended recipient only. Any use of the information
>>> contained herein (including, but not limited to, total or partial
>>> reproduction, communication or distribution in any form) by persons
>>> other than the designated recipient(s) is prohibited. If an addressing
>>> or transmission error has misdirected this e-mail, please notify the
>>> author, either by telephone or by e-mail and delete the material from
>>> any computer.
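
A note for readers following this thread: the fix Jia describes depends on two comma-separated strings, `spark.jars.repositories` and `spark.jars.packages`. Whitespace or a line break inside either string is easy to pick up when copying from a wrapped email, and it may keep Spark from resolving a repository or package. The block below is a minimal sketch that builds both values from lists, using only the coordinates and repository URLs quoted in this thread. The helper name `sedona_jar_conf` is hypothetical and is not part of Sedona or PySpark.

```python
# Maven coordinates and extra repositories exactly as quoted in this thread.
SEDONA_PACKAGES = [
    "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating",
    "org.geotools:gt-main:24.0",
    "org.geotools:gt-referencing:24.0",
    "org.geotools:gt-epsg-hsql:24.0",
]
EXTRA_REPOSITORIES = [
    "https://repo.osgeo.org/repository/release",
    "https://download.java.net/maven/2",
]


def sedona_jar_conf(packages=SEDONA_PACKAGES, repositories=EXTRA_REPOSITORIES):
    """Hypothetical helper: return the two spark.jars.* settings as clean strings."""

    def join(items):
        # Strip surrounding whitespace and drop empty entries before joining.
        cleaned = [item.strip() for item in items if item.strip()]
        return ",".join(cleaned)

    return {
        "spark.jars.repositories": join(repositories),
        "spark.jars.packages": join(packages),
    }


if __name__ == "__main__":
    for key, value in sedona_jar_conf().items():
        print(key, "=", value)
```

Each key/value pair can be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`. On Databricks the session already exists when the notebook starts, so the same values would presumably have to be supplied at the cluster level instead; this thread does not confirm which approach works there.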