robertnagy1 opened a new issue, #860:
URL: https://github.com/apache/sedona/issues/860
## Expected behavior
I am trying to convert a shapefile SpatialRDD into a DataFrame.
1. I have mounted an Azure Blob Storage container in Databricks so that the shapefile is available at `/dbfs/mnt/spatial/gis_osm_roads_free_1.shp`, and I know the path is reachable.
2. If I run the following, it works well:

   ```python
   import pandas as pd
   import geopandas as gpd

   gdf = gpd.read_file('/dbfs/mnt/spatial/gis_osm_roads_free_1.shp')
   gdf = gdf.replace(pd.NA, '')
   osm_points = spark.createDataFrame(gdf)
   ```
3. I have done the necessary imports as shown in https://github.com/apache/sedona/blob/master/binder/ApacheSedonaCore.ipynb (the consolidated sketch after this list shows the imports I am referring to).
4. I have run the following command without an error:

   ```python
   shape_rdd = ShapefileReader.readToGeometryRDD(sc, "/dbfs/mnt/spatial/gis_osm_roads_free_1.shp")
   ```
5. I then run the following command and get this output:

   ```python
   shape_rdd.analyze()
   ```

   OUTPUT:

   ```
   org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: dbfs:/dbfs/mnt/spatial/gis_osm_roads_free_1.shp
   ```

   Note that the path in the error is `dbfs:/dbfs/...`, i.e. the `dbfs:` scheme has been prepended to the FUSE mount path I passed in.
6. Subsequently, the following command fails with the same error:

   ```python
   Adapter.toDf(shape_rdd, spark)
   ```
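For reference, here is the whole sequence in one place. This is only a minimal sketch: the import paths are the ones I know from the apache-sedona PyPI package (the binder notebook may spell them slightly differently), and the commented-out `dbfs:/mnt/...` call at the end is just my guess at an alternative path spelling based on the doubled prefix in the error, not something I have confirmed.

```python
from sedona.register import SedonaRegistrator
from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.utils.adapter import Adapter

# Register Sedona's types and SQL functions on the existing Databricks session
SedonaRegistrator.registerAll(spark)
sc = spark.sparkContext

# The FUSE mount path that geopandas reads without problems
shape_rdd = ShapefileReader.readToGeometryRDD(
    sc, "/dbfs/mnt/spatial/gis_osm_roads_free_1.shp"
)

shape_rdd.analyze()                   # InvalidInputException: dbfs:/dbfs/mnt/...
df = Adapter.toDf(shape_rdd, spark)   # fails with the same error

# My untested guess: pass the DBFS path without the /dbfs FUSE prefix,
# so that the dbfs: scheme is not applied twice
# shape_rdd = ShapefileReader.readToGeometryRDD(
#     sc, "dbfs:/mnt/spatial/gis_osm_roads_free_1.shp"
# )
```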
## Actual behavior
I would expect that, since I can read the file through geopandas and create a DataFrame from it as shown in point 2, steps 4 and 5 would work as well. Instead they fail with the `InvalidInputException` above.
## Steps to reproduce the problem
Read any shapefile from a DBFS FUSE path (`/dbfs/...`) via `ShapefileReader.readToGeometryRDD`, then run `shape_rdd.analyze()` and/or `Adapter.toDf(shape_rdd, spark)` afterwards.
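To make the path mismatch easier to see, this is roughly how I check the mount from both sides in a Databricks notebook (`dbutils` and `display` are the standard Databricks notebook utilities; the interpretation that Spark-side readers want the path without the `/dbfs` prefix is only my reading of the error above):

```python
import os

# Driver-local (FUSE) view of the mount: this is the path geopandas reads
print(os.path.exists("/dbfs/mnt/spatial/gis_osm_roads_free_1.shp"))

# DBFS view of the same mount, as Spark/Hadoop input readers see it
display(dbutils.fs.ls("dbfs:/mnt/spatial"))

# The failing Sedona call appears to resolve the FUSE path against the dbfs:
# scheme, producing dbfs:/dbfs/mnt/spatial/gis_osm_roads_free_1.shp,
# which does not exist
```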
## Settings
I am using this library in Databricks: `org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.0`, and I have installed the latest apache-sedona from PyPI.
Python version = 3.11
Environment = Databricks
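For completeness, the cluster-level Spark config follows what the Sedona docs recommend for the Kryo serializer; I am writing it down from memory as a Python builder sketch, so treat the exact keys as an approximation of my actual cluster settings rather than a verbatim copy:

```python
from pyspark.sql import SparkSession

# On Databricks these keys are set in the cluster's Spark config UI;
# the builder form is shown here only for illustration
spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
    .getOrCreate()
)
```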