robertnagy1 opened a new issue, #860:
URL: https://github.com/apache/sedona/issues/860
## Expected behavior
I am trying to convert a shapefile SpatialRDD into a DataFrame.
1. I have mounted an Azure Blob Storage container in Databricks so that the shapefile is available at `/dbfs/mnt/spatial/gis_osm_roads_free_1.shp`, and I know the path is reachable.
2. If I run the following, it works well:

   ```python
   import pandas as pd
   import geopandas as gpd

   gdf = gpd.read_file('/dbfs/mnt/spatial/gis_osm_roads_free_1.shp')
   gdf = gdf.replace(pd.NA, '')
   osm_points = spark.createDataFrame(gdf)
   ```
3. I have done the necessary imports as shown in https://github.com/apache/sedona/blob/master/binder/ApacheSedonaCore.ipynb (the consolidated sketch after this list shows the imports I am referring to).
4. I have run the following command without an error:

   ```python
   shape_rdd = ShapefileReader.readToGeometryRDD(sc, "/dbfs/mnt/spatial/gis_osm_roads_free_1.shp")
   ```
5. I then run the following command and get this output:

   ```python
   shape_rdd.analyze()
   ```

   OUTPUT:

   ```
   org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: dbfs:/dbfs/mnt/spatial/gis_osm_roads_free_1.shp
   ```

   Note that the path in the error is `dbfs:/dbfs/...`, i.e. the `dbfs:` scheme has been prepended to the FUSE mount path I passed in.
6. Subsequently, the following command fails with the same error:

   ```python
   Adapter.toDf(shape_rdd, spark)
   ```
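For reference, here is the whole sequence in one place. This is only a minimal sketch: the import paths are the ones I know from the apache-sedona PyPI package (the binder notebook may spell them slightly differently), and the commented-out `dbfs:/mnt/...` call at the end is just my guess at an alternative path spelling based on the doubled prefix in the error, not something I have confirmed.

```python
from sedona.register import SedonaRegistrator
from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.utils.adapter import Adapter

# Register Sedona's types and SQL functions on the existing Databricks session
SedonaRegistrator.registerAll(spark)
sc = spark.sparkContext

# The FUSE mount path that geopandas reads without problems
shape_rdd = ShapefileReader.readToGeometryRDD(
    sc, "/dbfs/mnt/spatial/gis_osm_roads_free_1.shp"
)

shape_rdd.analyze()                   # InvalidInputException: dbfs:/dbfs/mnt/...
df = Adapter.toDf(shape_rdd, spark)   # fails with the same error

# My untested guess: pass the DBFS path without the /dbfs FUSE prefix,
# so that the dbfs: scheme is not applied twice
# shape_rdd = ShapefileReader.readToGeometryRDD(
#     sc, "dbfs:/mnt/spatial/gis_osm_roads_free_1.shp"
# )
```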
## Actual behavior
I would expect that, since I can read the file through geopandas and create a DataFrame from it as shown in point 2, steps 4 and 5 would work as well. Instead they fail with the `InvalidInputException` above.
## Steps to reproduce the problem
Read any shapefile from a DBFS FUSE path (`/dbfs/...`) via `ShapefileReader.readToGeometryRDD`, then run `shape_rdd.analyze()` and/or `Adapter.toDf(shape_rdd, spark)` afterwards.
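To make the path mismatch easier to see, this is roughly how I check the mount from both sides in a Databricks notebook (`dbutils` and `display` are the standard Databricks notebook utilities; the interpretation that Spark-side readers want the path without the `/dbfs` prefix is only my reading of the error above):

```python
import os

# Driver-local (FUSE) view of the mount: this is the path geopandas reads
print(os.path.exists("/dbfs/mnt/spatial/gis_osm_roads_free_1.shp"))

# DBFS view of the same mount, as Spark/Hadoop input readers see it
display(dbutils.fs.ls("dbfs:/mnt/spatial"))

# The failing Sedona call appears to resolve the FUSE path against the dbfs:
# scheme, producing dbfs:/dbfs/mnt/spatial/gis_osm_roads_free_1.shp,
# which does not exist
```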
## Settings
I am using this library in Databricks: `org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.0`, and I have installed the latest apache-sedona from PyPI.
Python version = 3.11
Environment = Databricks
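For completeness, the cluster-level Spark config follows what the Sedona docs recommend for the Kryo serializer; I am writing it down from memory as a Python builder sketch, so treat the exact keys as an approximation of my actual cluster settings rather than a verbatim copy:

```python
from pyspark.sql import SparkSession

# On Databricks these keys are set in the cluster's Spark config UI;
# the builder form is shown here only for illustration
spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
    .getOrCreate()
)
```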