Kristin Cowalcijk created SEDONA-646:
----------------------------------------
Summary: Shapefile data source for DataFrame API
Key: SEDONA-646
URL: https://issues.apache.org/jira/browse/SEDONA-646
Project: Apache Sedona
Issue Type: New Feature
Reporter: Kristin Cowalcijk
Fix For: 1.7.0
The current shapefile reader returns a SpatialRDD. Users who want a DataFrame
must convert it with {{Adapter.toDf}}. A better approach is to support loading
shapefiles directly through the DataFrame API:
{code:python}
df = sedona.read.format("shapefile").load("/path/to/shapefile")
{code}
This is more intuitive than the current two-step approach:
{code:python}
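from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.utils.adapter import Adapter

# Read into a SpatialRDD first, then convert it to a DataFrame.
rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, "/path/to/shapefile")
df = Adapter.toDf(rdd, spark)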
{code}
We'll also make several more improvements:
1. Give non-spatial attributes appropriate data types. {{Adapter.toDf}}
converts all non-spatial fields to strings, which loses their original data
types.
2. Better handling of input paths. We should accept both directory paths and
paths to individual .shp files.
3. Infer the code page from the .cpg file, so that users don't have to change
the Java system property {{sedona.global.charset}} to work around encoding
problems.
4. Infer the SRID of geometries from the .prj file.
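To illustrate items 2 and 3, here is a minimal plain-Python sketch of how a reader might accept either a directory or a .shp path and pick up the charset label from a .cpg sidecar. This is not Sedona's implementation: the helper names {{resolve_shapefile_parts}} and {{read_code_page}} are hypothetical, the fallback charset is an assumption, and sidecar lookup here only matches lowercase extensions.

```python
from pathlib import Path


def resolve_shapefile_parts(path):
    """Hypothetical helper: map each .shp under `path` to its sidecar files."""
    p = Path(path)
    # Accept either a directory containing .shp files or a single .shp file.
    if p.is_dir():
        shp_files = sorted(f for f in p.iterdir() if f.suffix.lower() == ".shp")
    elif p.suffix.lower() == ".shp":
        shp_files = [p]
    else:
        raise ValueError(f"not a directory or .shp file: {path}")

    parts = {}
    for shp in shp_files:
        sidecars = {}
        # Sidecars share the .shp file's base name; only lowercase
        # extensions are checked in this sketch.
        for ext in (".dbf", ".shx", ".cpg", ".prj"):
            candidate = shp.with_suffix(ext)
            if candidate.exists():
                sidecars[ext] = candidate
        parts[shp.stem] = sidecars
    return parts


def read_code_page(cpg_path, default="ISO-8859-1"):
    """Hypothetical helper: return the charset label stored in a .cpg file.

    The default used for an empty file is an assumption of this sketch.
    """
    text = Path(cpg_path).read_text(encoding="ascii", errors="ignore").strip()
    return text or default
```

With a layout like this, {{resolve_shapefile_parts("/path/to/dir")}} and {{resolve_shapefile_parts("/path/to/dir/roads.shp")}} would return the same entry for {{roads}}, and the label returned by {{read_code_page}} could then be used to decode the .dbf attributes instead of a global system property.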
--
This message was sent by Atlassian Jira
(v8.20.10#820010)