FengJiang2018 opened a new issue, #1059:
URL: https://github.com/apache/sedona/issues/1059
## Expected behavior
The GeoParquet file should be written with `geo` metadata, and reading it back with the following code should not raise an error:
``` python
df = sedona.read.format("geoparquet").load(path)
```
## Actual behavior
The GeoParquet file was written without `geo` metadata, and reading it with the following code raised an error:
``` python
df = sedona.read.format("geoparquet").load(path)
```
## Steps to reproduce the problem
The problem seems to be that when I use `df.write` to save a GeoParquet file, the `geo` metadata is not created for the Sedona geometry column. I am not sure whether I missed anything.
#1: I use the Overture public dataset as the input DataFrame and add a Sedona geometry column as follows:
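``` python
from pyspark.sql.functions import expr

# Convert the WKB "geometry" column from Overture into a Sedona geometry column
df_building = sedona.read.option("inferschema", True).parquet(inputpath) \
    .withColumn("geometry2", expr("ST_GeomFromWKB(geometry)"))
df_building.createOrReplaceTempView("rawdf")
```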
#2: Yes, I am writing a GeoParquet file on Databricks from a DataFrame that has a Sedona Geometry type column.
``` python
# Sort by geohash, then replace the original WKB column with the Sedona geometry
newdf = spark.sql(
    "select *, ST_GeoHash(geometry2, 5) as geohash from rawdf order by geohash"
).drop("geometry").withColumnRenamed("geometry2", "geometry")
newdf.write.mode("overwrite").format("geoparquet") \
    .save(path + "/final1.parquet")
```
Here is the output of `printSchema()`. The column shows as `geometry` type with `nullable = true`, which I assume is expected. Correct me if this is wrong.
``` cmd
root
|-- geometry: geometry (nullable = true)
|-- geohash: string (nullable = true)
```
#3: I get the error when I read the GeoParquet file written in #2 as follows:
``` python
df = sedona.read.format("geoparquet").load(newpath)
```
There is no read error if I use the following code, but **no geo metadata** can be found in the DataFrame schema:
``` python
df = sedona.read.format("geoparquet").parquet(newpath)
```
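Because a Spark DataFrame schema does not expose file-level Parquet metadata, one way to confirm whether the files written in #2 actually carry the GeoParquet `geo` key is to inspect the footer of one of the output part files. (In Spark, `DataFrameReader.parquet(...)` itself sets the format to `parquet`, so the last snippet reads the files through Spark's plain Parquet source rather than Sedona's `geoparquet` source.) The helper below is a minimal sketch that assumes the footer key-value metadata has already been loaded into a dict (for example with `pyarrow`). The names `extract_geo_metadata` and `check_geo_metadata` are hypothetical. It only checks the required fields named in the GeoParquet specification: `version`, `primary_column`, `columns`, and per-column `encoding` and `geometry_types`.

```python
import json
from typing import Any, Dict, List, Optional


def extract_geo_metadata(kv_metadata: Optional[Dict[Any, Any]]) -> Optional[Dict[str, Any]]:
    """Decode the GeoParquet 'geo' entry from Parquet footer key-value metadata.

    Keys and values may be bytes (as in pyarrow's Schema.metadata) or str.
    Returns None when there is no metadata or no 'geo' key.
    """
    if not kv_metadata:
        return None
    raw = kv_metadata.get(b"geo", kv_metadata.get("geo"))
    if raw is None:
        return None
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


def check_geo_metadata(kv_metadata: Optional[Dict[Any, Any]],
                       column: str = "geometry") -> List[str]:
    """Return a list of problems; an empty list means the required fields exist."""
    geo = extract_geo_metadata(kv_metadata)
    if geo is None:
        return ["no 'geo' key in file metadata"]
    problems = []
    # Top-level fields required by the GeoParquet spec
    for key in ("version", "primary_column", "columns"):
        if key not in geo:
            problems.append(f"missing required field '{key}'")
    col = geo.get("columns", {}).get(column)
    if col is None:
        problems.append(f"column '{column}' is not described in 'columns'")
    else:
        # Per-column fields required by the spec
        for key in ("encoding", "geometry_types"):
            if key not in col:
                problems.append(f"column '{column}' is missing '{key}'")
    return problems
```

For example, with `pyarrow` installed, `check_geo_metadata(pyarrow.parquet.read_schema(part_file).metadata)` could be run against a local path to one of the `part-*.parquet` files under `final1.parquet`. An empty list would mean the `geo` key and its required fields are present in that file.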
## Settings
Sedona version = 1.5.0
Apache Spark version = 3.4.0
Apache Flink version = N/A
API type = Python
Scala version = 2.12
JRE version = 1.8
Python version = 3.10
Environment = Azure Databricks, notebook