jwass opened a new issue, #1268:
URL: https://github.com/apache/sedona/issues/1268
Is there a way to spatially partition a DataFrame, presumably by first
converting it to a SpatialRDD and back, and then write it out using that
partitioning scheme? Below is my guess at how to accomplish this, but I'm not
sure whether I'm misunderstanding something. I'm also relatively new to
working with Spark and Sedona.
## Expected behavior
Load a DataFrame, convert it to a SpatialRDD, spatially partition it, convert
it back to a DataFrame, and save the result. I'd expect the final DataFrame to
preserve the partitioning from the RDD.
## Actual behavior
Adapter.toDf() does not appear to preserve the spatial partitioning, or I'm
doing something else wrong.
## Steps to reproduce the problem
```python
from sedona.core.enums import GridType
from sedona.utils.adapter import Adapter

df = sedona.read.format("geoparquet").load(path)
rdd = Adapter.toSpatialRdd(df, "geometry")
rdd.analyze()  # compute the dataset envelope needed for partitioning

# KDB-tree spatial partitioning into (approximately) 6 partitions
rdd.spatialPartitioning(GridType.KDBTREE, num_partitions=6)

df2 = Adapter.toDf(rdd, spark)
df2.write.format("geoparquet").save(output_path)
```
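If the end goal is spatially clustered output files rather than the RDD partitioning itself, one DataFrame-side alternative I've been considering is to derive a spatial sort key and range-repartition on it before writing. This is only a rough sketch, assuming the `df` and `output_path` above, a Spark session with Sedona's SQL functions registered, and lon/lat geometry as `ST_GeoHash` expects; the precision 8 and partition count 6 are arbitrary choices:

```
from pyspark.sql.functions import expr

# Derive a geohash key per geometry, range-partition on it, and sort
# within partitions so nearby geometries land in the same output files.
df_keyed = df.withColumn("gh", expr("ST_GeoHash(geometry, 8)"))
(df_keyed.repartitionByRange(6, "gh")
    .sortWithinPartitions("gh")
    .drop("gh")
    .write.format("geoparquet").save(output_path))
```

Each output file then covers a roughly contiguous geohash range, which may be what I'm actually after here.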
However, the RDD approach above doesn't seem to work: the number of
partitions written out for df2 was far greater than 6.
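For intuition on why a spatial sort key clusters nearby geometries into the same partitions, here is a small, self-contained plain-Python sketch of a Z-order (Morton) key, the same locality idea that geohashing uses. This is only an illustration, not Sedona code, and assumes coordinates normalized to [0, 1):

```python
def morton_key(x: float, y: float, bits: int = 16) -> int:
    """Interleave the bits of the quantized x and y coordinates.

    Points close together in 2-D space tend to get nearby keys, so
    sorting or range-partitioning by the key clusters them.
    """
    xi = int(x * (1 << bits))
    yi = int(y * (1 << bits))
    key = 0
    for i in range(bits):
        key |= ((xi >> i) & 1) << (2 * i)       # even bit positions: x
        key |= ((yi >> i) & 1) << (2 * i + 1)   # odd bit positions: y
    return key

# Two nearby points get closer keys than two distant ones.
a = morton_key(0.10, 0.10)
b = morton_key(0.11, 0.11)
c = morton_key(0.90, 0.90)
assert abs(a - b) < abs(a - c)
```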
## Settings
Sedona version = 1.5.1
Apache Spark version = ?
API type = Python
Python version = ?
Environment = Databricks