Copilot commented on code in PR #632:
URL: https://github.com/apache/sedona-db/pull/632#discussion_r2819248762
##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -416,6 +417,91 @@ def to_parquet(
             overwrite_bbox_columns,
         )
+    def to_pyogrio(
+        self,
+        path: Union[str, Path, io.BytesIO],
+        *,
+        driver: Optional[str] = None,
+        geometry_type: Optional[str] = None,
+        geometry_name: Optional[str] = None,
+        crs: Optional[str] = None,
+        append: bool = False,
+        **kwargs,
+    ):
+        """Write using GDAL/OGR via pyogrio
+
+        Writes this DataFrame batchwise to a file using GDAL/OGR using the
+        implementation provided by the pyogrio package. This is the same backend
+        used by GeoPandas and this function is a light wrapper around
+        `pyogrio.raw.write_arrow()` that fills in default values using
+        information available to the DataFrame (e.g., geometry column and CRS).
+
+        Args:
+            path: An output path or `BytesIO` output buffer.
+            driver: An explicit GDAL OGR driver. Usually inferred from `path` but
+                must be provided if path is a `BytesIO`. Not all drivers support
+                writing to `BytesIO`.
+            geometry_type: A GeoJSON-style geometry type or `None` to provide an
+                inferred default value (which may be `"Unknown"`). This is required
+                to write some types of output (e.g. Shapefiles) and may provide
+                files that are more efficiently read.
+            geometry_name: The column to write as the primary geometry column. If
+                `None`, the name of the geometry column will be inferred.
+            crs: An optional string overriding the CRS of `geometry_name`.
+            append: Use `True` to append to the file for drivers that support
+                appending.
+            kwargs: Extra arguments passed to `pyogrio.raw.write_arrow()`.
+
+        Examples:
+
+            >>> import tempfile
+            >>> sd = sedona.db.connect()
+            >>> td = tempfile.TemporaryDirectory()
+            >>> sd.sql("SELECT ST_Point(0, 1, 3857)").to_pyogrio(f"{td.name}/tmp.fgb")
+            >>> sd.read_pyogrio(f"{td.name}/tmp.fgb").show()
+            ┌──────────────┐
+            │ wkb_geometry │
+            │ geometry │
+            ╞══════════════╡
+            │ POINT(0 1) │
+            └──────────────┘
+        """
+        if geometry_name is None:
+            geometry_name = self._impl.primary_geometry_column()
+
+        if crs is None:
+            inferred_crs = self.schema.field(geometry_name).type.crs
+            crs = None if inferred_crs is None else inferred_crs.to_json()
Review Comment:
If the DataFrame has no geometry columns, `primary_geometry_column()`
appears to return a falsy value (see `to_pandas()` which checks `if
geometry:`). In that case `self.schema.field(geometry_name)` will raise a
confusing exception. Consider validating `geometry_name` after inference and
raising a clear error (e.g., require `geometry_name` when there is no geometry
column, or error out that `to_pyogrio()` requires a geometry column).
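A minimal sketch of the validation suggested above, written as a standalone helper so it can be exercised without SedonaDB. The name `resolve_geometry_name` and its `primary` argument are hypothetical: `primary` stands in for the value returned by `self._impl.primary_geometry_column()`, and the sketch assumes that value is falsy (e.g. `None` or `""`) when the DataFrame has no geometry column.

```python
from typing import Optional


def resolve_geometry_name(geometry_name: Optional[str], primary: Optional[str]) -> str:
    """Return the geometry column to write, or raise a clear error.

    `primary` is a hypothetical stand-in for the result of
    `self._impl.primary_geometry_column()`, assumed falsy when the
    DataFrame has no geometry column.
    """
    if geometry_name is not None:
        # An explicit choice by the caller always wins.
        return geometry_name

    if not primary:
        # Fail early with a targeted message instead of letting a later
        # schema lookup with an empty/None name raise something confusing.
        raise ValueError(
            "to_pyogrio() requires a geometry column; none was found in this "
            "DataFrame and geometry_name was not provided"
        )

    return primary
```

With this shape, `resolve_geometry_name(None, "geom")` returns `"geom"`, while `resolve_geometry_name(None, None)` raises `ValueError` before any schema lookup, so the CRS inference step only ever sees a real column name.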
##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -416,6 +417,91 @@ def to_parquet(
             overwrite_bbox_columns,
         )
+    def to_pyogrio(
+        self,
+        path: Union[str, Path, io.BytesIO],
+        *,
+        driver: Optional[str] = None,
+        geometry_type: Optional[str] = None,
+        geometry_name: Optional[str] = None,
+        crs: Optional[str] = None,
+        append: bool = False,
+        **kwargs,
+    ):
+        """Write using GDAL/OGR via pyogrio
+
+        Writes this DataFrame batchwise to a file using GDAL/OGR using the
+        implementation provided by the pyogrio package. This is the same backend
+        used by GeoPandas and this function is a light wrapper around
+        `pyogrio.raw.write_arrow()` that fills in default values using
+        information available to the DataFrame (e.g., geometry column and CRS).
+
+        Args:
+            path: An output path or `BytesIO` output buffer.
+            driver: An explicit GDAL OGR driver. Usually inferred from `path` but
+                must be provided if path is a `BytesIO`. Not all drivers support
+                writing to `BytesIO`.
+            geometry_type: A GeoJSON-style geometry type or `None` to provide an
+                inferred default value (which may be `"Unknown"`). This is required
+                to write some types of output (e.g. Shapefiles) and may provide
+                files that are more efficiently read.
+            geometry_name: The column to write as the primary geometry column. If
+                `None`, the name of the geometry column will be inferred.
+            crs: An optional string overriding the CRS of `geometry_name`.
+            append: Use `True` to append to the file for drivers that support
+                appending.
+            kwargs: Extra arguments passed to `pyogrio.raw.write_arrow()`.
+
+        Examples:
+
+            >>> import tempfile
+            >>> sd = sedona.db.connect()
+            >>> td = tempfile.TemporaryDirectory()
+            >>> sd.sql("SELECT ST_Point(0, 1, 3857)").to_pyogrio(f"{td.name}/tmp.fgb")
+            >>> sd.read_pyogrio(f"{td.name}/tmp.fgb").show()
+            ┌──────────────┐
+            │ wkb_geometry │
+            │ geometry │
+            ╞══════════════╡
+            │ POINT(0 1) │
+            └──────────────┘
+        """
+        if geometry_name is None:
+            geometry_name = self._impl.primary_geometry_column()
+
+        if crs is None:
+            inferred_crs = self.schema.field(geometry_name).type.crs
+            crs = None if inferred_crs is None else inferred_crs.to_json()
+
+        if geometry_type is None:
+            # This is required for pyogrio.raw.write_arrow(). We could try harder
+            # to infer this because some drivers need this information.
+            geometry_type = "Unknown"
+
+        if isinstance(path, Path):
+            path = str(path)
+
+        # There may be more endings worth special-casing here but zipped FlatGeoBuf
+        # is particularly useful and isn't automatically recognized
+        if driver is None and path.endswith(".fgb.zip"):
Review Comment:
`path` can be an `io.BytesIO` (per the type hints/docs), but this code
unconditionally calls `path.endswith(".fgb.zip")`. That will raise
`AttributeError` for non-string paths. Consider guarding the suffix check with
`isinstance(path, str)` (or coercing `PathLike` only), and explicitly raising a
`ValueError` when `path` is a `BytesIO` and `driver` is not provided (as the
docstring requires).
```suggestion
        if isinstance(path, io.BytesIO) and driver is None:
            raise ValueError("driver must be provided when path is a BytesIO")
        # There may be more endings worth special-casing here but zipped FlatGeoBuf
        # is particularly useful and isn't automatically recognized
        if driver is None and isinstance(path, str) and path.endswith(".fgb.zip"):
```
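The same guards can be sketched as a standalone function, which makes each failure mode easy to test in isolation. `normalize_output` and its `(path, is_fgb_zip)` return value are hypothetical. The sketch deliberately does not pick a driver for `.fgb.zip` outputs because the hunk above is cut off before showing how the PR handles that case; it only reports whether that branch would be taken.

```python
import io
from pathlib import Path
from typing import Optional, Tuple, Union


def normalize_output(
    path: Union[str, Path, io.BytesIO], driver: Optional[str]
) -> Tuple[Union[str, io.BytesIO], bool]:
    """Coerce `path` and report whether the zipped-FlatGeobuf branch applies.

    Hypothetical helper: returns the (possibly coerced) path and a flag that
    is True only when `driver` is None and `path` is a string ending in
    ".fgb.zip".
    """
    if isinstance(path, Path):
        # Convert pathlib paths to plain strings up front.
        path = str(path)

    if isinstance(path, io.BytesIO):
        if driver is None:
            # A buffer has no file extension to infer the driver from.
            raise ValueError("driver must be provided when path is a BytesIO")
        # Buffers never take the suffix check, so .endswith() is never called.
        return path, False

    # Only strings reach this point, so .endswith() is safe.
    is_fgb_zip = driver is None and path.endswith(".fgb.zip")
    return path, is_fgb_zip
```

For example, `normalize_output(Path("tmp.fgb.zip"), None)` returns `("tmp.fgb.zip", True)`, and `normalize_output(io.BytesIO(), None)` raises `ValueError` instead of the `AttributeError` described in the comment. Driver strings passed in a call are only forwarded; the sketch does not validate them against GDAL.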