This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/sedona.git
The following commit(s) were added to refs/heads/master by this push:
new a22d21d977 [GH-2769] Improve raster loader, writer, and viz docs (#2802)
a22d21d977 is described below
commit a22d21d9777e6a9e61b03544e2091245ca4810dc
Author: Jia Yu <[email protected]>
AuthorDate: Sun Mar 29 23:14:22 2026 -0700
[GH-2769] Improve raster loader, writer, and viz docs (#2802)
---
docs/api/sql/Overview.md | 25 ++
docs/api/sql/Raster-Functions.md | 5 +
docs/api/sql/Raster-Operators/RS_AsRaster.md | 116 ++++++++
docs/api/sql/Raster-Output/RS_AsArcGrid.md | 63 +++++
docs/api/sql/Raster-Output/RS_AsCOG.md | 85 ++++++
docs/api/sql/Raster-Output/RS_AsGeoTiff.md | 65 +++++
docs/api/sql/Raster-Output/RS_AsPNG.md | 62 +++++
docs/api/sql/Raster-loader.md | 114 --------
docs/api/sql/Raster-writer.md | 403 ---------------------------
docs/tutorial/raster.md | 250 ++++++++++++-----
docs/usecases/ApacheSedonaRaster.ipynb | 5 +-
mkdocs.yml | 2 -
12 files changed, 600 insertions(+), 595 deletions(-)
diff --git a/docs/api/sql/Overview.md b/docs/api/sql/Overview.md
index e3f1cca34e..219cc90a22 100644
--- a/docs/api/sql/Overview.md
+++ b/docs/api/sql/Overview.md
@@ -50,6 +50,31 @@ Sedona also provides an Adapter to convert SpatialRDD <-> DataFrame. Please read
SedonaSQL supports SparkSQL query optimizer, documentation is [Here](Optimizer.md)
+## Raster function list
+
+SedonaSQL also supports raster data processing. Raster functions use the `RS_` prefix. All raster operators can be called in the same way as vector operators:
+
+```scala
+var myDataFrame = sedona.sql("YOUR_SQL")
+```
+
+* Constructor: Construct a Raster given an input file or parameters
+ * Example: RS_FromGeoTiff (binary). Create a Raster from a GeoTiff binary.
+ * Documentation: [Here](Raster-Functions.md#raster-constructors)
+* Function: Execute a function on the given Raster column or columns
+ * Example: RS_Value (raster, point). Given a Raster and a Point geometry, return the pixel value at that location.
+ * Documentation: Functions are organized by category. See [Raster Accessors](Raster-Functions.md#raster-accessors), [Raster Operators](Raster-Functions.md#raster-operators), [Raster Band Accessors](Raster-Functions.md#raster-band-accessors), [Raster Output](Raster-Functions.md#raster-output), and other categories in the sidebar.
+* Aggregate function: Return the aggregated value on the given Raster column
+ * Example: RS_Union_Aggr (Raster column). Given a Raster column, combine all rasters into a single multiband raster.
+ * Documentation: [Here](Raster-Functions.md#raster-aggregate-functions)
+* Predicate: Execute a logic judgement on the given columns and return true or false
+ * Example: RS_Intersects (raster, geometry). Check if a raster intersects a geometry. Return "True" if yes, else return "False".
+ * Documentation: [Here](Raster-Functions.md#raster-predicates)
+
+## Raster quick start
+
+The detailed explanation is here [Write a Raster DataFrame/SQL application](../../tutorial/raster.md).
+
## Quick start
The detailed explanation is here [Write a SQL/DataFrame application](../../tutorial/sql.md).
diff --git a/docs/api/sql/Raster-Functions.md b/docs/api/sql/Raster-Functions.md
index 6c1e6ffb38..dd2a6bb645 100644
--- a/docs/api/sql/Raster-Functions.md
+++ b/docs/api/sql/Raster-Functions.md
@@ -131,6 +131,7 @@ These functions perform operations on raster objects.
| [RS_Union](Raster-Operators/RS_Union.md) | Raster | Returns a combined multi-band raster from 2 or more input Rasters. The order of bands in the resultant raster will be in the order of the input rasters. For example if `RS_Union` is called on two 2... | v1.6.0 |
| [RS_Value](Raster-Operators/RS_Value.md) | Double | Returns the value at the given point in the raster. If no band number is specified it defaults to 1. | v1.4.0 |
| [RS_Values](Raster-Operators/RS_Values.md) | `Array<Double>` | Returns the values at the given points or grid coordinates in the raster. If no band number is specified it defaults to 1. | v1.4.0 |
+| [RS_AsRaster](Raster-Operators/RS_AsRaster.md) | Raster | Converts a vector geometry into a raster dataset by assigning a specified value to all pixels covered by the geometry. | v1.5.0 |
## Raster Tiles
@@ -197,3 +198,7 @@ These functions convert raster data to various output formats for visualization.
| [RS_AsBase64](Raster-Output/RS_AsBase64.md) | String | Returns a base64 encoded string of the given raster. If the datatype is integral then this function internally takes the first 4 bands as RGBA, and converts them to the PNG format, finally produces... | v1.5.0 |
| [RS_AsImage](Raster-Output/RS_AsImage.md) | String | Returns a HTML that when rendered using an HTML viewer or via a Jupyter Notebook, displays the raster as a square image of side length `imageWidth`. Optionally, an imageWidth parameter can be passe... | v1.5.0 |
| [RS_AsMatrix](Raster-Output/RS_AsMatrix.md) | String | Returns a string, that when printed, outputs the raster band as a pretty printed 2D matrix. All the values of the raster are cast to double for the string. RS_AsMatrix allows specifying the number ... | |
+| [RS_AsArcGrid](Raster-Output/RS_AsArcGrid.md) | Binary | Returns a binary value (byte array) representing the raster as an ArcGrid image. Single band only. | v1.4.1 |
+| [RS_AsGeoTiff](Raster-Output/RS_AsGeoTiff.md) | Binary | Returns a binary value (byte array) encoding the raster as a GeoTiff image. | v1.4.1 |
+| [RS_AsCOG](Raster-Output/RS_AsCOG.md) | Binary | Returns a binary value (byte array) encoding the raster as a Cloud Optimized GeoTiff (COG). | v1.9.0 |
+| [RS_AsPNG](Raster-Output/RS_AsPNG.md) | Binary | Returns a binary value (byte array) encoding the raster as a PNG image. Only accepts unsigned integer pixel types. | v1.5.0 |
diff --git a/docs/api/sql/Raster-Operators/RS_AsRaster.md b/docs/api/sql/Raster-Operators/RS_AsRaster.md
new file mode 100644
index 0000000000..7e5a8e8afa
--- /dev/null
+++ b/docs/api/sql/Raster-Operators/RS_AsRaster.md
@@ -0,0 +1,116 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsRaster
+
+Introduction: `RS_AsRaster` converts a vector geometry into a raster dataset by assigning a specified value to all pixels covered by the geometry. Unlike `RS_Clip`, which extracts a subset of an existing raster while preserving its original values, `RS_AsRaster` generates a new raster where the geometry is rasterized onto a raster grid. The function supports all geometry types and takes the following parameters:
+
+* `geom`: The geometry to be rasterized.
+* `raster`: The reference raster to be used for overlaying the `geom` on.
+* `pixelType`: Defines data type of the output raster. This can be one of the following, D (double), F (float), I (integer), S (short), US (unsigned short) or B (byte).
+* `allTouched` (Since: `v1.7.1`): Decides the pixel selection criteria. If set to `true`, the function selects all pixels touched by the geometry, else, selects only pixels whose centroids intersect the geometry. Defaults to `false`.
+* `value`: The value to be used for assigning pixels covered by the geometry. Defaults to using `1.0` if not provided.
+* `noDataValue`: Used for assigning the no data value of the resultant raster. Defaults to `null` if not provided.
+* `useGeometryExtent`: Defines the extent of the resultant raster. When set to `true`, it corresponds to the extent of `geom`, and when set to false, it corresponds to the extent of `raster`. Default value is `true` if not set.
+
+Format:
+
+```
+RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched: Boolean, value: Double, noDataValue: Double, useGeometryExtent: Boolean)
+```
+
+```
+RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched: Boolean, value: Double, noDataValue: Double)
+```
+
+```
+RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched: Boolean, value: Double)
+```
+
+```
+RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched: Boolean)
+```
+
+```
+RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String)
+```
+
+Return type: `Raster`
+
+Since: `v1.5.0`
+
+!!!note
+ The function doesn't support rasters that have any one of the following properties:
+ ```
+ ScaleX < 0
+ ScaleY > 0
+ SkewX != 0
+ SkewY != 0
+ ```
+ If a raster is provided with any one of these properties, then an IllegalArgumentException is thrown.
+
+For more information about ScaleX, ScaleY, SkewX, SkewY, please refer to the [Affine Transformations](../Raster-affine-transformation.md) section.
+
+SQL Example
+
+```sql
+SELECT RS_AsRaster(
+ ST_GeomFromWKT('POLYGON((15 15, 18 20, 15 24, 24 25, 15 15))'),
+ RS_MakeEmptyRaster(2, 255, 255, 3, -215, 2, -2, 0, 0, 4326),
+ 'D', false, 255.0, 0d
+ )
+```
+
+Output:
+
+```
+GridCoverage2D["g...
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsRaster(
+ ST_GeomFromWKT('POLYGON((15 15, 18 20, 15 24, 24 25, 15 15))'),
+ RS_MakeEmptyRaster(2, 255, 255, 3, -215, 2, -2, 0, 0, 4326),
+ 'D'
+ )
+```
+
+Output:
+
+```
+GridCoverage2D["g...
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsRaster(
+ ST_GeomFromWKT('POLYGON((15 15, 18 20, 15 24, 24 25, 15 15))'),
+ RS_MakeEmptyRaster(2, 255, 255, 3, 215, 2, -2, 0, 0, 0),
+ 'D', true, 255, 0d, false
+)
+```
+
+Output:
+
+```
+GridCoverage2D["g...
+```
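The two pixel selection criteria behind `allTouched` in RS_AsRaster.md can be illustrated with a minimal, dependency-free sketch. It is not Sedona's implementation: `rasterize_rect` is a hypothetical helper, the geometry is simplified to an axis-aligned rectangle on a grid of unit pixels, and "touched" is assumed to mean that the pixel box overlaps the geometry with positive area.

```python
def rasterize_rect(rect, width, height, all_touched=False):
    """Return the set of (col, row) unit pixels selected for a rectangle.

    rect is (xmin, ymin, xmax, ymax) in pixel-grid coordinates. The rectangle
    stands in for an arbitrary geometry to keep the sketch dependency-free.
    """
    xmin, ymin, xmax, ymax = rect
    selected = set()
    for row in range(height):
        for col in range(width):
            if all_touched:
                # The pixel box [col, col+1] x [row, row+1] overlaps the rectangle.
                hit = col < xmax and col + 1 > xmin and row < ymax and row + 1 > ymin
            else:
                # Only the pixel centroid has to fall inside the rectangle.
                cx, cy = col + 0.5, row + 0.5
                hit = xmin <= cx <= xmax and ymin <= cy <= ymax
            if hit:
                selected.add((col, row))
    return selected


rect = (0.6, 0.6, 2.4, 2.4)
centroid_pixels = rasterize_rect(rect, 4, 4, all_touched=False)
touched_pixels = rasterize_rect(rect, 4, 4, all_touched=True)
print(len(centroid_pixels), len(touched_pixels))
```

With this rectangle the centroid rule keeps a single pixel while the touched rule keeps the surrounding 3 x 3 block, which is why `allTouched = true` tends to produce a thicker footprint for thin or small geometries.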
diff --git a/docs/api/sql/Raster-Output/RS_AsArcGrid.md b/docs/api/sql/Raster-Output/RS_AsArcGrid.md
new file mode 100644
index 0000000000..90c001e6e7
--- /dev/null
+++ b/docs/api/sql/Raster-Output/RS_AsArcGrid.md
@@ -0,0 +1,63 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsArcGrid
+
+Introduction: Returns a binary value (byte array) representing an ArcGrid image for each input raster. ArcGrid only supports a single band. If your raster has multiple bands, you need to specify which band to use as the source.
+
+Possible values for `sourceBand`: any non-negative value (>=0). If not given, it will use Band 0.
+
+Format:
+
+`RS_AsArcGrid(raster: Raster)`
+
+`RS_AsArcGrid(raster: Raster, sourceBand: Integer)`
+
+Return type: `Binary`
+
+Since: `v1.4.1`
+
+SQL Example
+
+```sql
+SELECT RS_AsArcGrid(raster) FROM my_raster_table
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsArcGrid(raster, 1) FROM my_raster_table
+```
+
+Output:
+
+```html
++--------------------+
+| arcgrid|
++--------------------+
+|[4D 4D 00 2A 00 0...|
++--------------------+
+```
+
+Output schema:
+
+```sql
+root
+ |-- arcgrid: binary (nullable = true)
+```
diff --git a/docs/api/sql/Raster-Output/RS_AsCOG.md b/docs/api/sql/Raster-Output/RS_AsCOG.md
new file mode 100644
index 0000000000..050ccc09ce
--- /dev/null
+++ b/docs/api/sql/Raster-Output/RS_AsCOG.md
@@ -0,0 +1,85 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsCOG
+
+Introduction: Returns a binary value (byte array) encoding the input raster as a [Cloud Optimized GeoTIFF](https://www.cogeo.org/) (COG). COG is a GeoTIFF that is internally organized to enable efficient range-read access over HTTP, making it ideal for cloud-hosted raster data.
+
+Possible values for `compression`: `Deflate` (default), `LZW`, `JPEG`, `PackBits`. Case-insensitive.
+
+`tileSize` must be a power of 2 (e.g., 128, 256, 512). Default value: `256`
+
+Possible values for `quality`: any decimal number between 0 and 1. 0 means maximum compression and 1 means minimum compression. Default value: `0.2`
+
+Possible values for `resampling`: `Nearest` (default), `Bilinear`, `Bicubic`. Case-insensitive. This controls the resampling algorithm used to build overview levels.
+
+`overviewCount` controls the number of overview levels. Use `-1` for automatic (default), `0` for no overviews, or any positive integer for a specific count.
+
+Format:
+
+`RS_AsCOG(raster: Raster)`
+
+`RS_AsCOG(raster: Raster, compression: String)`
+
+`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer)`
+
+`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer, quality: Double)`
+
+`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer, quality: Double, resampling: String)`
+
+`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer, quality: Double, resampling: String, overviewCount: Integer)`
+
+Return type: `Binary`
+
+Since: `v1.9.0`
+
+SQL Example
+
+```sql
+SELECT RS_AsCOG(raster) FROM my_raster_table
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsCOG(raster, 'LZW') FROM my_raster_table
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsCOG(raster, 'LZW', 512, 0.75, 'Bilinear', 3) FROM my_raster_table
+```
+
+Output:
+
+```html
++--------------------+
+| cog|
++--------------------+
+|[4D 4D 00 2A 00 0...|
++--------------------+
+```
+
+Output schema:
+
+```sql
+root
+ |-- cog: binary (nullable = true)
+```
diff --git a/docs/api/sql/Raster-Output/RS_AsGeoTiff.md b/docs/api/sql/Raster-Output/RS_AsGeoTiff.md
new file mode 100644
index 0000000000..525c81d844
--- /dev/null
+++ b/docs/api/sql/Raster-Output/RS_AsGeoTiff.md
@@ -0,0 +1,65 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsGeoTiff
+
+Introduction: Returns a binary value (byte array) encoding the input raster as a GeoTiff image.
+
+Possible values for `compressionType`: `None`, `PackBits`, `Deflate`, `Huffman`, `LZW` and `JPEG`
+
+Possible values for `imageQuality`: any decimal number between 0 and 1. 0 means the lowest quality and 1 means the highest quality.
+
+Format:
+
+`RS_AsGeoTiff(raster: Raster)`
+
+`RS_AsGeoTiff(raster: Raster, compressionType: String, imageQuality: Double)`
+
+Return type: `Binary`
+
+Since: `v1.4.1`
+
+SQL Example
+
+```sql
+SELECT RS_AsGeoTiff(raster) FROM my_raster_table
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsGeoTiff(raster, 'LZW', 0.75) FROM my_raster_table
+```
+
+Output:
+
+```html
++--------------------+
+| geotiff|
++--------------------+
+|[4D 4D 00 2A 00 0...|
++--------------------+
+```
+
+Output schema:
+
+```sql
+root
+ |-- geotiff: binary (nullable = true)
+```
diff --git a/docs/api/sql/Raster-Output/RS_AsPNG.md b/docs/api/sql/Raster-Output/RS_AsPNG.md
new file mode 100644
index 0000000000..0c9ef412e8
--- /dev/null
+++ b/docs/api/sql/Raster-Output/RS_AsPNG.md
@@ -0,0 +1,62 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsPNG
+
+Introduction: Returns a PNG byte array, that can be written to raster files as PNGs using the Sedona raster data source writer. This function can only accept pixel data type of unsigned integer. PNG can accept 1 or 3 bands of data from the raster, refer to [RS_Band](../Raster-Band-Accessors/RS_Band.md) for more details.
+
+!!!Note
+ Raster having `UNSIGNED_8BITS` pixel data type will have range of `0 - 255`, whereas rasters having `UNSIGNED_16BITS` pixel data type will have range of `0 - 65535`. If provided pixel value is greater than either `255` for `UNSIGNED_8BITS` or `65535` for `UNSIGNED_16BITS`, then the extra bit will be truncated.
+
+!!!Note
+ Raster that have float or double values will result in an empty byte array. PNG only accepts Integer values, if you want to write your raster to an image file, please refer to [RS_AsGeoTiff](RS_AsGeoTiff.md).
+
+Format:
+
+`RS_AsPNG(raster: Raster)`
+
+`RS_AsPNG(raster: Raster, maxWidth: Integer)`
+
+Return type: `Binary`
+
+Since: `v1.5.0`
+
+SQL Example
+
+```sql
+SELECT RS_AsPNG(raster) FROM Rasters
+```
+
+Output:
+
+```
+[-119, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73...]
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsPNG(RS_Band(raster, Array(3, 1, 2)))
+```
+
+Output:
+
+```
+[-103, 78, 94, -26, 61, -16, -91, -103, -65, -116...]
+```
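The first sample output in RS_AsPNG.md prints the byte array as signed (JVM-style) bytes, which hides the familiar PNG signature. The sketch below converts the first eight documented values back to unsigned bytes and compares them with the standard 8-byte PNG signature; the helper name `signed_to_bytes` is an assumption made for illustration.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def signed_to_bytes(values):
    """Convert signed 8-bit integers (-128..127) to a bytes object."""
    # Masking with 0xFF maps, for example, -119 to 137 (0x89).
    return bytes(v & 0xFF for v in values)


# First eight values of the documented output of RS_AsPNG(raster).
head = signed_to_bytes([-119, 80, 78, 71, 13, 10, 26, 10])
print(head == PNG_SIGNATURE)
```

The same conversion is useful when comparing a value printed by Spark with a hex dump of the file written to disk.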
diff --git a/docs/api/sql/Raster-loader.md b/docs/api/sql/Raster-loader.md
deleted file mode 100644
index b062471b6d..0000000000
--- a/docs/api/sql/Raster-loader.md
+++ /dev/null
@@ -1,114 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
- -->
-
-!!!note
- Sedona loader are available in Scala, Java and Python and have the same APIs.
-
-## Loading raster using the raster data source
-
-The `raster` data source loads GeoTiff files and automatically splits them into smaller tiles. Each tile is a row in the resulting DataFrame stored in `Raster` format.
-
-=== "Scala"
- ```scala
- var rawDf = sedona.read.format("raster").load("/some/path/*.tif")
- rawDf.createOrReplaceTempView("rawdf")
- rawDf.show()
- ```
-
-=== "Java"
- ```java
- Dataset<Row> rawDf = sedona.read().format("raster").load("/some/path/*.tif");
- rawDf.createOrReplaceTempView("rawdf");
- rawDf.show();
- ```
-
-=== "Python"
- ```python
- rawDf = sedona.read.format("raster").load("/some/path/*.tif")
- rawDf.createOrReplaceTempView("rawdf")
- rawDf.show()
- ```
-
-The output will look like this:
-
-```
-+--------------------+---+---+----+
-| rast| x| y|name|
-+--------------------+---+---+----+
-|GridCoverage2D["g...| 0| 0| ...|
-|GridCoverage2D["g...| 1| 0| ...|
-|GridCoverage2D["g...| 2| 0| ...|
-...
-```
-
-The output contains the following columns:
-
-- `rast`: The raster data in `Raster` format.
-- `x`: The 0-based x-coordinate of the tile. This column is only present when retile is not disabled.
-- `y`: The 0-based y-coordinate of the tile. This column is only present when retile is not disabled.
-- `name`: The name of the raster file.
-
-The size of the tile is determined by the internal tiling scheme of the raster data. It is recommended to use [Cloud Optimized GeoTIFF (COG)](https://www.cogeo.org/) format for raster data since they usually organize pixel data as square tiles. You can also disable automatic tiling using `option("retile", "false")`, or specify the tile size manually using options such as `option("tileWidth", "256")` and `option("tileHeight", "256")`.
-
-The options for the `raster` data source are as follows:
-
-- `retile`: Whether to enable tiling. Default is `true`.
-- `tileWidth`: The width of the tile. If not specified, the size of internal tiles will be used.
-- `tileHeight`: The height of the tile. If not specified, will use `tileWidth` if `tileWidth` is explicitly set, otherwise the size of internal tiles will be used.
-- `padWithNoData`: Pad the right and bottom of the tile with NODATA values if the tile is smaller than the specified tile size. Default is `false`.
-
-!!!note
- If the internal tiling scheme of raster data is not friendly for tiling, the `raster` data source will throw an error, and you can disable automatic tiling using `option("retile", "false")`, or specify the tile size manually to workaround this issue. A better solution is to translate the raster data into COG format using `gdal_translate` or other tools.
-
-The `raster` data source also works with Spark generic file source options, such as `option("pathGlobFilter", "*.tif*")` and `option("recursiveFileLookup", "true")`. For instance, you can load all the `.tif` files recursively in a directory using
-
-```python
-sedona.read.format("raster").option("recursiveFileLookup", "true").option(
- "pathGlobFilter", "*.tif*"
-).load(path_to_raster_data_folder)
-```
-
-One difference from other file source loaders is that when the loaded path ends with `/`, the `raster` data source will look up raster files in the directory and all its subdirectories recursively. This is equivalent to specifying a path without trailing `/` and setting `option("recursiveFileLookup", "true")`.
-
-## Loading raster using binaryFile loader (Deprecated)
-
-The raster loader of Sedona leverages Spark built-in binary data source and works with several RS constructors to produce Raster type. Each raster is a row in the resulting DataFrame and stored in a `Raster` format.
-
-!!!tip
- After loading rasters, you can quickly visualize them in a Jupyter notebook using `SedonaUtils.display_image(df)`. It automatically detects raster columns and renders them as images. See [Raster visualizer docs](Raster-Functions.md#raster-output) for details.
-
-By default, these functions uses lon/lat order since `v1.5.0`. Before, it used lat/lon order.
-
-### Step 1: Load raster to a binary DataFrame
-
-You can load any type of raster data using the code below. Then use the RS constructors below to create a Raster DataFrame.
-
-```scala
-sedona.read.format("binaryFile").load("/some/path/*.asc")
-```
-
-### Step 2: Create a raster type column
-
-Use one of the following raster constructors to create a Raster DataFrame:
-
-- [RS_FromArcInfoAsciiGrid](Raster-Constructors/RS_FromArcInfoAsciiGrid.md) - Create raster from Arc Info Ascii Grid files
-- [RS_FromGeoTiff](Raster-Constructors/RS_FromGeoTiff.md) - Create raster from GeoTiff files
-- [RS_MakeEmptyRaster](Raster-Constructors/RS_MakeEmptyRaster.md) - Create an empty raster geometry
-
-See the full list of [Raster Constructors](Raster-Functions.md#raster-constructors) for more options.
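The tiling options described in the removed Raster-loader.md (`tileWidth`, `tileHeight`, `padWithNoData`, and the 0-based `x` / `y` tile columns) can be pictured with a short sketch. It only mirrors the documented defaults; `tile_grid` is a hypothetical helper, and the assumption that unpadded edge tiles are simply smaller is an inference from the `padWithNoData` description, not confirmed behavior.

```python
def tile_grid(width, height, tile_width, tile_height=None, pad_with_nodata=False):
    """List (x, y, tile_w, tile_h) for every tile covering a width x height raster."""
    if tile_height is None:
        # Documented default: tileHeight falls back to tileWidth when only that is set.
        tile_height = tile_width
    cols = (width + tile_width - 1) // tile_width     # ceiling division
    rows = (height + tile_height - 1) // tile_height
    tiles = []
    for y in range(rows):
        for x in range(cols):
            if pad_with_nodata:
                w, h = tile_width, tile_height
            else:
                # Assumption: edge tiles shrink to the remaining pixels.
                w = min(tile_width, width - x * tile_width)
                h = min(tile_height, height - y * tile_height)
            tiles.append((x, y, w, h))
    return tiles


tiles = tile_grid(600, 300, 256)
padded = tile_grid(600, 300, 256, pad_with_nodata=True)
print(len(tiles), tiles[-1], padded[-1])
```

For a 600 x 300 raster and 256-pixel tiles this yields a 3 x 2 grid, with the last column and row either truncated or padded depending on the flag.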
diff --git a/docs/api/sql/Raster-writer.md b/docs/api/sql/Raster-writer.md
deleted file mode 100644
index 101ba567df..0000000000
--- a/docs/api/sql/Raster-writer.md
+++ /dev/null
@@ -1,403 +0,0 @@
-<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
- -->
-
-!!!note
- Sedona writers are available in Scala. Java and Python have the same APIs.
-
-## Write Raster DataFrame to raster files
-
-To write a Sedona Raster DataFrame to raster files, you need to (1) first convert the Raster DataFrame to a binary DataFrame using `RS_AsXXX` functions and (2) then write the binary DataFrame to raster files using Sedona's built-in `raster` data source.
-
-### Write raster DataFrame to a binary DataFrame
-
-You can use the following RS output functions (`RS_AsXXX`) to convert a Raster DataFrame to a binary DataFrame. Generally the output format of a raster can be different from the original input format. For example, you can use `RS_FromGeoTiff` to create rasters and save them using `RS_AsArcInfoAsciiGrid`.
-
-#### RS_AsArcGrid
-
-Introduction: Returns a binary DataFrame from a Raster DataFrame. Each raster object in the resulting DataFrame is an ArcGrid image in binary format. ArcGrid only takes 1 source band. If your raster has multiple bands, you need to specify which band you want to use as the source.
-
-Possible values for `sourceBand`: any non-negative value (>=0). If not given, it will use Band 0.
-
-Format:
-
-`RS_AsArcGrid(raster: Raster)`
-
-`RS_AsArcGrid(raster: Raster, sourceBand: Integer)`
-
-Since: `v1.4.1`
-
-SQL Example
-
-```sql
-SELECT RS_AsArcGrid(raster) FROM my_raster_table
-```
-
-SQL Example
-
-```sql
-SELECT RS_AsArcGrid(raster, 1) FROM my_raster_table
-```
-
-Output:
-
-```html
-+--------------------+
-| arcgrid|
-+--------------------+
-|[4D 4D 00 2A 00 0...|
-+--------------------+
-```
-
-Output schema:
-
-```sql
-root
- |-- arcgrid: binary (nullable = true)
-```
-
-#### RS_AsGeoTiff
-
-Introduction: Returns a binary DataFrame from a Raster DataFrame. Each raster object in the resulting DataFrame is a GeoTiff image in binary format.
-
-Possible values for `compressionType`: `None`, `PackBits`, `Deflate`, `Huffman`, `LZW` and `JPEG`
-
-Possible values for `imageQuality`: any decimal number between 0 and 1. 0 means the lowest quality and 1 means the highest quality.
-
-Format:
-
-`RS_AsGeoTiff(raster: Raster)`
-
-`RS_AsGeoTiff(raster: Raster, compressionType: String, imageQuality: Double)`
-
-Since: `v1.4.1`
-
-SQL Example
-
-```sql
-SELECT RS_AsGeoTiff(raster) FROM my_raster_table
-```
-
-SQL Example
-
-```sql
-SELECT RS_AsGeoTiff(raster, 'LZW', '0.75') FROM my_raster_table
-```
-
-Output:
-
-```html
-+--------------------+
-| geotiff|
-+--------------------+
-|[4D 4D 00 2A 00 0...|
-+--------------------+
-```
-
-Output schema:
-
-```sql
-root
- |-- geotiff: binary (nullable = true)
-```
-
-#### RS_AsCOG
-
-Introduction: Returns a binary DataFrame from a Raster DataFrame. Each raster object in the resulting DataFrame is a [Cloud Optimized GeoTIFF](https://www.cogeo.org/) (COG) image in binary format. COG is a GeoTIFF that is internally organized to enable efficient range-read access over HTTP, making it ideal for cloud-hosted raster data.
-
-Possible values for `compression`: `Deflate` (default), `LZW`, `JPEG`, `PackBits`. Case-insensitive.
-
-`tileSize` must be a power of 2 (e.g., 128, 256, 512). Default value: `256`
-
-Possible values for `quality`: any decimal number between 0 and 1. 0 means maximum compression and 1 means minimum compression. Default value: `0.2`
-
-Possible values for `resampling`: `Nearest` (default), `Bilinear`, `Bicubic`. Case-insensitive. This controls the resampling algorithm used to build overview levels.
-
-`overviewCount` controls the number of overview levels. Use `-1` for automatic (default), `0` for no overviews, or any positive integer for a specific count.
-
-Format:
-
-`RS_AsCOG(raster: Raster)`
-
-`RS_AsCOG(raster: Raster, compression: String)`
-
-`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer)`
-
-`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer, quality: Double)`
-
-`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer, quality: Double, resampling: String)`
-
-`RS_AsCOG(raster: Raster, compression: String, tileSize: Integer, quality: Double, resampling: String, overviewCount: Integer)`
-
-Since: `v1.9.0`
-
-SQL Example
-
-```sql
-SELECT RS_AsCOG(raster) FROM my_raster_table
-```
-
-SQL Example
-
-```sql
-SELECT RS_AsCOG(raster, 'LZW') FROM my_raster_table
-```
-
-SQL Example
-
-```sql
-SELECT RS_AsCOG(raster, 'LZW', 512, 0.75, 'Bilinear', 3) FROM my_raster_table
-```
-
-Output:
-
-```html
-+--------------------+
-| cog|
-+--------------------+
-|[4D 4D 00 2A 00 0...|
-+--------------------+
-```
-
-Output schema:
-
-```sql
-root
- |-- cog: binary (nullable = true)
-```
-
-#### RS_AsPNG
-
-Introduction: Returns a PNG byte array, that can be written to raster files as PNGs using the [sedona function](#write-a-binary-dataframe-to-raster-files). This function can only accept pixel data type of unsigned integer. PNG can accept 1 or 3 bands of data from the raster, refer to [RS_Band](Raster-Band-Accessors/RS_Band.md) for more details.
-
-!!!Note
- Raster having `UNSIGNED_8BITS` pixel data type will have range of `0 - 255`, whereas rasters having `UNSIGNED_16BITS` pixel data type will have range of `0 - 65535`. If provided pixel value is greater than either `255` for `UNSIGNED_8BITS` or `65535` for `UNSIGNED_16BITS`, then the extra bit will be truncated.
-
-!!!Note
- Raster that have float or double values will result in an empty byte array. PNG only accepts Integer values, if you want to write your raster to an image file, please refer to [RS_AsGeoTiff](#rs_asgeotiff).
-
-Format:
-
-`RS_AsPNG(raster: Raster, maxWidth: Integer)`
-
-`RS_AsPNG(raster: Raster)`
-
-Since: `v1.5.0`
-
-SQL Example
-
-```sql
-SELECT RS_AsPNG(raster) FROM Rasters
-```
-
-Output:
-
-```
-[-119, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73...]
-```
-
-SQL Example
-
-```sql
-SELECT RS_AsPNG(RS_Band(raster, Array(3, 1, 2)))
-```
-
-Output:
-
-```
-[-103, 78, 94, -26, 61, -16, -91, -103, -65, -116...]
-```
-
-### Write a binary DataFrame to raster files
-
-Introduction: You can write a Sedona binary DataFrame to external storage
using Sedona's built-in `raster` data source. Note that: `raster` data source
does not support reading rasters. Please use Spark built-in `binaryFile` and
Sedona RS constructors together to read rasters.
-
-Since: `v1.4.1`
-
-Available options:
-
-* rasterField:
- * Default value: the `binary` type column in the DataFrame. If the
input DataFrame has several binary columns, please specify which column you
want to use. You can use one of the `RS_As*` functions mentioned above to
convert the raster objects to binary raster file content to write.
- * Allowed values: the name of the to-be-saved binary type column
-* fileExtension
- * Default value: `.tiff`
- * Allowed values: any string values such as `.png`, `.jpeg`, `.asc`
-* pathField
- * No default value. If you use this option, then the column specified
in this option must exist in the DataFrame schema. If this option is not used,
each produced raster image will have a random UUID file name.
- * Allowed values: any column name that indicates the paths of each
raster file
-* useDirectCommitter (Since: `v1.6.1`)
- * Default value: `true`. If set to `true`, the output files will be
written directly to the target location. If set to `false`, the output files
will be written to a temporary location and finally be committed to their
target location. It is usually slower to write large amount of raster files
with `useDirectCommitter` set to `false`, especially when writing to object
stores such as S3.
- * Allowed values: `true` or `false`
-
-The schema of the Raster dataframe to be written can be one of the following
two schemas:
-
-```html
-root
- |-- raster_binary: binary (nullable = true)
-```
-
-or
-
-```html
-root
- |-- raster_binary: binary (nullable = true)
- |-- path: string (nullable = true)
-```
-
-Spark SQL example 1:
-
-```scala
-// Assume that df contains a raster column named "rast"
-df.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))\
- .write.format("raster").mode("overwrite").save("my_raster_file")
-```
-
-Spark SQL example 2:
-
-```scala
-// Assume that df contains a raster column named "rast" and a string column
named "path"
-df.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))\
- .write.format("raster")\
- .option("rasterField", "raster_binary")\
- .option("pathField", "path")\
- .option("fileExtension", ".tiff")\
- .mode("overwrite")\
- .save("my_raster_file")
-```
-
-The produced file structure will look like this:
-
-```html
-my_raster_file
-- part-00000-6c7af016-c371-4564-886d-1690f3b27ca8-c000
- - test1.tiff
- - .test1.tiff.crc
-- part-00001-6c7af016-c371-4564-886d-1690f3b27ca8-c000
- - test2.tiff
- - .test2.tiff.crc
-- part-00002-6c7af016-c371-4564-886d-1690f3b27ca8-c000
- - test3.tiff
- - .test3.tiff.crc
-- _SUCCESS
-```
-
-To read it back to Sedona Raster DataFrame, you can use the following command
(note the `*` in the path):
-
-```scala
-sparkSession.read.format("binaryFile").load("my_raster_file/*")
-```
-
-Then you can create Raster type in Sedona like this `RS_FromGeoTiff(content)`
(if the written data was in GeoTiff format).
-
-The newly created DataFrame can be written to disk again but must be under a
different name such as `my_raster_file_modified`
-
-### Write Geometry to Raster dataframe
-
-#### RS_AsRaster
-
-Introduction: `RS_AsRaster` converts a vector geometry into a raster dataset
by assigning a specified value to all pixels covered by the geometry. Unlike
`RS_Clip`, which extracts a subset of an existing raster while preserving its
original values, `RS_AsRaster` generates a new raster where the geometry is
rasterized onto a raster grid. The function supports all geometry types and
takes the following parameters:
-
-* `geom`: The geometry to be rasterized.
-* `raster`: The reference raster to be used for overlaying the `geom` on.
-* `pixelType`: Defines data type of the output raster. This can be one of the
following, D (double), F (float), I (integer), S (short), US (unsigned short)
or B (byte).
-* `allTouched` (Since: `v1.7.1`): Decides the pixel selection criteria. If set
to `true`, the function selects all pixels touched by the geometry; otherwise, it
selects only pixels whose centroids intersect the geometry. Defaults to `false`.
-* `value`: The value assigned to pixels covered by the geometry.
Defaults to `1.0` if not provided.
-* `noDataValue`: Used for assigning the no data value of the resultant raster.
Defaults to `null` if not provided.
-* `useGeometryExtent`: Defines the extent of the resultant raster. When set to
`true`, it corresponds to the extent of `geom`, and when set to false, it
corresponds to the extent of `raster`. Default value is `true` if not set.
-
-Format:
-
-```
-RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched:
Boolean, value: Double, noDataValue: Double, useGeometryExtent: Boolean)
-```
-
-```
-RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched:
Boolean, value: Double, noDataValue: Double)
-```
-
-```
-RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched:
Boolean, value: Double)
-```
-
-```
-RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String, allTouched:
Boolean)
-```
-
-```
-RS_AsRaster(geom: Geometry, raster: Raster, pixelType: String)
-```
-
-Since: `v1.5.0`
-
-!!!note
- The function doesn't support rasters that have any one of the following
properties:
- ```
- ScaleX < 0
- ScaleY > 0
- SkewX != 0
- SkewY != 0
- ```
-    If a raster with any one of these properties is provided, an
`IllegalArgumentException` is thrown.
-
-For more information about ScaleX, ScaleY, SkewX, SkewY, please refer to the
[Affine Transformations](Raster-affine-transformation.md) section.
-
-SQL Example
-
-```sql
-SELECT RS_AsRaster(
- ST_GeomFromWKT('POLYGON((15 15, 18 20, 15 24, 24 25, 15 15))'),
- RS_MakeEmptyRaster(2, 255, 255, 3, -215, 2, -2, 0, 0, 4326),
- 'D', false, 255.0, 0d
- )
-```
-
-Output:
-
-```
-GridCoverage2D["g...
-```
-
-SQL Example
-
-```sql
-SELECT RS_AsRaster(
- ST_GeomFromWKT('POLYGON((15 15, 18 20, 15 24, 24 25, 15 15))'),
- RS_MakeEmptyRaster(2, 255, 255, 3, -215, 2, -2, 0, 0, 4326),
- 'D'
- )
-```
-
-Output:
-
-```
-GridCoverage2D["g...
-```
-
-```sql
-SELECT RS_AsRaster(
- ST_GeomFromWKT('POLYGON((15 15, 18 20, 15 24, 24 25, 15 15))'),
- RS_MakeEmptyRaster(2, 255, 255, 3, 215, 2, -2, 0, 0, 0),
- 'D', true, 255, 0d, false
-)
-```
-
-Output:
-
-```
-GridCoverage2D["g...
-```
diff --git a/docs/tutorial/raster.md b/docs/tutorial/raster.md
index 79f392c2f0..5b744b911e 100644
--- a/docs/tutorial/raster.md
+++ b/docs/tutorial/raster.md
@@ -142,68 +142,121 @@ Add the following line after creating the Sedona config.
If you already have a S
You can also register everything by passing `--conf
spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` to
`spark-submit` or `spark-shell`.
-## Load data from files
+## Load GeoTiff data
-Assume we have a single raster data file called rasterData.tiff, [at
Path](https://github.com/apache/sedona/blob/0eae42576c2588fe278f75cef3b17fee600eac90/spark/common/src/test/resources/raster/raster_with_no_data/test5.tiff).
-
-Use the following code to load the data and create a raw Dataframe.
+The recommended way to load GeoTiff raster data is the `raster` data source.
It loads GeoTiff files and automatically splits them into smaller tiles. Each
tile becomes a row in the resulting DataFrame stored in `Raster` format.
=== "Scala"
```scala
- var rawDf = sedona.read.format("binaryFile").load(path_to_raster_data)
- rawDf.createOrReplaceTempView("rawdf")
- rawDf.show()
+ var rasterDf = sedona.read.format("raster").load("/some/path/*.tif")
+ rasterDf.createOrReplaceTempView("rasterDf")
+ rasterDf.show()
```
=== "Java"
```java
- Dataset<Row> rawDf =
sedona.read.format("binaryFile").load(path_to_raster_data)
- rawDf.createOrReplaceTempView("rawdf")
- rawDf.show()
+ Dataset<Row> rasterDf =
sedona.read().format("raster").load("/some/path/*.tif");
+ rasterDf.createOrReplaceTempView("rasterDf");
+ rasterDf.show();
```
=== "Python"
```python
- rawDf = sedona.read.format("binaryFile").load(path_to_raster_data)
- rawDf.createOrReplaceTempView("rawdf")
- rawDf.show()
+ rasterDf = sedona.read.format("raster").load("/some/path/*.tif")
+ rasterDf.createOrReplaceTempView("rasterDf")
+ rasterDf.show()
```
The output will look like this:
```
-| path| modificationTime|length| content|
-+--------------------+--------------------+------+--------------------+
-|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
++--------------------+---+---+----+
+| rast| x| y|name|
++--------------------+---+---+----+
+|GridCoverage2D["g...| 0| 0| ...|
+|GridCoverage2D["g...| 1| 0| ...|
+|GridCoverage2D["g...| 2| 0| ...|
+...
```
-For multiple raster data files use the following code to load the data [from
path](https://github.com/apache/sedona/blob/0eae42576c2588fe278f75cef3b17fee600eac90/spark/common/src/test/resources/raster/)
and create raw DataFrame.
+The output contains the following columns:
+
+- `rast`: The raster data in `Raster` format.
+- `x`: The 0-based x-coordinate of the tile. This column is present only when
`retile` is enabled.
+- `y`: The 0-based y-coordinate of the tile. This column is present only when
`retile` is enabled.
+- `name`: The name of the raster file.
+
+### Tiling options
+
+By default, tiling is enabled (`retile = true`) and the tile size is
determined by the GeoTiff file's internal tiling scheme, so you do not need to
specify `tileWidth` or `tileHeight`. [Cloud Optimized GeoTIFF
(COG)](https://www.cogeo.org/) is the recommended format for raster data because
COG files usually organize pixel data as square tiles.
+
+You can optionally override the tile size, or disable tiling entirely:
+
+| Option | Default | Description |
+| :--- | :--- | :--- |
+| `retile` | `true` | Whether to enable tiling. Set to `false` to load the
entire raster as a single row. |
+| `tileWidth` | GeoTiff's internal tile width | Optional. Override the width
of each tile in pixels. |
+| `tileHeight` | Same as `tileWidth` if set, otherwise GeoTiff's internal tile
height | Optional. Override the height of each tile in pixels. |
+| `padWithNoData` | `false` | Pad the right and bottom tiles with NODATA
values if they are smaller than the specified tile size. |
+
+To override the tile size:
+
+=== "Python"
+ ```python
+ rasterDf = (
+ sedona.read.format("raster")
+ .option("tileWidth", "256")
+ .option("tileHeight", "256")
+ .load("/some/path/*.tif")
+ )
+ ```
!!!note
- The above code works too for loading multiple raster data files. If the
raster files are in separate directories and the option also makes sure that
only `.tif` or `.tiff` files are being loaded.
+    If the internal tiling scheme of the raster data is not suitable for tiling,
the `raster` data source throws an error. To work around this, disable automatic
tiling with `option("retile", "false")` or specify the tile size manually.
A better solution is to translate the raster data into
COG format using `gdal_translate` or other tools.
+
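The relationship between the tiling options and the `x`/`y` columns can be illustrated with a minimal pure-Python sketch. It is not Sedona's implementation: `tile_grid` and its parameters are hypothetical names, and it assumes tiles are laid out left to right and top to bottom, with right and bottom edge tiles either clipped to the raster bounds or padded to full size, depending on `pad_with_no_data`.

```python
from math import ceil


def tile_grid(width, height, tile_width, tile_height=None, pad_with_no_data=False):
    """Return (x, y, tile_w, tile_h) for every tile covering a width x height raster.

    Hypothetical helper for illustration only; Sedona's raster reader may differ.
    """
    if tile_height is None:
        # Mirrors the documented default: tileHeight falls back to tileWidth.
        tile_height = tile_width
    tiles = []
    for y in range(ceil(height / tile_height)):
        for x in range(ceil(width / tile_width)):
            if pad_with_no_data:
                # Edge tiles keep the full size; the extra pixels would hold NODATA.
                w, h = tile_width, tile_height
            else:
                # Edge tiles are clipped to the raster bounds.
                w = min(tile_width, width - x * tile_width)
                h = min(tile_height, height - y * tile_height)
            tiles.append((x, y, w, h))
    return tiles


tiles = tile_grid(600, 500, 256)
print(len(tiles))   # 3 columns x 2 rows = 6 tiles
print(tiles[-1])    # (2, 1, 88, 244): the clipped bottom-right tile
```

Under these assumptions, passing `pad_with_no_data=True` would report the same bottom-right tile as `(2, 1, 256, 256)`.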
+### Loading raster files from directories
+
+The `raster` data source also works with Spark generic file source options,
such as `option("pathGlobFilter", "*.tif*")` and `option("recursiveFileLookup",
"true")`. For instance, you can load all the `.tif` files recursively in a
directory using:
+
+=== "Python"
+ ```python
+ rasterDf = (
+ sedona.read.format("raster")
+ .option("recursiveFileLookup", "true")
+ .option("pathGlobFilter", "*.tif*")
+ .load(path_to_raster_data_folder)
+ )
+ ```
+
+!!!tip
+ When the loaded path ends with `/`, the `raster` data source will look up
raster files in the directory and all its subdirectories recursively. This is
equivalent to specifying a path without trailing `/` and setting
`option("recursiveFileLookup", "true")`.
+
+!!!tip
+ After loading rasters, you can quickly visualize them in a Jupyter
notebook using `SedonaUtils.display_image(df)`. It automatically detects raster
columns and renders them as images. See [Raster visualizer
docs](../api/sql/Raster-Functions.md#raster-output) for details.
+
+## Load non-GeoTiff data (NetCDF, Arc Grid)
+
+For non-GeoTiff raster formats such as NetCDF or Arc Info ASCII Grid, use the
Spark built-in `binaryFile` data source together with Sedona raster
constructors.
+
+### Step 1: Load to a binary DataFrame
=== "Scala"
```scala
- var rawDf = sedona.read.format("binaryFile").option("recursiveFileLookup",
"true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder)
+ var rawDf = sedona.read.format("binaryFile").load(path_to_raster_data)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()
```
=== "Java"
```java
- Dataset<Row> rawDf =
sedona.read.format("binaryFile").option("recursiveFileLookup",
"true").option("pathGlobFilter", "*.tif*").load(path_to_raster_data_folder);
+ Dataset<Row> rawDf =
sedona.read().format("binaryFile").load(path_to_raster_data);
rawDf.createOrReplaceTempView("rawdf");
rawDf.show();
```
=== "Python"
```python
- rawDf = (
- sedona.read.format("binaryFile")
- .option("recursiveFileLookup", "true")
- .option("pathGlobFilter", "*.tif*")
- .load(path_to_raster_data_folder)
- )
+ rawDf = sedona.read.format("binaryFile").load(path_to_raster_data)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()
```
@@ -213,31 +266,48 @@ The output will look like this:
```
| path| modificationTime|length| content|
+--------------------+--------------------+------+--------------------+
-|file:/Download/ra...|2023-09-06 16:24:...|209199|[4D 4D 00 2A 00 0...|
-|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
-|file:/Download/ra...|2023-09-06 16:24:...| 6619|[49 49 2A 00 08 0...|
```
-The content column in the raster table is still in the raw form, binary form.
+For multiple raster data files, you can load them recursively:
+
+=== "Python"
+ ```python
+ rawDf = (
+ sedona.read.format("binaryFile")
+ .option("recursiveFileLookup", "true")
+ .option("pathGlobFilter", "*.asc*")
+ .load(path_to_raster_data_folder)
+ )
+ rawDf.createOrReplaceTempView("rawdf")
+ rawDf.show()
+ ```
-## Create a Raster type column
+### Step 2: Create a Raster type column
-All raster operations in SedonaSQL require Raster type objects. Therefore,
this should be the next step after loading the data.
+All raster operations in SedonaSQL require Raster type objects. Use one of the
following constructors:
-### From Geotiff
+#### From GeoTiff
```sql
SELECT RS_FromGeoTiff(content) AS rast, modificationTime, length, path FROM
rawdf
```
-To verify this, use the following code to print the schema of the DataFrame:
+#### From Arc Grid
```sql
-rasterDf.printSchema()
+SELECT RS_FromArcInfoAsciiGrid(content) AS rast, modificationTime, length,
path FROM rawdf
```
-The output will be like this:
+#### From NetCDF
+
+See [RS_FromNetCDF](../api/sql/Raster-Constructors/RS_FromNetCDF.md) for
details on loading NetCDF files.
+
+To verify the raster column was created successfully:
+
+```python
+rasterDf.printSchema()
+```
```
root
@@ -247,14 +317,6 @@ root
|-- path: string (nullable = true)
```
-### From Arc Grid
-
-The raster data is loaded the same way as `tiff` file, but the raster data is
stored with the extension `.asc`, ASCII format. The following code creates a
Raster type objects from binary data:
-
-```sql
-SELECT RS_FromArcInfoAsciiGrid(content) AS rast, modificationTime, length,
path FROM rawdf
-```
-
## Raster's metadata
Sedona has a function to get the metadata for the raster, and also a function
to get the world file of the raster.
@@ -410,7 +472,7 @@ For more information please refer to [Map Algebra
API](../api/sql/Raster-map-alg
### Geometry As Raster
-Sedona allows you to rasterize a geometry by using
[RS_AsRaster](../api/sql/Raster-writer.md#rs_asraster).
+Sedona allows you to rasterize a geometry by using
[RS_AsRaster](../api/sql/Raster-Operators/RS_AsRaster.md).
```sql
SELECT RS_AsRaster(
@@ -503,47 +565,93 @@ Please refer to [Raster visualizer
docs](../api/sql/Raster-Functions.md#raster-o
## Save to permanent storage
-Sedona has APIs that can save an entire raster column to files in a specified
location. Before saving, the raster type column needs to be converted to a
binary format. Sedona provides several functions to convert a raster column
into a binary column suitable for file storage. Once in binary format, the
raster data can then be written to files on disk using the Sedona file storage
APIs.
-
-```sparksql
-rasterDf.write.format("raster").option("rasterField",
"raster").option("fileExtension",
".tiff").mode(SaveMode.Overwrite).save(dirPath)
-```
+Saving raster data is a two-step process: (1) convert the Raster column to
binary format using an `RS_AsXXX` function, and (2) write the binary DataFrame
to files using Sedona's `raster` data source writer.
-Sedona has a few writer functions that create the binary DataFrame necessary
for saving the raster images.
+### Step 1: Convert to binary format
-### As Arc Grid
+Choose one of the following output format functions:
-Use [RS_AsArcGrid](../api/sql/Raster-writer.md#rs_asarcgrid) to get the binary
Dataframe of the raster in Arc Grid format.
+| Function | Format | Description |
+| :--- | :--- | :--- |
+| [RS_AsGeoTiff](../api/sql/Raster-Output/RS_AsGeoTiff.md) | GeoTiff |
General-purpose raster format with optional compression |
+| [RS_AsCOG](../api/sql/Raster-Output/RS_AsCOG.md) | Cloud Optimized GeoTiff |
Ideal for cloud storage with efficient range-read access |
+| [RS_AsArcGrid](../api/sql/Raster-Output/RS_AsArcGrid.md) | Arc Grid |
ASCII-based format, single band only |
+| [RS_AsPNG](../api/sql/Raster-Output/RS_AsPNG.md) | PNG | Image format,
unsigned integer pixel types only |
```sql
-SELECT RS_AsArcGrid(raster)
+SELECT RS_AsGeoTiff(rast) AS raster_binary FROM rasterDf
```
-### As GeoTiff
+### Step 2: Write to files
-Use [RS_AsGeoTiff](../api/sql/Raster-writer.md#rs_asgeotiff) to get the binary
Dataframe of the raster in GeoTiff format.
+Use Sedona's built-in `raster` data source to write the binary DataFrame:
-```sql
-SELECT RS_AsGeoTiff(raster)
-```
+=== "Scala"
+ ```scala
+    import org.apache.spark.sql.functions.expr
+
+    rasterDf.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+      .write.format("raster").mode("overwrite").save("my_raster_file")
+ ```
-### As Cloud Optimized GeoTiff
+=== "Python"
+ ```python
+    from pyspark.sql.functions import expr
+
+    (
+        rasterDf.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+        .write.format("raster")
+        .mode("overwrite")
+        .save("my_raster_file")
+    )
+ ```
-Use [RS_AsCOG](../api/sql/Raster-writer.md#rs_ascog) to get the binary
Dataframe of the raster in [Cloud Optimized GeoTiff](https://www.cogeo.org/)
(COG) format. COG is ideal for cloud-hosted raster data because it supports
efficient range-read access over HTTP.
+The writer data source options are:
-```sql
-SELECT RS_AsCOG(raster)
-```
+| Option | Default | Description |
+| :--- | :--- | :--- |
+| `rasterField` | Last `binary` column in the schema | The name of the binary
column to write. When the DataFrame has multiple binary columns, setting this
explicitly is strongly recommended. |
+| `fileExtension` | `.tiff` | File extension for output files (e.g., `.png`,
`.asc`). |
+| `pathField` | None | Column name containing the output file names. Only the
base name is used (directory components are stripped), and any existing file
extension is replaced by `fileExtension`. If not set, each file gets a random
UUID name. |
+| `useDirectCommitter` | `true` | If `true`, files are written directly to the
target location. If `false`, files are written to a temp location first.
Writing with `false` is slower, especially on object stores like S3. |
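The naming rules described for `pathField` and `fileExtension` can be sketched in plain Python. This is an illustrative approximation, not the writer's actual code: `output_file_name` is a hypothetical helper, and the handling of unusual names (for example, names containing several dots) is an assumption.

```python
import posixpath
import uuid


def output_file_name(path_value=None, file_extension=".tiff"):
    """Approximate the documented naming rules (hypothetical helper)."""
    if path_value is None:
        # No pathField: each file gets a random UUID name.
        return f"{uuid.uuid4()}{file_extension}"
    # Only the base name is used; directory components are stripped.
    base = posixpath.basename(path_value)
    # Any existing extension is replaced by file_extension.
    stem, _ = posixpath.splitext(base)
    return stem + file_extension


print(output_file_name("s3://bucket/scenes/test1.tif"))  # test1.tiff
print(output_file_name("test2.png", ".asc"))             # test2.asc
```

The sketch suggests why the `name` column produced by the `raster` reader is a convenient value for `pathField`: only its base name matters, so full input paths can be reused as-is.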
-### As PNG
+Example with all options:
-Use [RS_AsPNG](../api/sql/Raster-writer.md#rs_aspng) to get the binary
Dataframe of the raster in PNG format.
+=== "Scala"
+ ```scala
+ rasterDf.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+ .write.format("raster")
+ .option("rasterField", "raster_binary")
+ .option("pathField", "name")
+ .option("fileExtension", ".tiff")
+ .mode("overwrite")
+ .save("my_raster_file")
+ ```
-```sql
-SELECT RS_AsPNG(raster)
+=== "Python"
+ ```python
+    (
+        rasterDf.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+        .write.format("raster")
+        .option("rasterField", "raster_binary")
+        .option("pathField", "name")
+        .option("fileExtension", ".tiff")
+        .mode("overwrite")
+        .save("my_raster_file")
+    )
+ ```
+
+The produced file structure will look like this:
+
+```
+my_raster_file
+- part-00000-6c7af016-c371-4564-886d-1690f3b27ca8-c000
+ - test1.tiff
+ - .test1.tiff.crc
+- part-00001-6c7af016-c371-4564-886d-1690f3b27ca8-c000
+ - test2.tiff
+ - .test2.tiff.crc
+- _SUCCESS
```
-Please refer to [Raster writer docs](../api/sql/Raster-writer.md) for more
details.
+To read the saved rasters back:
+
+```python
+rasterDf = sedona.read.format("raster").load("my_raster_file/*/*.tiff")
+```
## Collecting raster Dataframes and working with them locally in Python
@@ -561,9 +669,7 @@ The raster objects are represented as `SedonaRaster`
objects in Python, which ca
```python
df_raster = (
- sedona.read.format("binaryFile")
- .load("/path/to/raster.tif")
- .selectExpr("RS_FromGeoTiff(content) as rast")
+ sedona.read.format("raster").option("retile",
"false").load("/path/to/raster.tif")
)
rows = df_raster.collect()
raster = rows[0].rast
diff --git a/docs/usecases/ApacheSedonaRaster.ipynb
b/docs/usecases/ApacheSedonaRaster.ipynb
index 985d645bb1..a6d2645432 100644
--- a/docs/usecases/ApacheSedonaRaster.ipynb
+++ b/docs/usecases/ApacheSedonaRaster.ipynb
@@ -447,10 +447,7 @@
"cell_type": "markdown",
"id": "8ca7e862-45c9-4559-a2e1-4e044d6b5c84",
"metadata": {},
- "source": [
- "### Convert a geometry to raster (Rasterize a geometry)\n",
- "A geometry can be converted to a raster using
[RS_AsRaster](https://sedona.apache.org/1.5.0/api/sql/Raster-writer/#rs_asraster)"
- ]
+ "source": "### Convert a geometry to raster (Rasterize a geometry)\nA
geometry can be converted to a raster using
[RS_AsRaster](https://sedona.apache.org/latest/api/sql/Raster-Operators/RS_AsRaster/)"
},
{
"cell_type": "code",
diff --git a/mkdocs.yml b/mkdocs.yml
index 72c8a220fe..5da82fee88 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -85,9 +85,7 @@ nav:
- SedonaPyDeck: api/sql/Visualization-SedonaPyDeck.md
- SedonaKepler: api/sql/Visualization-SedonaKepler.md
- Raster data:
- - Raster loader: api/sql/Raster-loader.md
- Raster Functions: api/sql/Raster-Functions.md
- - Raster writer: api/sql/Raster-writer.md
- Raster map algebra: api/sql/Raster-map-algebra.md
- Raster affine transformation:
api/sql/Raster-affine-transformation.md
- Parameter: api/sql/Parameter.md