This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch release-1.4.0
in repository https://gitbox.apache.org/repos/asf/sedona.git
The following commit(s) were added to refs/heads/release-1.4.0 by this push:
new a90c54fe Fix examples
a90c54fe is described below
commit a90c54fe01584bc9c0f980f88e70459e2706f30e
Author: Jia Yu <[email protected]>
AuthorDate: Sun Mar 19 19:07:22 2023 -0700
Fix examples
---
.github/workflows/example.yml | 8 +++++---
docs/tutorial/flink/sql.md | 29 +++++++++++++++++++++--------
2 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/.github/workflows/example.yml b/.github/workflows/example.yml
index eccc97dd..7e39f745 100644
--- a/.github/workflows/example.yml
+++ b/.github/workflows/example.yml
@@ -32,6 +32,8 @@ jobs:
~/.ivy2/cache
~/.sbt
key: ${{ runner.os }}-sbt-${{ hashFiles('**/build.sbt') }}
- - run: (cd examples/rdd-colocation-mining;sbt clean assembly;java -jar target/scala-2.12/*.jar)
- - run: (cd examples/sql;sbt clean assembly;java -jar target/scala-2.12/*.jar)
- - run: (cd examples/viz;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+ - run: (cd examples/spark-rdd-colocation-mining;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+ - run: (cd examples/spark-sql;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+ - run: (cd examples/spark-viz;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+ - run: (cd examples/spark-viz;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+ - run: (cd examples/flink-sql;mvn clean install;java -jar target/sedona-flink-example-1.0.0.jar)
\ No newline at end of file
diff --git a/docs/tutorial/flink/sql.md b/docs/tutorial/flink/sql.md
index 58907b5c..720d1aca 100644
--- a/docs/tutorial/flink/sql.md
+++ b/docs/tutorial/flink/sql.md
@@ -207,7 +207,7 @@ geomTable.execute().print()
## Join query
-This equi-join leverages Flink's internal equi-join algorithm. You can opt to skip the Sedona refinement step by sacrificing query accuracy.
+This equi-join leverages Flink's internal equi-join algorithm. You can opt to skip the Sedona refinement step by sacrificing query accuracy. A running example is in [SQL example project](../../demo/).
Please use the following steps:
@@ -216,16 +216,30 @@ Please use the following steps:
Use [ST_S2CellIds](../../../api/flink/Function/#st_s2cellids) to generate cell IDs. Each geometry may produce one or more IDs.
```sql
-SELECT id, geom, name, explode(ST_S2CellIDs(geom, 15)) as cellId
+SELECT id, geom, name, ST_S2CellIDs(geom, 15) as idarray
FROM lefts
```
```sql
-SELECT id, geom, name, explode(ST_S2CellIDs(geom, 15)) as cellId
+SELECT id, geom, name, ST_S2CellIDs(geom, 15) as idarray
FROM rights
```
-### 2. Perform equi-join
+### 2. Explode id array
+
+The produced S2 ids are arrays of integers. We need to explode these Ids to multiple rows so later we can join two tables by ids.
+
+```
+SELECT id, geom, name, cellId
+FROM lefts CROSS JOIN UNNEST(lefts.idarray) AS tmpTbl1(cellId)
+```
+
+```
+SELECT id, geom, name, cellId
+FROM rights CROSS JOIN UNNEST(rights.idarray) AS tmpTbl2(cellId)
+```
+
+### 3. Perform equi-join
Join the two tables by their S2 cellId
@@ -234,12 +248,11 @@ SELECT lcs.id as lcs_id, lcs.geom as lcs_geom, lcs.name as lcs_name, rcs.id as r
FROM lcs JOIN rcs ON lcs.cellId = rcs.cellId
```
-
-### 3. Optional: Refine the result
+### 4. Optional: Refine the result
Due to the nature of S2 Cellid, the equi-join results might have a few false-positives depending on the S2 level you choose. A smaller level indicates bigger cells, less exploded rows, but more false positives.
-To ensure the correctness, you can use one of the [Spatial Predicates](../../../api/Predicate/) to filter out them. Use this query instead of the query in Step 2.
+To ensure the correctness, you can use one of the [Spatial Predicates](../../../api/Predicate/) to filter out them. Use this query as the query in Step 3.
```sql
SELECT lcs.id as lcs_id, lcs.geom as lcs_geom, lcs.name as lcs_name, rcs.id as rcs_id, rcs.geom as rcs_geom, rcs.name as rcs_name
@@ -252,7 +265,7 @@ As you see, compared to the query in Step 2, we added one more filter, which is
!!!tip
You can skip this step if you don't need 100% accuracy and want faster query speed.
-### 4. Optional: De-duplcate
+### 5. Optional: De-duplcate
Due to the explode function used when we generate S2 Cell Ids, the resulting DataFrame may have several duplicate <lcs_geom, rcs_geom> matches. You can remove them by performing a GroupBy query.
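The tutorial steps touched by this patch (explode the S2 id arrays, equi-join on `cellId`, optionally refine with a spatial predicate, then de-duplicate) can be modelled in memory to see why the refinement and de-duplication steps exist. The following is a minimal Python sketch under stated assumptions: it is not Flink or Sedona code, the cell ids and box-shaped geometries are made-up toy values rather than real `ST_S2CellIDs` output, and `boxes_intersect` is a hypothetical stand-in for a spatial predicate such as `ST_Intersects`.

```python
# Hypothetical toy rows: "geom" is an axis-aligned box (xmin, ymin, xmax, ymax)
# and "idarray" stands in for the result of ST_S2CellIDs(geom, level).
lefts = [
    {"id": 1, "name": "a", "geom": (0, 0, 2, 2), "idarray": [10, 11]},
    {"id": 2, "name": "b", "geom": (5, 5, 6, 6), "idarray": [12]},
]
rights = [
    {"id": 100, "name": "x", "geom": (1, 1, 3, 3), "idarray": [10, 11]},
    {"id": 200, "name": "y", "geom": (6.5, 6.5, 7, 7), "idarray": [12]},
]


def unnest(rows):
    """Mimic CROSS JOIN UNNEST(idarray) AS t(cellId): one row per cell id."""
    out = []
    for row in rows:
        for cell_id in row["idarray"]:
            out.append({"id": row["id"], "name": row["name"],
                        "geom": row["geom"], "cellId": cell_id})
    return out


def boxes_intersect(a, b):
    """Hypothetical stand-in for a spatial predicate; works on boxes only."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def s2_style_join(lefts, rights, refine=True):
    lcs, rcs = unnest(lefts), unnest(rights)
    # Equi-join on cellId, implemented here as a simple hash join.
    by_cell = {}
    for r in rcs:
        by_cell.setdefault(r["cellId"], []).append(r)
    pairs = []
    for l in lcs:
        for r in by_cell.get(l["cellId"], []):
            if refine and not boxes_intersect(l["geom"], r["geom"]):
                continue  # drop a false positive that only shares a cell
            pairs.append((l["id"], r["id"]))
    return pairs


raw = s2_style_join(lefts, rights, refine=False)    # equi-join only
refined = s2_style_join(lefts, rights, refine=True)  # plus refinement
deduped = sorted(set(refined))                       # plus de-duplication
print(raw)
print(refined)
print(deduped)
```

With refinement off, the pair (2, 200) is returned only because both rows share cell 12 while their boxes do not overlap, which is the false-positive case the refinement step filters out. The pair (1, 100) appears twice because both geometries cover cells 10 and 11, which is the duplication the de-duplication step removes. The sketch de-duplicates on id pairs for simplicity, whereas the tutorial groups by `<lcs_geom, rcs_geom>`.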