[sedona] branch release-1.4.0 updated: Fix examples

jiayu Sun, 19 Mar 2023 19:07:33 -0700

This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch release-1.4.0
in repository https://gitbox.apache.org/repos/asf/sedona.git



The following commit(s) were added to refs/heads/release-1.4.0 by this push:
     new a90c54fe Fix examples
a90c54fe is described below

commit a90c54fe01584bc9c0f980f88e70459e2706f30e
Author: Jia Yu <[email protected]>
AuthorDate: Sun Mar 19 19:07:22 2023 -0700

    Fix examples
---
 .github/workflows/example.yml |  8 +++++---
 docs/tutorial/flink/sql.md    | 29 +++++++++++++++++++++--------
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/.github/workflows/example.yml b/.github/workflows/example.yml
index eccc97dd..7e39f745 100644
--- a/.github/workflows/example.yml
+++ b/.github/workflows/example.yml
@@ -32,6 +32,8 @@ jobs:
           ~/.ivy2/cache
           ~/.sbt
         key: ${{ runner.os }}-sbt-${{ hashFiles('**/build.sbt') }}
-    - run: (cd examples/rdd-colocation-mining;sbt clean assembly;java -jar target/scala-2.12/*.jar)
-    - run: (cd examples/sql;sbt clean assembly;java -jar target/scala-2.12/*.jar)
-    - run: (cd examples/viz;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+    - run: (cd examples/spark-rdd-colocation-mining;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+    - run: (cd examples/spark-sql;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+    - run: (cd examples/spark-viz;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+    - run: (cd examples/spark-viz;sbt clean assembly;java -jar target/scala-2.12/*.jar)
+    - run: (cd examples/flink-sql;mvn clean install;java -jar target/sedona-flink-example-1.0.0.jar)
\ No newline at end of file
diff --git a/docs/tutorial/flink/sql.md b/docs/tutorial/flink/sql.md
index 58907b5c..720d1aca 100644
--- a/docs/tutorial/flink/sql.md
+++ b/docs/tutorial/flink/sql.md
@@ -207,7 +207,7 @@ geomTable.execute().print()
 
 ## Join query
 
-This equi-join leverages Flink's internal equi-join algorithm. You can opt to skip the Sedona refinement step  by sacrificing query accuracy.
+This equi-join leverages Flink's internal equi-join algorithm. You can opt to skip the Sedona refinement step  by sacrificing query accuracy. A running example is in [SQL example project](../../demo/).
 
 Please use the following steps:
 
@@ -216,16 +216,30 @@ Please use the following steps:
 Use [ST_S2CellIds](../../../api/flink/Function/#st_s2cellids) to generate cell IDs. Each geometry may produce one or more IDs.
 
 ```sql
-SELECT id, geom, name, explode(ST_S2CellIDs(geom, 15)) as cellId
+SELECT id, geom, name, ST_S2CellIDs(geom, 15) as idarray
 FROM lefts
 ```
 
 ```sql
-SELECT id, geom, name, explode(ST_S2CellIDs(geom, 15)) as cellId
+SELECT id, geom, name, ST_S2CellIDs(geom, 15) as idarray
 FROM rights
 ```
 
-### 2. Perform equi-join
+### 2. Explode id array
+
+The produced S2 ids are arrays of integers. We need to explode these Ids to multiple rows so later we can join two tables by ids.
+
+```
+SELECT id, geom, name, cellId
+FROM lefts CROSS JOIN UNNEST(lefts.idarray) AS tmpTbl1(cellId)
+```
+
+```
+SELECT id, geom, name, cellId
+FROM rights CROSS JOIN UNNEST(rights.idarray) AS tmpTbl2(cellId)
+```
+
+### 3. Perform equi-join
 
 Join the two tables by their S2 cellId
 
@@ -234,12 +248,11 @@ SELECT lcs.id as lcs_id, lcs.geom as lcs_geom, lcs.name as lcs_name, rcs.id as r
 FROM lcs JOIN rcs ON lcs.cellId = rcs.cellId
 ```
 
-
-### 3. Optional: Refine the result
+### 4. Optional: Refine the result
 
 Due to the nature of S2 Cellid, the equi-join results might have a few false-positives depending on the S2 level you choose. A smaller level indicates bigger cells, less exploded rows, but more false positives.
 
-To ensure the correctness, you can use one of the [Spatial Predicates](../../../api/Predicate/) to filter out them. Use this query instead of the query in Step 2.
+To ensure the correctness, you can use one of the [Spatial Predicates](../../../api/Predicate/) to filter out them. Use this query as the query in Step 3.
 
 ```sql
 SELECT lcs.id as lcs_id, lcs.geom as lcs_geom, lcs.name as lcs_name, rcs.id as rcs_id, rcs.geom as rcs_geom, rcs.name as rcs_name
@@ -252,7 +265,7 @@ As you see, compared to the query in Step 2, we added one more filter, which is
 !!!tip
        You can skip this step if you don't need 100% accuracy and want faster query speed.
 
-### 4. Optional: De-duplcate
+### 5. Optional: De-duplcate
 
 Due to the explode function used when we generate S2 Cell Ids, the resulting DataFrame may have several duplicate <lcs_geom, rcs_geom> matches. You can remove them by performing a GroupBy query.
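
The updated tutorial describes a filter-and-refine join: generate cell IDs per geometry, explode the ID arrays into rows, equi-join on the cell ID, optionally refine with an exact spatial predicate, then de-duplicate. The Python below is a minimal sketch of that pattern outside Flink, under loudly stated assumptions: it swaps `ST_S2CellIDs` for a hypothetical uniform square grid (not S2), models geometries as axis-aligned boxes, and uses a simple box-overlap test as a stand-in for a Sedona spatial predicate. It is an illustration of the idea, not Sedona's or Flink's implementation.

```python
from collections import defaultdict

# ASSUMPTION: a uniform square grid stands in for S2 cells. A larger CELL_SIZE
# plays the role of a smaller S2 level (bigger cells, fewer rows, more false positives).
CELL_SIZE = 10.0


def cell_ids(box, size=CELL_SIZE):
    """Step 1 stand-in: list every grid cell an (xmin, ymin, xmax, ymax) box touches."""
    xmin, ymin, xmax, ymax = box
    return [
        (i, j)
        for i in range(int(xmin // size), int(xmax // size) + 1)
        for j in range(int(ymin // size), int(ymax // size) + 1)
    ]


def explode(table):
    """Step 2: one (id, box, cellId) row per cell, like CROSS JOIN UNNEST."""
    return [(rid, box, c) for rid, box in table for c in cell_ids(box)]


def intersects(a, b):
    """ASSUMPTION: exact box-overlap test used in place of a Sedona predicate."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def s2_style_join(lefts, rights, refine=True):
    """Join two lists of (id, box) rows by shared cell ID; return sorted (left_id, right_id) pairs."""
    # Step 3: hash-based equi-join on the cell ID.
    by_cell = defaultdict(list)
    for rid, box, c in explode(rights):
        by_cell[c].append((rid, box))
    pairs = []
    for lid, lbox, c in explode(lefts):
        for rid, rbox in by_cell[c]:
            # Step 4 (optional): drop pairs that share a cell but do not overlap.
            if refine and not intersects(lbox, rbox):
                continue
            pairs.append((lid, rid))
    # Step 5 (optional): a pair matching in several cells appears several times.
    return sorted(set(pairs))


if __name__ == "__main__":
    lefts = [(1, (0, 0, 15, 5)), (2, (30, 30, 31, 31))]
    rights = [(10, (12, 2, 14, 4)), (11, (1, 8, 2, 9)), (12, (35, 35, 36, 36))]
    print(s2_style_join(lefts, rights, refine=False))  # [(1, 10), (1, 11), (2, 12)]
    print(s2_style_join(lefts, rights))  # [(1, 10)]
```

Without refinement, pairs (1, 11) and (2, 12) survive only because their boxes share a grid cell. These are the false positives the optional refinement step removes. The set in the last step plays the role of the GroupBy de-duplication.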
