This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch branch-1.5.2
in repository https://gitbox.apache.org/repos/asf/sedona.git
commit 1d1608ac950c4f7c786c4766aa5c65696ee67bfa
Author: Jia Yu <[email protected]>
AuthorDate: Tue Apr 30 09:39:40 2024 -0700

    [DOCS] Update Microsoft Fabric tutorial with Spark properties (#1388)

    * Add the spark properties

    * Refactor the doc

    * Update docs/setup/fabric.md

    Co-authored-by: John Bampton <[email protected]>

    ---------

    Co-authored-by: John Bampton <[email protected]>
---
 docs/image/fabric/{fabric-9.png => fabric-10.png} | Bin
 docs/image/fabric/fabric-5.png                    | Bin 103507 -> 192084 bytes
 docs/image/fabric/fabric-6.png                    | Bin 189504 -> 103507 bytes
 docs/image/fabric/fabric-7.png                    | Bin 123955 -> 189504 bytes
 docs/image/fabric/fabric-8.png                    | Bin 146759 -> 123955 bytes
 docs/image/fabric/fabric-9.png                    | Bin 150114 -> 146759 bytes
 docs/setup/fabric.md                              | 82 +++++++++++++++-------
 7 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/docs/image/fabric/fabric-9.png b/docs/image/fabric/fabric-10.png
similarity index 100%
copy from docs/image/fabric/fabric-9.png
copy to docs/image/fabric/fabric-10.png
diff --git a/docs/image/fabric/fabric-5.png b/docs/image/fabric/fabric-5.png
index f4f3b7bc0..0c1127a55 100644
Binary files a/docs/image/fabric/fabric-5.png and b/docs/image/fabric/fabric-5.png differ
diff --git a/docs/image/fabric/fabric-6.png b/docs/image/fabric/fabric-6.png
index 00b250cf2..f4f3b7bc0 100644
Binary files a/docs/image/fabric/fabric-6.png and b/docs/image/fabric/fabric-6.png differ
diff --git a/docs/image/fabric/fabric-7.png b/docs/image/fabric/fabric-7.png
index 2162e33b1..00b250cf2 100644
Binary files a/docs/image/fabric/fabric-7.png and b/docs/image/fabric/fabric-7.png differ
diff --git a/docs/image/fabric/fabric-8.png b/docs/image/fabric/fabric-8.png
index eb0ac3c2c..2162e33b1 100644
Binary files a/docs/image/fabric/fabric-8.png and b/docs/image/fabric/fabric-8.png differ
diff --git a/docs/image/fabric/fabric-9.png b/docs/image/fabric/fabric-9.png
index 45effa93a..eb0ac3c2c 100644
Binary files a/docs/image/fabric/fabric-9.png and b/docs/image/fabric/fabric-9.png differ
diff --git a/docs/setup/fabric.md b/docs/setup/fabric.md
index b7e0ac322..aa5ca6ee6 100644
--- a/docs/setup/fabric.md
+++ b/docs/setup/fabric.md
@@ -4,68 +4,67 @@ This tutorial will guide you through the process of installing Sedona on Microsoft Fabric

 Go to the [Microsoft Fabric portal](https://app.fabric.microsoft.com/) and choose the `Data Engineering` option.

-
+

 ## Step 2: Create a Microsoft Fabric Data Engineering environment

 On the left side, click `My Workspace` and then click `+ New` to create a new `Environment`. Let's name it `ApacheSedona`.

-
+

 ## Step 3: Select the Apache Spark version

 In the `Environment` page, click the `Home` tab and select the appropriate version of Apache Spark. You will need this version to install the correct version of Apache Sedona.

-
+

 ## Step 4: Install the Sedona Python package

 In the `Environment` page, click the `Public libraries` tab and then type in `apache-sedona`. Please select the appropriate version of Apache Sedona. The source is `PyPI`.

-
+

-## Step 5: Save and publish the environment
+## Step 5: Set Spark properties

-Click the `Save` button and then click the `Publish` button to save and publish the environment. This will create the environment with the Apache Sedona Python package installed. The publishing process will take about 10 minutes.
-
-
+In the `Environment` page, click the `Spark properties` tab, then create the following 3 properties:

-## Step 6: Download Sedona jars
+- `spark.sql.extensions`: `org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions`
+- `spark.serializer`: `org.apache.spark.serializer.KryoSerializer`
+- `spark.kryo.registrator`: `org.apache.sedona.core.serde.SedonaKryoRegistrator`

-1. Learn the Sedona jars you need from our [Sedona maven coordinate](maven-coordinates.md)
-2. Download the `sedona-spark-shaded` jars from [Maven Central](https://search.maven.org/search?q=g:org.apache.sedona). Please pay attention to the Spark version and Scala version of the jars. If you select Spark 3.4 in the Fabric environment, you should download the Sedona jars with Spark 3.4 and Scala 2.12 and the jar name should be like `sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
-3. Download the `geotools-wrapper` jars from [Maven Central](https://search.maven.org/search?q=g:org.datasyslab). Please pay attention to the Sedona versions of the jar. If you select Sedona 1.5.1, you should download the `geotools-wrapper` jar with version 1.5.1 and the jar name should be like `geotools-wrapper-1.5.1-28.2.jar`.
+

-## Step 7: Upload Sedona jars to the Fabric environment LakeHouse storage
+## Step 6: Save and publish the environment

-In the notebook page, choose the `Explorer` and click the `LakeHouses` option. If you don't have a LakeHouse, you can create one. Then choose `Files` and upload the 2 jars you downloaded in the previous step.
+Click the `Save` button and then click the `Publish` button to save and publish the environment. This will create the environment with the Apache Sedona Python package installed. The publishing process will take about 10 minutes.

-After the upload, you should be able to see the 2 jars in the LakeHouse storage. Then please copy the `ABFS` paths of the 2 jars. In this example, the paths are
+

-```angular2html
-abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+## Step 7: Find the download links of Sedona jars

-abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
-```
+1. Learn the Sedona jars you need from our [Sedona maven coordinate](maven-coordinates.md)
+2. Find the `sedona-spark-shaded` jar from [Maven Central](https://search.maven.org/search?q=g:org.apache.sedona). Please pay attention to the Spark version and Scala version of the jars. If you select Spark 3.4 in the Fabric environment, you should download the Sedona jars with Spark 3.4 and Scala 2.12 and the jar name should be like `sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
+3. Find the `geotools-wrapper` jar from [Maven Central](https://search.maven.org/search?q=g:org.datasyslab). Please pay attention to the Sedona versions of the jar. If you select Sedona 1.5.1, you should download the `geotools-wrapper` jar with version 1.5.1 and the jar name should be like `geotools-wrapper-1.5.1-28.2.jar`.

-
+The download links are like:

-
+```
+https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar
+```

 ## Step 8: Start the notebook with the Sedona environment and install the jars

 In the notebook page, select the `ApacheSedona` environment you created before.

-
+

-In the notebook, you can install the jars by running the following code. Please replace the `spark.jars` with the `ABFS` paths of the 2 jars you uploaded in the previous step.
+In the notebook, you can install the jars by running the following code. Please replace the `jars` with the download links of the 2 jars from the previous step.

 ```python
 %%configure -f
 {
-    "conf": {
-        "spark.jars": "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar",
-    }
+    "jars": ["https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar", "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar"]
 }
 ```
@@ -85,4 +84,33 @@ sedona.sql("SELECT ST_GeomFromEWKT('SRID=4269;POINT(40.7128 -74.0060)')").show()

 If you see the output of the point, then the installation is successful.

-
+
+
+## Optional: manually upload Sedona jars to the Fabric environment LakeHouse storage
+
+If your cluster has no internet access or you want to skip the slow on-the-fly download, you can manually upload the Sedona jars to the Fabric environment LakeHouse storage.
+
+In the notebook page, choose the `Explorer` and click the `LakeHouses` option. If you don't have a LakeHouse, you can create one. Then choose `Files` and upload the 2 jars you downloaded in the previous step.
+
+After the upload, you should be able to see the 2 jars in the LakeHouse storage. Then please copy the `ABFS` paths of the 2 jars. In this example, the paths are
+
+```angular2html
+abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+
+abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
+```
+
+
+
+
+If you use this option, the config files in your notebook should be
+
+```python
+%%configure -f
+{
+    "conf": {
+        "spark.jars": "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar",
+    }
+}
+```
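The two `%%configure` variants in the patch differ only in where the jars come from (Maven Central download links vs. ABFS paths in LakeHouse storage). As a sketch of how the Maven-based payload fits together (not part of the commit; the `build_configure_payload` helper and its version defaults are hypothetical, while the URLs and Spark properties mirror the tutorial text), the JSON body could be assembled like this:

```python
import json

def build_configure_payload(sedona="1.5.1", spark="3.4", scala="2.12", geotools="28.2"):
    """Build a %%configure-style JSON body for a Fabric notebook (illustrative helper).

    Versions must agree: the Sedona jar encodes the Spark/Scala versions, and the
    geotools-wrapper jar encodes the Sedona version.
    """
    base = "https://repo1.maven.org/maven2"
    sedona_jar = (f"{base}/org/apache/sedona/sedona-spark-shaded-{spark}_{scala}/"
                  f"{sedona}/sedona-spark-shaded-{spark}_{scala}-{sedona}.jar")
    geotools_jar = (f"{base}/org/datasyslab/geotools-wrapper/{sedona}-{geotools}/"
                    f"geotools-wrapper-{sedona}-{geotools}.jar")
    return json.dumps({
        "jars": [geotools_jar, sedona_jar],
        # Same three properties the tutorial sets in the environment's Spark properties tab.
        "conf": {
            "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,"
                                    "org.apache.sedona.sql.SedonaSqlExtensions",
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
        },
    }, indent=2)

print(build_configure_payload())
```

Keeping the jar coordinates in one place makes the Spark/Scala/Sedona version coupling explicit, which is the main failure mode the tutorial warns about.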
