This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sedona.git
The following commit(s) were added to refs/heads/master by this push:
new 205d79b Update docs
205d79b is described below
commit 205d79bf4e564eb51e5df81af42548f88e24c5ac
Author: Jia Yu <[email protected]>
AuthorDate: Wed Jan 6 19:54:50 2021 -0800
Update docs
---
README.md | 2 +-
docs/download/compile.md | 11 ++-----
docs/download/overview.md | 26 ++++++++++++---
docs/download/project.md | 56 ++-------------------------------
docs/tutorial/GeoSpark-Runnable-DEMO.md | 22 +++++--------
docs/tutorial/geospark-core-python.md | 2 +-
docs/tutorial/geospark-sql-python.md | 2 +-
docs/tutorial/jupyter-notebook.md | 32 +++++++++++++++++++
docs/tutorial/rdd.md | 2 +-
docs/tutorial/sql.md | 2 +-
mkdocs.yml | 4 ++-
python-adapter/.gitignore | 1 +
12 files changed, 76 insertions(+), 86 deletions(-)
diff --git a/README.md b/README.md
index c914b08..72ef9fc 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
<img src="./sedona_logo.png" width="400">
-[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
+[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
diff --git a/docs/download/compile.md b/docs/download/compile.md
index 25cbdfe..17c8fdd 100644
--- a/docs/download/compile.md
+++ b/docs/download/compile.md
@@ -1,6 +1,6 @@
# Compile and Publish Sedona
-[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
+[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
to the corresponding folders in `site/api/javadoc`
-
-#### Deploy to ASF domain
-
-1. Copy the generated Javadoc and Scaladoc to the correct location in `docs/api/javadoc`
-
-2. Then deploy Javadoc and Scaladoc with the project website
+1. Copy the generated Javadoc (Scaladoc should already be there) to the corresponding folders in `site/api/javadoc`
+2. Deploy Javadoc and Scaladoc with the project website
## Publish SNAPSHOTs
diff --git a/docs/download/overview.md b/docs/download/overview.md
index 3241380..d93917d 100644
--- a/docs/download/overview.md
+++ b/docs/download/overview.md
@@ -47,6 +47,8 @@ Apache Sedona extends pyspark functions which depends on libraries:
* shapely
* attrs
+You need to install the necessary packages if your system does not have them installed. See ["packages" in our Pipfile](https://github.com/apache/incubator-sedona/blob/master/python/Pipfile).
+
### Install sedona
* Installing from PyPi repositories
@@ -55,12 +57,12 @@ Apache Sedona extends pyspark functions which depends on libraries:
pip install sedona
```
-* Installing from source
+* Installing from Sedona Python source
Clone Sedona GitHub source code and run the following command
```bash
-cd python-adapter
+cd python
python3 setup.py install
```
@@ -68,7 +70,7 @@ python3 setup.py install
Sedona Python needs one additional jar file called `sedona-python-adapter-3.0_2.12-1.0.0-incubator.jar` to work properly. Please make sure you use the correct version for Spark and Scala.
-You can get it using the following methods:
+You can get it using one of the following methods:
* Compile from the source within the main project directory and copy it (in the `target` folder) to the SPARK_HOME/jars/ folder ([more details](/download/compile/#compile-scala-and-java-source-code))
@@ -82,4 +84,20 @@ You can get it using the following methods:
config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
config('spark.jars.packages',
'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubator'). \
getOrCreate()
-```
\ No newline at end of file
+```
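As the added text above notes, the python-adapter jar must match your Spark and Scala versions, and the jar file name encodes both. A minimal sketch of a mismatch check (the helper name and regex are illustrative, not part of Sedona):

```python
import re

# Hypothetical helper: parse the Spark/Scala/Sedona versions out of a
# python-adapter jar file name, e.g.
# "sedona-python-adapter-3.0_2.12-1.0.0-incubator.jar" -> Spark 3.0, Scala 2.12.
JAR_PATTERN = re.compile(
    r"sedona-python-adapter-"
    r"(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-"
    r"(?P<sedona>.+)\.jar$"
)

def parse_adapter_jar(name):
    """Return (spark_version, scala_version, sedona_version), or None if
    the file name is not a python-adapter jar."""
    m = JAR_PATTERN.search(name)
    if m is None:
        return None
    return m.group("spark"), m.group("scala"), m.group("sedona")
```

Comparing the parsed Spark and Scala versions against the running cluster before launch catches the most common setup error early.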
+
+### Setup environment variables
+
+If you manually copy the python-adapter jar to the `SPARK_HOME/jars/` folder, you need to set up two environment variables:
+
+* SPARK_HOME. For example, run this command in your terminal:
+
+```bash
+export SPARK_HOME=~/Downloads/spark-3.0.1-bin-hadoop2.7
+```
+
+* PYTHONPATH. For example, run this command in your terminal:
+
+```bash
+export PYTHONPATH=$SPARK_HOME/python
+```
\ No newline at end of file
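The two exports added above are easy to get wrong (set in one shell, missing in another). A small sketch of a sanity check, under the assumption that `PYTHONPATH` must contain `$SPARK_HOME/python` (the function name is illustrative, not part of Sedona):

```python
import os

# Illustrative check: verify SPARK_HOME and PYTHONPATH are set the way the
# instructions above describe. Pass os.environ (or any mapping) as `env`.
def check_sedona_env(env):
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems
    expected = os.path.join(spark_home, "python")
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    if expected not in entries:
        problems.append("PYTHONPATH should include " + expected)
    return problems
```

Running `check_sedona_env(os.environ)` before starting a notebook gives a clearer error than the import failure you would otherwise hit inside pyspark.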
diff --git a/docs/download/project.md b/docs/download/project.md
index 75e3b9e..32796be 100644
--- a/docs/download/project.md
+++ b/docs/download/project.md
@@ -6,63 +6,11 @@ A self-contained project allows you to create multiple Scala / Java files and wr
1. To add Sedona as dependencies, please read [Sedona Maven Central coordinates](GeoSpark-All-Modules-Maven-Central-Coordinates.md)
2. Use Sedona Template project to start: [Sedona Template Project](/tutorial/GeoSpark-Runnable-DEMO/)
-3. Compile your project using SBT or Maven. Make sure you obtain the fat jar which packages all dependencies.
+3. Compile your project using SBT. Make sure you obtain the fat jar which packages all dependencies.
4. Submit your compiled fat jar to the Spark cluster. Make sure you are in the root folder of the Spark distribution. Then run the following command:
```
./bin/spark-submit --master spark://YOUR-IP:7077 /Path/To/YourJar.jar
```
!!!note
-    The detailed explanation of spark-submit is available on [Spark website](https://spark.apache.org/docs/latest/submitting-applications.html).
-
-## How to use Sedona in an IDE
-
-### Select an IDE
-To develop a complex project, we suggest you use IntelliJ IDEA. It supports JVM languages, Scala and Java, and many dependency management systems, Maven and SBT.
-
-Eclipse is also fine if you just want to use Java and Maven.
-
-### Open Sedona template project
-Select a proper project you want from [Sedona Template Project](/tutorial/GeoSpark-Runnable-DEMO/). In this tutorial, we use the Sedona SQL Scala project as an example.
-
-Open the folder that contains the `build.sbt` file in your IDE. The IDE may take a while to index dependencies and source code.
-
-### Try Sedona SQL functions
-In your IDE, run ScalaExample.scala file.
-
-You don't need to change anything in this file. The IDE will run all SQL queries in this example in local mode.
-
-### Package the project
-To run this project in cluster mode, you have to package this project to a JAR and then run it using the `spark-submit` command.
-
-Before packaging this project, you always need to check two places:
-
-* Remove the hardcoded Master IP `master("local[*]")`. This hardcoded IP is only needed when you run this project in an IDE.
-```scala
-var sparkSession:SparkSession = SparkSession.builder()
-    .config("spark.serializer",classOf[KryoSerializer].getName)
-    .config("spark.kryo.registrator",classOf[SedonaVizKryoRegistrator].getName)
-    .master("local[*]")
-    .appName("SedonaSQL-demo").getOrCreate()
-```
-
-* In build.sbt (or POM.xml), set Spark dependency scope to `provided` instead of `compile`. `compile` is only needed when you run this project in an IDE.
-
-!!!warning
-    Forgetting to change the package scope will lead to a very big fat JAR and dependency conflicts when calling `spark-submit`. For more details, please visit [Maven Dependency Scope](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope).
-
-* Make sure your downloaded Spark binary distribution is the same version as the Spark used in your `build.sbt` or `POM.xml`.
-
-### Submit the compiled jar
-1. Go to the `./target/scala-2.11` folder and find a jar called `SedonaSQLTemplate-0.1.0.jar`. Note that this JAR is normally larger than 1MB. (If you use POM.xml, the jar is under the `./target` folder)
-2. Submit this JAR using `spark-submit`.
-
-* Local mode:
-```
-./bin/spark-submit /Path/To/YourJar.jar
-```
-
-* Cluster mode:
-```
-./bin/spark-submit --master spark://YOUR-IP:7077 /Path/To/YourJar.jar
-```
\ No newline at end of file
+    The detailed explanation of spark-submit is available on [Spark website](https://spark.apache.org/docs/latest/submitting-applications.html).
\ No newline at end of file
diff --git a/docs/tutorial/GeoSpark-Runnable-DEMO.md
b/docs/tutorial/GeoSpark-Runnable-DEMO.md
index 76e0902..4c81619 100644
--- a/docs/tutorial/GeoSpark-Runnable-DEMO.md
+++ b/docs/tutorial/GeoSpark-Runnable-DEMO.md
@@ -1,36 +1,30 @@
-## Python Jupyter Notebook Examples
-
-[Sedona core](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb)
-
-[Sedona SQL](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb)
-
-## Scala and Java Examples
+# Scala and Java Examples
[Scala and Java Examples](https://github.com/apache/incubator-sedona/tree/master/examples) contains template projects for RDD, SQL and Viz. The template projects have been configured properly.
Note that, although the template projects are written in Scala, the same APIs can be used in Java as well.
-### Folder structure
+## Folder structure
The folder structure of this repository is as follows.
* rdd-colocation-mining: a Scala template showing how to use the Sedona RDD API in spatial data mining
* sql: a Scala template showing how to use the Sedona DataFrame and SQL API
* viz: a Scala template showing how to use the Sedona Viz RDD and SQL API
-### Compile and package
+## Compile and package
-#### Prerequisites
+### Prerequisites
Please make sure you have the following software installed on your local machine:
* For Scala: Scala 2.12, SBT
* For Java: JDK 1.8, Apache Maven 3
-#### Compile
+### Compile
Run the terminal command `sbt assembly` within the folder of each template.
-#### Submit your fat jar to Spark
+### Submit your fat jar to Spark
After running the command mentioned above, you are able to see a fat jar in the `./target` folder. Please take it and use `./bin/spark-submit` to submit this jar.
To run the jar in this way, you need to:
@@ -41,8 +35,8 @@ To run the jar in this way, you need to:
* Make sure the dependency versions in build.sbt are consistent with your Spark version.
-### Run template projects locally
+## Run template projects locally
We highly suggest you use IDEs to run template projects on your local machine. For Scala, we recommend IntelliJ IDEA with the Scala plug-in. For Java, we recommend IntelliJ IDEA and Eclipse. With the help of IDEs, **you don't have to prepare anything** (you don't even need to download and set up Spark!). As long as you have Scala and Java, everything works properly!
-#### Scala
+### Scala
Import the Scala template project as an SBT project. Then run the Main file in this project.
\ No newline at end of file
diff --git a/docs/tutorial/geospark-core-python.md
b/docs/tutorial/geospark-core-python.md
index 53f1d3d..2f2869b 100644
--- a/docs/tutorial/geospark-core-python.md
+++ b/docs/tutorial/geospark-core-python.md
@@ -33,7 +33,7 @@ GeoData has one method to get user data.
<li> getUserData() -> str </li>
!!!note
-    This tutorial is based on [Sedona Core Jupyter Notebook example](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb)
+    This tutorial is based on [Sedona Core Jupyter Notebook example](../jupyter-notebook)
## Installation
diff --git a/docs/tutorial/geospark-sql-python.md
b/docs/tutorial/geospark-sql-python.md
index 8520e29..37d0f75 100644
--- a/docs/tutorial/geospark-sql-python.md
+++ b/docs/tutorial/geospark-sql-python.md
@@ -14,7 +14,7 @@ spark.sql("YOUR_SQL")
```
!!!note
-    This tutorial is based on [Sedona SQL Jupyter Notebook example](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb)
+    This tutorial is based on [Sedona SQL Jupyter Notebook example](../jupyter-notebook)
## Installation
diff --git a/docs/tutorial/jupyter-notebook.md
b/docs/tutorial/jupyter-notebook.md
new file mode 100644
index 0000000..c69b739
--- /dev/null
+++ b/docs/tutorial/jupyter-notebook.md
@@ -0,0 +1,32 @@
+# Python Jupyter Notebook Examples
+
+Sedona Python provides two Jupyter Notebook examples: [Sedona core](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb) and [Sedona SQL](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb)
+
+
+Please use the following steps to run a Jupyter notebook with Pipenv:
+
+1. Clone Sedona GitHub repo or download the source code
+2. Install Sedona Python from PyPi or GitHub source: read [Install Sedona Python](/download/overview/#install-sedona) to learn how.
+3. Prepare the python-adapter jar: read [Install Sedona Python](/download/overview/#prepare-python-adapter-jar) to learn how.
+4. Set up the pipenv Python version. For Spark 3.0, Sedona supports Python 3.7 - 3.9
+```bash
+cd python
+pipenv --python 3.8
+```
+5. Install dependencies
+```bash
+cd python
+pipenv install
+```
+6. Install jupyter notebook kernel for pipenv
+```bash
+pipenv install ipykernel
+pipenv shell
+```
+7. In the pipenv shell, do
+```bash
+python -m ipykernel install --user --name=my-virtualenv-name
+```
+8. Set up the environment variables `SPARK_HOME` and `PYTHONPATH` if you didn't do it before. Read [Install Sedona Python](/download/overview/#setup-environment-variables) to learn how.
+9. Launch jupyter notebook: `jupyter notebook`
+10. Select the Sedona notebook. In your notebook, go to Kernel -> Change Kernel. Your kernel should now be an option.
\ No newline at end of file
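The numbered steps in the new tutorial above can also be scripted. A sketch that expresses steps 4-9 as argv lists (to be executed in order with `subprocess.run`, checking each return code); the function and kernel names are illustrative, and `pipenv run` stands in for the interactive `pipenv shell` step:

```python
# Hypothetical sketch: the pipenv-based notebook setup as a list of commands.
# Run each with subprocess.run(cmd, check=True) from the `python` folder.
def pipenv_notebook_commands(python_version="3.8", kernel_name="apache-sedona"):
    return [
        ["pipenv", "--python", python_version],            # step 4: pin Python
        ["pipenv", "install"],                             # step 5: dependencies
        ["pipenv", "install", "ipykernel"],                # step 6: kernel package
        ["pipenv", "run", "python", "-m", "ipykernel",     # step 7: register kernel
         "install", "--user", "--name=" + kernel_name],
        ["pipenv", "run", "jupyter", "notebook"],          # step 9: launch
    ]
```

Building the argv lists up front keeps the version and kernel-name choices in one place, and avoids shell quoting issues.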
diff --git a/docs/tutorial/rdd.md b/docs/tutorial/rdd.md
index cbc2dc8..8ddb878 100644
--- a/docs/tutorial/rdd.md
+++ b/docs/tutorial/rdd.md
@@ -8,7 +8,7 @@ The page outlines the steps to create Spatial RDDs and run spatial queries using
3. Add the dependencies in build.sbt or pom.xml.
!!!note
-    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [RDD example project](https://github.com/apache/incubator-sedona/tree/master/examples/rdd-colocation-mining)
+    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [RDD example project](/tutorial/GeoSpark-Runnable-DEMO/)
## Initiate SparkContext
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index 0939ef2..035d12f 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -14,7 +14,7 @@ Detailed SedonaSQL APIs are available here: [SedonaSQL API](../api/sql/GeoSparkS
3. Add the dependencies in build.sbt or pom.xml.
!!!note
-    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [SQL example project](https://github.com/apache/incubator-sedona/tree/master/examples/sql)
+    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [SQL example project](/tutorial/GeoSpark-Runnable-DEMO/)
## Initiate SparkSession
diff --git a/mkdocs.yml b/mkdocs.yml
index dd1d478..3b9e2a8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -22,7 +22,9 @@ nav:
- Map visualization SQL app:
- Scala/Java: tutorial/viz.md
- Use Apache Zeppelin: tutorial/zeppelin.md
- - Examples: tutorial/GeoSpark-Runnable-DEMO.md
+ - Examples:
+ - Scala/Java: tutorial/GeoSpark-Runnable-DEMO.md
+ - Python: tutorial/jupyter-notebook.md
- Performance tuning:
- Benchmark: tutorial/benchmark.md
- Tune RDD application: tutorial/Advanced-Tutorial-Tune-your-GeoSpark-Application.md
diff --git a/python-adapter/.gitignore b/python-adapter/.gitignore
new file mode 100644
index 0000000..b83d222
--- /dev/null
+++ b/python-adapter/.gitignore
@@ -0,0 +1 @@
+/target/