This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sedona.git
The following commit(s) were added to refs/heads/master by this push:
new 205d79b Update docs
205d79b is described below
commit 205d79bf4e564eb51e5df81af42548f88e24c5ac
Author: Jia Yu <[email protected]>
AuthorDate: Wed Jan 6 19:54:50 2021 -0800
Update docs
---
README.md | 2 +-
docs/download/compile.md | 11 ++-----
docs/download/overview.md | 26 ++++++++++++---
docs/download/project.md | 56 ++-------------------------------
docs/tutorial/GeoSpark-Runnable-DEMO.md | 22 +++++--------
docs/tutorial/geospark-core-python.md | 2 +-
docs/tutorial/geospark-sql-python.md | 2 +-
docs/tutorial/jupyter-notebook.md | 32 +++++++++++++++++++
docs/tutorial/rdd.md | 2 +-
docs/tutorial/sql.md | 2 +-
mkdocs.yml | 4 ++-
python-adapter/.gitignore | 1 +
12 files changed, 76 insertions(+), 86 deletions(-)
diff --git a/README.md b/README.md
index c914b08..72ef9fc 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
<img src="./sedona_logo.png" width="400">
-[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
+[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
diff --git a/docs/download/compile.md b/docs/download/compile.md
index 25cbdfe..17c8fdd 100644
--- a/docs/download/compile.md
+++ b/docs/download/compile.md
@@ -1,6 +1,6 @@
# Compile and Publish Sedona
-[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
+[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Scala+and+Java+build%22)
[](https://github.com/apache/incubator-sedona/actions?query=workflow%3A%22Python+build%22)
to the corresponding folders in `site/api/javadoc`
-
-#### Deploy to ASF domain
-
-1. Copy the generated Javadoc and Scaladoc to the correct location in `docs/api/javadoc`
-
-2. Then deploy Javadoc and Scaladoc with the project website
+1. Copy the generated Javadoc (Scaladoc should already be there) to the corresponding folders in `site/api/javadoc`
+2. Deploy Javadoc and Scaladoc with the project website
## Publish SNAPSHOTs
diff --git a/docs/download/overview.md b/docs/download/overview.md
index 3241380..d93917d 100644
--- a/docs/download/overview.md
+++ b/docs/download/overview.md
@@ -47,6 +47,8 @@ Apache Sedona extends pyspark functions which depends on libraries:
* shapely
* attrs
+You need to install the necessary packages if your system does not have them installed. See ["packages" in our Pipfile](https://github.com/apache/incubator-sedona/blob/master/python/Pipfile).
+
### Install sedona
* Installing from PyPi repositories
@@ -55,12 +57,12 @@ Apache Sedona extends pyspark functions which depends on libraries:
pip install sedona
```
-* Installing from source
+* Installing from Sedona Python source
Clone Sedona GitHub source code and run the following command
```bash
-cd python-adapter
+cd python
python3 setup.py install
```
@@ -68,7 +70,7 @@ python3 setup.py install
Sedona Python needs one additional jar file called `sedona-python-adapter-3.0_2.12-1.0.0-incubator.jar` to work properly. Please make sure you use the correct version for Spark and Scala.
-You can get it using the following methods:
+You can get it using one of the following methods:
* Compile from the source within the main project directory and copy it (in the `target` folder) to the SPARK_HOME/jars/ folder ([more details](/download/compile/#compile-scala-and-java-source-code))
@@ -82,4 +84,20 @@ You can get it using the following methods:
config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
config('spark.jars.packages',
'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubator'). \
getOrCreate()
-```
\ No newline at end of file
+```
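As the added text above notes, the python-adapter jar must match your Spark and Scala versions, and the jar file name encodes both. A minimal sketch of a mismatch check (the helper name and regex are illustrative, not part of Sedona):

```python
import re

# Hypothetical helper: parse the Spark/Scala/Sedona versions out of a
# python-adapter jar file name, e.g.
# "sedona-python-adapter-3.0_2.12-1.0.0-incubator.jar" -> Spark 3.0, Scala 2.12.
JAR_PATTERN = re.compile(
    r"sedona-python-adapter-"
    r"(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-"
    r"(?P<sedona>.+)\.jar$"
)

def parse_adapter_jar(name):
    """Return (spark_version, scala_version, sedona_version), or None if
    the file name is not a python-adapter jar."""
    m = JAR_PATTERN.search(name)
    if m is None:
        return None
    return m.group("spark"), m.group("scala"), m.group("sedona")
```

Comparing the parsed Spark and Scala versions against the running cluster before launch catches the most common setup error early.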
+
+### Setup environment variables
+
+If you manually copy the python-adapter jar to the `SPARK_HOME/jars/` folder, you need to set up two environment variables:
+
+* SPARK_HOME. For example, run this command in your terminal:
+
+```bash
+export SPARK_HOME=~/Downloads/spark-3.0.1-bin-hadoop2.7
+```
+
+* PYTHONPATH. For example, run this command in your terminal:
+
+```bash
+export PYTHONPATH=$SPARK_HOME/python
+```
\ No newline at end of file
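The two exports added above are easy to get wrong (set in one shell, missing in another). A small sketch of a sanity check, under the assumption that `PYTHONPATH` must contain `$SPARK_HOME/python` (the function name is illustrative, not part of Sedona):

```python
import os

# Illustrative check: verify SPARK_HOME and PYTHONPATH are set the way the
# instructions above describe. Pass os.environ (or any mapping) as `env`.
def check_sedona_env(env):
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems
    expected = os.path.join(spark_home, "python")
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    if expected not in entries:
        problems.append("PYTHONPATH should include " + expected)
    return problems
```

Running `check_sedona_env(os.environ)` before starting a notebook gives a clearer error than the import failure you would otherwise hit inside pyspark.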
diff --git a/docs/download/project.md b/docs/download/project.md
index 75e3b9e..32796be 100644
--- a/docs/download/project.md
+++ b/docs/download/project.md
@@ -6,63 +6,11 @@ A self-contained project allows you to create multiple Scala / Java files and wr
1. To add Sedona as dependencies, please read [Sedona Maven Central coordinates](GeoSpark-All-Modules-Maven-Central-Coordinates.md)
2. Use Sedona Template project to start: [Sedona Template Project](/tutorial/GeoSpark-Runnable-DEMO/)
-3. Compile your project using SBT or Maven. Make sure you obtain the fat jar which packages all dependencies.
+3. Compile your project using SBT. Make sure you obtain the fat jar which packages all dependencies.
4. Submit your compiled fat jar to the Spark cluster. Make sure you are in the root folder of the Spark distribution. Then run the following command:
```
./bin/spark-submit --master spark://YOUR-IP:7077 /Path/To/YourJar.jar
```
!!!note
-    The detailed explanation of spark-submit is available on [Spark website](https://spark.apache.org/docs/latest/submitting-applications.html).
-
-## How to use Sedona in an IDE
-
-### Select an IDE
-To develop a complex project, we suggest you use IntelliJ IDEA. It supports JVM languages, Scala and Java, and many dependency management systems, Maven and SBT.
-
-Eclipse is also fine if you just want to use Java and Maven.
-
-### Open Sedona template project
-Select a proper project you want from [Sedona Template Project](/tutorial/GeoSpark-Runnable-DEMO/). In this tutorial, we use the Sedona SQL Scala project as an example.
-
-Open the folder that contains the `build.sbt` file in your IDE. The IDE may take a while to index dependencies and source code.
-
-### Try Sedona SQL functions
-In your IDE, run ScalaExample.scala file.
-
-You don't need to change anything in this file. The IDE will run all SQL queries in this example in local mode.
-
-### Package the project
-To run this project in cluster mode, you have to package this project to a JAR and then run it using the `spark-submit` command.
-
-Before packaging this project, you always need to check two places:
-
-* Remove the hardcoded Master IP `master("local[*]")`. This hardcoded IP is only needed when you run this project in an IDE.
-```scala
-var sparkSession:SparkSession = SparkSession.builder()
-    .config("spark.serializer",classOf[KryoSerializer].getName)
-    .config("spark.kryo.registrator",classOf[SedonaVizKryoRegistrator].getName)
-    .master("local[*]")
-    .appName("SedonaSQL-demo").getOrCreate()
-```
-
-* In build.sbt (or POM.xml), set Spark dependency scope to `provided` instead of `compile`. `compile` is only needed when you run this project in an IDE.
-
-!!!warning
-    Forgetting to change the package scope will lead to a very big fat JAR and dependency conflicts when calling `spark-submit`. For more details, please visit [Maven Dependency Scope](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope).
-
-* Make sure your downloaded Spark binary distribution is the same version as the Spark used in your `build.sbt` or `POM.xml`.
-
-### Submit the compiled jar
-1. Go to the `./target/scala-2.11` folder and find a jar called `SedonaSQLTemplate-0.1.0.jar`. Note that this JAR is normally larger than 1MB. (If you use POM.xml, the jar is under the `./target` folder)
-2. Submit this JAR using `spark-submit`.
-
-* Local mode:
-```
-./bin/spark-submit /Path/To/YourJar.jar
-```
-
-* Cluster mode:
-```
-./bin/spark-submit --master spark://YOUR-IP:7077 /Path/To/YourJar.jar
-```
\ No newline at end of file
+    The detailed explanation of spark-submit is available on [Spark website](https://spark.apache.org/docs/latest/submitting-applications.html).
\ No newline at end of file
diff --git a/docs/tutorial/GeoSpark-Runnable-DEMO.md
b/docs/tutorial/GeoSpark-Runnable-DEMO.md
index 76e0902..4c81619 100644
--- a/docs/tutorial/GeoSpark-Runnable-DEMO.md
+++ b/docs/tutorial/GeoSpark-Runnable-DEMO.md
@@ -1,36 +1,30 @@
-## Python Jupyter Notebook Examples
-
-[Sedona core](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb)
-
-[Sedona SQL](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb)
-
-## Scala and Java Examples
+# Scala and Java Examples
[Scala and Java Examples](https://github.com/apache/incubator-sedona/tree/master/examples) contains template projects for RDD, SQL and Viz. The template projects have been configured properly.
Note that, although the template projects are written in Scala, the same APIs can be used in Java as well.
-### Folder structure
+## Folder structure
The folder structure of this repository is as follows.
* rdd-colocation-mining: a Scala template showing how to use the Sedona RDD API in spatial data mining
* sql: a Scala template showing how to use the Sedona DataFrame and SQL API
* viz: a Scala template showing how to use the Sedona Viz RDD and SQL API
-### Compile and package
+## Compile and package
-#### Prerequisites
+### Prerequisites
Please make sure you have the following software installed on your local machine:
* For Scala: Scala 2.12, SBT
* For Java: JDK 1.8, Apache Maven 3
-#### Compile
+### Compile
Run the terminal command `sbt assembly` within the folder of each template.
-#### Submit your fat jar to Spark
+### Submit your fat jar to Spark
After running the command mentioned above, you are able to see a fat jar in the `./target` folder. Please take it and use `./bin/spark-submit` to submit this jar.
To run the jar in this way, you need to:
@@ -41,8 +35,8 @@ To run the jar in this way, you need to:
* Make sure the dependency versions in build.sbt are consistent with your Spark version.
-### Run template projects locally
+## Run template projects locally
We highly suggest you use IDEs to run template projects on your local machine. For Scala, we recommend IntelliJ IDEA with the Scala plug-in. For Java, we recommend IntelliJ IDEA and Eclipse. With the help of IDEs, **you don't have to prepare anything** (you don't even need to download and set up Spark!). As long as you have Scala and Java, everything works properly!
-#### Scala
+### Scala
Import the Scala template project as an SBT project. Then run the Main file in this project.
\ No newline at end of file
diff --git a/docs/tutorial/geospark-core-python.md
b/docs/tutorial/geospark-core-python.md
index 53f1d3d..2f2869b 100644
--- a/docs/tutorial/geospark-core-python.md
+++ b/docs/tutorial/geospark-core-python.md
@@ -33,7 +33,7 @@ GeoData has one method to get user data.
<li> getUserData() -> str </li>
!!!note
-    This tutorial is based on [Sedona Core Jupyter Notebook example](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb)
+    This tutorial is based on [Sedona Core Jupyter Notebook example](../jupyter-notebook)
## Installation
diff --git a/docs/tutorial/geospark-sql-python.md
b/docs/tutorial/geospark-sql-python.md
index 8520e29..37d0f75 100644
--- a/docs/tutorial/geospark-sql-python.md
+++ b/docs/tutorial/geospark-sql-python.md
@@ -14,7 +14,7 @@ spark.sql("YOUR_SQL")
```
!!!note
-    This tutorial is based on [Sedona SQL Jupyter Notebook example](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb)
+    This tutorial is based on [Sedona SQL Jupyter Notebook example](../jupyter-notebook)
## Installation
diff --git a/docs/tutorial/jupyter-notebook.md
b/docs/tutorial/jupyter-notebook.md
new file mode 100644
index 0000000..c69b739
--- /dev/null
+++ b/docs/tutorial/jupyter-notebook.md
@@ -0,0 +1,32 @@
+# Python Jupyter Notebook Examples
+
+Sedona Python provides two Jupyter Notebook examples: [Sedona core](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb) and [Sedona SQL](https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb)
+
+
+Please use the following steps to run a Jupyter notebook with Pipenv:
+
+1. Clone Sedona GitHub repo or download the source code
+2. Install Sedona Python from PyPi or GitHub source: read [Install Sedona Python](/download/overview/#install-sedona) to learn how.
+3. Prepare the python-adapter jar: read [Install Sedona Python](/download/overview/#prepare-python-adapter-jar) to learn how.
+4. Set up the pipenv Python version. For Spark 3.0, Sedona supports Python 3.7 - 3.9
+```bash
+cd python
+pipenv --python 3.8
+```
+5. Install dependencies
+```bash
+cd python
+pipenv install
+```
+6. Install jupyter notebook kernel for pipenv
+```bash
+pipenv install ipykernel
+pipenv shell
+```
+7. In the pipenv shell, do
+```bash
+python -m ipykernel install --user --name=my-virtualenv-name
+```
+8. Set up the environment variables `SPARK_HOME` and `PYTHONPATH` if you didn't do it before. Read [Install Sedona Python](/download/overview/#setup-environment-variables) to learn how.
+9. Launch jupyter notebook: `jupyter notebook`
+10. Select the Sedona notebook. In your notebook, go to Kernel -> Change Kernel. Your kernel should now be an option.
\ No newline at end of file
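The numbered steps in the new tutorial above can also be scripted. A sketch that expresses steps 4-9 as argv lists (to be executed in order with `subprocess.run`, checking each return code); the function and kernel names are illustrative, and `pipenv run` stands in for the interactive `pipenv shell` step:

```python
# Hypothetical sketch: the pipenv-based notebook setup as a list of commands.
# Run each with subprocess.run(cmd, check=True) from the `python` folder.
def pipenv_notebook_commands(python_version="3.8", kernel_name="apache-sedona"):
    return [
        ["pipenv", "--python", python_version],            # step 4: pin Python
        ["pipenv", "install"],                             # step 5: dependencies
        ["pipenv", "install", "ipykernel"],                # step 6: kernel package
        ["pipenv", "run", "python", "-m", "ipykernel",     # step 7: register kernel
         "install", "--user", "--name=" + kernel_name],
        ["pipenv", "run", "jupyter", "notebook"],          # step 9: launch
    ]
```

Building the argv lists up front keeps the version and kernel-name choices in one place, and avoids shell quoting issues.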
diff --git a/docs/tutorial/rdd.md b/docs/tutorial/rdd.md
index cbc2dc8..8ddb878 100644
--- a/docs/tutorial/rdd.md
+++ b/docs/tutorial/rdd.md
@@ -8,7 +8,7 @@ The page outlines the steps to create Spatial RDDs and run spatial queries using
3. Add the dependencies in build.sbt or pom.xml.
!!!note
-    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [RDD example project](https://github.com/apache/incubator-sedona/tree/master/examples/rdd-colocation-mining)
+    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [RDD example project](/tutorial/GeoSpark-Runnable-DEMO/)
## Initiate SparkContext
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index 0939ef2..035d12f 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -14,7 +14,7 @@ Detailed SedonaSQL APIs are available here: [SedonaSQL API](../api/sql/GeoSparkS
3. Add the dependencies in build.sbt or pom.xml.
!!!note
-    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [SQL example project](https://github.com/apache/incubator-sedona/tree/master/examples/sql)
+    To enjoy the full functions of Sedona, we suggest you include ==the full dependencies==: [Apache Spark core](https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11), [Apache SparkSQL](https://mvnrepository.com/artifact/org.apache.spark/spark-sql), Sedona-core, Sedona-SQL, Sedona-Viz. Please see [SQL example project](/tutorial/GeoSpark-Runnable-DEMO/)
## Initiate SparkSession
diff --git a/mkdocs.yml b/mkdocs.yml
index dd1d478..3b9e2a8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -22,7 +22,9 @@ nav:
- Map visualization SQL app:
- Scala/Java: tutorial/viz.md
- Use Apache Zeppelin: tutorial/zeppelin.md
- - Examples: tutorial/GeoSpark-Runnable-DEMO.md
+ - Examples:
+ - Scala/Java: tutorial/GeoSpark-Runnable-DEMO.md
+ - Python: tutorial/jupyter-notebook.md
- Performance tuning:
- Benchmark: tutorial/benchmark.md
- Tune RDD application: tutorial/Advanced-Tutorial-Tune-your-GeoSpark-Application.md
diff --git a/python-adapter/.gitignore b/python-adapter/.gitignore
new file mode 100644
index 0000000..b83d222
--- /dev/null
+++ b/python-adapter/.gitignore
@@ -0,0 +1 @@
+/target/