roadan commented on a change in pull request #58: documentation for pyspark sdk
URL: https://github.com/apache/incubator-amaterasu/pull/58#discussion_r297495190
 
 

 ##########
 File path: docs/docs/frameworks.md
 ##########
 @@ -41,13 +41,145 @@ Amaterasu supports different processing frameworks to be executed. Amaterasu fra
 
 # Amaterasu Frameworks
 
+## Python 
+Apache Amaterasu supports the following types of Python workloads:
+
+1. PySpark workload ([See below](#pyspark))
+
+2. Pandas workload 
+
+3. Generic Python workload
+
+Each workload type has a dedicated Apache Amaterasu SDK.
+The Apache Amaterasu SDK is available on PyPI and can be installed as follows:
+```bash
+pip install apache-amaterasu
+```
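+You can verify the installation afterwards, e.g.:
+```bash
+pip show apache-amaterasu
+```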
+
+Alternatively, you can download the SDK source distribution and install it manually, either via ```easy_install``` or by executing the setup script:
+
+```bash
+wget <link to source distribution>
+tar -xzf apache-amaterasu-0.2.1-incubating.tar.gz
+cd apache-amaterasu-0.2.1-incubating
+python setup.py install
+```
+
+### Action dependencies
+Apache Amaterasu can ensure that Python dependencies are present on all execution nodes when executing action sources.
+
+To define the required dependencies, add a ```requirements.txt``` file to the job repository.
+Currently, only a global ```requirements.txt``` is supported.
+
+The layout below shows where the requirements file has to be placed:
+```
+repo
++-- deps/
+|   +-- requirements.txt <-- This is the place for defining dependencies
++-- env/
+|   +-- dev/
+|   |   +-- job.yaml
+|   |   +-- spark.yaml
+|   +-- test/
+|   |   +-- job.yaml
+|   |   +-- spark.yaml
+|   +-- prod/
+|       +-- job.yaml
+|       +-- spark.yaml
++-- src/
+|   +-- start/
+|       +-- dev/
+|       |   +-- job.yaml
+|       |   +-- spark.yaml
+|       +-- test/
+|       |   +-- job.yaml
+|       |   +-- spark.yaml
+|       +-- prod/
+|           +-- job.yaml
+|           +-- spark.yaml
++-- maki.yaml
+
+```
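+For example, a job whose actions depend on third-party libraries might declare them as follows (the packages listed here are purely illustrative):
+```
+pandas==0.24.2
+requests>=2.21.0
+```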
+
+When a ```requirements.txt``` file exists, Apache Amaterasu distributes it to the execution containers and installs the dependencies locally in each container.
+
+> **Important** - Your execution nodes need egress connectivity in order for ```pip``` to download the dependencies.
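+
+Once installed, action sources can import the declared dependencies as usual. A minimal sketch, assuming ```pandas``` was declared in the ```requirements.txt``` above (the file name ```process.py``` is hypothetical):
+```python
+# Hypothetical action source, e.g. src/process.py.
+# pandas is importable because it was declared in deps/requirements.txt
+# and installed by Amaterasu inside the execution container.
+import pandas as pd
+
+df = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
+print(df.describe())
+```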
+
+### Pandas
+### Generic Python
+
+
+## Java and JVM programs
+
 ## Apache Spark
 
 ### Spark Configuration
 
 ### Scala
 ### PySpark
+Apache Amaterasu can deploy PySpark applications and provides configuration and integration capabilities for them.
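+
+For illustration only, a PySpark action might look roughly like the sketch below; the ```ama_context``` entry point and its module path are assumptions about the SDK, not confirmed API:
+```python
+# Hypothetical PySpark action sketch; the module path and the
+# ama_context API are assumptions used for illustration only.
+from amaterasu_pyspark.runtime import ama_context
+
+# ama_context is assumed to expose the underlying SparkContext as `sc`
+rdd = ama_context.sc.parallelize(range(1, 1000))
+odd = rdd.filter(lambda n: n % 2 != 0)
+
+# Persist the result so that downstream actions can consume it
+ama_context.persist("odd", odd)
+```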
 
 Review comment:
   This section should be made generic about Spark in general and moved up
