Repository: incubator-zeppelin
Updated Branches:
  refs/heads/master 79a92c789 -> c2cbafd1d
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/ignite.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/ignite.md b/docs/interpreter/ignite.md
new file mode 100644
index 0000000..02fc587
--- /dev/null
+++ b/docs/interpreter/ignite.md
@@ -0,0 +1,116 @@
+---
+layout: page
+title: "Ignite Interpreter"
+description: "Ignite user guide"
+group: manual
+---
+{% include JB/setup %}
+
+## Ignite Interpreter for Apache Zeppelin
+
+### Overview
+[Apache Ignite](https://ignite.apache.org/) In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than is possible with traditional disk-based or flash technologies.
+
+You can use Zeppelin to retrieve distributed data from the cache using the Ignite SQL interpreter. Moreover, the Ignite interpreter lets you execute any Scala code in cases where SQL doesn't fit your requirements. For example, you can populate data into your caches or execute distributed computations.
+
+### Installing and Running Ignite Examples
+To use the Ignite interpreters, install Apache Ignite in a few simple steps:
+
+1. Download the Ignite [source release](https://ignite.apache.org/download.html#sources) or [binary release](https://ignite.apache.org/download.html#binaries). The Ignite version you download must match the Ignite version Zeppelin is built with; otherwise you can't run Scala code in Zeppelin. You can find Zeppelin's Ignite version in `path/to/your-Zeppelin/ignite/pom.xml` (in the Zeppelin source release): check the `ignite.version` property.<br>Currently, Zeppelin provides Ignite only in the Zeppelin source release. If you download the Zeppelin binary release (`zeppelin-0.5.0-incubating-bin-spark-xxx-hadoop-xx`), you cannot use the Ignite interpreter. We are planning to include Ignite in a future binary release.
+
+2. Examples are shipped as a separate Maven project, so to start running them you simply need to import the provided `<dest_dir>/apache-ignite-fabric-1.2.0-incubating-bin/pom.xml` file into your favourite IDE, such as Eclipse.
+
+   * In Eclipse: File -> Import -> Existing Maven Projects.
+   * Set the examples directory path and select the pom.xml.
+   * Then start `org.apache.ignite.examples.ExampleNodeStartup` (or any other example) to run one or more Ignite nodes. Each time you run example code, you may notice the number of nodes increase by one.
+
+   > **Tip: If you want to run the Ignite examples from the command line instead of an IDE, export an executable JAR file from the IDE and run it with the command below.**
+
+   ```
+   $ nohup java -jar </path/to/your Jar file name> &
+   ```
+
+### Configuring Ignite Interpreter
+In the "Interpreters" menu, you can edit the Ignite interpreter or create a new one. Zeppelin provides the following properties for Ignite.
+
+<table class="table-configuration">
+  <tr>
+    <th>Property Name</th>
+    <th>Default Value</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>ignite.addresses</td>
+    <td>127.0.0.1:47500..47509</td>
+    <td>Comma-separated list of Ignite cluster hosts. See the <a href="https://apacheignite.readme.io/v1.2/docs/cluster-config">Ignite Cluster Configuration</a> section for more details.</td>
+  </tr>
+  <tr>
+    <td>ignite.clientMode</td>
+    <td>true</td>
+    <td>You can connect to the Ignite cluster as a client or server node. See the <a href="https://apacheignite.readme.io/v1.2/docs/clients-vs-servers">Ignite Clients vs. Servers</a> section for details. Use true or false to connect in client or server mode, respectively.</td>
+  </tr>
+  <tr>
+    <td>ignite.config.url</td>
+    <td></td>
+    <td>Configuration URL. Overrides all other settings.</td>
+  </tr>
+  <tr>
+    <td>ignite.jdbc.url</td>
+    <td>jdbc:ignite:cfg://default-ignite-jdbc.xml</td>
+    <td>Ignite JDBC connection URL.</td>
+  </tr>
+  <tr>
+    <td>ignite.peerClassLoadingEnabled</td>
+    <td>true</td>
+    <td>Enables peer class loading. See the <a href="https://apacheignite.readme.io/v1.2/docs/zero-deployment">Zero Deployment</a> section for details. Use true or false to enable or disable P2P class loading, respectively.</td>
+  </tr>
+</table>
+
+### Interpreter Binding for Zeppelin Notebook
+After configuring the Ignite interpreter, create your own notebook. Then you can bind interpreters as shown in the image below.
+
+For more information on interpreter binding, see [here](http://zeppelin.incubator.apache.org/docs/manual/interpreters.html).
+
+### How to use Ignite SQL interpreter
+To execute an SQL query, use the `%ignite.ignitesql` prefix. <br>
+Suppose you are running `org.apache.ignite.examples.streaming.wordcount.StreamWords`; then you can use the "words" cache (you have to specify this cache name in the `ignite.jdbc.url` setting of the Ignite interpreter in Zeppelin).
+For example, you can select the top 10 words in the words cache using the following query:
+
+```sql
+%ignite.ignitesql
+select _val, count(_val) as cnt from String group by _val order by cnt desc limit 10
+```
+
+As long as your Ignite version and Zeppelin's Ignite version are the same, you can also use Scala code. Please check Zeppelin's Ignite version before you download Ignite.
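For readers without a running cluster, here is a rough plain-Python sketch of what the top-10 SQL query computes, plus the per-word-count statistics (average, minimum, maximum) that the Scala example asks for. It does not use Ignite at all; the sample words and the helper names `top_words` and `count_stats` are illustrative assumptions, not part of Ignite or Zeppelin.

```python
from collections import Counter

# Plain-Python stand-in for the Ignite "words" cache: one entry per streamed word.
# The sample data is invented for illustration.
sample_words = "the quick brown fox jumps over the lazy dog the fox".split()


def top_words(words, limit=10):
    """Roughly what the %ignite.ignitesql query computes:
    select _val, count(_val) as cnt from String group by _val order by cnt desc limit N
    Ties come back in first-seen order here; SQL leaves tie order unspecified.
    """
    return Counter(words).most_common(limit)


def count_stats(words):
    """Roughly what the nested query in the Scala paragraph computes:
    select avg(cnt), min(cnt), max(cnt) from (select count(_val) as cnt ... group by _val)
    Python returns a float average; the SQL engine's result type may differ.
    """
    counts = list(Counter(words).values())
    return sum(counts) / len(counts), min(counts), max(counts)


print(top_words(sample_words, 3))  # [('the', 3), ('fox', 2), ('quick', 1)]
print(count_stats(sample_words))   # (1.375, 1, 3)
```

Against a real cluster, the `%ignite.ignitesql` and `%ignite` paragraphs compute these values over the distributed `words` cache instead of a local list.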
+
+```scala
+%ignite
+import org.apache.ignite._
+import org.apache.ignite.cache.affinity._
+import org.apache.ignite.cache.query._
+import org.apache.ignite.configuration._
+
+import scala.collection.JavaConversions._
+
+val cache: IgniteCache[AffinityUuid, String] = ignite.cache("words")
+
+val qry = new SqlFieldsQuery("select avg(cnt), min(cnt), max(cnt) from (select count(_val) as cnt from String group by _val)", true)
+
+val res = cache.query(qry).getAll()
+
+collectionAsScalaIterable(res).foreach(println _)
+```
+
+Apache Ignite also provides a guide for using it with Zeppelin: ["Ignite with Apache Zeppelin"](https://apacheignite.readme.io/docs/data-analysis-with-apache-zeppelin).
+
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/lens.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/lens.md b/docs/interpreter/lens.md
new file mode 100644
index 0000000..903df7e
--- /dev/null
+++ b/docs/interpreter/lens.md
@@ -0,0 +1,173 @@
+---
+layout: page
+title: "Lens Interpreter"
+description: "Lens user guide"
+group: manual
+---
+{% include JB/setup %}
+
+## Lens Interpreter for Apache Zeppelin
+
+### Overview
+[Apache Lens](https://lens.apache.org/) provides a unified analytics interface. Lens aims to break down data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for analytical queries. It seamlessly integrates Hadoop with traditional data warehouses so that they appear as one.
+
+### Installing and Running Lens
+To use the Lens interpreter, install Apache Lens in a few simple steps:
+
+1. Download the latest version of Lens from [the ASF](http://www.apache.org/dyn/closer.lua/lens/2.3-beta). Older releases can be found [in the Archives](http://archive.apache.org/dist/lens/).
+2. Before running Lens, you have to set HIVE_HOME and HADOOP_HOME. For more information, refer to the [Lens installation guide](http://lens.apache.org/lenshome/install-and-run.html#Installation). Lens also provides a pseudo-distributed mode: the [Lens pseudo-distributed setup](http://lens.apache.org/lenshome/pseudo-distributed-setup.html) is done using [docker](https://www.docker.com/), and in it the Hive server and Hadoop daemons run as separate processes.
+3. Now you can start (or stop) the Lens server.
+
+   ```
+   ./bin/lens-ctl start   # start the Lens server
+   ./bin/lens-ctl stop    # stop the Lens server
+   ```
+
+### Configuring Lens Interpreter
+In the "Interpreters" menu, you can edit the Lens interpreter or create a new one. Zeppelin provides the following properties for Lens.
+
+<table class="table-configuration">
+  <tr>
+    <th>Property Name</th>
+    <th>Default Value</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>lens.client.dbname</td>
+    <td>default</td>
+    <td>The database schema name</td>
+  </tr>
+  <tr>
+    <td>lens.query.enable.persistent.resultset</td>
+    <td>false</td>
+    <td>Whether to enable a persistent result set for queries. When enabled, the server fetches results from the driver, custom-formats them if configured, and stores them in a configured location. The query output file is named after the query handle ID, with the configured extensions.</td>
+  </tr>
+  <tr>
+    <td>lens.server.base.url</td>
+    <td>http://hostname:port/lensapi</td>
+    <td>The base URL of the Lens server. Replace "hostname" and "port" with the values you use (e.g. http://0.0.0.0:9999/lensapi).</td>
+  </tr>
+  <tr>
+    <td>lens.session.cluster.user</td>
+    <td>default</td>
+    <td>Hadoop cluster username</td>
+  </tr>
+  <tr>
+    <td>zeppelin.lens.maxResult</td>
+    <td>1000</td>
+    <td>Maximum number of rows to display</td>
+  </tr>
+  <tr>
+    <td>zeppelin.lens.maxThreads</td>
+    <td>10</td>
+    <td>Number of threads to use when zeppelin.lens.run.concurrent is true</td>
+  </tr>
+  <tr>
+    <td>zeppelin.lens.run.concurrent</td>
+    <td>true</td>
+    <td>Run concurrent Lens sessions</td>
+  </tr>
+  <tr>
+    <td>xxx</td>
+    <td>yyy</td>
+    <td>Anything else from <a href="https://lens.apache.org/admin/config-server.html">Configuring lens server</a></td>
+  </tr>
+</table>
+
+### Interpreter Binding for Zeppelin Notebook
+After configuring the Lens interpreter, create your own notebook. Then you can bind interpreters as shown in the image below.
+
+For more information on interpreter binding, see [here](http://zeppelin.incubator.apache.org/docs/manual/interpreters.html).
+
+### How to use
+You can analyze your data using [OLAP Cube](http://lens.apache.org/user/olap-cube.html) [QL](http://lens.apache.org/user/cli.html), a high-level SQL-like language for querying and describing data sets organized in data cubes.
+You can see OLAP Cube QL in action in this [video tutorial](https://cwiki.apache.org/confluence/display/LENS/2015/07/13/20+Minute+video+demo+of+Apache+Lens+through+examples).
+The video uses the Lens client shell (`./bin/lens-cli.sh`). All of these functions can also be used in Zeppelin through the Lens interpreter.
+
+* Create and use (switch) databases.
+
+  ```
+  create database newDb
+  ```
+
+  ```
+  use newDb
+  ```
+
+* Create storage.
+
+  ```
+  create storage your/path/to/lens/client/examples/resources/db-storage.xml
+  ```
+
+* Create dimensions, and show their fields and join chains.
+
+  ```
+  create dimension your/path/to/lens/client/examples/resources/customer.xml
+  ```
+
+  ```
+  dimension show fields customer
+  ```
+
+  ```
+  dimension show joinchains customer
+  ```
+
+* Create cubes, and show their fields and join chains.
+
+  ```
+  create cube your/path/to/lens/client/examples/resources/sales-cube.xml
+  ```
+
+  ```
+  cube show fields sales
+  ```
+
+  ```
+  cube show joinchains sales
+  ```
+
+* Create dimtables and facts.
+
+  ```
+  create dimtable your/path/to/lens/client/examples/resources/customer_table.xml
+  ```
+
+  ```
+  create fact your/path/to/lens/client/examples/resources/sales-raw-fact.xml
+  ```
+
+* Add partitions to dimtables and facts.
+
+  ```
+  dimtable add single-partition --dimtable_name customer_table --storage_name local --path your/path/to/lens/client/examples/resources/customer-local-part.xml
+  ```
+
+  ```
+  fact add partitions --fact_name sales_raw_fact --storage_name local --path your/path/to/lens/client/examples/resources/sales-raw-local-parts.xml
+  ```
+
+* Now you can run queries on cubes.
+
+  ```
+  query execute cube select customer_city_name, product_details.description, product_details.category, product_details.color, store_sales from sales where time_range_in(delivery_time, '2015-04-11-00', '2015-04-13-00')
+  ```
+
+These are just examples provided by Lens. To explore the full Lens tutorials, see the [tutorial video](https://cwiki.apache.org/confluence/display/LENS/2015/07/13/20+Minute+video+demo+of+Apache+Lens+through+examples).
+
+### Lens UI Service
+Lens also provides a web UI service. Once the server starts up, you can open it at http://serverhost:19999/index.html and browse. There you can also check the structures you created and run queries easily.
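The example cube query filters with `time_range_in(delivery_time, '2015-04-11-00', '2015-04-13-00')`, whose bounds appear to be hourly `yyyy-MM-dd-HH` timestamps. As a rough local sketch, assuming the start is inclusive and the end is exclusive (confirm this against the Lens documentation for your version), the hours covered by such a range can be enumerated in Python; `hourly_buckets` is a hypothetical helper, not a Lens API.

```python
from datetime import datetime, timedelta

# Assumed Lens hourly timestamp format, e.g. '2015-04-11-00' = 2015-04-11 00:00.
LENS_HOURLY_FORMAT = "%Y-%m-%d-%H"


def hourly_buckets(start, end):
    """List the hours covered by time_range_in(dim, start, end).

    Assumption: start is inclusive and end is exclusive.
    """
    current = datetime.strptime(start, LENS_HOURLY_FORMAT)
    stop = datetime.strptime(end, LENS_HOURLY_FORMAT)
    buckets = []
    while current < stop:
        buckets.append(current.strftime(LENS_HOURLY_FORMAT))
        current += timedelta(hours=1)
    return buckets


buckets = hourly_buckets("2015-04-11-00", "2015-04-13-00")
print(len(buckets), buckets[0], buckets[-1])  # 48 2015-04-11-00 2015-04-12-23
```

Under that assumption, the example query covers two full days (48 hourly partitions) of `delivery_time`.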
+
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/postgresql.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/postgresql.md b/docs/interpreter/postgresql.md
new file mode 100644
index 0000000..9753cdc
--- /dev/null
+++ b/docs/interpreter/postgresql.md
@@ -0,0 +1,180 @@
+---
+layout: page
+title: "PostgreSQL and HAWQ Interpreter"
+description: ""
+group: manual
+---
+{% include JB/setup %}
+
+
+## PostgreSQL, HAWQ Interpreter for Apache Zeppelin
+
+<br/>
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Class</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>%psql.sql</td>
+    <td>PostgreSqlInterpreter</td>
+    <td>Provides an SQL environment for PostgreSQL, HAWQ and Greenplum</td>
+  </tr>
+</table>
+
+<br/>
+[<img align="right" src="http://img.youtube.com/vi/wqXXQhJ5Uk8/0.jpg" alt="zeppelin-view" hspace="10" width="250"></img>](https://www.youtube.com/watch?v=wqXXQhJ5Uk8)
+
+This interpreter seamlessly supports the following SQL data processing engines:
+
+* [PostgreSQL](http://www.postgresql.org/) - OSS object-relational database management system (ORDBMS)
+* [Apache HAWQ](http://pivotal.io/big-data/pivotal-hawq) - Powerful [open source](https://wiki.apache.org/incubator/HAWQProposal) SQL-on-Hadoop engine.
+* [Greenplum](http://pivotal.io/big-data/pivotal-greenplum-database) - MPP database built on open source PostgreSQL.
+
+
+This [video tutorial](https://www.youtube.com/watch?v=wqXXQhJ5Uk8) illustrates some of the features provided by the `PostgreSQL Interpreter`.
+
+### Create Interpreter
+
+By default Zeppelin creates one `PSQL` instance. You can remove it or create new instances.
+
+Multiple PSQL instances can be created, each configured to the same or different backend databases. However, at any given time a `Notebook` can have only one PSQL interpreter instance bound. That means you _cannot_ connect to different databases in the same `Notebook`. This is a known Zeppelin limitation.
+
+To create a new PSQL instance, open the `Interpreter` section and click the `+Create` button. Pick a `Name` of your choice and select `psql` from the `Interpreter` drop-down. Then follow the configuration instructions and `Save` the new instance.
+
+> Note: The `Name` of the instance is used only to distinguish the instances while binding them to the `Notebook`. The `Name` is irrelevant inside the `Notebook`; there you must use the `%psql.sql` tag.
+
+### Bind to Notebook
+In the `Notebook`, click the `settings` icon in the top right corner. Then select/deselect the interpreters to be bound to the `Notebook`.
+
+### Configuration
+You can modify the PSQL configuration from the `Interpreter` section. The PSQL interpreter exposes the following properties:
+
+
+<table class="table-configuration">
+  <tr>
+    <th>Property Name</th>
+    <th>Description</th>
+    <th>Default Value</th>
+  </tr>
+  <tr>
+    <td>postgresql.url</td>
+    <td>JDBC URL to connect to</td>
+    <td>jdbc:postgresql://localhost:5432</td>
+  </tr>
+  <tr>
+    <td>postgresql.user</td>
+    <td>JDBC user name</td>
+    <td>gpadmin</td>
+  </tr>
+  <tr>
+    <td>postgresql.password</td>
+    <td>JDBC password</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>postgresql.driver.name</td>
+    <td>JDBC driver name. In this version the driver name is fixed and should not be changed</td>
+    <td>org.postgresql.Driver</td>
+  </tr>
+  <tr>
+    <td>postgresql.max.result</td>
+    <td>Maximum number of SQL result rows to display, to prevent overloading the browser</td>
+    <td>1000</td>
+  </tr>
+</table>
+
+
+### How to use
+
+> Tip: Use `Ctrl + .` for SQL auto-completion.
+
+#### DDL and SQL commands
+
+Start the paragraphs with the full `%psql.sql` prefix tag! The short notation `%psql` still runs the queries, but syntax highlighting and auto-completion are disabled.
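The `postgresql.max.result` property (default 1000) limits how many result rows are sent to the browser. The sketch below only illustrates that capping idea: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, `run_capped` is a hypothetical helper rather than part of the interpreter, and the interpreter's actual mechanism may differ.

```python
import sqlite3


def run_capped(conn, sql, max_result=1000):
    """Run a query and return (rows, truncated), keeping at most max_result rows.

    Hypothetical helper: it fetches one extra row only to detect truncation.
    """
    cursor = conn.execute(sql)
    rows = cursor.fetchmany(max_result + 1)
    return rows[:max_result], len(rows) > max_result


# sqlite3 stands in for PostgreSQL; rows are inserted with executemany
# instead of generate_series.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (i int)")
conn.executemany("insert into mytable values (?)", [(i,) for i in range(1, 101)])

rows, truncated = run_capped(conn, "select i from mytable order by i", max_result=10)
print(len(rows), truncated)  # 10 True
```

Fetching one row past the limit is a cheap way to tell the user that the displayed result is incomplete without reading the whole result set.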
+
+You can use the standard CREATE / DROP / INSERT commands to create or modify the data model:
+
+```sql
+%psql.sql
+drop table if exists mytable;
+create table mytable (i int);
+insert into mytable select generate_series(1, 100);
+```
+
+Then run the query in a separate paragraph:
+
+```sql
+%psql.sql
+select * from mytable;
+```
+
+> Note: You can have multiple queries in the same paragraph, but only the result of the first one is displayed. [[1](https://issues.apache.org/jira/browse/ZEPPELIN-178)], [[2](https://issues.apache.org/jira/browse/ZEPPELIN-212)].
+
+For example, the following executes both queries, but only the count result is displayed. If you reverse the order of the queries, the `mytable` content is shown instead.
+
+```sql
+%psql.sql
+select count(*) from mytable;
+select * from mytable;
+```
+
+#### PSQL command line tools
+
+Use the Shell Interpreter (`%sh`) to access the command line [PSQL](http://www.postgresql.org/docs/9.4/static/app-psql.html) interactively:
+
+```bash
+%sh
+psql -h phd3.localdomain -U gpadmin -p 5432 <<EOF
+  \dn
+  \q
+EOF
+```
+This will produce output like this:
+
+```
+        Name        |  Owner
+--------------------+---------
+ hawq_toolkit       | gpadmin
+ information_schema | gpadmin
+ madlib             | gpadmin
+ pg_catalog         | gpadmin
+ pg_toast           | gpadmin
+ public             | gpadmin
+ retail_demo        | gpadmin
+```
+
+#### Apply Zeppelin Dynamic Forms
+
+You can leverage [Zeppelin Dynamic Form](https://zeppelin.incubator.apache.org/docs/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parametrization features:
+
+```sql
+%psql.sql
+SELECT ${group_by}, count(*) as count
+FROM retail_demo.order_lineitems_pxf
+GROUP BY ${group_by=product_id,product_id|product_name|customer_id|store_id}
+ORDER BY count ${order=DESC,DESC|ASC}
+LIMIT ${limit=10};
+```
+#### Example HAWQ PXF/HDFS Tables
+
+Create a HAWQ external table that reads tab-separated-value data from HDFS:
+
+```sql
+%psql.sql
+CREATE EXTERNAL TABLE retail_demo.payment_methods_pxf (
+  payment_method_id smallint,
+  payment_method_code character varying(20)
+) LOCATION ('pxf://${NAME_NODE_HOST}:50070/retail_demo/payment_methods.tsv.gz?profile=HdfsTextSimple') FORMAT 'TEXT' (DELIMITER = E'\t');
+```
+Then retrieve its content:
+
+```sql
+%psql.sql
+select * from retail_demo.payment_methods_pxf;
+```
+### Auto-completion
+The PSQL interpreter provides basic auto-completion. On `Ctrl+.` it lists the most relevant suggestions in a pop-up window. In addition to SQL keywords, the interpreter also suggests schema, table and column names.
+
+
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/spark.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/spark.md b/docs/interpreter/spark.md
new file mode 100644
index 0000000..58fce0b
--- /dev/null
+++ b/docs/interpreter/spark.md
@@ -0,0 +1,221 @@
+---
+layout: page
+title: "Spark Interpreter Group"
+description: ""
+group: manual
+---
+{% include JB/setup %}
+
+
+## Spark
+
+[Apache Spark](http://spark.apache.org) is supported in Zeppelin with the
+Spark interpreter group, which consists of four interpreters.
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Class</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>%spark</td>
+    <td>SparkInterpreter</td>
+    <td>Creates SparkContext and provides a Scala environment</td>
+  </tr>
+  <tr>
+    <td>%pyspark</td>
+    <td>PySparkInterpreter</td>
+    <td>Provides a Python environment</td>
+  </tr>
+  <tr>
+    <td>%sql</td>
+    <td>SparkSQLInterpreter</td>
+    <td>Provides an SQL environment</td>
+  </tr>
+  <tr>
+    <td>%dep</td>
+    <td>DepInterpreter</td>
+    <td>Dependency loader</td>
+  </tr>
+</table>
+
+
+<br />
+
+
+### SparkContext, SQLContext, ZeppelinContext
+
+SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as the variables `sc`, `sqlContext` and `z`, respectively, in both the Scala and Python environments.
+
+Note that the Scala and Python environments share the same SparkContext, SQLContext and ZeppelinContext instances.
+
+
+<a name="dependencyloading"> </a>
+<br />
+<br />
+### Dependency Management
+There are two ways to load external libraries in the Spark interpreter. The first is Zeppelin's `%dep` interpreter; the second is loading Spark properties.
+
+#### 1. Dynamic Dependency Loading via %dep interpreter
+
+When your code requires an external library, instead of downloading it, copying it and restarting Zeppelin, you can do the following with the `%dep` interpreter:
+
+ * Load libraries recursively from a Maven repository
+ * Load libraries from the local filesystem
+ * Add additional Maven repositories
+ * Automatically add libraries to the Spark cluster (this can be turned off)
+
+The `%dep` interpreter leverages the Scala environment, so you can write any Scala code here.
+Note that `%dep` should be used before `%spark`, `%pyspark` and `%sql`.
+
+Here is how to use it:
+
+```scala
+%dep
+z.reset() // clean up previously added artifacts and repositories
+
+// add maven repository
+z.addRepo("RepoName").url("RepoURL")
+
+// add maven snapshot repository
+z.addRepo("RepoName").url("RepoURL").snapshot()
+
+// add credentials for private maven repository
+z.addRepo("RepoName").url("RepoURL").username("username").password("password")
+
+// add artifact from filesystem
+z.load("/path/to.jar")
+
+// add artifact from maven repository, with no dependency
+z.load("groupId:artifactId:version").excludeAll()
+
+// add artifact recursively
+z.load("groupId:artifactId:version")
+
+// add artifact recursively except comma separated GroupID:ArtifactId list
+z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")
+
+// exclude with pattern
+z.load("groupId:artifactId:version").exclude("*")
+z.load("groupId:artifactId:version").exclude("groupId:artifactId:*")
+z.load("groupId:artifactId:version").exclude("groupId:*")
+
+// local() skips adding artifact to spark clusters (skipping sc.addJar())
+z.load("groupId:artifactId:version").local()
+```
+
+
+<br />
+#### 2. Loading Spark Properties
+Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as the Spark interpreter runner. `spark-submit` supports two ways to load configuration. The first is command-line options such as `--master`; Zeppelin can pass these options to `spark-submit` if you export `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. The second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. The Spark properties that users can set to distribute libraries are:
+
+<table class="table-configuration">
+  <tr>
+    <th>spark-defaults.conf</th>
+    <th>SPARK_SUBMIT_OPTIONS</th>
+    <th>Applicable Interpreter</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>spark.jars</td>
+    <td>--jars</td>
+    <td>%spark</td>
+    <td>Comma-separated list of local jars to include on the driver and executor classpaths.</td>
+  </tr>
+  <tr>
+    <td>spark.jars.packages</td>
+    <td>--packages</td>
+    <td>%spark</td>
+    <td>Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. Spark searches the local Maven repo, then Maven Central and any additional remote repositories given by --repositories. The format for the coordinates is groupId:artifactId:version.</td>
+  </tr>
+  <tr>
+    <td>spark.files</td>
+    <td>--files</td>
+    <td>%pyspark</td>
+    <td>Comma-separated list of files to be placed in the working directory of each executor.</td>
+  </tr>
+</table>
+Note that adding jars to `%pyspark` is currently only possible via the `%dep` interpreter.
+
+<br/>
+Here are a few examples:
+
+##### 0.5.5 and later
+* SPARK\_SUBMIT\_OPTIONS in conf/zeppelin-env.sh
+
+        export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar --files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg"
+
+* SPARK_HOME/conf/spark-defaults.conf
+
+        spark.jars             /path/mylib1.jar,/path/mylib2.jar
+        spark.jars.packages    com.databricks:spark-csv_2.10:1.2.0
+        spark.files            /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
+
+##### 0.5.0
+* ZEPPELIN\_JAVA\_OPTS in conf/zeppelin-env.sh
+
+        export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/path/mylib1.jar,/path/mylib2.jar -Dspark.files=/path/myfile1.dat,/path/myfile2.dat"
+<br />
+
+
+<a name="zeppelincontext"> </a>
+<br />
+<br />
+### ZeppelinContext
+
+
+Zeppelin automatically injects ZeppelinContext as the variable `z` in your Scala/Python environment. ZeppelinContext provides some additional functions and utilities.
+
+<br />
+#### Object exchange
+
+ZeppelinContext extends a map, and it is shared between the Scala and Python environments.
+So you can put an object from Scala and read it from Python, and vice versa.
+
+Put an object from Scala:
+
+```scala
+%spark
+val myObject = ...
+z.put("objName", myObject)
+```
+
+Get the object from Python:
+
+```python
+%pyspark
+myObject = z.get("objName")
+```
+
+<br />
+#### Form creation
+
+ZeppelinContext provides functions for creating forms.
+In the Scala and Python environments, you can create forms programmatically.
+
+```scala
+%spark
+/* Create text input form */
+z.input("formName")
+
+/* Create text input form with default value */
+z.input("formName", "defaultValue")
+
+/* Create select form */
+z.select("formName", Seq(("option1", "option1DisplayName"),
+                         ("option2", "option2DisplayName")))
+
+/* Create select form with default value */
+z.select("formName", "option1", Seq(("option1", "option1DisplayName"),
+                                    ("option2", "option2DisplayName")))
+```
+
+In the SQL environment, you can create forms with a simple template.
+
+```
+%sql
+select * from ${table=defaultTableName} where text like '%${search}%'
+```
+
+To learn more about dynamic forms, check out [Dynamic Form](../manual/dynamicform.html).
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/manual/dynamicform.md
----------------------------------------------------------------------
diff --git a/docs/manual/dynamicform.md b/docs/manual/dynamicform.md
new file mode 100644
index 0000000..06074fd
--- /dev/null
+++ b/docs/manual/dynamicform.md
@@ -0,0 +1,78 @@
+---
+layout: page
+title: "Dynamic Form"
+description: ""
+group: manual
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+
+## Dynamic Form
+
+Zeppelin dynamically creates input forms. Depending on the language backend, there are two different ways to create dynamic forms.
+A custom language backend can choose which type of form creation it uses.
+
+<br />
+### Using Form Templates
+
+This mode creates forms using a simple template language. It's simple and easy to use. For example, the Markdown, Shell and SparkSQL language backends use it.
+
+<br />
+#### Text input form
+
+To create a text input form, use a _${formName}_ template.
+
+For example:
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_input.png" />
+
+
+You can also provide a default value using _${formName=defaultValue}_.
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_input_default.png" />
+
+
+<br />
+#### Select form
+
+To create a select form, use _${formName=defaultValue,option1|option2...}_.
+
+For example:
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_select.png" />
+
+You can also separate an option's display name and value using _${formName=defaultValue,option1(DisplayName)|option2(DisplayName)...}_.
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_select_displayname.png" />
+
+<br />
+### Creating Forms Programmatically
+
+Some language backends create forms programmatically. For example, [ZeppelinContext](../interpreter/spark.html#zeppelincontext) provides a form creation API.
+
+Here are some examples.
+
+Text input form
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_input_prog.png" />
+
+Text input form with default value
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_input_default_prog.png" />
+
+Select form
+
+<img src="../../assets/themes/zeppelin/img/screenshots/form_select_prog.png" />
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/manual/interpreters.md
----------------------------------------------------------------------
diff --git a/docs/manual/interpreters.md b/docs/manual/interpreters.md
new file mode 100644
index 0000000..ff5bff7
--- /dev/null
+++ b/docs/manual/interpreters.md
@@ -0,0 +1,64 @@
+---
+layout: page
+title: "Interpreters"
+description: ""
+group: manual
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+
+## Interpreters in Zeppelin
+
+This section explains the role of interpreters, interpreter groups and interpreter settings in Zeppelin.
+The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin.
+Currently Zeppelin supports many interpreters, such as Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown and Shell.
+
+### What is a Zeppelin interpreter?
+
+A Zeppelin interpreter is a plug-in that enables Zeppelin users to use a specific language or data-processing backend. For example, to use Scala code in Zeppelin, you need the `spark` interpreter.
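Paragraphs select their interpreter with a `%` prefix such as `%spark`, `%pyspark`, `%psql.sql` or `%ignite.ignitesql`. The following is a minimal, hypothetical Python sketch of splitting such a prefix into an interpreter (group) name, an optional sub-interpreter name and the paragraph body. It is not Zeppelin's actual implementation; the regular expression and the fallback to `spark` when no prefix is given are assumptions, the latter standing in for the notebook's default interpreter.

```python
import re

# Hypothetical pattern: "%group.name" or "%name" at the start of a paragraph.
PREFIX = re.compile(r"^%([\w-]+)(?:\.([\w-]+))?\s*")


def split_paragraph(text, default="spark"):
    """Split paragraph text into (interpreter, sub_interpreter, body).

    Paragraphs without a % prefix fall back to `default`, standing in for the
    notebook's default interpreter.
    """
    match = PREFIX.match(text)
    if match is None:
        return default, None, text
    return match.group(1), match.group(2), text[match.end():]


print(split_paragraph("%psql.sql\nselect * from mytable;"))
# ('psql', 'sql', 'select * from mytable;')
print(split_paragraph("%pyspark\nprint(sc.version)"))
# ('pyspark', None, 'print(sc.version)')
```

Resolving a bare name such as `%sql` to an interpreter inside a bound group (the Spark group's SQL interpreter in this case) is not modeled here.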
+ +When you click on the ```+Create``` button in the interpreter page the interpreter drop-down list box will present all the available interpreters on your server. + +<img src="../../assets/themes/zeppelin/img/screenshots/interpreter_create.png"> + +### What is zeppelin interpreter setting? + +Zeppelin interpreter setting is the configuration of a given interpreter on zeppelin server. For example, the properties requried for hive JDBC interpreter to connect to the Hive server. + +<img src="../../assets/themes/zeppelin/img/screenshots/interpreter_setting.png"> +### What is zeppelin interpreter group? + +Every Interpreter belongs to an InterpreterGroup. InterpreterGroup is a unit of start/stop interpreter. +By default, every interpreter belong to a single group but the group might contain more interpreters. For example, spark interpreter group include spark support, pySpark, +SparkSQL and the dependency loader. + +Technically, Zeppelin interpreters from the same group are running in the same JVM. + +Interpreters belong to a single group a registered together and all of their properties are listed in the interpreter setting. +<img src="../../assets/themes/zeppelin/img/screenshots/interpreter_setting_spark.png"> + +### Programming langages for interpreter + +If the interpreter uses a specific programming language (like Scala, Python, SQL), it is generally a good idea to add syntax highlighting support for that to the notebook paragraph editor. + +To check out the list of languages supported, see the mode-*.js files under zeppelin-web/bower_components/ace-builds/src-noconflict or from github https://github.com/ajaxorg/ace-builds/tree/master/src-noconflict + +To add a new set of syntax highlighting, +1. add the mode-*.js file to zeppelin-web/bower.json (when built, zeppelin-web/src/index.html will be changed automatically) +2. 
Add to the list of `editorMode` in zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js; it follows the pattern 'ace/mode/x', where x is the mode name +3. Add to the code that checks for the `%` prefix and calls `session.setMode(editorMode.x)` in `setParagraphMode` in zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js + + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/manual/notebookashomepage.md ---------------------------------------------------------------------- diff --git a/docs/manual/notebookashomepage.md b/docs/manual/notebookashomepage.md new file mode 100644 index 0000000..86f1ea9 --- /dev/null +++ b/docs/manual/notebookashomepage.md @@ -0,0 +1,109 @@ +--- +layout: page +title: "Notebook as Homepage" +description: "" +group: manual +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +{% include JB/setup %} + +## Customize your zeppelin homepage + Zeppelin allows you to use one of the notebooks you create as your Zeppelin homepage. + With that you can brand your Zeppelin installation, + adjust the instructions to your users' needs and even translate them to other languages. + + <br /> +### How to set a notebook as your zeppelin homepage + +The process for creating your homepage is very simple, as shown below: + + 1. Create a notebook using zeppelin + 2. Set the notebook id in the config file + 3.
Restart zeppelin + + <br /> +#### Create a notebook using zeppelin + Create a new notebook using Zeppelin; + you can use the ```%md``` interpreter for markdown content or any other interpreter you like. + + You can also use the display system to generate [text](../displaysystem/display.html), + [html](../displaysystem/display.html#html), [table](../displaysystem/table.html) or + [angular](../displaysystem/angular.html) + + Run (shift+Enter) the notebook and see the output. Optionally, change the notebook view to report to hide + the code sections. + + <br /> +#### Set the notebook id in the config file + To set the notebook id in the config file, copy it from the last segment of the notebook URL, + + for example: + + <img src="../../assets/themes/zeppelin/img/screenshots/homepage_notebook_id.png" /> + + Set the notebook id to the ```ZEPPELIN_NOTEBOOK_HOMESCREEN``` environment variable + or the ```zeppelin.notebook.homescreen``` property. + + You can also set the ```ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE``` environment variable + or the ```zeppelin.notebook.homescreen.hide``` property to hide the new notebook from the notebook list. + + <br /> +#### Restart zeppelin + Restart your Zeppelin server: + + ``` + ./bin/zeppelin-daemon.sh stop + ./bin/zeppelin-daemon.sh start + ``` + #### That's it! Open your browser, navigate to Zeppelin and see your customized homepage... + + +<br /> +### Show notebooks list in your custom homepage +If you want to display the list of notebooks on your custom Zeppelin homepage, all +you need to do is use our %angular support. + + <br /> + Add the following code to a paragraph in your home page and run it... voilà! You have your notebooks list.
+ + ```scala + println( + """%angular + <div class="col-md-4" ng-controller="HomeCtrl as home"> + <h4>Notebooks</h4> + <div> + <h5><a href="" data-toggle="modal" data-target="#noteNameModal" style="text-decoration: none;"> + <i style="font-size: 15px;" class="icon-notebook"></i> Create new note</a></h5> + <ul style="list-style-type: none;"> + <li ng-repeat="note in home.notes.list track by $index"><i style="font-size: 10px;" class="icon-doc"></i> + <a style="text-decoration: none;" href="#/notebook/{{note.id}}">{{note.name || 'Note ' + note.id}}</a> + </li> + </ul> + </div> + </div> + """) + ``` + + After running the notebook you will see output similar to this one: + <img src="../../assets/themes/zeppelin/img/screenshots/homepage_notebook_list.png" /> + + The main trick here lies in linking the ```<div>``` to the controller: + + ```html + <div class="col-md-4" ng-controller="HomeCtrl as home"> + ``` + + Once we have ```home``` as our controller variable in our ```<div></div>```, + we can use ```home.notes.list``` to get access to the notebook list. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/pleasecontribute.md ---------------------------------------------------------------------- diff --git a/docs/pleasecontribute.md b/docs/pleasecontribute.md new file mode 100644 index 0000000..063b48f --- /dev/null +++ b/docs/pleasecontribute.md @@ -0,0 +1,28 @@ +--- +layout: page +title: "Please contribute" +description: "" +group: development +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and +limitations under the License. +--> +{% include JB/setup %} + + +### Waiting for your help +The content does not exist yet. + +We always welcome contributions. + +If you're interested, please check [How to contribute (website)](./development/howtocontributewebsite.html). http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/rest-api/rest-interpreter.md ---------------------------------------------------------------------- diff --git a/docs/rest-api/rest-interpreter.md b/docs/rest-api/rest-interpreter.md new file mode 100644 index 0000000..d852340 --- /dev/null +++ b/docs/rest-api/rest-interpreter.md @@ -0,0 +1,363 @@ +--- +layout: page +title: "Interpreter REST API" +description: "" +group: rest-api +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +{% include JB/setup %} + +## Zeppelin REST API + Zeppelin provides several REST APIs for interaction and remote activation of Zeppelin functionality.
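Every operation in the interpreter tables on this page is an HTTP method combined with a path under `/api`. The Scala sketch below only assembles those (method, URL) pairs from a host and port. The `InterpreterApi` object, its method names and the example host `localhost` with port `8080` are illustrative assumptions. Sending the request, together with any JSON body, is left to whatever HTTP client you use.

```scala
// Illustrative helper (hypothetical names): builds the method and URL for each
// interpreter REST operation documented on this page. It does not send requests.
object InterpreterApi {
  final case class Request(method: String, url: String)

  // Every call lives under http://[zeppelin-server]:[zeppelin-port]/api
  private def api(server: String, port: Int): String = s"http://$server:$port/api"

  def listInterpreters(server: String, port: Int): Request =
    Request("GET", s"${api(server, port)}/interpreter")

  def listSettings(server: String, port: Int): Request =
    Request("GET", s"${api(server, port)}/interpreter/setting")

  // The JSON body (name, group, properties, interpreterGroup) is sent separately.
  def createSetting(server: String, port: Int): Request =
    Request("POST", s"${api(server, port)}/interpreter/setting")

  def updateSetting(server: String, port: Int, settingId: String): Request =
    Request("PUT", s"${api(server, port)}/interpreter/setting/$settingId")

  def deleteSetting(server: String, port: Int, settingId: String): Request =
    Request("DELETE", s"${api(server, port)}/interpreter/setting/$settingId")
}
```

For instance, `InterpreterApi.updateSetting("localhost", 8080, "2AYW25ANY")` describes a `PUT` to `http://localhost:8080/api/interpreter/setting/2AYW25ANY`, reusing the setting id from the sample responses.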
+ + All REST APIs are available under the following endpoint: ```http://[zeppelin-server]:[zeppelin-port]/api``` + + Note that the Zeppelin REST API receives and returns JSON objects; it is recommended that you install a JSON viewer such as + [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc) + + + If you work with Zeppelin and find a need for an additional REST API, please [file an issue or send us mail](../../community.html) + + <br /> +### Interpreter REST API list + + The role of registered interpreters, settings and interpreter groups is described [here](../manual/interpreters.html) + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>List registered interpreters</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```GET``` method returns all the registered interpreters available on the server.</td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter```</td> + </tr> + <tr> + <td>Success code</td> + <td>200</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON response + </td> + <td> + <pre> +{ + "status": "OK", + "message": "", + "body": { + "md.md": { + "name": "md", + "group": "md", + "className": "org.apache.zeppelin.markdown.Markdown", + "properties": {}, + "path": "/zeppelin/interpreter/md" + }, + "spark.spark": { + "name": "spark", + "group": "spark", + "className": "org.apache.zeppelin.spark.SparkInterpreter", + "properties": { + "spark.executor.memory": { + "defaultValue": "512m", + "description": "Executor memory per worker instance. ex) 512m, 32g" + }, + "spark.cores.max": { + "defaultValue": "", + "description": "Total number of cores to use. Empty value uses all available core."
+ } + }, + "path": "/zeppelin/interpreter/spark" + }, + "spark.sql": { + "name": "sql", + "group": "spark", + "className": "org.apache.zeppelin.spark.SparkSqlInterpreter", + "properties": { + "zeppelin.spark.maxResult": { + "defaultValue": "1000", + "description": "Max number of SparkSQL result to display." + } + }, + "path": "/zeppelin/interpreter/spark" + } + } +} + </pre> + </td> + </tr> + </table> + +<br/> + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>List interpreter settings</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```GET``` method returns all the interpreter settings registered on the server.</td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting```</td> + </tr> + <tr> + <td>Success code</td> + <td>200</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON response + </td> + <td> + <pre> +{ + "status": "OK", + "message": "", + "body": [ + { + "id": "2AYUGP2D5", + "name": "md", + "group": "md", + "properties": { + "_empty_": "" + }, + "interpreterGroup": [ + { + "class": "org.apache.zeppelin.markdown.Markdown", + "name": "md" + } + ] + }, + { + "id": "2AY6GV7Q3", + "name": "spark", + "group": "spark", + "properties": { + "spark.cores.max": "", + "spark.executor.memory": "512m" + }, + "interpreterGroup": [ + { + "class": "org.apache.zeppelin.spark.SparkInterpreter", + "name": "spark" + }, + { + "class": "org.apache.zeppelin.spark.SparkSqlInterpreter", + "name": "sql" + } + ] + } + ] +} + </pre> + </td> + </tr> + </table> + +<br/> + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>Create an interpreter setting</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```POST``` method adds a new interpreter setting using a registered interpreter to the server.</td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting```</td> + </tr> +
<tr> + <td>Success code</td> + <td>201</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON input + </td> + <td> + <pre> +{ + "name": "Markdown setting name", + "group": "md", + "properties": { + "propname": "propvalue" + }, + "interpreterGroup": [ + { + "class": "org.apache.zeppelin.markdown.Markdown", + "name": "md" + } + ] +} + </pre> + </td> + </tr> + <tr> + <td> sample JSON response + </td> + <td> + <pre> +{ + "status": "CREATED", + "message": "", + "body": { + "id": "2AYW25ANY", + "name": "Markdown setting name", + "group": "md", + "properties": { + "propname": "propvalue" + }, + "interpreterGroup": [ + { + "class": "org.apache.zeppelin.markdown.Markdown", + "name": "md" + } + ] + } +} + </pre> + </td> + </tr> + </table> + + +<br/> + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>Update an interpreter setting</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```PUT``` method updates an interpreter setting with new properties.</td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting/[interpreter ID]```</td> + </tr> + <tr> + <td>Success code</td> + <td>200</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON input + </td> + <td> + <pre> +{ + "name": "Markdown setting name", + "group": "md", + "properties": { + "propname": "Otherpropvalue" + }, + "interpreterGroup": [ + { + "class": "org.apache.zeppelin.markdown.Markdown", + "name": "md" + } + ] +} + </pre> + </td> + </tr> + <tr> + <td> sample JSON response + </td> + <td> + <pre> +{ + "status": "OK", + "message": "", + "body": { + "id": "2AYW25ANY", + "name": "Markdown setting name", + "group": "md", + "properties": { + "propname": "Otherpropvalue" + }, + "interpreterGroup": [ + { + "class": "org.apache.zeppelin.markdown.Markdown", + "name": "md" + } + ] + } +} + </pre> + </td> + </tr> + </table> + + +<br/> + + <table 
class="table-configuration"> + <col width="200"> + <tr> + <th>Delete an interpreter setting</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```DELETE``` method deletes a given interpreter setting.</td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting/[interpreter ID]```</td> + </tr> + <tr> + <td>Success code</td> + <td>200</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON response + </td> + <td> + <pre>{"status":"OK"}</pre> + </td> + </tr> + </table> http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/rest-api/rest-notebook.md ---------------------------------------------------------------------- diff --git a/docs/rest-api/rest-notebook.md b/docs/rest-api/rest-notebook.md new file mode 100644 index 0000000..ffee95a --- /dev/null +++ b/docs/rest-api/rest-notebook.md @@ -0,0 +1,171 @@ +--- +layout: page +title: "Notebook REST API" +description: "" +group: rest-api +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +{% include JB/setup %} + +## Zeppelin REST API + Zeppelin provides several REST APIs for interaction and remote activation of Zeppelin functionality.
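In the notebook tables on this page, create and clone both send a body of the form `{"name": "..."}`, while list and delete need only a method and a URL. The sketch below assembles each request with hypothetical helper names (`NotebookApi`, `jsonString`, `cloneNotebook`). It escapes the notebook name by hand so that no JSON library is needed. It always sends a name, so the documented fallback to a default name is not modelled.

```scala
// Illustrative helper (hypothetical names) for the notebook REST operations
// documented on this page; it only builds requests and does not send them.
object NotebookApi {
  final case class Request(method: String, url: String, body: Option[String] = None)

  private def notebooks(server: String, port: Int): String =
    s"http://$server:$port/api/notebook"

  // Minimal JSON string encoder: quotes, backslashes and control characters.
  def jsonString(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append("u%04x".format(c.toInt))
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  // Body shared by create and clone: {"name": "..."}
  private def nameBody(name: String): Option[String] =
    Some(s"""{"name": ${jsonString(name)}}""")

  def list(server: String, port: Int): Request =
    Request("GET", notebooks(server, port))

  def create(server: String, port: Int, name: String): Request =
    Request("POST", notebooks(server, port), nameBody(name))

  def delete(server: String, port: Int, notebookId: String): Request =
    Request("DELETE", s"${notebooks(server, port)}/$notebookId")

  // Clone posts the new name to the source notebook's URL.
  def cloneNotebook(server: String, port: Int, notebookId: String, newName: String): Request =
    Request("POST", s"${notebooks(server, port)}/$notebookId", nameBody(newName))
}
```

Create and clone share the same method and body shape and differ only in the URL. Keeping the body builder in one place therefore keeps the two requests consistent.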
+ + All REST APIs are available under the following endpoint: ```http://[zeppelin-server]:[zeppelin-port]/api``` + + Note that the Zeppelin REST API receives and returns JSON objects; it is recommended that you install a JSON viewer such as + [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc) + + + If you work with Zeppelin and find a need for an additional REST API, please [file an issue or send us mail](../../community.html) + + <br /> +### Notebook REST API list + + The notebook REST API supports the following operations: List, Create, Delete and Clone, as detailed in the following tables + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>List notebooks</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```GET``` method lists the available notebooks on your server. + The notebook JSON contains the ```name``` and ```id``` of all notebooks. + </td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook```</td> + </tr> + <tr> + <td>Success code</td> + <td>200</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON response </td> + <td><pre>{"status":"OK","message":"","body":[{"name":"Homepage","id":"2AV4WUEMK"},{"name":"Zeppelin Tutorial","id":"2A94M5J1Z"}]}</pre></td> + </tr> + </table> + +<br/> + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>Create notebook</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```POST``` method creates a new notebook using the given name, or a default name if none is given. + The body field of the returned JSON contains the new notebook id.
+ </td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook```</td> + </tr> + <tr> + <td>Success code</td> + <td>201</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON input </td> + <td><pre>{"name": "name of new notebook"}</pre></td> + </tr> + <tr> + <td> sample JSON response </td> + <td><pre>{"status": "CREATED","message": "","body": "2AZPHY918"}</pre></td> + </tr> + </table> + +<br/> + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>Delete notebook</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```DELETE``` method deletes a notebook by the given notebook id. + </td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook/[notebookId]```</td> + </tr> + <tr> + <td>Success code</td> + <td>200</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON response </td> + <td><pre>{"status":"OK","message":""}</pre></td> + </tr> + </table> + +<br/> + + <table class="table-configuration"> + <col width="200"> + <tr> + <th>Clone notebook</th> + <th></th> + </tr> + <tr> + <td>Description</td> + <td>This ```POST``` method clones a notebook by the given id and creates a new notebook using the given name + or a default name if none is given. + The body field of the returned JSON contains the new notebook id.
+ </td> + </tr> + <tr> + <td>URL</td> + <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook/[notebookId]```</td> + </tr> + <tr> + <td>Success code</td> + <td>201</td> + </tr> + <tr> + <td> Fail code</td> + <td> 500 </td> + </tr> + <tr> + <td> sample JSON input </td> + <td><pre>{"name": "name of new notebook"}</pre></td> + </tr> + <tr> + <td> sample JSON response </td> + <td><pre>{"status": "CREATED","message": "","body": "2AZPHY918"}</pre></td> + </tr> + </table> + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/storage/storage.md ---------------------------------------------------------------------- diff --git a/docs/storage/storage.md b/docs/storage/storage.md new file mode 100644 index 0000000..a04a703 --- /dev/null +++ b/docs/storage/storage.md @@ -0,0 +1,80 @@ +--- +layout: page +title: "Storage" +description: "Notebook Storage option for Zeppelin" +group: storage +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +### Notebook Storage + +In Zeppelin there are two options for notebook storage: by default, notebooks are stored in the notebook folder on your local file system, and the second option is S3. + +<br /> +#### Notebook Storage in S3 + +For notebook storage in S3 you need AWS credentials, which can be provided in three ways: the environment variables ```AWS_ACCESS_KEY_ID``` and ```AWS_SECRET_ACCESS_KEY```, a credentials file in the `.aws` folder in your home directory, or an IAM role for your instance.
To complete the setup, the following steps are necessary: + +<br /> +You need the following folder structure on S3: + +``` +bucket_name/ + username/ + notebook/ + +``` + +Set the environment variables in the file **zeppelin-env.sh**: + +```bash +export ZEPPELIN_NOTEBOOK_S3_BUCKET=bucket_name +export ZEPPELIN_NOTEBOOK_S3_USER=username +``` + +In the file **zeppelin-site.xml**, uncomment and complete the following properties: + +```xml +<!--If used S3 to storage, it is necessary the following folder structure bucket_name/username/notebook/--> +<property> + <name>zeppelin.notebook.s3.user</name> + <value>username</value> + <description>user name for s3 folder structure</description> +</property> +<property> + <name>zeppelin.notebook.s3.bucket</name> + <value>bucket_name</value> + <description>bucket name for notebook storage</description> +</property> +``` + +Uncomment the following property to use the S3NotebookRepo class: + +```xml +<property> + <name>zeppelin.notebook.storage</name> + <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value> + <description>notebook persistence layer implementation</description> +</property> +``` + +Comment out the following property: + +```xml +<property> + <name>zeppelin.notebook.storage</name> + <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value> + <description>notebook persistence layer implementation</description> +</property> +``` http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/tutorial/tutorial.md ---------------------------------------------------------------------- diff --git a/docs/tutorial/tutorial.md b/docs/tutorial/tutorial.md new file mode 100644 index 0000000..68b2ee7 --- /dev/null +++ b/docs/tutorial/tutorial.md @@ -0,0 +1,197 @@ +--- +layout: page +title: "Tutorial" +description: "Tutorial is valid for Spark 1.3 and higher" +group: tutorial +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License.
+You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +### Zeppelin Tutorial + +We will assume you have Zeppelin installed already. If that's not the case, see [Install](../install/install.html). + +Zeppelin's current main backend processing engine is [Apache Spark](https://spark.apache.org). If you're new to the system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin. + +<br /> +### Tutorial with Local File + +#### Data Refine + +Before you start the Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip). + +First, to transform the data from CSV format into an RDD of `Bank` objects, run the following script. This will also remove the header using the `filter` function. + +```scala + +val bankText = sc.textFile("yourPath/bank/bank-full.csv") + +case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer) + +// split each line, filter out header (starts with "age"), and map it into Bank case class +val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map( + s=>Bank(s(0).toInt, + s(1).replaceAll("\"", ""), + s(2).replaceAll("\"", ""), + s(3).replaceAll("\"", ""), + s(5).replaceAll("\"", "").toInt + ) +) + +// convert to DataFrame and create temporary table +bank.toDF().registerTempTable("bank") +``` + +<br /> +#### Data Retrieval + +Suppose we want to see the age distribution from `bank`.
To do this, run: + +```sql +%sql select age, count(1) from bank where age < 30 group by age order by age +``` + +You can make an input box for setting the age condition by replacing `30` with `${maxAge=30}`. + +```sql +%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age +``` + +Now we want to see the age distribution for a certain marital status, with a combo box to select the marital status. Run: + +```sql +%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age +``` + +<br /> +### Tutorial with Streaming Data + +#### Data Refine + +Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, fill in the credential-related values (`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) in the following script with your API keys.
+ +This will create an RDD of `Tweet` objects and register this stream data as a table: + +```scala +import org.apache.spark.streaming._ +import org.apache.spark.streaming.twitter._ +import org.apache.spark.storage.StorageLevel +import scala.io.Source +import scala.collection.mutable.HashMap +import java.io.File +import org.apache.log4j.Logger +import org.apache.log4j.Level +import sys.process.stringSeqToProcess + +/** Configures the Oauth Credentials for accessing Twitter */ +def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) { + val configs = new HashMap[String, String] ++= Seq( + "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret) + println("Configuring Twitter OAuth") + configs.foreach{ case(key, value) => + if (value.trim.isEmpty) { + throw new Exception("Error setting authentication - value for " + key + " not set") + } + val fullKey = "twitter4j.oauth." + key.replace("api", "consumer") + System.setProperty(fullKey, value.trim) + println("\tProperty " + fullKey + " set as [" + value.trim + "]") + } + println() +} + +// Configure Twitter credentials +val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx" +val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret) + +import org.apache.spark.streaming.twitter._ +val ssc = new StreamingContext(sc, Seconds(2)) +val tweets = TwitterUtils.createStream(ssc, None) +val twt = tweets.window(Seconds(60)) + +case class Tweet(createdAt:Long, text:String) +twt.map(status=> + Tweet(status.getCreatedAt().getTime()/1000, status.getText()) +).foreachRDD(rdd=> + // Below line works only in spark 1.3.0.
+ // For spark 1.1.x and spark 1.2.x, + // use rdd.registerTempTable("tweets") instead. + rdd.toDF().registerAsTable("tweets") +) + +twt.print + +ssc.start() +``` + +<br /> +#### Data Retrieval + +For each of the following scripts, every time you click the run button you will see a different result, since it is based on real-time data. + +Let's begin by extracting up to 10 tweets that contain the word "girl". + +```sql +%sql select * from tweets where text like '%girl%' limit 10 +``` + +This time, suppose we want to see how many tweets have been created per second during the last 60 seconds. To do this, run: + +```sql +%sql select createdAt, count(1) from tweets group by createdAt order by createdAt +``` + + +You can make a user-defined function and use it in Spark SQL. Let's try it by making a function named `sentiment`. This function will return one of three attitudes (positive, negative, neutral) towards the parameter. + +```scala +def sentiment(s:String) : String = { + val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that") + val negative = Array("hate", "bad", "stupid", "is") + + var st = 0; + + val words = s.split(" ") + positive.foreach(p => + words.foreach(w => + if(p==w) st = st+1 + ) + ) + + negative.foreach(p=> + words.foreach(w=> + if(p==w) st = st-1 + ) + ) + if(st>0) + "positive" + else if(st<0) + "negative" + else + "neutral" +} + +// Below line works only in spark 1.3.0. +// For spark 1.1.x and spark 1.2.x, +// use sqlc.registerFunction("sentiment", sentiment _) instead. +sqlc.udf.register("sentiment", sentiment _) + +``` + +To check how people think about girls using the `sentiment` function we've made above, run this: + +```sql +%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text) +```
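The `sentiment` function above adds one point for every word found in `positive` and subtracts one for every word found in `negative`. Neither list contains duplicates, so the same score can be computed without a mutable counter by counting matching words directly. The `Sentiment` object below is an illustrative, Spark-free restatement that uses the same word lists. Like the original, it is case-sensitive and splits on single spaces.

```scala
// Illustrative restatement of the tutorial's scoring logic (hypothetical object name).
object Sentiment {
  // Same word lists as the tutorial's sentiment function.
  val positive = Set("like", "love", "good", "great", "happy", "cool", "the", "one", "that")
  val negative = Set("hate", "bad", "stupid", "is")

  // +1 per positive word, -1 per negative word; the sign of the total picks the label.
  def sentiment(s: String): String = {
    val words = s.split(" ")
    val score = words.count(positive.contains) - words.count(negative.contains)
    if (score > 0) "positive"
    else if (score < 0) "negative"
    else "neutral"
  }
}
```

Inside a Spark 1.3 notebook it could be registered like the original, for example with `sqlc.udf.register("sentiment", Sentiment.sentiment _)`.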
