http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/docs/rest-api/rest-interpreter.md ---------------------------------------------------------------------- diff --git a/docs/docs/rest-api/rest-interpreter.md b/docs/docs/rest-api/rest-interpreter.md deleted file mode 100644 index d852340..0000000 --- a/docs/docs/rest-api/rest-interpreter.md +++ /dev/null @@ -1,363 +0,0 @@ ---- -layout: page -title: "Interpreter REST API" -description: "" -group: rest-api ---- -<!-- -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> -{% include JB/setup %} - -## Zeppelin REST API - Zeppelin provides several REST API's for interaction and remote activation of zeppelin functionality. 
- - All REST APIs are available under the following endpoint ```http://[zeppelin-server]:[zeppelin-port]/api``` - - Note that the Zeppelin REST APIs receive and return JSON objects, so it is recommended that you install a JSON viewer such as - [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc) - - - If you work with Zeppelin and need an additional REST API, please [file an issue or send us mail](../../community.html) - - <br /> -### Interpreter REST API list - - The role of registered interpreters, settings and interpreter groups is described [here](../manual/interpreters.html) - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>List registered interpreters</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```GET``` method returns all the registered interpreters available on the server.</td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter```</td> - </tr> - <tr> - <td>Success code</td> - <td>200</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON response - </td> - <td> - <pre> -{ - "status": "OK", - "message": "", - "body": { - "md.md": { - "name": "md", - "group": "md", - "className": "org.apache.zeppelin.markdown.Markdown", - "properties": {}, - "path": "/zeppelin/interpreter/md" - }, - "spark.spark": { - "name": "spark", - "group": "spark", - "className": "org.apache.zeppelin.spark.SparkInterpreter", - "properties": { - "spark.executor.memory": { - "defaultValue": "512m", - "description": "Executor memory per worker instance. ex) 512m, 32g" - }, - "spark.cores.max": { - "defaultValue": "", - "description": "Total number of cores to use. Empty value uses all available core." 
- } - }, - "path": "/zeppelin/interpreter/spark" - }, - "spark.sql": { - "name": "sql", - "group": "spark", - "className": "org.apache.zeppelin.spark.SparkSqlInterpreter", - "properties": { - "zeppelin.spark.maxResult": { - "defaultValue": "1000", - "description": "Max number of SparkSQL result to display." - } - }, - "path": "/zeppelin/interpreter/spark" - } - } -} - </pre> - </td> - </tr> - </table> - -<br/> - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>List interpreter settings</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```GET``` method returns all the interpreter settings registered on the server.</td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting```</td> - </tr> - <tr> - <td>Success code</td> - <td>200</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON response - </td> - <td> - <pre> -{ - "status": "OK", - "message": "", - "body": [ - { - "id": "2AYUGP2D5", - "name": "md", - "group": "md", - "properties": { - "_empty_": "" - }, - "interpreterGroup": [ - { - "class": "org.apache.zeppelin.markdown.Markdown", - "name": "md" - } - ] - }, - { - "id": "2AY6GV7Q3", - "name": "spark", - "group": "spark", - "properties": { - "spark.cores.max": "", - "spark.executor.memory": "512m" - }, - "interpreterGroup": [ - { - "class": "org.apache.zeppelin.spark.SparkInterpreter", - "name": "spark" - }, - { - "class": "org.apache.zeppelin.spark.SparkSqlInterpreter", - "name": "sql" - } - ] - } - ] -} - </pre> - </td> - </tr> - </table> - -<br/> - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>Create an interpreter setting</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```POST``` method adds a new interpreter setting using a registered interpreter to the server.</td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting```</td> - </tr> - 
<tr> - <td>Success code</td> - <td>201</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON input - </td> - <td> - <pre> -{ - "name": "Markdown setting name", - "group": "md", - "properties": { - "propname": "propvalue" - }, - "interpreterGroup": [ - { - "class": "org.apache.zeppelin.markdown.Markdown", - "name": "md" - } - ] -} - </pre> - </td> - </tr> - <tr> - <td> sample JSON response - </td> - <td> - <pre> -{ - "status": "CREATED", - "message": "", - "body": { - "id": "2AYW25ANY", - "name": "Markdown setting name", - "group": "md", - "properties": { - "propname": "propvalue" - }, - "interpreterGroup": [ - { - "class": "org.apache.zeppelin.markdown.Markdown", - "name": "md" - } - ] - } -} - </pre> - </td> - </tr> - </table> - - -<br/> - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>Update an interpreter setting</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```PUT``` method updates an interpreter setting with new properties.</td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting/[interpreter ID]```</td> - </tr> - <tr> - <td>Success code</td> - <td>200</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON input - </td> - <td> - <pre> -{ - "name": "Markdown setting name", - "group": "md", - "properties": { - "propname": "Otherpropvalue" - }, - "interpreterGroup": [ - { - "class": "org.apache.zeppelin.markdown.Markdown", - "name": "md" - } - ] -} - </pre> - </td> - </tr> - <tr> - <td> sample JSON response - </td> - <td> - <pre> -{ - "status": "OK", - "message": "", - "body": { - "id": "2AYW25ANY", - "name": "Markdown setting name", - "group": "md", - "properties": { - "propname": "Otherpropvalue" - }, - "interpreterGroup": [ - { - "class": "org.apache.zeppelin.markdown.Markdown", - "name": "md" - } - ] - } -} - </pre> - </td> - </tr> - </table> - - -<br/> - - <table 
class="table-configuration"> - <col width="200"> - <tr> - <th>Delete an interpreter setting</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```DELETE``` method deletes a given interpreter setting.</td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/interpreter/setting/[interpreter ID]```</td> - </tr> - <tr> - <td>Success code</td> - <td>200</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON response - </td> - <td> - <pre>{"status":"OK"}</pre> - </td> - </tr> - </table>
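The five interpreter operations listed in the tables above can be exercised from a shell. What follows is a minimal sketch, not part of the original page. It assumes `curl` is installed and a Zeppelin server is reachable at the host and port you supply. It also assumes the server accepts a `Content-Type: application/json` header, which the tables do not state. `ZEPPELIN_HOST` and every function name are hypothetical, and the `localhost`/`8080` defaults are placeholders only.

```shell
#!/bin/sh
# Hypothetical helpers for the interpreter REST API described above.
# ZEPPELIN_HOST and the function names are illustrative assumptions.
ZEPPELIN_HOST="${ZEPPELIN_HOST:-localhost}"
ZEPPELIN_PORT="${ZEPPELIN_PORT:-8080}"

# Build a URL under the documented /api prefix; $1 is the path after /api/.
zeppelin_api_url() {
  printf 'http://%s:%s/api/%s\n' "$ZEPPELIN_HOST" "$ZEPPELIN_PORT" "$1"
}

# GET /api/interpreter: list registered interpreters.
list_interpreters() {
  curl -s -X GET "$(zeppelin_api_url interpreter)"
}

# GET /api/interpreter/setting: list interpreter settings.
list_settings() {
  curl -s -X GET "$(zeppelin_api_url interpreter/setting)"
}

# POST /api/interpreter/setting: create a setting.
# $1 is a file holding JSON shaped like the "sample JSON input" above.
create_setting() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d @"$1" "$(zeppelin_api_url interpreter/setting)"
}

# PUT /api/interpreter/setting/[interpreter ID]: update a setting.
# $1 is the setting id, $2 is a file holding the new JSON.
update_setting() {
  curl -s -X PUT -H 'Content-Type: application/json' \
    -d @"$2" "$(zeppelin_api_url "interpreter/setting/$1")"
}

# DELETE /api/interpreter/setting/[interpreter ID]: delete a setting.
delete_setting() {
  curl -s -X DELETE "$(zeppelin_api_url "interpreter/setting/$1")"
}
```

After sourcing the file, `create_setting md-setting.json` would post the contents of a (hypothetical) `md-setting.json`, and `delete_setting` followed by the id returned in the response body would remove the setting again. Keeping the URL construction in one function means the `/api` prefix is written only once.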
http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/docs/rest-api/rest-notebook.md ---------------------------------------------------------------------- diff --git a/docs/docs/rest-api/rest-notebook.md b/docs/docs/rest-api/rest-notebook.md deleted file mode 100644 index ffee95a..0000000 --- a/docs/docs/rest-api/rest-notebook.md +++ /dev/null @@ -1,171 +0,0 @@ ---- -layout: page -title: "Notebook REST API" -description: "" -group: rest-api ---- -<!-- -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> -{% include JB/setup %} - -## Zeppelin REST API - Zeppelin provides several REST API's for interaction and remote activation of zeppelin functionality. 
- - All REST APIs are available under the following endpoint ```http://[zeppelin-server]:[zeppelin-port]/api``` - - Note that the Zeppelin REST APIs receive and return JSON objects, so it is recommended that you install a JSON viewer such as - [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc) - - - If you work with Zeppelin and need an additional REST API, please [file an issue or send us mail](../../community.html) - - <br /> -### Notebook REST API list - - The Notebook REST API supports the following operations: List, Create, Delete and Clone, as detailed in the following tables - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>List notebooks</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```GET``` method lists the available notebooks on your server. - Notebook JSON contains the ```name``` and ```id``` of all notebooks. - </td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook```</td> - </tr> - <tr> - <td>Success code</td> - <td>200</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON response </td> - <td><pre>{"status":"OK","message":"","body":[{"name":"Homepage","id":"2AV4WUEMK"},{"name":"Zeppelin Tutorial","id":"2A94M5J1Z"}]}</pre></td> - </tr> - </table> - -<br/> - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>Create notebook</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```POST``` method creates a new notebook using the given name, or a default name if none is given. - The body field of the returned JSON contains the new notebook id. 
- </td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook```</td> - </tr> - <tr> - <td>Success code</td> - <td>201</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON input </td> - <td><pre>{"name": "name of new notebook"}</pre></td> - </tr> - <tr> - <td> sample JSON response </td> - <td><pre>{"status": "CREATED","message": "","body": "2AZPHY918"}</pre></td> - </tr> - </table> - -<br/> - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>Delete notebook</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```DELETE``` method deletes the notebook with the given notebook id. - </td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook/[notebookId]```</td> - </tr> - <tr> - <td>Success code</td> - <td>200</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON response </td> - <td><pre>{"status":"OK","message":""}</pre></td> - </tr> - </table> - -<br/> - - <table class="table-configuration"> - <col width="200"> - <tr> - <th>Clone notebook</th> - <th></th> - </tr> - <tr> - <td>Description</td> - <td>This ```POST``` method clones the notebook with the given id and creates a new notebook using the given name - or a default name if none is given. - The body field of the returned JSON contains the new notebook id. 
- </td> - </tr> - <tr> - <td>URL</td> - <td>```http://[zeppelin-server]:[zeppelin-port]/api/notebook/[notebookId]```</td> - </tr> - <tr> - <td>Success code</td> - <td>201</td> - </tr> - <tr> - <td> Fail code</td> - <td> 500 </td> - </tr> - <tr> - <td> sample JSON input </td> - <td><pre>{"name": "name of new notebook"}</pre></td> - </tr> - <tr> - <td> sample JSON response </td> - <td><pre>{"status": "CREATED","message": "","body": "2AZPHY918"}</pre></td> - </tr> - </table> - http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/docs/storage/storage.md ---------------------------------------------------------------------- diff --git a/docs/docs/storage/storage.md b/docs/docs/storage/storage.md deleted file mode 100644 index a04a703..0000000 --- a/docs/docs/storage/storage.md +++ /dev/null @@ -1,80 +0,0 @@ ---- -layout: page -title: "Storage" -description: "Notebook Storage option for Zeppelin" -group: storage ---- -<!-- -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> -### Notebook Storage - -Zeppelin offers two options for notebook storage: by default, notebooks are stored in the notebook folder on your local file system; the second option is S3. - -<br /> -#### Notebook Storage in S3 - -To store notebooks in S3 you need AWS credentials. There are three ways to provide them: the environment variables ```AWS_ACCESS_KEY_ID``` and ```AWS_ACCESS_SECRET_KEY```, a credentials file in the .aws folder in your home directory, or an IAM role for your instance. 
To complete the setup, the following steps are necessary: - -<br /> -You need the following folder structure on S3: - -``` -bucket_name/ - username/ - notebook/ - -``` - -Set the environment variables in the file **zeppelin-env.sh**: - -``` -export ZEPPELIN_NOTEBOOK_S3_BUCKET=bucket_name -export ZEPPELIN_NOTEBOOK_S3_USER=username -``` - -In the file **zeppelin-site.xml**, uncomment and complete the following properties: - -``` -<!--If S3 is used for storage, the following folder structure is necessary: bucket_name/username/notebook/--> -<property> - <name>zeppelin.notebook.s3.user</name> - <value>username</value> - <description>user name for s3 folder structure</description> -</property> -<property> - <name>zeppelin.notebook.s3.bucket</name> - <value>bucket_name</value> - <description>bucket name for notebook storage</description> -</property> -``` - -Uncomment the following property to use the S3NotebookRepo class: - -``` -<property> - <name>zeppelin.notebook.storage</name> - <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value> - <description>notebook persistence layer implementation</description> -</property> -``` - -Comment out the following property: - -``` -<property> - <name>zeppelin.notebook.storage</name> - <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value> - <description>notebook persistence layer implementation</description> -</property> -``` http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/docs/tutorial/tutorial.md ---------------------------------------------------------------------- diff --git a/docs/docs/tutorial/tutorial.md b/docs/docs/tutorial/tutorial.md deleted file mode 100644 index 68b2ee7..0000000 --- a/docs/docs/tutorial/tutorial.md +++ /dev/null @@ -1,197 +0,0 @@ ---- -layout: page -title: "Tutorial" -description: "Tutorial is valid for Spark 1.3 and higher" -group: tutorial ---- -<!-- -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in 
compliance with the License. 
-You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> -### Zeppelin Tutorial - -We will assume you have Zeppelin installed already. If that's not the case, see [Install](../install/install.html). - -Zeppelin's current main backend processing engine is [Apache Spark](https://spark.apache.org). If you're new to the system, you might want to start by getting an idea of how it processes data to get the most out of Zeppelin. - -<br /> -### Tutorial with Local File - -#### Data Refine - -Before you start Zeppelin tutorial, you will need to download [bank.zip](http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip). - -First, to transform data from csv format into RDD of `Bank` objects, run following script. This will also remove header using `filter` function. - -```scala - -val bankText = sc.textFile("yourPath/bank/bank-full.csv") - -case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer) - -// split each line, filter out header (starts with "age"), and map it into Bank case class -val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map( - s=>Bank(s(0).toInt, - s(1).replaceAll("\"", ""), - s(2).replaceAll("\"", ""), - s(3).replaceAll("\"", ""), - s(5).replaceAll("\"", "").toInt - ) -) - -// convert to DataFrame and create temporal table -bank.toDF().registerTempTable("bank") -``` - -<br /> -#### Data Retrieval - -Suppose we want to see age distribution from `bank`. 
To do this, run: - -```sql -%sql select age, count(1) from bank where age < 30 group by age order by age -``` - -You can make input box for setting age condition by replacing `30` with `${maxAge=30}`. - -```sql -%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age -``` - -Now we want to see age distribution with certain marital status and add combo box to select marital status. Run: - -```sql -%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age -``` - -<br /> -### Tutorial with Streaming Data - -#### Data Refine - -Since this tutorial is based on Twitter's sample tweet stream, you must configure authentication with a Twitter account. To do this, take a look at [Twitter Credential Setup](https://databricks-training.s3.amazonaws.com/realtime-processing-with-spark-streaming.html#twitter-credential-setup). After you get API keys, you should fill out credential related values(`apiKey`, `apiSecret`, `accessToken`, `accessTokenSecret`) with your API keys on following script. 
- -This will create a RDD of `Tweet` objects and register these stream data as a table: - -```scala -import org.apache.spark.streaming._ -import org.apache.spark.streaming.twitter._ -import org.apache.spark.storage.StorageLevel -import scala.io.Source -import scala.collection.mutable.HashMap -import java.io.File -import org.apache.log4j.Logger -import org.apache.log4j.Level -import sys.process.stringSeqToProcess - -/** Configures the Oauth Credentials for accessing Twitter */ -def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) { - val configs = new HashMap[String, String] ++= Seq( - "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret) - println("Configuring Twitter OAuth") - configs.foreach{ case(key, value) => - if (value.trim.isEmpty) { - throw new Exception("Error setting authentication - value for " + key + " not set") - } - val fullKey = "twitter4j.oauth." + key.replace("api", "consumer") - System.setProperty(fullKey, value.trim) - println("\tProperty " + fullKey + " set as [" + value.trim + "]") - } - println() -} - -// Configure Twitter credentials -val apiKey = "xxxxxxxxxxxxxxxxxxxxxxxxx" -val apiSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" -val accessToken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" -val accessTokenSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" -configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret) - -import org.apache.spark.streaming.twitter._ -val ssc = new StreamingContext(sc, Seconds(2)) -val tweets = TwitterUtils.createStream(ssc, None) -val twt = tweets.window(Seconds(60)) - -case class Tweet(createdAt:Long, text:String) -twt.map(status=> - Tweet(status.getCreatedAt().getTime()/1000, status.getText()) -).foreachRDD(rdd=> - // Below line works only in spark 1.3.0. 
- // For spark 1.1.x and spark 1.2.x, - // use rdd.registerTempTable("tweets") instead. - rdd.toDF().registerAsTable("tweets") -) - -twt.print - -ssc.start() -``` - -<br /> -#### Data Retrieval - -For each of the following scripts, every time you click the run button you will see a different result, since it is based on real-time data. - -Let's begin by extracting at most 10 tweets that contain the word "girl". - -```sql -%sql select * from tweets where text like '%girl%' limit 10 -``` - -This time, suppose we want to see how many tweets have been created per second during the last 60 seconds. To do this, run: - -```sql -%sql select createdAt, count(1) from tweets group by createdAt order by createdAt -``` - - -You can define a user-defined function and use it in Spark SQL. Let's try it by making a function named `sentiment`. This function returns one of three attitudes (positive, negative, neutral) towards its parameter. - -```scala -def sentiment(s:String) : String = { - val positive = Array("like", "love", "good", "great", "happy", "cool", "the", "one", "that") - val negative = Array("hate", "bad", "stupid", "is") - - var st = 0; - - val words = s.split(" ") - positive.foreach(p => - words.foreach(w => - if(p==w) st = st+1 - ) - ) - - negative.foreach(p=> - words.foreach(w=> - if(p==w) st = st-1 - ) - ) - if(st>0) - "positive" - else if(st<0) - "negative" - else - "neutral" -} - -// Below line works only in spark 1.3.0. -// For spark 1.1.x and spark 1.2.x, -// use sqlc.registerFunction("sentiment", sentiment _) instead. 
-sqlc.udf.register("sentiment", sentiment _) - -``` - -To check how people think about girls using `sentiment` function we've made above, run this: - -```sql -%sql select sentiment(text), count(1) from tweets where text like '%girl%' group by sentiment(text) -``` http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/download.md ---------------------------------------------------------------------- diff --git a/docs/download.md b/docs/download.md deleted file mode 100644 index 99c4ac1..0000000 --- a/docs/download.md +++ /dev/null @@ -1,87 +0,0 @@ ---- -layout: page -title: "Download" -description: "" -group: nav-right ---- -<!-- -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> -{% include JB/setup %} - -### Download Zeppelin - -The latest release of Apache Zeppelin (incubating) is *0.5.0-incubating*. 
- - - 0.5.0-incubating released on July 23, 2015 ([release notes](./docs/releases/zeppelin-release-0.5.0-incubating.html)) ([git tag](https://git-wip-us.apache.org/repos/asf?p=incubator-zeppelin.git;a=tag;h=refs/tags/v0.5.0)) - - - * Source: - <a style="cursor:pointer" onclick="ga('send', 'event', 'download', 'zeppelin-src', '0.5.0-incubating'); window.location.href='http://www.apache.org/dyn/closer.cgi/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating.tgz'">zeppelin-0.5.0-incubating.tgz</a> - ([pgp](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating.tgz.asc), - [md5](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating.tgz.md5), - [sha](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating.tgz.sha)) - - * Binary built with spark-1.4.0 and hadoop-2.3: - <a style="cursor:pointer" onclick="ga('send', 'event', 'download', 'zeppelin-bin', '0.5.0-incubating'); window.location.href='http://www.apache.org/dyn/closer.cgi/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz'">zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz</a> - ([pgp](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz.asc), - [md5](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz.md5), - [sha](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz.sha)) - - * Binary built with spark-1.3.1 and hadoop-2.3: - <a style="cursor:pointer" onclick="ga('send', 'event', 'download', 'zeppelin-bin', '0.5.0-incubating'); window.location.href='http://www.apache.org/dyn/closer.cgi/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.3.1_hadoop-2.3.tgz'">zeppelin-0.5.0-incubating-bin-spark-1.3.1_hadoop-2.3.tgz</a> - 
([pgp](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.3.1_hadoop-2.3.tgz.asc), - [md5](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.3.1_hadoop-2.3.tgz.md5), - [sha](https://www.apache.org/dist/incubator/zeppelin/0.5.0-incubating/zeppelin-0.5.0-incubating-bin-spark-1.3.1_hadoop-2.3.tgz.sha)) - - - - - -### Verify the integrity of the files - -It is essential that you [verify](https://www.apache.org/info/verification.html) the integrity of the downloaded files using the PGP or MD5 signatures. This signature should be matched against the [KEYS](https://www.apache.org/dist/incubator/zeppelin/KEYS) file. - - - -### Build from source - -For developers, to get latest *0.6.0-incubating-SNAPSHOT* check [install](./docs/install/install.html) section. - - -<!-- -------------- -### Old release - -##### Zeppelin-0.3.3 (2014.03.29) - -Download <a onclick="ga('send', 'event', 'download', 'zeppelin', '0.3.3');" href="https://s3-ap-northeast-1.amazonaws.com/zeppel.in/zeppelin-0.3.3.tar.gz">zeppelin-0.3.3.tar.gz</a> ([release note](https://zeppelin-project.atlassian.net/secure/ReleaseNote.jspa?projectId=10001&version=10301)) - - -##### Zeppelin-0.3.2 (2014.03.14) - -Download <a onclick="ga('send', 'event', 'download', 'zeppelin', '0.3.2');" href="https://s3-ap-northeast-1.amazonaws.com/zeppel.in/zeppelin-0.3.2.tar.gz">zeppelin-0.3.2.tar.gz</a> ([release note](https://zeppelin-project.atlassian.net/secure/ReleaseNote.jspa?projectId=10001&version=10300)) - -##### Zeppelin-0.3.1 (2014.03.06) - -Download <a onclick="ga('send', 'event', 'download', 'zeppelin', '0.3.1');" href="https://s3-ap-northeast-1.amazonaws.com/zeppel.in/zeppelin-0.3.1.tar.gz">zeppelin-0.3.1.tar.gz</a> ([release note](https://zeppelin-project.atlassian.net/secure/ReleaseNote.jspa?projectId=10001&version=10201)) - -##### Zeppelin-0.3.0 (2014.02.07) - -Download <a onclick="ga('send', 'event', 'download', 
'zeppelin', '0.3.0');" href="https://s3-ap-northeast-1.amazonaws.com/zeppel.in/zeppelin-0.3.0.tar.gz">zeppelin-0.3.0.tar.gz</a>, ([release note](https://zeppelin-project.atlassian.net/secure/ReleaseNote.jspa?projectId=10001&version=10200)) - -##### Zeppelin-0.2.0 (2014.01.22) - -Download Download <a onclick="ga('send', 'event', 'download', 'zeppelin', '0.2.0');" href="https://s3-ap-northeast-1.amazonaws.com/zeppel.in/zeppelin-0.2.0.tar.gz">zeppelin-0.2.0.tar.gz</a>, ([release note](https://zeppelin-project.atlassian.net/secure/ReleaseNote.jspa?projectId=10001&version=10001)) - ---> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/index.md ---------------------------------------------------------------------- diff --git a/docs/index.md b/docs/index.md index 57ad2fb..4343c64 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,7 +1,8 @@ --- layout: page -title: Zeppelin +title: Overview tagline: Less Development, More analysis! +group: nav-right --- <!-- Licensed under the Apache License, Version 2.0 (the "License"); @@ -17,7 +18,7 @@ See the License for the specific language governing permissions and limitations under the License. --> {% include JB/setup %} - +<br /> <div class="row"> <div class="col-md-5"> <h2>Multi-purpose Notebook</h2> @@ -45,7 +46,7 @@ Currently Zeppelin supports many interpreters such as Scala(with Apache Spark), <img class="img-responsive" src="assets/themes/zeppelin/img/screenshots/multiple_language_backend.png" /> -Adding new language-backend is really simple. Learn [how to write a zeppelin interpreter](./docs/development/writingzeppelininterpreter.html). +Adding new language-backend is really simple. Learn [how to write a zeppelin interpreter](./development/writingzeppelininterpreter.html). <br /> @@ -58,7 +59,7 @@ Zeppelin provides built-in Apache Spark integration. 
You don't need to build a s Zeppelin's Spark integration provides - Automatic SparkContext and SQLContext injection -- Runtime jar dependency loading from local filesystem or maven repository. Learn more about [dependency loader](./docs/interpreter/spark.html#dependencyloading). +- Runtime jar dependency loading from local filesystem or maven repository. Learn more about [dependency loader](./interpreter/spark.html#dependencyloading). - Canceling job and displaying its progress <br /> @@ -84,7 +85,7 @@ With simple drag and drop Zeppelin aggeregates the values and display them in pi <img class="img-responsive" src="./assets/themes/zeppelin/img/screenshots/pivot.png" /> </div> </div> -Learn more about Zeppelin's Display system. ( [text](./docs/displaysystem/display.html), [html](./docs/displaysystem/display.html#html), [table](./docs/displaysystem/table.html), [angular](./docs/displaysystem/angular.html) ) +Learn more about Zeppelin's Display system. ( [text](./displaysystem/display.html), [html](./displaysystem/display.html#html), [table](./displaysystem/table.html), [angular](./displaysystem/angular.html) ) <br /> @@ -94,7 +95,7 @@ Zeppelin can dynamically create some input forms into your notebook. <img class="img-responsive" src="./assets/themes/zeppelin/img/screenshots/form_input.png" /> -Learn more about [Dynamic Forms](./docs/manual/dynamicform.html). +Learn more about [Dynamic Forms](./manual/dynamicform.html). <br /> @@ -117,7 +118,7 @@ This way, you can easily embed it as an iframe inside of your website.</p> <br /> ### 100% Opensource -Apache Zeppelin (incubating) is Apache2 Licensed software. Please check out the [source repository](https://github.com/apache/incubator-zeppelin) and [How to contribute](./docs/development/howtocontribute.html) +Apache Zeppelin (incubating) is Apache2 Licensed software. 
Please check out the [source repository](https://github.com/apache/incubator-zeppelin) and [How to contribute](./development/howtocontribute.html) Zeppelin has a very active development community. Join the [Mailing list](./community.html) and report issues on our [Issue tracker](https://issues.apache.org/jira/browse/ZEPPELIN). http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/install/install.md ---------------------------------------------------------------------- diff --git a/docs/install/install.md b/docs/install/install.md new file mode 100644 index 0000000..a4b3336 --- /dev/null +++ b/docs/install/install.md @@ -0,0 +1,132 @@ +--- +layout: page +title: "Install Zeppelin" +description: "" +group: install +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +{% include JB/setup %} + + + +## Build + +#### Prerequisites + + * Java 1.7 + * Non-root account + * Apache Maven + +Build tested on OSX, CentOS 6. + +Check out the source code from [https://github.com/apache/incubator-zeppelin](https://github.com/apache/incubator-zeppelin) + +#### Local mode + +``` +mvn install -DskipTests +``` + +#### Cluster mode + +``` +mvn install -DskipTests -Dspark.version=1.1.0 -Dhadoop.version=2.2.0 +``` + +Change spark.version and hadoop.version to match your cluster. + +#### Custom built Spark + +Note that if you use a custom-built Spark, you need to build Zeppelin against the custom-built Spark artifact. 
To do that, deploy the Spark artifact to the local Maven repository using + +``` +sbt/sbt publish-local +``` + +and then build Zeppelin with your custom-built Spark + +``` +mvn install -DskipTests -Dspark.version=1.1.0-Custom -Dhadoop.version=2.2.0 +``` + + + + +## Configure + +Configuration can be done through both environment variables (conf/zeppelin-env.sh) and Java properties (conf/zeppelin-site.xml). If both are defined, the environment variable is used. + + +<table class="table-configuration"> + <tr> + <th>zeppelin-env.sh</th> + <th>zeppelin-site.xml</th> + <th>Default value</th> + <th>Description</th> + </tr> + <tr> + <td>ZEPPELIN_PORT</td> + <td>zeppelin.server.port</td> + <td>8080</td> + <td>Zeppelin server port. Note that port+1 is used for the web socket</td> + </tr> + <tr> + <td>ZEPPELIN_NOTEBOOK_DIR</td> + <td>zeppelin.notebook.dir</td> + <td>notebook</td> + <td>Where notebook files are saved</td> + </tr> + <tr> + <td>ZEPPELIN_INTERPRETERS</td> + <td>zeppelin.interpreters</td> + + <td>org.apache.zeppelin.spark.SparkInterpreter,<br />org.apache.zeppelin.spark.PySparkInterpreter,<br />org.apache.zeppelin.spark.SparkSqlInterpreter,<br />org.apache.zeppelin.spark.DepInterpreter,<br />org.apache.zeppelin.markdown.Markdown,<br />org.apache.zeppelin.shell.ShellInterpreter,<br />org.apache.zeppelin.hive.HiveInterpreter</td> + <td>Comma-separated list of interpreter classes. The first interpreter becomes the default</td> + </tr> + <tr> + <td>ZEPPELIN_INTERPRETER_DIR</td> + <td>zeppelin.interpreter.dir</td> + <td>interpreter</td> + <td>Zeppelin interpreter directory</td> + </tr> + <tr> + <td>MASTER</td> + <td></td> + <td>N/A</td> + <td>Spark master URL, e.g. spark://master_addr:7077.
Leave empty if you want to use local mode</td> + </tr> + <tr> + <td>ZEPPELIN_JAVA_OPTS</td> + <td></td> + <td>N/A</td> + <td>JVM Options</td> +</table> + +## Start/Stop +#### Start Zeppelin + +``` +bin/zeppelin-daemon.sh start +``` +After a successful start, visit http://localhost:8080 with your web browser. +Note that port **8081** also needs to be accessible for the websocket connection. + +#### Stop Zeppelin + +``` +bin/zeppelin-daemon.sh stop +``` + + http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/install/yarn_install.md ---------------------------------------------------------------------- diff --git a/docs/install/yarn_install.md b/docs/install/yarn_install.md new file mode 100644 index 0000000..2b38068 --- /dev/null +++ b/docs/install/yarn_install.md @@ -0,0 +1,264 @@ +--- +layout: page +title: "Install Zeppelin to connect with existing YARN cluster" +description: "" +group: install +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +{% include JB/setup %} + +## Introduction +This page describes how to pre-configure a bare metal node, build and configure Zeppelin on it, and connect it to an existing YARN cluster running the Hortonworks flavour of Hadoop. It also describes the steps to configure the Spark & Hive interpreters of Zeppelin. + +## Prepare Node + +### Zeppelin user (Optional) +This step is optional; however, it is nice to run Zeppelin under its own user.
If you later decide not to use Zeppelin (hopefully not), the user can be deleted along with all the packages that were installed for Zeppelin, the Zeppelin binary itself and the associated directories. + +Create a zeppelin user and switch to it, or, if the zeppelin user already exists, log in as zeppelin. + +```bash +useradd zeppelin +su - zeppelin +whoami +``` +Assuming the zeppelin user was created, running the whoami command must return + +```bash +zeppelin +``` + +The rest of this document assumes that the zeppelin user exists and that the installation instructions below are performed as the zeppelin user. + +### List of Prerequisites + + * CentOS 6.x + * Git + * Java 1.7 + * Apache Maven + * Hadoop client. + * Spark. + * Internet connection is required. + +It is assumed that the node has CentOS 6.x installed on it, although any Linux distribution should work fine. The working directory for all prerequisite packages is /home/zeppelin/prerequisites, although any location could be used. + +#### Git +Install the latest stable version of Git. This document describes the installation of version 2.4.8 + +```bash +yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel +yum install gcc perl-ExtUtils-MakeMaker +yum remove git +cd /home/zeppelin/prerequisites +wget -O git-2.4.8.tar.gz https://github.com/git/git/archive/v2.4.8.tar.gz +tar xzf git-2.4.8.tar.gz +cd git-2.4.8 +make prefix=/home/zeppelin/prerequisites/git all +make prefix=/home/zeppelin/prerequisites/git install +echo "export PATH=$PATH:/home/zeppelin/prerequisites/git/bin" >> /home/zeppelin/.bashrc +source /home/zeppelin/.bashrc +git --version +``` + +Assuming all the packages are successfully installed, running the version option with the git command should display + +```bash +git version 2.4.8 +``` + +#### Java +Zeppelin works well with the 1.7.x version of the Java runtime. Download JDK version 7 with a stable update and follow the instructions below to install it.
+ +```bash +cd /home/zeppelin/prerequisites/ +#Download JDK 1.7, Assume JDK 7 update 79 is downloaded. +tar -xf jdk-7u79-linux-x64.tar.gz +echo "export JAVA_HOME=/home/zeppelin/prerequisites/jdk1.7.0_79" >> /home/zeppelin/.bashrc +source /home/zeppelin/.bashrc +echo $JAVA_HOME +``` +Assuming all the packages are successfully installed, echoing JAVA_HOME environment variable should display + +```bash +/home/zeppelin/prerequisites/jdk1.7.0_79 +``` + +#### Apache Maven +Download and install a stable version of Maven. + +```bash +cd /home/zeppelin/prerequisites/ +wget ftp://mirror.reverse.net/pub/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz +tar -xf apache-maven-3.3.3-bin.tar.gz +cd apache-maven-3.3.3 +export MAVEN_HOME=/home/zeppelin/prerequisites/apache-maven-3.3.3 +echo "export PATH=$PATH:/home/zeppelin/prerequisites/apache-maven-3.3.3/bin" >> /home/zeppelin/.bashrc +source /home/zeppelin/.bashrc +mvn -version +``` + +Assuming all the packages are successfully installed, running the version option with mvn command should display + +```bash +Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) +Maven home: /home/zeppelin/prerequisites/apache-maven-3.3.3 +Java version: 1.7.0_79, vendor: Oracle Corporation +Java home: /home/zeppelin/prerequisites/jdk1.7.0_79/jre +Default locale: en_US, platform encoding: UTF-8 +OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix" +``` + +#### Hadoop client +Zeppelin can work with multiple versions & distributions of Hadoop. A complete list [is available here.](https://github.com/apache/incubator-zeppelin#build) This document assumes Hadoop 2.7.x client libraries including configuration files are installed on Zeppelin node. It also assumes /etc/hadoop/conf contains various Hadoop configuration files. The location of Hadoop configuration files may vary, hence use appropriate location. 
+ +```bash +hadoop version +Hadoop 2.7.1.2.3.1.0-2574 +Subversion [email protected]:hortonworks/hadoop.git -r f66cf95e2e9367a74b0ec88b2df33458b6cff2d0 +Compiled by jenkins on 2015-07-25T22:36Z +Compiled with protoc 2.5.0 +From source with checksum 54f9bbb4492f92975e84e390599b881d +This command was run using /usr/hdp/2.3.1.0-2574/hadoop/lib/hadoop-common-2.7.1.2.3.1.0-2574.jar +``` + +#### Spark +Zeppelin can work with multiple versions of Spark. A complete list [is available here.](https://github.com/apache/incubator-zeppelin#build) This document assumes Spark 1.3.1 is installed on the Zeppelin node at /home/zeppelin/prerequisites/spark. + +## Build + +Check out the source code from [https://github.com/apache/incubator-zeppelin](https://github.com/apache/incubator-zeppelin) + +```bash +cd /home/zeppelin/ +git clone https://github.com/apache/incubator-zeppelin.git +``` +The Zeppelin source is available at /home/zeppelin/incubator-zeppelin after the checkout completes. + +### Cluster mode + +It is assumed that Hadoop 2.7.x is installed on the YARN cluster and Spark 1.3.1 is installed on the Zeppelin node, so the build options are chosen accordingly. This is very important because Zeppelin bundles the corresponding Hadoop & Spark libraries, and they must match the ones present on the YARN cluster and in the Spark installation on the Zeppelin node. + +Zeppelin is a Maven project and hence must be built with Apache Maven. + +```bash +cd /home/zeppelin/incubator-zeppelin +mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests +``` +Building Zeppelin for the first time downloads various dependencies and hence takes a few minutes to complete. + +## Zeppelin Configuration +The Zeppelin configuration needs to be modified to connect to the YARN cluster.
Create a copy of the Zeppelin environment script from its template + +```bash +cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh.template /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh +``` + +Set the following properties in zeppelin-env.sh + +```bash +export JAVA_HOME=/home/zeppelin/prerequisites/jdk1.7.0_79 +export HADOOP_CONF_DIR=/etc/hadoop/conf +export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.1.0-2574" +``` + +As /etc/hadoop/conf contains the configuration of the YARN cluster, Zeppelin can now submit Spark/Hive jobs to the YARN cluster from its web interface. The value of hdp.version is set to 2.3.1.0-2574. It can be obtained by running the following command + +```bash +hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/' +# It returned 2.3.1.0-2574 +``` + +## Start/Stop +### Start Zeppelin + +``` +cd /home/zeppelin/incubator-zeppelin +bin/zeppelin-daemon.sh start +``` +After a successful start, visit http://[zeppelin-server-host-name]:8080 with your web browser. + +### Stop Zeppelin + +``` +bin/zeppelin-daemon.sh stop +``` + +## Interpreter +Zeppelin provides interpreters for various distributed processing frameworks, such as Spark, Hive, Tajo, Ignite and Lens, to name a few. This document describes how to configure the Hive & Spark interpreters. + +### Hive +Zeppelin supports a Hive interpreter; to use it, copy hive-site.xml, which should be present at /etc/hive/conf, to the configuration folder of Zeppelin. Once Zeppelin is built, it will have a conf folder under /home/zeppelin/incubator-zeppelin. + +```bash +cp /etc/hive/conf/hive-site.xml /home/zeppelin/incubator-zeppelin/conf +``` + +Once the Zeppelin server has started successfully, visit http://[zeppelin-server-host-name]:8080 with your web browser. Click on the Interpreter tab next to the Notebook dropdown. Look for the Hive configurations and set them appropriately. By default, hive.hiveserver2.url points to localhost and hive.hiveserver2.password/hive.hiveserver2.user are set to hive/hive.
Set them to match the Hive installation on the YARN cluster. +Click on the Save button. Once these configurations are updated, Zeppelin will prompt you to restart the interpreter. Accept the prompt and the interpreter will reload the configurations. + +### Spark +Zeppelin was built with Spark 1.3.1, and it was assumed that Spark 1.3.1 is installed at /home/zeppelin/prerequisites/spark. Look for the Spark configurations and click the edit button to add the following properties + +<table class="table-configuration"> + <tr> + <th>Property Name</th> + <th>Property Value</th> + <th>Remarks</th> + </tr> + <tr> + <td>master</td> + <td>yarn-client</td> + <td>In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.</td> + </tr> + <tr> + <td>spark.home</td> + <td>/home/zeppelin/prerequisites/spark</td> + <td></td> + </tr> + <tr> + <td>spark.driver.extraJavaOptions</td> + <td>-Dhdp.version=2.3.1.0-2574</td> + <td></td> + </tr> + <tr> + <td>spark.yarn.am.extraJavaOptions</td> + <td>-Dhdp.version=2.3.1.0-2574</td> + <td></td> + </tr> + <tr> + <td>spark.yarn.jar</td> + <td>/home/zeppelin/incubator-zeppelin/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar</td> + <td></td> + </tr> +</table> + +Click on the Save button. Once these configurations are updated, Zeppelin will prompt you to restart the interpreter. Accept the prompt and the interpreter will reload the configurations. + +Spark & Hive notebooks can be written with Zeppelin now. The resulting Spark & Hive jobs will run on the configured YARN cluster. + +## Debug +Zeppelin does not emit detailed error messages on the web interface when a notebook/paragraph is run. If a paragraph fails, it only displays ERROR. The reason for the failure has to be looked up in the log files, which are located in the logs directory under the Zeppelin installation base directory. Zeppelin creates a log file for each kind of interpreter.
+ +```bash +[zeppelin@zeppelin-3529 logs]$ pwd +/home/zeppelin/incubator-zeppelin/logs +[zeppelin@zeppelin-3529 logs]$ ls -l +total 844 +-rw-rw-r-- 1 zeppelin zeppelin 14648 Aug 3 14:45 zeppelin-interpreter-hive-zeppelin-zeppelin-3529.log +-rw-rw-r-- 1 zeppelin zeppelin 625050 Aug 3 16:05 zeppelin-interpreter-spark-zeppelin-zeppelin-3529.log +-rw-rw-r-- 1 zeppelin zeppelin 200394 Aug 3 21:15 zeppelin-zeppelin-zeppelin-3529.log +-rw-rw-r-- 1 zeppelin zeppelin 16162 Aug 3 14:03 zeppelin-zeppelin-zeppelin-3529.out +[zeppelin@zeppelin-3529 logs]$ +``` http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/cassandra.md ---------------------------------------------------------------------- diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md new file mode 100644 index 0000000..b53295c --- /dev/null +++ b/docs/interpreter/cassandra.md @@ -0,0 +1,807 @@ +--- +layout: page +title: "Cassandra Interpreter" +description: "Cassandra Interpreter" +group: manual +--- +{% include JB/setup %} + +<hr/> +## 1. Cassandra CQL Interpreter for Apache Zeppelin + +<br/> +<table class="table-configuration"> + <tr> + <th>Name</th> + <th>Class</th> + <th>Description</th> + </tr> + <tr> + <td>%cassandra</td> + <td>CassandraInterpreter</td> + <td>Provides interpreter for Apache Cassandra CQL query language</td> + </tr> +</table> + +<hr/> + +## 2. Enabling Cassandra Interpreter + + In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra** + + <center> +  + +  + </center> + +<hr/> + +## 3. Using the Cassandra Interpreter + + In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands. + + To access the interactive help, type **HELP;** + + <center> +  + </center> + +<hr/> + +## 4. 
Interpreter Commands + + The **Cassandra** interpreter accepts the following commands + +<center> + <table class="table-configuration"> + <tr> + <th>Command Type</th> + <th>Command Name</th> + <th>Description</th> + </tr> + <tr> + <td nowrap>Help command</td> + <td>HELP</td> + <td>Display the interactive help menu</td> + </tr> + <tr> + <td nowrap>Schema commands</td> + <td>DESCRIBE KEYSPACE, DESCRIBE CLUSTER, DESCRIBE TABLES ...</td> + <td>Custom commands to describe the Cassandra schema</td> + </tr> + <tr> + <td nowrap>Option commands</td> + <td>@consistency, @retryPolicy, @fetchSize ...</td> + <td>Inject runtime options to all statements in the paragraph</td> + </tr> + <tr> + <td nowrap>Prepared statement commands</td> + <td>@prepare, @bind, @remove_prepare</td> + <td>Let you register a prepared statement and re-use it later by injecting bound values</td> + </tr> + <tr> + <td nowrap>Native CQL statements</td> + <td>All CQL-compatible statements (SELECT, INSERT, CREATE ...)</td> + <td>All CQL statements are executed directly against the Cassandra server</td> + </tr> + </table> +</center> + +<hr/> +## 5. CQL statements + +This interpreter is compatible with any CQL statement supported by Cassandra. Ex: + +```sql + + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + SELECT * FROM users WHERE login='jdoe'; +``` + +Each statement should be terminated by a semi-colon ( **;** ), except for the special commands below: + +1. @prepare +2. @bind +3. @remove_prepare +4. @consistency +5. @serialConsistency +6. @timestamp +7. @retryPolicy +8. @fetchSize + +Multi-line statements as well as multiple statements on the same line are also supported as long as they are +separated by a semi-colon.
Ex: + +```sql + + USE spark_demo; + + SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; + + SELECT * + FROM artists + WHERE login='jlennon'; +``` + +Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: + +```sql + + BEGIN BATCH + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); + APPLY BATCH; + + CREATE TABLE IF NOT EXISTS test( + key int PRIMARY KEY, + value text + ); +``` + +CQL statements are <strong>case-insensitive</strong> (except for column names and values). +This means that the following statements are equivalent and valid: + +```sql + + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + Insert into users(login,name) vAlues('hsue','Helen SUE'); +``` + +The complete list of all CQL statements and versions can be found below: +<center> + <table class="table-configuration"> + <tr> + <th>Cassandra Version</th> + <th>Documentation Link</th> + </tr> + <tr> + <td><strong>2.2</strong></td> + <td> + <a target="_blank" + href="http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html"> + http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html + </a> + </td> + </tr> + <tr> + <td><strong>2.1 & 2.0</strong></td> + <td> + <a target="_blank" + href="http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html"> + http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html + </a> + </td> + </tr> + <tr> + <td><strong>1.2</strong></td> + <td> + <a target="_blank" + href="http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html"> + http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html + </a> + </td> + </tr> + </table> +</center> + +<hr/> + +## 6. Comments in statements + +It is possible to add comments between statements. Single line comments start with the hash sign (#). Multi-line comments are enclosed between /** and **/. 
Ex: + +```sql + + #First comment + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + + /** + Multi line + comments + **/ + Insert into users(login,name) vAlues('hsue','Helen SUE'); +``` + +<hr/> + +## 7. Syntax Validation + +The interpreter is shipped with a built-in syntax validator. This validator only checks for basic syntax errors. +All CQL-related syntax validation is delegated directly to **Cassandra** + +Most of the time, syntax errors are due to **missing semi-colons** between statements or **typos**. + +<hr/> + +## 8. Schema commands + +To make schema discovery easier and more interactive, the following commands are supported: +<center> + <table class="table-configuration"> + <tr> + <th>Command</th> + <th>Description</th> + </tr> + <tr> + <td><strong>DESCRIBE CLUSTER;</strong></td> + <td>Show the current cluster name and its partitioner</td> + </tr> + <tr> + <td><strong>DESCRIBE KEYSPACES;</strong></td> + <td>List all existing keyspaces in the cluster and their configuration (replication factor, durable writes ...)</td> + </tr> + <tr> + <td><strong>DESCRIBE TABLES;</strong></td> + <td>List all existing keyspaces in the cluster and, for each, all the table names</td> + </tr> + <tr> + <td><strong>DESCRIBE TYPES;</strong></td> + <td>List all existing user defined types in the <strong>current (logged) keyspace</strong></td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE FUNCTIONS <keyspace_name>;</strong></td> + <td>List all existing user defined functions in the given keyspace</td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE AGGREGATES <keyspace_name>;</strong></td> + <td>List all existing user defined aggregates in the given keyspace</td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE KEYSPACE <keyspace_name>;</strong></td> + <td>Describe the given keyspace configuration and all its table details (name, columns, ...)</td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE TABLE (<keyspace_name>).<table_name>;</strong></td> + <td> + Describe the given
table. If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. + If no table is found, an error message is raised + </td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE TYPE (<keyspace_name>).<type_name>;</strong></td> + <td> + Describe the given type(UDT). If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. + If no type is found, an error message is raised + </td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE FUNCTION (<keyspace_name>).<function_name>;</strong></td> + <td>Describe the given user defined function. The keyspace is optional</td> + </tr> + <tr> + <td nowrap><strong>DESCRIBE AGGREGATE (<keyspace_name>).<aggregate_name>;</strong></td> + <td>Describe the given user defined aggregate. The keyspace is optional</td> + </tr> + </table> +</center> + +The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. +There is a drop-down menu on the top left corner to expand objects details. On the top right menu is shown the Icon legend. + +<br/> +<center> +  +</center> + +<hr/> + +## 9. Runtime Parameters + +Sometimes you want to be able to pass runtime query parameters to your statements. +Those parameters are not part of the CQL specs and are specific to the interpreter. 
+Below is the list of all parameters: + +<br/> +<center> + <table class="table-configuration"> + <tr> + <th>Parameter</th> + <th>Syntax</th> + <th>Description</th> + </tr> + <tr> + <td nowrap>Consistency Level</td> + <td><strong>@consistency=<em>value</em></strong></td> + <td>Apply the given consistency level to all queries in the paragraph</td> + </tr> + <tr> + <td nowrap>Serial Consistency Level</td> + <td><strong>@serialConsistency=<em>value</em></strong></td> + <td>Apply the given serial consistency level to all queries in the paragraph</td> + </tr> + <tr> + <td nowrap>Timestamp</td> + <td><strong>@timestamp=<em>long value</em></strong></td> + <td> + Apply the given timestamp to all queries in the paragraph. + Please note that timestamp value passed directly in CQL statement will override this value + </td> + </tr> + <tr> + <td nowrap>Retry Policy</td> + <td><strong>@retryPolicy=<em>value</em></strong></td> + <td>Apply the given retry policy to all queries in the paragraph</td> + </tr> + <tr> + <td nowrap>Fetch Size</td> + <td><strong>@fetchSize=<em>integer value</em></strong></td> + <td>Apply the given fetch size to all queries in the paragraph</td> + </tr> + </table> +</center> + + Some parameters only accept restricted values: + +<br/> +<center> + <table class="table-configuration"> + <tr> + <th>Parameter</th> + <th>Possible Values</th> + </tr> + <tr> + <td nowrap>Consistency Level</td> + <td><strong>ALL, ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM</strong></td> + </tr> + <tr> + <td nowrap>Serial Consistency Level</td> + <td><strong>SERIAL, LOCAL_SERIAL</strong></td> + </tr> + <tr> + <td nowrap>Timestamp</td> + <td>Any long value</td> + </tr> + <tr> + <td nowrap>Retry Policy</td> + <td><strong>DEFAULT, DOWNGRADING_CONSISTENCY, FALLTHROUGH, LOGGING_DEFAULT, LOGGING_DOWNGRADING, LOGGING_FALLTHROUGH</strong></td> + </tr> + <tr> + <td nowrap>Fetch Size</td> + <td>Any integer value</td> + </tr> + </table> +</center> + +>Please note that you 
should **not** add a semi-colon ( **;** ) at the end of each parameter statement + +Some examples: + +```sql + + CREATE TABLE IF NOT EXISTS spark_demo.ts( + key int PRIMARY KEY, + value text + ); + TRUNCATE spark_demo.ts; + + # Timestamp in the past + @timestamp=10 + + # Force timestamp directly in the first insert + INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100; + + # Select some data to make the clock turn + SELECT * FROM spark_demo.albums LIMIT 100; + + # Now insert using the timestamp parameter set at the beginning(10) + INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); + + # Check for the result. You should see 'first insert' + SELECT value FROM spark_demo.ts WHERE key=1; +``` + +Some remarks about query parameters: + +> 1. **many** query parameters can be set in the same paragraph +> 2. if the **same** query parameter is set several times with different values, the interpreter only takes the first value into account +> 3. each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause) +> 4. the order of each query parameter with regard to CQL statements does not matter + +<hr/> + +## 10. Support for Prepared Statements + +For performance reasons, it is better to prepare statements beforehand and reuse them later by providing bound values. +This interpreter provides 3 commands to handle prepared and bound statements: + +1. **@prepare** +2. **@bind** +3. **@remove_prepare** + +Example: + +``` + + @prepare[statement_name]=... + + @bind[statement_name]='text', 1223, '2015-07-30 12:00:01', null, true, ['list_item1', 'list_item2'] + + @bind[statement_name_with_no_bound_value] + + @remove_prepare[statement_name] +``` + +<br/> +#### a. @prepare +<br/> +You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement.
+The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and +saves the generated prepared statement in an **internal hash map**, using the provided _statement_name_ as search key. + +> Please note that this internal prepared statement map is shared with **all notebooks** and **all paragraphs** because +there is only one instance of the interpreter for Cassandra + +> If the interpreter encounters **several** @prepare for the **same _statement_name_ (key)**, only the **first** statement will be taken into account. + +Example: + +``` + + @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? + + @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ? +``` + +For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_. +_SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select. + +In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular intervals, +thus it is necessary to **avoid re-preparing the same statement many times (considered an anti-pattern)**. +<br/> +<br/> +#### b. @bind +<br/> +Once the statement is prepared (possibly in a separate notebook/paragraph), you can bind values to it: + +``` + @bind[select]=10 +``` + +Bound values are not mandatory for the **@bind** statement. However, if you provide bound values, they need to comply with the following syntax: + +* String values should be enclosed between single quotes ( ' ) +* Date values should be enclosed between single quotes ( ' ) and respect the formats: + 1. yyyy-MM-dd HH:mm:ss + 2. yyyy-MM-dd HH:mm:ss.SSS +* **null** is parsed as-is +* **boolean** (true|false) are parsed as-is +* collection values must follow the **[standard CQL syntax]**: + * list: ['list_item1', 'list_item2', ...]
+ * set: {'set_item1', 'set_item2', ...} + * map: {'key1': 'val1', 'key2': 'val2', ...} +* **tuple** values should be enclosed between parentheses (see **[Tuple CQL syntax]**): ('text', 123, true) +* **udt** values should be enclosed between curly braces (see **[UDT CQL syntax]**): {street_name: 'Beverly Hills', number: 104, zip_code: 90020, state: 'California', ...} + +> It is possible to use the @bind statement inside a batch: +> +> ```sql +> +> BEGIN BATCH +> @bind[insert_user]='jdoe','John DOE' +> UPDATE users SET age = 27 WHERE login='hsue'; +> APPLY BATCH; +> ``` + +<br/> +#### c. @remove_prepare +<br/> +To prevent a prepared statement from staying forever in the prepared statement map, you can use the +**@remove_prepare[statement_name]** syntax to remove it. +Removing a non-existing prepared statement yields no error. + +<hr/> + +## 11. Using Dynamic Forms + +Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choice forms. + +The syntax for a simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed, +we launch the CQL query before rendering the form so at least one value should be provided. + +The syntax for a multiple choice parameter is: **\{\{input_Label=value1 | value2 | ... | valueN \}\}**. By default the first choice is used for the CQL query +the first time the paragraph is executed. + +Example: + +{% raw %} + #Secondary index on performer style + SELECT name, country, performer + FROM spark_demo.performers + WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}' + AND styles CONTAINS '{{style=Rock}}'; +{% endraw %} + + +In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_. +For subsequent queries, you can change the value directly using the form.
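
The first-run rule for dynamic forms (a simple form falls back to its default value, a multiple choice form to its first choice) can be sketched in a few lines of Python. This is only an illustration of the substitution rule under stated assumptions, not Zeppelin's actual implementation: the function name `resolve_forms` and the regular expression are hypothetical and were chosen for this example.

```python
import re

# Illustrative pattern (an assumption, not Zeppelin's own parser):
# matches {{label=value}} and {{label=choice1|choice2|...}}.
FORM_PATTERN = re.compile(r"\{\{\s*([^=}]+?)\s*=\s*(.*?)\s*\}\}")


def resolve_forms(query, chosen=None):
    """Substitute every form placeholder in `query`.

    A value picked in the form (passed via `chosen`) wins; otherwise the
    default value, i.e. the first choice in the list, is used.
    """
    chosen = chosen or {}

    def substitute(match):
        label = match.group(1)
        choices = [choice.strip() for choice in match.group(2).split("|")]
        return chosen.get(label, choices[0])

    return FORM_PATTERN.sub(substitute, query)


query = (
    "SELECT name, country, performer FROM spark_demo.performers "
    "WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}' "
    "AND styles CONTAINS '{{style=Rock}}';"
)

# First execution: nothing has been chosen yet, so the defaults are used.
print(resolve_forms(query))
# Later execution: the user picked another performer in the form.
print(resolve_forms(query, {"performer": "Doof"}))
```

Keeping the user's choices in a separate dictionary mirrors the behaviour described here: the query text itself never changes, only the value injected into each placeholder.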
+ +> Please note that we enclosed the **\{\{ \}\}** block between single quotes ( **'** ) because Cassandra expects a String here. +> We could have also used the **\{\{style='Rock'\}\}** syntax but this time, the value displayed on the form is **_'Rock'_** and not **_Rock_**. + +It is also possible to use dynamic forms for **prepared statements**: + +{% raw %} + + @bind[select]='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}', '{{style=Rock}}' + +{% endraw %} + +<hr/> + +## 12. Execution parallelism and shared states + +It is possible to execute many paragraphs in parallel. However, on the back-end side, we're still using synchronous queries. +_Asynchronous execution_ is only possible when it is possible to return a `Future` value in the `InterpreterResult`. +It may be an interesting proposal for the **Zeppelin** project. + +Another caveat is that the same `com.datastax.driver.core.Session` object is used for **all** notebooks and paragraphs. +Consequently, if you use the **USE _keyspace name_;** statement to log into a keyspace, it will change the keyspace for +**all current users** of the **Cassandra** interpreter because we only create 1 `com.datastax.driver.core.Session` object +per instance of **Cassandra** interpreter. + +The same remark applies to the **prepared statement hash map**: it is shared by **all users** using the same instance of the **Cassandra** interpreter. + +Until **Zeppelin** offers real multi-user separation, there is a work-around to segregate user environments and states: +_create different **Cassandra** interpreter instances_ + +For this, first go to the **Interpreter** menu and click on the **Create** button +<br/> +<br/> +<center> + +</center> + +In the interpreter creation form, put **cass-instance2** as **Name** and select **cassandra** +in the interpreter drop-down list +<br/> +<br/> +<center> + +</center> + + Click on **Save** to create the new interpreter instance.
Now you should be able to see it in the interpreter list. + +<br/> +<br/> +<center> +  +</center> + +Go back to your notebook and click on the **Gear** icon to configure interpreter bindings. +You should be able to see and select the **cass-instance2** interpreter instance in the available +interpreter list instead of the standard **cassandra** instance. + +<br/> +<br/> +<center> +  +</center> + +<hr/> + +## 13. Interpreter Configuration + +To configure the **Cassandra** interpreter, go to the **Interpreter** menu and scroll down to change the parameters. +The **Cassandra** interpreter is using the official **[Cassandra Java Driver]** and most of the parameters are used +to configure the Java driver + +Below are the configuration parameters and their default value. + + + <table class="table-configuration"> + <tr> + <th>Property Name</th> + <th>Description</th> + <th>Default Value</th> + </tr> + <tr> + <td>cassandra.cluster</td> + <td>Name of the Cassandra cluster to connect to</td> + <td>Test Cluster</td> + </tr> + <tr> + <td>cassandra.compression.protocol</td> + <td>On wire compression. Possible values are: NONE, SNAPPY, LZ4</td> + <td>NONE</td> + </tr> + <tr> + <td>cassandra.credentials.username</td> + <td>If security is enable, provide the login</td> + <td>none</td> + </tr> + <tr> + <td>cassandra.credentials.password</td> + <td>If security is enable, provide the password</td> + <td>none</td> + </tr> + <tr> + <td>cassandra.hosts</td> + <td> + Comma separated Cassandra hosts (DNS name or IP address). + <br/> + Ex: '192.168.0.12,node2,node3' + </td> + <td>localhost</td> + </tr> + <tr> + <td>cassandra.interpreter.parallelism</td> + <td>Number of concurrent paragraphs(queries block) that can be executed</td> + <td>10</td> + </tr> + <tr> + <td>cassandra.keyspace</td> + <td> + Default keyspace to connect to. 
+ <strong> + It is strongly recommended to keep the default value + and prefix the table name with the actual keyspace + in all of your queries + </strong> + </td> + <td>system</td> + </tr> + <tr> + <td>cassandra.load.balancing.policy</td> + <td> + Load balancing policy. Default = <em>new TokenAwarePolicy(new DCAwareRoundRobinPolicy())</em> + To specify your own policy, provide the <strong>fully qualified class name (FQCN)</strong> of your policy. + At runtime the interpreter will instantiate the policy using + <strong>Class.forName(FQCN)</strong> + </td> + <td>DEFAULT</td> + </tr> + <tr> + <td>cassandra.max.schema.agreement.wait.second</td> + <td>Cassandra max schema agreement wait in seconds</td> + <td>10</td> + </tr> + <tr> + <td>cassandra.pooling.core.connection.per.host.local</td> + <td>Protocol V2 and below default = 2. Protocol V3 and above default = 1</td> + <td>2</td> + </tr> + <tr> + <td>cassandra.pooling.core.connection.per.host.remote</td> + <td>Protocol V2 and below default = 1. Protocol V3 and above default = 1</td> + <td>1</td> + </tr> + <tr> + <td>cassandra.pooling.heartbeat.interval.seconds</td> + <td>Cassandra pool heartbeat interval in secs</td> + <td>30</td> + </tr> + <tr> + <td>cassandra.pooling.idle.timeout.seconds</td> + <td>Cassandra idle time out in seconds</td> + <td>120</td> + </tr> + <tr> + <td>cassandra.pooling.max.connection.per.host.local</td> + <td>Protocol V2 and below default = 8. Protocol V3 and above default = 1</td> + <td>8</td> + </tr> + <tr> + <td>cassandra.pooling.max.connection.per.host.remote</td> + <td>Protocol V2 and below default = 2. Protocol V3 and above default = 1</td> + <td>2</td> + </tr> + <tr> + <td>cassandra.pooling.max.request.per.connection.local</td> + <td>Protocol V2 and below default = 128. Protocol V3 and above default = 1024</td> + <td>128</td> + </tr> + <tr> + <td>cassandra.pooling.max.request.per.connection.remote</td> + <td>Protocol V2 and below default = 128.
Protocol V3 and above default = 256</td> + <td>128</td> + </tr> + <tr> + <td>cassandra.pooling.new.connection.threshold.local</td> + <td>Protocol V2 and below default = 100. Protocol V3 and above default = 800</td> + <td>100</td> + </tr> + <tr> + <td>cassandra.pooling.new.connection.threshold.remote</td> + <td>Protocol V2 and below default = 100. Protocol V3 and above default = 200</td> + <td>100</td> + </tr> + <tr> + <td>cassandra.pooling.pool.timeout.millisecs</td> + <td>Cassandra pool time out in millisecs</td> + <td>5000</td> + </tr> + <tr> + <td>cassandra.protocol.version</td> + <td>Cassandra binary protocol version</td> + <td>3</td> + </tr> + <tr> + <td>cassandra.query.default.consistency</td> + <td> + Cassandra query default consistency level + <br/> + Available values: ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL + </td> + <td>ONE</td> + </tr> + <tr> + <td>cassandra.query.default.fetchSize</td> + <td>Cassandra query default fetch size</td> + <td>5000</td> + </tr> + <tr> + <td>cassandra.query.default.serial.consistency</td> + <td> + Cassandra query default serial consistency level + <br/> + Available values: SERIAL, LOCAL_SERIAL + </td> + <td>SERIAL</td> + </tr> + <tr> + <td>cassandra.reconnection.policy</td> + <td> + Cassandra Reconnection Policy. + Default = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000) + To specify your own policy, provide the <strong>fully qualified class name (FQCN)</strong> of your policy. + At runtime the interpreter will instantiate the policy using + <strong>Class.forName(FQCN)</strong> + </td> + <td>DEFAULT</td> + </tr> + <tr> + <td>cassandra.retry.policy</td> + <td> + Cassandra Retry Policy. + Default = DefaultRetryPolicy.INSTANCE + To specify your own policy, provide the <strong>fully qualified class name (FQCN)</strong> of your policy.
+ At runtime the interpreter will instantiate the policy using + <strong>Class.forName(FQCN)</strong> + </td> + <td>DEFAULT</td> + </tr> + <tr> + <td>cassandra.socket.connection.timeout.millisecs</td> + <td>Cassandra socket default connection timeout in millisecs</td> + <td>500</td> + </tr> + <tr> + <td>cassandra.socket.read.timeout.millisecs</td> + <td>Cassandra socket read timeout in millisecs</td> + <td>12000</td> + </tr> + <tr> + <td>cassandra.socket.tcp.no_delay</td> + <td>Cassandra socket TCP no delay</td> + <td>true</td> + </tr> + <tr> + <td>cassandra.speculative.execution.policy</td> + <td> + Cassandra Speculative Execution Policy. + Default = NoSpeculativeExecutionPolicy.INSTANCE + To specify your own policy, provide the <strong>fully qualified class name (FQCN)</strong> of your policy. + At runtime the interpreter will instantiate the policy using + <strong>Class.forName(FQCN)</strong> + </td> + <td>DEFAULT</td> + </tr> + </table> + +<hr/> + +## 14. Bugs & Contacts + + If you encounter a bug in this interpreter, please create a **[JIRA]** ticket and ping me on Twitter + at **[@doanduyhai]** + + +[Cassandra Java Driver]: https://github.com/datastax/java-driver +[standard CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html +[Tuple CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tupleType.html +[UDT CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html +[JIRA]: https://issues.apache.org/jira/browse/ZEPPELIN-382?jql=project%20%3D%20ZEPPELIN +[@doanduyhai]: https://twitter.com/doanduyhai http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/flink.md ---------------------------------------------------------------------- diff --git a/docs/interpreter/flink.md b/docs/interpreter/flink.md new file mode 100644 index 0000000..ce1f780 --- /dev/null +++ b/docs/interpreter/flink.md @@ -0,0 +1,68 @@ +--- +layout: page +title: "Flink Interpreter" +description:
"" +group: manual +--- +{% include JB/setup %} + + +## Flink interpreter for Apache Zeppelin +[Apache Flink](https://flink.apache.org) is an open source platform for distributed stream and batch data processing. + + +### How to start a local Flink cluster to test the interpreter +Zeppelin comes with a pre-configured flink-local interpreter, which starts Flink in local mode on your machine, so you do not need to install anything. + +### How to configure the interpreter to point to a Flink cluster +In the "Interpreters" menu, create a new Flink interpreter and provide the following properties: + +<table class="table-configuration"> + <tr> + <th>property</th> + <th>value</th> + <th>Description</th> + </tr> + <tr> + <td>host</td> + <td>local</td> + <td>host name of running JobManager. 'local' runs flink in local mode (default)</td> + </tr> + <tr> + <td>port</td> + <td>6123</td> + <td>port of running JobManager</td> + </tr> + <tr> + <td>xxx</td> + <td>yyy</td> + <td>anything else from [Flink Configuration](https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/config.html)</td> + </tr> +</table> +<br /> + + +### How to test it's working + +For example, the following [Zeppelin notebook](https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL05GTGFicy96ZXBwZWxpbi1ub3RlYm9va3MvbWFzdGVyL25vdGVib29rcy8yQVFFREs1UEMvbm90ZS5qc29u) is taken from [Till Rohrmann's presentation](http://www.slideshare.net/tillrohrmann/data-analysis-49806564) "Interactive data analysis with Apache Flink" for the Apache Flink Meetup.
+ + +``` +%sh +rm 10.txt.utf-8 +wget http://www.gutenberg.org/ebooks/10.txt.utf-8 +``` +``` +%flink +case class WordCount(word: String, frequency: Int) +val bible:DataSet[String] = env.readTextFile("10.txt.utf-8") +val partialCounts: DataSet[WordCount] = bible.flatMap{ + line => + """\b\w+\b""".r.findAllIn(line).map(word => WordCount(word, 1)) +// line.split(" ").map(word => WordCount(word, 1)) +} +val wordCounts = partialCounts.groupBy("word").reduce{ + (left, right) => WordCount(left.word, left.frequency + right.frequency) +} +val result10 = wordCounts.first(10).collect() +``` http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/c2cbafd1/docs/interpreter/geode.md ---------------------------------------------------------------------- diff --git a/docs/interpreter/geode.md b/docs/interpreter/geode.md new file mode 100644 index 0000000..96d1c04 --- /dev/null +++ b/docs/interpreter/geode.md @@ -0,0 +1,203 @@ +--- +layout: page +title: "Geode OQL Interpreter" +description: "" +group: manual +--- +{% include JB/setup %} + + +## Geode/Gemfire OQL Interpreter for Apache Zeppelin + +<br/> +<table class="table-configuration"> + <tr> + <th>Name</th> + <th>Class</th> + <th>Description</th> + </tr> + <tr> + <td>%geode.oql</td> + <td>GeodeOqlInterpreter</td> + <td>Provides OQL environment for Apache Geode</td> + </tr> +</table> + +<br/> +This interpreter supports the [Geode](http://geode.incubator.apache.org/) [Object Query Language (OQL)](http://geode-docs.cfapps.io/docs/developing/querying_basics/oql_compared_to_sql.html). With the OQL-based querying language: + +[<img align="right" src="http://img.youtube.com/vi/zvzzA9GXu3Q/3.jpg" alt="zeppelin-view" hspace="10" width="200"></img>](https://www.youtube.com/watch?v=zvzzA9GXu3Q) + +* You can query on any arbitrary object +* You can navigate object collections +* You can invoke methods and access the behavior of objects +* Data mapping is supported +* You are not required to declare types. 
Since you do not need type definitions, you can work across multiple languages +* You are not constrained by a schema + +This [Video Tutorial](https://www.youtube.com/watch?v=zvzzA9GXu3Q) illustrates some of the features provided by the `Geode Interpreter`. + +### Create Interpreter + +By default Zeppelin creates one `Geode/OQL` instance. You can remove it or create more instances. + +Multiple Geode instances can be created, each configured to the same or a different backend Geode cluster. However, at any given time a `Notebook` can have only one Geode interpreter instance `bound`. That means you _cannot_ connect to different Geode clusters in the same `Notebook`. This is a known Zeppelin limitation. + +To create a new Geode instance, open the `Interpreter` section and click the `+Create` button. Pick a `Name` of your choice and from the `Interpreter` drop-down select `geode`. Then follow the configuration instructions and `Save` the new instance. + +> Note: The `Name` of the instance is used only to distinguish the instances while binding them to the `Notebook`. The `Name` is irrelevant inside the `Notebook`. In the `Notebook` you must use the `%geode.oql` tag. + +### Bind to Notebook +In the `Notebook` click on the `settings` icon in the top right corner. Then select/deselect the interpreters to be bound with the `Notebook`. + +### Configuration +You can modify the configuration of the Geode interpreter from the `Interpreter` section. The Geode interpreter exposes the following properties: + + + <table class="table-configuration"> + <tr> + <th>Property Name</th> + <th>Description</th> + <th>Default Value</th> + </tr> + <tr> + <td>geode.locator.host</td> + <td>The Geode Locator Host</td> + <td>localhost</td> + </tr> + <tr> + <td>geode.locator.port</td> + <td>The Geode Locator Port</td> + <td>10334</td> + </tr> + <tr> + <td>geode.max.result</td> + <td>Maximum number of OQL results to display, to prevent browser overload</td> + <td>1000</td> + </tr> + </table> + +### How to use + +> *Tip 1: Use (CTRL + .)
for OQL auto-completion.* + +> *Tip 2: Always start the paragraphs with the full `%geode.oql` prefix tag! The short notation `%geode` would still be able to run the OQL queries, but syntax highlighting and auto-completion will be disabled.* + +#### Create / Destroy Regions + +The OQL specification does not support [Geode Regions](https://cwiki.apache.org/confluence/display/GEODE/Index#Index-MainConceptsandComponents) mutation operations. To `create`/`destroy` regions one should use the [GFSH](http://geode-docs.cfapps.io/docs/tools_modules/gfsh/chapter_overview.html) shell tool instead. For this to work, GFSH must be colocated with the Zeppelin server. + +```bash +%sh +source /etc/geode/conf/geode-env.sh +gfsh << EOF + + connect --locator=ambari.localdomain[10334] + + destroy region --name=/regionEmployee + destroy region --name=/regionCompany + create region --name=regionEmployee --type=REPLICATE + create region --name=regionCompany --type=REPLICATE + + exit; +EOF +``` + +The above snippet re-creates two regions: `regionEmployee` and `regionCompany`. Note that you have to explicitly specify the locator host and port. The values should match those you have used in the Geode Interpreter configuration. See the comprehensive list of [GFSH Commands by Functional Area](http://geode-docs.cfapps.io/docs/tools_modules/gfsh/gfsh_quick_reference.html). + +#### Basic OQL + + +```sql +%geode.oql +SELECT count(*) FROM /regionEmployee +``` + +OQL `IN` and `SET` filters: + +```sql +%geode.oql +SELECT * FROM /regionEmployee +WHERE companyId IN SET(2) OR lastName IN SET('Tzolov13', 'Tzolov73') +``` + +OQL `JOIN` operations + +```sql +%geode.oql +SELECT e.employeeId, e.firstName, e.lastName, c.id as companyId, c.companyName, c.address +FROM /regionEmployee e, /regionCompany c +WHERE e.companyId = c.id +``` + +By default the OQL responses contain only the region entry values.
To access the keys, query the `EntrySet` instead: + +```sql +%geode.oql +SELECT e.key, e.value.companyId, e.value.email +FROM /regionEmployee.entrySet e +``` +The following query will return the EntrySet value as a Blob: + +```sql +%geode.oql +SELECT e.key, e.value FROM /regionEmployee.entrySet e +``` + + +> Note: You can have multiple queries in the same paragraph but only the result from the first is displayed. [[1](https://issues.apache.org/jira/browse/ZEPPELIN-178)], [[2](https://issues.apache.org/jira/browse/ZEPPELIN-212)]. + + +#### GFSH Commands From The Shell + +Use the Shell Interpreter (`%sh`) to run GFSH commands from the command line: + +```bash +%sh +source /etc/geode/conf/geode-env.sh +gfsh -e "connect" -e "list members" +``` + +#### Apply Zeppelin Dynamic Forms + +You can leverage [Zeppelin Dynamic Form](https://zeppelin.incubator.apache.org/docs/manual/dynamicform.html) inside your OQL queries. You can use both the `text input` and `select form` parametrization features. + +```sql +%geode.oql +SELECT * FROM /regionEmployee e WHERE e.employeeId > ${Id} +``` + +#### Geode REST API +To list the defined regions you can use the [Geode REST API](http://geode-docs.cfapps.io/docs/geode_rest/chapter_overview.html): + +``` +http://<geode server hostname>phd1.localdomain:8484/gemfire-api/v1/ +``` + +```json +{ + "regions" : [{ + "name" : "regionEmployee", + "type" : "REPLICATE", + "key-constraint" : null, + "value-constraint" : null + }, { + "name" : "regionCompany", + "type" : "REPLICATE", + "key-constraint" : null, + "value-constraint" : null + }] +} +``` + +> To enable Geode REST API with JSON support add the following properties to geode.server.properties.file and restart: + +``` +http-service-port=8484 +start-dev-rest-api=true +``` + +### Auto-completion +The Geode Interpreter provides basic auto-completion functionality. On `(Ctrl+.)` it lists the most relevant suggestions in a pop-up window. + +
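
The region listing returned by the Geode REST API (see the sample JSON in the "Geode REST API" section) can also be consumed by a script, for instance to generate one OQL paragraph per region. The following is only a minimal sketch in Python: the helper names `region_names` and `oql_count_query` are hypothetical and not part of Geode or Zeppelin, and the payload is the sample response copied from this page rather than the result of a live HTTP call.

```python
import json

# Sample payload copied from the "Geode REST API" section of this page.
# In a real setup it would be fetched from the REST endpoint instead.
SAMPLE_RESPONSE = """
{
  "regions" : [{
    "name" : "regionEmployee",
    "type" : "REPLICATE",
    "key-constraint" : null,
    "value-constraint" : null
  }, {
    "name" : "regionCompany",
    "type" : "REPLICATE",
    "key-constraint" : null,
    "value-constraint" : null
  }]
}
"""


def region_names(payload):
    """Return the region names found in a region-listing response (hypothetical helper)."""
    document = json.loads(payload)
    return [region["name"] for region in document.get("regions", [])]


def oql_count_query(region_name):
    """Build the count query used in the "Basic OQL" examples (hypothetical helper)."""
    return "SELECT count(*) FROM /{}".format(region_name)


if __name__ == "__main__":
    # Print one OQL count query per region listed in the payload.
    for name in region_names(SAMPLE_RESPONSE):
        print(oql_count_query(name))
```

Each printed query can be pasted into its own `%geode.oql` paragraph, since only the result of the first query in a paragraph is displayed.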
