Author: moon
Date: Thu Jun 9 16:31:17 2016
New Revision: 1747561
URL: http://svn.apache.org/viewvc?rev=1747561&view=rev
Log:
[ZEPPELIN-840] Scalding interpreter that works in hdfs mode
Removed:
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/displaysystem/display.html
Modified:
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml Thu Jun 9 16:31:17
2016
@@ -4,7 +4,7 @@
<title>Apache Zeppelin</title>
<link href="http://zeppelin.apache.org/" rel="self"/>
<link href="http://zeppelin.apache.org"/>
- <updated>2016-06-08T11:53:05-07:00</updated>
+ <updated>2016-06-09T09:31:10-07:00</updated>
<id>http://zeppelin.apache.org</id>
<author>
<name>The Apache Software Foundation</name>
Modified:
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
---
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
(original)
+++
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
Thu Jun 9 16:31:17 2016
@@ -330,7 +330,7 @@ codes for myintp2
<li><a
href="https://github.com/apache/incubator-zeppelin/tree/master/spark">spark</a></li>
<li><a
href="https://github.com/apache/incubator-zeppelin/tree/master/markdown">markdown</a></li>
<li><a
href="https://github.com/apache/incubator-zeppelin/tree/master/shell">shell</a></li>
-<li><a
href="https://github.com/apache/incubator-zeppelin/tree/master/hive">hive</a></li>
+<li><a
href="https://github.com/apache/incubator-zeppelin/tree/master/jdbc">jdbc</a></li>
</ul>
<h3>Contributing a new Interpreter to Zeppelin releases</h3>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html Thu Jun 9 16:31:17
2016
@@ -202,7 +202,7 @@ limitations under the License.
<h3>Multiple language backend</h3>
<p>Zeppelin interpreter concept allows any language/data-processing-backend to
be plugged into Zeppelin.
-Currently Zeppelin supports many interpreters such as Scala(with Apache
Spark), Python(with Apache Spark), SparkSQL, Hive, Markdown and Shell.</p>
+Currently Zeppelin supports many interpreters such as Scala(with Apache
Spark), Python(with Apache Spark), SparkSQL, JDBC, Markdown and Shell.</p>
<p><img class="img-responsive"
src="/assets/themes/zeppelin/img/screenshots/multiple_language_backend.png"
/></p>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html Thu Jun 9
16:31:17 2016
@@ -393,7 +393,7 @@ limitations under the License.
<td>ZEPPELIN_INTERPRETERS</td>
<td>zeppelin.interpreters</td>
<description></description>
- <td>org.apache.zeppelin.spark.SparkInterpreter,<br
/>org.apache.zeppelin.spark.PySparkInterpreter,<br
/>org.apache.zeppelin.spark.SparkSqlInterpreter,<br
/>org.apache.zeppelin.spark.DepInterpreter,<br
/>org.apache.zeppelin.markdown.Markdown,<br
/>org.apache.zeppelin.shell.ShellInterpreter,<br
/>org.apache.zeppelin.hive.HiveInterpreter<br />
+ <td>org.apache.zeppelin.spark.SparkInterpreter,<br
/>org.apache.zeppelin.spark.PySparkInterpreter,<br
/>org.apache.zeppelin.spark.SparkSqlInterpreter,<br
/>org.apache.zeppelin.spark.DepInterpreter,<br
/>org.apache.zeppelin.markdown.Markdown,<br
/>org.apache.zeppelin.shell.ShellInterpreter,<br />
...
</td>
<td>Comma separated interpreter configurations [Class] <br /> The first
interpreter will be a default value. <br /> It means only the first interpreter
in this list can be available without <code>%interpreter_name</code> annotation
in Zeppelin notebook paragraph. </td>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html
(original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html Thu
Jun 9 16:31:17 2016
@@ -179,7 +179,7 @@ limitations under the License.
<h2>Introduction</h2>
-<p>This page describes how to pre-configure a bare metal node, configure
Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of
Hadoop. It also describes steps to configure Spark & Hive interpreter of
Zeppelin.</p>
+<p>This page describes how to pre-configure a bare metal node, configure
Zeppelin and connect it to an existing YARN cluster running the Hortonworks flavour of
Hadoop. It also describes the steps to configure the Spark interpreter of Zeppelin.</p>
<h2>Prepare Node</h2>
@@ -266,14 +266,13 @@ bin/zeppelin-daemon.sh start
</code></pre></div>
<h2>Interpreter</h2>
-<p>Zeppelin provides various distributed processing frameworks to process data
that ranges from Spark, Hive, Tajo, Ignite and Lens to name a few. This
document describes to configure Hive & Spark interpreters.</p>
+<p>Zeppelin provides interpreters for various distributed processing frameworks,
including Spark, JDBC, Tajo, Ignite and Lens, to name a few. This
document describes how to configure the JDBC & Spark interpreters.</p>
<h3>Hive</h3>
-<p>Zeppelin supports Hive interpreter and hence copy hive-site.xml that should
be present at /etc/hive/conf to the configuration folder of Zeppelin. Once
Zeppelin is built it will have conf folder under
/home/zeppelin/incubator-zeppelin.</p>
-<div class="highlight"><pre><code class="bash language-bash"
data-lang="bash">cp /etc/hive/conf/hive-site.xml
/home/zeppelin/incubator-zeppelin/conf
-</code></pre></div>
-<p>Once Zeppelin server has started successfully, visit
http://[zeppelin-server-host-name]:8080 with your web browser. Click on
Interpreter tab next to Notebook dropdown. Look for Hive configurations and set
them appropriately. By default hive.hiveserver2.url will be pointing to
localhost and hive.hiveserver2.password/hive.hiveserver2.user are set to
hive/hive. Set them as per Hive installation on YARN cluster.
+<p>Zeppelin supports Hive through the JDBC interpreter. The connection
information you need for Hive can be found in your hive-site.xml.</p>
+
+<p>Once the Zeppelin server has started successfully, visit
http://[zeppelin-server-host-name]:8080 with your web browser. Click on the
Interpreter tab next to the Notebook dropdown. Look for the Hive configurations and set
them to match the Hive installation on your YARN cluster.
Click on Save button. Once these configurations are updated, Zeppelin will
prompt you to restart the interpreter. Accept the prompt and the interpreter
will reload the configurations.</p>
<h3>Spark</h3>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html Thu Jun
9 16:31:17 2016
@@ -167,6 +167,54 @@
<p>The <a href="https://hive.apache.org/">Apache Hive</a>™ data warehouse
software facilitates querying and managing large datasets residing in
distributed storage. Hive provides a mechanism to project structure onto this
data and query the data using a SQL-like language called HiveQL. At the same
time this language also allows traditional map/reduce programmers to plug in
their custom mappers and reducers when it is inconvenient or inefficient to
express this logic in HiveQL.</p>
+<h2>Important Notice</h2>
+
+<p>The Hive interpreter will be deprecated and merged into the JDBC interpreter. You
can get the same functionality by using the JDBC interpreter with the
settings and dependencies shown below.</p>
+
+<h3>Properties</h3>
+
+<table class="table-configuration">
+ <tr>
+ <th>Property</th>
+ <th>Value</th>
+ </tr>
+ <tr>
+ <td>hive.driver</td>
+ <td>org.apache.hive.jdbc.HiveDriver</td>
+ </tr>
+ <tr>
+ <td>hive.url</td>
+ <td>jdbc:hive2://localhost:10000</td>
+ </tr>
+ <tr>
+ <td>hive.user</td>
+ <td>hiveUser</td>
+ </tr>
+ <tr>
+ <td>hive.password</td>
+ <td>hivePassword</td>
+ </tr>
+</table>
+
+<h3>Dependencies</h3>
+
+<table class="table-configuration">
+ <tr>
+ <th>Artifact</th>
+ <th>Exclude</th>
+ </tr>
+ <tr>
+ <td>org.apache.hive:hive-jdbc:0.14.0</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>org.apache.hadoop:hadoop-common:2.6.0</td>
+ <td></td>
+ </tr>
+</table>
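<p>Before entering these properties in Zeppelin, it can help to confirm that HiveServer2 accepts the same URL and credentials. The following is a minimal shell sketch, assuming the beeline client that ships with Hive is installed. The helper name hive_jdbc_url is hypothetical, and the host, port, user and password are the placeholder values from the table above, not real settings.</p>

```shell
# Hypothetical helper: build a HiveServer2 JDBC URL from a host and a port.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "$1" "$2"
}

# Same value as the hive.url property in the table above.
url="$(hive_jdbc_url localhost 10000)"

# Try the connection only when beeline is available on this machine.
if command -v beeline >/dev/null 2>&1; then
  # -u: JDBC URL, -n: user name, -p: password, -e: query to run
  beeline -u "$url" -n hiveUser -p hivePassword -e 'show tables;' \
    || echo "could not connect to $url" >&2
fi
```

<p>If the query succeeds here, the same URL, user and password should be usable as hive.url, hive.user and hive.password in the interpreter settings.</p>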
+
+<hr>
+
<h3>Configuration</h3>
<table class="table-configuration">
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html Thu Jun
9 16:31:17 2016
@@ -357,6 +357,52 @@
</tr>
</table>
+<h3>Examples</h3>
+
+<h4>Hive</h4>
+
+<h5>Properties</h5>
+
+<p><table class="table-configuration">
+ <tr>
+ <th>Name</th>
+ <th>Value</th>
+ </tr>
+ <tr>
+ <td>hive.driver</td>
+ <td>org.apache.hive.jdbc.HiveDriver</td>
+ </tr>
+ <tr>
+ <td>hive.url</td>
+ <td>jdbc:hive2://localhost:10000</td>
+ </tr>
+ <tr>
+ <td>hive.user</td>
+ <td>hive_user</td>
+ </tr>
+ <tr>
+ <td>hive.password</td>
+ <td>hive_password</td>
+ </tr>
+ </table></p>
+
+<h5>Dependencies</h5>
+
+<p><table class="table-configuration">
+ <tr>
+ <th>Artifact</th>
+ <th>Excludes</th>
+ </tr>
+ <tr>
+ <td>org.apache.hive:hive-jdbc:0.14.0</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>org.apache.hadoop:hadoop-common:2.6.0</td>
+ <td></td>
+ </tr>
+ </table></p>
+
<h3>How to use</h3>
<h4>Reference in paragraph</h4>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html
(original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html Thu
Jun 9 16:31:17 2016
@@ -186,11 +186,50 @@
<h3>Configuring the Interpreter</h3>
-<p>Zeppelin comes with a pre-configured Scalding interpreter in local mode, so
you do not need to install anything.</p>
+<p>Scalding interpreter runs in two modes:</p>
+
+<ul>
+<li>local</li>
+<li>hdfs</li>
+</ul>
+
+<p>In local mode, you can access files on the local server, and Scalding
transformations are run locally.</p>
+
+<p>In hdfs mode, you can access files in HDFS, and Scalding transformations are
run as Hadoop map-reduce jobs.</p>
+
+<p>Zeppelin comes with a pre-configured Scalding interpreter in local mode.</p>
+
+<p>To run the scalding interpreter in the hdfs mode you have to do the
following:</p>
+
+<p><strong>Set the classpath with ZEPPELIN_CLASSPATH_OVERRIDES</strong></p>
+
+<p>In conf/zeppelin-env.sh, you have to set
+ZEPPELIN_CLASSPATH_OVERRIDES to the output of the <code>hadoop classpath</code> command,
+plus any directories with custom jar files you need for your Scalding commands.</p>
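<p>The classpath step above can be sketched as the following fragment for conf/zeppelin-env.sh. This is an illustration under stated assumptions, not the project's reference configuration: the helper name build_classpath_overrides is hypothetical, and /opt/scalding/jars is a placeholder directory that does not come from the documentation.</p>

```shell
# Hypothetical helper: join the Hadoop classpath with directories of custom jars.
# $1 is the output of `hadoop classpath`; every further argument is a jar directory.
build_classpath_overrides() {
  result="$1"
  shift
  for dir in "$@"; do
    # Append a wildcard entry so every jar in the directory is picked up.
    result="${result}:${dir}/*"
  done
  printf '%s\n' "$result"
}

# Only query Hadoop when the CLI is actually on the PATH.
if command -v hadoop >/dev/null 2>&1; then
  # /opt/scalding/jars is a placeholder; replace it with your own jar directories.
  ZEPPELIN_CLASSPATH_OVERRIDES="$(build_classpath_overrides "$(hadoop classpath)" /opt/scalding/jars)"
  export ZEPPELIN_CLASSPATH_OVERRIDES
fi
```

<p>Computing the value with <code>hadoop classpath</code> at startup, rather than pasting a fixed string, keeps the setting in step with the Hadoop installation on the node.</p>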
+
+<p><strong>Set arguments to the scalding repl</strong></p>
+
+<p>The default arguments are: "--local --repl"</p>
+
+<p>For hdfs mode, use these arguments instead: "--hdfs --repl"</p>
+
+<p>If you want to add custom jars, you need to add:
+"-libjars directory/*:directory/*"</p>
+
+<p>For reducer estimation, you need to add something like:
+"-Dscalding.reducer.estimator.classes=com.twitter.scalding.reducer_estimation.InputSizeReducerEstimator"</p>
+
+<p><strong>Set max.open.instances</strong></p>
+
+<p>If you want to control the maximum number of open interpreters, select the
"scoped" setting for the interpreter for note
+option and set the max.open.instances argument.</p>
<h3>Testing the Interpreter</h3>
-<p>In example, by using the <a
href="https://gist.github.com/johnynek/a47699caa62f4f38a3e2">Alice in
Wonderland</a> tutorial, we will count words (of course!), and plot a graph of
the top 10 words in the book.</p>
+<h4>Local mode</h4>
+
+<p>In this example, using the <a
href="https://gist.github.com/johnynek/a47699caa62f4f38a3e2">Alice in
Wonderland</a> tutorial,
+we will count words (of course!) and plot a graph of the top 10 words in the
book.</p>
<div class="highlight"><pre><code class="text language-text"
data-lang="text">%scalding
import scala.io.Source
@@ -223,11 +262,36 @@ print("%table " + table)
<p>If you click on the icon for the pie chart, you should be able to see a
chart like this:
<img src="../assets/themes/zeppelin/img/docs-img/scalding-pie.png"
alt="Scalding - Pie - Chart"></p>
-<h3>Current Status & Future Work</h3>
+<h4>HDFS mode</h4>
+
+<p><strong>Test mode</strong></p>
+<div class="highlight"><pre><code class="text language-text"
data-lang="text">%scalding
+mode
+</code></pre></div>
+<p>This command should print:</p>
+<div class="highlight"><pre><code class="text language-text"
data-lang="text">res4: com.twitter.scalding.Mode = Hdfs(true,Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml)
+</code></pre></div>
+<p><strong>Test HDFS read</strong></p>
+<div class="highlight"><pre><code class="text language-text"
data-lang="text">val testfile =
TypedPipe.from(TextLine("/user/x/testfile"))
+testfile.dump
+</code></pre></div>
+<p>This command should print the contents of the hdfs file
/user/x/testfile.</p>
+
+<p><strong>Test map-reduce job</strong></p>
+<div class="highlight"><pre><code class="text language-text"
data-lang="text">val testfile =
TypedPipe.from(TextLine("/user/x/testfile"))
+val a = testfile.groupAll.size.values
+a.toList
+</code></pre></div>
+<p>This command should create a map reduce job.</p>
-<p>The current implementation of the Scalding interpreter does not support
canceling jobs, or fine-grained progress updates.</p>
+<h3>Future Work</h3>
-<p>The pre-configured Scalding interpreter only supports Scalding in local
mode. Hadoop mode for Scalding is currently unsupported, and will be future
work (contributions welcome!).</p>
+<ul>
+<li>Better user feedback (hadoop url, progress updates)</li>
+<li>Ability to cancel jobs</li>
+<li>Ability to dynamically load jars without restarting the interpreter</li>
+<li>Multiuser scalability (run scalding interpreters on different servers)</li>
+</ul>
</div>
</div>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html
(original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html Thu
Jun 9 16:31:17 2016
@@ -181,7 +181,7 @@ limitations under the License.
<p>In this section, we will explain about the role of interpreters,
interpreters group and interpreter settings in Zeppelin.
The concept of Zeppelin interpreter allows any
language/data-processing-backend to be plugged into Zeppelin.
-Currently, Zeppelin supports many interpreters such as Scala ( with Apache
Spark ), Python ( with Apache Spark ), SparkSQL, Hive, Markdown, Shell and so
on.</p>
+Currently, Zeppelin supports many interpreters such as Scala ( with Apache
Spark ), Python ( with Apache Spark ), SparkSQL, JDBC, Markdown, Shell and so
on.</p>
<h2>What is Zeppelin interpreter?</h2>
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml Thu Jun 9 16:31:17 2016
@@ -5,8 +5,8 @@
<description>Apache Zeppelin - The Apache Software
Foundation</description>
<link>http://zeppelin.apache.org</link>
<link>http://zeppelin.apache.org</link>
- <lastBuildDate>2016-06-08T11:53:05-07:00</lastBuildDate>
- <pubDate>2016-06-08T11:53:05-07:00</pubDate>
+ <lastBuildDate>2016-06-09T09:31:10-07:00</lastBuildDate>
+ <pubDate>2016-06-09T09:31:10-07:00</pubDate>
<ttl>1800</ttl>
Modified:
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
URL:
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
---
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
(original)
+++
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
Thu Jun 9 16:31:17 2016
@@ -187,7 +187,7 @@ limitations under the License.
<p>Data source authorization involves authenticating to the data source, such as a
MySQL database, and letting it determine user permissions.</p>
-<p>For the Hive interpreter, we need to maintain per-user connection pools.
+<p>For the JDBC interpreter, we need to maintain per-user connection pools.
The interpret method takes the user string as parameter and executes the jdbc
call using a connection in the user's connection pool.</p>
<p>In case of Presto, we don't need password if the Presto DB server runs
backend code using HDFS authorization for the user.