Author: moon
Date: Thu Jun  9 16:31:17 2016
New Revision: 1747561

URL: http://svn.apache.org/viewvc?rev=1747561&view=rev
Log:
[ZEPPELIN-840] Scalding interpreter that works in hdfs mode

Removed:
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/displaysystem/display.html
Modified:
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml
    incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/atom.xml Thu Jun  9 16:31:17 
2016
@@ -4,7 +4,7 @@
  <title>Apache Zeppelin</title>
  <link href="http://zeppelin.apache.org/" rel="self"/>
  <link href="http://zeppelin.apache.org"/>
- <updated>2016-06-08T11:53:05-07:00</updated>
+ <updated>2016-06-09T09:31:10-07:00</updated>
  <id>http://zeppelin.apache.org</id>
  <author>
    <name>The Apache Software Foundation</name>

Modified: 
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- 
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
 (original)
+++ 
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/development/writingzeppelininterpreter.html
 Thu Jun  9 16:31:17 2016
@@ -330,7 +330,7 @@ codes for myintp2
 <li><a 
href="https://github.com/apache/incubator-zeppelin/tree/master/spark">spark</a></li>
 <li><a 
href="https://github.com/apache/incubator-zeppelin/tree/master/markdown">markdown</a></li>
 <li><a 
href="https://github.com/apache/incubator-zeppelin/tree/master/shell">shell</a></li>
-<li><a 
href="https://github.com/apache/incubator-zeppelin/tree/master/hive">hive</a></li>
+<li><a 
href="https://github.com/apache/incubator-zeppelin/tree/master/jdbc">jdbc</a></li>
 </ul>
 
 <h3>Contributing a new Interpreter to Zeppelin releases</h3>

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/index.html Thu Jun  9 16:31:17 
2016
@@ -202,7 +202,7 @@ limitations under the License.
 <h3>Multiple language backend</h3>
 
 <p>Zeppelin interpreter concept allows any language/data-processing-backend to 
be plugged into Zeppelin.
-Currently Zeppelin supports many interpreters such as Scala(with Apache 
Spark), Python(with Apache Spark), SparkSQL, Hive, Markdown and Shell.</p>
+Currently Zeppelin supports many interpreters such as Scala(with Apache 
Spark), Python(with Apache Spark), SparkSQL, JDBC, Markdown and Shell.</p>
 
 <p><img class="img-responsive" 
src="/assets/themes/zeppelin/img/screenshots/multiple_language_backend.png" 
/></p>
 

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/install.html Thu Jun  9 
16:31:17 2016
@@ -393,7 +393,7 @@ limitations under the License.
     <td>ZEPPELIN_INTERPRETERS</td>
     <td>zeppelin.interpreters</td>
   <description></description>
-    <td>org.apache.zeppelin.spark.SparkInterpreter,<br 
/>org.apache.zeppelin.spark.PySparkInterpreter,<br 
/>org.apache.zeppelin.spark.SparkSqlInterpreter,<br 
/>org.apache.zeppelin.spark.DepInterpreter,<br 
/>org.apache.zeppelin.markdown.Markdown,<br 
/>org.apache.zeppelin.shell.ShellInterpreter,<br 
/>org.apache.zeppelin.hive.HiveInterpreter<br />
+    <td>org.apache.zeppelin.spark.SparkInterpreter,<br 
/>org.apache.zeppelin.spark.PySparkInterpreter,<br 
/>org.apache.zeppelin.spark.SparkSqlInterpreter,<br 
/>org.apache.zeppelin.spark.DepInterpreter,<br 
/>org.apache.zeppelin.markdown.Markdown,<br 
/>org.apache.zeppelin.shell.ShellInterpreter,<br />
     ...
     </td>
     <td>Comma separated interpreter configurations [Class] <br /> The first 
interpreter will be a default value. <br /> It means only the first interpreter 
in this list can be available without <code>%interpreter_name</code> annotation 
in Zeppelin notebook paragraph. </td>

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html 
(original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/install/yarn_install.html Thu 
Jun  9 16:31:17 2016
@@ -179,7 +179,7 @@ limitations under the License.
 
 <h2>Introduction</h2>
 
-<p>This page describes how to pre-configure a bare metal node, configure 
Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of 
Hadoop. It also describes steps to configure Spark &amp; Hive interpreter of 
Zeppelin.</p>
+<p>This page describes how to pre-configure a bare metal node, configure 
Zeppelin and connect it to existing YARN cluster running Hortonworks flavour of 
Hadoop. It also describes steps to configure Spark interpreter of Zeppelin.</p>
 
 <h2>Prepare Node</h2>
 
@@ -266,14 +266,13 @@ bin/zeppelin-daemon.sh start
 </code></pre></div>
 <h2>Interpreter</h2>
 
-<p>Zeppelin provides various distributed processing frameworks to process data 
that ranges from Spark, Hive, Tajo, Ignite and Lens to name a few. This 
document describes to configure Hive &amp; Spark interpreters.</p>
+<p>Zeppelin provides various distributed processing frameworks to process data 
that ranges from Spark, JDBC, Tajo, Ignite and Lens to name a few. This 
document describes how to configure JDBC &amp; Spark interpreters.</p>
 
 <h3>Hive</h3>
 
-<p>Zeppelin supports Hive interpreter and hence copy hive-site.xml that should 
be present at /etc/hive/conf to the configuration folder of Zeppelin. Once 
Zeppelin is built it will have conf folder under 
/home/zeppelin/incubator-zeppelin.</p>
-<div class="highlight"><pre><code class="bash language-bash" 
data-lang="bash">cp /etc/hive/conf/hive-site.xml  
/home/zeppelin/incubator-zeppelin/conf
-</code></pre></div>
-<p>Once Zeppelin server has started successfully, visit 
http://[zeppelin-server-host-name]:8080 with your web browser. Click on 
Interpreter tab next to Notebook dropdown. Look for Hive configurations and set 
them appropriately. By default hive.hiveserver2.url will be pointing to 
localhost and hive.hiveserver2.password/hive.hiveserver2.user are set to 
hive/hive. Set them as per Hive installation on YARN cluster.
+<p>Zeppelin supports Hive through the JDBC interpreter. The connection 
information you need for Hive can be found in your hive-site.xml.</p>
+
+<p>Once Zeppelin server has started successfully, visit 
http://[zeppelin-server-host-name]:8080 with your web browser. Click on 
Interpreter tab next to Notebook dropdown. Look for Hive configurations and set 
them appropriately. Set them as per Hive installation on YARN cluster.
 Click on Save button. Once these configurations are updated, Zeppelin will 
prompt you to restart the interpreter. Accept the prompt and the interpreter 
will reload the configurations.</p>
 
 <h3>Spark</h3>

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/hive.html Thu Jun  
9 16:31:17 2016
@@ -167,6 +167,54 @@
 
 <p>The <a href="https://hive.apache.org/">Apache Hive</a> ™ data warehouse 
software facilitates querying and managing large datasets residing in 
distributed storage. Hive provides a mechanism to project structure onto this 
data and query the data using a SQL-like language called HiveQL. At the same 
time this language also allows traditional map/reduce programmers to plug in 
their custom mappers and reducers when it is inconvenient or inefficient to 
express this logic in HiveQL.</p>
 
+<h2>Important Notice</h2>
+
+<p>Hive Interpreter will be deprecated and merged into JDBC Interpreter. You 
can get the same functionality by using JDBC Interpreter instead. See 
the example settings and dependencies below.</p>
+
+<h3>Properties</h3>
+
+<table class="table-configuration">
+  <tr>
+    <th>Property</th>
+    <th>Value</th>
+  </tr>
+  <tr>
+    <td>hive.driver</td>
+    <td>org.apache.hive.jdbc.HiveDriver</td>
+  </tr>
+  <tr>
+    <td>hive.url</td>
+    <td>jdbc:hive2://localhost:10000</td>
+  </tr>
+  <tr>
+    <td>hive.user</td>
+    <td>hiveUser</td>
+  </tr>
+  <tr>
+    <td>hive.password</td>
+    <td>hivePassword</td>
+  </tr>
+</table>
+
+<h3>Dependencies</h3>
+
+<table class="table-configuration">
+  <tr>
+    <th>Artifact</th>
+    <th>Exclude</th>
+  </tr>
+  <tr>
+    <td>org.apache.hive:hive-jdbc:0.14.0</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>org.apache.hadoop:hadoop-common:2.6.0</td>
+    <td></td>
+  </tr>
+</table>
+
+<hr>
+
 <h3>Configuration</h3>
 
 <table class="table-configuration">

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html Thu Jun  
9 16:31:17 2016
@@ -357,6 +357,52 @@
   </tr>
 </table>
 
+<h3>Examples</h3>
+
+<h4>Hive</h4>
+
+<h5>Properties</h5>
+
+<p><table class="table-configuration">
+   <tr>
+     <th>Name</th>
+     <th>Value</th>
+   </tr>
+   <tr>
+     <td>hive.driver</td>
+     <td>org.apache.hive.jdbc.HiveDriver</td>
+   </tr>
+   <tr>
+     <td>hive.url</td>
+     <td>jdbc:hive2://localhost:10000</td>
+   </tr>
+   <tr>
+     <td>hive.user</td>
+     <td>hive_user</td>
+   </tr>
+   <tr>
+     <td>hive.password</td>
+     <td>hive_password</td>
+   </tr>
+ </table></p>
+
+<h5>Dependencies</h5>
+
+<p><table class="table-configuration">
+   <tr>
+     <th>Artifact</th>
+     <th>Excludes</th>
+   </tr>
+   <tr>
+     <td>org.apache.hive:hive-jdbc:0.14.0</td>
+     <td></td>
+   </tr>
+   <tr>
+     <td>org.apache.hadoop:hadoop-common:2.6.0</td>
+     <td></td>
+   </tr>
+ </table></p>
+
 <h3>How to use</h3>
 
 <h4>Reference in paragraph</h4>

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html 
(original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/interpreter/scalding.html Thu 
Jun  9 16:31:17 2016
@@ -186,11 +186,50 @@
 
 <h3>Configuring the Interpreter</h3>
 
-<p>Zeppelin comes with a pre-configured Scalding interpreter in local mode, so 
you do not need to install anything.</p>
+<p>Scalding interpreter runs in two modes:</p>
+
+<ul>
+<li>local</li>
+<li>hdfs</li>
+</ul>
+
+<p>In local mode, you can access files on the local server, and Scalding 
transformations run locally.</p>
+
+<p>In hdfs mode, you can access files in HDFS, and Scalding transformations 
run as Hadoop map-reduce jobs.</p>
+
+<p>Zeppelin comes with a pre-configured Scalding interpreter in local mode.</p>
+
+<p>To run the scalding interpreter in the hdfs mode you have to do the 
following:</p>
+
+<p><strong>Set the classpath with ZEPPELIN_CLASSPATH_OVERRIDES</strong></p>
+
+<p>In conf/zeppelin_env.sh, you have to set
+ZEPPELIN_CLASSPATH_OVERRIDES to the contents of &#39;hadoop classpath&#39;
+and directories with custom jar files you need for your scalding commands.</p>
+
+<p><strong>Set arguments to the scalding repl</strong></p>
+
+<p>The default arguments are: &quot;--local --repl&quot;</p>
+
+<p>For hdfs mode you need to add: &quot;--hdfs --repl&quot;</p>
+
+<p>If you want to add custom jars, you need to add:
+&quot;-libjars directory/*:directory/*&quot;</p>
+
+<p>For reducer estimation, you need to add something like:
+&quot;-Dscalding.reducer.estimator.classes=com.twitter.scalding.reducer_estimation.InputSizeReducerEstimator&quot;</p>
+
+<p><strong>Set max.open.instances</strong></p>
+
+<p>If you want to control the maximum number of open interpreters, you have to 
select the &quot;scoped&quot; interpreter for note
+option and set the max.open.instances argument.</p>
 
 <h3>Testing the Interpreter</h3>
 
-<p>In example, by using the <a 
href="https://gist.github.com/johnynek/a47699caa62f4f38a3e2">Alice in 
Wonderland</a> tutorial, we will count words (of course!), and plot a graph of 
the top 10 words in the book.</p>
+<h4>Local mode</h4>
+
+<p>In example, by using the <a 
href="https://gist.github.com/johnynek/a47699caa62f4f38a3e2">Alice in 
Wonderland</a> tutorial, 
+we will count words (of course!), and plot a graph of the top 10 words in the 
book.</p>
 <div class="highlight"><pre><code class="text language-text" 
data-lang="text">%scalding
 
 import scala.io.Source
@@ -223,11 +262,36 @@ print(&quot;%table &quot; + table)
 <p>If you click on the icon for the pie chart, you should be able to see a 
chart like this:
 <img src="../assets/themes/zeppelin/img/docs-img/scalding-pie.png" 
alt="Scalding - Pie - Chart"></p>
 
-<h3>Current Status &amp; Future Work</h3>
+<h4>HDFS mode</h4>
+
+<p><strong>Test mode</strong></p>
+<div class="highlight"><pre><code class="text language-text" 
data-lang="text">%scalding
+mode
+</code></pre></div>
+<p>This command should print:</p>
+<div class="highlight"><pre><code class="text language-text" 
data-lang="text">res4: com.twitter.scalding.Mode = Hdfs(true,Configuration: 
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, 
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml)
+</code></pre></div>
+<p><strong>Test HDFS read</strong></p>
+<div class="highlight"><pre><code class="text language-text" 
data-lang="text">val testfile = 
TypedPipe.from(TextLine(&quot;/user/x/testfile&quot;))
+testfile.dump
+</code></pre></div>
+<p>This command should print the contents of the hdfs file 
/user/x/testfile.</p>
+
+<p><strong>Test map-reduce job</strong></p>
+<div class="highlight"><pre><code class="text language-text" 
data-lang="text">val testfile = 
TypedPipe.from(TextLine(&quot;/user/x/testfile&quot;))
+val a = testfile.groupAll.size.values
+a.toList
+</code></pre></div>
+<p>This command should create a map reduce job.</p>
 
-<p>The current implementation of the Scalding interpreter does not support 
canceling jobs, or fine-grained progress updates.</p>
+<h3>Future Work</h3>
 
-<p>The pre-configured Scalding interpreter only supports Scalding in local 
mode. Hadoop mode for Scalding is currently unsupported, and will be future 
work (contributions welcome!).</p>
+<ul>
+<li>Better user feedback (hadoop url, progress updates)</li>
+<li>Ability to cancel jobs</li>
+<li>Ability to dynamically load jars without restarting the interpreter</li>
+<li>Multiuser scalability (run scalding interpreters on different servers)</li>
+</ul>
 
   </div>
 </div>

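Note on the hdfs-mode setup described in the scalding.html change above: the two values it asks for (the classpath override and the repl arguments) could be assembled with a small helper like the sketch below. This is only an illustration, not part of the commit. The function names are hypothetical, and appending "/*" to each jar directory is an assumption based on the "-libjars directory/*:directory/*" form quoted in the docs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Zeppelin) that compose the two strings
# the hdfs-mode instructions ask for.

# build_classpath_overrides HADOOP_CLASSPATH [JAR_DIR ...]
# Prints the Hadoop classpath followed by each jar directory as "dir/*".
# The "/*" wildcard form is an assumption; adjust it if your setup expects
# bare directories.
build_classpath_overrides() {
  result="$1"
  shift
  for dir in "$@"; do
    result="${result}:${dir}/*"
  done
  printf '%s\n' "$result"
}

# build_repl_args [JAR_DIR ...]
# Prints "--hdfs --repl", plus a -libjars option when jar directories are given.
build_repl_args() {
  args="--hdfs --repl"
  libjars=""
  for dir in "$@"; do
    # Join directories with ":" without a leading separator.
    libjars="${libjars:+${libjars}:}${dir}/*"
  done
  if [ -n "$libjars" ]; then
    args="${args} -libjars ${libjars}"
  fi
  printf '%s\n' "$args"
}

# Example use in Zeppelin's environment script, assuming the hadoop command
# is on PATH and /opt/myjars is a placeholder directory:
#   export ZEPPELIN_CLASSPATH_OVERRIDES="$(build_classpath_overrides "$(hadoop classpath)" /opt/myjars)"
```

The string printed by build_repl_args is meant to be pasted into the interpreter's argument setting. The reducer-estimator option mentioned in the docs would be appended by hand in the same way.
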
Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html 
(original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/manual/interpreters.html Thu 
Jun  9 16:31:17 2016
@@ -181,7 +181,7 @@ limitations under the License.
 
 <p>In this section, we will explain about the role of interpreters, 
interpreters group and interpreter settings in Zeppelin.
 The concept of Zeppelin interpreter allows any 
language/data-processing-backend to be plugged into Zeppelin.
-Currently, Zeppelin supports many interpreters such as Scala ( with Apache 
Spark ), Python ( with Apache Spark ), SparkSQL, Hive, Markdown, Shell and so 
on.</p>
+Currently, Zeppelin supports many interpreters such as Scala ( with Apache 
Spark ), Python ( with Apache Spark ), SparkSQL, JDBC, Markdown, Shell and so 
on.</p>
 
 <h2>What is Zeppelin interpreter?</h2>
 

Modified: incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml (original)
+++ incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/rss.xml Thu Jun  9 16:31:17 2016
@@ -5,8 +5,8 @@
         <description>Apache Zeppelin - The Apache Software 
Foundation</description>
         <link>http://zeppelin.apache.org</link>
         <link>http://zeppelin.apache.org</link>
-        <lastBuildDate>2016-06-08T11:53:05-07:00</lastBuildDate>
-        <pubDate>2016-06-08T11:53:05-07:00</pubDate>
+        <lastBuildDate>2016-06-09T09:31:10-07:00</lastBuildDate>
+        <pubDate>2016-06-09T09:31:10-07:00</pubDate>
         <ttl>1800</ttl>
 
 

Modified: 
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
URL: 
http://svn.apache.org/viewvc/incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html?rev=1747561&r1=1747560&r2=1747561&view=diff
==============================================================================
--- 
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
 (original)
+++ 
incubator/zeppelin/site/docs/0.6.0-SNAPSHOT/security/interpreter_authorization.html
 Thu Jun  9 16:31:17 2016
@@ -187,7 +187,7 @@ limitations under the License.
 
 <p>Data source authorization involves authenticating to the data source like a 
Mysql database and letting it determine user permissions.</p>
 
-<p>For the Hive interpreter, we need to maintain per-user connection pools.
+<p>For the JDBC interpreter, we need to maintain per-user connection pools.
 The interpret method takes the user string as parameter and executes the jdbc 
call using a connection in the user&#39;s connection pool.</p>
 
 <p>In case of Presto, we don&#39;t need password if the Presto DB server runs 
backend code using HDFS authorization for the user.

