http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/hadoop-provided.html ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/hadoop-provided.html b/site/docs/2.1.0/hadoop-provided.html index ff7afb7..9d77cf0 100644 --- a/site/docs/2.1.0/hadoop-provided.html +++ b/site/docs/2.1.0/hadoop-provided.html @@ -133,16 +133,16 @@ <h1 id="apache-hadoop">Apache Hadoop</h1> <p>For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">### in conf/spark-env.sh ###</span> +<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1">### in conf/spark-env.sh ###</span> -<span class="c"># If 'hadoop' binary is on your PATH</span> -<span class="nb">export </span><span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop classpath<span class="k">)</span> +<span class="c1"># If 'hadoop' binary is on your PATH</span> +<span class="nb">export</span> <span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop classpath<span class="k">)</span> -<span class="c"># With explicit path to 'hadoop' binary</span> -<span class="nb">export </span><span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>/path/to/hadoop/bin/hadoop classpath<span class="k">)</span> +<span class="c1"># With explicit path to 'hadoop' binary</span> +<span class="nb">export</span> <span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>/path/to/hadoop/bin/hadoop classpath<span class="k">)</span> -<span class="c"># Passing a Hadoop configuration directory</span> -<span class="nb">export </span><span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop --config /path/to/configs classpath<span 
class="k">)</span></code></pre></div> +<span class="c1"># Passing a Hadoop configuration directory</span> +<span class="nb">export</span> <span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop --config /path/to/configs classpath<span class="k">)</span></code></pre></figure>
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/img/structured-streaming-watermark.png ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/img/structured-streaming-watermark.png b/site/docs/2.1.0/img/structured-streaming-watermark.png new file mode 100644 index 0000000..f21fbda Binary files /dev/null and b/site/docs/2.1.0/img/structured-streaming-watermark.png differ http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/img/structured-streaming.pptx ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/img/structured-streaming.pptx b/site/docs/2.1.0/img/structured-streaming.pptx index 6aad2ed..f5bdfc0 100644 Binary files a/site/docs/2.1.0/img/structured-streaming.pptx and b/site/docs/2.1.0/img/structured-streaming.pptx differ http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/job-scheduling.html ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/job-scheduling.html b/site/docs/2.1.0/job-scheduling.html index 53161c2..9651607 100644 --- a/site/docs/2.1.0/job-scheduling.html +++ b/site/docs/2.1.0/job-scheduling.html @@ -127,24 +127,24 @@ <ul id="markdown-toc"> - <li><a href="#overview" id="markdown-toc-overview">Overview</a></li> - <li><a href="#scheduling-across-applications" id="markdown-toc-scheduling-across-applications">Scheduling Across Applications</a> <ul> - <li><a href="#dynamic-resource-allocation" id="markdown-toc-dynamic-resource-allocation">Dynamic Resource Allocation</a> <ul> - <li><a href="#configuration-and-setup" id="markdown-toc-configuration-and-setup">Configuration and Setup</a></li> - <li><a href="#resource-allocation-policy" id="markdown-toc-resource-allocation-policy">Resource Allocation Policy</a> <ul> - <li><a href="#request-policy" id="markdown-toc-request-policy">Request Policy</a></li> - <li><a 
href="#remove-policy" id="markdown-toc-remove-policy">Remove Policy</a></li> + <li><a href="#overview">Overview</a></li> + <li><a href="#scheduling-across-applications">Scheduling Across Applications</a> <ul> + <li><a href="#dynamic-resource-allocation">Dynamic Resource Allocation</a> <ul> + <li><a href="#configuration-and-setup">Configuration and Setup</a></li> + <li><a href="#resource-allocation-policy">Resource Allocation Policy</a> <ul> + <li><a href="#request-policy">Request Policy</a></li> + <li><a href="#remove-policy">Remove Policy</a></li> </ul> </li> - <li><a href="#graceful-decommission-of-executors" id="markdown-toc-graceful-decommission-of-executors">Graceful Decommission of Executors</a></li> + <li><a href="#graceful-decommission-of-executors">Graceful Decommission of Executors</a></li> </ul> </li> </ul> </li> - <li><a href="#scheduling-within-an-application" id="markdown-toc-scheduling-within-an-application">Scheduling Within an Application</a> <ul> - <li><a href="#fair-scheduler-pools" id="markdown-toc-fair-scheduler-pools">Fair Scheduler Pools</a></li> - <li><a href="#default-behavior-of-pools" id="markdown-toc-default-behavior-of-pools">Default Behavior of Pools</a></li> - <li><a href="#configuring-pool-properties" id="markdown-toc-configuring-pool-properties">Configuring Pool Properties</a></li> + <li><a href="#scheduling-within-an-application">Scheduling Within an Application</a> <ul> + <li><a href="#fair-scheduler-pools">Fair Scheduler Pools</a></li> + <li><a href="#default-behavior-of-pools">Default Behavior of Pools</a></li> + <li><a href="#configuring-pool-properties">Configuring Pool Properties</a></li> </ul> </li> </ul> @@ -321,9 +321,9 @@ mode is best for multi-user settings.</p> <p>To enable the fair scheduler, simply set the <code>spark.scheduler.mode</code> property to <code>FAIR</code> when configuring a SparkContext:</p> -<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span 
class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(...).</span><span class="n">setAppName</span><span class="o">(...)</span> +<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(...).</span><span class="n">setAppName</span><span class="o">(...)</span> <span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">"spark.scheduler.mode"</span><span class="o">,</span> <span class="s">"FAIR"</span><span class="o">)</span> -<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div> +<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure> <h2 id="fair-scheduler-pools">Fair Scheduler Pools</h2> @@ -337,15 +337,15 @@ many concurrent jobs they have instead of giving <em>jobs</em> equal shares. Thi adding the <code>spark.scheduler.pool</code> “local property” to the SparkContext in the thread that’s submitting them. 
This is done as follows:</p> -<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Assuming sc is your SparkContext variable</span> -<span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">"spark.scheduler.pool"</span><span class="o">,</span> <span class="s">"pool1"</span><span class="o">)</span></code></pre></div> +<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Assuming sc is your SparkContext variable</span> +<span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">"spark.scheduler.pool"</span><span class="o">,</span> <span class="s">"pool1"</span><span class="o">)</span></code></pre></figure> <p>After setting this local property, <em>all</em> jobs submitted within this thread (by calls in this thread to <code>RDD.save</code>, <code>count</code>, <code>collect</code>, etc) will use this pool name. The setting is per-thread to make it easy to have a thread run multiple jobs on behalf of the same user. 
If you’d like to clear the pool that a thread is associated with, simply call:</p> -<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">"spark.scheduler.pool"</span><span class="o">,</span> <span class="kc">null</span><span class="o">)</span></code></pre></div> +<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">"spark.scheduler.pool"</span><span class="o">,</span> <span class="kc">null</span><span class="o">)</span></code></pre></figure> <h2 id="default-behavior-of-pools">Default Behavior of Pools</h2> @@ -379,12 +379,12 @@ of the cluster. By default, each pool’s <code>minShare</code> is 0.</li> and setting a <code>spark.scheduler.allocation.file</code> property in your <a href="configuration.html#spark-properties">SparkConf</a>.</p> -<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">"spark.scheduler.allocation.file"</span><span class="o">,</span> <span class="s">"/path/to/file"</span><span class="o">)</span></code></pre></div> +<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">"spark.scheduler.allocation.file"</span><span class="o">,</span> <span class="s">"/path/to/file"</span><span class="o">)</span></code></pre></figure> <p>The format of the XML file is simply a <code><pool></code> element for each pool, with different elements within it for the various settings. 
For example:</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="cp"><?xml version="1.0"?></span> +<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span></span><span class="cp"><?xml version="1.0"?></span> <span class="nt"><allocations></span> <span class="nt"><pool</span> <span class="na">name=</span><span class="s">"production"</span><span class="nt">></span> <span class="nt"><schedulingMode></span>FAIR<span class="nt"></schedulingMode></span> @@ -396,7 +396,7 @@ within it for the various settings. For example:</p> <span class="nt"><weight></span>2<span class="nt"></weight></span> <span class="nt"><minShare></span>3<span class="nt"></minShare></span> <span class="nt"></pool></span> -<span class="nt"></allocations></span></code></pre></div> +<span class="nt"></allocations></span></code></pre></figure> <p>A full example is also available in <code>conf/fairscheduler.xml.template</code>. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-advanced.html ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/ml-advanced.html b/site/docs/2.1.0/ml-advanced.html index 02c95e1..84dcf43 100644 --- a/site/docs/2.1.0/ml-advanced.html +++ b/site/docs/2.1.0/ml-advanced.html @@ -307,10 +307,10 @@ <ul id="markdown-toc"> - <li><a href="#optimization-of-linear-methods-developer" id="markdown-toc-optimization-of-linear-methods-developer">Optimization of linear methods (developer)</a> <ul> - <li><a href="#limited-memory-bfgs-l-bfgs" id="markdown-toc-limited-memory-bfgs-l-bfgs">Limited-memory BFGS (L-BFGS)</a></li> - <li><a href="#normal-equation-solver-for-weighted-least-squares" id="markdown-toc-normal-equation-solver-for-weighted-least-squares">Normal equation solver for weighted least squares</a></li> - <li><a 
href="#iteratively-reweighted-least-squares-irls" id="markdown-toc-iteratively-reweighted-least-squares-irls">Iteratively reweighted least squares (IRLS)</a></li> + <li><a href="#optimization-of-linear-methods-developer">Optimization of linear methods (developer)</a> <ul> + <li><a href="#limited-memory-bfgs-l-bfgs">Limited-memory BFGS (L-BFGS)</a></li> + <li><a href="#normal-equation-solver-for-weighted-least-squares">Normal equation solver for weighted least squares</a></li> + <li><a href="#iteratively-reweighted-least-squares-irls">Iteratively reweighted least squares (IRLS)</a></li> </ul> </li> </ul> @@ -385,7 +385,7 @@ Quasi-Newton methods in this case. This fallback is currently always enabled for <p><code>WeightedLeastSquares</code> supports L1, L2, and elastic-net regularization and provides options to enable or disable regularization and standardization. In the case where no L1 regularization is applied (i.e. $\alpha = 0$), there exists an analytical solution and either Cholesky or Quasi-Newton solver may be used. When $\alpha > 0$ no analytical -solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively.</p> +solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively. </p> <p>In order to make the normal equation approach efficient, <code>WeightedLeastSquares</code> requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.</p>