This is an automated email from the ASF dual-hosted git repository. baunsgaard pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/systemds.git
commit 52b59bd594a18f6c22adb5648fcf9ab6cb32c770 Author: Sebastian Baunsgaard <[email protected]> AuthorDate: Fri Aug 30 14:08:21 2024 +0200 [DOCS] Update Python API --- docs/api/java/member-search-index.zip | Bin 255110 -> 255110 bytes docs/api/java/package-search-index.zip | Bin 900 -> 900 bytes docs/api/java/type-search-index.zip | Bin 15653 -> 15653 bytes .../python/getting_started/simple_examples.html | 80 +++++++++++---------- docs/api/python/guide/algorithms_basics.html | 29 ++++---- docs/api/python/guide/federated.html | 6 +- docs/api/python/guide/python_end_to_end_tut.html | 38 +++++----- .../getting_started/simple_examples.rst.txt | 16 +++-- .../python/sources/guide/algorithms_basics.rst.txt | 14 +++- docs/api/python/sources/guide/federated.rst.txt | 12 +++- .../sources/guide/python_end_to_end_tut.rst.txt | 23 ++++++ 11 files changed, 136 insertions(+), 82 deletions(-) diff --git a/docs/api/java/member-search-index.zip b/docs/api/java/member-search-index.zip index 22c7d88707..d1d2ddd75b 100644 Binary files a/docs/api/java/member-search-index.zip and b/docs/api/java/member-search-index.zip differ diff --git a/docs/api/java/package-search-index.zip b/docs/api/java/package-search-index.zip index 5528ad9138..41f533ee77 100644 Binary files a/docs/api/java/package-search-index.zip and b/docs/api/java/package-search-index.zip differ diff --git a/docs/api/java/type-search-index.zip b/docs/api/java/type-search-index.zip index 010e344958..21e4180df1 100644 Binary files a/docs/api/java/type-search-index.zip and b/docs/api/java/type-search-index.zip differ diff --git a/docs/api/python/getting_started/simple_examples.html b/docs/api/python/getting_started/simple_examples.html index d79df9c849..8dfea75977 100644 --- a/docs/api/python/getting_started/simple_examples.html +++ b/docs/api/python/getting_started/simple_examples.html @@ -108,22 +108,25 @@ <section id="matrix-operations"> <h2>Matrix Operations<a class="headerlink" href="#matrix-operations" title="Link to this 
heading"></a></h2> <p>Making use of SystemDS, let us multiply an Matrix with an scalar:</p> -<pre class="code python literal-block"><code><span class="keyword namespace">import</span> <span class="name namespace">logging</span><span class="whitespace"> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> +<span class="kn">import</span> <span class="nn">logging</span> -</span><span class="keyword namespace">from</span> <span class="name namespace">systemds.context</span> <span class="keyword namespace">import</span> <span class="name">SystemDSContext</span><span class="whitespace"> +<span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> -</span><span class="comment single"># Create a context and if necessary (no SystemDS py4j instance running)</span><span class="whitespace"> -</span><span class="comment single"># it starts a subprocess which does the execution in SystemDS</span><span class="whitespace"> -</span><span class="keyword">with</span> <span class="name">SystemDSContext</span><span class="punctuation">()</span> <span class="keyword">as</span> <span class="name">sds</span><span class="punctuation">:</span><span class="whitespace"> -</span> <span class="comment single"># Full generates a matrix completely filled with one number.</span><span class="whitespace"> -</span> <span class="comment single"># Generate a 5x10 matrix filled with 4.2</span><span class="whitespace"> -</span> <span class="name">m</span> <span class="operator">=</span> <span class="name">sds</span><span class="operator">.</span><span class="name">full</span><span class="punctuation">((</span><span class="literal number integer">5</span><span class="punctuation">,</span> <span class="literal number integer">10</span><span class="punctuation">),</span> <span class="literal number float">4.20</span><span class="punctuation">)</span><span class="whitespace"> -</span> <span 
class="comment single"># multiply with scalar. Nothing is executed yet!</span><span class="whitespace"> -</span> <span class="name">m_res</span> <span class="operator">=</span> <span class="name">m</span> <span class="operator">*</span> <span class="literal number float">3.1</span><span class="whitespace"> -</span> <span class="comment single"># Do the calculation in SystemDS by calling compute().</span><span class="whitespace"> -</span> <span class="comment single"># The returned value is an numpy array that can be directly printed.</span><span class="whitespace"> -</span> <span class="name">logging</span><span class="operator">.</span><span class="name">info</span><span class="punctuation">(</span><span class="name">m_res</span><span class="operator">.</span><span class="name">compute</span><span class="punctuation">())</span><span class="whitespace"> -</span> <span class="comment single"># context will automatically be closed and process stopped</span></code></pre> +<span class="c1"># Create a context and if necessary (no SystemDS py4j instance running)</span> +<span class="c1"># it starts a subprocess which does the execution in SystemDS</span> +<span class="k">with</span> <span class="n">SystemDSContext</span><span class="p">()</span> <span class="k">as</span> <span class="n">sds</span><span class="p">:</span> + <span class="c1"># Full generates a matrix completely filled with one number.</span> + <span class="c1"># Generate a 5x10 matrix filled with 4.2</span> + <span class="n">m</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">full</span><span class="p">((</span><span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="mf">4.20</span><span class="p">)</span> + <span class="c1"># multiply with scalar. 
Nothing is executed yet!</span> + <span class="n">m_res</span> <span class="o">=</span> <span class="n">m</span> <span class="o">*</span> <span class="mf">3.1</span> + <span class="c1"># Do the calculation in SystemDS by calling compute().</span> + <span class="c1"># The returned value is an numpy array that can be directly printed.</span> + <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="n">m_res</span><span class="o">.</span><span class="n">compute</span><span class="p">())</span> + <span class="c1"># context will automatically be closed and process stopped</span> +</pre></div> +</div> <p>As output we get</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span><span class="p">]</span> <span class="p">[</span><span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span> <span class="mf">13.02</span><span class="p">]</span> @@ -135,7 +138,7 @@ <p>The Python SystemDS package is compatible with numpy arrays. Let us do a quick element-wise matrix multiplication of numpy arrays with SystemDS. 
Remember to first start up a new terminal:</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> @@ -161,31 +164,34 @@ Remember to first start up a new terminal:</p> <h2>More complex operations<a class="headerlink" href="#more-complex-operations" title="Link to this heading"></a></h2> <p>SystemDS provides algorithm level functions as built-in functions to simplify development. One example of this is l2SVM, a high level functions for Data-Scientists. Let’s take a look at l2svm:</p> -<pre class="code python literal-block"><code><span class="keyword namespace">import</span> <span class="name namespace">logging</span><span class="whitespace"> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> +<span class="kn">import</span> <span class="nn">logging</span> -</span><span class="keyword namespace">import</span> <span class="name namespace">numpy</span> <span class="keyword">as</span> <span class="name namespace">np</span><span class="whitespace"> -</span><span class="keyword namespace">from</span> <span class="name namespace">systemds.context</span> <span class="keyword namespace">import</span> <span class="name">SystemDSContext</span><span class="whitespace"> -</span><span class="keyword namespace">from</span> <span class="name namespace">systemds.operator.algorithm</span> <span class="keyword namespace">import</span> <span class="name">l2svm</span><span class="whitespace"> +<span class="kn">import</span> <span class="nn">numpy</span> 
<span class="k">as</span> <span class="nn">np</span> +<span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> +<span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">l2svm</span> -</span><span class="comment single"># Set a seed</span><span class="whitespace"> -</span><span class="name">np</span><span class="operator">.</span><span class="name">random</span><span class="operator">.</span><span class="name">seed</span><span class="punctuation">(</span><span class="literal number integer">0</span><span class="punctuation">)</span><span class="whitespace"> -</span><span class="comment single"># Generate random features and labels in numpy</span><span class="whitespace"> -</span><span class="comment single"># This can easily be exchanged with a data set.</span><span class="whitespace"> -</span><span class="name">features</span> <span class="operator">=</span> <span class="name">np</span><span class="operator">.</span><span class="name">array</span><span class="punctuation">(</span><span class="name">np</span><span class="operator">.</span><span class="name">random</span><span class="operator">.</span><span class="name">randint</span><span class="punctuation">(</span><span class="whitespace"> -</span> <span class="literal number integer">100</span><span class="punctuation">,</span> <span class="name">size</span><span class="operator">=</span><span class="literal number integer">10</span> <span class="operator">*</span> <span class="literal number integer">10</span><span class="punctuation">)</span> <span class="operator">+</span> <span class="literal number float">1.01</span><span class="punctuation">,</span> <span class="name">dtype</span><span class="operator">=</span><sp [...] 
-</span><span class="name">features</span><span class="operator">.</span><span class="name">shape</span> <span class="operator">=</span> <span class="punctuation">(</span><span class="literal number integer">10</span><span class="punctuation">,</span> <span class="literal number integer">10</span><span class="punctuation">)</span><span class="whitespace"> -</span><span class="name">labels</span> <span class="operator">=</span> <span class="name">np</span><span class="operator">.</span><span class="name">zeros</span><span class="punctuation">((</span><span class="literal number integer">10</span><span class="punctuation">,</span> <span class="literal number integer">1</span><span class="punctuation">))</span><span class="whitespace"> +<span class="c1"># Set a seed</span> +<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> +<span class="c1"># Generate random features and labels in numpy</span> +<span class="c1"># This can easily be exchanged with a data set.</span> +<span class="n">features</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span> + <span class="mi">100</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span> <span class="o">*</span> <span class="mi">10</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.01</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">double</span><span class="p">)</span> +<span class="n">features</span><span class="o">.</span><span class="n">shape</span> <span class="o">=</span> 
<span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span> +<span class="n">labels</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">10</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> -</span><span class="comment single"># l2svm labels can only be 0 or 1</span><span class="whitespace"> -</span><span class="keyword">for</span> <span class="name">i</span> <span class="operator word">in</span> <span class="name builtin">range</span><span class="punctuation">(</span><span class="literal number integer">10</span><span class="punctuation">):</span><span class="whitespace"> -</span> <span class="keyword">if</span> <span class="name">np</span><span class="operator">.</span><span class="name">random</span><span class="operator">.</span><span class="name">random</span><span class="punctuation">()</span> <span class="operator">></span> <span class="literal number float">0.5</span><span class="punctuation">:</span><span class="whitespace"> -</span> <span class="name">labels</span><span class="punctuation">[</span><span class="name">i</span><span class="punctuation">][</span><span class="literal number integer">0</span><span class="punctuation">]</span> <span class="operator">=</span> <span class="literal number integer">1</span><span class="whitespace"> +<span class="c1"># l2svm labels can only be 0 or 1</span> +<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span> + <span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">></span> <span class="mf">0.5</span><span class="p">:</span> + <span 
class="n">labels</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> -</span><span class="comment single"># compute our model</span><span class="whitespace"> -</span><span class="keyword">with</span> <span class="name">SystemDSContext</span><span class="punctuation">()</span> <span class="keyword">as</span> <span class="name">sds</span><span class="punctuation">:</span><span class="whitespace"> -</span> <span class="name">model</span> <span class="operator">=</span> <span class="name">l2svm</span><span class="punctuation">(</span><span class="name">sds</span><span class="operator">.</span><span class="name">from_numpy</span><span class="punctuation">(</span><span class="name">features</span><span class="punctuation">),</span><span class="whitespace"> -</span> <span class="name">sds</span><span class="operator">.</span><span class="name">from_numpy</span><span class="punctuation">(</span><span class="name">labels</span><span class="punctuation">),</span> <span class="name">verbose</span><span class="operator">=</span><span class="keyword constant">False</span><span class="punctuation">)</span><span class="operator">.</span><span class="name">compute</span><span class="punctuation">()</span><span class="whitespace"> -</span> <span class="name">logging</span><span class="operator">.</span><span class="name">info</span><span class="punctuation">(</span><span class="name">model</span><span class="punctuation">)</span></code></pre> +<span class="c1"># compute our model</span> +<span class="k">with</span> <span class="n">SystemDSContext</span><span class="p">()</span> <span class="k">as</span> <span class="n">sds</span><span class="p">:</span> + <span class="n">model</span> <span class="o">=</span> <span class="n">l2svm</span><span class="p">(</span><span class="n">sds</span><span class="o">.</span><span class="n">from_numpy</span><span 
class="p">(</span><span class="n">features</span><span class="p">),</span> + <span class="n">sds</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">labels</span><span class="p">),</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">compute</span><span class="p">()</span> + <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="n">model</span><span class="p">)</span> +</pre></div> +</div> <p>The output should be similar to</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[[</span> <span class="mf">0.02033445</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="mf">0.00324092</span><span class="p">]</span> @@ -202,7 +208,7 @@ One example of this is l2SVM, a high level functions for Data-Scientists. Let’ <p>To get the full performance of SystemDS one can modify the script to only use internal functionality, instead of using numpy arrays that have to be transfered into systemDS. 
The above script transformed goes like this:</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span> <span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">l2svm</span> diff --git a/docs/api/python/guide/algorithms_basics.html b/docs/api/python/guide/algorithms_basics.html index eeb2ddc19d..6311b744ef 100644 --- a/docs/api/python/guide/algorithms_basics.html +++ b/docs/api/python/guide/algorithms_basics.html @@ -120,7 +120,7 @@ since it is commonly known and explored.</p> <h2>Step 1: Get Dataset<a class="headerlink" href="#step-1-get-dataset" title="Link to this heading"></a></h2> <p>SystemDS provides builtin for downloading and setup of the MNIST dataset. 
To setup this simply use</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.examples.tutorials.mnist</span> <span class="kn">import</span> <span class="n">DataManager</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogReg</span><span class="p">,</span> <span class="n">multiLogRegPredict</span> @@ -160,7 +160,7 @@ unfortunately the algorithm require the labels to be distinct integers from 1 an <section id="step-3-training"> <h2>Step 3: Training<a class="headerlink" href="#step-3-training" title="Link to this heading"></a></h2> <p>To start with, we setup a SystemDS context and setup the data:</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Yt</span> <span class="o">=</span> <span class="n">d</span><span class="o">.</span><span class="n">get_test_labels</span><span class="p">()</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Yt</span> <span class="o">=</span> <span class="n">d</span><span class="o">.</span><span class="n">get_test_labels</span><span class="p">()</span> <span class="k">with</span> <span class="n">SystemDSContext</span><span class="p">()</span> <span class="k">as</span> <span class="n">sds</span><span class="p">:</span> <span class="c1"># Train Data</span> @@ -224,17 +224,20 @@ and have nothing more to learn from the data as it is now.</p> <p>To improve further we have to 
increase the training data, here for example we increase it from our sample of 1k to the full training dataset of 60k, in this example the maxi is set to reduce the number of iterations the algorithm takes, to again reduce training time</p> -<pre class="code python literal-block"><code><span class="name">Yt</span> <span class="operator">=</span> <span class="name">d</span><span class="operator">.</span><span class="name">get_test_labels</span><span class="punctuation">()</span><span class="whitespace"> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Yt</span> <span class="o">=</span> <span class="n">d</span><span class="o">.</span><span class="n">get_test_labels</span><span class="p">()</span> -</span><span class="keyword">with</span> <span class="name">SystemDSContext</span><span class="punctuation">()</span> <span class="keyword">as</span> <span class="name">sds</span><span class="punctuation">:</span><span class="whitespace"> -</span> <span class="comment single"># Train Data</span><span class="whitespace"> -</span> <span class="name">X_ds</span> <span class="operator">=</span> <span class="name">sds</span><span class="operator">.</span><span class="name">from_numpy</span><span class="punctuation">(</span><span class="name">X</span><span class="punctuation">)</span><span class="whitespace"> -</span> <span class="name">Y_ds</span> <span class="operator">=</span> <span class="name">sds</span><span class="operator">.</span><span class="name">from_numpy</span><span class="punctuation">(</span><span class="name">Y</span><span class="punctuation">)</span> <span class="operator">+</span> <span class="literal number float">1.0</span><span class="whitespace"> -</span> <span class="name">bias</span> <span class="operator">=</span> <span class="name">multiLogReg</span><span class="punctuation">(</span><span class="name">X_ds</span><span class="punctuation">,</span> <span class="name">Y_ds</span><span 
class="punctuation">,</span> <span class="name">maxi</span><span class="operator">=</span><span class="literal number integer">30</span><span class="punctuation">,</span> <span class="name">verbose</span><span class="operator">=</span><span class [...] -</span> <span class="comment single"># Test data</span><span class="whitespace"> -</span> <span class="name">Xt_ds</span> <span class="operator">=</span> <span class="name">sds</span><span class="operator">.</span><span class="name">from_numpy</span><span class="punctuation">(</span><span class="name">Xt</span><span class="punctuation">)</span><span class="whitespace"> -</span> <span class="name">Yt_ds</span> <span class="operator">=</span> <span class="name">sds</span><span class="operator">.</span><span class="name">from_numpy</span><span class="punctuation">(</span><span class="name">Yt</span><span class="punctuation">)</span> <span class="operator">+</span> <span class="literal number float">1.0</span><span class="whitespace"> -</span> <span class="punctuation">[</span><span class="name">m</span><span class="punctuation">,</span> <span class="name">y_pred</span><span class="punctuation">,</span> <span class="name">acc</span><span class="punctuation">]</span> <span class="operator">=</span> <span class="name">multiLogRegPredict</span><span class="punctuation">(</span><span class="name">Xt_ds</span><span class="punctuation">,</span> <span class="name">bias</span><span class="punctuation">,</span> <span class=" [...] 
+<span class="k">with</span> <span class="n">SystemDSContext</span><span class="p">()</span> <span class="k">as</span> <span class="n">sds</span><span class="p">:</span> + <span class="c1"># Train Data</span> + <span class="n">X_ds</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">X</span><span class="p">)</span> + <span class="n">Y_ds</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.0</span> + <span class="n">bias</span> <span class="o">=</span> <span class="n">multiLogReg</span><span class="p">(</span><span class="n">X_ds</span><span class="p">,</span> <span class="n">Y_ds</span><span class="p">,</span> <span class="n">maxi</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> + <span class="c1"># Test data</span> + <span class="n">Xt_ds</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">Xt</span><span class="p">)</span> + <span class="n">Yt_ds</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">Yt</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.0</span> + <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">acc</span><span class="p">]</span> <span class="o">=</span> <span class="n">multiLogRegPredict</span><span class="p">(</span><span class="n">Xt_ds</span><span class="p">,</span> <span class="n">bias</span><span 
class="p">,</span> <span class="n">Y</span><span class="o">=</span><span class="n">Yt_ds</span><span class="p">,</span> <span class="n [...] + +</pre></div> +</div> <p>With this change the accuracy achieved changes from the previous value to 92%. But this is a basic implementation that can be replaced by a variety of algorithms and techniques.</p> </section> @@ -243,7 +246,7 @@ But this is a basic implementation that can be replaced by a variety of algorith <p>The full script, some steps are combined to reduce the overall script. One noteworthy change is the + 1 is done on the matrix ready for SystemDS, this makes SystemDS responsible for adding the 1 to each value.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span> <span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.examples.tutorials.mnist</span> <span class="kn">import</span> <span class="n">DataManager</span> diff --git a/docs/api/python/guide/federated.html b/docs/api/python/guide/federated.html index 92b6b48794..db29fc01bb 100644 --- a/docs/api/python/guide/federated.html +++ b/docs/api/python/guide/federated.html @@ -130,7 +130,7 @@ In this example we simply use Numpy to create a <code class="docutils literal no <p>Currently we also require a metadata file for the federated worker. This should be located next to the <code class="docutils literal notranslate"><span class="pre">test.csv</span></code> file called <code class="docutils literal notranslate"><span class="pre">test.csv.mtd</span></code>. 
To make both the data and metadata simply execute the following</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># Python</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># Python</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> @@ -145,7 +145,7 @@ To make both the data and metadata simply execute the following</p> </div> <p>After creating our data the federated worker becomes able to execute federated instructions. The aggregated sum using federated instructions in python SystemDS is done as follows</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># Python</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># Python</span> <span class="kn">import</span> <span class="nn">logging</span> <span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> @@ -180,7 +180,7 @@ Start with 3 different terminals, and run one federated environment in each.</p> </pre></div> </div> <p>Once all three workers are up and running we can leverage all three in the following example</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># Python</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># Python</span> <span class="kn">import</span> <span class="nn">logging</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> diff --git a/docs/api/python/guide/python_end_to_end_tut.html b/docs/api/python/guide/python_end_to_end_tut.html index 
aeb1d14135..8f656e2a43 100644 --- a/docs/api/python/guide/python_end_to_end_tut.html +++ b/docs/api/python/guide/python_end_to_end_tut.html @@ -142,7 +142,7 @@ to assess how well our model can predict if the income is above or below $50K/yr <p>First, we get our training and testing data from the built-in DataManager. Since the multiLogReg function requires the labels (Y) to be > 0, we add 1 to all labels. This ensures that the smallest label is >= 1. Additionally we will only take a fraction of the training and test set into account to speed up the execution.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.examples.tutorials.adult</span> <span class="kn">import</span> <span class="n">DataManager</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogReg</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogRegPredict</span> @@ -164,10 +164,10 @@ a fraction of the training and test set into account to speed up the execution.< <span class="c1"># Transform frames to matrices.</span> <span class="n">X</span><span class="p">,</span> <span class="n">M1</span> <span class="o">=</span> <span class="n">X_frame</span><span class="o">.</span><span class="n">transform_encode</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_data</span><span class="p">)</span> - <span class="n">Xt</span> 
<span class="o">=</span> <span class="n">Xt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_data</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M1</span><span class="p">)</span> + <span class="n">Xt</span> <span class="o">=</span> <span class="n">Xt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_data</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M1</span><span class="p">)</span> <span class="n">Y</span><span class="p">,</span> <span class="n">M2</span> <span class="o">=</span> <span class="n">Y_frame</span><span class="o">.</span><span class="n">transform_encode</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_labels</span><span class="p">)</span> - <span class="n">Yt</span> <span class="o">=</span> <span class="n">Yt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_labels</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M2</span><span class="p">)</span> - + <span class="n">Yt</span> <span class="o">=</span> <span class="n">Yt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_labels</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M2</span><span class="p">)</span> + <span class="c1"># Subsample to make training faster</span> <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span 
class="mi">0</span><span class="p">:</span><span class="n">train_count</span><span class="p">]</span> <span class="n">Y</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">train_count</span><span class="p">]</span> @@ -182,13 +182,13 @@ for the best performance and no data transfer from pandas to SystemDS it is reco <h3>Step 2: Training<a class="headerlink" href="#step-2-training" title="Link to this heading"></a></h3> <p>Now that we prepared the data, we can use the multiLogReg function. First, we will train the model on our training data. Afterward, we can make predictions on the test data and assess the performance of the model.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">betas</span> <span class="o">=</span> <span class="n">multiLogReg</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">betas</span> <span class="o">=</span> <span class="n">multiLogReg</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> </pre></div> </div> <p>Note that nothing has been calculated yet. In SystemDS the calculation is executed once compute() is called. E.g. 
betas_res = betas.compute().</p> <p>We can now use the trained model to make predictions on the test data.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="p">[</span><span class="n">_</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">acc</span><span class="p">]</span> <span class="o">=</span> <span class="n">multiLogRegPredict</span><span class="p">(</span><span class="n">Xt</span><span class="p">,</span> <span class="n">betas</span><span class="p">,</span> <span class="n">Y</sp [...] +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="p">[</span><span class="n">_</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">acc</span><span class="p">]</span> <span class="o">=</span> <span class="n">multiLogRegPredict</span><span class="p">(</span><span class="n">Xt</span><span class="p">,</span> <span class="n">betas</span><span class="p">,</span> <span class="n">Y</span><span cla [...] </pre></div> </div> <dl class="simple"> @@ -206,14 +206,14 @@ E.g. betas_res = betas.compute().</p> which classes the model has difficulties separating. The confusionMatrix function takes the predicted labels and the true labels. 
It then returns the confusion matrix for the predictions and the confusion matrix averages of each true class.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">confusion_matrix_abs</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">confusionMatrix</span><span class="p">(</span><span class="n">y_pred</span><span class="p">,</span> <span class="n">Yt</span><span class="p">)</span><span class="o">.</span><span class="n">compute</span><span class="p">()</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">confusion_matrix_abs</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">confusionMatrix</span><span class="p">(</span><span class="n">y_pred</span><span class="p">,</span> <span class="n">Yt</span><span class="p">)</span><span class="o">.</span><span class="n">compute</span><span class="p">()</span> </pre></div> </div> </section> <section id="full-script"> <h3>Full Script<a class="headerlink" href="#full-script" title="Link to this heading"></a></h3> <p>In the full script, some steps are combined to reduce the overall script.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.examples.tutorials.adult</span> <span class="kn">import</span> <span class="n">DataManager</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span 
class="n">multiLogReg</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogRegPredict</span> @@ -235,17 +235,17 @@ for the predictions and the confusion matrix averages of each true class.</p> <span class="c1"># Transform frames to matrices.</span> <span class="n">X</span><span class="p">,</span> <span class="n">M1</span> <span class="o">=</span> <span class="n">X_frame</span><span class="o">.</span><span class="n">transform_encode</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_data</span><span class="p">)</span> - <span class="n">Xt</span> <span class="o">=</span> <span class="n">Xt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_data</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M1</span><span class="p">)</span> + <span class="n">Xt</span> <span class="o">=</span> <span class="n">Xt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_data</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M1</span><span class="p">)</span> <span class="n">Y</span><span class="p">,</span> <span class="n">M2</span> <span class="o">=</span> <span class="n">Y_frame</span><span class="o">.</span><span class="n">transform_encode</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_labels</span><span class="p">)</span> - <span class="n">Yt</span> <span class="o">=</span> <span class="n">Yt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span 
class="n">jspec_labels</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M2</span><span class="p">)</span> - + <span class="n">Yt</span> <span class="o">=</span> <span class="n">Yt_frame</span><span class="o">.</span><span class="n">transform_apply</span><span class="p">(</span><span class="n">spec</span><span class="o">=</span><span class="n">jspec_labels</span><span class="p">,</span> <span class="n">meta</span><span class="o">=</span><span class="n">M2</span><span class="p">)</span> + <span class="c1"># Subsample to make training faster</span> <span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">train_count</span><span class="p">]</span> <span class="n">Y</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">train_count</span><span class="p">]</span> <span class="n">Xt</span> <span class="o">=</span> <span class="n">Xt</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">test_count</span><span class="p">]</span> <span class="n">Yt</span> <span class="o">=</span> <span class="n">Yt</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">test_count</span><span class="p">]</span> - <span class="c1"># Train model</span> + <span class="c1"># Train model </span> <span class="n">betas</span> <span class="o">=</span> <span class="n">multiLogReg</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="c1"># Apply model</span> @@ -269,7 +269,7 @@ For this we will introduce another dml file, which can be used to train a basic <h3>Step 1: Obtain data<a 
class="headerlink" href="#step-1-obtain-data" title="Link to this heading"></a></h3> <p>For the whole data setup please refer to level 1, Step 1, as these steps are almost identical, but instead of preparing the test data, we only prepare the training data.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.examples.tutorials.adult</span> <span class="kn">import</span> <span class="n">DataManager</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogReg</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogRegPredict</span> @@ -306,7 +306,7 @@ First, we need to source the dml file for neural networks. This file includes all the necessary functions for training, evaluating, and storing the model. The returned object of the source call is further used for calling the functions. 
The file can be found here:</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="c1"># Load custom neural network</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="c1"># Load custom neural network</span> <span class="n">neural_net_src_path</span> <span class="o">=</span> <span class="s2">"tests/examples/tutorials/neural_net_source.dml"</span> <span class="n">FFN_package</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">source</span><span class="p">(</span><span class="n">neural_net_src_path</span><span class="p">,</span> <span class="s2">"fnn"</span><span class="p">)</span> </pre></div> @@ -319,7 +319,7 @@ The first two arguments are the training features and the target values we want Then we need to set the hyperparameters of the model. We choose to train for 1 epoch with a batch size of 16 and a learning rate of 0.01, which are common parameters for neural networks. The seed argument ensures that running the code again yields the same results.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">epochs</span> <span class="o">=</span> <span class="mi">1</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">epochs</span> <span class="o">=</span> <span class="mi">1</span> <span class="n">batch_size</span> <span class="o">=</span> <span class="mi">16</span> <span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.01</span> <span class="n">seed</span> <span class="o">=</span> <span class="mi">42</span> @@ -335,7 +335,7 @@ We only need to specify the name of our model and the file path. This call stores the weights and biases of our model. 
Similarly the transformation metadata to transform input data to the model, is saved.</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="c1"># Write metadata and trained network to disk.</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="c1"># Write metadata and trained network to disk.</span> <span class="n">sds</span><span class="o">.</span><span class="n">combine</span><span class="p">(</span> <span class="n">network</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'tests/examples/docs_test/end_to_end/network'</span><span class="p">),</span> <span class="n">M1</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'tests/examples/docs_test/end_to_end/encode_X'</span><span class="p">),</span> @@ -348,7 +348,7 @@ is saved.</p> <h3>Step 5: Predict on Unseen data<a class="headerlink" href="#step-5-predict-on-unseen-data" title="Link to this heading"></a></h3> <p>Once the model is saved along with metadata, it is simple to apply it all to unseen data:</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="c1"># Read metadata and trained network and do prediction.</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="c1"># Read metadata and trained network and do prediction.</span> <span class="n">M1_r</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s1">'tests/examples/docs_test/end_to_end/encode_X'</span><span class="p">)</span> <span class="n">M2_r</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span 
class="s1">'tests/examples/docs_test/end_to_end/encode_Y'</span><span class="p">)</span> <span class="n">network_r</span> <span class="o">=</span> <span class="n">sds</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s1">'tests/examples/docs_test/end_to_end/network'</span><span class="p">)</span> @@ -365,7 +365,7 @@ unseen data:</p> <section id="full-script-nn"> <h3>Full Script NN<a class="headerlink" href="#full-script-nn" title="Link to this heading"></a></h3> <p>The complete script now can be seen here:</p> -<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> +<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">systemds.context</span> <span class="kn">import</span> <span class="n">SystemDSContext</span> <span class="kn">from</span> <span class="nn">systemds.examples.tutorials.adult</span> <span class="kn">import</span> <span class="n">DataManager</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogReg</span> <span class="kn">from</span> <span class="nn">systemds.operator.algorithm</span> <span class="kn">import</span> <span class="n">multiLogRegPredict</span> @@ -403,7 +403,7 @@ unseen data:</p> <span class="n">seed</span> <span class="o">=</span> <span class="mi">42</span> <span class="n">network</span> <span class="o">=</span> <span class="n">FFN_package</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">epochs</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span 
class="n">learning_rate</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span> - + <span class="c1"># Write metadata and trained network to disk.</span> <span class="n">sds</span><span class="o">.</span><span class="n">combine</span><span class="p">(</span> <span class="n">network</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'tests/examples/docs_test/end_to_end/network'</span><span class="p">),</span> diff --git a/docs/api/python/sources/getting_started/simple_examples.rst.txt b/docs/api/python/sources/getting_started/simple_examples.rst.txt index dd20c89fd0..75d4c4ccee 100644 --- a/docs/api/python/sources/getting_started/simple_examples.rst.txt +++ b/docs/api/python/sources/getting_started/simple_examples.rst.txt @@ -30,8 +30,10 @@ Matrix Operations Making use of SystemDS, let us multiply an Matrix with an scalar: .. include:: ../code/getting_started/simpleExamples/multiply.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: As output we get @@ -48,8 +50,10 @@ Let us do a quick element-wise matrix multiplication of numpy arrays with System Remember to first start up a new terminal: .. include:: ../code/getting_started/simpleExamples/multiplyMatrix.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: More complex operations ----------------------- @@ -58,8 +62,10 @@ SystemDS provides algorithm level functions as built-in functions to simplify de One example of this is l2SVM, a high level functions for Data-Scientists. Let's take a look at l2svm: .. include:: ../code/getting_started/simpleExamples/l2svm.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: The output should be similar to @@ -81,8 +87,10 @@ instead of using numpy arrays that have to be transfered into systemDS. The above script transformed goes like this: .. 
include:: ../code/getting_started/simpleExamples/l2svm_internal.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: When reading in datasets for processing it is highly recommended that you read from inside systemds using sds.read("file"), since this avoid the transferring of numpy arrays. diff --git a/docs/api/python/sources/guide/algorithms_basics.rst.txt b/docs/api/python/sources/guide/algorithms_basics.rst.txt index 7206605222..825c0d066f 100644 --- a/docs/api/python/sources/guide/algorithms_basics.rst.txt +++ b/docs/api/python/sources/guide/algorithms_basics.rst.txt @@ -42,6 +42,8 @@ To setup this simply use :code: python :start-line: 22 :end-line: 30 + :encoding: utf-8 + :literal: Here the DataManager contains the code for downloading and setting up numpy arrays containing the data. @@ -86,9 +88,11 @@ Step 3: Training To start with, we setup a SystemDS context and setup the data: .. include:: ../code/guide/algorithms/FullScript.py + :code: python :start-line: 31 :end-line: 35 - :code: python + :encoding: utf-8 + :literal: to reduce the training time and verify everything works, it is usually good to reduce the amount of data, to train on a smaller sample to start with @@ -169,9 +173,11 @@ from our sample of 1k to the full training dataset of 60k, in this example the m to again reduce training time .. include:: ../code/guide/algorithms/FullScript.py + :code: python :start-line: 31 :end-line: 43 - :code: python + :encoding: utf-8 + :literal: With this change the accuracy achieved changes from the previous value to 92%. But this is a basic implementation that can be replaced by a variety of algorithms and techniques. @@ -185,6 +191,8 @@ One noteworthy change is the + 1 is done on the matrix ready for SystemDS, this makes SystemDS responsible for adding the 1 to each value. .. 
include:: ../code/guide/algorithms/FullScript.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: diff --git a/docs/api/python/sources/guide/federated.rst.txt b/docs/api/python/sources/guide/federated.rst.txt index 4afafa070d..cdd7a698d0 100644 --- a/docs/api/python/sources/guide/federated.rst.txt +++ b/docs/api/python/sources/guide/federated.rst.txt @@ -54,15 +54,19 @@ This should be located next to the ``test.csv`` file called ``test.csv.mtd``. To make both the data and metadata simply execute the following .. include:: ../code/guide/federated/federatedTutorial_part1.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: After creating our data the federated worker becomes able to execute federated instructions. The aggregated sum using federated instructions in python SystemDS is done as follows .. include:: ../code/guide/federated/federatedTutorial_part2.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: Multiple Federated Environments ------------------------------- @@ -82,8 +86,10 @@ Start with 3 different terminals, and run one federated environment in each. Once all three workers are up and running we can leverage all three in the following example .. include:: ../code/guide/federated/federatedTutorial_part3.py - :start-line: 20 :code: python + :start-line: 20 + :encoding: utf-8 + :literal: The print should look like diff --git a/docs/api/python/sources/guide/python_end_to_end_tut.rst.txt b/docs/api/python/sources/guide/python_end_to_end_tut.rst.txt index 961b47d61b..6237450382 100644 --- a/docs/api/python/sources/guide/python_end_to_end_tut.rst.txt +++ b/docs/api/python/sources/guide/python_end_to_end_tut.rst.txt @@ -56,6 +56,8 @@ a fraction of the training and test set into account to speed up the execution. 
:code: python :start-line: 20 :end-line: 51 + :encoding: utf-8 + :literal: Here the DataManager contains the code for downloading and setting up either Pandas DataFrames or internal SystemDS Frames, for the best performance and no data transfer from pandas to SystemDS it is recommended to read directly from disk into SystemDS. @@ -70,6 +72,8 @@ training data. Afterward, we can make predictions on the test data and assess th :code: python :start-line: 53 :end-line: 54 + :encoding: utf-8 + :literal: Note that nothing has been calculated yet. In SystemDS the calculation is executed once compute() is called. E.g. betas_res = betas.compute(). @@ -80,6 +84,8 @@ We can now use the trained model to make predictions on the test data. :code: python :start-line: 56 :end-line: 57 + :encoding: utf-8 + :literal: The multiLogRegPredict function has three return values: - m, a matrix with the mean probability of correctly classifying each label. We do not use it further in this example. @@ -98,6 +104,8 @@ for the predictions and the confusion matrix averages of each true class. :code: python :start-line: 59 :end-line: 60 + :encoding: utf-8 + :literal: Full Script ~~~~~~~~~~~ @@ -108,6 +116,8 @@ In the full script, some steps are combined to reduce the overall script. :code: python :start-line: 20 :end-line: 65 + :encoding: utf-8 + :literal: Level 2 ------- @@ -125,6 +135,8 @@ but instead of preparing the test data, we only prepare the training data. :code: python :start-line: 20 :end-line: 47 + :encoding: utf-8 + :literal: Step 2: Load the algorithm ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -139,6 +151,8 @@ The file can be found here: :code: python :start-line: 48 :end-line: 51 + :encoding: utf-8 + :literal: Step 3: Training the neural network @@ -154,6 +168,8 @@ The seed argument ensures that running the code again yields the same results. :code: python :start-line: 52 :end-line: 58 + :encoding: utf-8 + :literal: Step 4: Saving the model @@ -169,6 +185,8 @@ is saved. 
:code: python :start-line: 59 :end-line: 65 + :encoding: utf-8 + :literal: Step 5: Predict on Unseen data ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -180,6 +198,8 @@ unseen data: :code: python :start-line: 66 :end-line: 77 + :encoding: utf-8 + :literal: Full Script NN @@ -192,3 +212,6 @@ The complete script now can be seen here: :code: python :start-line: 20 :end-line: 80 + :encoding: utf-8 + :literal: +
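
Note on the `.. include::` options that appear throughout the .rst.txt hunks above: per the docutils documentation, `:start-line:` is a zero-based index and `:end-line:` is exclusive, so the pair behaves like a Python slice. The sketch below only mimics that selection rule with plain list slicing. `select_lines` and the sample text are hypothetical names made up for illustration; the code does not call docutils and does not read any file from this repository.

```python
def select_lines(text, start_line=None, end_line=None):
    """Mimic how an include directive narrows a file to a line range.

    Assumption: start_line is zero-based and end_line is exclusive,
    i.e. the same rule as Python slicing.
    """
    lines = text.splitlines()
    return "\n".join(lines[start_line:end_line])


# Hypothetical stand-in for an included example script.
sample = "\n".join(f"line {i}" for i in range(6))

# Comparable to ':start-line: 2' combined with ':end-line: 4'.
snippet = select_lines(sample, start_line=2, end_line=4)
print(snippet)
```

Read this way, a pair such as `:start-line: 53` with `:end-line: 54` would select a single line of the included script, and omitting `:end-line:` would keep everything from the start line to the end of the file.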
