This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository
https://gitbox.apache.org/repos/asf/incubator-datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 4fc0d6f Automatic Site Publish by Buildbot
4fc0d6f is described below
commit 4fc0d6ff58217c848be04fd45c4d9d5934fa0891
Author: buildbot <[email protected]>
AuthorDate: Sat Aug 22 06:42:56 2020 +0000
Automatic Site Publish by Buildbot
---
output/docs/Community/KDD_Tutorial_Summary.html | 13 +-
.../img/Community/KDD_sketching_tutorial_pt1.pdf | Bin 0 -> 15455179 bytes
.../img/Community/KDD_sketching_tutorial_pt2.pdf | Bin 0 -> 754785 bytes
.../docs/img/Community/KLL_Sketch_Tutorial.ipynb | 518 +++++++++++++++++++++
.../docs/img/Community/Theta_Sketch_Tutorial.ipynb | 329 +++++++++++++
output/docs/img/Community/Untitled.ipynb | 493 ++++++++++++++++++++
6 files changed, 1352 insertions(+), 1 deletion(-)
diff --git a/output/docs/Community/KDD_Tutorial_Summary.html
b/output/docs/Community/KDD_Tutorial_Summary.html
index 836eec6..69106b0 100644
--- a/output/docs/Community/KDD_Tutorial_Summary.html
+++ b/output/docs/Community/KDD_Tutorial_Summary.html
@@ -57,7 +57,7 @@
under the License.
-->
-<h1 id="data-sketching-for-real-time-analyticstheory-and-practice">Data
Sketching for Real Time Analytics:<br />Theory and Practice</h1>
+<h1 id="data-sketching-for-real-time-analytics-theory-and-practice">Data
Sketching for Real Time Analytics: Theory and Practice</h1>
<h2 id="abstract">Abstract</h2>
@@ -71,6 +71,17 @@
<p>The audience is expected to have a familiarity with probability and
statistics that is typical for an undergraduate mathematical statistics or
introductory graduate machine learning course.</p>
+<h2 id="materials">Materials</h2>
+
+<p>In addition to the prerecorded presentations, the slides and Jupyter
notebooks are available. Note that the KLL notebook uses an update method that
is only available in release candidate v2.1.0; as of the tutorial date it has
not yet appeared in an official release (the latest is 2.0.0).</p>
+
+<ul>
+ <li>Slides: <a
href="/docs/img/Community/KDD_sketching_tutorial_pt1.pdf">Theory (part
1)</a></li>
+ <li>Slides: <a
href="/docs/img/Community/KDD_sketching_tutorial_pt2.pdf">Practice (part
2)</a></li>
+  <li>Notebook: <a href="/docs/img/Community/KLL_Sketch_Tutorial.ipynb">KLL
Sketch</a></li>
+  <li>Notebook: <a
href="/docs/img/Community/Theta_Sketch_Tutorial.ipynb">Theta Sketch</a></li>
+</ul>
+
<h2 id="outline">Outline</h2>
<p>The tutorial will consist of two parts. The first focuses on methods and
theory for data sketching and sampling. The second focuses on application and
includes code examples using the Apache DataSketches project.</p>
diff --git a/output/docs/img/Community/KDD_sketching_tutorial_pt1.pdf
b/output/docs/img/Community/KDD_sketching_tutorial_pt1.pdf
new file mode 100644
index 0000000..a078c47
Binary files /dev/null and
b/output/docs/img/Community/KDD_sketching_tutorial_pt1.pdf differ
diff --git a/output/docs/img/Community/KDD_sketching_tutorial_pt2.pdf
b/output/docs/img/Community/KDD_sketching_tutorial_pt2.pdf
new file mode 100644
index 0000000..3990c27
Binary files /dev/null and
b/output/docs/img/Community/KDD_sketching_tutorial_pt2.pdf differ
diff --git a/output/docs/img/Community/KLL_Sketch_Tutorial.ipynb
b/output/docs/img/Community/KLL_Sketch_Tutorial.ipynb
new file mode 100644
index 0000000..418ce84
--- /dev/null
+++ b/output/docs/img/Community/KLL_Sketch_Tutorial.ipynb
@@ -0,0 +1,518 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# KLL Sketch Tutorial"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Table of Contents\n",
+ "\n",
+ " * [Overview](#Overview)\n",
+ " * [Set-up](#Set-up)\n",
+ " * [Creating a KLL Sketch](#Creating-a-KLL-Sketch)\n",
+ " * [Querying the sketch](#Querying-the-sketch)\n",
+ " * [Merging Sketches](#Merging-Sketches)\n",
+ " * [Serializing Sketches for
Transportation](#Serializing-Sketches-for-Transportation)\n",
+ " * [Using in a Data Cube](#Using-in-a-Data-Cube)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Overview\n",
+ "\n",
+    "This tutorial will focus on the KLL sketch. We will demonstrate how to
create and feed data into sketches and show an option for moving sketches
between systems. We will rely on synthetic data to help us better reason about
expected results when visualizing."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set-up\n",
+ "\n",
+    "This tutorial assumes you have already downloaded and installed the
Python wrapper for the DataSketches library. See the [DataSketches
Downloads](http://datasketches.apache.org/docs/Community/Downloads.html) page
for details."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasketches import kll_floats_sketch\n",
+ "\n",
+ "import base64\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "%matplotlib inline\n",
+ "sns.set(color_codes=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "### Creating a KLL Sketch\n",
+ "\n",
+ "Sketch creation is simple: As with all the sketches in the library, you
simply need to decide on your error tolerance, which determines the maximum
size of the sketch. The DataSketches library refers to that value as $k$.\n",
+ "\n",
+ "We can get an estimate of the expected error bound (99th percentile)
without instantiating anything."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(kll_floats_sketch.get_normalized_rank_error(160, False))\n",
+ "print(kll_floats_sketch.get_normalized_rank_error(200, False))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As we can see, the (one-sided) error with $k=160$ is about 1.67% versus
1.33% at $k=200$. For the rest of the examples, we will use $200$. We can now
instantiate a sketch."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "k = 200\n",
+ "sk = kll_floats_sketch(k)\n",
+ "print(sk)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The sketch has seen no data so far (N=0) and is consequently storing
nothing (Retained items=0). Storage bytes refers to how much space would be
required to save the sketch as an array of bytes, which in this case is fairly
minimal.\n",
+ "Next, we can add some data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sk.update(np.random.exponential(size=150))\n",
+ "print(sk)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We added 150 samples, which is few enough that the sketch is still in
exact mode, meaning it is storing everything rather than sampling. To be able
to compare the sketch to an exact computation, we will generate new data -- and
a lot more of it. We will also create a sketch with a much larger $k$ to
demonstrate the effect of increasing the sketch size."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sk = kll_floats_sketch(k)\n",
+ "sk_large = kll_floats_sketch(10*k)\n",
+ "data = np.random.exponential(size=2**24)\n",
+ "sk.update(data)\n",
+ "sk_large.update(data)\n",
+ "print(sk)\n",
+ "print(sk_large)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Here the sketch is well into sampling territory, having processed nearly
17 million items while retaining only 645 of them. The 2676 bytes of storage
compares to 64MB for the raw data as 4-byte floats, and even the much larger
sketch uses less than 24KB with fewer than 6000 retained points. Next we will
start querying the sketch to better understand its performance."
+ ]
+ },
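The 64MB figure above is straightforward arithmetic, independent of the sketch library; a quick plain-Python check of the raw-data size and the implied compression ratio (using the 2676-byte sketch size quoted above):

```python
# Raw storage for 2**24 single-precision floats (4 bytes each)
n = 2**24
raw_bytes = n * 4                      # 67,108,864 bytes = 64 MiB
raw_mib = raw_bytes // (1024 * 1024)

# Approximate compression ratio versus the k=200 sketch size quoted above
sketch_bytes = 2676
ratio = raw_bytes / sketch_bytes
print(raw_mib, round(ratio))
```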
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Querying the sketch\n",
+ "\n",
+    "The median of an exponential distribution is $\\frac{\\ln 2}{\\lambda}$,
and the default numpy exponential distribution has $\\lambda = 1.0$, so the
median should be close to $0.693$. Similarly, if we ask for the rank of $\\ln
2$, we should get a value close to $0.5$."
+ ]
+ },
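The $\frac{\ln 2}{\lambda}$ claim can be sanity-checked with the standard library alone; this rough simulation (seeded, no sketching involved) compares an empirical median against the closed form:

```python
import math
import random
import statistics

random.seed(42)
lam = 1.0
# Median of Exp(lambda) is ln(2)/lambda; check empirically
samples = [random.expovariate(lam) for _ in range(100_000)]
empirical = statistics.median(samples)
theoretical = math.log(2) / lam
print(empirical, theoretical)
```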
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(f'Theoretical median : {np.log(2):.6f}')\n",
+ "print(f'Estimated median, k=200 : {sk.get_quantile(0.5):.6f}')\n",
+ "print(f'Estimated median, k=2000 :
{sk_large.get_quantile(0.5):.6f}')\n",
+ "print('')\n",
+ "print(f'Exact Quantile of ln(2) : 0.5')\n",
+ "print(f'Est. Quantile of ln(2), k=200 :
{sk.get_rank(np.log(2)):.6f}')\n",
+ "print(f'Est. Quantile of ln(2), k=2000 :
{sk_large.get_rank(np.log(2)):.6f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One of the common use cases of a quantiles sketch like KLL is visualizing
data with a histogram. We can create one from the sketch easily enough, but for
this tutorial we also want to know how well we are doing. Fortunately, we can
still compute a histogram on this data directly for comparison.\n",
+ "\n",
+ "Note that the sketch returns a PMF while the histogram computes data only
for the bins between the provided points and must be converted to a PMF. The
sketch also returns a bin containing all the mass less than the minimum
provided point. In this case that will always be 0, so we discard it for
plotting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "xmin = 0 # could use sk.get_min_value() but we know the bound here\n",
+ "xmax = sk.get_quantile(0.99995) # this will exclude a little data from
the exact distribution\n",
+ "num_splits = 40\n",
+ "step = (xmax - xmin) / num_splits\n",
+ "splits = [xmin + (i*step) for i in range(0, num_splits)]\n",
+ "x = splits.copy()\n",
+ "x.append(xmax)\n",
+ "\n",
+ "pmf = sk.get_pmf(splits)[1:]\n",
+ "pmf_large = sk_large.get_pmf(splits)[1:]\n",
+ "exact_pmf = np.histogram(data, bins=x)[0] / sk.get_n()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=(12,12))\n",
+ "plt.subplot(2,1,1)\n",
+ "plt.title('PMF, k = 200')\n",
+ "plt.ylabel('Probability')\n",
+ "plt.bar(x=splits, height=pmf, align='edge', width=-.07, color='blue')\n",
+ "plt.bar(x=splits, height=exact_pmf, align='edge', width=.07,
color='red')\n",
+ "plt.legend(['KLL, k=200','Exact'])\n",
+ "\n",
+ "plt.subplot(2,1,2)\n",
+ "plt.title('PMF, k = 2000')\n",
+ "plt.ylabel('Probability')\n",
+ "plt.bar(x=splits, height=pmf_large, align='edge', width=-.07,
color='blue')\n",
+ "plt.bar(x=splits, height=exact_pmf, align='edge', width=.07,
color='red')\n",
+ "plt.legend(['KLL, k=2000','Exact'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The sketch with $k=200$ clearly provides a good approximation. In the
case of this exponential distribution, we sometimes observe that there is
additional mass near the right edge of the tail compared to the true PMF,
although still within the provided error bound with high probability. While
this is not problematic given the guarantees of the sketch, certain use cases
requiring high precision at extreme quantiles may find it less satisfactory.
With the larger $k=2000$ sketch, the a [...]
+ "\n",
+    "We will eventually provide what we call a Relative Error Quantiles sketch
that will have tighter error bounds as you approach the tail of the
distribution, which will be useful if you care primarily about accuracy in the
tail, but that will require a larger sketch."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "### Merging Sketches\n",
+ "\n",
+ "A single sketch is certainly useful, but the real power of sketches comes
from the ability to merge them. Here, we will create two simple sketches to
demonstrate. For good measure, we'll use different values of $k$ for the
sketches, as well as feed them different numbers of points."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sk1 = kll_floats_sketch(k)\n",
+ "sk2 = kll_floats_sketch(int(1.5 * k))\n",
+ "\n",
+ "data1 = np.random.normal(loc=-2.0, size=2**24)\n",
+ "data2 = np.random.normal(loc=2.0, size=2**25)\n",
+ "sk1.update(data1)\n",
+ "sk2.update(data2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With the KLL sketch, there is no separate object for unions. We can
either create another empty sketch and use that as a merge target or we can
merge sketch 2 into sketch 1. Taking the latter approach and plotting the
resulting histogram gives us the expected distribution. Note that one sketch
has twice as many points as the other."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sk1.merge(sk2)\n",
+ "print(sk1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We saved the input data so that we can again compute an exact
distribution."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "xmin = sk1.get_min_value()\n",
+ "xmax = sk1.get_max_value()\n",
+ "num_splits = 20\n",
+ "step = (xmax - xmin) / num_splits\n",
+ "splits = [xmin + (i*step) for i in range(0, num_splits)]\n",
+ "x = splits.copy()\n",
+ "x.append(xmax)\n",
+ "\n",
+ "pmf = sk1.get_pmf(splits)[1:]\n",
+ "cdf = sk1.get_cdf(splits)[1:]\n",
+ "exact_pmf = (np.histogram(data1, bins=x)[0] + np.histogram(data2,
bins=x)[0]) / sk1.get_n()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=(12,6))\n",
+ "plt.subplot(1,2,1)\n",
+ "plt.bar(x=splits, height=pmf, align='edge', width=-.3, color='blue')\n",
+ "plt.bar(x=splits, height=exact_pmf, align='edge', width=.3,
color='red')\n",
+ "plt.legend(['KLL','Exact'])\n",
+ "plt.ylabel('Probability')\n",
+ "plt.title('Merged PMF')\n",
+ "\n",
+ "plt.subplot(1,2,2)\n",
+ "plt.bar(x=splits, height=cdf, align='edge', width=-.3, color='blue')\n",
+ "plt.bar(x=splits, height=np.cumsum(exact_pmf), align='edge', width=.3,
color='red')\n",
+ "plt.legend(['KLL','Exact'])\n",
+ "plt.ylabel('Probability')\n",
+ "plt.title('Merged CDF')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Notice that we do not need to do anything special to merge the sketches
despite the different values of $k$, and the 2:1 relative ratio of weights of
the two distributions was preserved despite the input sketch size difference."
+ ]
+ },
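The 2:1 weighting falls out of the stream lengths rather than the sketch sizes; the mixture weights the merged sketch should reflect are just the relative counts:

```python
n1 = 2**24   # points fed to sk1 (the loc=-2.0 normal)
n2 = 2**25   # points fed to sk2 (the loc=+2.0 normal)
w1 = n1 / (n1 + n2)   # weight of the first distribution in the merged sketch
w2 = n2 / (n1 + n2)   # weight of the second
print(w1, w2)
```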
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Serializing Sketches for Transportation\n",
+ "\n",
+ "Being able to move sketches between platforms is important. One of the
useful aspects of the DataSketches library in particular is binary
compatibility across languages. While this section will remain within python,
sketches serialized from C++- or Java-based systems would work identically.\n",
+ "\n",
+    "In this section, we will start by creating a tab-separated file with a
handful\n",
+    "of sketches and then load it in as a dataframe. We will encode each
binary sketch image as base64.\n",
+ "\n",
+ "To simplify sketch creation, the first step will be to define a simple
function to generate a line for the file with the given parameters."
+ ]
+ },
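The base64 step is independent of the sketch library; the encode/decode round-trip used below works the same way for any byte string (stdlib only, with a stand-in payload instead of `sk.serialize()`):

```python
import base64

# Stand-in for sk.serialize(): any bytes round-trip the same way
payload = bytes(range(16))
encoded = base64.b64encode(payload).decode('utf-8')  # safe to embed in a TSV field
decoded = base64.b64decode(encoded)
print(encoded)
```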
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def generate_sketch(family: str, n: int, mean: float, var: float) ->
str:\n",
+ " sk = kll_floats_sketch(200)\n",
+ " if (family == 'normal'):\n",
+ " sk.update(np.random.normal(loc=mean, scale=var, size=n))\n",
+ " elif (family == 'uniform'):\n",
+ " b = mean + np.sqrt(3 * var)\n",
+ " a = 2 * mean - b\n",
+ " sk.update(np.random.uniform(low=a, high=b, size=n))\n",
+ " else:\n",
+ " return None\n",
+ " sk_b64 = base64.b64encode(sk.serialize()).decode('utf-8')\n",
+ " return f'{family}\\t{n}\\t{mean}\\t{var}\\t{sk_b64}\\n'"
+ ]
+ },
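The `uniform` branch above solves for the interval endpoints from the requested mean and variance: for $U(a,b)$ the mean is $(a+b)/2$ and the variance $(b-a)^2/12$, which gives $b = \mu + \sqrt{3\sigma^2}$ and $a = 2\mu - b$. A quick stdlib check of that algebra, using the parameters of one of the uniform rows written below:

```python
import math

mean, var = 0.5, 1.0 / 12   # parameters of one of the uniform rows
b = mean + math.sqrt(3 * var)
a = 2 * mean - b
# Recover mean and variance of U(a, b) from the endpoints
recovered_mean = (a + b) / 2
recovered_var = (b - a) ** 2 / 12
print(a, b, recovered_mean, recovered_var)
```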
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "filename = 'kll_tutorial.tsv'\n",
+ "with open(filename, 'w') as f:\n",
+ " f.write('family\\tn\\tmean\\tvariance\\tkll\\n')\n",
+ " f.write(generate_sketch('normal', 2**23, -4.0, 0.5))\n",
+ " f.write(generate_sketch('normal', 2**24, 0.0, 1.0))\n",
+ " f.write(generate_sketch('normal', 2**25, 2.0, 0.5))\n",
+ " f.write(generate_sketch('normal', 2**23, 4.0, 0.2))\n",
+ " f.write(generate_sketch('normal', 2**22, -2.0, 2.0))\n",
+ " f.write(generate_sketch('uniform', 2**21, 0.5, 1.0/12))\n",
+ " f.write(generate_sketch('uniform', 2**22, 5.0, 1.0/12))\n",
+ " f.write(generate_sketch('uniform', 2**20, -0.5, 1.0/3))\n",
+ " f.write(generate_sketch('uniform', 2**23, 0.0, 4.0/3))\n",
+ " f.write(generate_sketch('uniform', 2**22, -4.0, 1.0/3))
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If your system has a *nix shell, you can inspect the resulting file:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "!head -2 {filename}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Using in a Data Cube"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that we have our file with 10 sketches, we can use pandas to load
them in. To ensure that we load the sketches as useful objects, we need to
define a converter function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "deserialize_kll = lambda x :
kll_floats_sketch.deserialize(base64.b64decode(x))\n",
+ "\n",
+ "df = pd.read_csv(filename,\n",
+ " sep='\\t',\n",
+ " header=0,\n",
+ " dtype={'family':'category', 'n':int, 'mean':float,
'var':float},\n",
+ " converters={'kll':deserialize_kll}\n",
+ " )\n",
+ "print(df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "The sketch column is displayed via each sketch's string summary, which is
not very useful for viewing here but does show that the column contains the
actual sketch objects."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "And finally, we can now perform queries on the results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query_result = kll_floats_sketch(10*k)\n",
+ "for sk in df.loc[df['family'] == 'normal'].itertuples(index=False):\n",
+ " query_result.merge(sk.kll)\n",
+ "print(query_result)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Here we see that the resulting sketch has processed 71 million items
(272MB of data), is summarizing them with only 563 retained items, and can be
serialized into only 2352 bytes, which includes some sketch metadata.\n",
+ "\n",
+ "Finally, we want to visualize this data. Remember that we have a mixture
of 5 Gaussian distributions:\n",
+ "\n",
+ "| $\\mu$ | $\\sigma^2$ | n |\n",
+ "|-----:|----:|---------:|\n",
+ "| -4.0 | 0.5 | $2^{23}$ |\n",
+ "| 0.0 | 1.0 | $2^{24}$ |\n",
+ "| 2.0 | 0.5 | $2^{25}$ |\n",
+ "| 4.0 | 0.2 | $2^{23}$ |\n",
+ "| -2.0 | 2.0 | $2^{22}$ |"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "xmin = query_result.get_quantile(0.0005)\n",
+ "xmax = query_result.get_quantile(0.9995)\n",
+ "num_splits = 50\n",
+ "step = (xmax - xmin) / num_splits\n",
+ "splits = [xmin + (i*step) for i in range(0, num_splits)]\n",
+ "\n",
+ "pmf = query_result.get_pmf(splits)\n",
+ "x = splits.copy()\n",
+ "x.append(xmax)\n",
+ "plt.figure(figsize=(12,6))\n",
+ "plt.title('PMF')\n",
+ "plt.ylabel('Probability')\n",
+ "plt.bar(x=x, height=pmf, align='edge', width=-0.15)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/output/docs/img/Community/Theta_Sketch_Tutorial.ipynb
b/output/docs/img/Community/Theta_Sketch_Tutorial.ipynb
new file mode 100644
index 0000000..28c6157
--- /dev/null
+++ b/output/docs/img/Community/Theta_Sketch_Tutorial.ipynb
@@ -0,0 +1,329 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Theta Sketch Tutorial\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Table of Contents\n",
+ "\n",
+ " * [Overview](#Overview)\n",
+ " * [Set-up](#Set-up)\n",
+ " * [Basic Sketch Usage](#Basic-Sketch-Usage)\n",
+ " * [Sketch Unions](#Sketch-Unions)\n",
+ " * [Sketch Intersections](#Sketch-Intersections)\n",
+ " * [Set Difference (A-not-B)](#Set-Difference-(A-not-B))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Overview\n",
+ "\n",
+ "This tutorial covers basic operation of the Theta sketch for distinct
counting. We will demonstrate how to create and feed data into sketches as well
as the various set operations. We will also include the HLL sketch for
comparison.\n",
+ "\n",
+    "Characterization tests of the hash function we use, Murmur3, have shown
that it has excellent independence properties. As a result, we can achieve
reasonable performance for demonstration purposes by feeding in sequential
integers. This lets us experiment with the set operations in a controlled but
still realistic manner, and know the exact result without resorting to an
expensive computation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set-up\n",
+ "\n",
+    "This tutorial assumes you have already downloaded and installed the
Python wrapper for the DataSketches library. See the [DataSketches
Downloads](http://datasketches.apache.org/docs/Community/Downloads.html) page
for details."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasketches import theta_sketch, update_theta_sketch,
compact_theta_sketch\n",
+ "from datasketches import theta_union, theta_intersection,
theta_a_not_b\n",
+ "\n",
+ "from datasketches import hll_sketch, hll_union"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Basic Sketch Usage"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To start, we'll create a sketch with ~1 million points in order to
demonstrate basic sketch operations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "n = 2**20\n",
+ "k = 12\n",
+ "sk1 = update_theta_sketch(k)\n",
+ "hll1 = hll_sketch(k)\n",
+ "for i in range(0, n):\n",
+ " sk1.update(i)\n",
+ " hll1.update(i)\n",
+ "print(sk1)\n",
+ "print(hll1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "The summary contains most of the data of interest, but we can also query
for specific values. In this case, since we know the exact number of distinct
items presented to the sketch, we can express the estimate and the upper and
lower bounds as a percentage of the exact value."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(f'Exact result:\\t\\t\\t{n}')\n",
+ "print('')\n",
+ "print(f'Theta upper bound (1 std.
dev):\\t{sk1.get_upper_bound(1):.1f}\\t({100*sk1.get_upper_bound(1) / n -
100:.2f}%)')\n",
+ "print(f'Theta
estimate:\\t\\t\\t{sk1.get_estimate():.1f}\\t({100*sk1.get_estimate() / n -
100:.2f}%)')\n",
+ "print(f'Theta lower bound (1 std.
dev):\\t{sk1.get_lower_bound(1):.1f}\\t({100*sk1.get_lower_bound(1) / n -
100:.2f}%)')\n",
+ "print('')\n",
+ "print(f'HLL upper bound (1 std.
dev):\\t{hll1.get_upper_bound(1):.1f}\\t({100*hll1.get_upper_bound(1) / n -
100:.2f}%)')\n",
+ "print(f'HLL
estimate:\\t\\t\\t{hll1.get_estimate():.1f}\\t({100*hll1.get_estimate() / n -
100:.2f}%)')\n",
+ "print(f'HLL lower bound (1 std.
dev):\\t{hll1.get_lower_bound(1):.1f}\\t({100*hll1.get_lower_bound(1) / n -
100:.2f}%)')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can serialize and reconstruct the sketch. If we compact the sketch
prior to serialization, we can still query the rebuilt sketch but cannot update
it further. When reconstructed, we can see that the estimate is exactly the
same."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sk1_bytes = sk1.compact().serialize()\n",
+ "len(sk1_bytes)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "new_sk1 = theta_sketch.deserialize(sk1_bytes)\n",
+ "print(f'Estimate (original):\\t{sk1.get_estimate()}')\n",
+ "print(f'Estimate (new):\\t\\t{new_sk1.get_estimate()}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Sketch Unions"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Theta Sketch unions make use of a separate union object. The union will
accept input sketches with different values of $k$.\n",
+ "\n",
+ "For this example, we will create a sketch with distinct values that
partially overlap those in `sk1`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "offset = int(15 * n / 16)\n",
+ "sk2 = update_theta_sketch(k+1)\n",
+ "hll2 = hll_sketch(k+1)\n",
+ "for i in range(0, n):\n",
+ " sk2.update(i + offset)\n",
+ " hll2.update(i + offset)\n",
+ "print(sk2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can now feed the sketches into the union. As constructed, the exact
number of unique values presented to the two sketches is $\\frac{31}{16}n$."
+ ]
+ },
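The $\frac{31}{16}n$ figure follows from how the two integer ranges overlap; it can be verified exactly with plain Python sets (no sketching, no estimation error):

```python
n = 2**20
offset = 15 * n // 16
a = set(range(n))                    # values fed to sk1
b = set(range(offset, offset + n))   # values fed to sk2, shifted by 15n/16
exact_union = len(a | b)
print(exact_union, 31 * n // 16)
```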
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "union = theta_union(k)\n",
+ "union.update(sk1)\n",
+ "union.update(sk2)\n",
+ "\n",
+ "union_hll = hll_union(k)\n",
+ "union_hll.update(hll1)\n",
+ "union_hll.update(hll2)\n",
+ "\n",
+    "exact = int(31 * n / 16)\n",
+ "result = union.get_result()\n",
+ "theta_bound_pct = 100 * (result.get_upper_bound(1) -
result.get_estimate()) / exact\n",
+ "\n",
+ "hll_result = union_hll.get_result()\n",
+ "hll_bound_pct = 100 * (hll_result.get_upper_bound(1) -
hll_result.get_estimate()) / exact\n",
+ "\n",
+ "\n",
+ "print(f'Exact result:\\t{exact}')\n",
+ "print(f'Theta Estimate:\\t{result.get_estimate():.1f}
({100*(result.get_estimate()/exact - 1):.2f}% +- {theta_bound_pct:.2f}%)')\n",
+ "print(f'HLL Estimate:\\t{hll_result.get_estimate():.1f}
({100*(hll_result.get_estimate()/exact - 1):.2f}% +- {hll_bound_pct:.2f}%)')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Sketch Intersections"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Beyond unions, Theta sketches also support intersections through the use
of an intersection object. For comparison, we also present the HLL estimate
here using the inclusion-exclusion formula: $|A \\cup B| = |A| + |B| -
|A \\cap B|$.\n",
+    "\n",
+    "That formula might not seem too bad when intersecting 2 sketches, but as
the number of sketches increases the formula becomes increasingly complex, and
the error compounds rapidly. By comparison, the Theta set operations can be
applied to an arbitrary number of sketches."
+ ]
+ },
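Rearranged for the intersection, the identity reads $|A \cap B| = |A| + |B| - |A \cup B|$; checked exactly on the same two integer ranges with plain sets:

```python
n = 2**20
offset = 15 * n // 16
a = set(range(n))
b = set(range(offset, offset + n))
# |A ∩ B| = |A| + |B| - |A ∪ B|
via_ie = len(a) + len(b) - len(a | b)
direct = len(a & b)
print(via_ie, direct, n // 16)
```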
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "intersection = theta_intersection()\n",
+ "intersection.update(sk1)\n",
+ "intersection.update(sk2)\n",
+ "\n",
+ "hll_inter_est = hll1.get_estimate() + hll2.get_estimate() -
hll_result.get_estimate()\n",
+ "\n",
+ "print(\"Has result: \", intersection.has_result())\n",
+ "result = intersection.get_result()\n",
+ "print(result)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this case, we expect the sets to have an overlap of $\\frac{1}{16}n$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "exact = int(n / 16)\n",
+ "theta_bound_pct = 100 * (result.get_upper_bound(1) -
result.get_estimate()) / exact\n",
+ "\n",
+ "print(f'Exact result:\\t\\t{exact}')\n",
+ "print(f'Theta Estimate:\\t\\t{result.get_estimate():.1f}
({100*(result.get_estimate()/exact - 1):.2f}% +- {theta_bound_pct:.2f}%)')\n",
+ "print(f'HLL Estimate:\\t\\t{hll_inter_est:.1f} ({100*(hll_inter_est/exact
- 1):.2f}%)')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set Difference (A-not-B)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Finally, we have the set difference operation. Unlike `theta_union` and
`theta_intersection`, `theta_a_not_b` is currently stateless: the object takes
two input sketches at a time, namely $a$ and $b$, and directly returns the
result as a sketch."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "anb = theta_a_not_b()\n",
+ "result = anb.compute(sk1, sk2)\n",
+ "print(result)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "By using the same two sketches as before, the expected result here is
$\\frac{15}{16}n$.\n",
+ "\n",
+ "Our HLL estimate comes from manipulating the Inclusion-Exclusion formula
above to obtain $|A| - |A \\cap B| = |A \\cup B| - |B|$."
+ ]
+ },
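The rearranged identity used for the HLL comparison, $|A| - |A \cap B| = |A \cup B| - |B|$, verified exactly on the same integer ranges with plain sets:

```python
n = 2**20
offset = 15 * n // 16
a = set(range(n))
b = set(range(offset, offset + n))
a_not_b = len(a - b)                  # direct set difference
via_identity = len(a | b) - len(b)    # |A ∪ B| - |B|
print(a_not_b, via_identity, 15 * n // 16)
```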
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "exact = int(15 * n / 16)\n",
+ "theta_bound_pct = 100 * (result.get_upper_bound(1) -
result.get_estimate()) / exact\n",
+ "hll_diff_est = hll_result.get_estimate() - hll2.get_estimate()\n",
+ "\n",
+ "print(f'Exact result:\\t{exact}')\n",
+ "print(f'Theta estimate:\\t{result.get_estimate():.1f}
({100*(result.get_estimate()/exact -1):.2f}% +- {theta_bound_pct:.2f}%)')\n",
+ "print(f'HLL estimate:\\t{hll_diff_est:.1f} ({100*(hll_diff_est/exact -
1):.2f}%)')"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/output/docs/img/Community/Untitled.ipynb
b/output/docs/img/Community/Untitled.ipynb
new file mode 100644
index 0000000..2eab621
--- /dev/null
+++ b/output/docs/img/Community/Untitled.ipynb
@@ -0,0 +1,493 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasketches import *\n",
+ "import pandas as pd\n",
+ "import base64"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "deserialize_kll = lambda x :
kll_floats_sketch.deserialize(base64.b64decode(x))\n",
+ "\n",
+ "df = pd.read_csv(\"1hr.kll.k140.txt\",\n",
+ " sep=\"\\t\",\n",
+ " header=None,\n",
+ " names=['pty','device','kll'],\n",
+ " dtype={'pty':'category', 'device':'category'},\n",
+ " converters={'kll':deserialize_kll}\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>pty</th>\n",
+ " <th>device</th>\n",
+ " <th>kll</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>mail</td>\n",
+ " <td>mobile</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>mail</td>\n",
+ " <td>desktop</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>news</td>\n",
+ " <td>mobile</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>news</td>\n",
+ " <td>desktop</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>sports</td>\n",
+ " <td>mobile</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>sports</td>\n",
+ " <td>desktop</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>finance</td>\n",
+ " <td>mobile</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>finance</td>\n",
+ " <td>desktop</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>front-page</td>\n",
+ " <td>mobile</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>front-page</td>\n",
+ " <td>desktop</td>\n",
+ " <td>### KLL sketch summary:\\n K : 1...</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ "          pty   device                                                kll\n",
+ "0        mail   mobile  ### KLL sketch summary:\\n K : 1...\n",
+ "1        mail  desktop  ### KLL sketch summary:\\n K : 1...\n",
+ "2        news   mobile  ### KLL sketch summary:\\n K : 1...\n",
+ "3        news  desktop  ### KLL sketch summary:\\n K : 1...\n",
+ "4      sports   mobile  ### KLL sketch summary:\\n K : 1...\n",
+ "5      sports  desktop  ### KLL sketch summary:\\n K : 1...\n",
+ "6     finance   mobile  ### KLL sketch summary:\\n K : 1...\n",
+ "7     finance  desktop  ### KLL sketch summary:\\n K : 1...\n",
+ "8  front-page   mobile  ### KLL sketch summary:\\n K : 1...\n",
+ "9  front-page  desktop  ### KLL sketch summary:\\n K : 1..."
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "### KLL sketch summary:\n",
+ " K : 140\n",
+ " min K : 140\n",
+ " M : 8\n",
+ " N : 8651479\n",
+ " Epsilon : 1.88%\n",
+ " Epsilon PMF : 2.31%\n",
+ " Empty : false\n",
+ " Estimation mode: true\n",
+ " Levels : 16\n",
+ " Sorted : false\n",
+ " Capacity items : 466\n",
+ " Retained items : 345\n",
+ " Storage bytes : 1472\n",
+ " Min value : 0\n",
+ " Max value : 5.38e+03\n",
+ "### End sketch summary\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "query_result = kll_floats_sketch(140)\n",
+ "for sk in df.loc[df['pty'] != 'news'].itertuples(index=False):\n",
+ " query_result.merge(sk.kll)\n",
+ "print(query_result)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#xmin = query_result.get_min_value()\n",
+ "#xmax = query_result.get_max_value()\n",
+ "xmin = 0.001\n",
+ "xmax = query_result.get_quantile(0.95)\n",
+ "num_splits = 50\n",
+ "step = (xmax - xmin) / num_splits\n",
+ "splits = [xmin + (i*step) for i in range(0, num_splits)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pmf = query_result.get_pmf(splits)\n",
+ "x = splits.copy()\n",
+ "x.append(xmax)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import seaborn as sns\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "sns.set(color_codes=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<BarContainer object of 51 artists>"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAXwAAAD7CAYAAABpJS8eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAbx0lEQVR4nO3dfUyU9wEH8O/BgXqDFmHP3QwxNZubrlPmWhIYM1gr58nLgVqaUVxvzopvXW1JteLLAtK6GqribNFUbLtEccMq4ugc0m6ZSwZZwXWCae3qWjsreofALIeAJzz7w3jxehzPHdzLY3/fT2Li8/59Hsj3nvsdPGhkWZZBRERfe2GhDkBERMHBwiciEgQLn4hIECx8IiJBsPCJiATBwiciEgQLn4hIENpQBxhJd3cvhob8/2sCcXFR6Oy0+32//qT2jGrPB6g/I/ONndozBjtf
[...]
+ "text/plain": [
+ "<Figure size 432x288 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots()\n",
+ "plt.bar(x=x, height=pmf, align='edge', width=-50)\n",
+ "#ax.set_xscale('log')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_splits = 50\n",
+ "logstep = (np.log10(xmax) - np.log10(1.0)) / num_splits\n",
+ "logsplits = [np.log10(1.0) + (i*logstep) for i in range(0, num_splits)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pmf = query_result.get_pmf(np.power(10, logsplits))\n",
+ "logx = list(np.power(10, logsplits))\n",
+ "logx.append(xmax)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<BarContainer object of 51 artists>"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAXwAAAD7CAYAAABpJS8eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAd0klEQVR4nO3df1Cb9QEG8CdYoKV0w3Jvsslpvc5Nhi1j0zuR3eGwTdNSXoOF3mo7o+tKW89Kx3kIjvaqWCtXcZwV7Xls6q0LG5TryOJ5KVtPu7vBzcKc6LV2Mn+spS4JpG4FQ0mad3/0mmsM4c1vot/n8xfvj7x58g33kPu+5H01iqIoICKir7y0uQ5ARETJwcInIhIEC5+ISBAsfCIiQbDwiYgEwcInIhIEC5+ISBDz5jrAbM6fn4TPl5ivCeTmZmN8fCIhx44H5otdqmdkvtilesZk
[...]
+ "text/plain": [
+ "<Figure size 432x288 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots()\n",
+ "plt.bar(x=logx, height=pmf, align='edge', width=-50)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "kll = kll_floats_sketch(140)\n",
+ "for i in range(0,100000):\n",
+ " kll.update(np.random.exponential())\n",
+ "xmin = kll.get_min_value()\n",
+ "xmax = kll.get_max_value()\n",
+ "step = (xmax - xmin) / 50\n",
+ "splits = [xmin + (i*step) for i in range(0,50)]\n",
+ "x = splits.copy()\n",
+ "x.append(xmax)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<BarContainer object of 51 artists>"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAYIAAAD7CAYAAABnoJM0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df1DT9/0H8GeAQGGko7Akcm7X9qYnh4reLhuUtbBOaeRHGop4U+iix8QfndXmWiqtWoTD+WMotqtQS3u7a9UpqzZZehix7by1wt2Q1Zqbtlfv1tapTcKPKqGgIfl8//Dbz0oBEyAQ4uf5uPOOz+f9ziev1+jy5PP+JJ/IBEEQQEREkhUW7AKIiCi4GARERBLHICAikjgGARGRxDEIiIgkjkFARCRxDAIiIomLCHYBY9Hd3QuvN7Aff0hIiEVnpyugxww29hQa2FNoCOWe
[...]
+ "text/plain": [
+ "<Figure size 432x288 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "pmf = kll.get_pmf(splits)\n",
+ "plt.bar(x=x,height=pmf,align='edge',width=-.25)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x = list(range(-10, 11))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y = np.multiply(x,x)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[<matplotlib.lines.Line2D at 0x7f933a602760>]"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAXkAAAD7CAYAAACPDORaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deVxU59028GuGGYZ9n2GTXRA3FkUiLqBxYRM0aBITU5sYo2lTk6aNjTFN0yZNtKnPx6RvHu2TpbFNYlNXXIK4gyhGBRFcUFH2fd+HYZb7/cNIg6Iyw8ycWX7ff1pmznCu3OjF8Zxz34fHGGMghBBikvhcByCEEKI7VPKEEGLCqOQJIcSEUckTQogJo5InhBATRiVPCCEmjEqeEEJMmIDrAPdqa+uBSqX+rfuurnZoaenWQaKRoVzqoVzqM9RslEs9mubi83lwdrZ94PsG
[...]
+ "text/plain": [
+ "<Figure size 432x288 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.plot(x,y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "kll = kll_floats_sketch(160)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "kll.update(np.random.poisson(lam=2.0,size=2**20))\n",
+ "kll.update(np.random.poisson(lam=20.0,size=2**22))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 75,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<BarContainer object of 61 artists>"
+ ]
+ },
+ "execution_count": 75,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAXwAAAD7CAYAAABpJS8eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAaiUlEQVR4nO3df0wb5x0G8MfkB4XFjBWdvRSt7bZMYWnD0IRWiiZHmUKcAFcQpBpqJi+LRtdqC5vVsdIA3ZI1JY3ovHTpoop2ndrBCqEtlqvMoEbrVg2UDZQmVKFRmZYuYZltIE2A2oDD7Y8otzoY7hxsDH6fz1/cvefz917sx8fruxeDoigKiIgo4SXFuwAiIlocDHwiIkEw8ImIBMHAJyISBAOfiEgQDHwiIkEw8ImIBLEy3gXM5/LlCczMLPw2gYyMNRgZGY9CRcsX++A69gP74IZE
[...]
+ "text/plain": [
+ "<Figure size 432x288 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "xmin = kll.get_min_value()\n",
+ "xmax = kll.get_max_value()\n",
+ "num_steps = 60\n",
+ "step = (xmax - xmin) / num_steps\n",
+ "splits = [xmin + (i*step) for i in range(0,num_steps)]\n",
+ "x = splits.copy()\n",
+ "x.append(xmax)\n",
+ "pmf = kll.get_pmf(splits)\n",
+ "plt.bar(x=x,height=pmf,align='edge',width=-.5)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]