[datasketches-website] branch master updated: Update and rename tutorial on Quantiles and Ranks.

leerho Thu, 28 Jul 2022 10:43:10 -0700

This is an automated email from the ASF dual-hosted git repository.

leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 66116bb6 Update and rename tutorial on Quantiles and Ranks.
66116bb6 is described below

commit 66116bb6aabe61ab62225f68fb129edbce797523
Author: Lee Rhodes <[email protected]>
AuthorDate: Thu Jul 28 10:42:23 2022 -0700

    Update and rename tutorial on Quantiles and Ranks.
---
 _includes/toc.html                                 |  2 +-
 ...ns.md => SketchingQuantilesAndRanksTutorial.md} | 68 ++++++++++++++--------
 src/main/resources/docgen/toc.json                 |  2 +-
 3 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/_includes/toc.html b/_includes/toc.html
index 88e587df..4641b845 100644
--- a/_includes/toc.html
+++ b/_includes/toc.html
@@ -227,7 +227,7 @@
       <a data-toggle="collapse" class="menu collapsed" 
href="#collapse_quantiles_and_histograms">Quantiles And Histograms</a>
     </p>
     <div class="collapse" id="collapse_quantiles_and_histograms">
-      <li><a href="{{site.docs_dir}}/Quantiles/Definitions.html">•Quantiles 
Definitions</a></li>
+      <li><a 
href="{{site.docs_dir}}/Quantiles/SketchingQuantilesAndRanksTutorial.html">•Quantiles
 and Ranks Tutorial</a></li>
       <li><a 
href="{{site.docs_dir}}/Quantiles/QuantilesOverview.html">•Quantiles 
Overview</a></li>
       <li><a href="{{site.docs_dir}}/KLL/KLLSketch.html">•KLL Floats 
sketch</a></li>
       <li><a href="{{site.docs_dir}}/REQ/ReqSketch.html">•REQ Floats 
sketch</a></li>
diff --git a/docs/Quantiles/Definitions.md 
b/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
similarity index 70%
rename from docs/Quantiles/Definitions.md
rename to docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
index a435de78..fec20a01 100644
--- a/docs/Quantiles/Definitions.md
+++ b/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
@@ -19,16 +19,20 @@ layout: doc_page
     specific language governing permissions and limitations
     under the License.
 -->
-# Quantiles and Ranks Definitions
-Streaming quantiles algorithms, or quantiles sketches, enable us to analyze 
the distributions of massive data very quickly using only a small amout of 
space.  They allow us to compute a quantile values given a desired rank, or 
compute a rank given
+# Sketching Quantiles and Ranks, the Basics
+Streaming quantiles algorithms, or quantiles sketches, enable us to analyze 
the distributions 
+of massive data very quickly using only a small amout of space.  
+They allow us to compute a quantile values given a desired rank, or compute a 
rank given
 a quantile value. Quantile sketches enable us to plot the CDF, PMF or 
histograms of a distribution. 
 
-The goal of this short tutorial it to introduce to the reader some of the 
basic concepts of quantiles, ranks and their functions.
-
-Before we can define *quantile*, we must first define what we mean by *rank*.
+The goal of this short tutorial it to introduce to the reader some of the 
basic concepts 
+of quantiles, ranks and their functions.
 
 ## What is a rank?
-Given an ordered set of values the term rank can be defined two different 
ways. 
+
+>  A ***rank*** identifies the numeric position of a specific value in an 
enumerated ordered set if values.
+
+The actual enumeration can be done in several ways, but for our use here we 
will define the two common ways that *rank* can be specified and that we will 
use. 
 
 * The **natural rank** is a **natural number** from the set of one-based, 
natural numbers, &#8469;<sub>1</sub>, and is derived by enumerating an ordered 
set of values, starting with the value 1, up to *n*, the number of values in 
the set.
 
@@ -42,7 +46,7 @@ In our sketch library and documentation, when we refer to 
*rank*, we imply *norm
 
 ## What is a quantile?
 
-> A ***quantile*** is a *value* that achieves a particular ***rank***. 
+> A ***quantile*** is a *value* that is associated with a particular 
***rank***. 
 
 *Quantile* is the general term that includes other terms that are also 
quantiles.
 To wit:
@@ -62,9 +66,9 @@ Let's examine the following table:
 
 Let's define the functions
 
-* *quantile(rank)* or *q(r)* := given a *rank, r*, return the quantile value 
*q* associated with the given *r*.
+> *quantile(rank)* or *q(r)* := return the quantile value *q* associated with 
a given *rank, r*.
 
-* *rank(quantile)* or *r(q)* := given a quantile value *q*, return the rank 
*r* associated with the given *q*.  
+> *rank(quantile)* or *r(q)* := return the rank *r* associated with the given 
*quantile, q*.  
 
 Using an example from the table:
 
@@ -110,32 +114,37 @@ The sketch might discard the even values producing 
something like this:
 |-----------------|----|----|----|----|----|
 | Natural Rank    | 2  | 4  | 6  | 8  | 10 |
 
-So how do we resove *q(3)* or *r(20)*?
+When the sketch deletes values it adjusts the associated ranks by effectively 
increasing the "weight" of adjacent items so that they are positionally 
approximately correct and the top rank corresponds to *n*.
+
+How do we resove *q(3)* or *r(20)*?
 
 ## The need for inequality search
 The quantile sketch algorithms discussed in the literature primarily differ by 
how they choose which values in the stream should be discarded. After the 
elimination process, all of the quantiles sketch implementations are left with 
the challenge of how to reconstruct the actual distribution, approximately and 
with good accuracy. 
 
-Given the presence of duplicates and absence of values from the stream we must 
redefine the above quantile and rank functions as inequalities. **We also want 
the quantile and rank functions to retain the properties of 1:1 functions.**
+Given the presence of duplicates and absence of values from the stream we must 
redefine the above quantile and rank functions as inequalities **while 
retaining the properties of 1:1 functions.**
 
-You will find examples of both of the following definitions in the research 
literature.  All of our library quantile sketches allow the user to choose 
between the two searching criteria.  
+One can find examples of the following definitions in the research literature. 
 All of our library quantile sketches allow the user to choose the searching 
criteria.  
 
 These next examples use a small data set that mimics what could be the result 
of both duplication and sketch data deletion.
 
 ## Two search conventions used when finding ranks, r(q)
-* The *non inclusive* criterion for *r(q)*: (a.k.a. the *LT* criterion)
-   * Search the quantile array until we find the adjacent pair *{q1, q2}* 
where *q1 < q <= q2*. Return the rank associated with *q1*, the first of the 
pair.
 
-For example *r(30) = 5*
+### The *non inclusive* criterion for *r(q)* (a.k.a. the *LT* criterion):
+
+Given *q*, search the quantile array until we find the adjacent pair *{q1, 
q2}* where *q1 < q <= q2*. Return the rank associated with *q1*, the first of 
the pair.
+
+For example *q = 30; r(30) = 5*
 
 | Quantile[]:     | 10    | 20    | q1=20 | q2=30 | 30    | 30    | 40    | 50 
   |
 
|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
 | Natural Rank[]: | 1     | 3     | r=5   |  7    | 9     | 11    | 13    | 14 
   |
 
 
-* The *inclusive* criterion for *r(q)*: (a.k.a. the *LE* criterion)
-   * Search the quantile array until we find the adjacent pair *{q1, q2}* 
where *q1 <= q < q2*. Return the rank associated with *q1*, the first of the 
pair. 
+### The *inclusive* criterion for *r(q)* (a.k.a. the *LE* criterion):
+
+Given *q*, search the quantile array until we find the adjacent pair *{q1, 
q2}* where *q1 <= q < q2*. Return the rank associated with *q1*, the first of 
the pair. 
 
-For example *r(30) = 11*
+For example *q = 30; r(30) = 11*
 
 | Quantile[]:     | 10    | 20    | 20    | 30    | 30    | q1=30 | q2=40 | 50 
   |
 
|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
@@ -143,17 +152,20 @@ For example *r(30) = 11*
 
 
 ## Two search conventions when finding quantiles, q(r)
-* The *non inclusive* criterion for *q(r)* : (a.k.a. the *GT* criterion)
-   * Search the rank array until we find the adjacent pair *{r1, r2}* where 
*r1 <= r < r2*. Return the quantile associated with *r2*, the second of the 
pair.
+
+### The *non inclusive* criterion for *q(r)* (a.k.a. the *GT* criterion):
+
+Given *r*, search the rank array until we find the adjacent pair *{r1, r2}* 
where *r1 <= r < r2*. Return the quantile associated with *r2*, the second of 
the pair.
  
-For example *q(5) = 30*
+For example *r = 5; q(5) = 30*
 
 | Natural Rank[]: | 1     | 3     | r1=5  |  r2=7 | 9     | 11    | 13    | 14 
   |
 
|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
 | Quantile[]:     | 10    | 20    | 20    | q=30  | 30    | 30    | 40    | 50 
   |
 
-* The *inclusive* criterion for *q(r)*:  (a.k.a. the *GE* criterion)
-   * Search the rank array until we find the adjacent pair *{r1, r2}* where 
*r1 < r <= r2*. Return the quantile associated with *r2*, the second of the 
pair.
+### The *inclusive* criterion for *q(r)*  (a.k.a. the *GE* criterion):
+
+Given *r*, search the rank array until we find the adjacent pair *{r1, r2}* 
where *r1 < r <= r2*. Return the quantile associated with *r2*, the second of 
the pair.
 
 For example *q(11) = 30*
 
@@ -161,8 +173,14 @@ For example *q(11) = 30*
 
|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|
 | Quantile[]:     | 10    | 20    | 20    | 30    | 30    | q=30  | 40    | 50 
   |
 
+## These conventions maintain the 1:1 functional relationship
+
+### The non inclusive search for q(r) is the inverse of the non inclusive 
search for r(q). Therefore, *q = q(r(q))* and *r = r(q(r))*.
+
+### The inclusive search for q(r) is the inverse of the inclusive search for 
r(q). Therefore, *q = q(r(q))* and *r = r(q(r))*.
+
 
-The power of these inequality search algorithms is that the will produce 
repeatable and accurate results and insensitive to duplicates and sketch 
deletions. 
-In addition, the way these algorithms are designed, the property of 1:1 
functions are maintained.  
+## Summary
+The power of these inequality search algorithms is that they produce 
repeatable and accurate results, are insensitive to duplicates and sketch 
deletions, and maintain the property of 1:1 functions.
 
 
diff --git a/src/main/resources/docgen/toc.json 
b/src/main/resources/docgen/toc.json
index 1fe0235b..4b2a5a38 100644
--- a/src/main/resources/docgen/toc.json
+++ b/src/main/resources/docgen/toc.json
@@ -191,7 +191,7 @@
 
         { "class":"Dropdown", "desc" : "Quantiles And Histograms", "array":
           [
-            {"class":"Doc",  "desc" : "Quantiles Definitions",                 
   "dir" : "Quantiles", "file": "Definitions"},
+            {"class":"Doc",  "desc" : "Quantiles and Ranks Tutorial",          
   "dir" : "Quantiles", "file": "SketchingQuantilesAndRanksTutorial"},
             {"class":"Doc",  "desc" : "Quantiles Overview",                    
   "dir" : "Quantiles", "file": "QuantilesOverview" },
             {"class":"Doc",  "desc" : "KLL Floats sketch",                     
   "dir" : "KLL", "file": "KLLSketch" },
             {"class":"Doc",  "desc" : "REQ Floats sketch",                     
   "dir" : "REQ", "file": "ReqSketch" },


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[datasketches-website] branch master updated: Update and rename tutorial on Quantiles and Ranks.

Reply via email to