This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-datasketches-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new f4cbf3b  Automatic Site Publish by Buildbot
f4cbf3b is described below

commit f4cbf3ba50bb587e996f64c05f3bd8566e55a7f3
Author: buildbot <[email protected]>
AuthorDate: Fri Sep 25 05:32:29 2020 +0000

    Automatic Site Publish by Buildbot
---
 output/docs/Community/Research.html | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/output/docs/Community/Research.html b/output/docs/Community/Research.html
index 04d1513..950a449 100644
--- a/output/docs/Community/Research.html
+++ b/output/docs/Community/Research.html
@@ -523,20 +523,19 @@
 
 <p>Agarwal, et al, discuss different types of mergeable summaries in their 
Mergeable Summaries paper [AC+13].</p>
 
-<h3 id="the-data-sketches-open-source-library">The Data Sketches Open Source 
Library</h3>
+<h3 id="the-apache-datasketches-open-source-library">The Apache DataSketches 
Open Source Library</h3>
 
 <p>This library has been designed from the beginning to be high-performance 
and production-quality suitable for integration into large data processing 
systems that must deal with massive data.
-The library is written in Java, and contains state of the art algorithms for a 
variety of basic query classes, including identifying frequent items, unique 
count queries, computing quantiles and histograms, and sampling. It will soon 
contain algorithms for matrix analytic tasks such as PCA as well. 
-All algorithms in the library produce mergeable summaries, 
-and come with formal guarantees on the accuracy of the answers returned.</p>
+The library is written in Java and C++ (with adaptors to Python), and contains 
state of the art algorithms for a variety of basic query classes, including 
identifying frequent items, unique count queries, computing quantiles and 
histograms, and sampling. It will soon contain algorithms for matrix analytic 
tasks such as PCA as well. 
+All algorithms in the library produce mergeable summaries, and come with 
formal guarantees on the accuracy of the answers returned.</p>
 
-<p>Currently, the core contributors to the library are Lee Rhodes, Kevin Lang, 
Jon Malkin, and Alex Saydakov (all at Yahoo/Verizon Media), Justin Thaler 
(Assistant Professor at Georgetown University, Department of Computer Science), 
and Edo Liberty (Principal Scientist at Amazon Web Services and manager of the 
Algorithms group at Amazon AI).</p>
+<p>The original core contributors to the library are Lee Rhodes, Jon Malkin, 
and Alex Saydakov (all at Yahoo/Verizon Media), Justin Thaler (Assistant 
Professor at Georgetown University, Department of Computer Science), and Edo 
Liberty (Principal Scientist at Amazon Web Services and manager of the 
Algorithms group at Amazon AI), but we continue to grow our community.</p>
 
-<p>The library has been adapted throughout industry and government. For 
example, at Yahoo, where it was conceived and created, the library is widely 
used internally to reduce processing time from days to seconds for many tasks. 
At SpliceMachine, it is used for database query planning and optimization. It 
is also deeply embedded into a low-latency open source data store called Druid, 
as well as an open source graph database called Gaffer that is maintained by 
the British intelligence agen [...]
+<p>The library has been adapted throughout industry and government. For 
example, at Yahoo, where it was conceived and created, the library is widely 
used internally to reduce processing time from days to seconds for many tasks. 
At SpliceMachine, it is used for database query planning and optimization. It 
is also deeply embedded into a low-latency open source data store called Druid, 
as well as an open source graph database called Gaffer that is maintained by 
the British intelligence agen [...]
 
-<p>Beyond its utility in deployed systems, the process of developing of 
developing the Data Sketches library has led to interesting research. This has 
involved important contributions to both the theory of streaming algorithms as 
well as addressing issues that are crucial in real-world stream engines but are 
often ignored in the academic literature. These issues include mergeability, 
and dealing with <em>weighted</em> stream updates (i.e., data streams where 
each piece of data comes with [...]
+<p>Beyond its utility in deployed systems, the process of developing the Data 
Sketches library has led to interesting research. This has involved important 
contributions to both the theory of streaming algorithms as well as addressing 
issues that are crucial in real-world stream engines but are often ignored in 
the academic literature. These issues include mergeability, and dealing with 
<em>weighted</em> stream updates (i.e., data streams where each piece of data 
comes with an associated [...]
 
-<p>In particular, work on the Data Sketches library has led to novel 
algorithms achieving state of the art practical performance for identifying 
frequent items in data streams [ABL+17], and mergeable summaries for unique 
count queries [DLRT16] including intersections and differences as an extension 
of [BJKST02]. On the theoretical side, work on the library has led to the 
resolution of the space complexity of streaming approximation algorithms for 
quantile queries, which was a longstandin [...]
+<p>In particular, work on the Data Sketches library has led to novel 
algorithms achieving state of the art practical performance for identifying 
frequent items in data streams [ABL+17], and mergeable summaries for unique 
count queries [DLRT16] including intersections and differences as an extension 
of [BJKST02]. On the theoretical side, work on the library has led to the 
resolution of the space complexity of streaming approximation algorithms for 
quantile queries, which was a longstandin [...]
 
 <p>The cutting edge of scientific inquiry is to build more powerful algorithms 
and, at the same time, to devise new theorems and proofs that certify that 
these algorithms work well enough to draw robust conclusions.  This library is 
dedicated to that quest. By working in the open source community we are 
confident that there are major opportunities to incorporate algorithms for new 
and richer types of queries into the library, as well as to improve the 
efficiency of the algorithms that ar [...]
 


