This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository
https://gitbox.apache.org/repos/asf/incubator-datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new f4cbf3b Automatic Site Publish by Buildbot
f4cbf3b is described below
commit f4cbf3ba50bb587e996f64c05f3bd8566e55a7f3
Author: buildbot <[email protected]>
AuthorDate: Fri Sep 25 05:32:29 2020 +0000
Automatic Site Publish by Buildbot
---
output/docs/Community/Research.html | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/output/docs/Community/Research.html b/output/docs/Community/Research.html
index 04d1513..950a449 100644
--- a/output/docs/Community/Research.html
+++ b/output/docs/Community/Research.html
@@ -523,20 +523,19 @@
<p>Agarwal, et al, discuss different types of mergeable summaries in their
Mergeable Summaries paper [AC+13].</p>
-<h3 id="the-data-sketches-open-source-library">The Data Sketches Open Source
Library</h3>
+<h3 id="the-apache-datasketches-open-source-library">The Apache DataSketches
Open Source Library</h3>
<p>This library has been designed from the beginning to be high-performance
and production-quality suitable for integration into large data processing
systems that must deal with massive data.
-The library is written in Java, and contains state of the art algorithms for a
variety of basic query classes, including identifying frequent items, unique
count queries, computing quantiles and histograms, and sampling. It will soon
contain algorithms for matrix analytic tasks such as PCA as well.
-All algorithms in the library produce mergeable summaries,
-and come with formal guarantees on the accuracy of the answers returned.</p>
+The library is written in Java and C++ (with adaptors to Python), and contains
state of the art algorithms for a variety of basic query classes, including
identifying frequent items, unique count queries, computing quantiles and
histograms, and sampling. It will soon contain algorithms for matrix analytic
tasks such as PCA as well.
+All algorithms in the library produce mergeable summaries, and come with
formal guarantees on the accuracy of the answers returned.</p>
-<p>Currently, the core contributors to the library are Lee Rhodes, Kevin Lang,
Jon Malkin, and Alex Saydakov (all at Yahoo/Verizon Media), Justin Thaler
(Assistant Professor at Georgetown University, Department of Computer Science),
and Edo Liberty (Principal Scientist at Amazon Web Services and manager of the
Algorithms group at Amazon AI).</p>
+<p>The original core contributors to the library are Lee Rhodes, Jon Malkin,
and Alex Saydakov (all at Yahoo/Verizon Media), Justin Thaler (Assistant
Professor at Georgetown University, Department of Computer Science), and Edo
Liberty (Principal Scientist at Amazon Web Services and manager of the
Algorithms group at Amazon AI), but we continue to grow our community.</p>
-<p>The library has been adapted throughout industry and government. For
example, at Yahoo, where it was conceived and created, the library is widely
used internally to reduce processing time from days to seconds for many tasks.
At SpliceMachine, it is used for database query planning and optimization. It
is also deeply embedded into a low-latency open source data store called Druid,
as well as an open source graph database called Gaffer that is maintained by
the British intelligence agen [...]
+<p>The library has been adapted throughout industry and government. For
example, at Yahoo, where it was conceived and created, the library is widely
used internally to reduce processing time from days to seconds for many tasks.
At SpliceMachine, it is used for database query planning and optimization. It
is also deeply embedded into a low-latency open source data store called Druid,
as well as an open source graph database called Gaffer that is maintained by
the British intelligence agen [...]
-<p>Beyond its utility in deployed systems, the process of developing of
developing the Data Sketches library has led to interesting research. This has
involved important contributions to both the theory of streaming algorithms as
well as addressing issues that are crucial in real-world stream engines but are
often ignored in the academic literature. These issues include mergeability,
and dealing with <em>weighted</em> stream updates (i.e., data streams where
each piece of data comes with [...]
+<p>Beyond its utility in deployed systems, the process of developing the Data
Sketches library has led to interesting research. This has involved important
contributions to both the theory of streaming algorithms as well as addressing
issues that are crucial in real-world stream engines but are often ignored in
the academic literature. These issues include mergeability, and dealing with
<em>weighted</em> stream updates (i.e., data streams where each piece of data
comes with an associated [...]
-<p>In particular, work on the Data Sketches library has led to novel
algorithms achieving state of the art practical performance for identifying
frequent items in data streams [ABL+17], and mergeable summaries for unique
count queries [DLRT16] including intersections and differences as an extension
of [BJKST02]. On the theoretical side, work on the library has led to the
resolution of the space complexity of streaming approximation algorithms for
quantile queries, which was a longstandin [...]
+<p>In particular, work on the Data Sketches library has led to novel
algorithms achieving state of the art practical performance for identifying
frequent items in data streams [ABL+17], and mergeable summaries for unique
count queries [DLRT16] including intersections and differences as an extension
of [BJKST02]. On the theoretical side, work on the library has led to the
resolution of the space complexity of streaming approximation algorithms for
quantile queries, which was a longstandin [...]
<p>The cutting edge of scientific inquiry is to build more powerful algorithms
and, at the same time, to devise new theorems and proofs that certify that
these algorithms work well enough to draw robust conclusions. This library is
dedicated to that quest. By working in the open source community we are
confident that there are major opportunities to incorporate algorithms for new
and richer types of queries into the library, as well as to improve the
efficiency of the algorithms that ar [...]