Hi Lee,

I must admit that this is also the first time I have heard of DataSketches
(there really are many Apache projects).

DataSketches sounds really exciting. As a (former) data engineer, I can say
with full confidence that this is something (end) users want and need, and
it would make a lot of sense to have it in Flink from the get-go.
Flink, however, is already a fairly old project that grew at a strong pace,
leading to some 150 modules in the core. We are currently in the process of
restructuring it and reducing what lives in the core, so that build times
and stability improve.

To counter that, we created Flink Packages [1], which hosts new additions
that we do not deem essential to the core. I'd propose incorporating a Flink
DataSketches package there. If it turns out to be essential, we can still
move it into the core at a later point.

As I have seen on the DataSketches website, there are already plenty of
adoptions. That leaves me with a few questions.

   1. I'm curious how you would estimate the effort to port DataSketches to
   Flink. It already has a Java API, but how difficult would it be to
   subdivide the tasks into parallel chunks of work? Since it has already been
   ported to Pig, I think we could use that port as a baseline.
   2. Do you have any idea who usually drives the adoptions?


[1] https://flink-packages.org/

On Sun, Apr 26, 2020 at 8:07 AM leerho <lee...@gmail.com> wrote:

> Hello All,
>
> I am a committer on DataSketches.apache.org
> <http://datasketches.apache.org/> and just learning about Flink. Since
> Flink is designed for stateful stream processing, I would think it would
> make sense to have the DataSketches library integrated into its core so all
> users of Flink could take advantage of these advanced streaming
> algorithms.  If there is interest in the Flink community for this
> capability, please contact us at d...@datasketches.apache.org or on our
> datasketches-dev Slack channel.
> Cheers,
> Lee.
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>
