Hello all.

I'm a little bit stuck with one issue that I hope you can help me with.

I'm developing a flink-connector-kudu extension that allows reading from
Kudu into Flink and writing from Flink to Kudu. This connector addresses
the issue [BAHIR-99] and is a full re-implementation of
https://github.com/apache/bahir-flink/pull/17.

I'm struggling with testing: how is it supposed to be handled when the data
storage (Kudu) does not provide an embedded driver?

In the case of Kudu, there is no embedded Java-based driver yet, so I need
a built Kudu to run tests against; otherwise I cannot test this connector
end-to-end against a "real" data storage.

Because of that, I see the following options for this scenario:

* Create mocks for the Kudu classes (KuduSession, KuduTable, KuduClient,
etc.); a rough sketch follows this list.

* Use Kudu's MiniKuduCluster utility to spin up a local cluster (sketched
below): this is not possible here, because it needs a real build of Kudu on
the local machine.

* Update travis.yml to install a Kudu server: this would fix the problem
for CI, but tests would still fail locally. Moreover, building Kudu takes a
long time (more than 20 minutes), which is not feasible for CI.

* Ignore testing: not an option :)

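For the mocking option, this is roughly what I have in mind. It is only a
sketch: it assumes Mockito can subclass the Kudu client classes and that
the connector lets me inject the KuduClient; the class and table names are
just placeholders.

    import static org.mockito.Mockito.*;

    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.KuduTable;
    import org.junit.Test;

    public class KuduWriterMockTest {

        @Test
        public void appliesOperationsThroughTheSession() throws Exception {
            // No real cluster: the whole Kudu client API is mocked.
            KuduClient client = mock(KuduClient.class);
            KuduSession session = mock(KuduSession.class);
            KuduTable table = mock(KuduTable.class);

            when(client.newSession()).thenReturn(session);
            when(client.openTable("books")).thenReturn(table);

            // ... run the connector's write path against the mocked client ...

            verify(session, atLeastOnce()).apply(any());
        }
    }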

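And for reference, this is roughly how the MiniKuduCluster option would
look if a Kudu build were available. The method names are from memory and
may differ between Kudu versions, so again, take it as a sketch rather than
working code.

    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.MiniKuduCluster;
    import org.junit.After;
    import org.junit.Before;

    public class KuduConnectorITCase {

        private MiniKuduCluster cluster;
        private KuduClient client;

        @Before
        public void setUp() throws Exception {
            // Spawns real kudu-master/kudu-tserver processes, which is why
            // it needs Kudu binaries built on the local machine.
            cluster = new MiniKuduCluster.MiniKuduClusterBuilder()
                    .numMasters(1)
                    .numTservers(3)
                    .build();
            client = new KuduClient.KuduClientBuilder(cluster.getMasterAddresses()).build();
        }

        @After
        public void tearDown() throws Exception {
            if (client != null) client.close();
            if (cluster != null) cluster.shutdown();
        }

        // ... end-to-end tests of the connector would go here ...
    }
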
On a related note, I saw that Kudu connectors for other distributed
analytics platforms (e.g. Spark) are implemented directly in the Kudu repo (
https://github.com/apache/kudu/tree/master/java/kudu-spark) instead of
living in bahir-spark. I think this is good, because when you execute the
tests there you have a real build of Kudu to test against.

What is the best place (kudu vs bahir) for this connector, taking the
above-mentioned issues into consideration?

If the answer is bahir-flink, how should I proceed with my tests? :)

Thanks in advance.
