Re: New Flink connector

Ted Yu Sat, 18 Nov 2017 07:32:31 -0800

+1 on Robert's proposal.

On Sat, Nov 18, 2017 at 6:21 AM, Robert Metzger <[email protected]> wrote:


> I'm really sorry that I haven't responded yet.
>
> Regarding the question "should the code be in the bahir-flink repo or the
> Kudu repo?":
> I have the feeling that the Kudu repo might actually be the better spot,
> because Flume, MapReduce, Hive, Spark, and others are already there. It
> seems that the Kudu project is accepting such contributions and is also
> willing to maintain them.
>
> If the Kudu project rejects the contribution for now, I would suggest
> providing a script that builds the flink-kudu connector, starts Kudu (maybe
> from a Docker image?) and then runs a few test jobs.
>
>
> On Fri, Nov 3, 2017 at 11:45 AM, Nacho Garcia Fernandez <[email protected]> wrote:
>
> > Can somebody out there please reply to my last question? Thanks in advance :D
> >
> > On 27 October 2017 at 14:23, Nacho Garcia Fernandez <[email protected]> wrote:
> >
> > > Hello all.
> > >
> > > I'm a little bit stuck with one issue that I hope you can help me with.
> > >
> > > I'm developing a flink-connector-kudu extension that allows reading from
> > > Kudu and writing to Flink and Kudu. This connector addresses the issue
> > > [BAHIR-99] and is a full re-implementation of
> > > https://github.com/apache/bahir-flink/pull/17.
> > >
> > > I'm struggling with testing: how is it supposed to be handled when the
> > > data storage (Kudu) does not provide an embedded driver?
> > >
> > > Kudu does not provide any embedded Java-based driver yet, so I need a
> > > built Kudu to test against; otherwise I cannot test (e2e) this connector
> > > with a "real" data storage.
> > >
> > > Because of that I see three main possibilities for this scenario:
> > >
> > > * Create a mock for the Kudu classes (KuduSession, KuduTable, KuduClient,
> > > etc.).
> > >
> > > * Use Kudu's MiniKuduCluster utility to instantiate a local cluster: this
> > > is not possible because it needs a real build of Kudu on the local
> > > machine.
> > >
> > > * Update travis.yml to install a Kudu server: it would fix the problem
> > > for CI, but tests would fail locally. Moreover, building Kudu takes too
> > > long (more than 20 minutes), which is not feasible for CI.
> > >
> > > * Ignore testing: not an option :)
> > >
> > >
> > > In the case of Kudu, I saw that connectors for other distributed
> > > analytics platforms (e.g. Spark) are implemented directly in the Kudu repo
> > > (https://github.com/apache/kudu/tree/master/java/kudu-spark) instead of
> > > using bahir-spark. I think this is good because when you execute the tests
> > > you have a real build of Kudu to test against.
> > >
> > > What is the best place (Kudu vs. Bahir) for this connector if we take
> > > the above-mentioned issues into consideration?
> > >
> > > If the answer is bahir-flink, how should I proceed with my tests? :)
> > >
> > > Thanks in advance.
> > >
> > >
> >
>
