I explored the "download binaries from Maven" approach for a while on
Friday. Here is what I found:

1) There is a Maven plugin that should be able to help us select the binary
matching the host OS and architecture: https://github.com/trustin/os-maven-plugin

According to
https://www.xolstice.org/protobuf-maven-plugin/examples/protoc-artifact.html,
the protobuf-maven-plugin uses this approach to download and run the
appropriate protoc binary for your platform.
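For illustration, the kind of platform detection os-maven-plugin does can be approximated in shell. This is a sketch, not the plugin's implementation. The helper name `platform_classifier` is hypothetical, and it only handles a few common `uname` values. The output follows the `<os>-<arch>` classifier style of the protoc artifacts (e.g. `linux-x86_64`, `osx-x86_64`), which the protobuf-maven-plugin page resolves through the plugin's `${os.detected.classifier}` property.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -s` / `uname -m` values to an
# os-maven-plugin style "<os>-<arch>" classifier. Only a few common values
# are normalized here; anything else maps to "unknown".
platform_classifier() {
  local kernel="${1:-$(uname -s)}"   # default: inspect the current host
  local machine="${2:-$(uname -m)}"
  local os arch

  case "$kernel" in
    Linux)  os="linux" ;;
    Darwin) os="osx" ;;
    *)      os="unknown" ;;
  esac

  case "$machine" in
    x86_64|amd64)  arch="x86_64" ;;
    aarch64|arm64) arch="aarch_64" ;;
    i386|i686)     arch="x86_32" ;;
    *)             arch="unknown" ;;
  esac

  echo "${os}-${arch}"
}
```

Called with no arguments, `platform_classifier` inspects the current host. Passing values explicitly, as in `platform_classifier Darwin x86_64`, prints `osx-x86_64`.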

2) Stripped binaries from release builds look small enough that downloading
them via Maven to run integration tests in precommit builds seems viable, at
least in environments that aren't bandwidth-constrained:

$ strip kudu-master
$ strip kudu-tserver
$ ls -alh
total 85M
drwxrwxr-x 2 mpercy mpercy  45 Jul  2 12:05 .
drwxrwxr-x 3 mpercy mpercy  98 Jun 29 14:56 ..
-rwxrwxr-x 1 mpercy mpercy 45M Jul  2 12:05 kudu-master
-rwxrwxr-x 1 mpercy mpercy 41M Jul  2 12:05 kudu-tserver

3) Kudu binaries link dynamically against many system libraries, including
security-related ones (Kerberos, OpenSSL, SASL), as well as the C++ stdlib:

$ ldd kudu-tserver
        linux-vdso.so.1 =>  (0x00007ffe0c290000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fde730d5000)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fde72eab000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde72c8e000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fde729a7000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fde725bd000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007fde7234e000)
        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007fde72131000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fde71ee3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fde71cde000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fde71ad6000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fde717cd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fde714ca000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fde712b4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fde70ef3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fde732fe000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fde70cc0000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fde70abc000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fde708ad000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fde706a8000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fde7048e000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fde70257000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fde7002f000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007fde6fe2c000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fde6fbca000)
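Every one of these libraries, under these exact sonames, has to be present on the machine running the tests. For example, libcrypto.so.10 and libssl.so.10 are Red Hat's names for OpenSSL 1.0, whereas Ubuntu 16.04 ships libcrypto.so.1.0.0. A test harness could detect an incompatible binary before starting a minicluster by looking for `not found` entries in the `ldd` output. The sketch below assumes the glibc `ldd` output format shown above. `missing_libs` and `check_runnable` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print the name of every
# shared library that ldd could not resolve ("libfoo.so.1 => not found").
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Hypothetical helper: succeed only if all shared-library dependencies of the
# given binary resolve on this host; otherwise explain why on stderr.
check_runnable() {
  local binary="$1" output missing
  # ldd exits non-zero e.g. when the file does not exist; treat any ldd
  # failure as "this binary cannot be used here".
  if ! output="$(ldd "$binary" 2>&1)"; then
    echo "ldd failed for $binary: $output" >&2
    return 1
  fi
  missing="$(printf '%s\n' "$output" | missing_libs)"
  if [ -n "$missing" ]; then
    echo "$binary cannot run on this host; missing: $(printf '%s\n' "$missing" | tr '\n' ' ')" >&2
    return 1
  fi
  return 0
}
```

A harness might run `check_runnable ./kudu-tserver || echo "skipping Kudu integration tests"`. This is Linux-only (macOS has no `ldd`; `otool -L` lists dependencies there). The ldd(1) man page also warns against running `ldd` on untrusted executables, so this only makes sense for binaries from a trusted source.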

So it's not viable to simply have a Linux x86_64 binary and a macOS x86_64
binary like protoc does, or even just one for Ubuntu and one for Red Hat.
We'll likely need a separate binary for every major OS version, e.g. RHEL 6,
RHEL 7, Ubuntu trusty, xenial and bionic. I think people running non-LTS
releases of Ubuntu, or SUSE or something else, would be out of luck.
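If per-distribution binaries were published, a harness would also need to work out which one matches the host. A minimal sketch reads the `ID` and `VERSION_ID` fields from `/etc/os-release`, which the os-release format defines as shell-compatible variable assignments. The classifier names it produces (`rhel-7`, `ubuntu-16.04`) and the helper name `distro_classifier` are hypothetical, not published artifacts. CentOS and RHEL are collapsed onto one `rhel-<major>` name on the assumption that they are binary compatible. RHEL/CentOS 6 do not ship `/etc/os-release`, so they would need a fallback (for example parsing `/etc/redhat-release`) that this sketch does not implement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a per-distribution classifier such as "rhel-7"
# or "ubuntu-16.04" from an os-release file (default: /etc/os-release).
# Prints "unsupported" and returns 1 for anything it does not recognize.
distro_classifier() {
  local file="${1:-/etc/os-release}"
  if [ ! -r "$file" ]; then
    echo "unsupported"
    return 1
  fi
  # os-release consists of shell variable assignments, so it can be sourced;
  # do that in a subshell so ID, VERSION_ID, etc. don't leak into the caller.
  (
    . "$file"
    case "${ID:-}" in
      rhel|centos) echo "rhel-${VERSION_ID%%.*}" ;;  # keep only the major version
      ubuntu)      echo "ubuntu-${VERSION_ID}" ;;     # e.g. 16.04 for xenial
      *)           echo "unsupported"; exit 1 ;;
    esac
  )
}
```

On a CentOS 7 host, where `/etc/os-release` contains `ID="centos"` and `VERSION_ID="7"`, this prints `rhel-7`. Note that it does not filter out non-LTS Ubuntu releases; those would simply map to a classifier for which nothing is published.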

One potential option would be to offer a completely static build that is
for testing only, with no intent to ever fix security vulnerabilities in it.
I would have two concerns about that, though: 1) someone could take those
binaries and run them for non-testing purposes, and 2) I'm not sure how
easy it would be to generate a fully static build, since I don't think the
distributions provide static libraries for security components, in order to
discourage people from doing this.

Mike


On Sat, Jun 30, 2018 at 4:31 AM Tim Robertson <timrobertson...@gmail.com>
wrote:

> > What do you mean by that?
> Sorry, poor phrasing. Currently the Beam project has the build path with
> unit tests (no Docker there) and the project IT environment, which can use
> Docker.
> A binary-only approach could potentially be managed without adding a
> dependency on Docker, but it has other issues, summarised below.
>
> > For Kudu-internal testing I think we could stick to running
> > "kudu minicluster"
> Yes.
>
> > ... external use cases, we could switch that to "docker run
> > kudu:minicluster:1.7.0"
> I think this makes good sense.
>
>
> In summary:
>
> 1) Fake a Kudu master in Java - difficult unless simplified, not
> representative if simplified, code maintenance issue
> 2) Mocking the Kudu client - verbose unless only covering simple scenarios
> 3) Use mini cluster with binaries - portability challenge of binaries, need
> to script caching the binaries / use of some repository, unfamiliar build
> tasks with binary handling (unless built to work with something like
> Maven), possible linking problems
> 4) Docker - predictable, adds a dependency, existing Kudu images not
> "managed" at the moment
>
> For Beam I think I will put most effort into ITs, which can use Docker or an
> existing cluster, and then mock a Java KuduClient for some basic sanity
> tests in the build path.
>
> On Docker:
> - to get current versions [e.g. 1] working I found I had to edit
> /etc/hosts. I think the mini cluster version with the FakeDNS might avoid
> that?
> - Kudu docs currently encourage the Cloudera Quickstart VM over Docker
> [2,3]
>
> Do you think the Kudu project could provide an image allowing "docker run
> kudu:minicluster:1.x.x" as part of the release cycle?
>
> Thanks again,
> Tim
>
> [1] https://github.com/MartinWeindel/kudu-docker
> [2] https://kudu.apache.org/docs/quickstart.html#quickstart_vm
> [3] https://github.com/cloudera/kudu-examples/wiki/Docker-based-tutorial
>
> On Sat, Jun 30, 2018 at 2:22 AM, Todd Lipcon <t...@cloudera.com.invalid>
> wrote:
>
> > On Fri, Jun 29, 2018 at 1:23 PM, Tim Robertson <timrobertson...@gmail.com>
> > wrote:
> >
> > > Thanks Mike, Todd - I greatly appreciate the inputs.
> > >
> > > > How many platforms would need to be supported for it to be viable for
> > > > Beam?
> > > The minimum for it to be considered would probably(!) be Ubuntu, CentOS
> > > and OS X. Incidentally, it was actually the protobuf approach that made
> > > me consider this.
> > >
> > > > What about depending on a docker container that runs the kudu
> > > > minicluster in "host" networking mode?
> > > I've also pondered this a little, but as Attila raises, it puts a lot of
> > > burden on other project developers. Mmmm...
> > >
> >
> > What do you mean by that? For Kudu-internal testing I think we could stick
> > to running "kudu minicluster" as is. For external use cases, we could
> > switch that to "docker run kudu:minicluster:1.7.0" or whatever, and it
> > would auto-download from dockerhub as necessary, right?
> >
> >
> > >
> > > Ismaël (Beam PMC) has suggested I stick to mocking given the complexity
> > > of the things I'm exploring.
> > >
> > > As another idea:
> > > I briefly pondered writing a "FakeKudu Java server": data held in memory,
> > > no partitioning, protobuf messaging, handling table metadata, checking
> > > schemas on write, predicate and projected columns for scan, faking
> > > Kerberos (if possible). It didn't seem particularly difficult to do, but I
> > > fear a maintenance burden for a small audience.
> > >
> > >
> > Yea, I think that would be quite a maintenance burden, especially as new
> > features are added over time. I suppose in many cases you could omit things
> > or stub things out, but then the behavior will begin to differ and it won't
> > really be that clear that your tests actually are representative.
> >
> >
> > > Could utilities in Kudu that help folk test Java clients be of interest to
> > > others? For example, preconfigured mock objects for various scenarios. If
> > > so, I'd be happy to discuss options and offer PRs in Kudu.
> > >
> > > Thanks,
> > > Tim
> > >
> > > On Fri, Jun 29, 2018 at 9:34 PM, Todd Lipcon <t...@cloudera.com.invalid>
> > > wrote:
> > >
> > > > On Fri, Jun 29, 2018 at 12:31 PM, Mike Percy <mpe...@apache.org>
> > > > wrote:
> > > >
> > > > > This is something I've been thinking about and toying with, and I'd
> > > > > like to see if we can't get binaries available via Maven for at least
> > > > > one platform (say, RHEL 7), similar to how protobuf does it.
> > > > >
> > > >
> > > > What about depending on a docker container that runs the kudu
> > > > minicluster in "host" networking mode? e.g.
> > > > https://github.com/MartinWeindel/kudu-docker is one possibility
> > > >
> > > >
> > > > > How many platforms would need to be supported for it to be viable for
> > > > > Beam?
> > > > >
> > > > > Thanks,
> > > > > Mike
> > > > >
> > > > > On Fri, Jun 29, 2018 at 10:01 AM Tim <timrobertson...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Attila
> > > > > >
> > > > > > That’s great feedback and helpful for me to reference as guidance.
> > > > > >
> > > > > > By “Kudu installation” I was referring to the possibility that an
> > > > > > install might set config etc., beyond just having the binary. I got
> > > > > > it running on CentOS similar to how you outline now.
> > > > > >
> > > > > > I too believe mocking makes most sense, especially as we have the IT
> > > > > > running as well, but I was asked to explore this further. It’s useful
> > > > > > to know you’d agree.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Tim
> > > > > >
> > > > > > > On 29 Jun 2018, at 17:33, Attila Bukor <abu...@cloudera.com> wrote:
> > > > > > >
> > > > > > > Hi Tim,
> > > > > > >
> > > > > > > I’m not sure what you mean by relying on actual installations. If
> > > > > > > you have the kudu, kudu-master and kudu-tserver binaries at the same
> > > > > > > location and they can be executed, MiniKuduCluster can be used (the
> > > > > > > “binDir” property should be set to the directory containing the Kudu
> > > > > > > binaries). You should also look into BaseKuduTest, as that will set
> > > > > > > up the MiniKuduCluster for you so you don’t have to do it manually.
> > > > > > >
> > > > > > > Extracting the Kudu binaries from an rpm should probably work, but
> > > > > > > that binds you to CDH, as currently Cloudera is the only one that
> > > > > > > ships Kudu binaries, and macOS builds are not available anywhere
> > > > > > > afaik. Also, 1.4.0 is around a year old; you might want to use this
> > > > > > > package instead (Kudu is part of CDH from CDH 5.13 onward):
> > > > > > > http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5/RPMS/x86_64/kudu-1.7.0+cdh5.15.0+0-1.cdh5.15.0.p0.52.el7.x86_64.rpm
> > > > > > >
> > > > > > > As a general suggestion, I would recommend mocking Kudu for unit
> > > > > > > tests (that’s what a unit test is for, after all) and creating
> > > > > > > separate integration tests that actually use Kudu and can be skipped
> > > > > > > where Kudu is not available. Of course the CI should be set up to
> > > > > > > provide all the integrations the tests need, but a developer
> > > > > > > wouldn’t have to set up Kudu, or use Docker, to run the tests if
> > > > > > > their change doesn’t affect the Kudu integration.
> > > > > > >
> > > > > > > Attila
> > > > > > >
> > > > > > > >> On 2018. Jun 29., at 16:42, Tim Robertson <timrobertson...@gmail.com>
> > > > > > > >> wrote:
> > > > > > >>
> > > > > > >> Hi folks,
> > > > > > >>
> > > > > > >> I've written Java KuduIO for Apache Beam with integration tests making use
> > > > > > >> of Kudu in Docker. It has yet to be committed to Apache Beam.
> > > > > > >>
> > > > > > >> Rather than mocking the Kudu client for unit tests, I'd like to explore use
> > > > > > >> of the MiniKuduCluster, which "Depends on precompiled kudu, kudu-master,
> > > > > > >> and kudu-tserver binaries".
> > > > > > >>
> > > > > > >> I'd need unit tests to run on the main linux distros and OS X.
> > > > > > >>
> > > > > > >> For the Linux distros, would an approach where I extract the binaries from
> > > > > > >> the packages [1] work? Or does the MiniKuduCluster rely on actual
> > > > > > >> installations? I am pretty weak on C builds and linked libraries etc.
> > > > > > >> (Java guy, sorry).
> > > > > > >>
> > > > > > >> For CentOS I'm exploring this, for example:
> > > > > > >>  rpm2cpio ./kudu-1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8.el7.x86_64.rpm | cpio -idmv
> > > > > > >>
> > > > > > >> I haven't explored OS X options yet.
> > > > > > >>
> > > > > > >> Any advice here would be greatly appreciated, to save me going down a dead
> > > > > > >> end.
> > > > > > >>
> > > > > > >> Many thanks,
> > > > > > >> Tim
> > > > > > >>
> > > > > > >>
> > > > > > >> [1] http://kudu.apache.org/docs/installation.html#install_packages
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
