Thank you very much Mike for taking the time to explore that.

Tim

> On 2 Jul 2018, at 21:24, Mike Percy <mpe...@apache.org> wrote:
> 
> I explored the "download binaries from Maven" approach for a while on
> Friday. Here is what I found:
> 
> 1) There is a Maven plugin that should be able to help us find matching
> system binaries @ https://github.com/trustin/os-maven-plugin
> 
> The protobuf-maven-plugin uses this approach to download and run the
> appropriate protoc binary for your architecture according to
> https://www.xolstice.org/protobuf-maven-plugin/examples/protoc-artifact.html
> 
> 2) Stripped binaries from release builds look small enough to be viable to
> download to run integration tests via Maven in precommit builds, at least
> in non-bandwidth-constrained environments:
> 
> $ strip kudu-master
> $ strip kudu-tserver
> $ ls -alh
> total 85M
> drwxrwxr-x 2 mpercy mpercy  45 Jul  2 12:05 .
> drwxrwxr-x 3 mpercy mpercy  98 Jun 29 14:56 ..
> -rwxrwxr-x 1 mpercy mpercy 45M Jul  2 12:05 kudu-master
> -rwxrwxr-x 1 mpercy mpercy 41M Jul  2 12:05 kudu-tserver
> 
> 3) Kudu binaries contain many system dependencies related to security as
> well as the c++ stdlib:
> 
> $ ldd kudu-tserver
>        linux-vdso.so.1 =>  (0x00007ffe0c290000)
>        libz.so.1 => /lib64/libz.so.1 (0x00007fde730d5000)
>        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fde72eab000)
>        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde72c8e000)
>        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fde729a7000)
>        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fde725bd000)
>        libssl.so.10 => /lib64/libssl.so.10 (0x00007fde7234e000)
>        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007fde72131000)
>        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fde71ee3000)
>        libdl.so.2 => /lib64/libdl.so.2 (0x00007fde71cde000)
>        librt.so.1 => /lib64/librt.so.1 (0x00007fde71ad6000)
>        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fde717cd000)
>        libm.so.6 => /lib64/libm.so.6 (0x00007fde714ca000)
>        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fde712b4000)
>        libc.so.6 => /lib64/libc.so.6 (0x00007fde70ef3000)
>        /lib64/ld-linux-x86-64.so.2 (0x00007fde732fe000)
>        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fde70cc0000)
>        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fde70abc000)
>        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fde708ad000)
>        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fde706a8000)
>        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fde7048e000)
>        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fde70257000)
>        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fde7002f000)
>        libfreebl3.so => /lib64/libfreebl3.so (0x00007fde6fe2c000)
>        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fde6fbca000)
> 
> So it's not viable to simply have a linux-x86_64 binary and a darwin-x86_64
> binary like protoc does, or even just ubuntu & redhat. We'll likely need a
> separate binary for every major OS version, i.e. RHEL 6, RHEL 7, trusty,
> xenial, bionic. I think people running non-LTS builds of Ubuntu, or SUSE or
> something, would be out of luck.
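
One way to see which dependencies tie a build to a particular distribution is to compare the versioned library names (for example libssl.so.10 and libcrypto.so.10 in the listing above) across the target systems. The following is only a sketch: `ldd_sonames` is a made-up helper name, and it parses text in `ldd` output format from standard input rather than running `ldd` itself.

```shell
#!/bin/sh
# ldd_sonames: read `ldd`-style output on stdin and print one library name
# per line, sorted and de-duplicated.
# Hypothetical helper for illustration, not part of Kudu.
# Lines without a resolved path (linux-vdso, the dynamic loader) are skipped.
ldd_sonames() {
  awk '$2 == "=>" && $3 ~ /^\// { print $1 }' | sort -u
}

# Example invocation (assumes a kudu-tserver binary in the current directory):
#   ldd ./kudu-tserver | ldd_sonames
```

Saving the result on each candidate platform and diffing the files would show which names differ between, say, RHEL 7 and xenial.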
> 
> One potential option would be to offer a completely static build that is
> for testing only and with no intent to ever fix security vulnerabilities. I
> would have two concerns about that, though: 1) someone could take those
> binaries and run them for non-testing purposes, and 2) I'm not sure how
> easy it would be to generate a fully static build, since I don't think the
> distributions provide static libs for security components in order to
> discourage people from doing this.
> 
> Mike
> 
> 
> On Sat, Jun 30, 2018 at 4:31 AM Tim Robertson <timrobertson...@gmail.com>
> wrote:
> 
>>> What do you mean by that?
>> Sorry, poor phrasing - currently the Beam project has the build path with
>> unit tests (no Docker there) and the project IT environment which can use
>> Docker.
>> A binary only approach could potentially be managed without adding a
>> dependency on Docker - but has other issues summarised below.
>> 
>>> For Kudu-internal testing I think we could stick to running "kudu minicluster"
>> Yes.
>> 
>>> ... external use cases, we could switch that to "docker run kudu:minicluster:1.7.0"
>> I think this makes good sense.
>> 
>> 
>> In summary:
>> 
>> 1) Fake a Kudu master in Java - difficult unless simplified, not
>> representative if simplified, code maintenance issue
>> 2) Mocking the Kudu client - verbose unless only covering simple scenarios
>> 3) Use mini cluster with binaries - portability challenge of binaries, need
>> to script caching the binaries / use of some repository, unfamiliar build
>> tasks with binary handling (unless built to work with something like
>> Maven), could possibly see linking problems
>> 4) Docker - predictable, adds a dependency, existing Kudu images not
>> "managed" at the moment
>> 
>> For Beam I think I will put most effort into IT which can use Docker or an
>> existing cluster and then mock a Java KuduClient for some basic sanity
>> tests for the build path.
>> 
>> On Docker:
>> - to get current versions [e.g. 1] working I found I had to edit
>> /etc/hosts. I think the mini cluster version with the FakeDNS might avoid
>> that?
>> - Kudu docs currently encourage the Cloudera Quickstart VM over Docker
>> [2,3]
>> 
>> Do you think the Kudu project could provide an image allowing "docker run
>> kudu:minicluster:1.x.x" as part of the release cycle?
>> 
>> Thanks again,
>> Tim
>> 
>> [1] https://github.com/MartinWeindel/kudu-docker
>> [2] https://kudu.apache.org/docs/quickstart.html#quickstart_vm
>> [3] https://github.com/cloudera/kudu-examples/wiki/Docker-based-tutorial
>> 
>> On Sat, Jun 30, 2018 at 2:22 AM, Todd Lipcon <t...@cloudera.com.invalid>
>> wrote:
>> 
>>> On Fri, Jun 29, 2018 at 1:23 PM, Tim Robertson <timrobertson...@gmail.com>
>>> wrote:
>>> 
>>>> Thanks Mike, Todd - I greatly appreciate the inputs.
>>>> 
>>>>> How many platforms would need to be supported for it to be viable for
>>>>> Beam?
>>>> The minimum for it to be considered would probably(!) be ubuntu, centos,
>>>> osx. Incidentally it was actually the protobuf approach that made me
>>>> consider this.
>>>> 
>>>>> What about depending on a docker container that runs the kudu
>>>>> minicluster in "host" networking mode?
>>>> I've also pondered this a little but, as Attila raises, it puts a lot of
>>>> burden on other project developers. Mmmm...
>>>> 
>>> 
>>> What do you mean by that? For Kudu-internal testing I think we could stick
>>> to running "kudu minicluster" as is. For external use cases, we could
>>> switch that to "docker run kudu:minicluster:1.7.0" or whatever, and it
>>> would auto-download from dockerhub as necessary, right?
>>> 
>>> 
>>>> 
>>>> Ismaël (Beam PMC) has suggested I stick to mocking given the complexity of
>>>> the things I'm exploring.
>>>> 
>>>> As another idea:
>>>> I briefly pondered writing a "FakeKudu Java server" - data held in memory,
>>>> no partitioning, protobuf messaging, handling table metadata, checking
>>>> schemas on write, predicate and projected columns for scan, faking kerberos
>>>> (if possible). It didn't seem particularly difficult to do but I fear a
>>>> maintenance burden for a small audience.
>>>> 
>>>> 
>>> Yea, I think that would be quite a maintenance burden, especially as new
>>> features are added over time. I suppose in many cases you could omit things
>>> or stub things out, but then the behavior will begin to differ and it won't
>>> really be that clear that your tests actually are representative.
>>> 
>>> 
>>>> Could utilities in Kudu that help folk test Java clients be of interest to
>>>> others? - e.g. preconfigured mock objects for various scenarios. If so, I'd
>>>> be happy to discuss options and offer PRs in Kudu.
>>>> 
>>>> Thanks,
>>>> Tim
>>>> 
>>>> On Fri, Jun 29, 2018 at 9:34 PM, Todd Lipcon <t...@cloudera.com.invalid>
>>>> wrote:
>>>> 
>>>>> On Fri, Jun 29, 2018 at 12:31 PM, Mike Percy <mpe...@apache.org> wrote:
>>>>> 
>>>>>> This is something I've been thinking about and toying with and I'd like to
>>>>>> see if we can't get binaries available via Maven for at least one platform
>>>>>> (say, RHEL 7). Similar to how protobuf does it.
>>>>>> 
>>>>> 
>>>>> What about depending on a docker container that runs the kudu minicluster
>>>>> in "host" networking mode? e.g. https://github.com/MartinWeindel/kudu-docker
>>>>> is one possibility
>>>>> 
>>>>> 
>>>>>> How many platforms would need to be supported for it to be viable for
>>>>>> Beam?
>>>>>> 
>>>>>> Thanks,
>>>>>> Mike
>>>>>> 
>>>>>> On Fri, Jun 29, 2018 at 10:01 AM Tim <timrobertson...@gmail.com> wrote:
>>>>>> 
>>>>>>> Thanks Attila
>>>>>>> 
>>>>>>> That’s great feedback and helpful for me to reference as guidance.
>>>>>>> 
>>>>>>> By “Kudu installation” I was referring to the possibility that an install
>>>>>>> might set config etc, beyond just having the binary. I got it running on
>>>>>>> CentOS similar to how you outline now.
>>>>>>> 
>>>>>>> I too believe mocking makes most sense, especially as we have the IT
>>>>>>> running as well, but was asked to explore this further. It’s useful to know
>>>>>>> you’d agree.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Tim
>>>>>>> 
>>>>>>>> On 29 Jun 2018, at 17:33, Attila Bukor <abu...@cloudera.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Tim,
>>>>>>>> 
>>>>>>>> I’m not sure what you mean by relying on actual installations. If you
>>>>>>>> have the kudu, kudu-master and kudu-tserver binaries at the same location
>>>>>>>> and they can be executed, MiniKuduCluster can be used (“binDir” property
>>>>>>>> should be set to the directory containing the Kudu binaries). You should
>>>>>>>> also look into BaseKuduTest as that will set up the MiniKuduCluster for you
>>>>>>>> and you don’t have to do it manually.
>>>>>>>> 
>>>>>>>> Extracting the Kudu binaries from an rpm should probably work, but that
>>>>>>>> binds you to CDH as currently Cloudera is the only one that ships Kudu
>>>>>>>> binaries and MacOS builds are not available anywhere afaik. Also, 1.4.0 is
>>>>>>>> around a year old, you might want to use this repository instead (from CDH
>>>>>>>> 5.13 Kudu is part of CDH):
>>>>>>>> http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5/RPMS/x86_64/kudu-1.7.0+cdh5.15.0+0-1.cdh5.15.0.p0.52.el7.x86_64.rpm
>>>>>>>> 
>>>>>>>> As a general suggestion, I would recommend mocking Kudu for unit tests
>>>>>>>> (that’s what a unit test is for after all) and creating separate
>>>>>>>> integration tests that actually use Kudu that can be skipped where Kudu is
>>>>>>>> not available. Of course the CI should be set up to be able to provide all
>>>>>>>> necessary integrations for the tests, but a developer wouldn’t have to set
>>>>>>>> up Kudu, or use Docker to run the tests if their change doesn’t affect the
>>>>>>>> Kudu integration.
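
Attila's two points could be combined in a wrapper script that runs the integration tests only when the three binaries are present and executable in one directory, and skips them otherwise. This is a sketch under assumptions: `kudu_bins_ready` and the `KUDU_BIN_DIR` variable are invented names for illustration, not something Kudu or Beam defines.

```shell
#!/bin/sh
# kudu_bins_ready DIR: succeed only if kudu, kudu-master and kudu-tserver
# are all regular, executable files in DIR.
# Hypothetical helper for illustration, not part of Kudu.
kudu_bins_ready() {
  dir=$1
  for bin in kudu kudu-master kudu-tserver; do
    if [ ! -f "$dir/$bin" ] || [ ! -x "$dir/$bin" ]; then
      return 1
    fi
  done
  return 0
}

# Usage (KUDU_BIN_DIR is an assumed variable name):
#   if kudu_bins_ready "$KUDU_BIN_DIR"; then
#     echo "running Kudu integration tests"
#   else
#     echo "Kudu binaries not found, skipping integration tests"
#   fi
```

The check only confirms that the files can be executed; it says nothing about whether their shared-library dependencies resolve on the host.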
>>>>>>>> 
>>>>>>>> Attila
>>>>>>>> 
>>>>>>>>> On 2018. Jun 29., at 16:42, Tim Robertson <timrobertson...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi folks,
>>>>>>>>> 
>>>>>>>>> I've written Java KuduIO for Apache Beam with integration tests making use
>>>>>>>>> of Kudu in Docker.  It is yet to be committed on Apache Beam.
>>>>>>>>> 
>>>>>>>>> Rather than mocking Kudu client for unit tests I'd like to explore use of
>>>>>>>>> the MiniKuduCluster which "Depends on precompiled kudu, kudu-master, and
>>>>>>>>> kudu-tserver binaries".
>>>>>>>>> 
>>>>>>>>> I'd need unit tests to run on the main linux distros and OS X.
>>>>>>>>> 
>>>>>>>>> For the linux distros, would an approach where I extract the binaries from
>>>>>>>>> the packages [1] work please? Or does the MiniKuduCluster rely on actual
>>>>>>>>> installations? I am pretty weak on C builds and linked libraries etc (Java
>>>>>>>>> guy, sorry).
>>>>>>>>> 
>>>>>>>>> For CentOS I'm exploring this for example:
>>>>>>>>> rpm2cpio ./kudu-1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8.el7.x86_64.rpm | cpio -idmv
>>>>>>>>> 
>>>>>>>>> I haven't explored OS X options yet.
>>>>>>>>> 
>>>>>>>>> Any advice here would be greatly appreciated to save me going down a dead
>>>>>>>>> end.
>>>>>>>>> 
>>>>>>>>> Many thanks,
>>>>>>>>> Tim
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1] http://kudu.apache.org/docs/installation.html#install_packages
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>> 
>> 
