Thank you very much Mike for taking the time to explore that.

Tim
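For anyone skimming the thread: the os-maven-plugin idea in point 1 of Mike's message amounts to deriving a classifier such as linux-x86_64 from the JVM's os.name and os.arch. Below is a rough Java sketch of that idea. The class and method names are made up for illustration, and the mapping only approximates the plugin's naming; it is not the plugin's code. It also illustrates the problem in point 3: RHEL 6, RHEL 7 and Ubuntu all collapse to the same classifier, so the classifier alone cannot select a distro-specific binary.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of os-maven-plugin or Kudu.
public final class PlatformClassifier {
    private PlatformClassifier() {}

    // Maps a raw os.name value (e.g. "Linux", "Mac OS X") to a short family name.
    public static String normalizeOs(String osName) {
        String v = osName.toLowerCase(Locale.ROOT);
        if (v.startsWith("linux")) return "linux";
        if (v.startsWith("mac") || v.startsWith("darwin") || v.startsWith("osx")) return "osx";
        if (v.startsWith("windows")) return "windows";
        return "unknown";
    }

    // Maps a raw os.arch value (e.g. "amd64") to a normalized architecture name.
    public static String normalizeArch(String osArch) {
        String v = osArch.toLowerCase(Locale.ROOT);
        if (v.equals("amd64") || v.equals("x86_64") || v.equals("x64")) return "x86_64";
        if (v.equals("aarch64") || v.equals("arm64")) return "aarch_64";
        return "unknown";
    }

    // Builds "<os>-<arch>". Note there is no distro or distro version in here.
    public static String classifier(String osName, String osArch) {
        return normalizeOs(osName) + "-" + normalizeArch(osArch);
    }

    public static void main(String[] args) {
        // Prints the classifier for the machine this runs on.
        System.out.println(
            classifier(System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

A per-distro scheme would need an extra component (for example an el7 or xenial suffix) from somewhere other than the JVM system properties.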
> On 2 Jul 2018, at 21:24, Mike Percy <mpe...@apache.org> wrote:
>
> I explored the "download binaries from Maven" approach for a while on
> Friday. Here is what I found:
>
> 1) There is a Maven plugin that should be able to help us find matching
> system binaries @ https://github.com/trustin/os-maven-plugin
>
> The protobuf-maven-plugin uses this approach to download and run the
> appropriate protoc binary for your architecture according to
> https://www.xolstice.org/protobuf-maven-plugin/examples/protoc-artifact.html
>
> 2) Stripped binaries from release builds look small enough to be viable to
> download to run integration tests via Maven in precommit builds, at least
> in non-bandwidth-constrained environments:
>
> $ strip kudu-master
> $ strip kudu-tserver
> $ ls -alh
> total 85M
> drwxrwxr-x 2 mpercy mpercy 45 Jul 2 12:05 .
> drwxrwxr-x 3 mpercy mpercy 98 Jun 29 14:56 ..
> -rwxrwxr-x 1 mpercy mpercy 45M Jul 2 12:05 kudu-master
> -rwxrwxr-x 1 mpercy mpercy 41M Jul 2 12:05 kudu-tserver
>
> 3) Kudu binaries contain many system dependencies related to security as
> well as the c++ stdlib:
>
> $ ldd kudu-tserver
> linux-vdso.so.1 => (0x00007ffe0c290000)
> libz.so.1 => /lib64/libz.so.1 (0x00007fde730d5000)
> libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fde72eab000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde72c8e000)
> libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fde729a7000)
> libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fde725bd000)
> libssl.so.10 => /lib64/libssl.so.10 (0x00007fde7234e000)
> libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007fde72131000)
> libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fde71ee3000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007fde71cde000)
> librt.so.1 => /lib64/librt.so.1 (0x00007fde71ad6000)
> libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fde717cd000)
> libm.so.6 => /lib64/libm.so.6 (0x00007fde714ca000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fde712b4000)
> libc.so.6 => /lib64/libc.so.6 (0x00007fde70ef3000)
> /lib64/ld-linux-x86-64.so.2 (0x00007fde732fe000)
> libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fde70cc0000)
> libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fde70abc000)
> libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fde708ad000)
> libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fde706a8000)
> libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fde7048e000)
> libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fde70257000)
> libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fde7002f000)
> libfreebl3.so => /lib64/libfreebl3.so (0x00007fde6fe2c000)
> libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fde6fbca000)
>
> So it's not viable to simply have a linux-x86_64 binary and a darwin-x86_64
> binary like protoc does, or even just ubuntu & redhat. We'll likely need a
> separate binary for every major OS version, i.e. RHEL 6, RHEL 7, trusty,
> xenial, bionic. I think people running non-LTS builds of Ubuntu, or SUSE or
> something, would be out of luck.
>
> One potential option would be to offer a completely static build that is
> for testing only and with no intent to ever fix security vulnerabilities. I
> would have two concerns about that, though: 1) someone could take those
> binaries and run them for non-testing purposes, and 2) I'm not sure how
> easy it would be to generate a fully static build, since I don't think the
> distributions provide static libs for security components in order to
> discourage people from doing this.
>
> Mike
>
>
> On Sat, Jun 30, 2018 at 4:31 AM Tim Robertson <timrobertson...@gmail.com>
> wrote:
>
>>> What do you mean by that?
>> Sorry, poor phrasing - currently the Beam project has the build path with
>> unit tests (no Docker there) and the project IT environment which can use
>> Docker.
>> A binary only approach could potentially be managed without adding a
>> dependency on Docker - but has other issues summarised below.
>>
>>> For Kudu-internal testing I think we could stick to running "kudu
>>> minicluster
>> Yes.
>>
>>> ... external use cases, we could switch that to "docker run
>>> kudu:minicluster:1.7.0"
>> I think this makes good sense.
>>
>>
>> In summary:
>>
>> 1) Fake a Kudu master in Java - difficult unless simplified, not
>> representative if simplified, code maintenance issue
>> 2) Mocking the Kudu client - verbose unless only covering simple scenarios
>> 3) Use mini cluster with binaries - portability challenge of binaries, need
>> to script caching the binaries / use of some repository, unfamiliar build
>> tasks with binary handling (unless built to work with something like
>> maven), possible could see linking problems
>> 4) Docker - predictable, adds a dependency, existing Kudu images not
>> "managed" at the moment
>>
>> For Beam I think I will put most effort into IT which can use Docker or an
>> existing cluster and then mock a Java KuduClient for some basic sanity
>> tests for the build path.
>>
>> On Docker:
>> - to get current versions [e.g. 1] working I found I had to edit
>> /etc/hosts. I think the mini cluster version with the FakeDNS might avoid
>> that?
>> - Kudu docs currently encourage the Cloudera Quickstart VM over Docker
>> [2,3]
>>
>> Do you think the Kudu project could provide an image allowing "docker run
>> kudu:minicluster:1.x.x" as part of the release cycle?
>>
>> Thanks again,
>> Tim
>>
>> [1] https://github.com/MartinWeindel/kudu-docker
>> [2] https://kudu.apache.org/docs/quickstart.html#quickstart_vm
>> [3] https://github.com/cloudera/kudu-examples/wiki/Docker-based-tutorial
>>
>> On Sat, Jun 30, 2018 at 2:22 AM, Todd Lipcon <t...@cloudera.com.invalid>
>> wrote:
>>
>>> On Fri, Jun 29, 2018 at 1:23 PM, Tim Robertson <timrobertson...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Mike, Todd - I greatly appreciate the inputs.
>>>>
>>>>> How many platforms would need to be supported for it to be viable for
>>>>> Beam?
>>>> The minimal for it to be considered would probably(!) be ubuntu, centos,
>>>> osx. Incidentally it was actually the protobuf approach that make me
>>>> consider this.
>>>>
>>>>> What about depending on a docker container than runs the kudu
>>>>> minicluster in "host" networking mode?
>>>> I've also pondered this a little but like Attila raises it puts a lot of
>>>> burden for other project developers. Mmmm...
>>>>
>>>
>>> What do you mean by that? For Kudu-internal testing I think we could stick
>>> to running "kudu minicluster" as is. For external use cases, we could
>>> switch that to "docker run kudu:minicluster:1.7.0" or whatever, and it
>>> would auto-download from dockerhub as necessary, right?
>>>
>>>
>>>>
>>>> Ismaël (Beam PMC) has suggested I stick to mocking given the complexity of
>>>> the things I'm exploring.
>>>>
>>>> As another idea:
>>>> I briefly pondered writing a "FakeKudu Java server" - data held in memory,
>>>> no partitioning, protobuf messaging, handling table metadata, checking
>>>> schemas on write, predicate and projected columns for scan, faking kerberos
>>>> (if possible). It didn't seem particularly difficult to do but I fear a
>>>> maintenance burden for a small audience.
>>>>
>>>>
>>> Yea, I think that would be quite a maintenance burden, especially as new
>>> features are added over time. I suppose in many cases you could omit things
>>> or stub things out, but then the behavior will begin to differ and it won't
>>> really be that clear that your tests actually are representative.
>>>
>>>
>>>> Could utilities in Kudu that help folk test Java clients be of interest to
>>>> others? - e.g. preconfigured mock objects for various scenarios. If so, I'd
>>>> be happy to discuss options and offer PRs in Kudu.
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On Fri, Jun 29, 2018 at 9:34 PM, Todd Lipcon <t...@cloudera.com.invalid>
>>>> wrote:
>>>>
>>>>> On Fri, Jun 29, 2018 at 12:31 PM, Mike Percy <mpe...@apache.org> wrote:
>>>>>
>>>>>> This is something I've been thinking about and toying with and I'd like
>>>>>> to see if we can't get binaries available via Maven for at least one
>>>>>> platform (say, RHEL 7). Similar to how protobuf does it.
>>>>>>
>>>>>
>>>>> What about depending on a docker container than runs the kudu minicluster
>>>>> in "host" networking mode? eg https://github.com/MartinWeindel/kudu-docker
>>>>> is one possibility
>>>>>
>>>>>
>>>>>> How many platforms would need to be supported for it to be viable for
>>>>>> Beam?
>>>>>>
>>>>>> Thanks,
>>>>>> Mike
>>>>>>
>>>>>> On Fri, Jun 29, 2018 at 10:01 AM Tim <timrobertson...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Attila
>>>>>>>
>>>>>>> That’s great feedback and helpful for me to reference as guidance.
>>>>>>>
>>>>>>> By “Kudu installation” I was referring to the possibility that an
>>>>>>> install might set config etc, beyond just having the binary. I got it
>>>>>>> running on CentOS similar to how you outline now.
>>>>>>>
>>>>>>> I too believe mocking makes most sense, especially as we have the IT
>>>>>>> running as well, but was asked to explore this further. It’s useful to
>>>>>>> know you’d agree.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>>>>>>> On 29 Jun 2018, at 17:33, Attila Bukor <abu...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>> Hi Tim,
>>>>>>>>
>>>>>>>> I’m not sure what you mean by relying on actual installations. If you
>>>>>>>> have the kudu, kudu-master and kudu-tserver binaries at the same
>>>>>>>> location and they can be executed, MiniKuduCluster can be used
>>>>>>>> (“binDir” property should be set to the directory containing the Kudu
>>>>>>>> binaries). You should also look into BaseKuduTest as that will set up
>>>>>>>> the MiniKuduCluster for you and you don’t have to do it manually.
>>>>>>>>
>>>>>>>> Extracting the Kudu binaries from an rpm should probably work, but
>>>>>>>> that binds you to CDH as currently Cloudera is the only one that ships
>>>>>>>> Kudu binaries and MacOS builds are not available anywhere afaik. Also,
>>>>>>>> 1.4.0 is around a year old, you might want to use this repository
>>>>>>>> instead (from CDH 5.13 Kudu is part of the CDH):
>>>>>>>> http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5/RPMS/x86_64/kudu-1.7.0+cdh5.15.0+0-1.cdh5.15.0.p0.52.el7.x86_64.rpm
>>>>>>>>
>>>>>>>> As a general suggestion, I would recommend mocking Kudu for unit tests
>>>>>>>> (that’s what a unit test is for after all) and create separate
>>>>>>>> integration tests that actually use Kudu that can be skipped where
>>>>>>>> Kudu is not available. Of course the CI should be set up to be able to
>>>>>>>> provide all necessary integrations for the tests, but a developer
>>>>>>>> wouldn’t have to set up Kudu, or use Docker to run the tests if their
>>>>>>>> change doesn’t affect the Kudu integration.
>>>>>>>>
>>>>>>>> Attila
>>>>>>>>
>>>>>>>>> On 2018. Jun 29., at 16:42, Tim Robertson <timrobertson...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I've written Java KuduIO for Apache Beam with integration tests
>>>>>>>>> making use of Kudu in Docker. It is yet to be committed on Apache
>>>>>>>>> Beam.
>>>>>>>>>
>>>>>>>>> Rather than mocking Kudu client for unit tests I'd like to explore
>>>>>>>>> use of the MiniKuduCluster which "Depends on precompiled kudu,
>>>>>>>>> kudu-master, and kudu-tserver binaries".
>>>>>>>>>
>>>>>>>>> I'd need unit tests to run on the main linux distros and OS X.
>>>>>>>>>
>>>>>>>>> For the linux distros, would an approach where I extract the binaries
>>>>>>>>> from the packages [1] work please? Or does the MiniKuduCluster rely
>>>>>>>>> on actual installations? I am pretty weak on C builds and linked
>>>>>>>>> libraries etc (Java guy, sorry).
>>>>>>>>>
>>>>>>>>> For CentOS I'm exploring this for example:
>>>>>>>>> rpm2cpio ./kudu-1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8.el7.x86_64.rpm | cpio -idmv
>>>>>>>>>
>>>>>>>>> I haven't explored OS X options yet.
>>>>>>>>>
>>>>>>>>> Any advice here would greatly be appreciated to save me going down a
>>>>>>>>> dead end.
>>>>>>>>>
>>>>>>>>> Many thanks,
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] http://kudu.apache.org/docs/installation.html#install_packages
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
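
Attila's precondition in the quoted thread (the kudu, kudu-master and kudu-tserver binaries in one directory, all executable) can be checked up front, so that integration tests are skipped rather than failed on machines without Kudu. A minimal Java sketch using only the standard library follows. The class name, the method name, and passing the directory in explicitly are assumptions for illustration; none of this is part of the Kudu test utilities.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Kudu client or test jars.
public final class KuduBinaries {
    // The three binaries named in the thread as required by MiniKuduCluster.
    private static final List<String> REQUIRED =
        Arrays.asList("kudu", "kudu-master", "kudu-tserver");

    private KuduBinaries() {}

    // Returns true only if binDir is a directory that contains every required
    // binary as a regular file this process is allowed to execute.
    public static boolean allPresent(Path binDir) {
        if (binDir == null || !Files.isDirectory(binDir)) {
            return false;
        }
        for (String name : REQUIRED) {
            Path candidate = binDir.resolve(name);
            if (!Files.isRegularFile(candidate) || !Files.isExecutable(candidate)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Usage: java KuduBinaries /path/to/kudu/bin
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(allPresent(dir) ? "Kudu binaries found" : "Kudu binaries missing");
    }
}
```

In a JUnit 4 integration test, the result could be passed to org.junit.Assume.assumeTrue so the test is reported as skipped where Kudu is not available. Note that this only checks that the files exist and are executable; it does not detect the shared-library mismatches Mike describes.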