On Tue, Nov 27, 2018, at 10:53 AM, Whaley, Graham wrote:
> (back to an old thread... this has rippled near the top of my pile again)
> 
> > -----Original Message-----
> > From: Clark Boylan [mailto:[email protected]]
> > Sent: Tuesday, October 23, 2018 6:03 PM
> > To: Whaley, Graham <[email protected]>; openstack-
> > [email protected]; [email protected]
> > Cc: Ernst, Eric <[email protected]>; [email protected]
> > Subject: Re: Adding index and views/dashboards for Kata to ELK stack
> [snip]
> > > I don't think the Zuul Ansible role will be applicable - the metrics run
> > > on bare metal machines running Jenkins, and export their JSON results
> > > via a filebeat socket. My theory was we'd then add the socket input to
> > > the logstash server to receive from that filebeat - as in my gist at
> > >
> > https://gist.github.com/grahamwhaley/aa730e6bbd6a8ceab82129042b186467
> > 
> > I don't think we would want to expose write access to the unauthenticated
> > logstash and elasticsearch system to external systems. The thing that makes
> > this secure today is we (community infrastructure team) control the existing
> > writers. The existing writers are available for your use (see below) should
> > you decide to use them.
> 
> My theory was we'd secure the connection at least using the logstash/
> beat SSL connection, and only we/the infra group would have access to 
> the keys:
> https://www.elastic.co/guide/en/beats/filebeat/current/configuring-ssl-logstash.html
> 
> The machines themselves are only accessible by the CNCF CIL owners and 
> nominated Kata engineers with the keys.
> > 
> > >
> > > One crux here is that the metrics have to run on a machine with
> > > guaranteed performance (so not a shared/virtual cloud instance), and
> > > hence currently run under Jenkins and not on the OSF/Zuul CI infra.
> > 
> > Zuul (by way of Nodepool) can speak to arbitrary machines as long as they
> > speak an ansible connection protocol. In this case the default of ssh would
> > probably work when tied to nodepool's static instance driver. The community
> > infrastructure happens to only talk to cloud VMs today because that is what
> > we have been given access to, but should be able to talk to other resources
> > if people show up with them.
> 
> If we ignore the fact that all current Kata CI is running on Jenkins, 
> and we are not presently transitioning to Zuul afaik, then....
> Even if we did integrate the bare metal CNCF CIL packet.net machines via 
> ansible/SSH/nodepool/Zuul, then afaict you'd still be running the same 
> CI tasks on the same machines and injecting the Elastic data through the 
> same SSL socket/tunnel into Elastic.

No, we would inject the data through the existing test node -> Zuul -> Logstash 
-> Elasticsearch path.
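
For illustration only, the job-side half of that path could be as small as a step that writes each metrics result out as one JSON line for the existing log processing to index. The sketch below is an assumption, not a description of the current infra tooling: the function name, the field names, and the sample values are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_metrics_event(test_name, results, node, when=None):
    """Wrap one metrics run in a single-line JSON document.

    All field names here are hypothetical; the real index mapping
    would be agreed with whoever maintains the Logstash config.
    """
    when = when or datetime.now(timezone.utc)
    event = {
        "@timestamp": when.isoformat(),
        "project": "kata-containers",  # hypothetical tag for filtering
        "test": test_name,
        "node": node,
        "results": results,
    }
    # json.dumps emits no raw newlines by default, so each event is
    # exactly one line of newline-delimited JSON.
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample values, not real measurements.
    print(build_metrics_event("boot-times", {"mean_ms": 512.5}, "metrics-node-1"))
```

Keeping the event a flat, single-line document means the same output could be consumed from a job log file or from any other line-oriented input without changing the producer.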

> I know you'd like to keep as much of the infra under your control, but 
> the only bit I think that would be different is the Jenkins Master. 
> Given the Jenkins job running the slave only executes master branch 
> merges, which have undergone peer review (which would be the same jobs 
> that Zuul would run), then I'm not sure there is any security difference 
> here in reality between having the Kata Jenkins master or Zuul drive the 
> slaves.

There is more to it than that. This service is part of the CI system we 
operate. The way you consume it is through the use of Zuul jobs. If you want to 
inject data into our Logstash/Elasticsearch system you do that by configuring 
your jobs in Zuul to do so. We are not in the business of operating one-off 
solutions to problems. We support a large variety of users and projects, and 
using generic, flexible systems like this one is how we make that viable.

Additionally these systems are community managed so that we can work together 
to solve these problems in a way that gives the infra team appropriate 
administrative access while still allowing you and others to get specific work 
done. Rather than avoid this tooling, can we please attempt to use it when it 
has preexisting solutions to problems like this? We will happily do our best to 
make re-consumption of existing systems a success, but building one-off 
solutions to problems that are already solved does not scale.

> 
> > 
> > >
> > > Let me know if you see any issues with that Jenkins/filebeat/socket/JSON 
> > > flow.
> > >
> > > I need to deploy a new machine to process master branch merges to
> > > generate the data (currently we have a machine that is processing PRs at
> > > submission, not merge, which is not the data we want to track long
> > > term). I'll let you know when I have that up and running. If we wanted
> > > to move on this earlier, then I could inject data to a test index from
> > > my local test setup - all it would need I believe is the valid keys for
> > > the filebeat->logstash connection.
> 
> Oh, I've deployed a Jenkins slave and job to test out the first stage of 
> the flow btw:
> http://jenkins.katacontainers.io/job/kata-metrics-runtime-ubuntu-16-04-master/
> 
> > >
> > > > Clark
> > > Thanks!
> > >   Graham (now on copy ;-)
> > 
> > Ideally we'd make use of the existing community infrastructure as much as
> > possible to make this sustainable and secure. We are happy to modify our
> > existing tooling as necessary to do this. Update the logstash
> > configuration, add Nodepool resources, have grafana talk to elasticsearch,
> > and so on.
> 
> I think the only key decision is if we can use the packet.net slaves as 
> driven by the kata Jenkins master, or if we have to move the management 
> of those into Zuul.
> For expediency and consistency with the rest of the Kata CI, obviously I 
> lean heavily towards Jenkins.
> If we do have to go with Zuul, then I think we'll have to work out who 
> has access to and how they can modify the Zuul job configs for Kata.

I wasn't directly involved with the decision making at the time, but my 
understanding is that, back at the beginning of the year, Jenkins was chosen 
over Zuul for expediency. This wasn't a bad choice, as the Github support in 
Zuul was still quite new (though having more users would likely have pushed it 
along more quickly). It would probably be worthwhile to decide separately 
whether Jenkins is the permanent solution to the Kata CI tooling problem, or 
whether we should continue to push for Zuul. If we want to push for Zuul, then 
I think we need to stop choosing Jenkins as a default and start implementing 
new stuff in Zuul, then move the existing CI as Kata is able.

As for who has Zuul access, the Infra team has administrative access to the 
service. Zuul configuration for the existing Kata jobs is done through a repo 
managed by the infra team, but anyone can push and propose changes to this 
repo. The reason for this is that Zuul wants to gate its config updates to 
prevent new configs from being merged without being tested. Bypassing this 
testing does allow you to break your Zuul configuration. Currently we aren't 
gating Kata with Zuul, so the configs live in the Infra repo. If we started 
gating Kata changes with Zuul, we could move the configs into Kata repos and 
Kata could self-manage them.

Looking ahead, Zuul is multitenant-aware, and we could deploy a Kata tenant. 
This would give Kata a bit more freedom to configure its Zuul pipeline behavior 
as desired, though gating is still strongly recommended as that will prevent 
broken configs from merging.

> 
> (adding Salvador to CC, as he is the Kata Jenkins owner mostly, and has 
> also worked on the Zuul PoC for Kata before).
> 
>  Graham (hoping we can come to some agreement :-) )

_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
