Re: [Avocado-devel] RFC: Multi-host tests
Lukáš,

This RFC has already had a lot of strong points raised, and it's now a bit hard to follow the proposals and general direction. I believe it's time for a v2. What do you think?

Thanks,
- Cleber.

On 03/30/2016 11:54 AM, Lukáš Doktor wrote:
On 30.3.2016 at 16:52, Lukáš Doktor wrote:
On 30.3.2016 at 15:52, Cleber Rosa wrote:
On 03/30/2016 09:31 AM, Lukáš Doktor wrote:
On 29.3.2016 at 20:25, Cleber Rosa wrote:
On 03/29/2016 04:11 AM, Lukáš Doktor wrote:
On 28.3.2016 at 21:49, Cleber Rosa wrote:

- Original Message -
From: "Cleber Rosa"
To: "Lukáš Doktor"
Cc: "Amador Pahim", "avocado-devel", "Ademar Reis"
Sent: Monday, March 28, 2016 4:44:15 PM
Subject: Re: [Avocado-devel] RFC: Multi-host tests

- Original Message -
From: "Lukáš Doktor"
To: "Ademar Reis", "Cleber Rosa", "Amador Pahim", "Lucas Meneghel Rodrigues", "avocado-devel"
Sent: Saturday, March 26, 2016 4:01:15 PM
Subject: RFC: Multi-host tests

Hello guys,

Let's open a discussion regarding the multi-host tests for Avocado.

The problem
===========

A user wants to run netperf on two machines. To do it manually, he does:

    machine1: netserver -D
    machine1: # wait till netserver is initialized
    machine2: netperf -H $machine1 -l 60
    machine2: # wait till it finishes and store/report the results
    machine1: # stop the netserver and report possible failures

Now, how do we support this in Avocado, ideally as custom tests, and ideally even with broken connections/reboots?

Super tests
===========

We don't need to do anything and can leave everything to the user. He is free to write code like:

    ...
    machine1 = aexpect.ShellSession("ssh $machine1")
    machine2 = aexpect.ShellSession("ssh $machine2")
    machine1.sendline("netserver -D")
    # wait till the netserver starts
    machine1.read_until_any_line_matches(["Starting netserver"], 60)
    output = machine2.cmd_output("netperf -H $machine1 -l $duration")
    # interrupt the netserver
    machine1.sendline("\03")
    # verify netserver finished
    machine1.cmd("true")
    ...
The problem is that it requires an active connection, and the user needs to manually handle the results.

And of course the biggest problem here is that it doesn't solve the Avocado problem: providing a framework and tools for tests that span multiple (Avocado) execution threads, possibly on multiple hosts.

Well, it does: each "ShellSession" is a new parallel process. The only problem I have with this design is that it does not allow easy code reuse, and the results strictly depend on the test writer.

Yes, *aexpect* allows parallel execution in an asynchronous fashion, but it is not targeted at tests *at all*. Avocado, as a test framework, should deliver more. Repeating the previous wording, it should be "providing a framework and tools for tests that span multiple (Avocado) execution threads, possibly on multiple hosts."

That was actually my point. You can implement multi-host tests that way, but you can't share the tests (you can only include some shared pieces from libraries).

Right, then it's not related to Avocado, just an example of how a test writer could do it (painfully) today.

Triggered simple tests
======================

Alternatively, we can say each machine/worker is nothing but yet another test, which occasionally needs synchronization or data exchange. The same example would look like this:

machine1.py:

    process.run("netserver")
    barrier("server-started", 2)
    barrier("test-finished", 2)
    process.run("killall netserver")

machine2.py:

    barrier("server-started", 2)
    self.log.debug(process.run("netperf -H %s -l 60" % params.get("server_ip")))
    barrier("test-finished", 2)

where "barrier(name, no_clients)" is a framework function which makes the process wait till the specified number of processes are waiting on the same barrier.

The barrier mechanism looks like an appropriate and useful utility for the example given. Even though your use case example explicitly requires it, it's worth pointing out and keeping in mind that there may be valid use cases which won't require any kind of synchronization.
This may even be true of the execution of tests that spawn multiple *local* "Avocado runs".

Absolutely, this would actually allow Julio to run his "Parallel (clustered) testing".

So, let's try to identify what we're really looking for. For both the use case I mentioned and Julio's "Parallel (clustered) testing", we need a (the same) test run by multiple *runners*. A runner in this context is something that implements the `TestRunner` interface, such as the `RemoteTestRunner`:
https://github.com/avocado-framework/avocado/blob/master/avocado/core/remote/runner.py#L37

The following (pseudo) Avocado Test could be written:

    from avocado import Test

    # These are currently private APIs that could/should be exposed
    # under another level. Also, the current API is very different
    # from what is used here, please take it as pseudo