stevedlawrence opened a new pull request, #865:
URL: https://github.com/apache/daffodil/pull/865
> Note: I got really tired of the really long CI times, so decided to
attempt to try speed up our integration tests (which are the biggest offenders)
and tackle DAFFODIL-2381 at the same time. As a proof on concept, I've only
updated the two largest integration test suites: `TestCLIparsing` and
`TestCLIdebugger`. With these changes, these suites went from 250 seconds and
165 seconds respectively, to 17 seconds and 6 seconds. So from almost 7 minutes
to less than 30 seconds. I think this would be a big win if we updated the rest
of the integration tests, but I wanted to get some thoughts on this new API and
if the are any thoughts or alternative approaches before doing the rest of
them, hence why this is just an RFC.
The current integration tests have too much boilerplate that can be
refactored with some new APIs. Additionally, the integration tests are
incredibly slow due to all the forking and creation of new VMs. This fixes both
of these issues.
The new integration test API creates a new `runCLI` function. The first
parameter is an array of CLI string arguments (with a new "args" string
interpolator to make specifying this array easier). The second argument is a
function that contains the normal `expect` logic all our of integration tests
already use. The third argument is the expected exit code.
A new `path` function is created to help work with paths in a OS agnostic
manner.
A new `withTempFile` function is added to help with creating temporary files
in a special temporary daffodil directory, and to delete the files when the
test is complete.
With these new functions, integration tests now look something like this:
```scala
val schema = path("path/to/schema.dfdl.xsd")
runCLI(args"parse -s $schema") { cli =>
cli.sendInput("data to parse", inputDone = true)
cli.expect("<data>string to expect</data")
} (ExitCode.Success)
```
The runCLI function creates new streams for capturing stdin/out/err. A new
thread is run that calls the org.apache.daffodil.Main run function, and
configures Main and log4j to output to these streams. A new expect instance is
also configured so that it can read/write to these streams and validate the
output, with a new CLITest class used to provide helper wrappers to make expect
logic a bit less verbose.
Multiple changes were also made to Main and the Debugger to ensure they
always use the provided streams. This is really only needed for this
integration testing, but could potentially be useful in other use cases where
stdin/out/err need need to be mimicked without forking a new process.
Note that this approach isn't thread safe, but integration tests are already
not run in parallel. By using threads instead of spawning new processes, we
drastically decrease the overhead, with test suites now taking a number of
seconds instead of a number of minutes, averaging an order of magnitude speed
up.
This also ensures coverage is enabled in the github workflow every time we
execute sbt. Otherwise, changes of coverage enablement causes a recompile. By
doing this we avoid recompiling the same files multiple times which saves a
couple of minutes overall. Note that we originally disabled coverage on some
steps because it caused failures, but whatever caused that seems to have been
fixed.
DAFFODIL-2381
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]