On Tue, Oct 8, 2019 at 6:52 AM Michał Walenia wrote:
> Hi all,
> I'm working on resizing IO integration tests in Beam and I'd like to ask
> for the community's opinion.
>
> Right now each IO integration test has a set of four predetermined sizes
> (1000, 100k, 1M and 100M elements).
> For every size there is a precalculated hash for read correctness
> checking.
> As it is now, measuring throughput in an IOIT is very costly: accessing
> memory for each PCollection element increases the runtime of the test
> manyfold, which changes the runtime measurements.
>
> My proposed improvements change the test sizes, add dataset size reporting
> to metrics (throughput will be possible to calculate at dashboard level)
> and change the way test parameters are passed.
> The changes are in a PR here <https://github.com/apache/beam/pull/9638>.
> Tests were resized to about 1GB each.
> Test configurations would be set by one string parameter in pipeline
> options (e.g. "testConfigName=XML_1GB" instead of
> "numberOfRecords=1000000").
>
> What in general do you think about this approach? Do you think that 1GB
> test datasets are reasonable?
> Thanks,
>

Thanks Michal.

I think these tests currently fulfil two purposes:

(1) End-to-end integration tests that confirm that connectors work with a given runner.
(2) Large-scale performance tests for tracking performance and triggering alerts.

It might be good to separate out these two cases and run two integration tests for each connector. For example,

(1) A version with a small input (say 1KB - 1MB) that we run often, potentially with every run of the post-commit test suite.
(2) A version with a large input (say 10-100 GB, depending on the connector) that is used for performance tracking and triggering alerts. This version should be run less frequently (for example, once a day).

WDYT?

Thanks,
Cham

> Michal
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
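For illustration, the named-configuration idea and the dashboard-level throughput calculation discussed above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the code in the PR: `IoTestConfig`, `TEST_CONFIGS`, `resolve_config` and `throughput_bytes_per_second` are hypothetical names, and the record count and byte size attached to `XML_1GB` are placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoTestConfig:
    """Hypothetical named configuration for one IO integration test."""
    number_of_records: int
    dataset_size_bytes: int


# Placeholder values for illustration only; the real numbers live in the PR.
TEST_CONFIGS = {
    "XML_1GB": IoTestConfig(number_of_records=1_000_000,
                            dataset_size_bytes=10**9),
}


def resolve_config(name: str) -> IoTestConfig:
    """Look up a configuration by the single string passed as an option."""
    try:
        return TEST_CONFIGS[name]
    except KeyError:
        raise ValueError(f"Unknown test config: {name!r}") from None


def throughput_bytes_per_second(dataset_size_bytes: int,
                                runtime_seconds: float) -> float:
    """Derive throughput from the reported dataset size and the runtime."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return dataset_size_bytes / runtime_seconds


if __name__ == "__main__":
    config = resolve_config("XML_1GB")
    # A 50-second run is an invented figure, used only to show the arithmetic.
    print(throughput_bytes_per_second(config.dataset_size_bytes, 50.0))
```

Because the dataset size is reported once per run rather than measured per element, the throughput figure can be computed after the fact without adding work inside the pipeline.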
