Hi, thanks for the suggestion. I think it's reasonable to include a small configuration for fast testing. I'll add such a config to the PR.
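
To illustrate the named-configuration idea from the proposal quoted below, here is a minimal Python sketch. It is not the code from the PR: only the name "XML_1GB" comes from the thread, while `DatasetConfig`, `resolve_config`, `throughput_bytes_per_second`, the "XML_SMALL" entry, and every number and hash are hypothetical placeholders. It shows how one string option could select the record count, dataset size and expected hash together, and how throughput could be derived afterwards from the reported dataset size and the measured run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical named test configuration; all values are illustrative."""
    number_of_records: int
    dataset_size_bytes: int
    expected_hash: str  # placeholder, not a real precalculated hash


# Hypothetical registry. Only the name "XML_1GB" appears in the thread;
# "XML_SMALL" and every number and hash below are made up for illustration.
TEST_CONFIGS = {
    "XML_SMALL": DatasetConfig(1_000, 100_000, "placeholder-hash-small"),
    "XML_1GB": DatasetConfig(10_000_000, 1_000_000_000, "placeholder-hash-1gb"),
}


def resolve_config(name: str) -> DatasetConfig:
    """Look up a configuration by the value passed as testConfigName."""
    try:
        return TEST_CONFIGS[name]
    except KeyError:
        known = ", ".join(sorted(TEST_CONFIGS))
        raise ValueError(f"Unknown testConfigName {name!r}; known: {known}") from None


def throughput_bytes_per_second(config: DatasetConfig, runtime_seconds: float) -> float:
    """Derive throughput from the reported dataset size and the run time."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return config.dataset_size_bytes / runtime_seconds


if __name__ == "__main__":
    config = resolve_config("XML_1GB")
    print(throughput_bytes_per_second(config, runtime_seconds=100.0))
```

Because the size comes from the configuration, the test itself would not need to inspect each element to measure throughput; a dashboard could compute it from the two reported numbers.
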
Have a good day,
Michal

On Wed, Oct 9, 2019 at 5:05 AM Chamikara Jayalath <[email protected]> wrote:

>
>
> On Tue, Oct 8, 2019 at 6:52 AM Michał Walenia <[email protected]>
> wrote:
>
>> Hi all,
>> I'm working on resizing IO integration tests in Beam and I'd like to ask
>> for the community's opinion.
>>
>> Right now each IO integration test has a set of four predetermined sizes
>> (1000, 100k, 1M and 100M elements).
>> For every size there is a precalculated hash for read correctness
>> checking.
>> As it is now, measuring throughput in an IOIT is very costly: accessing
>> memory for each PCollection element increases the runtime of the test
>> manifold, which skews the runtime measurements.
>>
>> My proposed improvements change the test sizes, add dataset size
>> reporting to metrics (throughput will be possible to calculate at dashboard
>> level) and change the way test parameters are passed.
>> The changes are in a PR here <https://github.com/apache/beam/pull/9638>.
>> Tests were resized to about 1GB each.
>> Test configurations would be set by one string parameter in pipeline
>> options (e.g. "testConfigName=XML_1GB" instead of
>> "numberOfRecords=1000000").
>>
>> What in general do you think about this approach? Do you think that 1GB
>> test datasets are reasonable?
>> Thanks,
>>
>
> Thanks Michal. I think these tests fulfil two purposes currently.
> (1) As end-to-end integration tests that confirm that connectors work with
> a given runner.
> (2) As large-scale performance tests for tracking performance and
> triggering alerts.
>
> It might be good to separate out these two cases and run two integration
> tests for each connector. For example,
> (1) A version with a small input (say 1KB - 1MB) that we run often,
> potentially with every run of the post-commit test suite.
> (2) A version with a large input (say 10-100 GB, depending on the
> connector) that is used for performance tracking and triggering alerts.
> This version should be run less frequently (for example, once a day).
>
> WDYT?
>
> Thanks,
> Cham
>
>
>>
>> Michal
>>
>> --
>>
>> Michał Walenia
>> Polidea <https://www.polidea.com/> | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: [email protected]
>>
>> Unique Tech
>> Check out our projects! <https://www.polidea.com/our-work>
>>
>

--

Michał Walenia
Polidea <https://www.polidea.com/> | Software Engineer

M: +48 791 432 002 <+48791432002>
E: [email protected]

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>
