Re: Resizing Beam IOITs

Michał Walenia Fri, 11 Oct 2019 03:07:34 -0700

Hi,
thanks for the suggestion. I think it's reasonable to include a small
configuration for fast testing. I'll add such a config to the PR.


Have a good day,
Michal

On Wed, Oct 9, 2019 at 5:05 AM Chamikara Jayalath <[email protected]>
wrote:

>
>
> On Tue, Oct 8, 2019 at 6:52 AM Michał Walenia <[email protected]>
> wrote:
>
>> Hi all,
>> I'm working on resizing IO integration tests in Beam and I'd like to ask
>> for the community's opinion.
>>
>> Right now each IO integration test has a set of four predetermined sizes
>> (1000, 100k, 1M and 100M elements).
>> For every size there is a precalculated hash for read correctness
>> checking.
>> As it is now, measuring throughput in an IOIT is very costly - accessing
>> memory for each PCollection element increases the runtime of the test
>> manyfold, which distorts the runtime measurements.
>>
>> My proposed improvements change the test sizes, add dataset size
>> reporting to metrics (so that throughput can be calculated at the
>> dashboard level) and change the way test parameters are passed.
>> The changes are in a PR here <https://github.com/apache/beam/pull/9638>.
>> Tests were resized to about 1GB each.
>> Test configurations would be set by one string parameter in pipeline
>> options (e.g. "testConfigName=XML_1GB" instead of
>> "numberOfRecords=1000000").
>>
>> What in general do you think about this approach? Do you think that 1GB
>> test datasets are reasonable?
>> Thanks,
>>
>
> Thanks Michal. I think these tests fulfil two purposes currently.
> (1) As end-to-end integration tests that confirm that connectors work with
> a given runner.
> (2) As large-scale performance tests for tracking performance and
> triggering alerts.
>
> It might be good to separate out these two cases and run two integration
> tests for each connector. For example,
> (1) Version with a small input (say 1KB - 1MB) that we run often,
> potentially with every run of post-commit test suite.
> (2) A version with a large input (say 10-100 GB, depending on the
> connector) that is used for performance tracking and triggering alerts.
> This version should be run less frequently (for example, once a day).
>
> WDYT ?
>
> Thanks,
> Cham
>
>
>>
>> Michal
>>
>> --
>>
>> Michał Walenia
>> Polidea <https://www.polidea.com/> | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: [email protected]
>>
>> Unique Tech
>> Check out our projects! <https://www.polidea.com/our-work>
>>
>

-- 

Michał Walenia
Polidea <https://www.polidea.com/> | Software Engineer

M: +48 791 432 002 <+48791432002>
E: [email protected]

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>
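
The named-configuration idea discussed in this thread (a single
"testConfigName" value such as XML_1GB, plus a small configuration for fast
testing) could look roughly like the sketch below. It is only an
illustration, not the code from the PR: the IoitConfig class, the XML_SMALL
name, the record counts, the byte sizes and both helper functions are
assumptions made up for this example. The last function shows how a
dashboard might derive throughput from a reported dataset size and a
measured runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoitConfig:
    """One named dataset configuration (hypothetical; values are illustrative)."""
    number_of_records: int
    dataset_size_bytes: int


# Hypothetical registry. Only the name "XML_1GB" appears in the thread;
# "XML_SMALL" and every number below are assumptions for this sketch.
TEST_CONFIGS = {
    "XML_SMALL": IoitConfig(number_of_records=1_000,
                            dataset_size_bytes=100_000),
    "XML_1GB": IoitConfig(number_of_records=10_000_000,
                          dataset_size_bytes=1_000_000_000),
}


def resolve_config(test_config_name: str) -> IoitConfig:
    """Look up a configuration by name, failing loudly on unknown names."""
    try:
        return TEST_CONFIGS[test_config_name]
    except KeyError:
        known = ", ".join(sorted(TEST_CONFIGS))
        raise ValueError(
            f"Unknown testConfigName {test_config_name!r}; known: {known}"
        ) from None


def throughput_bytes_per_second(config: IoitConfig,
                                runtime_seconds: float) -> float:
    """Derive throughput from the reported dataset size and measured runtime."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return config.dataset_size_bytes / runtime_seconds
```

With a registry like this, a frequent post-commit run could pass the small
configuration name and a daily performance run the large one, without
changing any other test parameters.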
