[ https://issues.apache.org/jira/browse/FLINK-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-11463:
-----------------------------------
    Labels: auto-deprioritized-major auto-deprioritized-minor auto-unassigned pull-request-available  (was: auto-deprioritized-major auto-unassigned pull-request-available stale-minor)
  Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Rework end-to-end tests in Java
> -------------------------------
>
>                 Key: FLINK-11463
>                 URL: https://issues.apache.org/jira/browse/FLINK-11463
>             Project: Flink
>          Issue Type: New Feature
>          Components: Test Infrastructure
>            Reporter: Chesnay Schepler
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, auto-unassigned, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is the (long-term) umbrella issue for reworking our end-to-end tests in Java on top of a new set of utilities.
> Below are some areas where problems have been identified that I want to address with a prototype soon. This prototype primarily aims to introduce certain patterns to be built upon in the future.
> h2. Environments
> h4. Problem
> Our current tests directly work against flink-dist and set up local clusters with/without HA. Similar issues apply to Kafka and ElasticSearch.
> This prevents us from re-using tests for other environments (Yarn, Docker) and distributed settings.
> We also frequently have issues with cleaning up resources, as this is the responsibility of the test itself.
> h4. Proposal
> Introduce a common interface for a given resource type (i.e. Flink, Kafka) that tests will work against.
> These resources should be implemented as jUnit external resources to allow reasonable life-cycle management.
> Tests get access to an instance of this resource through a factory method.
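The jUnit external-resource idea above can be sketched as a plain before/after life-cycle. All class names below are hypothetical stand-ins for illustration; a real implementation would extend jUnit's {{ExternalResource}} rather than this substitute:

```java
// Sketch of the proposed life-cycle management. In the actual proposal a
// resource would extend jUnit's ExternalResource; this stand-in mirrors its
// before/after contract. All names here are hypothetical, not Flink classes.
abstract class TestResource {
    protected abstract void before();

    protected abstract void after();

    // Runs a test body between setup and guaranteed teardown, so cleanup is
    // no longer the responsibility of the test itself.
    final void apply(Runnable test) {
        before();
        try {
            test.run();
        } finally {
            after();
        }
    }
}

class LocalFlinkClusterResource extends TestResource {
    boolean clusterRunning;

    @Override
    protected void before() {
        clusterRunning = true; // a real resource would start a local cluster here
    }

    @Override
    protected void after() {
        clusterRunning = false; // teardown runs even if the test body fails
    }
}
```

Because teardown sits in a {{finally}} block of the resource rather than in each test, a failing or forgotten test can no longer leak clusters.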
> Each resource implementation has a dedicated factory that is loaded with a {{ServiceLoader}}. Factories evaluate system properties to determine whether the implementation should be loaded, and then optionally configure the resource.
> Example:
> {code}
> public interface FlinkResource {
>     // ... common methods ...
>
>     /**
>      * Returns the configured FlinkResource implementation, or a {@link LocalStandaloneFlinkResource} if none is configured.
>      *
>      * @return configured FlinkResource, or {@link LocalStandaloneFlinkResource} if none is configured
>      */
>     static FlinkResource get() {
>         // load factories
>         // evaluate system properties
>         // return instance
>     }
> }
>
> public interface FlinkResourceFactory {
>
>     /**
>      * Returns a {@link FlinkResource} instance. If the instance could not be instantiated (for example, because a
>      * mandatory parameter was missing), then an empty {@link Optional} should be returned.
>      *
>      * @return FlinkResource instance, or an empty Optional if the instance could not be instantiated
>      */
>     Optional<FlinkResource> create();
> }
> {code}
> As an example, running {{mvn verify -De2e.flink.mode=localStandalone}} could load a FlinkResource that sets up a local standalone cluster, while for {{mvn verify -De2e.flink.mode=distributedStandalone -De2e.flink.hosts=...}} it would connect to the given hosts and set up a distributed cluster.
> Tests are not _required_ to work against the common interface, and may be hard-wired to run against specific implementations. Simply put, the resource implementations should be public.
> h4. Future considerations
> The factory method may be extended to allow tests to specify a set of conditions that must be fulfilled, for example that HA is enabled. If this requirement cannot be fulfilled, the test should be skipped.
> h2. Split Management
> h4. Problem
> End-to-end tests are run in separate {{cron-<version>-e2e}} branches.
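The factory lookup the proposal describes could be sketched as follows. All names ({{FlinkResource}}, {{FlinkResourceFactory}}, {{LocalStandaloneFlinkResource}}) are assumptions taken from the proposal text, not existing Flink utilities, and the fallback behavior shown is only one possible design:

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical sketch of the ServiceLoader-based lookup described above.
interface FlinkResource {
    void startCluster();
}

interface FlinkResourceFactory {
    // Returns empty if a mandatory parameter (e.g. a system property) is missing.
    Optional<FlinkResource> create();
}

class LocalStandaloneFlinkResource implements FlinkResource {
    @Override
    public void startCluster() {
        // would set up a local standalone cluster from flink-dist
    }
}

final class FlinkResources {
    // Asks each registered factory in turn; factories inspect system
    // properties and decline (return empty) if they don't apply. Falls back
    // to the local standalone implementation when nothing is configured.
    static FlinkResource get() {
        for (FlinkResourceFactory factory : ServiceLoader.load(FlinkResourceFactory.class)) {
            Optional<FlinkResource> resource = factory.create();
            if (resource.isPresent()) {
                return resource.get();
            }
        }
        return new LocalStandaloneFlinkResource();
    }
}
```

With no factory registered under {{META-INF/services}}, {{FlinkResources.get()}} simply returns the local standalone fallback, which matches the default behavior the Javadoc in the proposal describes.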
> To accommodate the Travis time limits we run a total of 6 jobs, each covering a subset of the tests.
> These so-called splits are currently managed in the respective branches, and not on master/release branches.
> This is a rather hidden detail that not everyone is aware of, nor is it easily discoverable. This has several times resulted in newly added tests not actually being run. Furthermore, if the arguments for tests are modified, these changes have to be replicated to each branch.
> h4. Proposal
> Use jUnit Categories to assign each test explicitly to one of the Travis jobs.
> {code}
> @Category(TravisGroup1.class)
> public class MyTestRunningInTheFirstJob {
>     ...
> }
> {code}
> It's a bit on the nose but a rather simple solution.
> A given group of tests could be executed by running {{mvn verify -Dcategories="org.apache.flink.tests.util.TravisGroup1"}}.
> All tests can be executed by running {{mvn verify}} without specifying any category.
> h4. Future considerations
> Tests may furthermore be categorized based on what they are testing (e.g. "Metrics", "Checkpointing", "Kafka") to allow running a certain subset of tests quickly.
> h2. Caching of downloaded artifacts
> h4. Problem
> Several tests download archives for setting up systems, like Kafka or Elasticsearch. We currently do not cache downloads in any way, resulting in less stable tests (as mirrors aren't always available) and overall increased test duration (since the downloads are at times quite slow). The duration issue becomes especially apparent when running tests in a loop for debugging or release-testing purposes.
> Finally, it also puts unnecessary strain on the download mirrors.
> h4. Proposal
> Add a {{DownloadCache}} interface with a single {{Path getOrDownload(String url, Path targetDir)}} method.
> Access to and loading of implementations are handled like resources (see above).
> The caching behavior is implementation-dependent.
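One possible shape for the {{DownloadCache}} contract above, sketched with a simple directory-backed cache. The interface signature comes from the proposal; the {{DirectoryDownloadCache}} class, its URL-hash cache key, and the unchecked-exception wrapping are illustrative assumptions, and the download itself is simulated:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Interface as described in the proposal (checked IOException wrapped here
// for convenience; the real design may differ).
interface DownloadCache {
    Path getOrDownload(String url, Path targetDir);
}

// Hypothetical implementation: caches each URL under a hash-derived file name
// in a user-provided directory, and only "downloads" on a cache miss.
class DirectoryDownloadCache implements DownloadCache {
    private final Path cacheDir;

    DirectoryDownloadCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    @Override
    public Path getOrDownload(String url, Path targetDir) {
        try {
            Path cached = cacheDir.resolve(Integer.toHexString(url.hashCode()));
            if (!Files.exists(cached)) {
                // A real implementation would stream the URL contents here;
                // this sketch just writes placeholder bytes.
                Files.write(cached, url.getBytes());
            }
            // Hand the test a copy so it can't corrupt the cached artifact.
            return Files.copy(cached, targetDir.resolve(fileName(url)),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String fileName(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }
}
```

A time-to-live, as mentioned below, could then be added by comparing the cached file's last-modified timestamp against a configured maximum age before reusing it.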
> A reasonable implementation should allow files to be cached in a user-provided directory, with an optional time-to-live for long-term setups.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)