[ 
https://issues.apache.org/jira/browse/FLINK-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11463:
-----------------------------------
    Labels: auto-deprioritized-major auto-unassigned pull-request-available 
stale-minor  (was: auto-deprioritized-major auto-unassigned 
pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated 
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is 
still Minor, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Rework end-to-end tests in Java
> -------------------------------
>
>                 Key: FLINK-11463
>                 URL: https://issues.apache.org/jira/browse/FLINK-11463
>             Project: Flink
>          Issue Type: New Feature
>          Components: Test Infrastructure
>            Reporter: Chesnay Schepler
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned, 
> pull-request-available, stale-minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is the (long-term) umbrella issue for reworking our end-to-end tests in Java 
> on top of a new set of utilities.
> Below are some areas where problems have been identified that I want to 
> address with a prototype soon. This prototype primarily aims to introduce 
> certain patterns to be built upon in the future.
> h2. Environments
> h4. Problem
> Our current tests work directly against flink-dist and set up local clusters 
> with or without HA. Similar issues apply to Kafka and Elasticsearch.
> This prevents us from re-using tests for other environments (Yarn, Docker) 
> and distributed settings.
> We also frequently have issues with cleaning up resources as it is the 
> responsibility of the test itself.
> h4. Proposal
> Introduce a common interface for a given resource type (e.g. Flink, Kafka) 
> that tests will work against.
> These resources should be implemented as JUnit external resources to allow 
> reasonable life-cycle management.
> Tests get access to an instance of this resource through a factory method.
> Each resource implementation has a dedicated factory that is loaded with a 
> {{ServiceLoader}}. Factories evaluate system-properties to determine whether 
> the implementation should be loaded, and then optionally configure the 
> resource.
> Example:
> {code}
> public interface FlinkResource {
>       // ... common methods ...
>
>       /**
>        * Returns the configured FlinkResource implementation, or a
>        * {@link LocalStandaloneFlinkResource} if none is configured.
>        *
>        * @return configured FlinkResource, or {@link LocalStandaloneFlinkResource}
>        *         if none is configured
>        */
>       static FlinkResource get() {
>               // load factories
>               // evaluate system properties
>               // return instance
>       }
> }
>
> public interface FlinkResourceFactory {
>       /**
>        * Returns a {@link FlinkResource} instance. If the instance could not
>        * be instantiated (for example, because a mandatory parameter was
>        * missing), then an empty {@link Optional} should be returned.
>        *
>        * @return FlinkResource instance, or an empty Optional if the instance
>        *         could not be instantiated
>        */
>       Optional<FlinkResource> create();
> }
> {code}
> As an example, running {{mvn verify -De2e.flink.mode=localStandalone}} could 
> load a FlinkResource that sets up a local standalone cluster, while for {{mvn 
> verify -De2e.flink.mode=distributedStandalone -De2e.flink.hosts=...}} it 
> would connect to the given hosts and set up a distributed cluster.
> Tests are not _required_ to work against the common interface, and may be 
> hard-wired to run against specific implementations. Simply put, the resource 
> implementations should be public.
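
The factory lookup proposed above could be sketched roughly as follows. This is only an illustration under assumptions, not the prototype's code: the factories are passed in explicitly (in the proposal they would come from a {{ServiceLoader}}), the class names LocalStandaloneFactory and DistributedStandaloneFactory and the describe() method are hypothetical, and only the property names e2e.flink.mode and e2e.flink.hosts are taken from the description.

```java
import java.util.Optional;

public class FlinkResourceSketch {

    /** Stand-in for the common interface; describe() is a hypothetical placeholder method. */
    public interface FlinkResource {
        String describe();
    }

    public interface FlinkResourceFactory {
        /** Returns a resource, or an empty Optional if this factory is not configured. */
        Optional<FlinkResource> create();
    }

    /** Hypothetical factory: applies only if e2e.flink.mode=localStandalone. */
    public static final class LocalStandaloneFactory implements FlinkResourceFactory {
        @Override
        public Optional<FlinkResource> create() {
            if (!"localStandalone".equals(System.getProperty("e2e.flink.mode"))) {
                return Optional.empty();
            }
            return Optional.of(() -> "local standalone cluster");
        }
    }

    /** Hypothetical factory: needs e2e.flink.mode=distributedStandalone and e2e.flink.hosts. */
    public static final class DistributedStandaloneFactory implements FlinkResourceFactory {
        @Override
        public Optional<FlinkResource> create() {
            String hosts = System.getProperty("e2e.flink.hosts");
            if (!"distributedStandalone".equals(System.getProperty("e2e.flink.mode"))
                    || hosts == null) {
                // mandatory parameter missing or other mode selected
                return Optional.empty();
            }
            return Optional.of(() -> "distributed cluster on " + hosts);
        }
    }

    /**
     * Returns the first resource that a factory could create, falling back to a
     * local standalone resource if none is configured.
     */
    public static FlinkResource get(Iterable<FlinkResourceFactory> factories) {
        for (FlinkResourceFactory factory : factories) {
            Optional<FlinkResource> resource = factory.create();
            if (resource.isPresent()) {
                return resource.get();
            }
        }
        return () -> "local standalone cluster";
    }
}
```

With a {{ServiceLoader}}, the call would presumably become get(ServiceLoader.load(FlinkResourceFactory.class)), since a ServiceLoader is itself an Iterable over the registered implementations.
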
> h4. Future considerations
> The factory method may be extended to allow tests to specify a set of 
> conditions that must be fulfilled, for example HA to be enabled. If this 
> requirement cannot be fulfilled the test should be skipped.
> h2. Split Management
> h4. Problem
> End-to-end tests are run in separate {{cron-<version>-e2e}} branches. To 
> accommodate the Travis time limits we run a total of 6 jobs each covering a 
> subset of the tests.
> These so-called splits are currently managed in the respective branches, and 
> not on master/release branches.
> This is a rather hidden detail that not everyone is aware of, nor is it 
> easily discoverable. This has resulted several times in newly added tests not 
> actually being run. Furthermore, if the arguments for tests are modified 
> these changes have to be replicated to each branch.
> h4. Proposal
> Use JUnit Categories to assign each test explicitly to one of the Travis jobs.
> {code}
> @Category(TravisGroup1.class)
> public class MyTestRunningInTheFirstJob {
>       ...
> }
> {code}
> It's a bit on the nose but a rather simple solution.
> A given group of tests could be executed by running {{mvn verify 
> -Dcategories="org.apache.flink.tests.util.TravisGroup1"}}.
> All tests can be executed by running {{mvn verify 
> -Dcategories=""}}.
> h4. Future considerations
> Tests may furthermore be categorized based on what they are testing (e.g. 
> "Metrics", "Checkpointing", "Kafka") to allow running a certain subset of 
> tests quickly.
> h2. Caching of downloaded artifacts
> h4. Problem
> Several tests download archives for setting up systems, like Kafka or 
> Elasticsearch. We currently do not cache downloads in any way, resulting in 
> less stable tests (as mirrors aren't always available) and overall increased 
> test duration (since the downloads at times are quite slow). The duration 
> issue becomes especially apparent when running tests in a loop for debugging 
> or release-testing purposes.
> Finally, it also puts unnecessary strain on the download mirrors.
> h4. Proposal
> Add a {{DownloadCache}} interface with a single {{Path getOrDownload(String 
> url, Path targetDir)}} method.
> Access to and loading of implementations are handled like resources (see 
> above).
> The caching behavior is implementation-dependent.
> A reasonable implementation should allow files to be cached in a 
> user-provided directory, with an optional time-to-live for long-term setups.
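
One possible shape for such a cache is sketched below. Only the {{DownloadCache}} interface and the getOrDownload signature come from the proposal; the class name DirectoryDownloadCache, the cache-key scheme and the use of the file modification time for the time-to-live are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

public class DownloadCacheSketch {

    public interface DownloadCache {
        Path getOrDownload(String url, Path targetDir);
    }

    /** Hypothetical implementation that caches files in a user-provided directory. */
    public static final class DirectoryDownloadCache implements DownloadCache {
        private final Path cacheDir;
        private final Duration timeToLive; // null means cached files never expire

        public DirectoryDownloadCache(Path cacheDir, Duration timeToLive) {
            this.cacheDir = cacheDir;
            this.timeToLive = timeToLive;
        }

        @Override
        public Path getOrDownload(String url, Path targetDir) {
            try {
                Files.createDirectories(cacheDir);
                Files.createDirectories(targetDir);
                String fileName = url.substring(url.lastIndexOf('/') + 1);
                // Derive the cache key from the full URL so that equal file names
                // from different mirrors do not collide.
                String key = UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8)).toString();
                Path cached = cacheDir.resolve(key + "-" + fileName);
                if (!Files.exists(cached) || isExpired(cached)) {
                    try (InputStream in = URI.create(url).toURL().openStream()) {
                        Files.copy(in, cached, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                // Hand out a copy so that tests cannot corrupt the cached file.
                return Files.copy(cached, targetDir.resolve(fileName),
                        StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        private boolean isExpired(Path cached) throws IOException {
            if (timeToLive == null) {
                return false;
            }
            Instant modified = Files.getLastModifiedTime(cached).toInstant();
            return modified.plus(timeToLive).isBefore(Instant.now());
        }
    }
}
```

A test would then call something like new DirectoryDownloadCache(cacheDir, Duration.ofDays(7)).getOrDownload(url, workDir) and only hit the mirror when the cached copy is missing or older than the time-to-live.
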



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
