Re: [PROPOSAL] Improve ActiveMQ 5 build stability

Matt Pavlovich Wed, 17 Mar 2021 07:56:45 -0700
+1 

> On Mar 17, 2021, at 12:00 AM, Jean-Baptiste Onofre <[email protected]> wrote:
> 
> By the way, guys, I would like to merge:
> 
> https://github.com/apache/activemq/pull/622
> 
> And then rebase the PR branches so that Jenkins is "happy".
> 
> Any objection to merging PR #622?
> 
> Thanks,
> Regards
> JB
> 
>> On 17 Mar 2021, at 05:45, Jean-Baptiste Onofre <[email protected]> wrote:
>> 
>> Hi Matt,
>> 
>> I agree.
>> 
>> I think we should run a full build not after every single merge, but after 
>> a "group" of merges. Otherwise we would run a full build after each PR merge, 
>> which is basically what we have today, and that is not practical at all.
>> 
>> That’s why, as a first step, I’m proposing to run it once a week or "on demand".
>> 
>> Regards
>> JB
>> 
>>> On 16 Mar 2021, at 22:15, Matt Pavlovich <[email protected]> wrote:
>>> 
>>> Feels like we are in a transition period. I don’t see a per-PR unit test 
>>> job being practical until the execution times come way down— and that is 
>>> going to be significant engineering effort. 
>>> 
>>> That being said, full build with full tests the day after a merged change 
>>> seems like a reasonable schedule.
>>> 
>>>> On Mar 16, 2021, at 3:24 PM, Hossack, Etienne <[email protected]> wrote:
>>>> 
>>>> Hi all, just some thoughts to share here: 
>>>> 
>>>> Ideally, I would expect a “static” build to be a nightly build that runs 
>>>> every day, but given the frequency of contributions, weekly makes sense 
>>>> (and I know it takes about 30% of the day 😅).
>>>> It could also run a snapshot version of the dependency checker to fail 
>>>> the build on CVEs or on new types of checks from that tool. 
>>>> 
>>>> And then for contributions, the PR can run the lightweight profile, but 
>>>> then master could run the full profile on merge?
>>>> 
>>>> Does that make sense?
>>>> In summary, I agree with having a static build, but I am also suggesting 
>>>> that a full build be run on contributions. Otherwise, if there are 
>>>> multiple merges in a week, or a merge lands right after the weekly build 
>>>> has run, the time-to-discovery of errors increases.
>>>> 
>>>> Cheers,
>>>> Étienne Hossack
>>>> Software Development Engineer, Amazon MQ
>>>> email: [email protected]
>>>> phone: +1-778-945-8287
>>>> 
>>>> 
>>>> 
>>>>> On Mar 15, 2021, at 10:05 PM, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> I created the PR https://github.com/apache/activemq/pull/622 for this, and 
>>>>> you can see that Jenkins is happy now. The full build took about 120 
>>>>> minutes (2 hours) on Jenkins.
>>>>> 
>>>>> Basically what I did in the PR:
>>>>> - remove activemq-unit-tests and itests (Karaf, Spring3) from the default 
>>>>> reactor
>>>>> - introduce a full.test profile that builds all modules, including the 
>>>>> unit tests and itests
>>>>> 
>>>>> The full.test profile is not used in the Jenkinsfile, meaning that a PR 
>>>>> build executes all tests except those in the activemq-unit-tests module and 
>>>>> the itests. I think that’s acceptable for PRs (and it already takes 2 hours ;)).
>>>>> I would like to introduce a "static" build on ci-builds.apache.org (not via 
>>>>> the Jenkinsfile), executed every week and doing a full build (including the 
>>>>> full.test profile).
>>>>> 
>>>>> Thoughts ?
>>>>> 
>>>>> Regards
>>>>> JB
>>>>> 
>>>>>> On 15 Mar 2021, at 08:20, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>> 
>>>>>> Hi guys,
>>>>>> 
>>>>>> I have created the following Jira issues for the tests I found "flaky" (in 
>>>>>> a full build, not necessarily in a single execution; it can also depend on 
>>>>>> the machine, which is why I tested with several Docker setups in terms of 
>>>>>> CPU and memory):
>>>>>> 
>>>>>> AMQ-8190: DuplexAdvisoryRaceTest is failing (Jonathan said he is going to 
>>>>>> take a look)
>>>>>> AMQ-8189: CachedLDAPAuthorizationModuleTest is failing
>>>>>> AMQ-8188: AMQ5266SingleDestTest is failing
>>>>>> 
>>>>>> There’s a test failure in the leveldb module, but it’s not a big deal as I 
>>>>>> have a PR ready to remove leveldb 
>>>>>> (https://github.com/apache/activemq/pull/593).
>>>>>> 
>>>>>> I’m also retesting StompNIOSSLTest; it seems much more stable thanks to 
>>>>>> Chris.
>>>>>> 
>>>>>> I also created AMQ-8191 (linked to the previous Jira issues) to clean up the 
>>>>>> profiles, introduce a fast.test profile and use it on Jenkins, and 
>>>>>> exclude the failing tests until they are fixed (re-including them at 
>>>>>> that time).
>>>>>> 
>>>>>> AMQ-8191 is almost ready, I’m testing.
>>>>>> 
>>>>>> Regards
>>>>>> JB
>>>>>> 
>>>>>>> On 14 Mar 2021, at 06:04, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi guys,
>>>>>>> 
>>>>>>> I’ve updated my local branch according to your comments:
>>>>>>> 
>>>>>>> 1. I’ve cleaned up the profiles and introduced/renamed a fast profile that 
>>>>>>> executes all unit tests in the modules but excludes activemq-unit-tests 
>>>>>>> and karaf-itests.
>>>>>>> 2. I’m keeping the smoke test profile.
>>>>>>> 3. I’ve created a tobefixed profile that includes all the flaky tests I’ve 
>>>>>>> identified.
>>>>>>> 4. I’ve updated the Jenkinsfile to use the fast profile on PRs.
>>>>>>> 
>>>>>>> I will create the PR soon.
>>>>>>> 
>>>>>>> Regards
>>>>>>> JB
>>>>>>> 
>>>>>>>> On 13 Mar 2021, at 06:05, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> We already have a "fast" profile, and it’s a good idea to use this profile 
>>>>>>>> on Jenkins by default and move some tests into it.
>>>>>>>> 
>>>>>>>> For instance, I don’t think it’s required to launch all of 
>>>>>>>> activemq-unit-tests by default, but I would keep the tests in each 
>>>>>>>> module (they are fast and don’t need the whole broker infra).
>>>>>>>> 
>>>>>>>> About RetryRule, I did that in Karaf as well; let me see if it helps 
>>>>>>>> for ActiveMQ.
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> I will improve things this way.
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>> 
>>>>>>>>> On 12 Mar 2021, at 20:31, Clebert Suconic <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> You should instead have a fast profile, with a subset of the testsuite
>>>>>>>>> to run on every commit and branch for these cases. I looked on Jenkins
>>>>>>>>> and having many builds taking 3 Hours each won't really scale on the
>>>>>>>>> lab anyway. Failures will only make things worse there.
>>>>>>>>> 
>>>>>>>>> The lab is usually not powerful enough for long-running tests.
>>>>>>>>> 
>>>>>>>>> And a full profile that should run as part of a full run (say, once
>>>>>>>>> a day instead of on every commit), or at any interval you choose.
>>>>>>>>> 
>>>>>>>>> I don't think you should hide tests though, as that is like sweeping
>>>>>>>>> dirt under the rug (even if you say you will enable them later; as with
>>>>>>>>> anything in life, temporary solutions usually end up being permanent).
>>>>>>>>> 
>>>>>>>>> As with any system dealing with timing and asynchrony, flaky tests and
>>>>>>>>> races are part of the day. One thing I did in ActiveMQ Artemis was to
>>>>>>>>> write a Rule where the test is retried. You could also add retries to
>>>>>>>>> tests in cases where it is acceptable, but be careful not to just hide
>>>>>>>>> bugs away in this case as well.
>>>>>>>>> 
>>>>>>>>> If you are interested, on Artemis, look for usages of
>>>>>>>>> https://github.com/apache/activemq-artemis/blob/master/artemis-commons/src/test/java/org/apache/activemq/artemis/utils/RetryRule.java
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> You need to activate a profile in Artemis for the RetryRule to work.
>>>>>>>>> 
>>>>>>>>> On Fri, Mar 12, 2021 at 1:56 PM JB Onofré <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yes agree. I’m launching new builds ;)
>>>>>>>>>> 
>>>>>>>>>>> On 12 Mar 2021, at 19:51, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Just running it by itself on the command line and also in the IDE. The
>>>>>>>>>>> full build takes a while, and if it's breaking with that then it's
>>>>>>>>>>> probably some other test that isn't cleaning up properly in between runs.
>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:47 PM JB Onofré <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Did you try in a full build or the test individually? I’m running a new
>>>>>>>>>>>> build.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 12 Mar 2021, at 19:38, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I've been running the DurableSyncNetworkBridgeTest several times on my
>>>>>>>>>>>>> box and it always passes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:25 PM Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ideally it would be better to fix tests than to simply exclude them.
>>>>>>>>>>>>>> These tests were added for a reason, I would presume (I know I had
>>>>>>>>>>>>>> worked on the durable sync stuff in the past), so randomly turning off
>>>>>>>>>>>>>> tests could lead to missing errors.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:57 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I’m adding these tests to be fixed/improved:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FailoverDurableSubTransactionTest.testFailoverCommitListener
>>>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionPropagate
>>>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionWithBridgeOffline
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Let me create the Jira and a PR to exclude the tests, and verify that
>>>>>>>>>>>>>>> Jenkins is happy.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 12 Mar 2021, at 16:14, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm +1 on the actions :).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 3:11 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Sure, thanks for the help !
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Just waiting for some feedback before starting the "actions" ;)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 12 Mar 2021, at 14:29, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I ran into this test failing yesterday:
>>>>>>>>>>>>>>>>>> activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>>>> - I'd be happy to try and contribute a fix. Would you like to assign the
>>>>>>>>>>>>>>>>>> JIRA to me?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:58 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Now that we have a Jenkinsfile in our repo and use a Jenkins pipeline, we
>>>>>>>>>>>>>>>>>>> have dramatically improved our build: the build is executed for each
>>>>>>>>>>>>>>>>>>> pull request or commit on the main branch.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> However, we have a lot of failing tests, which cause the build to fail
>>>>>>>>>>>>>>>>>>> quite systematically on ci-builds.apache.org.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> We really need a clean, accurate and stable build: it will improve
>>>>>>>>>>>>>>>>>>> issue detection and simplify reviews, especially for pull requests.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I ran several builds on my machine (with different Docker containers) and
>>>>>>>>>>>>>>>>>>> I already identified some failing/flaky tests:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> - activemq-leveldb-store/src/test/java/org/apache/activemq/leveldb/test/ElectingLevelDBStoreTest.java
>>>>>>>>>>>>>>>>>>>   is not a big deal as I have a PR removing leveldb completely
>>>>>>>>>>>>>>>>>>> - activemq-stomp/src/test/java/org/apache/activemq/transport/stomp/Stomp11NIOSSLTest.java
>>>>>>>>>>>>>>>>>>>   Chris made an improvement, but I still see some flakiness here.
>>>>>>>>>>>>>>>>>>> - activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I propose the following action plan:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1. Create a Jira for each failing/flaky test.
>>>>>>>>>>>>>>>>>>> 2. Exclude the tests (in the surefire plugin configuration) to have a
>>>>>>>>>>>>>>>>>>> "green light" on Jenkins.
>>>>>>>>>>>>>>>>>>> 3. For each Jira, we work on a pull request, to be sure that Jenkins is
>>>>>>>>>>>>>>>>>>> still "happy".
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Anyone willing to help on (3) is welcome !
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> If there’s no objection, I will start with (1) and (2).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Clebert Suconic
>> 
>
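
The retry idea Clebert raises earlier in the thread can be sketched in plain Java. This is only a minimal illustration under stated assumptions: it is not the Artemis RetryRule and it is not tied to JUnit. The names RetrySketch, Check and runWithRetries are hypothetical and introduced here for the example. The helper re-runs a check a bounded number of times, logs every failed attempt so that flakiness stays visible instead of being hidden, and fails with the last error as the cause if no attempt passes.

```java
/**
 * Hypothetical helper, NOT the Artemis RetryRule: re-runs a flaky check
 * a bounded number of times and reports every failed attempt.
 */
public final class RetrySketch {

    /** A check that may fail by throwing an exception or an AssertionError. */
    @FunctionalInterface
    public interface Check {
        void run() throws Exception;
    }

    private RetrySketch() {
    }

    /**
     * Runs the check until it passes or maxAttempts is exhausted.
     *
     * @return the 1-based number of the attempt that passed
     * @throws AssertionError if every attempt failed; the last failure is the cause
     */
    public static int runWithRetries(int maxAttempts, Check check) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                check.run();
                return attempt;
            } catch (Exception | AssertionError failure) {
                // Keep the failure visible: a retry should not silently hide a bug.
                last = failure;
                System.err.println("attempt " + attempt + " of " + maxAttempts
                        + " failed: " + failure);
            }
        }
        throw new AssertionError("check failed after " + maxAttempts + " attempts", last);
    }
}
```

A caller could wrap a timing-sensitive assertion as RetrySketch.runWithRetries(3, () -> ...) and log the returned attempt number, which makes it possible to track tests that only pass on a retry rather than quietly accepting them.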