Re: [PROPOSAL] Improve ActiveMQ 5 build stability

Jean-Baptiste Onofre Tue, 16 Mar 2021 22:01:20 -0700

By the way, guys, I would like to merge:

https://github.com/apache/activemq/pull/622 


And then rebase PR branches to have "Jenkins happy".

Any objection to merging PR #622?

Thanks,
Regards
JB

> On Mar 17, 2021, at 05:45, Jean-Baptiste Onofre <[email protected]> wrote:
> 
> Hi Matt,
> 
> I agree.
> 
> I think we should run a full build not after every single merge, but after a 
> "group" of merges. Otherwise we would run a full build after each PR merge, 
> which is basically what we have today and is not practical at all.
> 
> That’s why, as a first step, I’m proposing to run once a week or "on demand".
> 
> Regards
> JB
> 
>> On Mar 16, 2021, at 22:15, Matt Pavlovich <[email protected]> wrote:
>> 
>> Feels like we are in a transition period. I don’t see a per-PR unit test job 
>> being practical until the execution times come way down, and that is going 
>> to take significant engineering effort. 
>> 
>> That being said, full build with full tests the day after a merged change 
>> seems like a reasonable schedule.
>> 
>>> On Mar 16, 2021, at 3:24 PM, Hossack, Etienne <[email protected]> wrote:
>>> 
>>> Hi all, just some thoughts to share here: 
>>> 
>>> Ideally I would expect a “static” build to be a nightly build that runs 
>>> every day, but given the frequency of contributions, weekly makes sense 
>>> (and I know it takes about 30% of the day 😅).
>>> It could perhaps also run a snapshot version of the dependency checker, to 
>>> fail the build on CVEs or on new types of checks from that tool. 
>>> 
>>> And then for contributions, the PR can run the lightweight profile, but 
>>> then master could run the full profile on merge?
>>> 
>>> Does that make sense?
>>> In summary, I’m expressing agreement with a static build, but also 
>>> suggesting that a full build be run on merged contributions: if there are 
>>> multiple merges in a week, or a merge lands right after the weekly build 
>>> has run, the time-to-discovery of errors would otherwise increase.
>>> 
>>> Cheers,
>>> Étienne Hossack
>>> Software Development Engineer, Amazon MQ
>>> email: [email protected]
>>> phone: +1-778-945-8287
>>> 
>>> 
>>> 
>>>> On Mar 15, 2021, at 10:05 PM, Jean-Baptiste Onofre <[email protected]> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> I created https://github.com/apache/activemq/pull/622 for this, and you can 
>>>> see that Jenkins is happy now. The full build took about 120 minutes (2 
>>>> hours) on Jenkins.
>>>> 
>>>> Basically, what I did in the PR:
>>>> - removed activemq-unit-tests and the itests (Karaf, Spring3) from the 
>>>> default reactor
>>>> - introduced a full.test profile that builds all modules, including the 
>>>> unit tests and itests
>>>> 
>>>> The full.test profile is not used in the Jenkinsfile, meaning that a PR 
>>>> build executes all tests except the activemq-unit-tests module and the 
>>>> itests. I think that’s acceptable for PRs (and it already takes 2 hours ;)).
>>>> I would like to introduce a "static" build on ci-builds.apache.org (not via 
>>>> the Jenkinsfile), executed every week and doing a full build (including the 
>>>> full.test profile).
>>>> 
>>>> Thoughts ?
>>>> 
>>>> Regards
>>>> JB
>>>> 
>>>>> On Mar 15, 2021, at 08:20, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> I have created the following Jira issues for the tests I found "flaky" (in 
>>>>> a full build, not necessarily in a single execution; it can also depend on 
>>>>> the machine, which is why I tested with several Docker setups in terms of 
>>>>> CPU and memory):
>>>>> 
>>>>> AMQ-8190: DuplexAdvisoryRaceTest is failing (Jonathan said he is going to 
>>>>> take a look)
>>>>> AMQ-8189: CachedLDAPAuthorizationModuleTest is failing
>>>>> AMQ-8188: AMQ5266SingleDestTest is failing
>>>>> 
>>>>> There’s a test failure in the leveldb module, but it’s not a big deal as I 
>>>>> have a PR ready to remove leveldb 
>>>>> (https://github.com/apache/activemq/pull/593).
>>>>> 
>>>>> I’m also retesting StompNIOSSLTest; it seems way more stable thanks to 
>>>>> Chris.
>>>>> 
>>>>> I also created AMQ-8191 (linked to the previous Jira issues) to clean up 
>>>>> the profiles, introduce a fast.test profile and use it on Jenkins, and 
>>>>> exclude the failing tests until they are fixed (re-including them at that 
>>>>> time).
>>>>> 
>>>>> AMQ-8191 is almost ready, I’m testing.
>>>>> 
>>>>> Regards
>>>>> JB
>>>>> 
>>>>>> On Mar 14, 2021, at 06:04, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>> 
>>>>>> Hi guys,
>>>>>> 
>>>>>> I’ve updated my local branch according to your comments:
>>>>>> 
>>>>>> 1. I’ve cleaned up the profiles and introduced/renamed a fast profile that 
>>>>>> executes all unit tests in the modules but excludes activemq-unit-tests 
>>>>>> and karaf-itests.
>>>>>> 2. I’m keeping the smoke test profile.
>>>>>> 3. I’ve created a tobefixed profile that includes all the flaky tests I’ve 
>>>>>> identified.
>>>>>> 4. I’ve updated the Jenkinsfile to use the fast profile on PRs.
>>>>>> 
>>>>>> I will create the PR soon.
>>>>>> 
>>>>>> Regards
>>>>>> JB
>>>>>> 
>>>>>>> On Mar 13, 2021, at 06:05, Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> We already have a "fast" profile, and it’s a good idea to use this profile 
>>>>>>> on Jenkins by default and move some tests into it.
>>>>>>> 
>>>>>>> For instance, I don’t think it’s required to launch all of 
>>>>>>> activemq-unit-tests by default, but I would keep the tests in each module 
>>>>>>> (they are fast and don’t need the whole broker infrastructure).
>>>>>>> 
>>>>>>> About RetryRule, I did that in Karaf as well, let me see if it helps 
>>>>>>> for ActiveMQ.
>>>>>>> 
>>>>>>> Thanks !
>>>>>>> I will improve things this way.
>>>>>>> 
>>>>>>> Regards
>>>>>>> JB
>>>>>>> 
>>>>>>>> On Mar 12, 2021, at 20:31, Clebert Suconic <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> You should instead have a fast profile, with a subset of the testsuite
>>>>>>>> to run on every commit and branch for these cases. I looked on Jenkins
>>>>>>>> and having many builds taking 3 Hours each won't really scale on the
>>>>>>>> lab anyway. Failures will only make things worse there.
>>>>>>>> 
>>>>>>>> The lab is usually not powerful enough for long-running tests.
>>>>>>>> 
>>>>>>>> And a full profile that should run as part of a full run (say, once
>>>>>>>> a day instead of on every commit), or at any interval you choose.
>>>>>>>> 
>>>>>>>> I don't think you should hide tests though, as that is like sweeping
>>>>>>>> dirt under the rug (even if you say you will enable them later; as with
>>>>>>>> anything in life, temporary solutions usually end up being permanent).
>>>>>>>> 
>>>>>>>> As with any system dealing with timing and asynchronous behavior,
>>>>>>>> flakiness and races are part of the day. One thing I did in ActiveMQ
>>>>>>>> Artemis was to write a Rule where the test is retried. You could also
>>>>>>>> add retries to tests in cases where it is acceptable, but be careful
>>>>>>>> not to just hide bugs away in this case as well.
>>>>>>>> 
>>>>>>>> If you are interested, on Artemis, look for usages of
>>>>>>>> https://github.com/apache/activemq-artemis/blob/master/artemis-commons/src/test/java/org/apache/activemq/artemis/utils/RetryRule.java
>>>>>>>> 
>>>>>>>> 
>>>>>>>> You need to activate a profile in artemis for the retryRule to work.
>>>>>>>> 
>>>>>>>> On Fri, Mar 12, 2021 at 1:56 PM JB Onofré <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Yes agree. I’m launching new builds ;)
>>>>>>>>> 
>>>>>>>>>> On Mar 12, 2021, at 19:51, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Just running it by itself on the command line and also in the IDE. The
>>>>>>>>>> full build takes a while, and if it's breaking with that then it's
>>>>>>>>>> probably some other test that isn't cleaning up properly in between runs.
>>>>>>>>>> 
>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:47 PM JB Onofré <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Did you try in a full build or the test individually? I’m running a new
>>>>>>>>>>> build.
>>>>>>>>>>> 
>>>>>>>>>>>> On Mar 12, 2021, at 19:38, Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I've been running the DurableSyncNetworkBridgeTest several times on my
>>>>>>>>>>>> box and it always passes.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 1:25 PM Christopher Shannon <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ideally it would be better to fix tests than to simply exclude them.
>>>>>>>>>>>>> These tests were added for a reason, I would presume (I know I had
>>>>>>>>>>>>> worked on the durable sync stuff in the past), so randomly turning off
>>>>>>>>>>>>> tests could lead to missing errors.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:57 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I’m adding these tests to be fixed/improved:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> FailoverDurableSubTransactionTest.testFailoverCommitListener
>>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionPropagate
>>>>>>>>>>>>>> DurableSyncNetworkBridgeTest.testRemoveSubscriptionWithBridgeOffline
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Let me create the Jira and create a PR to exclude the tests and verify
>>>>>>>>>>>>>> that Jenkins is happy.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mar 12, 2021, at 16:14, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm +1 on the actions :).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 3:11 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Sure, thanks for the help !
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Just waiting for some feedback before starting the "actions" ;)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mar 12, 2021, at 14:29, Jonathan Gallimore <[email protected]> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I ran into this test failing yesterday:
>>>>>>>>>>>>>>>>> activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>>> - I'd be happy to try and contribute a fix. Would you like to assign the
>>>>>>>>>>>>>>>>> JIRA to me?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Mar 12, 2021 at 12:58 PM Jean-Baptiste Onofre <[email protected]> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Now that we have a Jenkinsfile in our repo and use a Jenkins pipeline,
>>>>>>>>>>>>>>>>>> we have dramatically improved our build: the build is executed for
>>>>>>>>>>>>>>>>>> each pull request and each commit on the main branch.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> However, we have a lot of failing tests, which quite systematically
>>>>>>>>>>>>>>>>>> cause the build to fail on ci-builds.apache.org.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> We really need a clean, accurate and stable build: it will improve
>>>>>>>>>>>>>>>>>> issue detection and simplify reviews, especially for pull requests.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I ran several builds on my machine (with different Docker containers)
>>>>>>>>>>>>>>>>>> and I have already identified some failing/flaky tests:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> - activemq-leveldb-store/src/test/java/org/apache/activemq/leveldb/test/ElectingLevelDBStoreTest.java
>>>>>>>>>>>>>>>>>>   is not a big deal, as I have a PR removing leveldb completely
>>>>>>>>>>>>>>>>>> - activemq-stomp/src/test/java/org/apache/activemq/transport/stomp/Stomp11NIOSSLTest.java.
>>>>>>>>>>>>>>>>>>   Chris made an improvement, but I still see some flakiness here.
>>>>>>>>>>>>>>>>>> - activemq-unit-tests/src/test/java/org/apache/activemq/usecases/DuplexAdvisoryRaceTest.java
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I propose the following action plan:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 1. Create a Jira for each failing/flaky test.
>>>>>>>>>>>>>>>>>> 2. Exclude the tests (in the surefire plugin configuration) to get a
>>>>>>>>>>>>>>>>>> "green light" on Jenkins.
>>>>>>>>>>>>>>>>>> 3. For each Jira, work on a pull request, to be sure that Jenkins is
>>>>>>>>>>>>>>>>>> still "happy".
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Anyone willing to help on (3) is welcome!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If there’s no objection, I will start with (1) and (2).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Clebert Suconic
>

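For readers following the retry idea that Clebert raises in the quoted thread, here is a minimal sketch of what "retry a flaky test a few times" can look like. It is plain Java with no JUnit dependency and is not the Artemis RetryRule implementation; the class name RetrySketch, the Body interface and the runWithRetries method are hypothetical names chosen for illustration. The last failure is rethrown when every attempt fails, so a real bug is still reported rather than hidden.

```java
// Hypothetical sketch: a plain-Java retry helper, NOT the Artemis RetryRule API.
final class RetrySketch {

    /** A test body that may throw, similar to what a test runner evaluates. */
    @FunctionalInterface
    interface Body {
        void run() throws Throwable;
    }

    /**
     * Runs the body up to maxAttempts times and returns the number of the
     * attempt that passed. If every attempt fails, the last failure is
     * rethrown so that a genuine bug remains visible.
     */
    static int runWithRetries(int maxAttempts, Body body) throws Throwable {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                body.run();
                return attempt;
            } catch (Throwable t) {
                // Log every failed attempt so flakiness is recorded, not silenced.
                last = t;
                System.err.println("attempt " + attempt + " failed: " + t.getMessage());
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Throwable {
        int[] calls = {0};
        // Simulated flaky body: fails on the first call, passes afterwards.
        int used = runWithRetries(3, () -> {
            if (++calls[0] < 2) {
                throw new AssertionError("simulated race");
            }
        });
        System.out.println("passed on attempt " + used);
    }
}
```

Logging each failed attempt keeps the flakiness visible in the build output, which is in line with the caution in the thread about not hiding bugs behind retries.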