We can certainly prioritise work on making the tests easier to work
with. It would be good from a community building perspective.

We have the tests working in the GitHub Actions CI framework [1].
Pekko is a complex clustering tool and it does require quite a lot of
setup to run some of the tests.

Unfortunately, blocking a 1.0.0 release would damage our hopes of
building the community.


On Thu, 6 Jul 2023 at 10:39, Matthew de Detrich
<[email protected]> wrote:
>
> > Efforts have been made in the past to clean things up, but the reality
> > is that it is very hard to make tests reliable in an asynchronous
> > framework and in general it is almost impossible to accommodate for
> > all possible running environments.
>
> I really want to highlight this takeaway: if we were as strict with other
> comparable Apache TLPs (e.g. Kafka, Spark, Cassandra, Flink etc.) then no
> release would ever be made. While there is merit in discussing how bespoke
> the testing for Pekko is vs other "typical" ASF projects, if the
> expectation is that you can just run sbt test on a local laptop and have
> the tests reliably pass, then that's not going to happen any time soon.
>
> As we speak I am adding documentation in various places (i.e.
> https://cwiki.apache.org/confluence/display/PEKKO/Testing
> and https://github.com/apache/incubator-pekko/pull/469) on techniques for
> handling this (e.g. testQuick, sketched below), and I will also document
> what Johannes just said about the time factor/timing tests specifically
> for Pekko core.
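>
> For anyone unfamiliar with it: testQuick only re-runs the tests that
> failed on the previous run (or whose dependencies have changed), so
> iterating on a single flaky module is much cheaper than a full sbt test.
> A rough sketch, using remote-tests purely because it is one of the
> modules mentioned in this thread:
>
>           sbt
>           > remote-tests/testQuick
>           > remote-tests/testQuick
>
> The first invocation runs the module's whole test suite; the second (and
> any later ones) only re-runs the tests that failed.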
>
> On Thu, Jul 6, 2023 at 11:28 AM Johannes Rudolph <[email protected]>
> wrote:
>
> > The main test suite is run nightly with this command:
> >
> >           sbt \
> >             -Dpekko.cluster.assert=on \
> >             -Dpekko.log.timestamps=true \
> >             -Dpekko.test.timefactor=2 \
> >             -Dpekko.actor.testkit.typed.timefactor=2 \
> >             -Dpekko.test.tags.exclude=gh-exclude,timing \
> >             -Dpekko.test.multi-in-test=false \
> >             clean "+~ ${{ matrix.scalaVersion }} test" checkTestsHaveRun
> >
> >
> > https://github.com/apache/incubator-pekko/blob/88bf6329f193eedd45091f4f9a515943bd8ecb23/.github/workflows/nightly-builds.yml#L168-L175
> >
> > Unfortunately, the number of flaky tests is high, so the important bits are
> >
> >             -Dpekko.test.timefactor=2
> >             -Dpekko.test.tags.exclude=gh-exclude,timing
> >
> > which make timing in tests more lenient and also exclude some notorious
> > ones.
> >
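> > To make the time factor concrete: timeouts inside the testkit are
> > "dilated" by that factor, so a wait that is normally 3 seconds becomes
> > 6 seconds with -Dpekko.test.timefactor=2. A minimal sketch, assuming the
> > Pekko testkit keeps the classic Akka-style dilated helper and that
> > pekko-testkit is on the classpath:
> >
> >       import scala.concurrent.duration._
> >       import org.apache.pekko.actor.ActorSystem
> >       import org.apache.pekko.testkit._
> >
> >       object DilationSketch extends App {
> >         implicit val system: ActorSystem = ActorSystem("dilation-sketch")
> >         // With -Dpekko.test.timefactor=2 this 3 second wait becomes 6 seconds.
> >         val timeout = 3.seconds.dilated
> >         println(s"dilated timeout: $timeout")
> >         TestKit.shutdownActorSystem(system)
> >       }
> >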
> > Efforts have been made in the past to clean things up, but the reality
> > is that it is very hard to make tests reliable in an asynchronous
> > framework and in general it is almost impossible to accommodate for
> > all possible running environments.
> >
> > Johannes
> >
> > On Thu, Jul 6, 2023 at 10:30 AM Matthew de Detrich
> > <[email protected]> wrote:
> > >
> > > So in general, testing software like Pekko is going to be problematic
> > > because it is a distributed/concurrent system, i.e. there are
> > > non-determinism (i.e. flaky test) issues. One thing that I did however
> > > notice is that in the GitHub Actions CI we pass arguments to help
> > > alleviate these issues (see
> > > https://github.com/apache/incubator-pekko/blob/main/.github/workflows/nightly-builds.yml#L35-L42).
> > > The way that the ASF release process works, where it compels committers
> > > to run tests locally, has surfaced this, whereas in the past the source
> > > of truth for tests was either the GitHub Actions CI or, in the case of
> > > Lightbend, private machines/scripts that were specifically set up to
> > > test the software before a release.
> > >
> > > A final thing to note is that when someone makes a PR against Pekko,
> > > tests are only run on the module that has changed (this is achieved via
> > > https://github.com/sbt/sbt-pull-request-validator) and most of the
> > > flakiness occurs when you try to run all of the tests at once. For this,
> > > having a powerful machine helps.
> > >
> > > On Thu, Jul 6, 2023 at 10:20 AM Claude Warren, Jr
> > > <[email protected]> wrote:
> > >
> > > > My opinion is that if I check out the release code the tests should
> > > > pass, or there should be a list of "flaky" tests that are known to
> > > > have problems so I can at least verify that the failures are in them.
> > > >
> > > > > On Thu, Jul 6, 2023 at 10:15 AM PJ Fanning <[email protected]>
> > > > > wrote:
> > > >
> > > > > There are multiple modules. The tests for some modules are passing
> > > > > but for other modules they are failing.
> > > > >
> > > > > To give one example:
> > > > >
> > > > > [error] (remote-tests / Test / test) sbt.TestsFailedException: Tests
> > > > > unsuccessful
> > > > >
> > > > > The remote-tests module is in the directory of the same name.
> > > > >
> > > > > You can use this command to run just the tests in that module:
> > > > >
> > > > > sbt remote-tests/test
> > > > >
> > > > > Some of the tests can be sensitive to the performance of your
> > > > > machine.
> > > > >
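> > > > > If the failures look timing-related, it may also be worth re-running
> > > > > just that module with the leniency flags that the nightly CI job
> > > > > passes (a sketch only; the flags are the ones used by the
> > > > > nightly-builds.yml workflow):
> > > > >
> > > > >           sbt \
> > > > >             -Dpekko.test.timefactor=2 \
> > > > >             -Dpekko.test.tags.exclude=gh-exclude,timing \
> > > > >             -Dpekko.test.multi-in-test=false \
> > > > >             remote-tests/test
> > > > >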
> > > > > If you continue to have trouble, maybe you could send me your full
> > > > > output. I don't think this public mailing list would be a good place
> > > > > for that large output but you can email me directly or message it
> > > > > using Slack.
> > > > >
> > > > >
> > > > > On Thu, 6 Jul 2023 at 09:06, Claude Warren, Jr
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > While testing RC3 I did the following:
> > > > > >
> > > > > > sbt test
> > > > > >
> > > > > > the result I got was:
> > > > > > [info] Total number of tests run: 628
> > > > > > [info] Suites: completed 181, aborted 0
> > > > > > [info] Tests: succeeded 628, failed 0, canceled 0, ignored 6, pending 2
> > > > > > [info] All tests passed.
> > > > > > [error] (remote-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > [error] (persistence / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > [error] (persistence-shared / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > [error] (remote / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > [error] (stream-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > [error] Total time: 7313 s (02:01:53), completed 5 Jul 2023, 19:10:06
> > > > > >
> > > > > > Why the success and yet the failures?
> > > > > >
> > > > > > Claude
> > > > >
> > > >
> > >
> > >
> >
>
> --
>
> Matthew de Detrich
>
> *Aiven Deutschland GmbH*
>
> Immanuelkirchstraße 26, 10405 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> *m:* +491603708037
>
> *w:* aiven.io *e:* [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
