Hi Calcite and Drill devs

I have a quick follow-up on the discussion of running Drill tests against the Calcite main branch.

Firstly, as feedback on Stamatis' comments below, the Drill unit test suite is big, diverse, slow, resource-hungry and temperamental. By that I mean it does work, but a clean run needs about two hours, all of the RAM that can be squeezed from a GitHub runner, and a number of dependencies, some of them running in Docker containers. Masochists may browse the Drill CI run logs [1]. That said, many of those tests are ones Calcite would never want in its CI anyway, and it would be possible to run a much more reasonable subset of them in the context of testing Calcite main.

Given the above, we have in the meantime opted to try basing Drill master on Calcite main [2] instead of on a released version of Calcite. What's appealing about this is that it takes only a one-line change. What could be difficult is the "two moving targets" problem mentioned by Stamatis. Nonetheless, we want to try it for a while, and in the worst case we'll simply go back to testing against a released Calcite.

My prediction is that we will see more exchanges along the lines of "Something changed! Was it in Drill? No, it looks like this Calcite commit. Was it a regression? No, I asked and it's an intentional change that happens to affect a customisation Drill has made. Okay, let's deal with it this way.", and permutations thereof. That would be useful, to Drill at the very least.

Thanks
James

[1] https://github.com/apache/drill/actions
[2] https://github.com/apache/drill/commit/b6d59eaf39ef4c2f4417b633a97121223bc2569f

On 2023/03/14 10:42, Stamatis Zampetakis wrote:
Hi James,

Thanks for starting this discussion.

Regarding Calcite snapshots, there is already a Jenkins job [1] that
regularly publishes artifacts to the snapshots repo [2].

Apart from that, having regular integration tests between Drill and Calcite
is a very good idea. The tricky part is to decide where the integration
tests are going to run:
A) Part of Calcite CI
B) Part of Drill CI
C) Both

I will mostly comment on the option of adding extra tests to Calcite CI
since I am most familiar with it.
If this happens, I would prefer to use a fixed Drill commit as a reference.
This is more or less the same as pinning the versions in other Calcite
adapters. I am afraid that having two (or more) moving targets will make
things very unstable.
Second, I would be mindful of the total duration and frequency of the runs,
especially if the intention is to run them as part of every PR.
Finally, it would be nice to select a subset of Drill tests that are
relevant and hopefully meaningful to Calcite devs, rather than running
everything, so that when they fail somebody familiar with Calcite can
understand what happened.

Apart from running integration tests, anything that can be captured as a
simple Calcite unit test would be of immense help for long-term stability.

Best,
Stamatis

[1] https://ci-builds.apache.org/job/Calcite/job/Calcite-snapshots/
[2] https://repository.apache.org/content/groups/snapshots/org/apache/calcite/

On Tue, Mar 14, 2023 at 9:02 AM James Turton wrote:

Hi Calcite and Drill devs

Here's an idea that, while it doesn't amount to upstreaming Drill tests
into the Calcite test suite, could mean that all of Drill's tests would
be run automatically and frequently against the HEAD of the Calcite
main branch.

Something Drill started doing last year is the continuous publication of
SNAPSHOT artefacts to the Apache snapshots repo [1]. We wanted to give
Drill plugin developers targeting the upcoming version of Drill the
ability to pull in up-to-date libraries without having to build Drill
from its master branch themselves.

Being concerned about causing an explosion of artefact versions, we asked
Infra whether it would be okay for Drill to republish every time a
commit is merged into master. They told us that we are not the first
and that we can publish there as often as we like. The only tricky part of
setting this up was obtaining Nexus credentials from GitHub Secrets and
using them in a new GitHub workflow, but we now have a working example
in Drill [2] and I'd be happy to help if Calcite is interested in doing the
same.

If Drill then bases its master branch on these proposed SNAPSHOT
artefacts, Drill's normal CI runs would continuously test it against
Calcite as of its most recent commit, and we'd be made aware of
compatibility problems early. If declaring a dependency on a SNAPSHOT
version in master is too much malpractice, instability or extra process
[3] for Drill devs, then I expect that the same thing could still be
achieved in a new "calcite-next" branch. In any event, we'd continue to
have the stable Drill branch, which would of course only ever depend on a
released version of Calcite.

Regards
James

[1] https://repository.apache.org/content/repositories/snapshots/org/apache/drill/
[2] https://github.com/apache/drill/blob/master/.github/workflows/publish-snapshot.yml
[3] If we did this in Drill master, then I guess the Drill release process
would need a new step in which we push a commit that pins the Calcite
dependency to the latest released version. We'd probably also become
motivated to time Drill releases so that they happen just after Calcite
releases. Personally, I'd be prepared to try this out for a while.