This is now resolved. Please rebase your PRs if you were affected.
https://github.com/apache/drill/actions/runs/5882244920
On 2023/08/16 08:36, James Turton wrote:
I took another shot in the dark and hit something: I now believe that
there's some inadequate cleanup (or setup) in the Kerberos and/or
Hadoop impersonation unit tests that produces errors in a racy way. I
say that because moving the impersonation tests out to the
EasyOutOfMemory run was enough to get everything through without
errors [1]. Exactly /how/ the recent GitHub runner image updates,
which look wholly unrelated to anything Hadoop, managed to reveal a
latent race condition is probably going to remain a mystery. I'll
spend a little time looking at the relevant setup and cleanup code.
1. https://github.com/apache/drill/actions/runs/5870147147
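The hypothesis above can be illustrated with a toy model. This is only a sketch of how leaked process-wide state could produce the "secure mode, but no keytab" symptom. It does not use any Drill or Hadoop code, and every name in it (LeakedAuthModeSketch, AUTH_MODE, kerberosTest, impersonationSetup) is hypothetical. It assumes that both groups of tests share one JVM and that the auth mode lives in a static field.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LeakedAuthModeSketch {
    // Hypothetical stand-in for process-wide security state (not a Hadoop class).
    static final AtomicReference<String> AUTH_MODE = new AtomicReference<>("simple");

    // Hypothetical stand-in for a Kerberos test that switches the JVM to secure mode.
    static void kerberosTest(boolean cleanUp) {
        AUTH_MODE.set("kerberos");
        try {
            // The test body would run here.
        } finally {
            if (cleanUp) {
                // Restore the default so later tests see "simple" auth again.
                AUTH_MODE.set("simple");
            }
        }
    }

    // Hypothetical stand-in for the impersonation test setup, which has no keytab.
    static String impersonationSetup() {
        if ("kerberos".equals(AUTH_MODE.get())) {
            return "FAIL: running in secure mode, but config doesn't have a keytab";
        }
        return "OK";
    }

    // Runs the Kerberos stand-in first, then the impersonation setup, in one "JVM".
    static String run(boolean cleanUp) {
        AUTH_MODE.set("simple");
        kerberosTest(cleanUp);
        return impersonationSetup();
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + run(false));
        System.out.println("with cleanup:    " + run(true));
    }
}
```

In this model the impersonation setup fails only when the earlier test skips its cleanup, which is why whether the error shows up would depend on test ordering and JVM sharing.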
On 2023/08/15 17:09, James Turton wrote:
Hi
This is a write-up of some notes I've written in Slack. Since a
recent GitHub Actions runner image update, every CI run we do under
the Hadoop 3 build profile dies on the Hadoop impersonation tests with
the following error [1].
    Error: TestImpersonationMetadata.setup:72->BaseTestImpersonation.startMiniDfsCluster:84->BaseTestImpersonation.startMiniDfsCluster:112
      » IO Running in secure mode, but config doesn't have a keytab
    Error: TestImpersonationMetadata.setup:72->BaseTestImpersonation.startMiniDfsCluster:84->BaseTestImpersonation.startMiniDfsCluster:112
      » IO Running in secure mode, but config doesn't have a keytab
    ...
Note that these errors do not show up under the Hadoop 2 build
profile [2].

We actually had this exact problem in the CI a few months ago, but it
was resolved on that occasion by a subsequent GitHub runner image
update [3]. At that time we could also dodge the problem by
downgrading our runner image from ubuntu-latest to ubuntu-20.04, but
that little trick does not work today. Upgrading Hadoop to the latest
release, 3.3.6, also doesn't help here [4].
This issue has never been reproduced locally, where the Hadoop Mini
DFS cluster remains in "simple" auth mode, with the result that no
keytab file is sought and no test errors occur. One way to debug
locally would be to create a GitHub runner image from source and run
it in a VM, container or chroot [5]. This looks unappetising to me so
far, mainly because of the tools it needs that I don't have or want,
but it might prove to be the only way to stop shooting in the dark.
Regards
James
1. https://github.com/apache/drill/actions/runs/5845539423/job/15849744146#step:4:15538
2. https://github.com/apache/drill/actions/runs/5834759769/job/15824986427
3. https://github.com/actions/runner-images/issues/7340
4. https://github.com/apache/drill/actions/runs/5834722006
5. https://github.com/actions/runner-images/blob/main/docs/create-image-and-azure-resources.md