I took another shot in the dark and hit something: I now believe that
there's some inadequate cleanup (or setup) in the Kerberos and/or
Hadoop impersonation unit tests that produces errors in a racy way. I
claim that because moving the impersonation tests out to the
EasyOutOfMemory run was enough to get everything through without errors
[1]. Exactly /how/ the recent GitHub runner image updates, which look
wholly unrelated to anything Hadoop, manage to reveal a latent race
condition is probably going to remain a mystery. I'll spend a little
time looking at the relevant setup and cleanup code.

1. https://github.com/apache/drill/actions/runs/5870147147

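A toy sketch of the kind of leak suspected above: one test flips a JVM-wide auth setting, does not restore it, and a later test's setup then fails. This is plain Java with no Hadoop dependency. Every name in it (GlobalAuthState, ToyMiniCluster, LeakDemo) is hypothetical and stands in for the real classes; it also assumes both groups of tests share a JVM, which has not been confirmed for the Drill build.

```java
import java.io.IOException;

/** Hypothetical stand-in for process-wide security state; NOT Hadoop's API. */
final class GlobalAuthState {
    enum Mode { SIMPLE, KERBEROS }

    private static Mode mode = Mode.SIMPLE;

    static synchronized void set(Mode m) { mode = m; }
    static synchronized Mode get() { return mode; }
    static synchronized void reset() { mode = Mode.SIMPLE; }
}

/** Hypothetical stand-in for a mini cluster whose start-up reads the global state. */
final class ToyMiniCluster {
    static void start() throws IOException {
        if (GlobalAuthState.get() == GlobalAuthState.Mode.KERBEROS) {
            // Message text borrowed from the CI log; the check itself is a toy.
            throw new IOException(
                "Running in secure mode, but config doesn't have a keytab");
        }
    }
}

final class LeakDemo {
    /** Simulates a Kerberos-style test class; cleanUp=false models the suspected bug. */
    static void runKerberosStyleTests(boolean cleanUp) {
        GlobalAuthState.set(GlobalAuthState.Mode.KERBEROS);
        try {
            // ... test body would go here ...
        } finally {
            if (cleanUp) {
                GlobalAuthState.reset(); // restore the default for later tests
            }
        }
    }

    /** Returns true if an impersonation-style setup can start its cluster. */
    static boolean impersonationSetupSucceeds() {
        try {
            ToyMiniCluster.start();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        runKerberosStyleTests(false);
        System.out.println("setup ok without cleanup: " + impersonationSetupSucceeds());

        GlobalAuthState.reset();
        runKerberosStyleTests(true);
        System.out.println("setup ok with cleanup: " + impersonationSetupSucceeds());
    }
}
```

In this model the failure depends only on whether the Kerberos-style test ran first in the same JVM without cleaning up, which is why moving one group of tests to a separate run would hide it.
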
On 2023/08/15 17:09, James Turton wrote:
Hi

This is a write-up of some notes I've written in Slack. Since a recent
GitHub Actions runner image update, every CI run we do under the
Hadoop 3 build profile dies on Hadoop impersonation tests with the
following error [1].

Error:
TestImpersonationMetadata.setup:72->BaseTestImpersonation.startMiniDfsCluster:84->BaseTestImpersonation.startMiniDfsCluster:112
» IO Running in secure mode, but config doesn't have a keytab
Error:
TestImpersonationMetadata.setup:72->BaseTestImpersonation.startMiniDfsCluster:84->BaseTestImpersonation.startMiniDfsCluster:112
» IO Running in secure mode, but config doesn't have a keytab
...

Note that these errors do not show up under the Hadoop 2 build
profile [2].

We actually had this exact problem in the CI a few months ago, but it
was resolved on that occasion by a subsequent GitHub runner image
update [3]. At that time we could also dodge the problem by
downgrading our runner image from ubuntu-latest to ubuntu-20.04, but
that little trick does not work today. Upgrading Hadoop to the latest
release, 3.3.6, also doesn't help here [4].

This issue has never been reproduced locally, where the Hadoop Mini
DFS cluster remains in "simple" auth mode, with the result that no
keytab file is sought and no test errors occur. One way to debug
locally would be to create a GitHub runner image from source and run
it in a VM, container or chroot [5]. This looks unappetising to me so
far, mainly because of the needed tools that I don't have or want, but
it might prove to be the only way to stop shooting in the dark.

Regards
James

1. https://github.com/apache/drill/actions/runs/5845539423/job/15849744146#step:4:15538
2. https://github.com/apache/drill/actions/runs/5834759769/job/15824986427
3. https://github.com/actions/runner-images/issues/7340
4. https://github.com/apache/drill/actions/runs/5834722006
5. https://github.com/actions/runner-images/blob/main/docs/create-image-and-azure-resources.md