I took another shot in the dark and hit something: I now believe that there's some inadequate cleanup (or setup) in the Kerberos and/or Hadoop impersonation unit tests that produces errors in a racy way. I claim that because moving the impersonation tests out to the EasyOutOfMemory run was enough to get everything through without errors [1]. Exactly /how/ the recent GitHub runner image updates, which look wholly unrelated to anything Hadoop, manage to reveal a latent race condition is probably going to remain a mystery. I'll spend a little time looking at the relevant setup and cleanup code.

1. https://github.com/apache/drill/actions/runs/5870147147
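
To make the hypothesis concrete, here is a toy model of the kind of leak I have in mind. It is only a sketch: it uses no Drill or Hadoop code, and every name in it (LeakyAuthDemo, authMode, keytab, kerberosTest, startToyMiniCluster) is made up for illustration. It assumes that the auth mode is process-wide state shared by all tests in one JVM, and that a Kerberos test which does not restore it leaves the next mini cluster start-up in secure mode with no keytab.

```java
import java.io.IOException;

// Toy model only. Nothing here is Drill or Hadoop code; all names are hypothetical.
final class LeakyAuthDemo {
    // Assumed process-wide security settings, shared by every test in the same JVM.
    static String authMode = "simple";
    static String keytab = null;

    // Restores the defaults that a freshly started JVM would have.
    static void reset() {
        authMode = "simple";
        keytab = null;
    }

    // Stand-in for a Kerberos test. With cleanUp == false it leaves secure mode switched on.
    static void kerberosTest(boolean cleanUp) {
        authMode = "kerberos";
        keytab = "/tmp/toy.keytab"; // hypothetical path
        try {
            // ... the test body would run here ...
        } finally {
            // The keytab goes away with the test's temporary directory in either case.
            keytab = null;
            if (cleanUp) {
                reset();
            }
        }
    }

    // Stand-in for starting a mini cluster: it fails in secure mode without a keytab.
    static void startToyMiniCluster() throws IOException {
        if ("kerberos".equals(authMode) && keytab == null) {
            throw new IOException("Running in secure mode, but config doesn't have a keytab");
        }
    }

    // Stand-in for an impersonation test whose setup starts the mini cluster.
    static boolean impersonationTestPasses() {
        try {
            startToyMiniCluster();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        reset();
        System.out.println("impersonation test alone: " + impersonationTestPasses());
        kerberosTest(false);
        System.out.println("after a leaky Kerberos test: " + impersonationTestPasses());
        reset();
        kerberosTest(true);
        System.out.println("after a tidy Kerberos test: " + impersonationTestPasses());
    }
}
```

In this model the impersonation test only fails when a leaky Kerberos test happens to run before it in the same JVM, which is consistent with the failures disappearing once the two groups of tests no longer share a run. Whether the real tests behave this way is exactly what still needs checking.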

On 2023/08/15 17:09, James Turton wrote:
Hi

This is a write-up of some notes I've written in Slack. Since a recent GitHub Actions runner image update, every CI run we do under the Hadoop 3 build profile dies on Hadoop impersonation tests with the following error [1].

Error: TestImpersonationMetadata.setup:72->BaseTestImpersonation.startMiniDfsCluster:84->BaseTestImpersonation.startMiniDfsCluster:112 » IO Running in secure mode, but config doesn't have a keytab
Error: TestImpersonationMetadata.setup:72->BaseTestImpersonation.startMiniDfsCluster:84->BaseTestImpersonation.startMiniDfsCluster:112 » IO Running in secure mode, but config doesn't have a keytab
...

Note that these errors do not show up under the Hadoop 2 build profile [2]. We actually had this exact problem in the CI a few months ago, but it was resolved on that occasion by a subsequent GitHub runner image update [3]. At that time we could also dodge the problem by downgrading our runner image from ubuntu-latest to ubuntu-20.04, but that little trick does not work today. Upgrading Hadoop to the latest release, 3.3.6, also doesn't help here [4].

This issue hasn't ever been reproduced locally, where the Hadoop Mini DFS cluster remains in "simple" auth mode, with the result that no keytab file is sought and no test errors occur. One way to debug locally would be to create a GitHub runner image from source and run it in a VM, container or chroot [5]. This looks unappetising to me so far, mainly because of the needed tools that I don't have or want, but it might prove to be the only way to stop shooting in the dark.

Regards
James

1. https://github.com/apache/drill/actions/runs/5845539423/job/15849744146#step:4:15538
2. https://github.com/apache/drill/actions/runs/5834759769/job/15824986427
3. https://github.com/actions/runner-images/issues/7340
4. https://github.com/apache/drill/actions/runs/5834722006
5. https://github.com/actions/runner-images/blob/main/docs/create-image-and-azure-resources.md
