Martijn Visser created FLINK-39871:
--------------------------------------

             Summary: MinioTestContainerTest fails on JDK 8 CI agents in a 
non-UTC timezone because testcontainers cannot parse the +08:00 Network.created 
timestamp
                 Key: FLINK-39871
                 URL: https://issues.apache.org/jira/browse/FLINK-39871
             Project: Flink
          Issue Type: Bug
          Components: Build System / Azure Pipelines, Build System / CI, Test 
Infrastructure
    Affects Versions: 1.20.6
            Reporter: Martijn Visser


MinioTestContainerTest in flink-s3-fs-base fails deterministically on the Azure 
CI "connect" stage for release-1.20 (6 tests, 6 errors), while the same code 
passes on GitHub Actions and on release-2.0 and later. The container fails to 
start during the Docker network inspect with:

{code}
org.testcontainers.shaded.com.fasterxml.jackson.databind.exc.InvalidFormatException:
  Can not deserialize value of type java.util.Date from String
  "2025-06-11T15:22:00.047060135+08:00": ... while it seems to fit format
  'yyyy-MM-dd'T'HH:mm:ss.SSSZ', parsing fails (leniency? null)
{code}

The exact value varies per agent (another run showed 
2022-09-25T13:46:35.209299016+08:00); the constant is a +08:00 offset with 
sub-millisecond precision.
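
To illustrate the shape of these values, here is a minimal sketch using only java.time. It does not exercise the shaded Jackson or docker-java code path that fails; it only shows that the string is a well-formed ISO-8601 offset timestamp, and what the same instant looks like in UTC ("Z") form. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Flink or Testcontainers.
public class TimestampShape {

    // Parses an ISO-8601 offset timestamp such as the Network.created value.
    static OffsetDateTime parse(String value) {
        return OffsetDateTime.parse(value);
    }

    // Renders the same instant with a UTC offset, which prints as "Z".
    static String toUtc(String value) {
        return parse(value).withOffsetSameInstant(ZoneOffset.UTC).toString();
    }

    public static void main(String[] args) {
        String created = "2025-06-11T15:22:00.047060135+08:00";
        OffsetDateTime parsed = parse(created);
        System.out.println("offset = " + parsed.getOffset());
        System.out.println("nanos  = " + parsed.getNano());
        System.out.println("in UTC = " + toUtc(created));
    }
}
```

The nanosecond field and the colon-separated offset both survive the parse, which is consistent with the value itself being valid and the problem lying in how it is deserialized into java.util.Date.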

Root cause: MinioTestContainer calls withNetworkAliases(...), which makes 
Testcontainers create and inspect a Docker network. docker-java deserializes 
Network.created, a java.util.Date. The Docker daemon serializes that timestamp 
in the host's local timezone. The Azure "Default" agent pool consists of 
self-hosted machines in the +08 (China) timezone, so Network.created is returned as 
...+08:00 with nanosecond precision. The Jackson bundled in Testcontainers 
1.21.4 cannot parse a colon timezone offset combined with sub-millisecond 
precision under JDK 8; it parses the same value fine under JDK 17 and 21.

This is why the failure is environment-specific: it requires both a non-UTC 
Docker daemon host and JDK 8.

{code}
Environment                              JDK   Network.created   Result
Azure Default pool (self-hosted, +08)     8    +08:00            FAIL
Azure Default pool (same agents)         17    +08:00            PASS (JDK 17 parses it)
GitHub Actions (UTC runners)              8    Z                 PASS
{code}

release-2.0 and later are green because that stage runs on JDK 17; GitHub 
Actions is green because its runners are UTC. release-1.20 on Azure is the only 
combination with both conditions.

I've been able to reproduce this locally with the Jackson bundled in 
Testcontainers 1.21.4: a "Z" timestamp with nanoseconds parses on JDK 8, 17 and 
21, while a "+08:00" timestamp with nanoseconds fails only on JDK 8 (the 
jackson-annotations version made no difference, only the JDK did). In 
https://github.com/apache/flink/pull/28313 I've tried both bumping the MinIO 
image and pinning an explicit version, but that only confirmed that the value 
is generated by the daemon and does not come from any image. Testcontainers 
1.21.4 and docker-java 3.4.2 are identical across release-1.20, release-2.0 and 
master. The only differing knob in the failing stage's pipeline config is jdk: 
8 versus jdk: 17.

I don't know why this now occurs on the self-hosted machines when it did not 
before. Two options to potentially fix this are:

1. Run dockerd on the donated Default-pool agents in UTC (Environment=TZ=UTC in 
the docker service, or set the host timezone) and prune pre-existing networks.

2. Replace the MinIO Testcontainer with an in-JVM S3 mock, removing the 
Docker-inspect/date-parse path entirely.

Failing build examples:

* 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75608&view=logs&j=f6116f8b-8a37-5717-446a-683b889a4082&t=c7d084eb-e1f0-5a0b-d4c7-bbe0f84d72b4&l=13962
* bump of MinIO container 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75648&view=logs&j=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819&t=2dd510a3-5041-5201-6dc3-54d310f68906&l=13635
* Pinned MinIO 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75676&view=logs&j=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819&t=2dd510a3-5041-5201-6dc3-54d310f68906&l=13962

Until this is addressed, nothing can be completely validated in release-1.20 
builds anymore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
