[
https://issues.apache.org/jira/browse/TIKA-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18076581#comment-18076581
]
ASF GitHub Bot commented on TIKA-4703:
--------------------------------------
Copilot commented on code in PR #2790:
URL: https://github.com/apache/tika/pull/2790#discussion_r3148047911
##########
.github/workflows/docker-snapshot.yml:
##########
@@ -89,6 +115,32 @@ jobs:
tar xzf
"tika-server/tika-server-standard/target/tika-server-standard-${TIKA_VERSION}-bin.tgz"
-C "${OUT_DIR}/tika-server"
cp "tika-server/docker-build/full/Dockerfile.snapshot"
"${OUT_DIR}/Dockerfile"
+ - name: Build tika-server full image for smoke test
+ uses:
docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
+ with:
+ context: target/tika-server-full-docker
+ platforms: linux/amd64
+ load: true
+ build-args: |
+ TIKA_VERSION=${{ steps.version.outputs.tika_version }}
+ tags: tika-server-full-smoke:ci
+
+ - name: Smoke test tika-server full image
+ run: |
+ docker run -d --name tika-server-full-smoke -p 9999:9998
tika-server-full-smoke:ci
+ for i in $(seq 1 20); do
+ if docker logs tika-server-full-smoke 2>&1 | grep -q "Started
Apache Tika server"; then
+ echo "tika-server full started successfully"
+ docker stop tika-server-full-smoke
+ exit 0
+ fi
+ sleep 2
+ done
+ echo "ERROR: tika-server full did not start within 40 seconds"
+ docker logs tika-server-full-smoke
+ docker stop tika-server-full-smoke
Review Comment:
Same issue as above: `docker stop tika-server-full-smoke` can fail under
`bash -e` if the container has already exited, preventing log
collection/cleanup. Use `docker rm -f` (or guard `docker stop`) in a trap to
ensure consistent cleanup and clearer failures.
```suggestion
cleanup() {
status=$?
if [ "$status" -ne 0 ]; then
docker logs tika-server-full-smoke || true
fi
docker rm -f tika-server-full-smoke >/dev/null 2>&1 || true
exit "$status"
}
docker run -d --name tika-server-full-smoke -p 9999:9998
tika-server-full-smoke:ci
trap cleanup EXIT
for i in $(seq 1 20); do
if docker logs tika-server-full-smoke 2>&1 | grep -q "Started
Apache Tika server"; then
echo "tika-server full started successfully"
exit 0
fi
sleep 2
done
echo "ERROR: tika-server full did not start within 40 seconds"
```
##########
.github/workflows/docker-snapshot.yml:
##########
@@ -135,6 +190,32 @@ jobs:
cp "tika-grpc/docker-build/start-tika-grpc.sh" "${OUT_DIR}/bin/"
cp "tika-grpc/docker-build/Dockerfile" "${OUT_DIR}/Dockerfile"
+ - name: Build tika-grpc image for smoke test
+ uses:
docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
+ with:
+ context: target/tika-grpc-docker
+ platforms: linux/amd64
+ load: true
+ build-args: |
+ VERSION=${{ steps.version.outputs.tika_version }}
+ tags: tika-grpc-smoke:ci
+
+ - name: Smoke test tika-grpc image
+ run: |
+ docker run -d --name tika-grpc-smoke -p 9090:9090 tika-grpc-smoke:ci
+ for i in $(seq 1 15); do
+ if docker logs tika-grpc-smoke 2>&1 | grep -q "Server started,
listening on"; then
+ echo "tika-grpc started successfully"
+ docker stop tika-grpc-smoke
+ exit 0
+ fi
+ sleep 2
+ done
+ echo "ERROR: tika-grpc did not start within 30 seconds"
+ docker logs tika-grpc-smoke
+ docker stop tika-grpc-smoke
Review Comment:
Same robustness concern here: if the container exits quickly, `docker stop
tika-grpc-smoke` may error under `bash -e` and mask the real startup failure.
Prefer `docker rm -f` (or guard the stop) and consider always printing `docker
ps -a`/exit code on timeout to aid debugging.
```suggestion
docker rm -f tika-grpc-smoke >/dev/null 2>&1 || true
exit 0
fi
sleep 2
done
echo "ERROR: tika-grpc did not start within 30 seconds"
docker logs tika-grpc-smoke || true
docker ps -a --filter "name=^tika-grpc-smoke$" || true
docker inspect -f '{{.State.ExitCode}}' tika-grpc-smoke || true
docker rm -f tika-grpc-smoke >/dev/null 2>&1 || true
```
##########
tika-grpc/pom.xml:
##########
@@ -474,7 +474,7 @@
<manifest>
<mainClass>org.apache.tika.pipes.grpc.TikaGrpcServer</mainClass>
<addClasspath>true</addClasspath>
- <classpathPrefix>lib/</classpathPrefix>
+ <classpathPrefix>tika-grpc/</classpathPrefix>
Review Comment:
Changing the jar manifest Class-Path prefix to `tika-grpc/` will break the
existing tika-grpc binary assembly: `tika-grpc/src/main/assembly/assembly.xml`
places runtime dependencies under `lib/`, so `java -jar` from the assembled zip
will no longer find its dependencies. Either keep `classpathPrefix` as `lib/`
and adjust the Docker image to copy deps into a `lib/` subdirectory next to the
jar, or update the assembly descriptor outputDirectory to match the new
`tika-grpc/` layout.
```suggestion
<classpathPrefix>lib/</classpathPrefix>
```
##########
.github/workflows/docker-snapshot.yml:
##########
@@ -69,6 +69,32 @@ jobs:
tar xzf
"tika-server/tika-server-standard/target/tika-server-standard-${TIKA_VERSION}-bin.tgz"
-C "${OUT_DIR}/tika-server"
cp "tika-server/docker-build/minimal/Dockerfile.snapshot"
"${OUT_DIR}/Dockerfile"
+ - name: Build tika-server minimal image for smoke test
+ uses:
docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
+ with:
+ context: target/tika-server-minimal-docker
+ platforms: linux/amd64
+ load: true
+ build-args: |
+ TIKA_VERSION=${{ steps.version.outputs.tika_version }}
+ tags: tika-server-minimal-smoke:ci
+
+ - name: Smoke test tika-server minimal image
+ run: |
+ docker run -d --name tika-server-minimal-smoke -p 9998:9998
tika-server-minimal-smoke:ci
+ for i in $(seq 1 20); do
+ if docker logs tika-server-minimal-smoke 2>&1 | grep -q "Started
Apache Tika server"; then
+ echo "tika-server minimal started successfully"
+ docker stop tika-server-minimal-smoke
+ exit 0
+ fi
+ sleep 2
+ done
+ echo "ERROR: tika-server minimal did not start within 40 seconds"
+ docker logs tika-server-minimal-smoke
+ docker stop tika-server-minimal-smoke
Review Comment:
This smoke test step runs under `bash -e`, so `docker stop
tika-server-minimal-smoke` can fail (and fail the job) if the container exits
early or isn’t running. Consider using a cleanup trap and `docker rm -f` (or
`docker stop ... || true` followed by `docker rm ... || true`) so the job
reliably collects logs and cleans up even on failure.
```suggestion
cleanup() {
docker logs tika-server-minimal-smoke 2>&1 || true
docker rm -f tika-server-minimal-smoke >/dev/null 2>&1 || true
}
trap cleanup EXIT
docker run -d --name tika-server-minimal-smoke -p 9998:9998
tika-server-minimal-smoke:ci
for i in $(seq 1 20); do
if docker logs tika-server-minimal-smoke 2>&1 | grep -q "Started
Apache Tika server"; then
echo "tika-server minimal started successfully"
exit 0
fi
sleep 2
done
echo "ERROR: tika-server minimal did not start within 40 seconds"
```
##########
.github/workflows/docker-release.yml:
##########
@@ -112,9 +112,12 @@ jobs:
TIKA_VERSION="${{ steps.version.outputs.tag }}"
OUT_DIR=target/tika-grpc-docker
- mkdir -p "${OUT_DIR}/libs" "${OUT_DIR}/plugins" "${OUT_DIR}/config"
"${OUT_DIR}/bin"
+ mkdir -p "${OUT_DIR}/libs/tika-grpc" "${OUT_DIR}/plugins"
"${OUT_DIR}/config" "${OUT_DIR}/bin"
cp "tika-grpc/target/tika-grpc-${TIKA_VERSION}.jar"
"${OUT_DIR}/libs/"
+ mvn -pl tika-grpc dependency:copy-dependencies \
+ -DoutputDirectory="${PWD}/${OUT_DIR}/libs/tika-grpc" \
+ -DincludeScope=runtime -q -B
Review Comment:
The PR description says smoke tests were added before the final multi-arch
push for all three images, but `docker-release.yml` currently has no smoke test
steps (only snapshot does). Either update the release workflow to include the
same smoke testing, or adjust the PR description to reflect that smoke tests
are snapshot-only.
> Integrate Docker image builds into apache/tika and deprecate standalone
> Docker repos
> ------------------------------------------------------------------------------------
>
> Key: TIKA-4703
> URL: https://issues.apache.org/jira/browse/TIKA-4703
> Project: Tika
> Issue Type: Task
> Reporter: Nicholas DiPiazza
> Priority: Major
>
> h2. Summary
> Move Docker image building and publishing into the main
> [apache/tika|https://github.com/apache/tika] repository, deprecating the
> standalone Docker repos. This ensures Docker image releases are naturally
> tied to Tika releases through the existing Maven workflow, rather than
> requiring cross-repo coordination.
> h2. Current State
> * [tika-docker|https://github.com/apache/tika-docker] - standalone repo that
> builds the tika-server Docker image, published to [apache/tika on Docker
> Hub|https://hub.docker.com/r/apache/tika]
> * [tika-grpc-docker|https://github.com/apache/tika-grpc-docker] - standalone
> repo that builds the tika-grpc Docker image, published to [apache/tika-grpc
> on Docker Hub|https://hub.docker.com/r/apache/tika-grpc]
> h2. Problem
> Having Docker builds in separate repos means:
> * Docker image releases are decoupled from Tika releases - requires manual
> coordination
> * No guarantee Docker images match the released Tika version
> * Extra maintenance burden across multiple repos
> * Harder for contributors to understand the full release pipeline
> h2. Proposed Approach
> # Move Dockerfiles and related build config from {{tika-docker}} and
> {{tika-grpc-docker}} into the main {{apache/tika}} repo
> # Add GitHub Actions workflows to {{apache/tika}} that build and publish
> Docker images as part of the release process
> # Integrate with the existing Maven workflow so Docker builds happen
> naturally alongside Java artifact publishing
> # Docker images to publish:
> #* {{apache/tika}} (tika-server) to [Docker
> Hub|https://hub.docker.com/r/apache/tika]
> #* {{apache/tika-grpc}} (tika-grpc) to [Docker
> Hub|https://hub.docker.com/r/apache/tika-grpc]
> # Support multi-architecture builds (amd64, arm64) if applicable
> # Proper image tagging tied to Maven release versions (e.g. {{3.1.0}},
> {{latest}})
> # Deprecate {{tika-docker}} and {{tika-grpc-docker}} repos with README
> notices pointing to {{apache/tika}}
> h2. Acceptance Criteria
> * Dockerfiles and build config live in the {{apache/tika}} repo
> * GitHub Actions in {{apache/tika}} build and publish both Docker images on
> release
> * Docker image versions are automatically tied to Tika release versions
> * {{tika-docker}} and {{tika-grpc-docker}} repos are marked as deprecated
--
This message was sent by Atlassian Jira
(v8.20.10#820010)